Go vs. Python (I/O)



At work we have a service-oriented architecture. We run microservices deployed via EC2 Container Service, which implies that our applications are built and deployed on Docker.

To keep all our services on the same “platform-level” runtime, we develop and maintain our own Docker base image, which our applications (scaffolded via a service template) are then layered on top of.

For example, our base image (which is itself layered on top of phusion/baseimage-docker) contains:

Application   What it does
logrotate     log file rotation
uwsgi         WSGI application server
nginx         web server
syslog-ng     centralized log configuration (JSON formatting, log files)

Our base image didn’t always have all of the above components in working order, and it was only recently that we got everything truly working. And for various reasons (some of them organizational) currently only three of our services are on the latest base image version - and we have many more services than three.

This means that the overwhelming majority of our services

  • can only run as many instances as we have EC2s in our cluster
  • probably have broken log rotation
  • probably don’t all emit logs to the same place
  • probably don’t all emit logs for some of the applications mentioned above

Thus, I embarked on a quest to upgrade all of our services to our latest base image.

Finding services to upgrade

Before I started, I needed a list of which services needed to be upgraded. Generating this list is trivial, since each service exposes an HTTP endpoint /metadata that returns details about its runtime properties. One of these properties is the base image version.

I wrote a simple application, metadata, to query all our services, aggregate the results, and dump them to standard output as JSON, so I could pipe the output into a tool like jq.

Here’s how I found which services to upgrade.

🤔 ~/c/a/dev-ops-tools (f533c0c)|master✓
± [n]: metadata | jq \
  '[.[] | {service_name: .service_name, baseimage: .root_image}]
  | map(select(.baseimage != null))
  | map(select(.baseimage | contains("baseimage")))
  | sort_by(.baseimage)'
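
For readers who don’t use jq, here is roughly what that filter does, sketched in Go. The sample input, the entry type, and the function name onBaseImage are made up for illustration; only the service_name and root_image fields come from the /metadata output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// entry mirrors the two fields the jq filter keeps.
// BaseImage is a pointer so that a null or missing root_image is detectable.
type entry struct {
	ServiceName string  `json:"service_name"`
	BaseImage   *string `json:"root_image"`
}

// onBaseImage keeps the entries whose root_image is set and contains
// "baseimage", sorted by root_image, mirroring the jq pipeline.
func onBaseImage(raw []byte) ([]entry, error) {
	var all []entry
	if err := json.Unmarshal(raw, &all); err != nil {
		return nil, err
	}
	kept := []entry{}
	for _, e := range all {
		if e.BaseImage != nil && strings.Contains(*e.BaseImage, "baseimage") {
			kept = append(kept, e)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return *kept[i].BaseImage < *kept[j].BaseImage
	})
	return kept, nil
}

func main() {
	// Hypothetical sample data, not real service output.
	raw := []byte(`[
		{"service_name": "svc-a", "root_image": "example/baseimage:0.3"},
		{"service_name": "svc-b", "root_image": null},
		{"service_name": "svc-c", "root_image": "python:2.7"},
		{"service_name": "svc-d", "root_image": "example/baseimage:0.1"}
	]`)
	kept, err := onBaseImage(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range kept {
		fmt.Println(e.ServiceName, *e.BaseImage)
	}
}
```

With the sample data, svc-b (null image) and svc-c (not built on the base image) are dropped, and svc-d is listed before svc-a because its image string sorts first.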

Here’s how I wrote metadata.


Here’s the Python version. Note that to get the service hosts, we use a static JSON file (which is generally used as our service discovery mechanism). This version contains 99 distinct service entries.

#! /usr/bin/env python
import os
import json
import sys
import click
import requests
import service_directory
from multiprocessing.dummy import Pool

# Thread pool of 10 workers (multiprocessing.dummy is backed by threads).
pool = Pool(10)

def get_metadata(base_url):
    """Fetch and decode /metadata for one service, or return None on failure."""
    response = requests.get(base_url + '/metadata')
    # Assumed failure handling: treat any non-200 response as "no metadata".
    if response.status_code != 200:
        return None
    return response.json()

@click.command()
@click.argument('services', nargs=-1)
@click.option('--stack', type=click.Choice(['prod', 'staging']), default='prod')
def cli(services, stack):
    """Dump a JSON list of metadata for a subset of services using the /metadata endpoint.

    Basic usage:

    # Prints metadata for a few services in prod
    $ metadata service-a service-b service-c

    # Prints metadata for all services in prod
    $ metadata

    # Prints metadata for all services in staging
    $ metadata --stack staging

    # Pipe the output to jq for additional filtering.
    $ metadata | jq \
          '[.[] | {service_name: .service_name, baseimage: .root_image}]
          | map(select(.baseimage != null))
          | map(select(.baseimage | contains("baseimage")))
          | sort_by(.baseimage)'

    :param services: space-delimited set of services, if None is given, we use all services from the directory
    :param stack:    staging or prod, prod is the default
    """
    all_services = service_directory.all()
    if not services:
        # No services named on the command line: query everything.
        services = all_services.keys()
    else:
        for service in services:
            if service not in all_services:
                print('Unknown service: {}'.format(service))
                sys.exit(1)
    base_urls = [service_directory.url(service, stack) for service in services]
    # Query the services concurrently; failed lookups come back as None.
    metadata = pool.map(get_metadata, base_urls)
    print(json.dumps(metadata))

if __name__ == "__main__":
    cli()

It’s fairly self-explanatory, except for the multiprocessing.dummy import. Don’t be fooled; this isn’t a process pool, it’s actually a thread pool. multiprocessing.dummy exposes the same API as multiprocessing, including the nicer Pool interface, but is a thin wrapper around the threading module. Here’s a nice write-up of some of those APIs.

Here’s the Go version (with all the weird but conventional formatting and terse variable naming).

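package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"sync"
)

// sem is a counting semaphore: at most 10 requests are in flight at once.
var sem chan bool = make(chan bool, 10)

// res carries decoded metadata from the workers back to main.
var res chan metadata = make(chan metadata)

type metadata struct {
	ServiceName  string `json:"service_name"`
	Stack        string
	DeployedBy   string `json:"deployed_by"`
	GitSha       string `json:"git_sha"`
	GitCommitUrl string `json:"git_commit_url"`
	BuildUrl     string `json:"build_url"`
	BuildTime    string `json:"build_time"`
	RootImage    string `json:"root_image"`
	Python       string `json:"python_version"`
	Security     map[string]string
	Packages     map[string]string `json:"installed_packages"`
}

type serviceDirEntry struct {
	Staging string
	Prod    string
}

type serviceConf struct {
	ServiceName string
	Url         string
}

type serviceDir map[string]serviceDirEntry

func main() {
	var wg sync.WaitGroup

	scs := serviceConfs()
	for _, sc := range scs {
		wg.Add(1)
		go getMetadata(sc, &wg)
	}

	// Close res once every worker has finished, so the range below terminates.
	go func() {
		wg.Wait()
		close(res)
	}()

	// Length 0, capacity len(scs): append must not leave empty entries in front.
	allMetadata := make([]metadata, 0, len(scs))

	for m := range res {
		allMetadata = append(allMetadata, m)
	}

	out, _ := json.Marshal(allMetadata)
	fmt.Println(string(out))
}

// serviceConfs reads the static service directory and returns the prod URLs.
func serviceConfs() (scs []*serviceConf) {
	contents, err := ioutil.ReadFile("./arivale_service_directory/services.json")
	if err != nil {
		log.Fatalf("error reading file: %v", err)
	}

	var dir serviceDir
	if err := json.Unmarshal(contents, &dir); err != nil {
		log.Fatalf("error parsing file: %v", err)
	}

	for serviceName, info := range dir {
		scs = append(scs, &serviceConf{serviceName, info.Prod})
	}
	return scs
}

// getMetadata fetches /metadata for one service and sends the result on res.
func getMetadata(sc *serviceConf, wg *sync.WaitGroup) {
	sem <- true
	// Release the semaphore slot and signal completion on every return path.
	defer func() {
		<-sem
		wg.Done()
	}()

	resp, err := http.Get(fmt.Sprintf("%s/metadata", sc.Url))
	if err != nil {
		// Log to stderr so the JSON on stdout stays valid.
		log.Printf("error %v from host %s\n", err, sc.Url)
		return
	}
	defer resp.Body.Close()

	m := metadata{}
	if err := json.NewDecoder(resp.Body).Decode(&m); err != nil {
		log.Printf("error %v decoding response from host %s\n", err, sc.Url)
		return
	}
	res <- m
}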

This deserves a bit more explanation.

First, Go doesn’t expose threads directly; instead one has “goroutines”, which are similar to threads but much lighter, and which the Go runtime multiplexes onto OS threads. There’s no built-in pool either, so we cap concurrency at 10 ourselves, using a Go channel (a concurrency-safe primitive used for communication between goroutines) as a counting semaphore. It’s similar to a thread pool in that it prevents unbounded concurrency - goroutines acquire a permit by sending to a fixed-size buffered channel, and release it by receiving from the same channel when they’re finished. Note that we still create many more than 10 goroutines, but goroutines are cheap (“it’s practical to create hundreds or thousands in a single program”), so this isn’t an issue.
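
To make the semaphore idea concrete, here is a minimal, self-contained sketch that measures how many goroutines are ever working at once. The function name maxConcurrent and the sleep standing in for an HTTP request are assumptions for illustration, not part of the tool above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// maxConcurrent starts n goroutines that each hold a slot in a buffered
// channel of capacity limit while they "work", and returns the highest
// number of goroutines that were working at the same time.
func maxConcurrent(n, limit int) int32 {
	sem := make(chan bool, limit)
	var running, peak int32
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- true              // acquire: blocks while limit slots are taken
			defer func() { <-sem }() // release the slot on the way out

			cur := atomic.AddInt32(&running, 1)
			// Record a new peak if this goroutine pushed the count higher.
			for {
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // stand-in for the HTTP request
			atomic.AddInt32(&running, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	// 50 goroutines are created, but the reported peak never exceeds 10.
	fmt.Println("peak concurrency:", maxConcurrent(50, 10))
}
```

All 50 goroutines exist from the start; the channel only limits how many of them are past the acquire step at any moment.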

The sync.WaitGroup object is a synchronization primitive: each getMetadata goroutine calls wg.Done() when it finishes, and wg.Wait() blocks until all of them have done so. Given this, you might be wondering why wg.Wait() is run in a separate goroutine rather than directly in main! The reason is to avoid a deadlock: res is unbuffered, so each send res <- m blocks until the main goroutine receives it via range. If main called wg.Wait() first, it would be waiting for the goroutines to complete, while each goroutine would be waiting for main to start receiving.

We also can’t simply range over res in main and call wg.Wait() afterwards: range only ends when the channel is closed, and the only safe time to close it is after every goroutine has finished sending to it, that is, after wg.Wait() returns. With the wait placed after the loop, range blocks forever.

To illustrate the latter mistake: in a simple program, ranging over a channel that is never closed should produce output like

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
	/tmp/sandbox753033996/main.go:19 +0x120

(Try it in the playground.)

Once the goroutines have completed, wg.Wait() returns and the helper goroutine closes the res channel, thus allowing range to finish and the program to complete.

This might seem a bit complex, but it’s a pretty standard technique in simple Go programs.
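
Stripped of the HTTP details, the pattern looks roughly like the sketch below. The collect function and the squaring "work" are hypothetical stand-ins for getMetadata; the shape (workers send on an unbuffered channel, a helper goroutine waits and closes, main ranges) is the part that matters.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect runs one goroutine per input, has each send its result on an
// unbuffered channel, and gathers the results with range.
func collect(inputs []int) []int {
	res := make(chan int)
	var wg sync.WaitGroup

	for _, n := range inputs {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			res <- n * n // blocks until the loop below receives the value
		}(n)
	}

	// Wait in a helper goroutine so that this function is free to receive.
	// Closing res after Wait returns is what lets the range loop end.
	go func() {
		wg.Wait()
		close(res)
	}()

	out := make([]int, 0, len(inputs))
	for v := range res {
		out = append(out, v)
	}
	return out
}

func main() {
	got := collect([]int{1, 2, 3, 4})
	sort.Ints(got) // results arrive in no particular order
	fmt.Println(got)
}
```

Moving wg.Wait() and close(res) out of the helper goroutine and placing them before or after the range loop reintroduces the two deadlocks described above.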


After a few trial runs:

🤔 ~/c/a/dev-ops-tools (9a64497)|master✓
± [n]: time bin/metadata
# ...tons of output
bin/metadata  2.58s user 0.54s system 65% cpu 4.795 total
🤔 ~/c/a/dev-ops-tools (9a64497)|master✓
± [n]: time go run whichservices.go
# ...tons of output
go run whichservices.go  0.64s user 0.25s system 42% cpu 2.103 total

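That’s a 56% decrease in wall-clock time from Python to Go (4.795s down to 2.103s), and the Go figure includes whatever time go run spends building the program before running it.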

Of course, that’s not the full story; maintainability, cognitive overhead, ease of use, scripting speed all count. In these respects, for a simple task like this, Python mostly wins, as there’s certainly less code, and that code is straightforward.

But if I really cared about speed or performance, or was querying hundreds of services, or obviously, if Go were the standard on my team, I’d definitely be using it instead. It didn’t actually take me much longer to write the Go version, and it’ll scale much better in terms of speed.