Comparing Go and Java, Part 2 – Performance

By collin on 09 17 2012

Author: @collinvandyck

In Part 1, we looked at the code of two implementations of an authentication web service: one written in Java, and one written in Go. It’s time to beat up on them a little bit.

Testing Methodology

For generating load, I decided to use ApacheBench in a number of different scenarios:

  • 1,000,000 requests, concurrency = 1
  • 1,000,000 requests, concurrency = 2
  • 1,000,000 requests, concurrency = 5
  • 1,000,000 requests, concurrency = 20
  • 1,000,000 requests, concurrency = 50

Each request hits the /authenticate endpoint with the same credentials. Before each test suite is run, the service under test is sent 10,000 consecutive queries to allow it to warm up.

Hardware

The service under test and the load generator run on separate but identical machines:

  • 48GB RAM
  • 12-core Intel Xeon 3GHz CPU with HT
  • Ubuntu 10.04.3 LTS

The machines are connected via a 10Gb link, which was not fully saturated during any of the tests.

Services Under Test

We’ll test five different service configurations:

  1. Java 1.7 (8GB max heap) service configured with a Dropwizard default max HTTP threadpool of 254
  2. Go 1.0.2 service (GOMAXPROCS=1)
  3. Go 1.0.2 service (GOMAXPROCS=2)
  4. Go 1.0.2 service (GOMAXPROCS=12)
  5. Go 1.0.2 service (GOMAXPROCS=24)

Go 1.0.2 will by default only utilize one CPU unless you specify a different value for GOMAXPROCS. Most of the time, this is actually not a huge deal, as goroutines will yield control to the scheduler when performing IO operations, using a select statement, sending on a channel, or explicitly yielding using runtime.Gosched(). Since the JVM runtime will automatically distribute thread workload over the available CPUs, it’s reasonable to give the Go service the same capability.

To make this comparison fair, we must set the Java service’s threadpool to at least the level of concurrency so that requests do not start queueing up. The Go http dispatcher creates a goroutine for each incoming request, and due to the yielding nature of goroutines, this allows us to achieve concurrency with any number of CPUs being utilized.

Results

For each test, we measured average latency and throughput.  Minimum latency was not very interesting, as it was < 1 ms across all test scenarios.

When all requests are serial (C=1) the average latency is very low on both services (about 1ms). It gets interesting, of course, when you start increasing the number of concurrent requests. It’s here that we can see that the default Go configuration, which uses a single processor, starts to introduce a lot of latency as C approaches 50.

In this graph, you can also see the marginal benefit of HyperThreading in the average latencies between Go (MP=12) and Go (MP=24).


Wrapping It Up

While performance is not everything, it’s usually something. The Java service has the upper hand in terms of latency and throughput for highly concurrent workloads for this particular service. Go is still relatively young, too, and I think we can expect to see incremental improvement out of the compiler as well as the runtime/GC. The language also offers a neat way of modeling concurrency which, in my opinion, has a lot of promise.