
Size visualization of Go executables using D3

My goal was to gain some insight into 123MB of inscrutable executable
data. I started without knowing exactly how to achieve that.

I knew about the standard Unix utility nm which can display the size
of individual entries in an executable file, and I knew that Go
provides its own half-baked re-implementation (go tool nm). However,
even a small program in Go will contain dozens of entries, and there
were tens of thousands in the particular files I was looking at. So I
needed an overview.

I also knew about tree maps, ever since the movie Jurassic Park
featured the fsn 3D browser in 1993. This visualization represents
sized hierarchical entries—for example files on disk or, in my case,
entries inside an executable binary—using visual elements whose size
on the screen is proportional to their size in bytes on disk.

I thus decided to connect the two: visualize Go binaries using tree maps.

Furthermore, reminding myself that I have potentially tens of
thousands of entries to visualize, I decided upfront that it would do
me no good to attempt to represent them all simultaneously on
screen. I thus went looking for zoomable tree maps.

Finally, I already had learned some D3 basics and wanted to learn
more, so I decided I would use D3 for this exercise too.

So I went to search for “zoomable d3 tree map” on my favorite search
engine and discovered that D3 has native support for tree maps,
provided the input data is in a suitable format.

I initially started to tinker with Mike Bostock’s zoomable treemaps
but quickly ran into an issue where some D3 feature I wanted to use
was not available: Mike’s code uses D3 V3, “modern” D3 runs at V5,
and there were major API changes between V3 and V4. Converting the V3
example to V4 (or even V5) seemed non-trivial. Instead I set out to
find some examples built for V4+. I then discovered this example from
Jahnichen Jacques, itself inspired by Mike Bostock’s, and finally this
example from Guglielmo Celata, inspired by both but with a simpler
implementation.

All these examples worked using D3 hierarchical data sets loaded from CSV
or JSON with a particular schema.

The main thinking exercise was thus to massage the output of nm
into a format suitable to load into D3. The rest of the work was
simpler: adapting the D3 examples I found online to the type of data
I was working with.

After decomposing the path components of each symbol, my program
creates nested Python dictionaries using a simple recursive function.
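
A minimal sketch of that idea might look as follows (hypothetical
code, not the actual tab2pydic.py; it assumes each line of go tool nm
-size output holds an address, a size, a symbol type, and a symbol
name, and splits names on “.” and “/”):

import sys

def insert(node, components, size):
    # Recursively descend into nested dictionaries, creating levels as needed.
    children = node.setdefault('children', {})
    head, rest = components[0], components[1:]
    child = children.setdefault(head, {})
    if rest:
        insert(child, rest, size)
    else:
        # Leaf entry: accumulate the symbol size here.
        child['size'] = child.get('size', 0) + size

root = {}
for line in sys.stdin:
    fields = line.split()
    if len(fields) < 4 or not fields[1].isdigit():
        continue  # skip blank or unexpected lines
    size, name = int(fields[1]), fields[3]
    insert(root, name.replace('/', '.').split('.'), size)

print(root)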

However, the result of this strategy for the path a,b,c is
something like this:

{'children':{
        'a':{'children':{
                'b':{'children':{
                      'c': ...
                      }}
             }}
}}

Whereas the D3 visualization code really wants this:

{'children':[
        {'name':'a', 'children':[
                {'name':'b', 'children':[
                       {'name':'c', ... }
                      ]}
             ]}
]}

For this, I built a separate simplification program that turns the former format into the latter.

The reason why I separated the code into two programs is that the
decomposition of symbols is rather expensive, and once I was satisfied
with the decomposition I wanted the ability to iterate quickly on the
tree transform without having to decompose over and over again.

Additionally, the simplification program collapses (“flattens”)
multiple hierarchy levels with a single child into just one level with a combined name.
For example, the hierarchy a/ → b/ → c/ → x,y becomes a/b/c/ → x,y.
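
A minimal sketch of this simplification step (again hypothetical, not
the actual simplify.py) could look like this:

def simplify(name, node):
    # Convert {'children': {name: subtree}} dictionaries into the
    # {'name': ..., 'children': [...]} shape expected by D3.
    children = node.get('children', {})
    # Flatten chains of single-child levels: a/ -> b/ -> c/ becomes "a/b/c".
    while len(children) == 1:
        child_name, child = next(iter(children.items()))
        if 'children' not in child:
            break
        name = name + '/' + child_name if name else child_name
        children = child['children']
    out = {'name': name}
    if 'size' in node:
        out['size'] = node['size']
    if children:
        out['children'] = [simplify(n, c) for n, c in children.items()]
    return out

tree = simplify('', root)   # 'root' as built by the previous program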

We’ll use the following Go code:

package main

import "fmt"

var x = struct { x [10000]int }{}

func main() {
        fmt.Println("hello world", x)
}

I chose to use a large struct for variable x so that the main
package becomes significant in size compared to the imported
runtime objects.

The program can be compiled as usual:

$ go build hello.go

I then use the following commands:

$ go tool nm -size hello            >hello.symtab
$ python3 tab2pydic.py hello.symtab >hellodic.py
$ python3 simplify.py hellodic.py   >hello.js

The following HTML code is sufficient:

<html>
  <head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
       <link rel="stylesheet" type="text/css" href="treemap.css">
  </head>
  <body>
       <p class="chart" id="chart"></p>
       <script src="js/d3.v4.min.js" type="text/javascript"></script>
       <script src="js/d3-color.v1.min.js"></script>
       <script src="js/d3-interpolate.v1.min.js"></script>
       <script src="js/d3-scale-chromatic.v1.min.js"></script>
       <script src="app3.js" type="text/javascript"></script>
       <script type="text/javascript">
         viewTree("chart", "example-data/hello.js");
       </script>
  </body>
</html>

Which renders as follows (ensure JavaScript is enabled):

Although this simple executable file appears to only contain Go
symbols, it actually does contain C/C++ symbols too. However, their
size is negligible and they initially appear as a mere line on the
right side of the tree map. By clicking on that line, you may be able
to zoom into them and obtain this:

The small program above contains 6 lines of source code and compiles
to a 1.3MB binary. The breakdown of sizes is as follows:

Package      Size
--------     -----------
runtime      900K / 71%
main          80K / 6.3%
unicode       77K / 6.1%
reflect       72K / 5.7%
fmt           38K / 3.0%
strconv       31K / 2.5%
sync          10K / 0.8%
internal       9K / 0.7%
syscall        6K / 0.5%
(others)     (remainder)
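
Such a breakdown can also be approximated directly from the symbol
table; here is a rough sketch (hypothetical, using the same
assumptions about the go tool nm -size output format as above, and
taking everything before the first dot as the package name, which is
only a crude approximation):

from collections import Counter
import sys

totals = Counter()
for line in sys.stdin:
    fields = line.split()
    if len(fields) < 4 or not fields[1].isdigit():
        continue
    size, name = int(fields[1]), fields[3]
    totals[name.split('.')[0]] += size   # crude package name extraction

grand_total = sum(totals.values())
for package, size in totals.most_common(10):
    print(f"{package:20s} {size:8d}  {100.0 * size / grand_total:4.1f}%")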

In addition to code compiled from source, 24K (1.9% of the size)
consists of compiler-generated type equality and hashing functions.
These are accumulated in the TYPEINFO box in the tree map.

The following becomes clear quickly:

  • the Go standard library is not well modularized; importing
    just one function (fmt.Println) pulls in about 300KB of code.

  • even accounting for fmt.Println and its dependencies and the 80K
    of predictable code from the main program, we are still left to
    wonder about 900K (71%) of code from the runtime package.

  • zooming in there, we see that 450K (35%) is taken by a single
    entry named runtime.pclntab. This is more than the code for the
    main program and all the supporting code for fmt.Println combined.

We’ll come back to this below.

❦❦❦

Applying the tool to a recent CockroachDB build, one finds the following:

By exploring this visualization, we discover:

  • 71M of total entries;

  • 8.4M (12% of size) in C/C++ entries, including 3M (4.3%) from RocksDB;

  • 62M (88%) in Go entries, including 32M compiled from the
    CockroachDB sources and 26M in the runtime package alone.

wait, what?

We established above, in the simple example, that runtime was
about 900K in size. Now it has become 26M—a 28X increase! What is
going on?

Zooming in, the mystery is revealed: all of the increase went into
that single object runtime.pclntab. The other entries in package
runtime do not appear to differ between programs.
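
To check this on any binary, a quick sketch (assuming the same go tool
nm -size output layout as above) could simply look up the
runtime.pclntab entry:

import subprocess
import sys

# Run `go tool nm -size` on the binary given as the first argument.
out = subprocess.run(["go", "tool", "nm", "-size", sys.argv[1]],
                     capture_output=True, text=True, check=True).stdout
for line in out.splitlines():
    fields = line.split()
    if len(fields) >= 4 and fields[3] == "runtime.pclntab":
        print("runtime.pclntab:", int(fields[1]), "bytes")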

We will come back to this pclntab object below.

The visualization above was for a pre-release build of CockroachDB
v19.1. For reference, here is the data for CockroachDB v1.0:

This has:

  • 32M of total entries;

  • 2.5M (7.8%) in C/C++ entries, including 1.5M (4.7%) from RocksDB;

  • 30M (92%) in Go entries, including 16M compiled from the
    CockroachDB sources.

Recalling the initial problem statement, CockroachDB increased about
140% in source code size between v1.0 and v19.1. Thanks to the
visualization, we observe that the compiled code that directly
originates from the CockroachDB sources increased from 16M to 32M,
which is about a 100% increase.

The amount of compiled code thus grows more slowly than the source
code. This is a good thing! And it is expected: CockroachDB’s sources
contain many more comments in v19.1.

Meanwhile, runtime.pclntab increased from 7.9M to 26M—a 330%
increase!

❦❦❦

The runtime.pclntab object is not well documented; however, this
comment from the Go source code suggests its purpose:

// A LineTable is a data structure mapping program counters to line numbers.

The purpose of this data structure is to enable the Go runtime system
to produce descriptive stack traces upon a crash or upon internal
requests via APIs such as runtime.Stack.

So it seems useful. But why is it so large?

The URL https://golang.org/s/go12symtab hidden in the source file
linked above redirects to a document that explains what happened
between Go 1.0 and 1.2. To paraphrase:

  • prior to 1.2, the Go linker was emitting a compressed line table,
    and the program would decompress it upon initialization at run-time.

  • in Go 1.2, a decision was made to pre-expand the line table
    in the executable file into its final format suitable for
    direct use at run-time, without an additional decompression step.

In other words, the Go team decided to make executable files larger to save on initialization time.

Also, looking at the data structure, it appears that its overall size
in compiled binaries grows super-linearly with the number of functions
in the program, in addition to growing with the size of each function.
We will come back to this below.

The change in design between Go pre-1.2 and Go 1.2 is a classic
trade-off between performance (time) and memory (space).

When is this trade-off warranted?

  • if a program is executed often (e.g. microservices, system tools),
    then it is worth accelerating its start-up time at the cost of an
    increase in size.

    If, moreover, the program is small, with relatively few functions
    (a reasonable assumption for programs executed often, like
    microservices or system tools), the increase in size incurred by
    runtime.pclntab will be negligible and thus have no significant
    impact on usability: file transfers, deployments, orchestration,
    etc.

    In that case, the Go 1.2 design is sound and warranted.

  • if, in contrast, a program is executed rarely (e.g. a system
    service that runs continuously in the background, like, say … a
    database server), then the start-up time can be amortized throughout
    the lifetime of the process. The performance benefit of accelerating
    start-up is then not so clear.

    If, moreover, the program is large, with tens of thousands of
    functions (a reasonable assumption for complex, feature-driven
    enterprise software like … database servers), the increase in size
    incurred by runtime.pclntab becomes inconveniently significant.

    This makes the Go 1.2 design … not so advantageous.

In the case of CockroachDB, runtime.pclntab could soon exceed the
entirety of code compiled from sources, even with a conservative
assumption of linear growth in compiled code size:

Year               2017          2019         2021 (projected)   2023
Total size         32M           71M          157M               350M
CockroachDB code   16M (50%)     32M (45%)    64M (41%)          128M (37%)
runtime.pclntab    7.3M (23%)    26M (36%)    85M (54%)          281M (80%)

❦❦❦

  • the Go compiler and linker always produce and keep the following
    entries, even when they are only used by functions that the linker
    elided because they are themselves unused:

  • as discussed in my previous article, Go uses memory instead of
    registers to pass arguments and return values across function
    calls. On x86/x86-64, memory-accessing instructions use longer
    machine encodings than register-only instructions.

    In my experience, this is precisely the inflection point for the
    ratio between source code size and compiled code size in a
    monomorphic C-like language: when targeting fixed-length instruction
    encodings and/or a memory-heavy calling convention, the compiled
    code grows larger than the source code (excluding comments and
    whitespace). We can see this with both C on ARMv6 (no Thumb) and
    Go on x86(-64).

    When targeting variable-length instruction encodings with a calling
    convention that makes good use of the more compact instructions, the
    compiled code becomes smaller than the source code. We can see this
    with C on x86(-64) and a decent compiler, but, as examined here, not
    with Go.

❦❦❦

To understand the internal structure of executable files compiled from
Go, I built a D3 visualization using zoomable tree maps. This
visualization is published on GitHub and has been tested to work with
any executable produced by Go versions between 1.4 and 1.12.

Using this tool, I have analyzed the space usage inside the monolithic
CockroachDB binary, cockroach. I discovered that the majority of
the growth of cockroach over time is concentrated in one object,
runtime.pclntab.

This object is automatically generated by the Go compiler to support
the generation of textual stack traces at run-time, for example in
error reports.

A design choice made in Go 1.2 causes this object to grow
super-linearly in the number of functions in a program, in addition
to the sum of their sizes. Between CockroachDB 1.0 and 19.1,
runtime.pclntab grew by 330% while the total amount of source code
only increased by 140% and compiled code by 100%.

This design choice was intended to lower the start-up time of
programs, and contrasts with the previous design using compressed
tables—which is, incidentally, the industry standard for other C-like
languages, even in contemporary compilers. This performance goal is
not relevant to server software with long-running processes, like
CockroachDB, and its incurred space cost is particularly inconvenient
for large, feature-rich programs.
