Better Memory Management in Golang for Lookups Against a Static Data Set

When you want to check a value against a list of stored values, there are several ways to do it.

One way, which is not very performant, is to loop through a slice of values:

needle := "three"
haystack := []string{"one", "two", "three"}
found := false
for _, v := range haystack {
  if needle == v {
    found = true
    break
  }
}

While that may sometimes present itself as the most obvious option, it comes with a time complexity of O(n), meaning it scales linearly with the number of values in haystack. With a slice of ten million values, the worst-case scenario means looping through all of those values before you realize that the needle you're looking for isn't there.
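As a minimal sketch, the loop above can be wrapped in a small helper. The name contains is my own choice for illustration, not something taken from a library:

```go
package main

import "fmt"

// contains reports whether needle is present in haystack.
// It checks elements one by one, so the cost grows linearly with len(haystack).
func contains(haystack []string, needle string) bool {
	for _, v := range haystack {
		if v == needle {
			return true
		}
	}
	return false
}

func main() {
	haystack := []string{"one", "two", "three"}
	fmt.Println(contains(haystack, "three")) // true
	fmt.Println(contains(haystack, "four"))  // false: every element was visited first
}
```

The miss is the expensive case: the helper only returns false after it has compared the needle against every element.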

A more performant way of handling this is to use a map, which gives O(1) lookups on average. By storing your values as keys in a map, you can use Golang's built-in indexing to see if the needle exists among the keys. The map's values, therefore, don't matter; we just need to know whether the key exists.

Until recently, this is how I would have done that:

needle := "three"
var haybool = map[string]bool{
    "one":   false,
    "two":   false,
    "three": false,
}
_, found := haybool[needle]

I used false because I thought a bool would be among the most memory-efficient ways to populate a map.
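One consequence of storing false is worth spelling out: the single-value index form can't tell a stored key from a missing one, so the two-value ("comma ok") form is required. Here is a small sketch of that, where exists is a hypothetical helper name:

```go
package main

import "fmt"

// haybool mirrors the map above: every stored value is false.
var haybool = map[string]bool{
	"one":   false,
	"two":   false,
	"three": false,
}

// exists uses the two-value ("comma ok") index form to test for a key.
func exists(m map[string]bool, key string) bool {
	_, ok := m[key]
	return ok
}

func main() {
	// Single-value form: a stored key and a missing key both yield false.
	fmt.Println(haybool["three"], haybool["four"]) // false false

	// Two-value form: the second result reports whether the key is present.
	fmt.Println(exists(haybool, "three"), exists(haybool, "four")) // true false
}
```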

That changed when I had a reason to dig into the actual memory allocation sizes of Golang data types. On the surface, some data type sizes are obvious, e.g. an int32 is 32 bits, or 4 bytes. Some others, like a map, are less obvious.

If we crack open the Golang source code (the go/types package), we can see the sizes assigned to primitive, fixed-size data types:

var basicSizes = [...]byte{
    Bool:       1, // a bool is one byte
    Int8:       1,
    Int16:      2,
    Int32:      4,
    Int64:      8,
    Uint8:      1,
    Uint16:     2,
    Uint32:     4,
    Uint64:     8,
    Float32:    4,
    Float64:    8,
    Complex64:  8,
    Complex128: 16,
}
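You can check a few of these numbers yourself with unsafe.Sizeof, which reports the size of a value in bytes. This is only a sketch: basicSizesInBytes is a hypothetical helper of mine, and it covers just a handful of the types in the table:

```go
package main

import (
	"fmt"
	"unsafe"
)

// basicSizesInBytes reports the size in bytes of a few fixed-size types.
func basicSizesInBytes() map[string]uintptr {
	var (
		b    bool
		i32  int32
		i64  int64
		f64  float64
		c128 complex128
	)
	return map[string]uintptr{
		"bool":       unsafe.Sizeof(b),
		"int32":      unsafe.Sizeof(i32),
		"int64":      unsafe.Sizeof(i64),
		"float64":    unsafe.Sizeof(f64),
		"complex128": unsafe.Sizeof(c128),
	}
}

func main() {
	sizes := basicSizesInBytes()
	// Iterate over a slice of names so the output order is stable.
	for _, name := range []string{"bool", "int32", "int64", "float64", "complex128"} {
		fmt.Printf("%-10s %d\n", name, sizes[name])
	}
}
```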

Just below that, we can see that an empty Array and an empty Struct each get zero-byte allocations:

func (s *StdSizes) Sizeof(T Type) int64 {
    switch t := under(T).(type) {
    // ...
    case *Array:
        n := t.len
        if n <= 0 {
            return 0 // an empty Array gets zero bytes
        }
        // ...
    case *Struct:
        n := t.NumFields()
        if n == 0 {
            return 0 // an empty Struct gets zero bytes
        }
        // ...
    }
    // ...
}
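The same zero-byte result shows up if you ask unsafe.Sizeof directly. A quick sketch, with zeroSizedTypes as a hypothetical helper and a bool included for comparison:

```go
package main

import (
	"fmt"
	"unsafe"
)

// zeroSizedTypes returns the sizes in bytes of an empty struct and a
// zero-length array, plus a bool for comparison.
func zeroSizedTypes() (emptyStruct, emptyArray, boolean uintptr) {
	var (
		s struct{}
		a [0]int64
		b bool
	)
	return unsafe.Sizeof(s), unsafe.Sizeof(a), unsafe.Sizeof(b)
}

func main() {
	s, a, b := zeroSizedTypes()
	fmt.Println(s, a, b) // 0 0 1
}
```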

That's interesting.

In theory, if I change the value type from a bool to an empty struct, I should be able to save some memory, especially as the map gets bigger:

needle := "three"
var haystruct = map[string]struct{}{
    "one":   {},
    "two":   {},
    "three": {},
}
_, found := haystruct[needle]
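If the static data set starts life as a slice, the same idea can be wrapped in a small set type. This is a sketch under my own naming: stringSet, newStringSet and has are hypothetical, not part of the standard library:

```go
package main

import "fmt"

// stringSet is a hypothetical set type: the keys are the members,
// and the empty-struct values carry no data.
type stringSet map[string]struct{}

// newStringSet builds a set from the given values, pre-sizing the map.
func newStringSet(values ...string) stringSet {
	s := make(stringSet, len(values))
	for _, v := range values {
		s[v] = struct{}{}
	}
	return s
}

// has reports whether v is a member of the set.
func (s stringSet) has(v string) bool {
	_, ok := s[v]
	return ok
}

func main() {
	haystruct := newStringSet("one", "two", "three")
	fmt.Println(haystruct.has("three")) // true
	fmt.Println(haystruct.has("four"))  // false
}
```

A side benefit of struct{} as the value type is that there is no value to misread: presence of the key is the only information the map can hold.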

I had to do some digging to see how the underlying complexity of a map impacts memory, given that it's dynamically allocated. I ended up going with the approach described in this Stack Overflow answer, which I'll admit is disputed. Still, the reasoning made sense to me.

The results were as expected - the empty struct ended up saving some memory, albeit not much:

Size of haybool 2000
Size of haystruct 1920

You can see the code in The Go Playground.

High five to Ashutosh Narkar for commenting on my PR, which prompted me to dive down this rabbit hole.