When you want to check a value against a list of stored values, there are several ways to do it.
One way, which is not very performant, is to loop through a slice of values:
needle := "three"
haystack := []string{"one", "two", "three"}
found := false
for _, v := range haystack {
	if needle == v {
		found = true
		break
	}
}
While that may sometimes present itself as the most obvious option, it comes with a time complexity of O(n), meaning it scales linearly with the number of values in haystack. With a slice of ten million values, the worst-case scenario means looping through all of those values before you realize that the needle you're looking for isn't there.
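For reference, the same linear scan can be written with the standard library. The sketch below assumes Go 1.21 or later, where the slices package is available; contains is a hypothetical helper name used only for illustration.

```go
package main

import (
	"fmt"
	"slices"
)

// contains reports whether needle appears in haystack.
// It is a thin, illustrative wrapper around slices.Contains,
// which still performs an O(n) scan under the hood.
func contains(haystack []string, needle string) bool {
	return slices.Contains(haystack, needle)
}

func main() {
	haystack := []string{"one", "two", "three"}
	fmt.Println(contains(haystack, "three")) // true
	fmt.Println(contains(haystack, "four"))  // false
}
```

It is shorter to write, but the cost is unchanged: every miss still visits every element.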
A more performant way of handling this is to use a map to achieve O(1) lookups on average. By storing your values as keys in a map, you can simply use Golang's built-in indexing to see if the needle exists among the keys in the map. The values, therefore, don't matter - we just need to know if the key exists.
Until recently, this is how I would have done that:
needle := "three"
var haybool = map[string]bool{
	"one":   false,
	"two":   false,
	"three": false,
}
_, found := haybool[needle]
I used false because I thought a bool would be among the most memory-efficient ways to populate a map.
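One detail worth spelling out: because every stored value is false, the single-value form haybool[needle] can't tell a stored key from a missing one, since both yield false. The two-value "comma ok" form is what actually answers the membership question. A small sketch, where hasKey is a hypothetical helper name:

```go
package main

import "fmt"

// hasKey reports whether key is present in m, regardless of the stored value.
func hasKey(m map[string]bool, key string) bool {
	_, ok := m[key]
	return ok
}

func main() {
	haybool := map[string]bool{"one": false, "two": false, "three": false}

	// A single-value lookup returns the zero value (false) for a missing key,
	// which here is indistinguishable from the stored false.
	fmt.Println(haybool["three"], haybool["four"]) // false false

	// The comma-ok form reports presence instead.
	fmt.Println(hasKey(haybool, "three"), hasKey(haybool, "four")) // true false
}
```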
That changed when I had a reason to dig into the actual memory allocation sizes of Golang data types. On the surface, some data type sizes are obvious, e.g. int32 is 32 bits (4 bytes). Some others, like a map, are less obvious.
If we crack open the Golang source code (the go/types package), we can see the sizes assigned to primitive, fixed-length data types:
var basicSizes = [...]byte{
	Bool:       1, // a bool is one byte
	Int8:       1,
	Int16:      2,
	Int32:      4,
	Int64:      8,
	Uint8:      1,
	Uint16:     2,
	Uint32:     4,
	Uint64:     8,
	Float32:    4,
	Float64:    8,
	Complex64:  8,
	Complex128: 16,
}
Just below that, we can see that an empty Array and an empty Struct each get zero-byte allocations:
func (s *StdSizes) Sizeof(T Type) int64 {
	switch t := under(T).(type) {
	// ...
	case *Array:
		n := t.len
		if n <= 0 {
			return 0 // an empty Array gets zero bytes
		}
		// ...
	case *Struct:
		n := t.NumFields()
		if n == 0 {
			return 0 // an empty Struct gets zero bytes
		}
		// ...
	}
}
That's interesting.
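These sizes can also be checked at runtime with unsafe.Sizeof, which reports the size in bytes of a value's type. A minimal sketch, where sizes is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"unsafe"
)

// sizes returns the sizes, in bytes, of a bool, an int32,
// an empty struct, and a zero-length array.
func sizes() (boolSize, int32Size, structSize, arraySize uintptr) {
	var b bool
	var i int32
	var s struct{}
	var a [0]int64
	return unsafe.Sizeof(b), unsafe.Sizeof(i), unsafe.Sizeof(s), unsafe.Sizeof(a)
}

func main() {
	b, i, s, a := sizes()
	fmt.Println(b, i, s, a) // 1 4 0 0
}
```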
In theory, if I change the value type from a bool to an empty struct, I should be able to save some memory, especially as this map gets bigger:
needle := "three"
var haystruct = map[string]struct{}{
	"one":   {},
	"two":   {},
	"three": {},
}
_, found := haystruct[needle]
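If the lookup is needed in more than one place, the same idea can be wrapped in a small set type. This is only a sketch: StringSet, NewStringSet, and Has are hypothetical names introduced here for illustration.

```go
package main

import "fmt"

// StringSet is a hypothetical set type backed by zero-size values.
type StringSet map[string]struct{}

// NewStringSet builds a set containing the given items.
func NewStringSet(items ...string) StringSet {
	s := make(StringSet, len(items))
	for _, item := range items {
		s[item] = struct{}{}
	}
	return s
}

// Has reports whether item is in the set.
func (s StringSet) Has(item string) bool {
	_, ok := s[item]
	return ok
}

func main() {
	haystack := NewStringSet("one", "two", "three")
	fmt.Println(haystack.Has("three")) // true
	fmt.Println(haystack.Has("four"))  // false
}
```

Since an empty struct carries no information, the only way to read the map is the comma-ok form, which makes the intent (membership, not value) harder to misread.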
I had to do some digging to see how the underlying complexity of a map impacts memory, given that it's dynamically allocated. I ended up going with the approach listed in this Stack Overflow answer, which I'll admit is disputed. Still, it made sense from a logical perspective.
The results were as expected - the empty struct ended up saving some memory, albeit not much:
Size of haybool 2000
Size of haystruct 1920
You can see the code in The Go Playground.
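As a rough cross-check that doesn't depend on the formula from that answer, the heap allocations for each map can be compared at runtime. This is a sketch of a different technique, not the measurement used above: allocatedBy is a hypothetical helper, the map size n is arbitrary, and the numbers will vary with Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
)

// allocatedBy returns the number of heap bytes allocated while fn runs.
// It reads runtime.MemStats.TotalAlloc, a cumulative counter, so the
// result includes any garbage created along the way.
func allocatedBy(fn func()) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	const n = 100_000 // arbitrary size chosen for illustration

	// Build the keys up front so their allocations are not counted.
	keys := make([]string, n)
	for i := range keys {
		keys[i] = strconv.Itoa(i)
	}

	var haybool map[string]bool
	var haystruct map[string]struct{}

	// Pre-size each map so growth during insertion doesn't inflate the totals.
	boolBytes := allocatedBy(func() {
		haybool = make(map[string]bool, n)
		for _, k := range keys {
			haybool[k] = false
		}
	})
	structBytes := allocatedBy(func() {
		haystruct = make(map[string]struct{}, n)
		for _, k := range keys {
			haystruct[k] = struct{}{}
		}
	})

	fmt.Println("Allocated for haybool:  ", boolBytes, "entries:", len(haybool))
	fmt.Println("Allocated for haystruct:", structBytes, "entries:", len(haystruct))
}
```

Most of each entry's footprint is the string key and the map's own bookkeeping, which is consistent with the savings from dropping a one-byte value being modest.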
High five to Ashutosh Narkar for commenting on my PR, which prompted me to dive down this rabbit hole.