Ben Lynn's Online Garbage: Crit-bit trees yet again

I’ve been pleased with my recent implementation of crit-bit trees based on Adam Langley’s code, but in the back of my mind, I’ve been bothered by Bernstein’s statement that the overhead on top of each string stored need only be two control words: one pointer and one integer.

To obtain relief, I updated my library so that internal nodes only have a single child pointer instead of one pointer each for the left and right child nodes. In a crit-bit tree, an internal node always has two children, so we may as well allocate sibling nodes in one contiguous chunk of memory.

Bernstein suggests storing strings directly in the crit-bit tree, which can be done by storing the left string backwards: it resides to the left of the integer control word. I’ve chosen an alternate scheme which he also mentions: expressing the strings as pointers in a separate storage pool.

Removing one pointer per internal node has several benefits on a 64-bit machine. Firstly, malloc is typically 16-byte aligned, which means that the 24-byte internal nodes of the original version were actually taking 32 bytes. In contrast, now we only need just one pointer and one integer, so we can fit internal nodes within 16 bytes.

Secondly, we no longer have to tag pointers. Instead, 8 bytes hold an unadulterated child pointer, and the other 8 bytes store the crit-bit position as well as one bit that indicates whether the current node is internal or not.

Thirdly, allocating two nodes at once only requires a single malloc() call. Before, we would call malloc() to allocate a new leaf node, and again to allocate a new internal node.

Again, "unrolling" the child pointer lookup produced better benchmarks on my system, namely, instead of p = p->kid + predicate(), we write p = predicate() ? p->kid + 1 : p->kid. This obviated the need for some fun but complex micro-optimizations. While I was at it, I unrolled a few other routines.

Benchmarks:

**Table 1.** Keys: `/usr/share/dict/words`
	old	new
insert	0.066135	0.056116
get	0.043584	0.040845
iterate	0.031851	0.015135
allprefixed	0.013285	0.011924
delete	0.048030	0.041754
overhead	3966824	3173464

**Table 2.** Keys: output of `seq 2000000`
	old	new
insert	2.700949	2.359999
get	2.304070	2.214699
iterate	0.912950	0.472594
allprefixed	0.295322	0.264326
overhead	79999984	63999992
delete	2.339102	2.177294

The insert benchmark of my old library is slightly worse than the one I originally posted because I have since tweaked the code to make it easy to carry out operations such as "insert or replace" or "insert if absent" in a single step.

https://github.com/blynn/blt

Ben Lynn's Online Garbage

Friday, April 18, 2014

Crit-bit trees yet again

2 comments:

About Me

Labels