Ben Lynn's Online Garbage: Crit-bit tree micro-optimizations

Long ago, I implemented crit-bit trees (one of the many names of this data structure) after skimming a few online descriptions. I made obvious choices and put little effort into optimization. It worked well enough for my projects.

Earlier this year, I studied Adam Langley’s crit-bit tree library, and was inspired to write a new crit-bit tree library from scratch. Micro-optimizations are fun! And surely my rebooted library would be faster thanks to ingenious techniques such as tagged pointers and fancy bit-twiddling (notably a SWAR hack to find the critical bit).

On my machine, the benchmarks show improvement in space and time. Iteration is much slower since I opted to forgo linked-list pointers as advocated by Bernstein. Without them, we must travel up and down the tree to go from one leaf node to the next.

However, an application wishing to visit every element should not do this: it should instead call the allprefixed routine with a blank prefix. The difference between allprefixed and the my original library’s iterate is small enough that I’m inclined to agree with Bernstein: we’re better off without auxiliary next and previous pointers.

When I modified the benchmark program to measure the classes of C++, I changed char* to string, and an array to a vector which are natural choices for C++ and should not significantly hurt running times. As theory suggests, map insertion and lookup is slower, while unordered_map is much faster; as long as the drawbacks of the latter are borne in mind, it may be a reasonable choice for certain applications.

I was pleased my rewritten library sometimes seemed a touch faster than the library that inspired it ("critbit" in the tables below), especially since my library implements a map and not just a set. The extra optimizations it performs are listed in comments in the C file. My guess is that most of them do almost nothing, and "unrolling" the child pointer lookup is largely responsible for the gains.

I’m naming my code the BLT library for now. I already used up my first choice in my old "cbt" library. Though immodest, "Ben Lynn Trees" is easy for me to remember. Also, the name coincides with a delicious sandwich.

In the following tables, all entries are in seconds, except for the overhead which is measured in bytes. I used tcmalloc to get the best numbers.

**Table 1.** Keys: `/usr/share/dict/words`
	BLT	cbt	critbit	map	unordered_map
insert	0.062939	0.085081	0.070779	0.086181	0.050796
get	0.044096	0.046323	0.053218	0.077201	0.034249
allprefixed	0.013312		0.015073
iterate	0.031755	0.006420		0.006167	0.004334
delete	0.048093	0.050687	0.051923	0.009572	0.011820
overhead	3966824	6346992

**Table 2.** Keys: output of `seq 2000000`
	BLT	cbt	critbit	map	unordered_map
insert	2.675797	3.048633	2.683036	3.175450	1.457466
get	2.328416	2.477317	2.510085	2.913757	0.919519
allprefixed	0.296389		0.303059
iterate	0.917041	0.171143		0.193081	0.131707
delete	2.352568	2.522654	2.250305	0.299835	0.316262
overhead	79999984	128000048

Try BLT now!

https://github.com/blynn/blt

Ben Lynn's Online Garbage

Sunday, November 3, 2013

Crit-bit tree micro-optimizations

No comments:

About Me

Labels