Wednesday, April 11, 2007
Trees, Hash Tables and Tries
After some thought I've resolved my difficulties, but now I feel many textbooks need to be rewritten. Either that, or I've made a monumental embarrassing mistake.
Let's say you want an associative array that maps strings to something.
I've known for years that hash table lookups take O(1) time. And that binary search trees have O(logn) lookup time. (In both cases I'm assuming certain conditions are satisfied, i.e. there aren't too many hash collisions and that the binary tree is fairly well-balanced.)
More recently, I learned about crit-bit trees, aka radix trees, Patricia trees or Patricia tries. (See D. J. Bernstein's crit-bit tree page for more references. Briefly, crit-bit trees are like binary trees with two major differences. Firstly, they are tries, which means keys are not stored in any non-leaf node and instead, the position of a node on a tree represents the prefix of the keys stored in its descendant leaves. Secondly, no node has only one child: a crit-bit tree is a trie where any node with a single child has been merged with its child.)
What's the cost of a lookup in a radix tree? It sounds like it must be O(logn) because you need a tree of that depth to store n things.
But you only need to touch each bit (or character, in a k-ary tree where k is the size of the alphabet) at most once during a lookup to decide which descendant of a node to traverse. Isn't this the same as determining the hash of a string? To compute it correctly we must involve every bit of the key. (If traversing the tree is cheap enough, then crit-bit trees are faster than hash tables. Furthermore, unlike a hashing operation, only certain bits, the aptly-named critical bits, need be examined in a crit-bit tree.)
Thus the lookup cost of a radix tree must also be O(1). Therein lies a problem. Why is it that a binary tree of the same size has an O(logn) lookup time? We travel along the same number of nodes!
The answer is that usually when computer scientists analyze these data structures, as an approximation, we view computing the hash function as a constant-time operation, and similarly for a comparison in the binary tree. But to refer to n different objects, the keys must have length at least O(logn), which means it actually takes O(logn) time to hash a key, or to perform a key comparison.
So what we've been saying is that it's okay to ignore a factor of logn. This is sloppy and inconsistent. Why can't we ignore the logn steps required to walk down a binary tree as well? Why can we turn a blind eye to one logn yet scrutinize the other?
We should say that hash lookups and crit-bit tree lookups cost O(logn) (or more accurately, O(m) where m is the size of the keys, which must be at least logn), and that binary tree lookups cost O((logn)2).
This inaccuracy caused me to initially overlook that crit-bit trees could match or even outperform hash tables. It's possible that others have suffered from the same affliction: despite being described (twice, independently) in 1968, the courses and textbooks from my time do not mention radix trees at all, and I suspect the teachers and authors responsible may also not have realized how fast these data structures can be.
This is a shame because they have other advantages over hash tables too: crit-bit trees are deterministic with good worst-case times, grow and shrink as needed (they never require an expensive rehashing nor let vast tracts of an array lie fallow), and support successor and predecessor operations.
The good news is that they seem to be gaining prominence. I feel they are mentioned more on the web (and I'm helping this as well!). I've heard evidence that some CS undergraduates these days are at least aware of their existence. The fact that even I know about them now means they can't be that obscure.
Thursday, March 29, 2007
Prisoner Problems
This joke prisoner problem is amusing but impossible, as its rather surreal solution involves English homonyms.
I also won't bother describing the famous prisoner's dilemma from game theory.
Then there's the paradox about the prisoner who's told he'll be executed some time next week and that it will be a surprise. So he reasons thus: "They can't kill me on Friday, because I would know by Thursday night and it would not be a surprise."
"But this implies if they haven't killed me by Wednesday night, then I know I'm dead on Thursday because I can't be killed on Friday. Which means it wouldn't be a surprise. Hence I cannot be executed on Thursday."
"Repeating this argument a few times shows that I cannot be executed on any day next week!" he concludes triumphantly.
But on Tuesday morning he is executed, much to his surprise! What's going on?
And there's the tale about the prisoner who is to choose his method of execution: if the last statement he utters is true then he is executed in some gruesome fashion, and on the other hand, if it is false then he is executed in some different but equally gruesome fashion. (The particular methods of execution change from telling to telling. I just can't think of any right now.) What should he say?
A tougher puzzle involves one-hundred prisoners in a prison that contains one-hundred cells and a single room with a single light bulb that can be turned on or off. Every day a prisoner is chosen at random and placed in the room. At any time, any prisoner can ask for freedom. When he asks, if every prisoner has spent at least one day in the room with the light bulb then everybody is set free. (The warden has been keeping track.) Otherwise everybody is executed.
The prisoners can talk amongst themselves before entering this bizarre jail, but once inside they are kept isolated, and the light bulb becomes their only means of communication. How can they free themselves?
Lastly, there is a similar problem that is the main reason for this post. A friend told me this puzzle a few months ago and I don't want to forget it.
Again, there are one-hundred prisoners and a special room, but this time the room contains one-hundred identical boxes, each containing exactly one slip of paper bearing a prisoner's name. Every prisoner's name is present, but no one knows which box contains which name.
The prisoners are taken to the room one at a time. Once inside they are allowed to open and examine the contents of up to fifty boxes. They must then leave the room as they found it, so the boxes they chose are closed when they are done.
After all prisoners have visited the room, if each prisoner has seen his name on a slip of paper then they are all free to go. Otherwise they are all executed.
As before, they can all examine the room and boxes (but not open them) and devise a strategy together beforehand, but no communication is permitted once the process begins.
Clearly if a prisoner picks fifty boxes at random he has only a 50% chance of finding his own name, and if every prisoner does this then the chance that all of them find their names is 1/2^100. That is, they will almost certainly be executed.
How can they obtain at least a 30% chance of surviving?
Wednesday, March 7, 2007
Shiny Colourful Buttons
Not only is it stylish, but it frees up precious toolbar real estate for more bookmarks. Also I find myself drawn to the pretty little pictures. I visit those enticing links more often nowadays.
One might worry about forgetting which icon does what, but presumably the links on one's toolbar are frequently used, making this unlikely. Anyway, to find out where they go, just click on them! (i.e. the standard action following the thought: "I wonder what this button does...")
The sole drawback is that some webpages don't have icons, or have one that looks identical to another, or looks horrible. To solve this, use the Favicon Picker 2 extension to assign a good icon to that bookmark.
A related idea is to hide text on certain tabs, which can be achieved using the FaviconizeTab extension.
By the way, googling for "favicon picker" finds many of the aforementioned blog posts [1, 2, 3, 4, 5], some containing great screenshots. Not everyone is enthralled by the idea however, and some even want the opposite.
Wednesday, February 21, 2007
A New Beginning
This is also the first new post at this blog's new home at Blogger. It's good to have everything automated: timestamps, formatting, archives, permalinks, labels, etc. but part of me misses writing my own scripts to do all this, and the total control I had over everything.
Converting my old blog entries was much easier than I had anticipated, though it was not entirely smooth. For example, I have to keep source code on a separate server because this site only lets you put up HTML and images. Also, I never kept the time the old entries were posted, only the date, so I had to fabricate them.
In retrospect, I realize I've taken a rather roundabout route to blogging. I would never have gone out of my way to sign up on some website to start writing articles, out of the blue. Instead, a peripheral page on my website intended to help me keep track of changes gradually mutated into this.
I plan to relocating other parts of my site. The PBC Library already lives on a more permanent server. I recently moved xmmspipe to Google Code. This service seems quite nice but I dislike the fact that they only support Subversion: I am now an ardent git fan.
Wednesday, February 14, 2007
Chase the Pig
I was surprised to find that Wikipedia currently lacks an entry for the card game 拱豬 (Gong Zhu). This four-player trick-taking Hearts-like game is often played by my relatives on my mother's side, and I assume it's relatively well-known amongst the Chinese.
John McLeod maintains a page describing various rulesets for Gong Zhu (in particular the variants involving exposing cards sound highly intriguing to me), but none of them exactly match the one I was taught. For posterity I'll record my family's rules here.
For the first deal, the player with the seven of spades leads the first trick. The lead can be any card, not necessarily the seven of spades. In subsequent deals, the player that took the Queen of Spades in the previous deal leads the first trick. This player is sometimes nicknamed "The Pig". As in Hearts, there are no trumps, and the winner of the trick leads the next trick.
Why the seven of spades and not the two? Because another card game popular amongst my aunts and uncles involved building sequences starting with seven, and the first card played in that game had to be a seven. They adopted this rule for consistency.
My family always played a predetermined number of hands (e.g. 4) and the winner was the player(s) with the highest total score, but the more standard convention seems to be that one keeps playing until someone has a total score of -1000 or lower, and then the player(s) with the highest score is the winner. (I'm guessing they did this because when playing for stakes, payoffs occur more frequently with this scheme.)
The values of the cards are as follows:
- Jack of Diamonds, or goat (羊): +100
- Queen of Spades, or pig (豬): -200
- Ten of Clubs: doubles your score, unless you have won no other cards in which case it is worth +50
- Hearts: two to ten are worth minus their pip value, except for four which is worth -10. The Jack is -20, Queen -30, King -40 and Ace -50. The other cards are worth nothing.
I was taught a couple of mnemonics for the Four of Hearts being -10. Firstly, 4 is an unlucky number amongst the Chinese (I was told this is because the word for 4 sounds similar to the verb "to die" in Mandarin, and also in other variants of Chinese), so that's why its penalty value is worse than it should be.
Secondly, the word for 4 and the word for 10 in Mandarin sound similar. If you know a Mandarin speaker, get them to say "44 is 44" to hear for yourself!
If a player wins all the hearts, then the values of the hearts are reversed in sign, that is, they are positive instead of negative, and it can be checked that they are worth 200 points in total. Furthermore, if that same player also wins the pig, the pig is now worth +200 points.
If you win every card with a value, then you receive 100 points for the goat, 200 for the sheep, 200 for all the hearts, and finally your score gets doubled by the Ten of Clubs, giving a total of 1000 points. This is is the best possible score. The worst possible score is -796, when all but the Jack of Diamonds and Two of Hearts is taken.
Naturally, the name of the game refers to the practice of leading low spades in an attempt to flush out the pig: eventually the holder of the Queen of Spades may be forced to play it and take the trick.
Comparison with Hearts
I prefer this game to Hearts. There is something special about every suit in this game, whereas in Hearts, the clubs and diamonds tricks feel like filler. Winning the Jack of Diamonds is always good, and winning the Ten of Clubs sometimes helps, so even if you're not going for all the hearts there is something to do.
I never liked the Hearts rule preventing players from "breaking into" hearts, and I'm glad that Gong Zhu is free from this restriction.
Scoring is more complicated as each heart has a different penalty value. However I found after several hands I enjoyed the less trivial mental arithmetic, even missing it when playing Hearts.
Generally, I feel the game experience is richer because there are more important cards than Hearts, yet the additional complexity is not overly random nor overwhelming. In contrast, while playing Hearts if I'm not trying to shoot the moon, my play feels almost forced, as my only goal is to avoid the Queen of Spades and as many hearts a possible. Sometimes I can choose who to penalize more by playing in a certain way, but this aspect of the game is limited and unrewarding.
Even when shooting the moon, the strategy is often clear, a drawback that I feel is exacerbated by the prohibition on breaking into penalty cards.
I'm ambivalent about the Hearts convention of passing three cards before the game starts. Sometimes it's extremely helpful since some people I've played with are rather predictable, allowing me to mold hands which I can easily shoot the moon with. On the other hand, this does reduce the challenge, and in general getting extra information about opponent's hands detracts from the game experience.
The internal conflicts are more intense and exciting. I can feel my greed fight my fear. Do I save high cards to win the goat? What if this causes me to wind up with the pig and/or a bunch of high hearts as well? Do I go for the double? It's 50 points on its own, but how sure am I that I won't pick up any hearts later on?
Other Fun Card Games
I recall being enthralled when introduced to playing cards as a child. Nothing but fifty-two bits of pasteboard, yet the possibilities are legion. Games of pure skill. Games of pure chance. Games that fell in between, and it seemed that a game existed for any given skill/luck ratio. And they can be played almost anywhere, with any number of people, even alone. Such power, and I could hold it in one hand. Merely shuffling and performing sleights at once soothed and inspired me. To this day I often carry a pack of cards on my person.
One of my well-loved books from my childhood was "The Book of Games" by Richard Sharp and John Piggott (ISBN 0883653893). I read it cover to cover countless times, though not necessarily in order, marvelling at the illustrations and fascinated by the history. I tried out many of the games within, goading whoever was around me to learn their obscure rules so I could play.
(I took offence to one sentence however: in the entry on Mah Jong it describes the Chinese as "devious". I don't think they're inherently better at games than other races!)
If only the internet and Wikipedia existed back then! I was frustrated by rules they omitted, not to mention the games they left out. Sometimes the descriptions were too concise due to space limitations. Today the rules of just about any game with some following are at one's fingertips, and one can sometimes even find free programs for an instant opponent, instructor and umpire. Unfortunately, this came after my heaviest gaming days (at least, games that don't involve computers!), so the games below are mostly from the above book.
Though I'll mention card games of many species, some bias will be noticeable since I'm fond of lightweight teamless trick-taking games. Player ability varies wildly in some circles, making partner selection problematic in games such as bridge and canasta.
If there are four players, I almost always enjoy playing Hearts and Chase the Pig. Big Two is another favourite.
When there are three players, Knaves is a good choice if I'm in the mood for something like Hearts. It is also a no-trump trick-taking game with penalty cards, but also gives one an incentive to win as many tricks as possible. Sergeant Major is an entertaining diversion for three, especially for those newer to trick-taking games. My friends and family found David Parlett's Ninety-nine refreshingly unique and challenging. (We played the original rules, as described in The Book of Games.)
I never found two-player card games appealing, though I feel like I could if I dedicated more time to them. In particular, I feel like I should like Cribbage, Bezique and Piquet. I've tried some novel two-player games described in "The Pan Book of Card Games" by Hubert Philips, but never grew attached to them.
For five players or above, I'd play poker, or in a less serious crowd, Bartog.
Also, Napoleon is a five-player trick-taking game whose gimmick is the existence of a secret partnership: the identity of highest bidder, Napoleon, is known, but his secretary is determined by the holder of a particular card that is not revealed until played during a trick. Thus it is a two-versus-three game where initially no one knows the composition of the teams.
I had forgotten the details of this game. But thanks to Google, I rediscovered this old friend, and learned a lot more. Apparently, my parents left out at least half the rules when they taught it to me. Also, it seems Napoleon is a popular Japanese card game.
As for patience/solitaire for one, I'd recommend downloading a program like PySol and exploring.
Saturday, November 25, 2006
Inkscape and Valgrind
I spend a lot more time at the PBC (Pairing-Based Cryptography) library site than I would like since I maintain those pages. I bet most webmasters have similar complaints.
I began to tire of seeing page after page of text, so I broke the monotony by adding a site logo. It serves no other purpose, but who knows? Maybe one day I'll be hawking coffee mugs and T-shirts emblazoned with this graphic!
I went through several ideas before selecting a shape, during which I gained respect for professional logo designers. Some famous logos seem so simple that I feel I could have designed them. But on further examination, they insiduously invade and reside in one's memory, are pleasing to the eye (or at least are not eyesores), and in fact satisfy many constraints.
After a few attempts I discovered that it is extremely diffcult to find symbols with the necessary properties, and in my case, to even judge how effective a given image is. Thus I settled on a design as soon as I sketched one that I could tolerate.
I recall the path I took. At one point I was experimenting with a stylized "P(B,C)". I removed the letters to get "(,)" and played around with the parentheses and comma until I arrived at the current logo.
As for the colour scheme, to pay homage to my current university, I let Stanford's design guidelines on colour dictate my choices. Thus the logo is predominantly cardinal, with sandstone and black playing secondary roles. On most desktops, many screen elements are white and hence all four Stanford "Identity Colors" appear.
The result reminds me of a symbol featured in the movie "The Incredibles". I hope this causes people to think the PBC library is incredible, at least subconsciously!
I used Inkscape to realize my idea. I had never tried Inkscape before. I was surprised at how easy it was for a newbie to use and how quickly I became comfortable and efficient with its interface.
(Strictly speaking, this is false. Many moons ago I fired up Inkscape's ancestor Sodipodi, but I was unable to do much without reading documentation. Either Sodipodi has since made great advances, or the fork was worth the trouble.)
While I'm on my soapbox lauding software, let me also praise Valgrind. I first heard of this program years ago. If only I had tried it years ago.
Finding memory leaks in my C projects had always been troublesome. Thanks to geek machismo [isn't this an oxymoron?], I chose the hard way, namely, to code wrappers around standard memory allocation functions that recorded statistics which I would then analyze to detect leaks.
Since my leak detection code would need to be bug-free itself for this to work, I would restrict myself to primitive designs to simplify implementation details, ultimately leaving much of the sleuthing to the developer. Furthermore, toggling leak detection was so annoying I rarely checked for leaks.
As I recently discovered, Valgrind magically gives detailed reports on where the leak is and what is being leaked [insert HP joke here]. And to use Valgrind one only needs to invoke gcc with different options during compilation.
Valgrind has other features too, but I haven't tried them yet.
Sunday, October 29, 2006
Pi Music
I must have spent most of my life facing one keyboard or another. Over a year ago, while facing a musical kind, I was in a weird whimsical mood and I wrote a tune based on the digits of pi.
It did not surprise me when I discovered I was on a well-beaten path, as a search on Google quickly reveals. After all, many pi-inspired feats have been accomplished, such as setting Edgar Allen Poe's poem "The Raven" to the digits of pi. See also the honourable mention in this ray-tracing competition.
The pi music I found was quite abstract, whereas I want mine to sound more standard. I chose to have the digits 1-7 represent the tones of the C minor scale, altered if necessary, while 8 and 9 wrap around, that is, I encoded 8 and 1 the same way, as well as 9 and 2. Accordingly, I would have treated a 0 as a 7 but the tune ends just before reaching the first 0. (Thus 31 decimal places are encoded, this number fortunately corresponding with the first two digits of pi.)
This is akin to jianpu, a numbered musical notation system which I see in many Chinese books, but have never encountered in the Western world though supposedly it has seem some use in Europe. My scheme differs in that with jianpu, one would normally notate a minor key starting from 6, not 1.
It's a shame jianpu is not more widespread. It is well-suited for quickly and concisely jotting down simple melodies. A blank piece of paper suffices: there is no need to rule a music staff first. Writing numbers and dashes can be done extremely fast and most people are accustomed to this, unlike drawing geometric figures. More sloppiness is tolerated, as it is easier to identify scrawled digits and dashes than it is to figure out which line or space a hastily-scribbled note lies on, or whether a certain dot belongs to a note or was just a slip of then pen. Jianpu is also easy to learn due to its logic and simplicity. (Of course, jianpu performs terribly for even a slightly complex piece.)
(To be precise, jianpu-like notation is in fact employed in Western music. For example, when one writes "C6" to denote a chord, the "6" refers to the sixth note of the major scale.)
I added extra notes in order to produce a repeated motif typical of tunes. Following the digits exactly produces a melody that sounds a little too random for my tastes.
I chose conventional chord progressions. The A section uses a chromatically descending bassline, and the bridge mostly follows the cycle of fifths.
The time signature is 3/4, as this resembles 3.14 superficially. Also, in early (Western) musical notation, this time signature was represented with a circle.
I had intended to post it only once I had made some recordings (ideally after working on Bliss) but I have enough on my plate as it is. Maybe next Pi Day.
Anyway, it feels like an auspicious time to release it, as I happened to have recently released version 0.3.14 of the PBC library. And not too long ago Akira Haraguchi broke the world record after memorizing 100,000 digits of pi.
The lead sheets were created using Lilypond. The Lilypond website convincingly argues that their software automatically produces better sheet music than their major competitors, including commercial ones. It's also easy to integrate its output with HTML:
Incidently, around the same time I also penned a tune that loosely encodes a friend's name, but using a different method:

