Saturday, July 31, 2010

The timeless beauty of shell scripts

Long ago, Doug McIlroy wrote: “This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.”

Years later, Rob Pike observed: “Those days are dead and gone and the eulogy was delivered by Perl.” His statement mostly stands, except Python is the new Perl.

I refuse to abandon the old ways. So when a friend pointed me to the intriguing Python Challenge, I avoided Python as much as I could. Instead, I used the Bash shell to string together special-purpose tools.

The FAQ states the purpose of the challenge is to “provide an entertaining way to explore the Python Programming Language”, and “demonstrate the great power of Python’s batteries”. However, I feel it is better suited for training shell script muscles: most of the problems are of the one-shot trivial variety that suit Unix tools.

A full-featured language is often overkill. Abraham Maslow’s quote comes to mind: “It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” I don’t mean to disparage Python, but I feel shell scripts are often overlooked and under-appreciated, especially as they are so accessible. Technically, you’re already shell programming when you run a program from the command line. Why not learn a bit of Bash or similar, and increase the power available at your fingertips?

Brevity is the soul of wit

General-purpose scripting languages have their place, but judging by the posted solutions, a short Python script for a typical Python Challenge riddle is often outdone by a still shorter Bash incantation. In fact, in the first challenge you can stay in your shell. Did you know Bash natively handles (fixed-width) integer arithmetic? For example:

$ echo $((2**42))

Naturally, if arbitrary precision were needed, we could invoke a specialized tool:

$ echo 10^100 | bc

Humble Unix tools yield the most succinct solution for several challenges. For example, a Caesar shift is probably terser with tr than in any popular language:

$ tr a-z l-za-k

Or extracting lowercase letters from a file:

$ tr -cd a-z
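
The shift in the Caesar example need not be hard-coded. Here is a minimal sketch, assuming lowercase ASCII input; the function name caesar is made up for illustration. It builds the second tr set by doubling the alphabet and cutting out a 26-letter window:

```bash
# Sketch: Caesar shift by N places on lowercase ASCII letters.
# "caesar" is an illustrative name, not a standard tool.
caesar() (
  alphabet=abcdefghijklmnopqrstuvwxyz
  # Normalize N into 0..25 so that negative shifts also work.
  n=$(( ($1 % 26 + 26) % 26 ))
  # Double the alphabet, then cut the 26-letter window starting at offset n.
  rotated=$(printf '%s%s' "$alphabet" "$alphabet" | cut -c "$((n + 1))-$((n + 26))")
  tr "$alphabet" "$rotated"
)

echo hal | caesar 1   # prints: ibm
```

The body runs in a subshell (parentheses instead of braces), so the helper variables do not leak into the calling shell.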

When regular expressions are involved, the code may look similar, but the old guard that features regularly in Bash scripts (awk, sed, and grep) has an inherent advantage over Python (and Perl, PHP, Ruby, …). Python takes exponential time to match some regular expressions, whereas the classic Unix tools match the same expressions in polynomial time.
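
A well-known example of such an expression (my assumption about the kind of pattern meant here) is n optional a's followed by n literal a's, matched against a string of exactly n a's. A backtracking matcher may explore on the order of 2^n ways of skipping the optional parts before it succeeds. The sketch below only builds the pattern and hands it to grep, and makes no timing claims; the function name is made up for illustration.

```bash
# Sketch: build the pattern a?a?...a?aa...a (n optional a's, then n literal
# a's) and match it against a line of n a's. Assumes n >= 1.
# "pathological_match" is an illustrative name.
pathological_match() (
  n=$1
  optional=$(printf "%0${n}d" 0 | sed 's/0/a?/g')   # "a?" repeated n times
  literal=$(printf "%0${n}d" 0 | tr 0 a)            # "a" repeated n times
  # -x demands a whole-line match; -c prints the number of matching lines.
  printf '%s\n' "$literal" | grep -Exc "${optional}${literal}"
)

pathological_match 25   # prints: 1
```

Prefixing the call with time, and trying the same pattern in another language, is an easy way to see the difference for yourself.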

On the downside, Bash makes some tasks tiresome. I can’t think of an easy way to convert a decimal number to an ASCII character. This Bash FAQ suggests the cumbersome:

$ for a in 66 101 110; do printf \\$(printf '%03o' $a); done; echo
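
The trick is easier to follow when wrapped in a function. This is a sketch: the name chr is borrowed from other languages, and it assumes codes from 1 to 127.

```bash
# Sketch: print the character whose decimal code is $1.
# "chr" is an illustrative name; it is not a Bash builtin.
chr() {
  # Inner printf: decimal -> three-digit octal, e.g. 66 -> 102.
  # Outer printf: interprets the resulting escape \102 and emits "B".
  printf "\\$(printf '%03o' "$1")"
}

for a in 66 101 110; do chr "$a"; done; echo   # prints: Ben
```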

Another chore is repeating a character a given number of times. Other than a loop, perhaps the easiest hack is something like:

$ printf "%042d" 0 | tr 0 x
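
The same hack can be given a name and parameters. A minimal sketch follows, with a made-up function name and the assumption that the character is an ordinary one that tr does not treat specially. A zero count needs a guard, because a field width of 0 still prints one digit.

```bash
# Sketch: print character $1 repeated $2 times, without a trailing newline.
# "repeat_char" is an illustrative name.
repeat_char() {
  # A width of 0 would still print a single digit, so bail out early.
  [ "$2" -gt 0 ] || return 0
  # Zero-pad the number 0 to the requested width, then swap every 0 for $1.
  printf "%0${2}d" 0 | tr 0 "$1"
}

repeat_char x 42; echo
```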

A tiny elegant Haskell solution exists for problem 14, thanks to the transpose function and the language’s concise notation for recursion and composition. A search revealed Bash fans often employ a simple but tedious Awk script for matrix transposition, suggesting a Bash solution is necessarily significantly longer.
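
For reference, the Awk approach tends to look something like the sketch below. It is my own reconstruction rather than any particular posted script; it assumes whitespace-separated fields and holds the whole matrix in memory.

```bash
# Sketch: transpose whitespace-separated rows read from standard input.
# "transpose" is an illustrative name.
transpose() {
  awk '
    {
      # Remember every field by (row, column) and track the widest row.
      for (i = 1; i <= NF; i++) cell[NR, i] = $i
      if (NF > cols) cols = NF
    }
    END {
      # Emit each input column as one output row.
      for (i = 1; i <= cols; i++) {
        line = cell[1, i]
        for (j = 2; j <= NR; j++) line = line " " cell[j, i]
        print line
      }
    }'
}

printf '1 2 3\n4 5 6\n' | transpose
# prints:
# 1 4
# 2 5
# 3 6
```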

Happily, these blemishes are dwarfed by the successes of the Unix philosophy. More than once, my script has been simpler and briefer than any other posted solution because the complexity is hidden within a tool that does one thing, and does it well. My proudest achievement is a one-liner to compute the look-and-say sequence [hint: uniq -c].
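
Without claiming to reproduce that one-liner, here is one way the hint can be made to work. The function names are made up, and the sketch assumes the input is a single line of digits.

```bash
# Sketch: one look-and-say step. Reads a digit string on standard input.
# Function names here are illustrative.
look_and_say_step() {
  # fold puts each digit on its own line, uniq -c counts each run,
  # and tr glues the "count digit" pairs back together.
  fold -w 1 | uniq -c | tr -d '[:space:]'
  echo
}

# Apply the step $2 times, starting from seed $1.
look_and_say() (
  term=$1
  i=0
  while [ "$i" -lt "$2" ]; do
    term=$(printf '%s\n' "$term" | look_and_say_step)
    i=$((i + 1))
  done
  printf '%s\n' "$term"
)

look_and_say 1 4   # prints: 111221
```

All the real work happens in uniq -c, which already does the run-length counting the sequence is built on.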

Sunday, July 11, 2010

Chinese Input

Google Translate supplies a clumsy but straightforward means for entering Chinese characters with a US keyboard, especially for those learning the language. Simply type the English meaning, and copy the result. You can check the character is indeed the one you want by clicking on "Show romanization", or on the speaker icon to hear a synthesized reading.

However, sometimes I have a particular character in mind. A short-term solution is to use an online rendition of a traditional dictionary ordered by radical and stroke count, or a pinyin dictionary. Additional speed and convenience require investment; one must learn one of the many fascinating methods for entering Chinese characters on a computer.

I’ve read that the Wubizixing input method is fastest, though as one might expect, it requires the most investment. Proficiency demands much practice with a suitably annotated keyboard.

For now, I’ve opted for the Wubihua method, which mimics how humans write characters. It may be slower, but it can be learned quickly. It also applies when sending Chinese text messages on mobile phones.

I supplement Wubihua with a pinyin method, as I’m practically illiterate in Chinese.

Chinese input in Linux

To set up Chinese input methods in Ubuntu, I installed the scim and scim-pinyin packages, then modified my .xsession as follows. I prepended:

export XMODIFIERS="@im=SCIM"
export GTK_IM_MODULE="xim"

and appended:

scim -d

after which pressing Ctrl+Space toggles Chinese input.

"Stroke 5" is Wubihua mode. Stroke types are mapped to 5 keys along the bottom row. From right to left:

/: | (vertical; top-to-bottom)
.: \ (downwards left-to-right)
,: / (downwards right-to-left)
m: - (horizontal; left-to-right)
n: other stroke types, e.g. 乙

Perhaps one can remember this as follows: 乙 looks like a rotated N. A lowercase M takes more horizontal space than most letters, so it corresponds to the horizontal stroke. On the next two keys, the less-than and greater-than signs point left and right, so they correspond to the left and right downward strokes. Lastly, the stem of the question mark suggests a vertical stroke.

Although the pinyin method is found in the Simplified menu (智能拼音; "smart pinyin"), it also offers Traditional characters. I originally learned zhuyin (aka bopomofo), a more traditional pronunciation-based method for producing Traditional characters, but it is ill-suited for a US keyboard. Fortunately, converting from zhuyin to pinyin is trivial.

Chinese to English

In a pinch, I’ll use Google Translate to learn Chinese phrases. However, browser plugins are much handier; see the Chrome Zhongwen extension and the Firefox Perapera-kun add-on.