Showing posts with label android. Show all posts
Showing posts with label android. Show all posts

Wednesday, February 11, 2009

Compiler Compilers

Long ago I sat a set of exams designed to test the breadth of our computer science knowledge. I failed Compilers. Luckily, they had just changed the rules, and it was acceptable to fail Compilers provided you passed the Databases exam. Fortune smiled upon me, and I passed the Databases exam despite possessing even less knowledge in that field. Or perhaps fortune was actually laughing at me: I stayed away from compiler theory for the rest of my degree to my detriment. It's scandalous I still know so little.

I was especially dispirited by my failure since I had thought I knew compilers well. When I was a child, I felt moved after learning about recursive descent parsers. I had been mystified: how could a program could possibly read an expression like (1+2)*((3+4)/5)? All I could think of was to use a simple loop to iterate over the tokens in the expression and assigning them to local variables before processing them. This led nowhere, because the nesting of parentheses can be arbitrarily deep. Thus I was awestruck by the elegance of recursive descent; after the first reading the technique was indelibly etched in my mind.

Much later, in university, basic compiler theory (grammar, languages, automata, and so on) came easily to me. I implemented a compiler for a toy language which produced the smallest and fastest code in my class. I'd also written the odd recursive descent parser for pet projects.

So I thought I was a compiler expert. But I should have paid attention to my weaknesses. I found LR(1) parsers unintuitive. I leafed through their proofs of correctness and promptly forgot their details, especially since the textbook stated these parsers are rarely coded by hand and instead generated by programs such as yacc or bison. Instead of embracing these powerful tools, I stuck with trusty hand-coded parsers because I understood them completely.

Ultimately this caused me to fail my Compilers exam, which was peppered with yacc questions, and more seriously, to waste copious amounts of time coding and debugging parsers and lexical analyzers. I have at last invested energy in seriously learning flex and bison, and I urge anyone in a similar situation to do the same. All these years, instead of writing recursive descent parsers unless there was a compelling reason not to, my default attitude should have been to use flex and bison unless there was a compelling reason not to.

I wrote a compiler that converts a toy language (essentially a tiny subset of Limbo) to Dalvik bytecode. I also wrote a Dalvik assembler along the way. In both cases, flex and bison were invaluable: gone are the finicky routines I've written and rewritten a million times, like handling an unexpected end of input, and fine-tuning mutually recursive functions and logic to get the desired operator precedence and associativity.

I've put my efforts on hold. I was hoping for a relatively functional and less object-oriented input language, but it seems such a language would be tricky to shoehorn into Dalvik bytecode: function calls are abstracted, and with no control over stack frames I doubt I can implement trampolining.

Also, I greatly underestimated my ignorance of compiler theory. I know nothing about the static single assignment (SSA) form, which appears too new for the textbooks I had. Moreover, I need to refresh my memory on older optimization techniques.

On the other hand, I don't want to discard my code, particularly as it will forever remind me to consider flex/bison before coding yet another parser by hand. Below is the git repository containing my primitive compiler and assembler that turns this:
init() {
print("Hello, World");
}

on_input(arg : string) {
a: int;
a:= parse_int(arg);
print("\n");
print(int_to_string(a * (a + 1) / 2));
}
into an Android app.

http://cs.stanford.edu/~blynn/bender.git

Friday, February 6, 2009

Minimal Dalvik Executables

When I was a kid, the Windows era had yet to arrive. Floor-model PCs in shops ran MS-DOS. They'd often take steps to stop little imps like me messing around with their systems. I recall one where they even deleted the standard DOS utilities. But they had left COMMAND.COM unmodified which is enough for plenty of mischief. One can create executables via:
 $ copy con: hello.com
and then enter machine code by holding Alt while typing a number on the keypad. (It doesn't work on DOSBox by the way.) Some numbers, such as 0, must be avoided, but there are workarounds; the book from which I learned the trick, Ralph Burger's "Computer Viruses: a High-tech Disease", recommended first creating a program that read numbers in ASCII and output the corresponding binary, a sort of proto-assembler, which in turn could be used to create more complex executables.

I created a HELLO.COM program by typing the following, where the numbers are typed on the numberpad with the Alt key held:
 180 9 190 9 1 205 33 205 32 hello$
which is perhaps more recognizable to DOS programmers when converted to hex:
 B409 BA0901 CD21 CD20 hello$
It is the machine code for:
MOV AH, 09
MOV DX, 109h
INT 21h
INT 20h
This instructs DOS to print the '$'-terminated string at address 0x109 and then exit. When .COM files are loaded, DOS places their contents at address 0x100 (locations 0 to 0x100 are populated with miscellaneous information such as the command-line), hence the 0x109.

Thus with 9 bytes plus the size of the message, plus a byte for the terminator, we get a valid DOS program. We can type raw machine code from the keyboard to produce a full program.

Linux on x86 systems is less straightforward. The machine code is similar, but there must also be an ELF header (not strictly true: in some circumstances, the entire machine code can be placed within a truncated ELF header!), making binaries more tedious to construct by hand. However, certain kernel configurations allow different header types, some of them brief enough to be easily memorized and hand-rolled.

Minimalist Android

How about Android? An APK file is a ZIP file containing, among other files, the Dalvik executable in a DEX file. We're interested in minimizing this DEX file, whose full specification is well-documented.

All Android code must lie in some class, and Android applications typically inherit from a library class such as Activity, so we must also record information such as the class name, access rights, and inheritance. This, in turn, requires a strings section so we can store the class names, as well as invoked library functions and function signatures. Other overheads include the header and the map section. We can cheat a little by defining the UI with XML, so it won't contribute to the size of the DEX file.

After compiling to Java class files, most or all the R*.class files can be removed before making a DEX file,since the constants they contain are hardcoded in other class files by javac. For some projects, R$styleable must be retained, though this class appears to have since been removed from the public API.

I chose to write a program that parsed the DEX file from the HelloAndroid example and output only the necessary sections rather than attempt to create a DEX file from scratch. I found I could safely elide the source filenames, debug information and annotations. The result consists of 632 bytes, though we could trivially shrink this further by using a shorter name for the class.
0000000: 6465 780a 3033 3500 2f4f 153b 3623 8747  dex.035./O.;6#.G
0000010: 6d02 4697 5b1e 959d a8b1 2f0f 9c3a a14f m.F.[...../..:.O
0000020: 7802 0000 7000 0000 7856 3412 0000 0000 x...p...xV4.....
0000030: 0000 0000 f001 0000 0a00 0000 7000 0000 ............p...
0000040: 0500 0000 9800 0000 0300 0000 ac00 0000 ................
0000050: 0000 0000 0000 0000 0500 0000 d000 0000 ................
0000060: 0100 0000 f800 0000 6001 0000 1801 0000 ........`.......
0000070: 6201 0000 6a01 0000 6d01 0000 8501 0000 b...j...m.......
0000080: 9a01 0000 bc01 0000 bf01 0000 c301 0000 ................
0000090: c701 0000 d101 0000 0100 0000 0200 0000 ................
00000a0: 0300 0000 0400 0000 0500 0000 0500 0000 ................
00000b0: 0400 0000 0000 0000 0600 0000 0400 0000 ................
00000c0: 5401 0000 0700 0000 0400 0000 5c01 0000 T...........\...
00000d0: 0100 0000 0000 0000 0100 0200 0800 0000 ................
00000e0: 0300 0000 0000 0000 0300 0200 0800 0000 ................
00000f0: 0300 0100 0900 0000 0300 0000 0100 0000 ................
0000100: 0100 0000 0000 0000 ffff ffff 0000 0000 ................
0000110: e101 0000 0000 0000 0100 0100 0100 0000 ................
0000120: 0000 0000 0400 0000 7010 0000 0000 0e00 ........p.......
0000130: 0300 0200 0200 0000 0000 0000 0900 0000 ................
0000140: 6f20 0100 2100 1500 037f 6e20 0400 0100 o ..!.....n ....
0000150: 0e00 0000 0100 0000 0000 0000 0100 0000 ................
0000160: 0200 063c 696e 6974 3e00 0149 0016 4c61 .....I..La
0000170: 6e64 726f 6964 2f61 7070 2f41 6374 6976 ndroid/app/Activ
0000180: 6974 793b 0013 4c61 6e64 726f 6964 2f6f ity;..Landroid/o
0000190: 732f 4275 6e64 6c65 3b00 204c 636f 6d2f s/Bundle;. Lcom/
00001a0: 616e 6472 6f69 642f 6865 6c6c 6f2f 4865 android/hello/He
00001b0: 6c6c 6f41 6e64 726f 6964 3b00 0156 0002 lloAndroid;..V..
00001c0: 5649 0002 564c 0008 6f6e 4372 6561 7465 VI..VL..onCreate
00001d0: 000e 7365 7443 6f6e 7465 6e74 5669 6577 ..setContentView
00001e0: 0000 0001 0102 8180 0498 0203 01b0 0200 ................
00001f0: 0b00 0000 0000 0000 0100 0000 0000 0000 ................
0000200: 0100 0000 0a00 0000 7000 0000 0200 0000 ........p.......
0000210: 0500 0000 9800 0000 0300 0000 0300 0000 ................
0000220: ac00 0000 0500 0000 0500 0000 d000 0000 ................
0000230: 0600 0000 0100 0000 f800 0000 0120 0000 ............. ..
0000240: 0200 0000 1801 0000 0110 0000 0200 0000 ................
0000250: 5401 0000 0220 0000 0a00 0000 6201 0000 T.... ......b...
0000260: 0020 0000 0100 0000 e101 0000 0010 0000 . ..............
0000270: 0100 0000 f001 0000 ........
Only the 4 16-bit words at 0x128 and the 9 16-bit words at 0x140 are actual Dalvik bytecode. They belong to the constructor and the onCreate method respectively.

Thursday, January 22, 2009

Developing Android without Eclipse or Ant

The Android documentation recommends using Eclipse for development, but this is sluggish on my laptop, and feels cluttered on compact screens. A stated alternative is to use Ant, but this is also unnecessary as the build commands it invokes can be easily extracted and executed manually. This suits my preference for a minimalistic or practically non-existent IDE, especially on laptops, as well as my distaste for editing XML. As a bonus, on Ubuntu, I install one package:

$ sudo apt-get install sun-java6-jdk

along with the Android SDK and I'm ready to develop for Android. In contrast, the Ant package has several dependencies, and the required version of Eclipse is newer than the currently available Ubuntu package.

Here are the details:

$ SDKDIR=~/android-sdk-linux_x86-1.0_r2 # Where I extracted the SDK.
$ PATH=$SDKDIR/tools:$PATH
$ activitycreator -o somepath some.project.name
$ cd somepath

At this point Ant can build the project, but masochists like me will prefer to shun Ant and run the underlying commands themselves. The following command generates R.java from the resources:

$ aapt p -m -J src -M AndroidManifest.xml -S res -I $SDKDIR/android.jar

Next, the source is compiled to .class files. This is why we need the JDK:

$ mkdir bin/classes
$ javac -encoding ascii -target 1.5 -d bin/classes \
-bootclasspath $SDKDIR/android.jar src/some/project/*
# When libs exist, append -classpath=libs/*.jar

These files contain instructions for the JVM. They are converted to Dalvik bytecode via:

$ dx --dex --output=bin/classes.dex bin/classes # If libs exist, append libs/*.jar

The resources and assets are packaged:

$ aapt p -f -M AndroidManifest.xml -S res -I $SDKDIR/android.jar -F bin/projectname.ap_
# When assets exist, add -A assets

This package is itself packaged with the Dalvik bytecode to make the final product:

$ apkbuilder bin/something.apk -z bin/projectname.ap_ -f bin/classes.dex -rf src -rj libs
# Use -u for unsigned builds.

Now if I could only roll my own Dalvik compiler for a language I like, I could avoid the Java SDK, as well as Java itself.

Sunday, January 4, 2009

Encoding videos for the G1

Soon after Android's debut, initial reviews were complaining about the G1's inability to play videos. At the time I felt it was a trifling matter. Since YouTube already works on the G1, a more general video player ought to require only minor tinkering. And why is it a big deal anyway? Are people in a rush to experience Hollywood blockbusters on a tiny screen and tinny speakers?

Sure enough, video playing apps appeared quickly on Android Market. I found at least three in my last search. As for their utility: now that I've played with them, I understand why some see a video player as a must-have, and not merely yet another vehicle for demonstrating the G1's capabilities.

While breathtaking cinematography, award-winning soundtracks and state-of-the-art special effects translate poorly to a handheld device, movies that derive their strength chiefly from factors such as a witty dialogue lose little. Informational clips are another good use case. And even with scenes that only work well with a big screen: if they're my favourite scenes of all time, it's still comforting to have them at my fingertips, even if they are greatly reduced.

Which leaves the problem of converting videos to a G1-friendly format. I knew almost nothing about the black art of video encoding, but after hours of false starts and trawling the net for help, I still know almost nothing. However, I picked up enough to get started.

An unofficial FAQ states the G1 currently supports the H.264, 3GP and MPEG4 video codecs. I skimmed a few Wikipedia entries to decode this alphanumeric soup, and my interpretation is that this really means the G1 recognizes .MP4 and .3GP files (which are container formats and not video encoding methods) and understands the H.263, MPEG-4 ASP ("roughly similar" to H.263) and H.264 (aka MPEG-4 AVC) video compression standards. [Trivia: ASP expands to the somewhat oxymoronic "Advanced Simple Profile".]

The G1 only speaks the BP (Baseline Profile) dialect of H.264; videos that use more advanced features of H.264 play incorrectly, or not at all. [At last this post makes sense to me!]

On Ubuntu, the easiest way to encode videos was to use avidemux (sudo apt-get install avidemux). For audio, I only tested the AAC codec (usually at 96kbps), but several others should also work. For video, selecting MPEG-4 ASP produced videos that seemed to tax the CPU heavily on playback, often to the point of unwatchability. Perhaps this could be remedied by lowering the number of frames per second via an appropriate filter.

In any case, I prefer selecting the MPEG-4 AVC video codec and dumbing it down for the G1 by disabling the appropriate H.264 options: I changed the Max Consecutive B-Frames to 0, unchecked all options related to B-frames, and disabled CABAC along with the 8x8 Transform and 8x8 Intra search options. As before, I added the MPlayer variant of the resize filter: the G1 has a 480x320 screen in landscape mode. At 384kbps, this produced videos that played smoothly though higher bit rates should be feasible.

Now that I appreciate video more, I eagerly await the "Cupcake" Android update which is rumored to have video recording. I also hope the next generation of hardware will have video out and enough power to handle any popular codec.

Wednesday, November 26, 2008

The T-Mobile G1

I had intended to code more for my GP2X, starting with tweaking my version of Netwalk. But I almost certainly never will, now that I recently received a T-Mobile G1 phone. Though less suitable for gaming, its smaller size along with its phone and WiFi mean I'll actually carry it everywhere and run its apps.

I had thought it likely that I'd end up reverting to my old simple phone and only access the Internet with my laptop, as I have always done, but the G1 has enough right that I finally see why so many people are effectively surgically grafted to their smartphones. Judging by the last few days, I've become one of those addicts I formerly mocked. I've even caught myself scrolling the home view (or whatever it's called) back and forth, just to enjoy the smooth parallax action at my fingertips.

Even now I get that "shiny new toy" delight from discovering a new G1 trick. Some "Wow!"-worthy free apps:
  • Twisty: Interactive fiction on the go. (Not on Android Market at the moment; I installed this by allowing unknown sources and running ApkInstaller.)
  • ShopSavvy, CompareEverywhere: When did you last own a phone that could scan barcodes for price comparisons?
  • WikiTude: Try this futuristic app in a photogenic location.
  • StreetView: Digital-compass-enhanced deja vu on demand.
  • Amazed: Hone your tilting skills.
  • Voice Dialer: I have to put on an American accent to get digits working, but I'm impressed it recognizes my atrocious pronunciation of certain names. When will we see more voice control and text-to-speech apps?
  • Flashlight: Unexpectedly handy. If only I could find a colour that brings out the printer dot codes. Or even better, an app that could decode the patterns for the secret agent appeal.
Full disclosure: I work for Google, though not on Android. Not that it matters as you should approach everything you read with healthy skepticism! Anyway, I'm not completely happy with the G1...

The n things I hate about Java

If it weren't for Java, I'd already be writing Android applications instead of writing about them. I've never gotten along well with the language. I'm old enough to remember the hype surrounding its release. I instinctively distrust excessive marketing (which indirectly caused an unrelated language to be named "JavaScript") and my experiences have only reinforced my initial disdain.

When I first examined Java, I was dismayed that autoboxing was unsupported (now fixed) resulting in the "int versus Integer" mess. I also felt they had missed a golden opportunity to leave the most confusing parts of C syntax behind, instead of superficially pandering to veteran programmers. Also, my pro-multiple-inheritance stance obviously clashed with Java. It seemed to me Java was made public before it was ready. Resources spent on marketing may have been better spent on improving the language before release.

Years later, another encounter cemented my opinion. In a project where I had to use Java, I wanted a thread to perform a nonblocking read on a channel. I was mildly perturbed when I found the Java threading API describing certain functions as deprecated because they were impossible to implement safely. How could they get this wrong? Surely they could have studied any of the many previous threading APIs?

More distressing was my lengthy trial-and-error discovery that my desired nonblocking read was impossible, due to some API design detail which I've gladly forgotten. I do remember documentation claiming the issue was addressed in a newer version, but unfortunately I was constrained to an older version. When I eventually worked around the problem, I felt especially frustrated since I could do exactly what I wanted in a few lines of C.

Shifts in my beliefs have further lessened Java's appeal. I was once an object-oriented fanatic, even submitting an object-oriented entry to a programming contest in a futile attempt to "prove" it was the right way to design software. (To clarify, I'm not saying the attempt was futile because it was object-oriented; the blame lies with me.) In those days I shunned Java because it wasn't object-oriented enough. Since I renounced my object-oriented ways years ago, I now feel Java is too object-oriented!

Now, to develop for Android, I must face Java again. That I'm willing to consider this is itself a glowing endorsement of the G1. I remain unimpressed: Java's primary advantage appears to be its fame, not its features. In fact, developers are urged to avoid common object-oriented practices for performance reasons.

Additionally, the downloads are heavyweight; a good net connection is a must. The "Hello, Android" example somehow requires a nontrivial amount of code as well as an XML file. In contrast, for C, a kinda sorta C compiler fits in a few hundred lines, and "Hello, World" fits right here: int puts(const char *);int main(){return puts("Hello, World!");}

This comparison is grossly unfair, as C libraries for such a sophisticated phone would also be nontrivial, but all the same, if the Bell Labs researchers of yore were behind the drawing board, the programming interface would be terse and elegant.

One bright spot is Android's register-based Dalvik virtual machine. A prevailing attitude was (and still is?) that Java could replace C/C++ in most situations: how can this be when the standard JVM is a stack-based snail? This is yet another detail Java should have got right from the start; compare with the design of Inferno's Dis virtual machine.

I hope Dalvik bytecode compilers for more agreeable languages will spring forth soon, as that will eliminate my biggest gripe.

Hardware Hankerings

Some deficiencies can be resolved with a little money. At the top of my wishlist is a standard stereo jack so I can use it as a stand-alone music player; I'll probably get an HTC 3-in-1 USB adapter for this purpose. (New G1 buyers will get an adapter free, but that's little consolation for me.) I'll probably also fork over dough for a bigger micro SDHC card, preferably at least 8GB so I can laugh at standard 4.7GB DVDs.

Other features will have to wait for the next generation. For gaming, I'd love to have an Android device with a nice D-pad or two; see the Pandora. While I'm at it, a video camera as well. One that can face the user so you could videophone others like in many a sci-fi flick. And then I may as well ask for a video out too, so it could be a stand-alone video player. Rather than build in all these gadgets, it may be better to have a few standard ports (perhaps USB) to attach oddball devices. I can't wait to see what they'll come up with.