Thursday, September 4, 2008

AsciiDoc and MathML

Nowadays I use AsciiDoc to author HTML. I'm even converting old pages to use this convenient and presentable text format.

My mathematics notes present an interesting challenge. For MathML, the AsciiDoc user guide recommends wrapping equations in double-dollar passthroughs, but that means at least three characters are needed to delimit every equation, since ASCIIMathML or its equivalent needs at least one delimiter character of its own.

I'd like to have something like:

= Introduction =

Let $E: Y^2 = X^3 + a X + b$ be an elliptic curve.

to just work, so I settled on the following solution. I use itex2MML because:
  • My old-fashioned, superstitious, purist side prefers bare-bones JavaScript-free static documents.
  • Familiarity.
  • It has a handy syntax for equations in display mode, i.e. equations in their own center-justified paragraphs as seen in mathematics texts. An invaluable feature, as I can't figure out how to center a paragraph with AsciiDoc.
It should be simple to substitute an alternative such as ASCIIMathML. Some may prefer its approach because the documents are smaller and the page source is intelligible. But I argue that most readers never bother viewing the source, the savings are slight, and one can always provide the AsciiDoc source if these factors matter.

Write these rules to a file named "macros":

The first is optional. I just abhor the CR LF abomination.

Then feed the source through these commands to produce the final product, which should be a file with an .xhtml extension:
sed 's/\$\([^$]*\)\$/+++$\1$+++/g' \
| sed '/\\\[/i@@@@' \
| sed '/\\\]/a@@@@' \
| asciidoc -b xhtml11 -f macros - \
| itex2MML | sed '/<!DOCTYPE/c \
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"\
"" [\
<!ENTITY mathml ""> ]>'
The above gobbledygook [careful with some of those newlines; Blogger may have put extra cuts in my lines] performs the following:
  1. Put the "+++" AsciiDoc inline passthrough around equations delimited by "$".
  2. Surround display-mode equations, delimited by "\[" and "\]", with our newly defined "@@@@" block passthrough macro.
  3. Put the resulting mess through AsciiDoc, which converts everything but our equations to an XHTML 1.1 document.
  4. Run itex2MML to convert the equations to MathML.
  5. Fix the DOCTYPE declaration.
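The first two steps can be tried on their own, without AsciiDoc or itex2MML installed. Here is a minimal sketch, assuming GNU sed (the one-line forms of "i" and "a" are GNU extensions); the function names wrap_inline and wrap_display are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Sketch of steps 1 and 2 only. Assumes GNU sed.
# The function names are hypothetical, chosen for this example.

# Step 1: wrap each $...$ pair found on a single line in +++ passthroughs.
wrap_inline() {
  sed 's/\$\([^$]*\)\$/+++$\1$+++/g'
}

# Step 2: put a @@@@ line before every line containing \[
# and after every line containing \].
wrap_display() {
  sed '/\\\[/i@@@@' | sed '/\\\]/a@@@@'
}

printf '%s\n' \
  'Let $E: Y^2 = X^3 + a X + b$ be an elliptic curve.' \
  '\[ Y^2 = X^3 + a X + b \]' \
  | wrap_inline | wrap_display
```

With GNU sed this should print the first line with "+++" around the equation, followed by the display equation sitting between two "@@@@" lines, which is the form the remaining steps expect.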
Two caveats:
  • Inline equations must be closed on the same line they are opened.
  • Expressions such as "$i$th" must be written as "$i$#th#", since AsciiDoc "+++" quotes are constrained.
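Both restrictions are easy to see by running the first sed step by itself. This is only a sketch: the function name wrap_inline is hypothetical, and it covers the sed stage alone, not what AsciiDoc later does with the "#" quotes.

```shell
#!/bin/sh
# Sketch of the two restrictions, using only the first sed step.
# The function name is hypothetical, chosen for this example.
wrap_inline() {
  sed 's/\$\([^$]*\)\$/+++$\1$+++/g'
}

# An equation split across two lines: sed works line by line and each
# line holds a single "$", so nothing matches and the text is unchanged.
printf '%s\n' 'Let $E: Y^2 =' 'X^3 + a X + b$ be a curve.' | wrap_inline

# The "$i$th" workaround: the suffix is wrapped in "#" quotes by hand,
# so the closing +++ is followed by "#" rather than a letter.
printf '%s\n' 'the $i$#th# term' | wrap_inline
```

The first two output lines come back exactly as they went in, while the third becomes "the +++$i$+++#th# term".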
I'm content with this setup, but I'm considering extending my script to detect equations automatically, a recent feature of ASCIIMathML, so that even the dollar signs are unnecessary.


jomarshe said...

Glad I stumbled on your blog. I recently switched from reStructuredText to AsciiDoc (which, strangely, I had never heard of before Googling some RST limitations). I haven't come close to exploring all its features, but I should be neck-deep in equations and tables soon. One feature I miss from RST is its S5 implementation. I entered your site on a post from a couple of years ago (!! I was still using MS Office back then) about S5 and I'm wondering if you have found an easy way to transform AsciiDoc into S5 slides. Thanks for your informative posts.

Ben Lynn said...

I never thought of this. Custom macro definitions should do the trick, but the AsciiDoc mailing list would know better.

It's been a while since I played with S5. Last time, partly to show moral support, I used an in-house GUI tool at work (Google Docs), which also runs on any browser. But that was a straightforward one-off presentation containing no equations.