Thursday, September 4, 2008

AsciiDoc and MathML

Nowadays I use AsciiDoc to author HTML. I'm even converting old pages to use this convenient and presentable text format.

My mathematics notes present an interesting challenge. For MathML the AsciiDoc user guide recommends using double-dollar passthroughs around equations, but this means every equation delimiter costs at least three characters: two dollar signs for the passthrough, plus at least one character for ASCIIMathML or its equivalent.

I'd like source such as the following to just work:
= Introduction =

Let $E: Y^2 = X^3 + a X + b$ be an elliptic curve.
So I settled on the following solution. I use itex2MML because:
  • My old-fashioned, superstitious, purist side prefers bare-bones JavaScript-free static documents.
  • Familiarity.
  • It has a handy syntax for equations in display mode, i.e. equations in their own center-justified paragraphs as seen in mathematics texts. An invaluable feature, as I can't figure out how to center a paragraph with AsciiDoc.
It should be simple to substitute an alternative such as ASCIIMathML. Some may prefer its approach because documents are smaller and the page source is intelligible. But I argue that most readers never bother viewing the source, the savings are slight, and one can always provide the AsciiDoc source if these factors matter.

Write these rules to a file named "macros":
[miscellaneous]
newline=\n

[blockdef-passthrough]
delimiter=^@{4,}$
subs=none
The first rule is optional: it makes AsciiDoc end output lines with LF instead of its default CR LF. I just abhor the CR LF abomination.

Then feed the source through these commands to produce the final product, which should be a file with an .xhtml extension:
sed 's/\$\([^$]*\)\$/+++$\1$+++/g' \
| sed '/\\\[/i@@@@' \
| sed '/\\\]/a@@@@' \
| asciidoc -b xhtml11 -f macros - \
| itex2MML | sed '/<!DOCTYPE/c \
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"\
"http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd" [\
<!ENTITY mathml "http://www.w3.org/1998/Math/MathML"> ]>
/xhtml11.dtd/d'
The above gobbledygook [careful with some of those newlines; Blogger may have put extra cuts in my lines] performs the following:
  1. Put the "+++" AsciiDoc inline passthrough around equations delimited by "$".
  2. Surround display-mode equations, delimited by "\[" and "\]", with our newly defined "@@@@" block passthrough macro.
  3. Put the resulting mess through AsciiDoc, which converts everything but our equations to an XHTML 1.1 document.
  4. Run itex2MML to convert the equations to MathML.
  5. Fix the DOCTYPE declaration.
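To see what steps 1 and 2 do on their own, here is a minimal sketch that needs neither asciidoc nor itex2MML. The function name preprocess is my own invention for illustration, and I've written the sed "i" and "a" commands in their multi-line form, which should behave like the one-line form used above when run under GNU sed.

```shell
# Hypothetical helper: only steps 1 and 2 of the pipeline.
preprocess() {
  # Step 1: wrap inline $...$ equations in +++ passthroughs.
  # Step 2: put a @@@@ line before every \[ line and after every \] line.
  sed 's/\$\([^$]*\)\$/+++$\1$+++/g' \
  | sed '/\\\[/i\
@@@@' \
  | sed '/\\\]/a\
@@@@'
}

# Sample input: one inline equation and one display-mode equation.
printf '%s\n' \
  'Consider $y = x^2$.' \
  '\[ E: Y^2 = X^3 + a X + b \]' \
  | preprocess
# Prints:
# Consider +++$y = x^2$+++.
# @@@@
# \[ E: Y^2 = X^3 + a X + b \]
# @@@@
```

The output is what AsciiDoc sees in step 3: every equation sits inside a passthrough, so it reaches itex2MML untouched.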
Caveats:
  • Inline equations must be closed on the line they are opened.
  • Expressions such as "$i$th" must be written as "$i$#th#", since AsciiDoc "+++" quotes are constrained: the closing "+++" must not be followed directly by a letter.
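The first caveat is easy to trip over, so a quick check before running the pipeline may help. The sketch below is not part of my setup; odd_dollars is a hypothetical name, and it assumes that a line with an odd number of "$" characters means an equation was left open (it will also flag a line that holds a lone literal dollar sign).

```shell
# Hypothetical lint: print "LINE_NUMBER: text" for each line with an odd number of "$".
odd_dollars() {
  awk '{
    line = $0
    n = gsub(/\$/, "&", line)   # gsub returns how many "$" it found
    if (n % 2) print NR ": " $0
  }'
}

# Sample input: the equation opened on line 2 is only closed on line 3.
printf '%s\n' \
  'Let $x$ be real.' \
  'Then $x^2 +' \
  '1 > 0$ holds.' \
  | odd_dollars
# Prints:
# 2: Then $x^2 +
# 3: 1 > 0$ holds.
```

Any line it reports should be rejoined so that each equation opens and closes on the same line.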
I'm content with this setup, but I'm considering extending my script to detect equations automatically, a recent feature of ASCIIMathML, so that even the dollar signs are unnecessary.