Sunday, June 7, 2009

AsciiDoc To Blogger

Once I grew accustomed to writing AsciiDoc, editing even tiny amounts of HTML became bothersome. The sites I maintain use custom scripts to build HTML files from AsciiDoc source, but I have less control over this blog. Up until now I’ve been using the Blogger in-browser editor to fine-tune the markup in these posts.

AsciiDoc’s author, Stuart Rackham, also wrote a tool to go from AsciiDoc to a WordPress blog. Blogger should be similar, and perhaps even easier to work with, since WordPress appears to have a few quirks.

My first thought was to use the Mail-to-Blogger feature: I could run AsciiDoc on the source, then send the result to a particular email address to publish it. This attempt foundered because Gmail has no raw HTML mode. Of course, I could script an SMTP server instead, but that seemed excessive.

Next I considered the import and export feature. But even if I could figure out how to generate suitable XML, I’d have to click around and solve a CAPTCHA each time I imported a post.

Finally the simplest solution hit me: use the Blogger Data API. With an HTTPS request or two, I can post raw HTML, set labels, and even choose whether to publish immediately or save as a draft. All it takes is a shell script using curl with Google data services:

#!/bin/bash

if [[ -z "$1" ]] ; then
  echo "Usage: $0 ASCIIDOC_SOURCE [LABELS...]"
  exit 1
fi

if [[ ! -f "$1" ]] ; then
  echo "$1 not found."
  exit 1
fi

outfile="$1.xml"
# Extract = Title =, which must be on a line by itself.
title=$(grep -m 1 '^ *=' "$1" | sed 's/^ *=* *//' | sed 's/ *=* *$//')

# Hmm, this draft thing used to work, but not anymore.
echo '<entry xmlns="http://www.w3.org/2005/Atom">
<app:control xmlns:app="http://www.w3.org/2007/app">
<app:draft>yes</app:draft>
</app:control>
<title type="text">'"$title"'</title>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">' > "$outfile"
asciidoc -f macros -s -o - "$1" >> "$outfile"
echo '
</div>
</content>' >> "$outfile"
while [[ -n $2 ]]; do
  echo ' <category scheme="http://www.blogger.com/atom/ns#" term="'"$2"'" />' >> "$outfile"
  shift
done
echo '</entry>' >> "$outfile"

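# I can't figure out how to preserve line breaks in Atom XML, hence the
# following workaround. Caveat: if <pre> and </pre> share a line, sed looks
# for the closing </pre> only on later lines, so the range can run past
# </entry> and leave a stray <br /> after the root element.
sed -i '/<pre>/,/<\/pre>/a<br \/>' "$outfile"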

if [[ -z $AUTH_TOKEN ]]; then
  stty -echo
  read -r -p "Blogger password: " pw
  stty echo
  echo

  # URL-encode the password so characters such as & or + survive the POST.
  token=$(curl --silent https://www.google.com/accounts/ClientLogin \
    -d Email=benlynn@gmail.com --data-urlencode "Passwd=$pw" \
    -d accountType=GOOGLE \
    -d source=asciidoc2blogger \
    -d service=blogger | grep Auth | cut -d = -f 2)
  AUTH_TOKEN=$token
  echo AUTH_TOKEN=$token
fi

# The URL was cut and pasted from <link rel="service.post"> from my
# blog's HTML source.
curl --silent --request POST --data "@$outfile" \
  --header "Content-Type: application/atom+xml" \
  --header "Authorization: GoogleLogin auth=$AUTH_TOKEN" \
  "http://www.blogger.com/feeds/4222267598459829544/posts/default" \
  | tidy -xml -indent -quiet

Actually, that’s not quite all: to work around another weird XML whitespace issue I use the following AsciiDoc macros file.

[miscellaneous]
newline=" \n"

We insert a space before each newline to prevent words separated by a line break from being joined together.
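One gap in the script above: the title is pasted into the Atom entry as-is, so a title containing & or < would presumably yield malformed XML. Below is a minimal sketch of one way to handle this, assuming GNU grep and sed. The helper names xml_escape and extract_title are hypothetical and not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper: escape the characters that are special in XML text.
# The ampersand must be handled first, or the other entities get re-escaped.
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Same title extraction as in the script above, followed by escaping.
# $1 is the AsciiDoc source file.
extract_title() {
  grep -m 1 '^ *=' "$1" | sed -e 's/^ *=* *//' -e 's/ *=* *$//' | xml_escape
}
```

With these in place, the title line in the script could become title=$(extract_title "$1"). Label names passed as extra arguments end up inside an attribute value, so they would additionally need double quotes escaped, which this sketch does not cover.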

1 comment:

Unknown said...

Thanks for the script.

But the workaround with sed doesn't work.

Blogger will complain with below information:

The markup in the document following the root element must be well-formed.

It works without sed line.

Thanks