tag:blogger.com,1999:blog-42222675984598295442024-03-15T01:17:13.731-07:00Ben Lynn's Online GarbageGames, puzzles, coding, maths, etc.Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.comBlogger117125tag:blogger.com,1999:blog-4222267598459829544.post-89028284662334300402020-01-15T08:37:00.000-08:002020-01-15T08:37:24.186-08:00Laziness is next to godliness<div id="preamble">
<div class="sectionbody">
<div class="paragraph"><p>I’ve been working through <a href="https://doi.org/10.1017/CBO9780511576430"><em>Handbook of
Practical Logic and Automated Reasoning</em> by John Harrison</a>. I enjoy translating
its OCaml listings to Haskell: the two languages have much in common, so I can
devote most of my attention to experimentation and exploration.</p></div>
<div class="paragraph"><p>I largely focused on aesthetics. With the help of pattern synonyms, recursion
schemes lead to succinct code for manipulating abstract syntax trees. Monads
and typeclasses reduce clutter. Lazy evaluation simplifies some tasks such as
enumerating all ground instances: we produce a never-ending list rather than
manage drip-fed enumeration with a counter.</p></div>
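<div class="paragraph"><p>For a flavor of that, here is a sketch (my own illustrative code, not the book’s or my port’s) of enumerating all ground terms over a signature as a single lazy list, grouped by depth:</p></div>

```haskell
import Control.Monad (replicateM)

-- First-order terms: a function symbol applied to arguments.
-- Constants are nullary functions.
data Term = Fn String [Term] deriving (Eq, Show)

-- Ground terms grouped by depth, as a lazily built list of levels.
byDepth :: [String] -> [(String, Int)] -> [[Term]]
byDepth consts funcs = lvls
  where
    lvls = [Fn c [] | c <- consts] : map deeper [0 ..]
    -- a term of depth n+1 applies a function symbol to arguments of
    -- depth at most n, at least one of which has depth exactly n
    deeper n =
      [ Fn f args
      | (f, k) <- funcs
      , args <- replicateM k (concat (take (n + 1) lvls))
      , any (`elem` (lvls !! n)) args
      ]

-- One flat, never-ending stream of ground instances; consumers just
-- take as many terms as they need instead of managing a counter.
ground :: [String] -> [(String, Int)] -> [Term]
ground consts funcs = concat (byDepth consts funcs)
```

<div class="paragraph"><p>With one constant c and one unary function f, the stream begins c, f(c), f(f(c)), and so on, produced only as far as it is demanded.</p></div>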
<div class="paragraph"><p>As I’ve come to expect from Haskell, frequently my code just worked with little
or no debugging. But success bred suspicion; my code worked too well!</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_flattening_the_competition">Flattening the competition</h2>
<div class="sectionbody">
<div class="paragraph"><p>My first MESON port solved
<a href="https://www.aaai.org/Papers/AAAI/1984/AAAI84-024.pdf">Schubert’s Steamroller</a> in
about 15 seconds on my laptop in GHCi. I was pleased, but the end of section
3.15 showed how to more efficiently distribute the size bound over subgoals to
get a faster MESON that proves the steamroller "in a reasonable amount of
time".</p></div>
<div class="paragraph"><p>Wow! Did the author feel 15 seconds was sluggish? Would this optimization bring
the time down to 5 seconds? Half a second? I eagerly implemented it to find
out.</p></div>
<div class="paragraph"><p>The change ruined my program. It crawled so slowly that I interrupted it,
afraid it would eat all my system’s resources. In desperation I downloaded
<a href="https://www.cl.cam.ac.uk/~jrh13/atp/">the original OCaml code</a> to investigate
why I was experiencing the opposite of what the book said. I expected to be
awed by its speed, after which, in a fit of jealousy, I’d figure out how I’d
botched my rewrite.</p></div>
<div class="paragraph"><p>Instead, I was shocked to find the OCaml version was even worse. In other words,
my supposedly unoptimized MESON was an order of magnitude faster than the most
advanced MESON in the book. But how? I had merely translated from one language
to another, almost mechanically. Surely bugs were to blame.</p></div>
<div class="paragraph"><p>After spending hours looking for them in vain, it dawned on me that my code
might be correct after all, and I had fortuitously stumbled upon effective
optimizations. Further analysis supported this: I now believe my implementation
of MESON legitimately outperforms the book version due to lazy evaluation.</p></div>
<div class="paragraph"><p>Because of how we use continuations, Haskell memoizes expensive computations
and avoids repeating them during backtracking. It calls to mind a suggestion in
the book to "somehow remember lemmas encountered earlier in proof search".
Adding the sophisticated size bound distribution hampers the reuse of the
memoized continuations because of an additional parameter, which explains why
a purported optimization crippled my program.</p></div>
<div class="paragraph"><p>Normally, lazy evaluation surprises me with an unpleasant space leak. I’m
grateful that for once it surprised me by dramatically boosting performance!</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_got_something_to_prove">Got something to prove?</h2>
<div class="sectionbody">
<div class="paragraph"><p>Thanks to <a href="https://github.com/tweag/asterius">the Asterius GHC WebAssembly backend</a>,
we can use a web browser to confirm the unreasonable effectiveness of a lazy MESON:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="https://crypto.stanford.edu/~blynn/compiler/fol.html">First-order logic theorem provers</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>Click on "Presets", select "steamroller", then click "Lazy MESON".</p></div>
</div>
</div>
Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-9532129948951544032018-11-16T10:39:00.000-08:002018-11-16T10:39:20.472-08:00Lambda the Penultimate<div id="preamble">
<div class="sectionbody">
<div class="paragraph"><p>Lambda expressions have proven so useful that even Java and C++ support them
nowadays. But how do we compile them for a machine to run? No CPU has a lambda
instruction.</p></div>
<div class="paragraph"><p>One strategy is to convert lambda terms into
<a href="https://en.wikipedia.org/wiki/Tacit_programming">point-free code</a>, a process
known as
<a href="https://en.wikipedia.org/wiki/Combinatory_logic#Completeness_of_the_S-K_basis"><em>bracket
abstraction</em></a>. One such algorithm rewrites any program in terms of two
functions: the S and K combinators.
<a href="https://crypto.stanford.edu/~blynn/lambda/sk.html">We can build a compiler by
assembling just two simple functions</a>.</p></div>
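<div class="paragraph"><p>For concreteness, here is a minimal sketch of that classic algorithm (my own hypothetical names, not the linked compiler’s code), together with a tiny evaluator:</p></div>

```haskell
-- Classic SK bracket abstraction: every lambda is eliminated in favor
-- of S and K, using SKK as the identity. A sketch for illustration.
infixl 5 :@
data Term = Var String | Term :@ Term | L String Term | S | K
  deriving (Eq, Show)

compile :: Term -> Term
compile (a :@ b)   = compile a :@ compile b
compile (L x body) = abstract x (compile body)
compile t          = t

-- abstract x t: rewrite t so the variable x no longer appears
abstract :: String -> Term -> Term
abstract x (Var y) | x == y = S :@ K :@ K                       -- I = SKK
abstract x (a :@ b)         = S :@ abstract x a :@ abstract x b
abstract _ t                = K :@ t         -- constants, other variables

-- weak head normal form; enough to run small closed examples
whnf :: Term -> Term
whnf (a :@ b) = case whnf a of
  K :@ x      -> whnf x
  S :@ x :@ y -> whnf (x :@ b :@ (y :@ b))
  a'          -> a' :@ b
whnf t = t
```

<div class="paragraph"><p>Note the unconditional S rule for applications: it fires even when the bound variable is absent, which is exactly why the output balloons without further rewrite rules.</p></div>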
<div class="paragraph"><p>Unfortunately, even with extra rewrite rules, classic bracket abstraction
yields monstrous, unwieldy expressions. Decades ago, researchers worked around
this problem by building custom combinators tailored to each input program, known
as <em>supercombinators</em>. Compilation is trickier, but at least the output is
reasonable.</p></div>
<div class="paragraph"><p>But recently,
<a href="http://okmij.org/ftp/tagless-final/ski.pdf">Oleg Kiselyov found <strong>a time-linear
and space-linear bracket abstraction algorithm</strong></a>, provided the De Bruijn
indices of the input term are encoded in unary. It takes only 20 lines:</p></div>
<div class="listingblock">
<div class="content"><!-- Generator: GNU source-highlight 3.1.8
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
<pre><tt><span style="font-weight: bold"><span style="color: #0000FF">data</span></span> <span style="color: #009900">Deb</span> <span style="color: #990000">=</span> <span style="color: #009900">Zero</span> <span style="color: #990000">|</span> <span style="color: #009900">Succ</span> <span style="color: #009900">Deb</span> <span style="color: #990000">|</span> <span style="color: #009900">Lam</span> <span style="color: #009900">Deb</span> <span style="color: #990000">|</span> <span style="color: #009900">App</span> <span style="color: #009900">Deb</span> <span style="color: #009900">Deb</span> <span style="font-weight: bold"><span style="color: #0000FF">deriving</span></span> <span style="color: #009900">Show</span>
<span style="font-weight: bold"><span style="color: #0000FF">infixl</span></span> <span style="color: #993399">5</span> <span style="color: #990000">:#</span>
<span style="font-weight: bold"><span style="color: #0000FF">data</span></span> <span style="color: #009900">Com</span> <span style="color: #990000">=</span> <span style="color: #009900">Com</span> <span style="color: #990000">:#</span> <span style="color: #009900">Com</span> <span style="color: #990000">|</span> <span style="color: #009900">S</span> <span style="color: #990000">|</span> <span style="color: #009900">I</span> <span style="color: #990000">|</span> <span style="color: #009900">C</span> <span style="color: #990000">|</span> <span style="color: #009900">K</span> <span style="color: #990000">|</span> <span style="color: #009900">B</span> <span style="color: #990000">|</span> <span style="color: #009900">Sn</span> <span style="color: #009900">Int</span> <span style="color: #990000">|</span> <span style="color: #009900">Bn</span> <span style="color: #009900">Int</span> <span style="color: #990000">|</span> <span style="color: #009900">Cn</span> <span style="color: #009900">Int</span>
ski <span style="color: #990000">::</span> <span style="color: #009900">Deb</span> <span style="color: #990000">-></span> <span style="color: #990000">(</span><span style="color: #009900">Int</span><span style="color: #990000">,</span> <span style="color: #009900">Com</span><span style="color: #990000">)</span>
ski deb <span style="color: #990000">=</span> <span style="font-weight: bold"><span style="color: #0000FF">case</span></span> deb <span style="font-weight: bold"><span style="color: #0000FF">of</span></span>
<span style="color: #009900">Zero</span> <span style="color: #990000">-></span> <span style="color: #990000">(</span><span style="color: #993399">1</span><span style="color: #990000">,</span> <span style="color: #009900">I</span><span style="color: #990000">)</span>
<span style="color: #009900">Succ</span> d <span style="color: #990000">|</span> x<span style="color: #990000">@(</span>n<span style="color: #990000">,</span> <span style="font-weight: bold"><span style="color: #0000FF">_</span></span><span style="color: #990000">)</span> <span style="color: #990000"><-</span> ski d <span style="color: #990000">-></span> <span style="color: #990000">(</span>n <span style="color: #990000">+</span> <span style="color: #993399">1</span><span style="color: #990000">,</span> f <span style="color: #990000">(</span><span style="color: #993399">0</span><span style="color: #990000">,</span> <span style="color: #009900">K</span><span style="color: #990000">)</span> x<span style="color: #990000">)</span>
<span style="color: #009900">App</span> d1 d2 <span style="color: #990000">|</span> x<span style="color: #990000">@(</span>a<span style="color: #990000">,</span> <span style="font-weight: bold"><span style="color: #0000FF">_</span></span><span style="color: #990000">)</span> <span style="color: #990000"><-</span> ski d1
<span style="color: #990000">,</span> y<span style="color: #990000">@(</span>b<span style="color: #990000">,</span> <span style="font-weight: bold"><span style="color: #0000FF">_</span></span><span style="color: #990000">)</span> <span style="color: #990000"><-</span> ski d2 <span style="color: #990000">-></span> <span style="color: #990000">(</span>max a b<span style="color: #990000">,</span> f x y<span style="color: #990000">)</span>
<span style="color: #009900">Lam</span> d <span style="color: #990000">|</span> <span style="color: #990000">(</span>n<span style="color: #990000">,</span> e<span style="color: #990000">)</span> <span style="color: #990000"><-</span> ski d <span style="color: #990000">-></span> <span style="font-weight: bold"><span style="color: #0000FF">case</span></span> n <span style="font-weight: bold"><span style="color: #0000FF">of</span></span>
<span style="color: #993399">0</span> <span style="color: #990000">-></span> <span style="color: #990000">(</span><span style="color: #993399">0</span><span style="color: #990000">,</span> <span style="color: #009900">K</span> <span style="color: #990000">:#</span> e<span style="color: #990000">)</span>
<span style="font-weight: bold"><span style="color: #0000FF">_</span></span> <span style="color: #990000">-></span> <span style="color: #990000">(</span>n <span style="color: #990000">-</span> <span style="color: #993399">1</span><span style="color: #990000">,</span> e<span style="color: #990000">)</span>
<span style="font-weight: bold"><span style="color: #0000FF">where</span></span>
f <span style="color: #990000">(</span>a<span style="color: #990000">,</span> x<span style="color: #990000">)</span> <span style="color: #990000">(</span>b<span style="color: #990000">,</span> y<span style="color: #990000">)</span> <span style="color: #990000">=</span> <span style="font-weight: bold"><span style="color: #0000FF">case</span></span> <span style="color: #990000">(</span>a<span style="color: #990000">,</span> b<span style="color: #990000">)</span> <span style="font-weight: bold"><span style="color: #0000FF">of</span></span>
<span style="color: #990000">(</span><span style="color: #993399">0</span><span style="color: #990000">,</span> <span style="color: #993399">0</span><span style="color: #990000">)</span> <span style="color: #990000">-></span> x <span style="color: #990000">:#</span> y
<span style="color: #990000">(</span><span style="color: #993399">0</span><span style="color: #990000">,</span> n<span style="color: #990000">)</span> <span style="color: #990000">-></span> <span style="color: #009900">Bn</span> n <span style="color: #990000">:#</span> x <span style="color: #990000">:#</span> y
<span style="color: #990000">(</span>n<span style="color: #990000">,</span> <span style="color: #993399">0</span><span style="color: #990000">)</span> <span style="color: #990000">-></span> <span style="color: #009900">Cn</span> n <span style="color: #990000">:#</span> x <span style="color: #990000">:#</span> y
<span style="color: #990000">(</span>n<span style="color: #990000">,</span> m<span style="color: #990000">)</span> <span style="color: #990000">|</span> n <span style="color: #990000">==</span> m <span style="color: #990000">-></span> <span style="color: #009900">Sn</span> n <span style="color: #990000">:#</span> x <span style="color: #990000">:#</span> y
<span style="color: #990000">|</span> n <span style="color: #990000"><</span> m <span style="color: #990000">-></span> <span style="color: #009900">Bn</span> <span style="color: #990000">(</span>m <span style="color: #990000">-</span> n<span style="color: #990000">)</span> <span style="color: #990000">:#</span> <span style="color: #990000">(</span><span style="color: #009900">Sn</span> n <span style="color: #990000">:#</span> x<span style="color: #990000">)</span> <span style="color: #990000">:#</span> y
<span style="color: #990000">|</span> otherwise <span style="color: #990000">-></span> <span style="color: #009900">Cn</span> <span style="color: #990000">(</span>n <span style="color: #990000">-</span> m<span style="color: #990000">)</span> <span style="color: #990000">:#</span> <span style="color: #990000">(</span><span style="color: #009900">Bn</span> <span style="color: #990000">(</span>n <span style="color: #990000">-</span> m<span style="color: #990000">)</span> <span style="color: #990000">:#</span> <span style="color: #009900">Sn</span> m <span style="color: #990000">:#</span> x<span style="color: #990000">)</span> <span style="color: #990000">:#</span> y</tt></pre></div></div>
<div class="paragraph"><p>Our <span class="monospaced">ski</span> function returns an integer and a combinatory logic term equivalent
to the input lambda term. The integer is zero if the given lambda term is
closed; for an open term, it’s the number of lambdas needed to close it.</p></div>
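<div class="paragraph"><p>A couple of sanity checks, using a self-contained copy of the code above with Eq derived on Com so results can be compared:</p></div>

```haskell
data Deb = Zero | Succ Deb | Lam Deb | App Deb Deb deriving Show
infixl 5 :#
data Com = Com :# Com | S | I | C | K | B | Sn Int | Bn Int | Cn Int
  deriving (Eq, Show)

ski :: Deb -> (Int, Com)
ski deb = case deb of
  Zero -> (1, I)
  Succ d | x@(n, _) <- ski d -> (n + 1, f (0, K) x)
  App d1 d2 | x@(a, _) <- ski d1
            , y@(b, _) <- ski d2 -> (max a b, f x y)
  Lam d | (n, e) <- ski d -> case n of
    0 -> (0, K :# e)
    _ -> (n - 1, e)
  where
  f (a, x) (b, y) = case (a, b) of
    (0, 0)          -> x :# y
    (0, n)          -> Bn n :# x :# y
    (n, 0)          -> Cn n :# x :# y
    (n, m) | n == m -> Sn n :# x :# y
           | n < m  -> Bn (m - n) :# (Sn n :# x) :# y
           | otherwise -> Cn (n - m) :# (Bn (n - m) :# Sn m :# x) :# y

-- λx.x becomes I; λx.λy.y becomes K I; λx.λy.x becomes B_1 K I,
-- which behaves like K: B_1 K I x y = K (I x) y = x.
checks :: [Bool]
checks =
  [ ski (Lam Zero) == (0, I)
  , ski (Lam (Lam Zero)) == (0, K :# I)
  , ski (Lam (Lam (Succ Zero))) == (0, Bn 1 :# K :# I)
  ]
```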
<div class="paragraph"><p>It uses bulk variants of the B, C, and S combinators:</p></div>
<div class="listingblock">
<div class="content">
<pre><tt>linBulk <span style="color: #990000">::</span> <span style="color: #009900">Com</span> <span style="color: #990000">-></span> <span style="color: #009900">Com</span>
linBulk b <span style="color: #990000">=</span> <span style="font-weight: bold"><span style="color: #0000FF">case</span></span> b <span style="font-weight: bold"><span style="color: #0000FF">of</span></span>
<span style="color: #009900">Bn</span> n <span style="color: #990000">-></span> iterate <span style="color: #990000">((</span><span style="color: #009900">B</span><span style="color: #990000">:#</span> <span style="color: #009900">B</span><span style="color: #990000">):#)</span> <span style="color: #009900">B</span> <span style="color: #990000">!!</span> <span style="color: #990000">(</span>n <span style="color: #990000">-</span> <span style="color: #993399">1</span><span style="color: #990000">)</span>
<span style="color: #009900">Cn</span> n <span style="color: #990000">-></span> iterate <span style="color: #990000">((</span><span style="color: #009900">B</span><span style="color: #990000">:#(</span><span style="color: #009900">B</span><span style="color: #990000">:#</span><span style="color: #009900">C</span><span style="color: #990000">):#</span><span style="color: #009900">B</span><span style="color: #990000">):#)</span> <span style="color: #009900">C</span> <span style="color: #990000">!!</span> <span style="color: #990000">(</span>n <span style="color: #990000">-</span> <span style="color: #993399">1</span><span style="color: #990000">)</span>
<span style="color: #009900">Sn</span> n <span style="color: #990000">-></span> iterate <span style="color: #990000">((</span><span style="color: #009900">B</span><span style="color: #990000">:#(</span><span style="color: #009900">B</span><span style="color: #990000">:#</span><span style="color: #009900">S</span><span style="color: #990000">):#</span><span style="color: #009900">B</span><span style="color: #990000">):#)</span> <span style="color: #009900">S</span> <span style="color: #990000">!!</span> <span style="color: #990000">(</span>n <span style="color: #990000">-</span> <span style="color: #993399">1</span><span style="color: #990000">)</span>
x <span style="color: #990000">:#</span> y <span style="color: #990000">-></span> linBulk x <span style="color: #990000">:#</span> linBulk y
<span style="font-weight: bold"><span style="color: #0000FF">_</span></span> <span style="color: #990000">-></span> b</tt></pre></div></div>
<div class="paragraph"><p>Linear complexity depends on memoizing these bulk combinators. If memoization
is undesirable, we can
<a href="https://crypto.stanford.edu/~blynn/lambda/logski.html">replace each bulk
combinator of order n with O(log n) ordinary combinators</a>.</p></div>
<div class="listingblock">
<div class="content">
<pre><tt>logBulk <span style="color: #990000">::</span> <span style="color: #009900">Com</span> <span style="color: #990000">-></span> <span style="color: #009900">Com</span>
logBulk b <span style="color: #990000">=</span> <span style="font-weight: bold"><span style="color: #0000FF">case</span></span> b <span style="font-weight: bold"><span style="color: #0000FF">of</span></span>
<span style="color: #009900">Bn</span> n <span style="color: #990000">-></span> go n <span style="color: #990000">(</span><span style="color: #009900">K</span><span style="color: #990000">:#</span><span style="color: #009900">I</span><span style="color: #990000">)</span> <span style="color: #990000">:#</span> <span style="color: #009900">B</span> <span style="color: #990000">:#</span> <span style="color: #009900">I</span>
<span style="color: #009900">Cn</span> n <span style="color: #990000">-></span> go n <span style="color: #990000">(</span><span style="color: #009900">K</span><span style="color: #990000">:#(</span><span style="color: #009900">C</span><span style="color: #990000">:#</span><span style="color: #009900">I</span><span style="color: #990000">:#</span><span style="color: #009900">I</span><span style="color: #990000">))</span> <span style="color: #990000">:#</span> <span style="color: #990000">(</span><span style="color: #009900">B</span><span style="color: #990000">:#(</span><span style="color: #009900">B</span><span style="color: #990000">:#</span><span style="color: #009900">C</span><span style="color: #990000">):#</span><span style="color: #009900">B</span><span style="color: #990000">)</span> <span style="color: #990000">:#</span> <span style="color: #009900">I</span>
<span style="color: #009900">Sn</span> n <span style="color: #990000">-></span> go n <span style="color: #990000">(</span><span style="color: #009900">K</span><span style="color: #990000">:#(</span><span style="color: #009900">C</span><span style="color: #990000">:#</span><span style="color: #009900">I</span><span style="color: #990000">:#</span><span style="color: #009900">I</span><span style="color: #990000">))</span> <span style="color: #990000">:#</span> <span style="color: #990000">(</span><span style="color: #009900">B</span><span style="color: #990000">:#(</span><span style="color: #009900">B</span><span style="color: #990000">:#</span><span style="color: #009900">S</span><span style="color: #990000">):#</span><span style="color: #009900">B</span><span style="color: #990000">)</span> <span style="color: #990000">:#</span> <span style="color: #009900">I</span>
x <span style="color: #990000">:#</span> y <span style="color: #990000">-></span> logBulk x <span style="color: #990000">:#</span> logBulk y
<span style="font-weight: bold"><span style="color: #0000FF">_</span></span> <span style="color: #990000">-></span> b
<span style="font-weight: bold"><span style="color: #0000FF">where</span></span>
go n base <span style="color: #990000">=</span> foldr <span style="color: #990000">(:#)</span> base <span style="color: #990000">$</span> <span style="color: #990000">([</span>b0<span style="color: #990000">,</span> b1<span style="color: #990000">]!!)</span> <span style="color: #990000"><$></span> bits <span style="color: #990000">[]</span> n
bits acc <span style="color: #993399">0</span> <span style="color: #990000">=</span> reverse acc
bits acc n <span style="color: #990000">|</span> <span style="color: #990000">(</span>q<span style="color: #990000">,</span> r<span style="color: #990000">)</span> <span style="color: #990000"><-</span> divMod n <span style="color: #993399">2</span> <span style="color: #990000">=</span> bits <span style="color: #990000">(</span>r<span style="color: #990000">:</span>acc<span style="color: #990000">)</span> q
b0 <span style="color: #990000">=</span> <span style="color: #009900">C</span><span style="color: #990000">:#</span><span style="color: #009900">B</span><span style="color: #990000">:#(</span><span style="color: #009900">S</span><span style="color: #990000">:#</span><span style="color: #009900">B</span><span style="color: #990000">:#</span><span style="color: #009900">I</span><span style="color: #990000">)</span>
b1 <span style="color: #990000">=</span> <span style="color: #009900">C</span><span style="color: #990000">:#(</span><span style="color: #009900">B</span><span style="color: #990000">:#</span><span style="color: #009900">S</span><span style="color: #990000">:#(</span><span style="color: #009900">B</span><span style="color: #990000">:#(</span><span style="color: #009900">B</span><span style="color: #990000">:#</span><span style="color: #009900">B</span><span style="color: #990000">):#(</span><span style="color: #009900">C</span><span style="color: #990000">:#</span><span style="color: #009900">B</span><span style="color: #990000">:#(</span><span style="color: #009900">S</span><span style="color: #990000">:#</span><span style="color: #009900">B</span><span style="color: #990000">:#</span><span style="color: #009900">I</span><span style="color: #990000">)))):#</span><span style="color: #009900">B</span></tt></pre></div></div>
<div class="paragraph"><p>For example:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>λ print $ logBulk $ Sn 1234
CB(SBI)(C(BS(B(BB)(CB(SBI))))B(CB(SBI)(CB(SBI)(C(BS(B(BB)(CB
(SBI))))B(CB(SBI)(C(BS(B(BB)(CB(SBI))))B(C(BS(B(BB)(CB(SBI))
))B(CB(SBI)(CB(SBI)(C(BS(B(BB)(CB(SBI))))B(K(CII))))))))))))
(B(BS)B)I
λ print $ logBulk $ Bn 1234
CB(SBI)(C(BS(B(BB)(CB(SBI))))B(CB(SBI)(CB(SBI)(C(BS(B(BB)(CB
(SBI))))B(CB(SBI)(C(BS(B(BB)(CB(SBI))))B(C(BS(B(BB)(CB(SBI))
))B(CB(SBI)(CB(SBI)(C(BS(B(BB)(CB(SBI))))B(KI)))))))))))BI</pre>
</div></div>
<div class="paragraph"><p>For completeness, we include our pretty-printing code:</p></div>
<div class="listingblock">
<div class="content">
<pre><tt><span style="font-weight: bold"><span style="color: #0000FF">instance</span></span> <span style="color: #009900">Show</span> <span style="color: #009900">Com</span> <span style="font-weight: bold"><span style="color: #0000FF">where</span></span>
show <span style="color: #009900">S</span> <span style="color: #990000">=</span> <span style="color: #FF0000">"S"</span>
show <span style="color: #009900">I</span> <span style="color: #990000">=</span> <span style="color: #FF0000">"I"</span>
show <span style="color: #009900">C</span> <span style="color: #990000">=</span> <span style="color: #FF0000">"C"</span>
show <span style="color: #009900">K</span> <span style="color: #990000">=</span> <span style="color: #FF0000">"K"</span>
show <span style="color: #009900">B</span> <span style="color: #990000">=</span> <span style="color: #FF0000">"B"</span>
show <span style="color: #990000">(</span>l <span style="color: #990000">:#</span> r<span style="color: #990000">@(</span><span style="font-weight: bold"><span style="color: #0000FF">_</span></span> <span style="color: #990000">:#</span> <span style="font-weight: bold"><span style="color: #0000FF">_</span></span><span style="color: #990000">))</span> <span style="color: #990000">=</span> show l <span style="color: #990000">++</span> <span style="color: #FF0000">"("</span> <span style="color: #990000">++</span> show r <span style="color: #990000">++</span> <span style="color: #FF0000">")"</span>
show <span style="color: #990000">(</span>l <span style="color: #990000">:#</span> r<span style="color: #990000">)</span> <span style="color: #990000">=</span> show l <span style="color: #990000">++</span> show r
show <span style="color: #990000">(</span><span style="color: #009900">Bn</span> n<span style="color: #990000">)</span> <span style="color: #990000">=</span> <span style="color: #FF0000">"B_"</span> <span style="color: #990000">++</span> show n
show <span style="color: #990000">(</span><span style="color: #009900">Cn</span> n<span style="color: #990000">)</span> <span style="color: #990000">=</span> <span style="color: #FF0000">"C_"</span> <span style="color: #990000">++</span> show n
show <span style="color: #990000">(</span><span style="color: #009900">Sn</span> n<span style="color: #990000">)</span> <span style="color: #990000">=</span> <span style="color: #FF0000">"S_"</span> <span style="color: #990000">++</span> show n</tt></pre></div></div>
<div class="paragraph"><p>In other words, we can easily rewrite a lambda term of length N as a
combinatory logic term of length O(N log N).</p></div>
<div class="paragraph"><p><a href="https://www.youtube.com/watch?v=zhj_tUMwTe0">Edward Kmett outlines a different
approach (about 21 minutes into the video)</a> though the details are so
"horrific" that even he has yet to work through them. By the way, the slides
from this talk are packed with excellent references on combinators.</p></div>
<div class="paragraph"><p>Kiselyov laments that bracket abstraction has "many descriptions and
explanations and blogs", all of which take a syntactic approach. I’m one of the guilty
parties, and hope to redeem myself with this post. Also, I rewrote
<a href="https://crypto.stanford.edu/~blynn/lambda/crazyl.html">one of my toy compilers</a>
to demonstrate another algorithm from Kiselyov’s paper. Though not linear, the
algorithm avoids bulk combinators and often produces short and sweet programs.</p></div>
</div>
</div>
Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-57848121811306225042018-06-26T09:43:00.000-07:002018-08-31T04:33:02.566-07:00Why Laziness Matters<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>Should a programming language be lazy by default?
<a href="https://existentialtype.wordpress.com/2011/04/24/the-real-point-of-laziness/">Robert Harper says no</a>.
<a href="http://augustss.blogspot.com/2011/05/more-points-for-lazy-evaluation-in.html">Lennart Augustsson says yes</a>.
No matter who is right, I say all computer scientists should become fluent
in a lazy language, whether or not they speak it in daily life.</p>
</div>
<div class="paragraph">
<p>My evidence is <a href="https://research.swtch.com/yaccalive">a post by Russ Cox on
parsing with derivatives</a>: a very experienced programmer very convincingly
argues why a parsing algorithm has exponential time complexity. But the claims
are very wrong; <a href="https://arxiv.org/abs/1604.04695">Adams, Hollenbeck, and Might
proved the algorithm is cubic</a>.</p>
</div>
<div class="paragraph">
<p>How did he err so badly? Did he underestimate the power of lazy evaluation?</p>
</div>
<div class="paragraph">
<p>I once exclusively wrote eager code, and I imagine my younger self would have
agreed with his analysis without a second thought. Today I know better.
Marvel at these
<a href="http://www.cs.dartmouth.edu/~doug/powser.html">lines by Doug McIlroy</a>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>int fs = 0 : zipWith (/) fs [1..] -- integral from 0 to x
sins = int coss
coss = 1 - int sins</pre>
</div>
</div>
<div class="paragraph">
<p>It seems too good to be true. Indistinguishable from magic perhaps. But somehow
it all works when lazily evaluated. Beware of summarily dismissing lazy code
because it looks implausibly amazing.</p>
</div>
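<div class="paragraph"><p>McIlroy’s lines rely on a Num instance for coefficient lists, which his paper develops in full. To try them yourself, the following minimal instance is enough (my own stopgap: multiplication is included, division and composition are not):</p></div>

```haskell
-- Power series as (lazy, usually infinite) coefficient lists, with
-- just enough Num machinery for `1 - int sins` to typecheck.
instance Num a => Num [a] where
  fromInteger n       = fromInteger n : repeat 0
  negate              = map negate
  (f:fs) + (g:gs)     = f + g : fs + gs
  (f:fs) * gs@(g:gs') = f * g : map (f *) gs' + fs * gs
  abs    = undefined   -- unused in this demo
  signum = undefined   -- unused in this demo

int :: Fractional a => [a] -> [a]
int fs = 0 : zipWith (/) fs [1..]   -- integral from 0 to x

sins, coss :: [Rational]
sins = int coss
coss = 1 - int sins
```

<div class="paragraph"><p>In GHCi, take 6 sins yields 0, 1, 0, -1/6, 0, 1/120: the Taylor coefficients of sin x, conjured from two mutually recursive one-liners.</p></div>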
<div class="paragraph">
<p>Also consider <a href="https://swtch.com/~rsc/regexp/regexp1.html">an earlier article
by Cox on regular expressions</a>. Again, a very experienced
programmer very convincingly argues why a parsing algorithm has exponential
time complexity. In this post, however, the claims are solid, and backed up by
graphs of running times. (It’s worth reading by the way: it tells the tragedy
of how popular regular expression implementations became sluggish twisted
mockeries of true regular expressions, while offering hope for the future. My
only criticism is it fails to mention
<a href="https://crypto.stanford.edu/~blynn/haskell/re.html">regular expression
derivatives</a>.)</p>
</div>
<div class="paragraph">
<p>Why does the erroneous post lack similar graphs? Why didn’t the author throw
some code together and benchmark it to produce damning evidence?</p>
</div>
<div class="paragraph">
<p>Perhaps he thought it was too tedious. This would imply unfamiliarity with
lazy languages, because prototyping
<a href="http://matt.might.net/papers/might2011derivatives.pdf">parsing with derivatives</a>
in Haskell is easier than criticizing it.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_preliminaries">Preliminaries</h2>
<div class="sectionbody">
<div class="paragraph">
<p>We define a <code>Pe</code> data structure to represent parsing expressions, that is, the
right-hand sides of a grammar’s production rules.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">import Control.Arrow
import Control.Monad.State
import qualified Data.Map as M
import qualified Data.Set as S
-- NT = non-terminal. (:.) = concatenation.
data Pe = NT String | Eps Char | Nul | Ch Char | Or [Pe] | Pe :. Pe | Del Pe</code></pre>
</div>
</div>
<div class="paragraph">
<p>Although it represents the empty string, the <code>Eps</code> (for epsilon) expression
holds a character that winds up in the abstract syntax tree (AST) returned by
the parser. Similarly, the <code>Del</code> (for delta) expression, which is only generated
internally, holds an expression which later helps build an AST.</p>
</div>
<div class="paragraph">
<p>A context-free grammar maps non-terminal symbols to parsing expressions:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">type Grammar = M.Map String Pe</code></pre>
</div>
</div>
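<div class="paragraph"><p>For instance, the balanced-parentheses grammar S → "(" S ")" S | ε can be written with these types as follows (a hypothetical example; the breadcrumb character in the Eps is arbitrary):</p></div>

```haskell
import qualified Data.Map as M

data Pe = NT String | Eps Char | Nul | Ch Char | Or [Pe] | Pe :. Pe | Del Pe
type Grammar = M.Map String Pe

-- S -> '(' S ')' S | epsilon
parens :: Grammar
parens = M.fromList
  [ ("S", Or [ Ch '(' :. (NT "S" :. (Ch ')' :. NT "S"))
             , Eps '_' ])
  ]
```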
<div class="paragraph">
<p>Our ASTs are full binary trees whose leaf nodes are characters
(<a href="https://en.wikipedia.org/wiki/Magma_(algebra)#Free_magma">the free magma on the
alphabet</a>). The tree structure captures the order in which the production
rules are applied.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">data Ast = Bad | Lf Char | Ast :@ Ast deriving Show
isBad :: Ast -> Bool
isBad Bad = True
isBad _ = False</code></pre>
</div>
</div>
<div class="paragraph">
<p>The <code>Bad</code> AST is returned for unparseable strings. An alternative is to drop
<code>Bad</code> and replace <code>Ast</code> with <code>Maybe Ast</code> throughout our code.</p>
</div>
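<div class="paragraph">
<p>To see why the <code>Maybe</code> alternative is attractive (a hypothetical
sketch; the code below keeps <code>Bad</code>): the helpers <code>mul</code> and
<code>chainsaw</code> defined later would collapse into standard combinators.</p>
</div>

```haskell
import Data.Foldable (asum)

-- Hypothetical Ast without a Bad constructor; failure becomes Nothing.
data Ast = Lf Char | Ast :@ Ast deriving (Eq, Show)

-- `mul` turns into the Applicative product: Nothing if either side fails.
mul :: Maybe Ast -> Maybe Ast -> Maybe Ast
mul x y = (:@) <$> x <*> y

-- `chainsaw` turns into asum, which picks the first Just in the list.
chainsaw :: [Maybe Ast] -> Maybe Ast
chainsaw = asum
```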
<div class="paragraph">
<p>A fancier parser might return a parse forest, that is, all parse
trees for a given input. Ours simply settles on one parse tree.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_parsing_with_derivatives">Parsing with derivatives</h2>
<div class="sectionbody">
<div class="paragraph">
<p>To parse an input string, we first take successive derivatives of the start
symbol with respect to each character of the input, taking care to leave bread
crumbs in the <code>Eps</code> and <code>Del</code> expressions to record consumed characters. (The
<code>Del</code> constructor is named for the delta symbol from the paper, but I also
think of it as "deleted", because it remembers what has just been deleted from
the input.)</p>
</div>
<div class="paragraph">
<p>Then the string is accepted if and only if the resulting expression is
nullable, that is, accepts the empty string. As we traverse the expression to
determine nullability, we also build an AST to return.</p>
</div>
<div class="paragraph">
<p>We memoize derivatives by adding entries to a state of type <code>Grammar</code>.
Initially, this cache contains only the input grammar, mapping nonterminal
symbols to <code>Pe</code> values. Later, we place a derivative at the key formed by
concatenating the characters involved in the derivative with the
nonterminal symbol being derived.</p>
</div>
<div class="paragraph">
<p>For example, if <code>S</code> is a nonterminal in the input grammar, then
<code>abS</code> maps to <code>derive 'a' (derive 'b' (NT "S"))</code>. We assume no nonterminal
symbol in the input grammar is a suffix of any other nonterminal symbol, which
is fine for a prototype.</p>
</div>
<div class="paragraph">
<p>It may help to imagine the grammar growing over time, gaining new production
rules as we process input characters. Indeed, we consider nonterminals to refer
to both the nonterminals in the input grammar and their derivatives.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">parse :: Grammar -> String -> String -> Ast
parse g start s = evalState (parseNull $ NT $ reverse s ++ start) g</code></pre>
</div>
</div>
<div class="paragraph">
<p>Computing nullability requires finding a least fixed point. I found this
the toughest part of the algorithm, partly because they never taught fixed
point theory when I was in school. For some reason, the method reminds me of
<a href="https://en.wikipedia.org/wiki/DFA_minimization">Hopcroft’s algorithm to minimize
a DFA</a>, where we repeatedly refine a partition until we reach a stable answer.</p>
</div>
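<div class="paragraph">
<p>Stripped of parsing details, the computation is Kleene iteration: start from
a bottom guess and reapply a monotone step function until nothing changes. Here
is a standalone sketch (a hypothetical helper for illustration, not used by the
parser below) computing nullability for the toy grammar S → aS | T, T → ε:</p>
</div>

```haskell
import qualified Data.Map as M

-- Iterate a step function from an initial guess until it stabilizes.
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint step x = let x' = step x in if x' == x then x else fixpoint step x'

-- Nullability for S -> aS | T, T -> ε: guess all-False, then refine.
nullables :: M.Map String Bool
nullables = fixpoint step (M.fromList [("S", False), ("T", False)])
  where
    step m = M.fromList
      [ ("S", False || m M.! "T")  -- aS is never nullable; T might be
      , ("T", True)                -- ε is nullable
      ]
```

The first pass discovers T is nullable; the second propagates that to S; the third changes nothing, so we stop.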
<div class="paragraph">
<p>We initially guess each nonterminal is not nullable, which means it
corresponds to the <code>Bad</code> AST. On encountering a nonterminal, if we’ve already
seen it, then return our guess for that nonterminal. Otherwise, it’s the first
time we’ve seen it and instead of guessing, we recursively traverse its
corresponding expression. In doing so, we may discover our guess is wrong, so
we correct it if necessary before returning an AST.</p>
</div>
<div class="paragraph">
<p>We repeat until our guesses stabilize.
Guesses never change from a good AST to <code>Bad</code>, and the map of all
guesses only changes if a guess is revised from <code>Bad</code> to a good AST.
We exploit these facts to simplify our code slightly.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">parseNull :: Pe -> State Grammar Ast
parseNull pe = leastFix M.empty where
leastFix guessed = do
(b, (_, guessed')) <- runStateT (visit pe) (S.empty, guessed)
if M.size guessed == M.size guessed' then pure b else leastFix guessed'
visit :: Pe -> StateT (S.Set String, M.Map String Ast) (State Grammar) Ast
visit pe = case pe of
Eps x -> pure $ Lf x
Del x -> visit x
Nul -> pure Bad
Ch _ -> pure Bad
Or xs -> chainsaw <$> mapM visit xs
x :. y -> mul <$> visit x <*> visit y
NT s -> do
(seen, guessed) <- get
case () of
() | Just x <- M.lookup s guessed -> pure x
| S.member s seen -> pure Bad
| otherwise -> do
modify $ first $ S.insert s
b <- visit =<< lift (memoDerive s)
unless (isBad b) $ modify $ second $ M.insert s b
pure b
mul :: Ast -> Ast -> Ast
mul Bad _ = Bad
mul _ Bad = Bad
mul x y = x :@ y
-- | Helps cut a non-empty parse forest down to one tree.
chainsaw :: [Ast] -> Ast
chainsaw xs | null xs' = Bad
| otherwise = head xs'
where xs' = filter (not . isBad) xs</code></pre>
</div>
</div>
<div class="paragraph">
<p>Memoized derivatives are straightforward. For computing derivatives, we
translate the rules given in the paper, and for memoization, on discovering
a missing entry, we insert a knot-tying value before recursing, and replace
it with the result of the recursion afterward.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">memoDerive :: String -> State Grammar Pe
memoDerive cs@(c:s) = do
m <- get
unless (M.member cs m) $ do
modify $ M.insert cs $ NT cs
d <- derive c =<< memoDerive s
modify $ M.insert cs d
gets (M.! cs)
memoDerive _ = error "unreachable"
derive :: Char -> Pe -> State Grammar Pe
derive c pe = case pe of
NT s -> pure $ NT $ c:s
Ch x | x == c -> pure $ Eps x
Or xs -> Or <$> mapM (derive c) xs
Del x :. y -> (Del x :.) <$> derive c y
x :. y -> do
b <- parseNull x
dx <- derive c x
if isBad b then pure $ dx :. y else do
dy <- derive c y
pure $ Or [dx :. y, Del x :. dy]
_ -> pure Nul</code></pre>
</div>
</div>
<div class="paragraph">
<p>Here’s the grammar that Cox claims will grind our parser to a halt:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">cox :: Grammar
cox = M.fromList
[ ("S", NT "T")
, ("T", Or [NT "T" :. (Ch '+' :. NT "T"), NT "N"])
, ("N", Ch '1')
]</code></pre>
</div>
</div>
<div class="paragraph">
<p>Let’s try it on a small input in an interactive interpreter:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>parse cox "S" "1+1+1"</pre>
</div>
</div>
<div class="paragraph">
<p>The parser picks a particular parse tree:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>(Lf '1' :@ (Lf '+' :@ Lf '1')) :@ (Lf '+' :@ Lf '1')</pre>
</div>
</div>
<div class="paragraph">
<p>How about all strings of length 7 consisting of <code>1</code> or <code>+</code>?</p>
</div>
<div class="listingblock">
<div class="content">
<pre>filter (not . isBad . parse cox "S") $ replicateM 7 "+1"</pre>
</div>
</div>
<div class="paragraph">
<p>Thankfully, we get:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>["1+1+1+1"]</pre>
</div>
</div>
<div class="paragraph">
<p>At last, it’s time to demolish Cox’s claims.
We parse an 80-character input with a typo near the end:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">main :: IO ()
main = print $ parse cox "S" $ concat (replicate 39 "1+") ++ "+1"</code></pre>
</div>
</div>
<div class="paragraph">
<p>Our prototype is awful. We really should:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Add a slimmed down version of <code>parseNull</code> that returns a boolean instead of
an AST, and call this in <code>derive</code>. We only want to recover the AST once the
whole string has been parsed; the rest of the time, we only care whether an
expression is nullable.</p>
</li>
<li>
<p>Use a better algorithm for finding the least fixed point. We’ve perhaps
chosen the clunkiest and most obvious method.</p>
</li>
<li>
<p>Remove a layer of indirection when tying the knot. That is, point to
a node directly rather than a string (which requires another lookup to get at
the node).</p>
</li>
<li>
<p>Apply algebraic identities to reduce the number of nodes in parsing
expressions and abstract syntax trees.</p>
</li>
</ul>
</div>
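<div class="paragraph">
<p>As a taste of the first item (my own hypothetical sketch, mirroring the
parser above): a <code>Bool</code>-valued twin of <code>parseNull</code> whose
guesses are a set of nullable nonterminals, with <code>derive</code> calling it
instead. The full parser would still call <code>parseNull</code> once at the
end to recover the AST.</p>
</div>

```haskell
import Control.Arrow (first, second)
import Control.Monad.State
import qualified Data.Map as M
import qualified Data.Set as S

data Pe = NT String | Eps Char | Nul | Ch Char | Or [Pe] | Pe :. Pe | Del Pe
type Grammar = M.Map String Pe

-- Bool-valued nullability: the same least-fixed-point search as parseNull,
-- but guesses are a set of nonterminals known nullable, and no AST is built.
nullBool :: Pe -> State Grammar Bool
nullBool pe = leastFix S.empty where
  leastFix guessed = do
    (b, (_, guessed')) <- runStateT (visit pe) (S.empty, guessed)
    if S.size guessed == S.size guessed' then pure b else leastFix guessed'
  visit :: Pe -> StateT (S.Set String, S.Set String) (State Grammar) Bool
  visit x = case x of
    Eps _ -> pure True
    Del y -> visit y
    Nul -> pure False
    Ch _ -> pure False
    Or ys -> or <$> mapM visit ys
    y :. z -> (&&) <$> visit y <*> visit z
    NT s -> do
      (seen, guessed) <- get
      case () of
        () | S.member s guessed -> pure True
           | S.member s seen -> pure False
           | otherwise -> do
               modify $ first $ S.insert s
               b <- visit =<< lift (memoDerive s)
               when b $ modify $ second $ S.insert s
               pure b

-- memoDerive and derive as in the article, except derive calls nullBool.
memoDerive :: String -> State Grammar Pe
memoDerive cs@(c:s) = do
  m <- get
  unless (M.member cs m) $ do
    modify $ M.insert cs $ NT cs
    d <- derive c =<< memoDerive s
    modify $ M.insert cs d
  gets (M.! cs)
memoDerive _ = error "unreachable"

derive :: Char -> Pe -> State Grammar Pe
derive c pe = case pe of
  NT s -> pure $ NT $ c:s
  Ch x | x == c -> pure $ Eps x
  Or xs -> Or <$> mapM (derive c) xs
  Del x :. y -> (Del x :.) <$> derive c y
  x :. y -> do
    b <- nullBool x  -- Bool check; no AST construction here
    dx <- derive c x
    if not b then pure $ dx :. y else do
      dy <- derive c y
      pure $ Or [dx :. y, Del x :. dy]
  _ -> pure Nul

-- Recognizer built on nullBool alone.
accepts :: Grammar -> String -> String -> Bool
accepts g start s = evalState (nullBool $ NT $ reverse s ++ start) g
```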
<div class="paragraph">
<p>And yet, on my laptop:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>Bad
real 0m0.220s
user 0m0.215s
sys 0m0.005s</pre>
</div>
</div>
<div class="paragraph">
<p>Clearly, parsing with derivatives is efficient when run on the allegedly
exponential-running-time example given by Cox.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_the_moral_of_the_story">The moral of the story</h2>
<div class="sectionbody">
<div class="paragraph">
<p>It’s best to test drive an algorithm before condemning it.
If we see hilariously bad running times, then we can include them to hammer our
points home. If we see surprisingly good running times, then there’s a mistake
in our reasoning and we should keep quiet until we successfully attack the
algorithm from another angle. (Cox rightly notes parsing with derivatives
forgoes two key properties of yacc: linear running time and ambiguity
detection. If only he had focused on these trade-offs.)</p>
</div>
<div class="paragraph">
<p>Is this practicable for parsing with derivatives? Well, we have presented an
entire program, yet we have written less code than appears in
<a href="https://swtch.com/~rsc/regexp/regexp1.html">Cox’s excellent article on regular
expressions</a>, which quotes just a few choice cuts from a presumably complete
program. Indeed, with a splash of HTML, we can easily
<a href="https://crypto.stanford.edu/~blynn/haskell/pwd.html">build an
interactive online demo of parsing with derivatives</a>.</p>
</div>
<div class="paragraph">
<p>The existence of the flawed post indicates no such sanity check was done.
Perhaps this was due to a poor understanding of lazy evaluation, or because
implementing a lazy algorithm was deemed too troublesome. Both problems are
solved by learning a lazy language.</p>
</div>
<div class="paragraph">
<p>In sum, insufficient experience with lazy evaluation leads to faulty time
complexity analysis. Therefore we should all be comfortable with lazy languages
so computer science can progress unimpeded.</p>
</div>
</div>
</div>
Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com2tag:blogger.com,1999:blog-4222267598459829544.post-62561076287115412542018-06-04T11:53:00.001-07:002018-06-24T14:40:40.397-07:00Regex Derivatives<script type="text/x-mathjax-config">
MathJax.Hub.Config({
messageStyle: "none",
tex2jax: {
inlineMath: [["\\(", "\\)"]],
displayMath: [["\\[", "\\]"]],
ignoreClass: "nostem|nolatexmath"
},
asciimath2jax: {
delimiters: [["\\$", "\\$"]],
ignoreClass: "nostem|noasciimath"
},
TeX: { equationNumbers: { autoNumber: "none" } }
});
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.6.0/MathJax.js?config=TeX-MML-AM_HTMLorMML"></script>
<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p>Like many of my generation, I was taught to use
<a href="https://en.wikipedia.org/wiki/Thompson%27s_construction">Thompson’s
construction</a> to convert a regular expression to a
deterministic finite automaton (DFA).
Namely, we draw tiny graphs for each component of a given
regular expression, and stitch them together to form a nondeterministic
finite automaton (NFA), which we then convert to a DFA.</p>
</div>
<div class="paragraph">
<p>The ideas are interesting. Sadly, there is no other reason to study them,
because there’s a simpler approach that:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Constructs a DFA directly from a regular expression. Forget NFAs.</p>
</li>
<li>
<p>Supports richer regular expressions. Behold, logical
AND and NOT: <code>[a-z]+&!(do|for|if|while)</code></p>
</li>
<li>
<p>Immediately obtains smaller and often minimal DFAs in realistic
applications.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>All this is expertly explained in
<a href="http://www.ccs.neu.edu/home/turon/re-deriv.pdf"><em>Regular-expression derivatives
reexamined</em> by Owens, Reppy, and Turon</a>. To my chagrin, I only stumbled
across it recently, almost a decade after its publication. And after I had
already written a regex tool.</p>
</div>
<div class="paragraph">
<p>But it could be worse: the authors note the superior method was published over
50 years ago by Brzozowski, before being "lost in the sands of time".</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_derive_to_succeed">Derive to succeed</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Take
<a href="https://en.wikipedia.org/wiki/Regular_expression#Formal_definition">"standard"
regular expressions</a>. We have the constants:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>\$\emptyset\$: accepts nothing; the empty language.</p>
</li>
<li>
<p>\$\epsilon\$: accepts the empty string.</p>
</li>
<li>
<p>\$c\$: accepts the character \$c\$.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>and regexes built from other regexes \$r\$ and \$s\$:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>\$rs\$: the language built from all pairwise concatenations of strings in \$r\$ and strings in \$s\$.</p>
</li>
<li>
<p>\(r\mid s\): logical or (alternation); the union of the two languages.</p>
</li>
<li>
<p>\$r\mbox{*}\$: Kleene closure; zero or more strings of \$r\$ concatenated together.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Then solve two problems:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Determine if a regex accepts the empty string.</p>
</li>
<li>
<p>For a character \$c\$ and a regex \$f\$, find a regex that accepts a
string \$s\$ precisely when \$f\$ accepts \$c\$ followed by \$s\$.
For example, feeding <code>a</code> to <code>ab*c|d*e*f|g*ah</code> results in the regex
<code>b*c|h</code>.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>The first problem is little more than a reading comprehension quiz. Going down
the list, we see the answers are: no; yes; no; exactly when \$r\$ and \$s\$ do;
exactly when \$r\$ or \$s\$ do; yes.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">import Data.List
data Re = Nul | Eps | Lit Char | Kleene Re | Re :. Re | Alt [Re]
deriving (Eq, Ord)
nullable :: Re -> Bool
nullable re = case re of
Nul -> False
Eps -> True
Lit _ -> False
r :. s -> nullable r && nullable s
Alt rs -> any nullable rs
Kleene _ -> True</code></pre>
</div>
</div>
<div class="paragraph">
<p>In the second problem, the base cases remain easy: return \$\emptyset\$
except for the constant \$c\$, in which case return \$\epsilon\$.</p>
</div>
<div class="paragraph">
<p>The recursive cases are tougher. Given \(r\mid s\), solve the problem on
both alternatives to get \$r'\$ and \$s'\$ then return \(r'\mid s'\). For
\(r\mbox{*}\), return \(r'r\mbox{*}\).</p>
</div>
<div class="paragraph">
<p>The trickiest is concatenation: \$rs\$. First, determine if \$r\$ accepts the
empty string (the problem we just solved). If so, return
\(r's\mid s'\). If not, return \(r's\).</p>
</div>
<div class="paragraph">
<p>The answer to the second problem is the <em>derivative</em> of the regex \$f\$
with respect to the character \$c\$, and denoted \$\partial_c f\$.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">derive :: Char -> Re -> Re
derive c f = case f of
Nul -> Nul
Eps -> Nul
Lit a | a == c -> Eps
| otherwise -> Nul
r :. s | nullable r -> mkAlt [dc r :. s, dc s]
| otherwise -> dc r :. s
Alt rs -> mkAlt $ dc <$> rs
Kleene r -> dc r :. f
where dc = derive c</code></pre>
</div>
</div>
<div class="paragraph">
<p>For now, pretend <code>mkAlt = Alt</code>. We shall soon reveal its true definition,
why we need it, and why we represent an alternation with a list.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_the_regex_is_the_state">The regex is the state</h2>
<div class="sectionbody">
<div class="paragraph">
<p>We can now directly construct a DFA for any regex \$r\$.</p>
</div>
<div class="paragraph">
<p>Each state of our DFA corresponds to a regex. The start state is the input
regex \$r\$. For each character \$c\$, create the state \$\partial_c r\$ if
it doesn’t already exist, then draw an arrow labeled \$c\$ from \$r\$ to
\$\partial_c r\$.</p>
</div>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzGwwO5HcDLazTZRKNkj8pYPuiyFTdVk00VXt6IRjs28HU_Tq7zrsbV0Hh7mzPQmz4LYi8uSlgVDl2l4vJWLkslryTVDeAYntI6XOJYCSB7HO5oTF-grztqxFNjknfq5gDcEfWOuCHcIM/s1600/a.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzGwwO5HcDLazTZRKNkj8pYPuiyFTdVk00VXt6IRjs28HU_Tq7zrsbV0Hh7mzPQmz4LYi8uSlgVDl2l4vJWLkslryTVDeAYntI6XOJYCSB7HO5oTF-grztqxFNjknfq5gDcEfWOuCHcIM/s1600/a.png" data-original-width="219" data-original-height="56" /></a></div>
<div class="paragraph">
<p>Repeat on all newly created states. The accepting states are those which accept
the empty string. Done!</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">mkDfa :: Re -> ([Re], Re, [Re], [((Re, Re), Char)])
mkDfa r = (states, r, filter nullable states, edges) where
(states, edges) = explore ([r], []) r
explore gr q = foldl' (goto q) gr ['a'..'z']
goto q (qs, es) c | qc `elem` qs = (qs, es1)
| otherwise = explore (qc:qs, es1) qc
where qc = derive c q
es1 = ((q, qc), c):es</code></pre>
</div>
</div>
<div class="paragraph">
<p>So long as we’re mindful that the logical or operation is idempotent,
commutative, and associative, that is,
\(r\mid r = r\), \(r\mid s = s\mid r\), and
\((r\mid s)\mid t = r\mid (s\mid t)\),
the above is guaranteed to terminate.</p>
</div>
<div class="paragraph">
<p>This makes sense intuitively, because taking a derivative usually yields a
simpler regex. The glaring exception is the Kleene star, but on further
inspection, we ought to repeat ourselves eventually after taking enough
derivatives so long as we can cope with the proliferating logical ors.</p>
</div>
<div class="paragraph">
<p>We handle idempotence with <code>nub</code>, commutativity with <code>sort</code>, and
associativity by flattening lists:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-haskell" data-lang="haskell">mkAlt :: [Re] -> Re
mkAlt rs | [r] <- rs' = r
| otherwise = Alt rs'
where rs' = nub $ sort $ concatMap flatAlt rs
flatAlt (Alt as) = as
flatAlt a = [a]</code></pre>
</div>
</div>
<div class="paragraph">
<p>This ties off the loose ends mentioned above, and completes our regex compiler.
Not bad for 30 lines or so!</p>
</div>
<div class="paragraph">
<p>In practice, we apply more algebraic identities before comparing
regexes to produce smaller DFAs, which empirically are often optimal.
(Ideally, we’d like to tell if two given regexes describe the same language so
we could always generate the minimal DFA, but this is too costly.)</p>
</div>
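<div class="paragraph">
<p>For example (a hypothetical sketch; the code above only canonicalizes
alternation via <code>mkAlt</code>), smart constructors for concatenation and
the Kleene star:</p>
</div>

```haskell
-- Mirrors the Re type defined earlier.
data Re = Nul | Eps | Lit Char | Kleene Re | Re :. Re | Alt [Re]
  deriving (Eq, Ord, Show)

-- Concatenation identities: ∅r = r∅ = ∅ and εr = rε = r.
mkCat :: Re -> Re -> Re
mkCat Nul _ = Nul
mkCat _ Nul = Nul
mkCat Eps r = r
mkCat r Eps = r
mkCat r s = r :. s

-- Kleene star identities: ∅* = ε* = ε and (r*)* = r*.
mkKleene :: Re -> Re
mkKleene Nul = Eps
mkKleene Eps = Eps
mkKleene r@(Kleene _) = r
mkKleene r = Kleene r
```

<div class="paragraph">
<p>With <code>mkCat</code> used in <code>derive</code>, the derivative of
<code>a*</code> with respect to <code>b</code> collapses to \$\emptyset\$
instead of lingering as a distinct DFA state.</p>
</div>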
</div>
</div>
<div class="sect1">
<h2 id="_extending_regexes">Extending regexes</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Adding new features to the regex language is easy with derivatives. Given
an operation, we only need to:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Determine if it accepts the empty string.</p>
</li>
<li>
<p>Figure out the rules for its derivative.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>(We should prove the algorithm still terminates, but we’ll just eyeball it and
wave our hands.)</p>
</div>
<div class="paragraph">
<p>For example, we get the familiar \(r\mbox{+}\) by rejecting the empty string
and defining its derivative to be \(r' r\mbox{*}\). We obtain \$r?\$ by
accepting the empty string and defining its derivative to be \$r'\$. But let’s
do something more fun.</p>
</div>
<div class="paragraph">
<p>The <em>logical and</em> \$r&s\$ of regexes \$r\$ and \$s\$ accepts if and only if
both \$r\$ and \$s\$ match. Then \$r&s\$ accepts the empty string exactly when
both \$r\$ and \$s\$ do (similar to concatenation), and the derivative of
\$r&s\$ is \$r'&s'\$.</p>
</div>
<div class="paragraph">
<p>The <em>complement</em> \$!r\$ of a regex \$r\$ accepts if and only if \$r\$
rejects. Then \$!r\$ accepts the empty string if and only if \$r\$ rejects it,
and the derivative of \$!r\$ is \$!r'\$.</p>
</div>
<div class="paragraph">
<p>For example, if we write <code>()</code> for \$\epsilon\$ then <code>!()&[a-z]*</code> is the same as
<code>[a-z]+</code>.</p>
</div>
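<div class="paragraph">
<p>Concretely, here is a self-contained sketch restating the earlier
definitions with both new operations. Canonicalization via <code>mkAlt</code>
is omitted: iterated derivatives decide membership without it; it only matters
for termination of DFA construction.</p>
</div>

```haskell
-- Re extended with logical AND and complement.
data Re = Nul | Eps | Lit Char | Kleene Re | Re :. Re | Alt [Re]
        | And Re Re | Not Re

nullable :: Re -> Bool
nullable re = case re of
  Nul -> False
  Eps -> True
  Lit _ -> False
  r :. s -> nullable r && nullable s
  Alt rs -> any nullable rs
  Kleene _ -> True
  And r s -> nullable r && nullable s  -- both must accept ""
  Not r -> not (nullable r)            -- accepts "" iff r rejects it

derive :: Char -> Re -> Re
derive c f = case f of
  Nul -> Nul
  Eps -> Nul
  Lit a | a == c -> Eps
        | otherwise -> Nul
  r :. s | nullable r -> Alt [dc r :. s, dc s]
         | otherwise -> dc r :. s
  Alt rs -> Alt (dc <$> rs)
  Kleene r -> dc r :. f
  And r s -> And (dc r) (dc s)
  Not r -> Not (dc r)
  where dc = derive c

-- A string matches iff the iterated derivative is nullable.
matches :: Re -> String -> Bool
matches re = nullable . foldl (flip derive) re

-- Example: one or more of a|b, but not exactly the string "ab".
demo :: Re
demo = And (ab :. Kleene ab) (Not (Lit 'a' :. Lit 'b'))
  where ab = Alt [Lit 'a', Lit 'b']
```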
<div class="paragraph">
<p>As before, we can plug these operations into our DFA-maker right away. Good
luck doing this with NFAs! Well, I think it’s possible if we add weird rules,
e.g. "if we can reach state A <em>and</em> state B, then we can magically reach state
C", but then they’d no longer be true NFAs.</p>
</div>
<div class="paragraph">
<p>The unfortunate, undeserved, and hopefully soon-to-be unlamented prominence of
the NFA approach is why these useful operations are considered exotic.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_regularly_express_yourself">Regularly express yourself</h2>
<div class="sectionbody">
<div class="ulist">
<ul>
<li>
<p>Read <a href="http://www.ccs.neu.edu/home/turon/re-deriv.pdf"><em>Regular-expression
derivatives reexamined</em> by Owens, Reppy, and Turon</a>.</p>
</li>
<li>
<p>If you’re teaching regexes, teach derivatives.</p>
</li>
<li>
<p>If someone is teaching you to convert regexes to DFAs via NFAs only,
ask why they aren’t teaching derivatives as well.</p>
</li>
<li>
<p>If you’re implementing regexes, use derivatives.</p>
</li>
<li>
<p><a href="https://crypto.stanford.edu/~blynn/haskell/re.html">See my online regex-to-DFA demo</a>!</p>
</li>
</ul>
</div>
</div>
</div>
Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-84013375847368333492017-06-10T17:13:00.000-07:002017-06-10T17:13:20.994-07:00Solving the JavaScript Problem<div class="paragraph"><p><a href="https://wiki.haskell.org/The_JavaScript_Problem">The JavaScript Problem</a>
is a good problem to have.
Against the odds, “write once, run anywhere” is a reality for web browsers
because of a language governed by a standards organization.
Not so long ago, proprietary technologies such as Flash and Silverlight
threatened the openness of the web.</p></div>
<div class="paragraph"><p>So we should be grateful the JavaScript problem is merely a technical one, namely
that JavaScript is poorly designed. Though the language has improved over the
years, its numerous flaws are too deeply entrenched to remove.
<a href="https://www.slant.co/topics/1515/~solutions-to-the-javascript-problem">Transpilers
help by unlocking access to better languages</a>, but JavaScript was never
intended to be an assembly language. It’s only thanks to heroic efforts of many
talented engineers that JavaScript has gone so far.</p></div>
<div class="paragraph"><p>Ideally, the lingua franca of the web should be low-level, clean, and simple,
so we can develop in any language with little overhead.</p></div>
<div class="paragraph"><p><strong>WebAssembly</strong></p></div>
<div class="paragraph"><p><a href="http://caniuse.com/#feat=wasm">Recent versions of several popular browsers
have fulfilled our wishes with WebAssembly</a>, also known as wasm.
WebAssembly is an open standard, and well-designed. At last, works such as
<a href="https://en.wikipedia.org/wiki/Types_and_Programming_Languages">Benjamin Pierce’s
“Types and Programming Languages”</a> are mainstream enough that
<a href="https://github.com/WebAssembly/spec/blob/master/papers/pldi2017.pdf">WebAssembly
has formally specified reduction and typing rules, and even a proof of
soundness</a>. In contrast, weakly typed languages such as JavaScript
ignore a century or so of mathematical progress.</p></div>
<div class="paragraph"><p>In WebAssembly, nondeterministic behaviour can only arise from exhaustion,
external host functions, and the IEEE-754 floating-point standard, which fails
to specify the NaN bit pattern for all cases.
Recall in C and the many languages built upon it such as Go and Haskell,
<a href="http://blog.robertelder.org/signed-or-unsigned/">signed integer overflow causes
undefined behaviour</a>. WebAssembly fixes this by stipulating two’s complement
for negative numbers,
<a href="https://en.wikipedia.org/wiki/Signed_number_representations">as competing
representations of negative numbers are ultimately responsible for this defect
of C</a>. Endianness is similarly settled, though curiously by travelling the road
not taken by network byte order: numbers in WebAssembly are encoded in
little-endian.</p></div>
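<div class="paragraph"><p>(An aside, with my own illustrative code: integers in
the WebAssembly binary format use the LEB128 variable-length encoding, which
likewise emits the least significant bits first.)</p></div>

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Word (Word32, Word8)

-- Unsigned LEB128: emit 7 bits per byte, least significant group first,
-- setting the high bit on every byte except the last.
uleb128 :: Word32 -> [Word8]
uleb128 n
  | n < 0x80 = [fromIntegral n]
  | otherwise = (fromIntegral (n .&. 0x7f) .|. 0x80) : uleb128 (n `shiftR` 7)
```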
<div class="paragraph"><p>The WebAssembly virtual machine is stack-based. Years ago, I read that
<a href="http://flint.cs.yale.edu/jvmsem/doc/inferno.ps">register-based virtual machines
are faster</a>, but perhaps these results are now obsolete. Browsing briefly, I
found newer papers:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="https://arxiv.org/abs/1611.00467">A Performance Survey on Stack-based and
Register-based Virtual Machines</a>
</p>
</li>
<li>
<p>
<a href="https://www.usenix.org/legacy/events/vee05/full_papers/p153-yunhe.pdf">Virtual
Machine Showdown: Stack Versus Registers</a>
</p>
</li>
</ul></div>
<div class="paragraph"><p>It’s the same old story. Register-based virtual machines are still faster after
all. It seems WebAssembly prioritizes code size, and trusts browsers will ship
with good JIT compilers.</p></div>
<div class="paragraph"><p><strong>Toy Compilers</strong></p></div>
<div class="paragraph"><p>Online demonstrations of WebAssembly compilers are fun to build, and fun to
describe: I compiled a Haskell program to JavaScript that when executed, reads
a Haskell-like program and compiles it to WebAssembly, which some JavaScript
loads and executes.</p></div>
<div class="paragraph"><p>Perhaps it’s easier to invite the reader to:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://crypto.stanford.edu/~blynn/lambda/wasm.html">Type an expression and
click a button to compile it to WebAssembly and run it</a>.
</p>
</li>
<li>
<p>
<a href="http://crypto.stanford.edu/~blynn/lambda/sk.html">Try a WebAssembly compiler
for a Turing-complete language (lazily evaluated lambda calculus)</a>.
</p>
</li>
<li>
<p>
<a href="http://crypto.stanford.edu/~blynn/lambda/crazyl.html">Play with a
family of compilers for Lazy K and similar languages</a>.
</p>
</li>
</ul></div>
<div class="paragraph"><p>For the last 2 programs, I transformed the source to a tree of S and K
combinators, so apart from graph reduction, I only had to code 2 combinators in
assembly. The resulting binaries are excruciatingly slow, especially since
numbers are Church-encoded, but it all seems to work.</p></div>
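<div class="paragraph"><p>The two combinators amount to the rewrite rules
K x y → x and S x y z → x z (y z); a sketch of the reduction in Haskell (my
illustration, not the actual compiler output):</p></div>

```haskell
-- Combinator terms: S, K, and application.
data Term = S | K | Term :@ Term deriving (Eq, Show)
infixl 9 :@

-- One leftmost-outermost reduction step, if any applies.
step :: Term -> Maybe Term
step (K :@ x :@ _) = Just x
step (S :@ x :@ y :@ z) = Just (x :@ z :@ (y :@ z))
step (f :@ x) = (:@ x) <$> step f
step _ = Nothing

-- Repeatedly reduce the redex at the head of the spine.
eval :: Term -> Term
eval t = maybe t eval (step t)
```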
<div class="paragraph"><p>I look forward to a Haskell compiler that produces efficient WebAssembly,
though it may have to wait until
<a href="http://webassembly.org/docs/future-features/">WebAssembly gains a few more
features, such as threads and tail calls</a>.</p></div>
Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com1tag:blogger.com,1999:blog-4222267598459829544.post-85197040837191189752017-04-02T12:10:00.000-07:002017-04-02T13:04:25.617-07:00Lambda Calculus Surprises
<div id="preamble">
<div class="sectionbody">
<div class="paragraph"><p>Much time has passed since my last entry. I’ve been frantically filling gaps in
my education, so I’ve had little to say here. My notes are better off on
<a href="http://crypto.stanford.edu/~blynn/">my homepage</a>, where I can better organize them, and incorporate interactive demos.</p></div>
<div class="paragraph"><p>However, I want to draw attention to delightful surprises that seem unfairly
obscure.</p></div>
<div class="paragraph"><p><strong>1. Succinct Turing-complete self-interpreters</strong></p></div>
<div class="paragraph"><p><a href="http://www-formal.stanford.edu/jmc/recursive.html">John McCarthy’s classic
paper</a> showed how to write a Lisp interpreter in Lisp itself. By adding a
handful of primitives (quote, atom, eq, car, cdr, cons,
cond) to lambda calculus, we get a Turing-complete language where a
self-interpreter is easy to write and understand. For contrast, see
<a href="https://www.cs.virginia.edu/~robins/Turing_Paper_1936.pdf">Turing’s
universal machine of 1936</a>.</p></div>
<div class="paragraph"><p>Researchers have learned more about lambda calculus since 1960, but many
resources seem stuck in the past.
<a href="http://matt.might.net/articles/implementing-a-programming-language/">Writing a
Turing-complete interpreter in 7 lines</a> is ostensibly still a big deal.
<a href="http://www.paulgraham.com/rootsoflisp.html"><em>The Roots of Lisp</em> by Paul Graham</a>
praises McCarthy’s self-interpreter but explores no further.
<a href="https://www.cs.auckland.ac.nz/~chaitin/lm.html"><em>The Limits of Mathematics</em>
by Gregory Chaitin</a> chooses Lisp over plain lambda calculus for dubious reasons.
Perhaps <a href="http://homepages.inf.ed.ac.uk/wadler/topics/history.html">McCarthy’s
work is so life-changing</a> that some find it hard to notice new advances.</p></div>
<div class="paragraph"><p><a href="https://tromp.github.io/cl/cl.html">John Tromp’s fascinating website</a> is a
refreshing exception. It led me to
<a href="http://repository.readscheme.org/ftp/papers/topps/D-128.pdf">a paper by
Mogensen, who found a one-line self-interpreter without adding anything to
lambda calculus</a>:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>(f.(x.f(xx))(x.f(xx)))(em.m(x.x)(mn.em(en))(mv.e(mv)))</pre>
</div></div>
<div class="paragraph"><p>(I’ve suppressed the lambdas. Exercise: write a regex substitution that
restores them.)</p></div>
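<div class="paragraph"><p>(Spoiler: one possible answer, written as a small
Haskell function rather than a literal regex substitution. Each maximal run of
variables before a dot becomes a chain of lambda binders.)</p></div>

```haskell
import Data.Char (isAlpha)

-- Rewrite each maximal run of variables preceding a '.' as a chain of
-- lambda binders, e.g. "em." becomes "λe.λm.".
restore :: String -> String
restore [] = []
restore cs = case span isAlpha cs of
  (vs@(_:_), '.':rest) -> concatMap (\v -> ['λ', v, '.']) vs ++ restore rest
  _ -> head cs : restore (tail cs)
```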
<div class="paragraph"><p>In fact, under some definitions, the program
“λq.q(λx.x)” is a self-interpreter.</p></div>
<div class="paragraph"><p><strong>2. Hindley-Milner sort</strong></p></div>
<div class="paragraph"><p><a href="https://en.wikipedia.org/wiki/Types_and_Programming_Languages"><em>Types
and Programming Languages</em> (TaPL)</a> by Benjamin C. Pierce is a gripping
action thriller. Types are the heroes, and we follow their epic struggle
against the most ancient and powerful foes of computer science and mathematics.</p></div>
<div class="paragraph"><p>When we first meet them, types are humble guardians of a barebones language
that can only express the simplest of computations involving booleans and
natural numbers. As the story progresses, types gain additional abilities,
enabling them to protect more powerful languages.</p></div>
<div class="paragraph"><p>However, there seems to be a plot hole when types level up from Hindley-Milner
to System F. As a “nice demonstration of the expressive power of pure System
F”, the book mentions a program that can sort lists.</p></div>
<div class="paragraph"><p>The details are left as an exercise to the reader. Working through them, we
realize a Hindley-Milner type system is already powerful enough to sort lists.
Moreover, the details are far more pleasant in Hindley-Milner because we avoid
the ubiquitous type spam of System F.</p></div>
<div class="paragraph"><p>System F is indeed more powerful than Hindley-Milner and deserves admiration,
but because of well-typed self-application and polymorphic identity
functions, existential types, and other gems; not because lists can be sorted.</p></div>
<div class="paragraph"><p><strong>3. Self-interpreters for total languages</strong></p></div>
<div class="paragraph"><p>They said it couldn’t be done.</p></div>
<div class="paragraph"><p>According to <a href="http://compilers.cs.ucla.edu/popl16/popl16-full.pdf"><em>Breaking
Through the Normalization Barrier: A Self-Interpreter for F-omega</em> by Matt
Brown and Jens Palsberg</a>, “several books, papers, and web pages” assert
self-interpreters for a strongly normalizing lambda calculus are
impossible. The paper then shows that reports of their non-existence have
been greatly exaggerated.</p></div>
<div class="paragraph"><p>Indeed, famed researcher
<a href="https://en.wikipedia.org/wiki/Robert_Harper_(computer_scientist)">Robert
Harper</a> writes on
<a href="https://existentialtype.wordpress.com/2014/03/20/old-neglected-theorems-are-still-theorems/">his
blog</a> that “one limitation of total programming languages is that they are not
universal: you cannot write an interpreter for T within T (see Chapter 9 of
PFPL for a proof).”, and as of now (April 2017),
<a href="https://en.wikipedia.org/wiki/Normalization_property_(abstract_rewriting)">the
Wikipedia article they cite</a> still declares “it is impossible to define a
self-interpreter in any of the calculi cited above”, referring to simply typed
lambda calculus, System F, and the calculus of constructions.</p></div>
<div class="paragraph"><p>I was shocked. Surely academics are proficient with diagonalization by now?
Did they all overlook a hole in their proofs?</p></div>
<div class="paragraph"><p>More shocking is the stark simplicity of what Brown and Palsberg call a
<em>shallow</em> self-interpreter for System F and System F<sub>ω</sub>, which is
essentially a typed version of “λq.q(λx.x)”.</p></div>
<div class="paragraph"><p>It relies on a liberal definition of representation (we only require an
injective map from legal terms to normal forms) and self-interpretation
(mapping a representation of a term to its value) which is nonetheless still
strong enough to upend conventional wisdom.</p></div>
<div class="paragraph"><p>Which brings us to the most shocking revelation: there is no official agreement
on the definition of representation or self-interpretation, or even what we
should name these concepts.</p></div>
<div class="paragraph"><p>Does this mean I should be wary of even the latest textbooks? Part of me
hopes not, because I want to avoid learning falsehoods, but another part of me
hopes so, for it means I’ve reached the cutting edge of research.</p></div>
<div class="paragraph"><p><strong>See for yourself!</strong></p></div>
<div class="paragraph"><p>Interactive demos of the above:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://crypto.stanford.edu/~blynn/lambda/">A Turing-complete
interpreter in one line</a>.
</p>
</li>
<li>
<p>
<a href="http://crypto.stanford.edu/~blynn/lambda/hm.html">A Hindley-Milner well-typed
program that sorts lists</a>.
</p>
</li>
<li>
<p>
<a href="http://crypto.stanford.edu/~blynn/lambda/systemf.html">A self-interpreter in System F</a>.
</p>
</li>
</ul></div>
</div>
</div>
Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-3092716202179196012015-11-10T21:41:00.001-08:002015-11-10T21:41:59.936-08:00Neural Networks in Haskell
<div class="paragraph"><p>Long ago, when I first looked into machine learning, neural networks didn’t
stand out from the crowd. They seemed on par with decision trees, genetic
algorithms, genetic programming, and a host of other techniques. I wound up
dabbling in genetic programming because it seemed coolest.</p></div>
<div class="paragraph"><p>Neural networks have since distinguished themselves. Lately, they seem
responsible for each newsworthy machine learning achievement I hear about.
To name a few:</p></div>
<div class="ulist"><ul>
<li>
<p>
<a href="http://arxiv.org/abs/1509.01549">A neural net teaches itself Master-level chess in 72 hours</a>.
</p>
</li>
<li>
<p>
<a href="http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html">A neural net “dreams” surreal images</a>.
</p>
</li>
<li>
<p>
<a href="https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf">A neural net teaches itself to master several Atari games</a>.
</p>
</li>
</ul></div>
<div class="paragraph"><p>Inspired, I began reading <a href="http://neuralnetworksanddeeplearning.com/">Michael
Nielsen’s online book on neural networks</a>. We can whip up a neural network
without straying beyond a Haskell base install, though we do have to implement
<a href="https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform">the Box-Muller
transform</a> ourselves to avoid pulling in a library to sample from a normal
distribution.</p></div>
<div class="paragraph"><p>The following generates a neural network with 3 inputs, a hidden layer of 4
neurons, and 2 output neurons, then feeds it the inputs [0.1, 0.2, 0.3].</p></div>
<div class="listingblock">
<div class="content"><!-- Generator: GNU source-highlight 3.1.6
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
<pre><tt><span style="font-weight: bold"><span style="color: #0000FF">import</span></span> Control<span style="color: #990000">.</span><span style="color: #009900">Monad</span>
<span style="font-weight: bold"><span style="color: #0000FF">import</span></span> Data<span style="color: #990000">.</span><span style="color: #009900">Functor</span>
<span style="font-weight: bold"><span style="color: #0000FF">import</span></span> Data<span style="color: #990000">.</span><span style="color: #009900">List</span>
<span style="font-weight: bold"><span style="color: #0000FF">import</span></span> System<span style="color: #990000">.</span><span style="color: #009900">Random</span>
main <span style="color: #990000">=</span> newBrain <span style="color: #990000">[</span><span style="color: #993399">3</span><span style="color: #990000">,</span> <span style="color: #993399">4</span><span style="color: #990000">,</span> <span style="color: #993399">2</span><span style="color: #990000">]</span> <span style="color: #990000">>>=</span> print <span style="color: #990000">.</span> feed <span style="color: #990000">[</span><span style="color: #993399">0.1</span><span style="color: #990000">,</span> <span style="color: #993399">0.2</span><span style="color: #990000">,</span> <span style="color: #993399">0.3</span><span style="color: #990000">]</span>
newBrain szs<span style="color: #990000">@(</span><span style="font-weight: bold"><span style="color: #0000FF">_</span></span><span style="color: #990000">:</span>ts<span style="color: #990000">)</span> <span style="color: #990000">=</span> zip <span style="color: #990000">(</span>flip replicate <span style="color: #993399">1</span> <span style="color: #990000"><$></span> ts<span style="color: #990000">)</span> <span style="color: #990000"><$></span>
zipWithM <span style="color: #990000">(\</span>m n <span style="color: #990000">-></span> replicateM n <span style="color: #990000">$</span> replicateM m <span style="color: #990000">$</span> gauss <span style="color: #993399">0.01</span><span style="color: #990000">)</span> szs ts
feed <span style="color: #990000">=</span> foldl' <span style="color: #990000">(((</span>max <span style="color: #993399">0</span> <span style="color: #990000"><$>)</span> <span style="color: #990000">.</span> <span style="color: #990000">)</span> <span style="color: #990000">.</span> zLayer<span style="color: #990000">)</span>
zLayer as <span style="color: #990000">(</span>bs<span style="color: #990000">,</span> wvs<span style="color: #990000">)</span> <span style="color: #990000">=</span> zipWith <span style="color: #990000">(+)</span> bs <span style="color: #990000">$</span> sum <span style="color: #990000">.</span> zipWith <span style="color: #990000">(*)</span> as <span style="color: #990000"><$></span> wvs
gauss <span style="color: #990000">::</span> <span style="color: #009900">Float</span> <span style="color: #990000">-></span> <span style="color: #009900">IO</span> <span style="color: #009900">Float</span>
gauss stdev <span style="color: #990000">=</span> <span style="font-weight: bold"><span style="color: #0000FF">do</span></span>
x <span style="color: #990000"><-</span> randomIO
y <span style="color: #990000"><-</span> randomIO
return <span style="color: #990000">$</span> stdev <span style="color: #990000">*</span> sqrt <span style="color: #990000">(-</span><span style="color: #993399">2</span> <span style="color: #990000">*</span> log x<span style="color: #990000">)</span> <span style="color: #990000">*</span> cos <span style="color: #990000">(</span><span style="color: #993399">2</span> <span style="color: #990000">*</span> pi <span style="color: #990000">*</span> y<span style="color: #990000">)</span></tt></pre></div></div>
<div class="paragraph"><p>The tough part is training the network. The sane choice is to use a library
to help with the matrix and vector operations involved in backpropagation
by gradient descent, but where’s the fun in that?</p></div>
<div class="paragraph"><p>It turns out even if we stay within core Haskell, we only need a few more
lines, albeit some hairy ones:</p></div>
<div class="listingblock">
<div class="content"><!-- Generator: GNU source-highlight 3.1.6
by Lorenzo Bettini
http://www.lorenzobettini.it
http://www.gnu.org/software/src-highlite -->
<pre><tt>relu <span style="color: #990000">=</span> max <span style="color: #993399">0</span>
relu' x <span style="color: #990000">|</span> x <span style="color: #990000"><</span> <span style="color: #993399">0</span> <span style="color: #990000">=</span> <span style="color: #993399">0</span>
<span style="color: #990000">|</span> otherwise <span style="color: #990000">=</span> <span style="color: #993399">1</span>
revaz xs <span style="color: #990000">=</span> foldl' <span style="color: #990000">(\(</span>avs<span style="color: #990000">@(</span>av<span style="color: #990000">:</span><span style="font-weight: bold"><span style="color: #0000FF">_</span></span><span style="color: #990000">),</span> zs<span style="color: #990000">)</span> <span style="color: #990000">(</span>bs<span style="color: #990000">,</span> wms<span style="color: #990000">)</span> <span style="color: #990000">-></span> <span style="font-weight: bold"><span style="color: #0000FF">let</span></span>
zs' <span style="color: #990000">=</span> zLayer av <span style="color: #990000">(</span>bs<span style="color: #990000">,</span> wms<span style="color: #990000">)</span> <span style="font-weight: bold"><span style="color: #0000FF">in</span></span> <span style="color: #990000">((</span>relu <span style="color: #990000"><$></span> zs'<span style="color: #990000">):</span>avs<span style="color: #990000">,</span> zs'<span style="color: #990000">:</span>zs<span style="color: #990000">))</span> <span style="color: #990000">([</span>xs<span style="color: #990000">],</span> <span style="color: #990000">[])</span>
dCost a y <span style="color: #990000">|</span> y <span style="color: #990000">==</span> <span style="color: #993399">1</span> <span style="color: #990000">&&</span> a <span style="color: #990000">>=</span> y <span style="color: #990000">=</span> <span style="color: #993399">0</span>
<span style="color: #990000">|</span> otherwise <span style="color: #990000">=</span> a <span style="color: #990000">-</span> y
deltas xv yv layers <span style="color: #990000">=</span> <span style="font-weight: bold"><span style="color: #0000FF">let</span></span>
<span style="color: #990000">(</span>avs<span style="color: #990000">@(</span>av<span style="color: #990000">:</span><span style="font-weight: bold"><span style="color: #0000FF">_</span></span><span style="color: #990000">),</span> zv<span style="color: #990000">:</span>zvs<span style="color: #990000">)</span> <span style="color: #990000">=</span> revaz xv layers
delta0 <span style="color: #990000">=</span> zipWith <span style="color: #990000">(*)</span> <span style="color: #990000">(</span>zipWith dCost av yv<span style="color: #990000">)</span> <span style="color: #990000">(</span>relu' <span style="color: #990000"><$></span> zv<span style="color: #990000">)</span>
<span style="font-weight: bold"><span style="color: #0000FF">in</span></span> <span style="color: #990000">(</span>reverse avs<span style="color: #990000">,</span> f <span style="color: #990000">(</span>transpose <span style="color: #990000">.</span> snd <span style="color: #990000"><$></span> reverse layers<span style="color: #990000">)</span> zvs <span style="color: #990000">[</span>delta0<span style="color: #990000">])</span>
<span style="font-weight: bold"><span style="color: #0000FF">where</span></span>
f <span style="font-weight: bold"><span style="color: #0000FF">_</span></span> <span style="color: #990000">[]</span> dvs <span style="color: #990000">=</span> dvs
f <span style="color: #990000">(</span>wm<span style="color: #990000">:</span>wms<span style="color: #990000">)</span> <span style="color: #990000">(</span>zv<span style="color: #990000">:</span>zvs<span style="color: #990000">)</span> dvs<span style="color: #990000">@(</span>dv<span style="color: #990000">:</span><span style="font-weight: bold"><span style="color: #0000FF">_</span></span><span style="color: #990000">)</span> <span style="color: #990000">=</span> f wms zvs <span style="color: #990000">$</span> <span style="color: #990000">(:</span>dvs<span style="color: #990000">)</span> <span style="color: #990000">$</span>
zipWith <span style="color: #990000">(*)</span> <span style="color: #990000">[</span>sum <span style="color: #990000">$</span> zipWith <span style="color: #990000">(*)</span> row dv <span style="color: #990000">|</span> row <span style="color: #990000"><-</span> wm<span style="color: #990000">]</span> <span style="color: #990000">(</span>relu' <span style="color: #990000"><$></span> zv<span style="color: #990000">)</span>
descend av dv <span style="color: #990000">=</span> zipWith <span style="color: #990000">(-)</span> av <span style="color: #990000">((</span><span style="color: #993399">0.002</span> <span style="color: #990000">*)</span> <span style="color: #990000"><$></span> dv<span style="color: #990000">)</span>
learn xv yv layers <span style="color: #990000">=</span> <span style="font-weight: bold"><span style="color: #0000FF">let</span></span> <span style="color: #990000">(</span>avs<span style="color: #990000">,</span> dvs<span style="color: #990000">)</span> <span style="color: #990000">=</span> deltas xv yv layers
<span style="font-weight: bold"><span style="color: #0000FF">in</span></span> zip <span style="color: #990000">(</span>zipWith descend <span style="color: #990000">(</span>fst <span style="color: #990000"><$></span> layers<span style="color: #990000">)</span> dvs<span style="color: #990000">)</span> <span style="color: #990000">$</span>
zipWith3 <span style="color: #990000">(\</span>wvs av dv <span style="color: #990000">-></span> zipWith <span style="color: #990000">(\</span>wv d <span style="color: #990000">-></span> descend wv <span style="color: #990000">((</span>d<span style="color: #990000">*)</span> <span style="color: #990000"><$></span> av<span style="color: #990000">))</span>
wvs dv<span style="color: #990000">)</span> <span style="color: #990000">(</span>snd <span style="color: #990000"><$></span> layers<span style="color: #990000">)</span> avs dvs</tt></pre></div></div>
<div class="paragraph"><p>See
<a href="https://cs.stanford.edu/~blynn/haskell/brain.html">my Haskell notes</a> for
details. In short: ReLU activation function; online learning with a rate of
0.002; an ad hoc cost function that felt right at the time.</p></div>
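<div class="paragraph"><p>Spelled out, <tt>deltas</tt> and <tt>learn</tt> implement the usual
backpropagation recurrence, writing ⊙ for the elementwise product and
η = 0.002 for the learning rate:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>δ[L] = dCost(a[L], y) ⊙ relu'(z[L])      -- output layer (delta0)
δ[l] = (W[l+1]ᵀ δ[l+1]) ⊙ relu'(z[l])    -- hidden layers (f)
b[l] ← b[l] − η δ[l]                      -- descend on biases
W[l] ← W[l] − η δ[l] a[l−1]ᵀ              -- descend on weights</pre>
</div></div>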
<div class="paragraph"><p>Despite cutting many corners, after a few runs, I obtained a neural network
that correctly classifies 9202 of 10000 handwritten digits in the
<a href="http://yann.lecun.com/exdb/mnist/">MNIST test set</a> in just one pass over the
training set.</p></div>
<div class="paragraph"><p>I found this result surprisingly good. Yet there is much more to explore: top
on my must-see list are deep learning (also described in Nielsen’s book)
and <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">long short-term
memory</a>.</p></div>
<div class="paragraph"><p>I turned the neural net into an
<a href="https://cs.stanford.edu/~blynn/mnist/"><strong>online digit recognition demo</strong></a>: you can
draw on the canvas and see how it affects the outputs.</p></div>
Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-59951747273467247552015-02-23T00:35:00.000-08:002015-02-23T00:35:55.280-08:00Mighty Warp<div id="preamble">
<div class="sectionbody">
<div class="paragraph"><p><a href="http://www.aosabook.org/en/posa/warp.html">Wow. There exists a Haskell web server
with the following properties</a>:</p></div>
<div class="ulist"><ul>
<li>
<p>
Outperforms <a href="http://www.nginx.org/">nginx</a>.
</p>
</li>
<li>
<p>
Under 1300 lines of source.
</p>
</li>
<li>
<p>
Clear control flow: handles one request per thread using blocking calls.
</p>
</li>
<li>
<p>
Slowloris DoS protection.
</p>
</li>
</ul></div>
<div class="paragraph"><p>The secret is GHC’s runtime system (RTS). Every Haskell program must spend
time in the RTS, and maybe this does hurt performance in certain cases, but for
web servers it is a huge win: the RTS automatically transforms code that seems
to handle one request per thread into a server with multiple event-driven
processes. This saves many a context switch while keeping the source simple.</p></div>
<div class="paragraph"><p>Best of all, this magic technology is widely available. To start a webserver
on port 3000 using the Warp library, run these commands on Ubuntu
[<a href="http://stackoverflow.com/questions/22620294/minimal-warp-webserver-example">original inspiration</a>]:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>sudo apt-get install cabal-install
cabal update
cabal install warp
cat > server.hs << EOF
#!/usr/bin/env runghc
{-# LANGUAGE OverloadedStrings #-}
import Network.Wai (responseLBS)
import Network.Wai.Handler.Warp (run)
import Network.HTTP.Types (status200)
import Network.HTTP.Types.Header (hContentType)
main = run 3000 $ \_ f -> f $ responseLBS
status200 [(hContentType, "text/plain")] "Hello, world!\n"
EOF
chmod +x server.hs
./server.hs</pre>
</div></div>
<div class="paragraph"><p>Eliminating context switches is the best part of the story, but there’s more.
Copying data can be avoided with a simple but clever trick the authors call
<em>splicing</em>. Using <em>conduits</em> instead of lazy I/O solves the non-deterministic
resource finalization problem. And a few judiciously placed lockless atomic
operations can work wonders: in particular, for basic Slowloris protection and
for a robust file descriptor cache.</p></div>
<div class="paragraph"><p><strong>Uramaki</strong></p></div>
<div class="paragraph"><p>Those who fear straying too far from a C-like language can still reap the
benefits in <a href="http://www.golang.org/">Go</a>:</p></div>
<div class="ulist"><ul>
<li>
<p>
Goroutines are like green threads.
</p>
</li>
<li>
<p>
Channels are like conduits.
</p>
</li>
<li>
<p>
Array slices are like splices.
</p>
</li>
</ul></div>
<div class="paragraph"><p>If I didn’t know better, I would say the designers of Go emulated the
inventor of the <a href="http://en.wikipedia.org/wiki/California_roll">California roll</a>:
they took some of the best features of languages like Haskell and made them
palatable to a wider audience.</p></div>
<div class="paragraph"><p>I wonder how Go’s RTS compares. One innate advantage GHC may have is Haskell’s
type system, which leads to largely non-destructive computation, which
ultimately enables a cheap and effective scheduling scheme (namely,
<a href="http://www.aosabook.org/en/ghc.html">context switching on memory
allocation</a>). Still, I expect a well-written Go web server could achieve
similar results.</p></div>
</div>
</div>
Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-29977196339572333122014-12-20T03:54:00.000-08:002014-12-24T15:58:53.092-08:00Haskell for programming contests<a name="preamble"></a> <p><a href="http://www.drdobbs.com/architecture-and-design/farewell-dr-dobbs/240169421">Farewell, Dr. Dobb’s</a>. In a way, <a href="http://benlynn.blogspot.com.au/2014/08/let-code.html">my previous post</a> proved prescient: in the old days, I relied on printed magazines like Dr. Dobb’s Journal to learn about computers. Now, most things are but a few search terms away. Coding is easier than ever.</p> <p>Though I have a soft spot for this particular magazine, ultimately I’m glad information has become more organized and accessible. I like to think I played a part in this, however small, by posting my own tutorials, articles, rants, and code.</p> <p>Here’s hoping the remainder of my previous post also ages well. That is, may Haskell live long and prosper. [<a href="http://www.drdobbs.com/architecture-and-design/in-praise-of-haskell/240163246">A sentiment echoed by Dr. Dobb’s</a>.] Again, I’d like to play a small part in this: <b><a href="http://crypto.stanford.edu/~blynn/haskell/">Haskell for programming contests</a></b>.</p>Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-72056368162796628862014-08-12T23:15:00.000-07:002014-12-24T15:59:08.796-08:00Let's Code!<a name="preamble"></a> <p>A recent <a href="http://www.drdobbs.com/tools/just-let-me-code/240168735">article in Dr. Dobb’s Journal</a> bemoans the complexity of today’s development toolchains: “it’s hard to get any real programming done”. 
However, my own experience suggests the opposite: I find programming is now easier than ever, partly due to better tools.</p> <p>I say “partly” because when I was a kid, it was difficult to obtain code, compilers, and documentation, let alone luxuries like an SCM. I scoured public libraries for books on programming and checked out what they had, which meant I studied languages which I could never use because I lacked the right compiler, or even the right computer. I nagged my parents to buy me expensive books, and occasionally they’d succumb. Perhaps the most cost-efficient were magazines containing program listings which of course had to be keyed in by hand. (One of my most treasured was an issue of Dr. Dobb’s Journal, back when it was in print, and only in print.)</p> <p>Nowadays, a kid can get free high-quality compilers, code, tutorials, and more at the click of a button. But I believe even without this freer flow of information, programming would still be easier than ever because our tools have improved greatly.</p> <hr> <h2><a name="_got_git"></a>got git?</h2> <p>The author singles out Git as a source of trouble, but the reasoning is suspect. For example, we’re told that with respect to other “SCMs you’ve used…Git almost certainly does those same actions differently.”</p> <p>This suggests that the author used other SCMs, then tried Git and found it confusing. In contrast, I used Git, then tried other SCMs and found them confusing. I predict as time passes, more and more developers will learn Git first, and their opinions of SCMs will mirror mine.</p> <p>Nevertheless, I’m leery of ranking the friendliness of tools by the order you picked them up. I hereby propose a different yardstick. Take Git, and a traditional SCM. Implement, or at least think about implementing, a clone of each from scratch; just enough so it is self-hosting. 
Then the one that takes less time to implement is simpler.</p> <p>I wrote <a href="https://crypto.stanford.edu/~blynn/gg/">a self-hosting Git clone</a> in a few hours: longer than expected because I spent an inordinate amount of time debugging silly mistakes. Though I haven’t attempted it, I would need more time to write a clone of Perforce or Subversion (pretty much the only other SCMs I have used). With Git, there are no transactions, revision numbers, rename tracking, central servers, and so on; Git is essentially SHA-1 hashes all the way down.</p> <p>But let’s humour the author and suppose Git is complex. Then why not use tarballs and patches? This was precisely <a href="https://git.wiki.kernel.org/index.php/LinusTalk200705Transcript">how Linux was managed for 10 years</a>, so should surely suffice for a budding developer. In fact, I say you should only bother with Git once you realize, firstly, you’re addicted to coding, and secondly, how annoying it is to manage source with tarballs and patches!</p> <p>In other words, although Git is handy, you only really need it when your project grows beyond a certain point, by which time you’ve already had plenty of fun coding. Same goes for tools like defect trackers.</p> <hr> <h2><a name="_apps_and_oranges"></a>Apps and Oranges</h2> <p>I agree that developing for mobiles is painful. However, comparing this against those “simple programs of a few hundred lines of C++ long ago” is unfair. With mobile apps, the program usually runs on a system different to the one used to write the code.</p> <p>It might be fairer to compare writing a mobile app with, say, programming a dot matrix printer of yesteryear, as in both cases the target is different to the system used to write the code. I once did the latter, for the venerable Epson MX-80: after struggling with a ton of hardware-specific low-level nonsense, I was rewarded with a handful of crummy pictures. 
I would say it involved less “real programming” than writing an Android app.</p> <p>All the same, I concede that writing Android software is harder than it should be, <a href="http://www.youtube.com/watch?v=5kj5ApnhPAE">largely due to Java</a>. But firstly, a mobile phone involves security and privacy issues that would never arise with a dot matrix printer, which necessarily implies more bookkeeping, and secondly, the Java problem can be worked around: either via native code, or a non-Java compiler that generates Dalvik bytecode. [I’ve only mentioned Android throughout because it’s the only mobile platform I’ve developed on.]</p> <p>Comparing server-side web apps with the good old days is similarly unfair unless the good old days also involved networks, in which case they were really the bad old days. PC gamers of a certain age may remember a myriad of mysterious network options to configure multiplayer mode; imagine the even more mysterious code behind it. As for cloud apps, I would rather work on a cloud app than on an old-school equivalent: BBS software, which involves renting out extra phones lines if you want high availability.</p> <p>What about client-side web apps? As they can run on the same system used to develop them, it is therefore fair to compare developing them against writing equivalent code in those halcyon days of yore. Let’s look at a couple of examples.</p> <hr> <h2><a name="_tic_tac_toe"></a>Tic-tac-toe</h2> <p>I wrote <a href="https://crypto.stanford.edu/~blynn/play/tictactoe.html">a tic-tac-toe web app with an AI that plays perfectly</a> because it searches the entire game tree; modern hardware and browsers are so fast that this is bearable (though we’re spared one ply because the human goes first). 
It works on desktops, laptops, tablets, phones: anything with a browser.</p> <p>Here’s the minimax game tree search, based on code from <a href="http://www.cse.chalmers.se/~rjmh/Papers/whyfp.html">John Hughes, <em>Why Functional Programming Matters</em></a>:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="4"><tr><td> <pre><code>score (Game _ Won 'X') = -1<br />score (Game _ Won 'O') = 1<br />score _ = 0<br /><br />maximize (Node leaf []) = score leaf<br />maximize (Node _ kids) = maximum (map minimize kids)<br /><br />minimize (Node leaf []) = score leaf<br />minimize (Node _ kids) = minimum (map maximize kids)</code></pre></td></tr></table> <p>Despite my scant Haskell knowledge and experience, the source consists of a single file containing less than 150 lines like the above, plus a small HTML file: hardly a “multiplicity of languages”. Writing it was enjoyable, and I did so with a text editor in a window 80 characters wide.</p> <p>Let’s rewind ten to twenty years. I’d have a hard time achieving the brevity and clarity of the above code. The compiler I used didn’t exist, and depending how far back we go, neither did the language. Not that I’d consider compiling to JavaScript in the first place: depending how far back we go, it was too slow or didn’t exist.</p> <hr> <h2><a name="_netwalk"></a>Netwalk</h2> <p>In my student days, I developed <a href="https://code.google.com/p/netwalk/">a clone of a Windows puzzle game named Netwalk</a>. I chose C, so users either ran untrusted binaries I supplied (one for each architecture), or built their own binaries from scratch. Forget about running it on phones and PDAs.</p> <p>I managed my files with tarballs and patches. The source consisted of a few thousand lines, though admittedly much of it is GUI cruft: menus, buttons, textboxes, and so on. Lately, I hacked up <a href="https://crypto.stanford.edu/~blynn/play/netwalk.html">a web version of Netwalk</a>. The line count? 
About 150.</p> <p>Thanks to Git, you can view the entire source right now on <a href="https://code.google.com/p/combotrain/source/browse/netwalk.hs">Google Code</a> or <a href="https://github.com/blynn/combotrain/blob/master/netwalk.hs">GitHub</a>, all dolled up with syntax highlighting and line numbers.</p> <p>Building native binaries has a certain charm, but I have to admit that a client-side web app has less overhead for developers and users alike. I only need to build the JavaScript once, then anyone with a browser can play.</p> <p>Thus in this case, my new tools are better than my old tools in every way.</p> <hr> <h2><a name="_choose_wisely"></a>Choose Wisely</h2> <p>The real problem perhaps is the sheer number of choices. Tools have multiplied and diversified, and some indeed impede creativity and productivity. But others are a boon for programmers: they truly just let you code.</p> <p>Which tools are the best ones? The answer probably depends on the person as well as the application, but I will say for basic client-side web apps and native binaries, I heartily recommend my choices: <a href="http://www.haskell.org/">Haskell</a>, <a href="http://haste-lang.org/">Haste</a>, <a href="http://git-scm.com/">Git</a>.</p> <p>I’m confident the above would perform admirably for other kinds of projects. I intend to find out, but at the moment I’m having too much fun coding games.</p> <p><a href="https://crypto.stanford.edu/~blynn/play/">Play Now!</a></p> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-53323927324978221932014-07-29T22:52:00.000-07:002014-12-24T15:59:27.592-08:0015 Shades of Grey<a name="preamble"></a> <p><a href="http://en.wikipedia.org/wiki/John_Carmack">John Carmack</a> indirectly controlled significant chunks of my life. For hours at a time, I would fight in desperate gun battles in beautiful and terrifying worlds he helped create. 
On top of this, the technical wizardry of id Software’s games inspired me to spend yet more hours learning how they managed to run <em>Wolfenstein 3D</em> and <em>Doom</em> on PCs, in an era when clockspeeds were measured in megahertz and dedicated graphics cards were rare.</p> <p>I read about cool tricks like <a href="http://en.wikipedia.org/wiki/Binary_space_partitioning">binary space partitioning</a>, and eventually wrote a toy 3D engine of my own. The process increased my respect for the programmers: it’s incredibly difficult to get all the finicky details right while sustaining good frame rates.</p> <p>Accordingly, I paid close attention when <a href="http://www.youtube.com/watch?v=1PhArSujR_A">John Carmack spoke about programming languages in his QuakeCon 2013 keynote</a>. Many people, myself included, have strong opinions on programming languages, but few have a track record as impressive as his.</p> <hr> <h2><a name="_carmack_8217_s_sneaky_plan"></a>Carmack’s Sneaky Plan</h2> <p>I was flabbergasted by Carmack’s thoughts on the Haskell language. He starts by saying: “My big software evolution over the last, certainly three years and stretching back tendrils a little bit further than that, has been this move towards functional programming style and pure functions.”</p> <p>He then states that not only is Haskell suitable for programming games, but moreover, thinks Haskell may beat typical languages by roughly “a factor of two”, which “would be monumental” and “a really powerful thing for game development”. He has even begun reimplementing <em>Wolfenstein 3D</em> in Haskell as part of a “sneaky plan” to convince others.</p> <p>Wow! I had always thought Haskell was a pretty but impractical language. 
I loved composing elegant Haskell snippets to solve problems that one might encounter in interviews and programming contests, but for real stuff I resorted to C.</p> <p>Among my concerns is garbage collection: I have bad memories of unexpected frequent pauses in Java programs. But Carmack notes that Haskell’s almost uncompromising emphasis on purity simplifies garbage collection to the point where it is a predictable fixed overhead.</p> <p>A second concern is lazy evaluation. It’s easy to write clear and simple but inefficient Haskell: <a href="http://book.realworldhaskell.org/read/profiling-and-optimization.html">computing the average of a list of numbers</a> comes to mind. Carmack is also “still not completely sold on the value of laziness”, but evidently it’s not a showstopper for him. I suppose it’s all good so long as there are ways of forcing strict evaluation.</p> <p>A third concern (but probably not for Carmack) is that I don’t know how to write a Haskell compiler; I’m more at ease with languages when I know how their compilers work. I can ignore this discomfort, though I intend to overcome my ignorance one day. I’m hoping it’s mostly a matter of understanding Hindley-Milner type inference.</p> <p>Speaking of types, Carmack is a fan of static strong typing, because in his experience, “if it’s syntactically legal, it will make it into the codebase”. He notes during his recent foray into Haskell, the one time he was horribly confused was due to untyped data from the original <em>Wolfenstein 3D</em>.</p> <hr> <h2><a name="_my_obvious_plan"></a>My Obvious Plan</h2> <p>Once again, I’m inspired by Carmack. I plan to take Haskell more seriously to see if it really is twice as good. 
Although I lack the resources to develop a complex game, I may be able to slap together a few prototypes from time to time.</p> <p>First up is the <a href="http://en.wikipedia.org/wiki/15_puzzle">15-Puzzle by Noyes Palmer Chapman</a> with a cosmetic change: to avoid loading fonts and rendering text, I replaced the numbers 1 to 15 with increasingly darker shades of grey.</p> <div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAaEK6HVT59xIajIPkUgsVMWwPPFRxnI2Apf_ntf6EMx-guhkR6EmGUdysr0Mi7HMCCxL_OpboQr1uqXpKqMKoqPkvbG2TBieu6EY1aT17QHX6wQ6_RfxjLBJkt38UZFRZulN4hUEFzKI/s1600/15.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAaEK6HVT59xIajIPkUgsVMWwPPFRxnI2Apf_ntf6EMx-guhkR6EmGUdysr0Mi7HMCCxL_OpboQr1uqXpKqMKoqPkvbG2TBieu6EY1aT17QHX6wQ6_RfxjLBJkt38UZFRZulN4hUEFzKI/s1600/15.png" /></a></div><p>I began with a program depending on <a href="http://www.haskell.org/haskellwiki/SDL">SDL</a>. The result was surprisingly playable, and I found <a href="https://github.com/blynn/chapman">the source code</a> surprisingly short in spite of my scant knowledge of Haskell. To better show off my work, I made a few edits to produce a version of my program suitable for <a href="http://haste-lang.org/">the Haste compiler</a>, which compiles Haskell to JavaScript. I added mouse support and tweaked the HTML so the game is tolerable on tablets and phones.</p> <p><a href="http://crypto.stanford.edu/~blynn/15/">Play now!</a></p> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com1tag:blogger.com,1999:blog-4222267598459829544.post-45922057514516539612014-05-25T10:49:00.000-07:002018-11-16T13:43:58.177-08:00Straw Men in Black<a name="preamble"></a> <p>There’s a phrase used to praise a book: “you can’t put it down”. 
Unfortunately, I felt the opposite while reading <em>The Black Swan</em> by Nassim N. Taleb.</p> <p>I’ll admit some prejudice. We’re told not to judge a book by its cover, but review quotes in the blurb ought to be exempt. One such quote originated from Peter L. Bernstein, the author of <a href="http://books.google.com/books/about/Against_the_Gods.html?id=uTje6PYAijUC"><em>Against the Gods</em></a>. While I enjoyed reading it, <a href="http://benlynn.blogspot.com/2013/02/probability-made-less-uneasy.html">his book contained a litany of elementary mathematical mistakes.</a> Did this mean <em>The Black Swan</em> was similarly full of errors?</p> <p>All the same, the book began well. Ideas were clear and well-expressed. The writing was confident: perhaps overly so, but who wants to read text that lacks conviction? It promised wonders: we would learn how statisticians have been fooling us, and then learn the right way to deal with uncertainty, with potentially enormous life-changing payoffs.</p> <p>I failed to reach this part because several chapters in, I was exhausted by a multitude of issues. I had to put the book down. I intend to read further once I’ve recovered, and hopefully the book will redeem itself. Until then, here are a few observations.</p> <hr> <h2><a name="_one_weird_trick"></a>One Weird Trick</h2> <p><a href="http://www.slate.com/articles/business/moneybox/2013/07/how_one_weird_trick_conquered_the_internet_what_happens_when_you_click_on.html">What’s on the other end of those "one weird trick" online ads</a>? You won’t find out easily. If clicked, one is forced to sit through a video that:</p> <ul> <li> <p> makes impressive claims about a product </p> </li> <li> <p> takes pains to keep the product a secret </p> </li> <li> <p> urges the viewer to wait until the end, when they will finally learn the secret </p> </li> </ul> <p>This recipe must be effective, because I couldn’t help feeling the book was similar. 
It took me on a long path, meandering from anecdote to anecdote, spiced with poorly constructed arguments and sprinkled with assurances that the best was yet to come.</p> <p>Perhaps this sales tactic has become a necessary evil. With so much competition, how can a book distinguish itself? Additionally, I’m guessing fattening the book for any reason has a positive effect on sales.</p> <p>Even so, the main idea of the book could be worth reading. I’ll post an update if I find out.</p> <hr> <h2><a name="_lay_off_laplace"></a>Lay Off Laplace</h2> <p>Chapter 4 features a story about a turkey. As days pass, a turkey’s belief in a proposition such as "I will be cared for tomorrow" grows ever stronger, right until the day of its execution, when its belief turns out to be false. This retelling of Bertrand Russell’s parable about a chicken is supposed to warn us about inferring knowledge from observations, a repeated theme in the book.</p> <p>But what about Laplace’s <a href="http://en.wikipedia.org/wiki/Sunrise_problem">sunrise problem</a>? By the <a href="http://en.wikipedia.org/wiki/Rule_of_succession">Rule of Succession</a>, if the sun rose every day for 5000 years, that is, for 5000 × 365.2426 days, the odds it will rise tomorrow are only 1826214 to 1. Ever since Laplace wrote about this, he has been mercilessly mocked because of this ludicrously small probability.</p> <p>So which is it? Do repeated observations make our degrees of belief too strong (chicken) or too weak (sunrise)?</p> <p><strong>Live long and prosper</strong></p> <p>Much of this material is discussed in Chapter 18 of <a href="http://books.google.com/books/about/Probability_Theory.html?id=tTN4HuUNXjgC"><em>Probability Theory: The Logic of Science</em></a> by Edwin T. Jaynes, which also contains the following story.</p> <p>A boy turns 10 years old. The Rule of Succession implies the probability he lives one more year is (10 + 1) / (10 + 2), which is 11/12.
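Both numbers are easy to check mechanically. Here is a quick Go sketch; the <tt>succession</tt> helper is my own name for the formula, not anything from the book:

```go
package main

import "fmt"

// succession applies Laplace's Rule of Succession: after n successes in n
// trials and no failures, the probability of one more success is (n+1)/(n+2).
func succession(n int64) (num, den int64) {
	return n + 1, n + 2
}

func main() {
	num, den := succession(10) // the 10-year-old boy
	fmt.Printf("boy: %d/%d\n", num, den)

	// 5000 years of sunrises: odds of (n+1) to 1 that the sun rises tomorrow.
	n := int64(5000 * 365.2426)
	num, _ = succession(n)
	fmt.Printf("sunrise odds: %d to 1\n", num)
}
```

Running it prints 11/12 for the boy and odds of 1826214 to 1 for the sunrise, as above.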
A similar computation shows his 70-year-old grandfather will live one more year with probability 71/72.</p> <p>I like this example, because it contains both the chicken and the sunrise problem. Two for the price of one. Shouldn’t the old man’s number be lower than the young boy’s? One number seems too big and the other too small. How can the same rule be wrong in two different ways?</p> <p><strong>Ignorance is strength?</strong></p> <p>What should we do to avoid these ridiculous results?</p> <p>Well, if the sun rose every day for 5000 years <em>and that is all you know</em>, then 1826214 to 1 is correct. The only reason we think this is too low is that we know a lot more than the number of consecutive sunrises: we know about stars, planets, orbits, gravity, and so on. If we take all this into account, our degree of belief that the sun rises tomorrow grows much stronger.</p> <p>The same goes for the other examples. In each one, we:</p> <ol type="1"> <li> <p> Ignored what we know about the real world. </p> </li> <li> <p> Calculated based on what little data was left. </p> </li> <li> <p> Un-ignored the real world so we could laugh at the results. </p> </li> </ol> <p>In other words, we have merely shown that ignoring data leads to bad results. It’s as obvious as noting that if you shut your eyes while driving a car, you’ll end up crashing.</p> <p>Sadly, despite pointing this out, Laplace became a victim of this folly. Immediately after describing the sunrise problem, Laplace explains that the unacceptable answer arises because of wilfully neglected data. For some reason, his critics take his sunrise problem, ignore his explanation for the hilarious result, then savage his ideas.</p> <p><em>The Black Swan</em> joins the peanut gallery in condemning Laplace. However, its conclusion differs from those of most detractors. The true problem is that most of the data is ignored when computing probabilities. Taleb considers addressing this by ignoring even more data!
But then why not toss out more? Why not throw away most of mathematics and assign arbitrary probabilities to arbitrary assertions?</p> <p>Orthodox statistics is indeed broken, but not because more data should be ignored. It’s broken for the opposite reason: too much data is being ignored.</p> <p>Poor Laplace. Give the guy a break.</p> <hr> <h2><a name="_hempel_8217_s_joke"></a>Hempel’s Joke</h2> <p>Stop me if you’ve heard this one: 2 + 2 = 5 for sufficiently large values of 2. This is obviously a joke (though sometimes told so convincingly that <a href="http://www.straightdope.com/columns/read/1382/does-2-2-5-for-very-large-values-of-2">the audience is unsure</a>).</p> <p>Hempel’s Paradox is a similar but less obvious joke that proceeds as follows. Consider the hypothesis: all ravens are black. This is logically equivalent to saying all non-black things are non-ravens. Therefore seeing a white shoe is evidence supporting the hypothesis.</p> <p>The following <a href="http://golang.org/">Go</a> program makes the attempted humour abundantly clear:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>package main<br /><br />import "fmt"<br /><br />func main() {<br /> state := true<br /> for {<br /> var colour, thing string<br /> if _, e := fmt.Scan(&colour, &thing); e != nil {<br /> break<br /> }<br /> if thing == "raven" && colour != "black" {<br /> state = false<br /> }<br /> fmt.Println(" hypothesis:", state)<br /> }<br />}</pre></td></tr></table> <p>A sample run:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>black raven<br /> hypothesis: true<br />white shoe<br /> hypothesis: true<br />red raven<br /> hypothesis: false<br />black raven<br /> hypothesis: false<br />white shoe<br /> hypothesis: false</pre></td></tr></table> <p>The state of the hypothesis is represented by a boolean variable. Initially the boolean is true, and it remains true until we encounter a non-black raven. 
This is the only way to change the state of the program: neither "black raven" nor "white shoe" has any effect.</p> <p>Saying we have "evidence supporting the hypothesis" is saying there are truer values of true. It’s like saying there are larger values of 2.</p> <p>The original joke exploits the mathematical concept “sufficiently large”, which has applications, but is absurd when applied to constants.</p> <p>Similarly, Hempel’s joke exploits the concept "supporting evidence", which has applications, but is absurd when applied to a lone hypothesis.</p> <p><strong>Off by one</strong></p> <p>If we want to talk about evidence supporting or undermining a hypothesis mathematically, we’ll need to advance beyond boolean logic. Conventionally we represent degrees of belief with numbers between 0 and 1. The higher the number, the stronger the belief. We call these <em>probabilities</em>.</p> <p>Next, we propose some mutually exclusive hypotheses and assign probabilities between 0 and 1 to each one. The sum of the probabilities must be 1.</p> <p>If we take a single proposition by itself, such as "all ravens are black", then we’re forced to give it a probability of 1. We’re reduced to the situation above, where the only interesting thing that can happen is that we see a non-black raven and we realize we must restart with a different hypothesis. (In general, probability theory taken to extremes devolves into plain logic.)</p> <p>We need at least two propositions with nonzero probabilities for the phrase "supporting evidence" to make sense. For example, we might have two propositions A and B, with probabilities of 0.2 and 0.8 respectively. If we find evidence supporting A, then its probability increases and the probability of B decreases accordingly, for their sum must always be 1.
Naturally, as before, we may encounter evidence that implies all our propositions are wrong, in which case we must restart with a fresh set of hypotheses.</p> <p>To avoid nonsense, we require at least two mutually exclusive propositions, such as A: "all ravens are black", and B: "there exists a non-black raven", and each must have a nonzero probability. Now it makes sense to ask if a white shoe is supporting evidence. Does it support A at B’s expense? Or B at A’s expense? Or neither?</p> <p>The propositions as stated are too vague to answer one way or another. We can make the propositions more specific, but there are infinitely many ways to do so, and the choices we make change the answer. See Chapter 5 of Jaynes.</p> <p><strong>One Card Trick</strong></p> <p>Instead of trying to flesh out hypotheses involving ravens, let us content ourselves with a simpler scenario. Suppose a manufacturer of playing cards has a faulty process that sometimes uses black ink instead of red ink to print the entire suit of hearts. We estimate one in ten packs of cards has black hearts instead of red hearts but is otherwise normal, while the other nine in ten are perfectly fine.</p> <p>We’re given a pack of cards from this manufacturer. Thus we believe the hypothesis A: "all hearts are red" with probability 0.9, and B: "there exists a non-red heart" with probability 0.1. We draw a card. It’s the four of clubs. What does this do to our beliefs?</p> <p>Nothing. Neither hypothesis is affected by this irrelevant evidence.
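The "nothing" can be made precise with Bayes' theorem. A minimal sketch in Go (the <tt>posterior</tt> helper is mine, not from the book): the four of clubs has probability 1/52 under either hypothesis, so the posterior equals the prior, while a black four of hearts is impossible under A and kills it outright.

```go
package main

import "fmt"

// posterior returns P(A|E) by Bayes' theorem, given the prior P(A) and the
// likelihoods P(E|A) and P(E|B), where B is the only rival hypothesis,
// so P(B) = 1 - P(A).
func posterior(priorA, likeA, likeB float64) float64 {
	return priorA * likeA / (priorA*likeA + (1-priorA)*likeB)
}

func main() {
	// Four of clubs: probability 1/52 whether or not the hearts are misprinted.
	fmt.Println(posterior(0.9, 1.0/52, 1.0/52)) // ~0.9: belief unchanged

	// Black four of hearts: impossible under A, probability 1/52 under B.
	fmt.Println(posterior(0.9, 0, 1.0/52)) // 0: restart with new hypotheses
}
```

Equal likelihoods cancel out of the ratio, which is exactly what "irrelevant evidence" means here.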
I believe this is at least intuitively clear to most people, and furthermore, had Hempel spoken of hearts and clubs instead of ravens and shoes, his joke would have been more obvious.</p> <p><strong>Great Idea, Poor Execution</strong></p> <p><em>The Black Swan</em> attacks orthodox statistics using Hempel’s paradox, alleging that it shows we should beware of evidence supporting a hypothesis.</p> <p>It turns out orthodox statistics can be attacked with Hempel’s paradox, but not by claiming "supporting evidence" is meaningless. That would be like claiming "sufficiently large" is meaningless.</p> <p>Instead, Hempel’s joke reminds us we must consider more than one hypothesis if we want to talk about supporting evidence. This may seem obvious; assigning a degree of belief in a lone proposition is like awarding points in a competition with only one contestant.</p> <p>However, apparently it is not obvious enough. <em>The Black Swan</em> misses the point, and so did my university professors. My probability and statistics textbook instructs us to consider only one hypothesis. (Actually, it’s worse: one of the steps is to devise an alternate hypothesis, but this second hypothesis is never used in the procedure!)</p> <hr> <h2><a name="_mathematics_versus_society"></a>Mathematics Versus Society</h2> <p>In an off-hand comment, Taleb begins a sentence with “Mathematicians will try to convince you that their science is useful to society by…”</p> <p>By this point, I had already found faults. First and foremost: how often do mathematicians talk about their usefulness to society? There are many jokes about mathematicians and real life, such as:</p> <blockquote> <p>Engineers believe their equations approximate reality. Physicists believe reality approximates their equations. Mathematicians don’t care.</p> <p align="right"> </p> </blockquote> <p>The truth is being exaggerated for humour, but asserting their work is useful in the real world is evidently a low priority for mathematicians.
It is almost a point of pride. In fact, Taleb himself later quotes Hardy:</p> <blockquote> <p>The “real” mathematics of the “real” mathematicians…is almost wholly “useless”.</p> <p align="right"> </p> </blockquote> <p>This outlook is not new. Gauss called number theory “the queen of mathematics”, because it was pure and beautiful and had no applications in real life. (He had no way of foreseeing that number theory would one day be widely used in real life for secure communication!)</p> <p>But sure, whatever, let’s suppose mathematicians go around trying to convince others that their field is useful to society. [Presumably Hardy would call such a mathematician “imaginary” or “complex”.] They are trivially right. If you try to talk about how useful things are to society, then you’ll want to measure and compare usefulness of things, all the while justifying your statements with sound logical arguments. Measuring and comparing and logic all lie squarely in the domain of mathematics.</p> <hr> <h2><a name="_jumping_to_conclusions"></a>Jumping to Conclusions</h2> <p>So far, I feel the author’s heart is in the right place but his reasoning is flawed. Confirmation bias is indeed pernicious, and orthodox statistics is indeed erroneous. However, <em>The Black Swan</em> knocks down straw men instead of hitting these juicy targets.</p> <p>The above are but a few examples of the difficulties I ran into while reading the book. 
I had meant to pick apart more specious arguments but I’ve already written more than I had intended.</p> <p>Again, I stress I have not read the whole work, and it may improve in the second half.</p> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-19399393646954425902014-04-18T21:53:00.000-07:002014-04-18T21:53:29.150-07:00Crit-bit trees yet again<a name="preamble"></a> <p>I’ve been pleased with <a href="http://benlynn.blogspot.com/2013/11/crit-bit-tree-micro-optimizations_3.html">my recent implementation of crit-bit trees</a> based on <a href="https://www.imperialviolet.org/binary/critbit.pdf">Adam Langley’s code</a>, but in the back of my mind, I’ve been bothered by <a href="http://cr.yp.to/critbit.html">Bernstein’s statement</a> that the overhead on top of each string stored need only be two control words: one pointer and one integer.</p> <p>To obtain relief, I updated my library so that internal nodes only have a single child pointer instead of one pointer each for the left and right child nodes. In a crit-bit tree, an internal node always has two children, so we may as well allocate sibling nodes in one contiguous chunk of memory.</p> <p>Bernstein suggests storing strings directly in the crit-bit tree, which can be done by storing the left string backwards: it resides to the left of the integer control word. I’ve chosen an alternate scheme which he also mentions: expressing the strings as pointers in a separate storage pool.</p> <p>Removing one pointer per internal node has several benefits on a 64-bit machine. Firstly, malloc is typically 16-byte aligned, which means that the 24-byte internal nodes of the original version were actually taking 32 bytes. In contrast, now we only need just one pointer and one integer, so we can fit internal nodes within 16 bytes.</p> <p>Secondly, we no longer have to tag pointers. 
Instead, 8 bytes hold an unadulterated child pointer, and the other 8 bytes store the crit-bit position as well as one bit that indicates whether the current node is internal or not.</p> <p>Thirdly, allocating two nodes at once only requires a single malloc() call. Before, we would call malloc() to allocate a new leaf node, and again to allocate a new internal node.</p> <p>Again, "unrolling" the child pointer lookup produced better benchmarks on my system, namely, instead of <tt>p = p->kid + predicate()</tt>, we write <tt>p = predicate() ? p->kid + 1 : p->kid</tt>. This obviated the need for some fun but complex micro-optimizations. While I was at it, I unrolled a few other routines.</p> <p>Benchmarks:</p> <div> <table rules="all" width="100%" frame="border" cellspacing="0" cellpadding="4"> <caption><b>Table 1. </b>Keys: <tt>/usr/share/dict/words</tt></caption> <thead> <tr> <th align="left" width="33%" valign="top"></th> <th align="left" width="33%" valign="top">old</th> <th align="left" width="33%" valign="top">new</th> </tr> </thead> <tbody> <tr> <td align="left" width="33%" valign="top"><p>insert</p></td> <td align="left" width="33%" valign="top"><p>0.066135</p></td> <td align="left" width="33%" valign="top"><p>0.056116</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>get</p></td> <td align="left" width="33%" valign="top"><p>0.043584</p></td> <td align="left" width="33%" valign="top"><p>0.040845</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>iterate</p></td> <td align="left" width="33%" valign="top"><p>0.031851</p></td> <td align="left" width="33%" valign="top"><p>0.015135</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>allprefixed</p></td> <td align="left" width="33%" valign="top"><p>0.013285</p></td> <td align="left" width="33%" valign="top"><p>0.011924</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>delete</p></td> <td align="left" width="33%" valign="top"><p>0.048030</p></td> <td 
align="left" width="33%" valign="top"><p>0.041754</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>overhead</p></td> <td align="left" width="33%" valign="top"><p>3966824</p></td> <td align="left" width="33%" valign="top"><p>3173464</p></td> </tr> </tbody> </table> </div> <div> <table rules="all" width="100%" frame="border" cellspacing="0" cellpadding="4"> <caption><b>Table 2. </b>Keys: output of <tt>seq 2000000</tt></caption> <thead> <tr> <th align="left" width="33%" valign="top"></th> <th align="left" width="33%" valign="top">old</th> <th align="left" width="33%" valign="top">new</th> </tr> </thead> <tbody> <tr> <td align="left" width="33%" valign="top"><p>insert</p></td> <td align="left" width="33%" valign="top"><p>2.700949</p></td> <td align="left" width="33%" valign="top"><p>2.359999</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>get</p></td> <td align="left" width="33%" valign="top"><p>2.304070</p></td> <td align="left" width="33%" valign="top"><p>2.214699</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>iterate</p></td> <td align="left" width="33%" valign="top"><p>0.912950</p></td> <td align="left" width="33%" valign="top"><p>0.472594</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>allprefixed</p></td> <td align="left" width="33%" valign="top"><p>0.295322</p></td> <td align="left" width="33%" valign="top"><p>0.264326</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>overhead</p></td> <td align="left" width="33%" valign="top"><p>79999984</p></td> <td align="left" width="33%" valign="top"><p>63999992</p></td> </tr> <tr> <td align="left" width="33%" valign="top"><p>delete</p></td> <td align="left" width="33%" valign="top"><p>2.339102</p></td> <td align="left" width="33%" valign="top"><p>2.177294</p></td> </tr> </tbody> </table> </div> <p>The insert benchmark of my old library is slightly worse than the one I originally posted because I have since tweaked the code to make it easy to 
carry out operations such as "insert or replace" or "insert if absent" in a single step.</p> <p><a href="https://github.com/blynn/blt">https://github.com/blynn/blt</a></p> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com2tag:blogger.com,1999:blog-4222267598459829544.post-82166478200297711602013-12-19T23:41:00.000-08:002013-12-19T23:41:22.810-08:00Reentrant parsers with Flex and Bison<a name="preamble"></a> <p>By default, Flex and Bison generate old-school code with global variables. Trawling the manuals to find the options that generate re-entrant code is tedious, so I’m recording a small example that works on my system (which has Bison 2.7.12 and Flex 2.5.35).</p> <p><strong>Flex preamble</strong></p> <p>With these options, <tt>yylval</tt> is now a pointer. When converting existing Flex source, we mostly replace <tt>yylval</tt> with <tt>*yylval</tt>.</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>%option outfile="flex.c" header-file="flex.h"<br />%option reentrant bison-bridge<br />%option noyywrap nounput noinput<br /><br />%{<br />#include "bison.h"<br />%}</pre></td></tr></table> <p><strong>Bison preamble</strong></p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>%output "bison.c"<br />%defines "bison.h"<br />%define api.pure full<br />%lex-param { yyscan_t scanner }<br />%parse-param { yyscan_t scanner }<br />%parse-param { val_callback_t callback }<br /><br />%code requires {<br />#include "val.h"<br />#define YYSTYPE val_ptr<br />#ifndef YY_TYPEDEF_YY_SCANNER_T<br />#define YY_TYPEDEF_YY_SCANNER_T<br />typedef void *yyscan_t;<br />#endif<br />}<br /><br />%code {<br />#include "flex.h"<br />int yyerror(yyscan_t scanner, val_callback_t callback, const char *msg) {<br /> return 0;<br />}<br />}</pre></td></tr></table> <p><strong><tt>val.h</tt>: semantic values</strong></p> <p>Rather than use the <tt>%union</tt> Bison declaration or similar, I prefer to
define the type that holds the semantic values in a C source file. In general, I like to minimize the amount of C in the Bison and Flex source.</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>enum {<br /> T_INT,<br /> T_STRING,<br />};<br /><br />struct val_s {<br /> int type;<br /> struct {<br /> char *s;<br /> struct val_s **kid;<br /> int nkid;<br /> };<br />};<br />typedef struct val_s *val_ptr;<br />typedef int (*val_callback_t)(val_ptr);</pre></td></tr></table> <p><strong>Calling the parser</strong></p> <p>Because the parser is no longer global, we must initialize and pass a <tt>yyscan_t</tt> variable to Bison and Flex.</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre> yyscan_t scanner;<br /> if (yylex_init(&scanner)) exit(1);<br /> YY_BUFFER_STATE buf = NULL;<br /> // Uncomment to parse from a string instead of standard input.<br /> // buf = yy_scan_string("input string", scanner);<br /> int f(struct val_s *v) {<br /> val_print_pre(v);<br /> putchar('\n');<br /> val_print_tree("", v);<br /> val_free(v);<br /> return 0;<br /> }<br /> if (yyparse(scanner, f)) exit(1);<br /> yy_delete_buffer(buf, scanner);<br /> yylex_destroy(scanner);</pre></td></tr></table> <p><strong>Complete example</strong></p> <p>See <a href="https://github.com/blynn/symple/tree/1b838f658ef601c91b9478d12501918c29d3614d"><tt>https://github.com/blynn/symple/</tt></a>, which reads an expression and pretty-prints it:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>$ ./main 'sin(x)*cos(y) + e^x'<br />+(*(sin(x), cos(y)), ^(e, x))<br />+─┬─*─┬─sin───x<br /> │ └─cos───y<br /> └─^─┬─e<br /> └─x</pre></td></tr></table> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com2tag:blogger.com,1999:blog-4222267598459829544.post-35331593566816723162013-11-21T00:54:00.000-08:002014-04-19T10:08:42.999-07:00To Brute-Force A Mockingbird<a name="preamble"></a> <p><em>To Mock a 
Mockingbird</em> by Raymond M. Smullyan should be required reading for any fan of the programming language Haskell. We learn combinatory logic through a series of delightful puzzles, almost without realizing.</p> <p>We’re asked to imagine a forest populated by talking birds. On encountering one of these birds, we may call out the name of any bird. In response, the bird will say the name of some bird in the forest. (The reply could be the same bird we named, or the bird’s own name, or any other bird.)</p> <p>An enchanted forest populated by birds is disarmingly endearing. We’re almost unaware we’re actually dealing with a set of functions that take a function as input and return a function. The evocative backdrop also pays homage to Haskell Curry, who was an avid birdwatcher.</p> <p>One puzzle challenges us to find an egocentric bird given that a lark lives in the forest. Or, using mundane terminology, given a combinator L such that (Lx)y = x(yy) for all x and y, construct a combinator E such that EE = E.</p> <p>The author states his solution is “a bit tricky” and consists of 12 correctly parenthesized Ls. Furthermore, the author states he doesn’t know if a shorter solution exists.</p> <p>To maximize the likelihood of solving this puzzle, the reader should take advantage of facts learned from previous puzzles, and build up to the solution in stages. But that’s only if you’re playing fair and using pencil and paper! Instead, I saw an opportunity to bust out one of my favourite snippets of code.</p> <p><strong>Seeing the forest for the trees</strong></p> <p>Let us first recast the problem in terms of trees. Instead of Ls and parentheses, we work with <a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">syntax trees</a>. In other words, we work with full binary trees where each leaf node corresponds to an L, and to evaluate an internal node, we recursively evaluate its child nodes, then apply the left child to the right child. 
(In Smullyan’s terminology, we call out the name of the bird represented by the right child to the bird represented by the left child.)</p> <p>In this setting, the puzzle is to find a full binary tree such that repeatedly transforming parts of the tree according to the rule (Lx)y = x(yy) produces a tree where both of the root node’s children are identical to the original tree.</p> <p>Hence to solve with brute force, we need only generate all full binary trees containing up to 12 leaf nodes, and for each one see if we can transform the tree into two copies of itself.</p> <p>Here’s where my treasured routine comes in. The following roughly describes how to call a function on every full binary tree with exactly <em>n</em> leaf nodes:</p> <ul> <li> <p> Allocate a node <em>x</em>. </p> </li> <li> <p> If <em>n</em> is 1, then mark <em>x</em> as a leaf, call the function, then return. </p> </li> <li> <p> Otherwise mark <em>x</em> as an internal node, and for every 0 < <em>k</em> < <em>n</em>: </p> <ul> <li> <p> For every full binary tree <em>y</em> with <em>k</em> leaf nodes: </p> <ul> <li> <p> Set the left child of <em>x</em> to <em>y</em>. </p> </li> <li> <p> For every full binary tree <em>z</em> with <em>n</em> - <em>k</em> leaf nodes: </p> <ul> <li> <p> Set the right child of <em>x</em> to <em>z</em>. </p> </li> <li> <p> Call the function. </p> </li> </ul> </li> </ul> </li> </ul> </li> </ul> <p>We generate the left and right subtrees by calling this algorithm recursively. 
More precisely, in Go:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>type node struct {<br /> kind int // 0 = leaf, 1 = branch.<br /> left, right *node<br />}<br /><br />// For each full binary tree with n leaf nodes,<br />// sets '*p' to a pointer to the tree and calls the given function.<br />func forall_tree(p **node, n int, fun func()) {<br /> var t node<br /> *p = &t<br /> if (n == 1) {<br /> t.kind = 0<br /> fun()<br /> return<br /> }<br /> t.kind = 1<br /> for k := 1; k < n; k++ {<br /> forall_tree(&t.left, k, func() {<br /> forall_tree(&t.right, n - k, fun)<br /> })<br /> }<br />}</pre></td></tr></table> <p>I was proud when I found this gem a few years ago while working on a <a href="http://projecteuler.net/">Project Euler</a> problem, though I’d be shocked if it were original. [Actually, my first version preallocated an array of 2n - 1 nodes and used indices instead of pointers to save a bit of time and space, but this is less elegant.]</p> <p>For example, we can print the first 10 <a href="http://en.wikipedia.org/wiki/Catalan_number">Catalan numbers</a>:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>func main() {<br /> for k := 1; k <= 10; k++ {<br /> var root *node<br /> n := 0<br /> forall_tree(&root, k, func() { n++ })<br /> println(n)<br /> }<br />}</pre></td></tr></table> <p>Or print all full binary trees with exactly 6 leaf nodes, as parenthesized expressions:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>func tree_print(p *node) {<br /> if p.kind == 1 {<br /> print("(")<br /> tree_print(p.left)<br /> tree_print(p.right)<br /> print(")")<br /> return<br /> }<br /> print("L")<br />}<br /><br />func main() {<br /> var root *node<br /> forall_tree(&root, 6, func() { tree_print(root); println() })<br />}</pre></td></tr></table> <p>With a little more effort, we can write a program that solves the puzzle.
However, some care is needed: if we replace subtrees of the form (Lx)y with x(yy) and vice versa without rhyme or reason, we’ll have no idea when we’ll finish and we’ll only stumble across a solution by chance.</p> <p>Instead, we observe that (Lx)y is either strictly smaller than x(yy), or has the form (Lx)L. Let us say that we are reducing the tree when we replace x(yy) by (Lx)y, and expanding when we perform the reverse. Thus rather than starting from a tree t and repeatedly applying the rule to obtain the tree tt, we do the reverse: we start from tt, and consider reductions only. The above observation implies that every sequence of reductions must terminate eventually.</p> <p>But what if we need to temporarily expand tt before reducing it in order to reach t? Let’s optimistically hope that Smullyan’s 12-L solution was sufficiently expanded; that is, only reductions are needed to go from tt to t, where t is his solution.</p> <p>Multiple subtrees may be reducible, and choosing the wrong one can prevent future reductions necessary to reach the goal. We therefore try every path: for each possible reduction, we apply it and recurse. This wastefully repeats many computations, because there can be several ways to arrive at the same tree. We tackle this in an obvious manner: by remembering the trees we’ve seen so far.</p> <p>I wrote solutions in GNU C and Go. The Go solution is a bit too slow for comfort. The C code is slightly clumsier, mainly because I had to name the anonymous functions (though one can <a href="http://en.wikipedia.org/wiki/Anonymous_function#GCC">define a macro to work around this</a>). C also lacks a map data structure, but this was no problem thanks to my recently released <a href="http://github.com/blynn/blt">BLT library</a>.</p> <p><strong>Results (Spoiler Alert)</strong></p> <p>Optimism paid off. 
On my laptop, my C program took well under a minute to find 4 solutions:</p> <pre>(((L(LL))(L(LL)))((L(LL))(L(LL))))<br />(((L(LL))((LL)L))((L(LL))((LL)L)))<br />(((L(LL))((LL)L))(((LL)L)((LL)L)))<br />((((LL)L)((LL)L))(((LL)L)((LL)L)))</pre><p>The Go version took about 10 minutes.</p> <p>These all contain 12 Ls, so in a sense, Smullyan’s solution is minimal. Since no other strings are printed, these four 12-L solutions are minimal when only reductions are permitted.</p> <p>If we allow expansions (that is, (Lx)y → x(yy)), then firstly, we have at least 2<sup>4</sup> = 16 solutions of length 12, since in this case (L(LL)) and ((LL)L) are interchangeable, and secondly, we can reduce the above strings to find shorter solutions. For example, the solution:</p> <pre>(((L(LL))(L(LL)))((L(LL))(L(LL))))</pre> <p>reduces to:</p> <pre>(L((L(LL))(L(LL))))(L(LL))</pre> <p>which further reduces to:</p> <pre>((LL)(L(LL)))(L(LL))</pre> <p>which only has 8 Ls. I doubt Smullyan missed this. My guess is he meant that if you solve the problem the clever way, then you arrive at an expression with 12 Ls; reductions should be ignored because they only obscure the ingenuity of the solution.</p> <p>Is there, say, a 7-L expression that expands to some expression (that is necessarily longer than 24 Ls) which can be reduced to half of itself? I think not, but I have no proof.</p> <p><strong>Exercise: Four Fours</strong></p> <p>I wanted to do something with the Go version of the <tt>forall_tree()</tt> routine, so I tweaked it to solve <a href="http://en.wikipedia.org/wiki/Four_fours">the four fours puzzle</a>. I just ploughed through all possible trees and evaluated each one; there are only 320 to do. For larger variants of the puzzle, I’d use dynamic programming; that is, memoize subtrees and their values. 
Division by zero is annoying, but I managed to keep the tree evaluation function short and sweet by using Go’s panic-recover mechanism.</p> <p><strong>Recursion: the best thing since recursion</strong></p> <p>The <tt>forall_tree()</tt> function is but one example of the eloquence of anonymous functions and recursion. For similar reasons, nested functions are also indispensable. We attain economy of expression by letting the stack automatically take care of the heavy lifting.</p> <p>Curiously, <a href="http://en.wikipedia.org/wiki/Nested_function">early structured languages including ALGOL, Simula, and Pascal supported nested functions</a>, but C shied away from this beautiful feature.</p> <p>Its absence is sorely missed, as can be seen in C-style callbacks. An ugly <tt>void *</tt> argument is inevitably passed and cast. For instance, take <a href="http://www.libsdl.org/release/SDL-1.2.15/docs/html/guideaudioexamples.html">the <tt>udata</tt> parameter in the SDL audio API</a>.</p> <p>Its absence is also egregiously gratuitous. <a href="http://gcc.gnu.org/onlinedocs/gccint/Trampolines.html">GCC supports nested functions by using this one weird trick</a> [compiler writers hate him!]. 
One might complain this weird trick conflicts with executable space protection, but <a href="http://crypto.stanford.edu/~blynn/rop/">executable space protection is worse than pointless thanks to return-oriented programming</a>.</p> <p><strong>Code Complete</strong></p> <ul> <li> <p> <a href="https://github.com/blynn/mockingbird">https://github.com/blynn/mockingbird</a> </p> </li> </ul> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-27393152555709556032013-11-13T22:55:00.000-08:002013-11-13T22:59:32.569-08:00Chocolate, Logic Puzzles, and Dancing Links<a name="preamble"></a> <p>Early last year, I visited <a href="https://plus.google.com/108526534018923770979/about?gl=us&hl=en">a cafe that awards chocolate for solving logic puzzles</a>. Naturally, I couldn’t resist free chocolate, and afterwards, just as naturally, I couldn’t resist thinking about programs that solve logic puzzles.</p> <p>I’ve had to write such programs before for homework assignments or to test out frameworks. But oddly, I had never put much effort into it. I loved logic puzzles as a kid, even going so far as to buy a magazine or two that were filled with grids and clues. Why hadn’t I already written a decent tool to help?</p> <p>Better late than never. After a spate of intense coding, I had a program that read clues in a terse text format and used brute force to find the solution. I spent most of my time devising the mini-language for the clues rather than the algorithm, as I figured the bottleneck would be the human entering the clues.</p> <p>My solver worked well enough on a few examples, including the puzzle I solved to get a chocolate. But then I tried my program on <a href="http://en.wikipedia.org/wiki/Zebra_Puzzle">the Zebra Puzzle</a>. I was too impatient to let it finish. After a little thought, it was clear why.</p> <p><strong>On the grid</strong></p> <p>Logic grid puzzles can be abstracted as follows. 
We are given a table with M rows and N columns, and each cell contains a unique symbol. Our goal is to rearrange the symbols within each row except for those in the first row so that given constraints are satisfied. To be clear, symbols must stay in the row they started in, but apart from the first row, they can change places with other symbols in their row.</p> <p>Some examples of constraints:</p> <ul> <li> <p> symbols A and B must be in the same column. </p> </li> <li> <p> symbols A, B, and C must be in distinct columns. </p> </li> <li> <p> symbol A’s column must be exactly one to the left of symbol B’s column. </p> </li> </ul> <p>We fix the first row because of constraints such as the last example: clues often refer to the order of the elements in the first row in the input table. Let us call them <em>order constraints</em>. This inspires the following convenient relabeling. Without loss of generality, let the symbols of the first row be 1, …, N from left to right.</p> <p>My brute force solver generates every possible table and prints the ones that satisfy all given constraints. That means it has to examine up to N!<sup>(M-1)</sup> cases: there are N! permutations of the symbols in each row, and we have M-1 rows to rearrange. For the Zebra Puzzle, this is 120<sup>5</sup>.</p> <p><strong>Got it covered</strong></p> <p>I needed a smarter approach. Since I had already coded <a href="http://benlynn.blogspot.com/2005/12/solving-sudoku.html">a sudoku solver</a>, I chose the same approach, namely, represent a puzzle as an instance of the exact cover problem, then use <a href="http://en.wikipedia.org/wiki/Dancing_Links">the dancing links algorithm</a>.</p> <p>Firstly, we populate a set X with all the symbols in the table. Next, instead of generating every table, generate every possible column. Each such column corresponds to the subset of X consisting of the symbols in the column. 
Generating every possible column means brute force is still present, but to a lesser degree.</p> <p>An exact cover of X corresponds to a collection of columns such that no two columns contain the same symbol, and furthermore, each symbol appears in one of the columns. Thus these columns can be joined together to form a candidate solution to the logic puzzle: we merely order them so that the first row reads 1, …, N.</p> <p>It remains to disallow covers that violate the constraints. For some constraints, we achieve this by omitting certain columns. For example, suppose A and B must be in the same column. When we generate a column that only contains one of A or B, we omit it from the exact cover instance. Similarly, if a constraint requires that A and B lie in distinct columns, we omit subsets that contain both symbols from the exact cover instance.</p> <p><strong>Out of order</strong></p> <p>The above suffices for many puzzles, but falls short for those with order constraints such as "A and B lie in adjacent columns". For this particular constraint, we add N elements to X. Let us call them x[1], …, x[N]. Given a generated column whose first row contains the number n (recall we have relabeled so that the first row of the input table is 1, …, N), if the column contains:</p> <ul> <li> <p> both A and B: eliminate the column from consideration. </p> </li> <li> <p> A and not B: we add x[n] to the column’s corresponding subset. </p> </li> <li> <p> B and not A: we add x[k] to the column’s corresponding subset for all k in 1..N where |n-k| is not 1. </p> </li> </ul> <p>Lastly, we mark x[1] ,…, x[N] as optional, meaning that they need not be covered by our solution. (We could instead augment our collection of subsets with singleton subsets {x[n]} for each n, but with dancing links there’s a faster way.)</p> <p>Thus any exact cover must have A and B in adjacent columns to avoid conflicts in the x[1], …, x[N] elements. 
We can handle other order constraints in a similar fashion.</p> <p>The size of X is the number of symbols, MN, plus N for each order constraint. The number of subsets is the number of possible columns, which is N<sup>M</sup>, because we have M rows, and each can be one of N different symbols. Each subset has size M, one for each row, plus up to N more elements for each order constraint that involves it.</p> <p><strong>That’ll do</strong></p> <p>Though N<sup>M</sup> may be exponential in input size, typical logic grid puzzles are so small that my program solves them in a few milliseconds on my laptop. The bottleneck is now indeed the human entering the clues. [I briefly thought about writing a program to automate this by searching for keywords in each sentence.]</p> <p>I was annoyed the above algorithm is adequate. Firstly, trying entire columns in a single step bears no resemblance to actions performed by human solvers. Secondly, it was too easy: I wish I were forced to think harder about the algorithm! Perhaps I’ll devise a larger puzzle that demands a better approach.</p> <p>Around this time I lost focus because I felt my old code could use a few edits. I got sidetracked rewriting my dancing-links sudoku solver and comparing it against other ways of solving sudoku, and soon after that I moved on.</p> <p>Luckily for my code, I recently felt compelled to clean it up enough for public release. 
It seemed a pity to let it languish in a forgotten corner until rendered useless from bit rot and lack of documentation.</p> <p>The DLX library is now available at a git repository near you:</p> <ul> <li> <p> <a href="https://github.com/blynn/dlx">https://github.com/blynn/dlx</a> </p> </li> <li> <p> <a href="https://code.google.com/p/blynn-dlx/">https://code.google.com/p/blynn-dlx/</a> </p> </li> </ul> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-57871330027631163832013-11-12T23:24:00.000-08:002013-11-12T23:31:21.594-08:00Lies, damned lies, and frequentist statistics<a name="preamble"></a> <p>Earlier this year <a href="http://benlynn.blogspot.com/2013/02/probability-made-less-uneasy.html">I rekindled an interest in probability theory</a>. In my classes, Bayes' theorem was little more than a footnote, and we drilled frequentist techniques. Browsing a few books led me to question this. In particular, though parts of Jaynes' "Probability Theory: The Logic of Science" sounded like a conspiracy theory at first, I was soon convinced that the author’s militant condemnation of frequentism was justified.</p> <p>Today, I had the pleasure of reading <a href="http://www.nature.com/news/weak-statistical-standards-implicated-in-scientific-irreproducibility-1.14131">a <em>Nature</em> article</a> about <a href="http://www.pnas.org/content/early/2013/10/28/1313476110.full.pdf">a paper by Valen E. Johnson directly comparing Bayesian and frequentist methods in scientific publications</a>, who suggests the latter is responsible for a plague of irreproducible findings. 
I felt vindicated; or rather, I felt I had several more decibels of evidence for the hypothesis that Bayesian methods produce far better results than frequentist methods when compared against the hypothesis that the two methods produce equivalent results!</p> <p><a href="http://telescoper.wordpress.com/2013/11/12/the-curse-of-p-values/">This post explains it well</a>. In short, frequentist methods have led to bad science.</p> <p>An apologist might retort that it’s actually the fault of bad scientists, who are misusing the methods due to insufficient understanding of the theory. There may be some truth here, but I still argue that Bayesian probability should be taught instead. I need only look at my undergraduate probability and statistics textbook. On page 78, I see the 0.05 P-value convention castigated by Johnson, right after recipe-like instructions for computing a P-value. If other textbooks are similar, no wonder scientists are robotically misapplying frequentist procedures and generating garbage.</p> <p>Johnson’s recommended fix of using 0.005 instead of 0.05 is curious. I doubt it has firm theoretical grounding, but perhaps the nature of the data that most scientists collect means that this rule of thumb will usually work well enough. Then again, striving for the arbitrary 0.005 standard may require excessive data: a Bayesian method might yield similar results with less input. I guess it’s an expedient compromise. Those with a poor understanding of statistical inference can still obtain decent results, at the cost of gathering more data than necessary.</p> <p>The above post also mentions <a href="http://arxiv.org/abs/1310.3791">a paper describing how even a correctly applied frequentist technique leads to radically different inferences from a Bayesian one</a>. 
The intriguing discussion within is beyond me, but I’m betting Bayesian is better; or rather, the prior I’d assign to the probability that Bayesian inference will one day be shown to be better is extremely close to one!</p> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-51002414808380960602013-11-03T23:33:00.000-08:002013-11-03T23:33:06.133-08:00Crit-bit tree micro-optimizations<a name="preamble"></a> <p>Long ago, I implemented <a href="http://en.wikipedia.org/wiki/Radix_tree">crit-bit trees (one of the many names of this data structure)</a> after skimming a few online descriptions. I made obvious choices and put little effort into optimization. It worked well enough for my projects.</p> <p>Earlier this year, I studied <a href="https://github.com/agl/critbit">Adam Langley’s crit-bit tree library</a>, and was inspired to write a new crit-bit tree library from scratch. Micro-optimizations are fun! And surely my rebooted library would be faster thanks to ingenious techniques such as <a href="http://en.wikipedia.org/wiki/Tagged_pointer">tagged pointers</a> and fancy <a href="http://graphics.stanford.edu/~seander/bithacks.html">bit-twiddling</a> (notably a <a href="http://en.wikipedia.org/wiki/SWAR">SWAR hack</a> to find the critical bit).</p> <p>On my machine, the benchmarks show improvement in space <strong>and</strong> time. Iteration is much slower since I opted to forgo linked-list pointers <a href="http://cr.yp.to/critbit.html">as advocated by Bernstein</a>. Without them, we must travel up and down the tree to go from one leaf node to the next.</p> <p>However, an application wishing to visit every element should not do this: it should instead call the <tt>allprefixed</tt> routine with a blank prefix. 
The difference between <tt>allprefixed</tt> and my original library’s <tt>iterate</tt> is small enough that I’m inclined to agree with Bernstein: we’re better off without auxiliary next and previous pointers.</p> <p>When I modified the benchmark program to measure the C++ standard classes, I changed <tt>char*</tt> to <tt>string</tt>, and an array to a vector, which are natural choices for C++ and should not significantly hurt running times. As theory suggests, <tt>map</tt> insertion and lookup are slower, while <tt>unordered_map</tt> is much faster; as long as the drawbacks of the latter are borne in mind, it may be a reasonable choice for certain applications.</p> <p>I was pleased my rewritten library sometimes seemed a touch faster than the library that inspired it ("critbit" in the tables below), especially since my library implements a map and not just a set. The extra optimizations it performs are listed in comments in the C file. My guess is that most of them do almost nothing, and "unrolling" the child pointer lookup is largely responsible for the gains.</p> <p>I’m naming my code <a href="https://github.com/blynn/blt">the BLT library</a> for now. I already used up my first choice in my old "cbt" library. Though immodest, "Ben Lynn Trees" is easy for me to remember. Also, the name coincides with <a href="http://en.wikipedia.org/wiki/BLT">a delicious sandwich</a>.</p> <p>In the following tables, all entries are in seconds, except for the overhead, which is measured in bytes. I used <a href="https://code.google.com/p/gperftools/">tcmalloc</a> to get the best numbers.</p> <div> <table rules="all" width="100%" frame="border" cellspacing="0" cellpadding="4"> <caption><b>Table 1. 
</b>Keys: <tt>/usr/share/dict/words</tt></caption> <thead> <tr> <th align="left" width="16%" valign="top"></th> <th align="left" width="16%" valign="top">BLT</th> <th align="left" width="16%" valign="top">cbt</th> <th align="left" width="16%" valign="top">critbit</th> <th align="left" width="16%" valign="top">map</th> <th align="left" width="16%" valign="top">unordered_map</th> </tr> </thead> <tbody> <tr> <td align="left" width="16%" valign="top"><p>insert</p></td> <td align="left" width="16%" valign="top"><p>0.062939</p></td> <td align="left" width="16%" valign="top"><p>0.085081</p></td> <td align="left" width="16%" valign="top"><p>0.070779</p></td> <td align="left" width="16%" valign="top"><p>0.086181</p></td> <td align="left" width="16%" valign="top"><p>0.050796</p></td> </tr> <tr> <td align="left" width="16%" valign="top"><p>get</p></td> <td align="left" width="16%" valign="top"><p>0.044096</p></td> <td align="left" width="16%" valign="top"><p>0.046323</p></td> <td align="left" width="16%" valign="top"><p>0.053218</p></td> <td align="left" width="16%" valign="top"><p>0.077201</p></td> <td align="left" width="16%" valign="top"><p>0.034249</p></td> </tr> <tr> <td align="left" width="16%" valign="top"><p>allprefixed</p></td> <td align="left" width="16%" valign="top"><p>0.013312</p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p>0.015073</p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p></p></td> </tr> <tr> <td align="left" width="16%" valign="top"><p>iterate</p></td> <td align="left" width="16%" valign="top"><p>0.031755</p></td> <td align="left" width="16%" valign="top"><p>0.006420</p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p>0.006167</p></td> <td align="left" width="16%" valign="top"><p>0.004334</p></td> </tr> <tr> <td align="left" width="16%" valign="top"><p>delete</p></td> <td 
align="left" width="16%" valign="top"><p>0.048093</p></td> <td align="left" width="16%" valign="top"><p>0.050687</p></td> <td align="left" width="16%" valign="top"><p>0.051923</p></td> <td align="left" width="16%" valign="top"><p>0.009572</p></td> <td align="left" width="16%" valign="top"><p>0.011820</p></td> </tr> <tr> <td align="left" width="16%" valign="top"><p>overhead</p></td> <td align="left" width="16%" valign="top"><p>3966824</p></td> <td align="left" width="16%" valign="top"><p>6346992</p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p></p></td> </tr> </tbody> </table> </div> <div> <table rules="all" width="100%" frame="border" cellspacing="0" cellpadding="4"> <caption><b>Table 2. </b>Keys: output of <tt>seq 2000000</tt></caption> <thead> <tr> <th align="left" width="16%" valign="top"></th> <th align="left" width="16%" valign="top">BLT</th> <th align="left" width="16%" valign="top">cbt</th> <th align="left" width="16%" valign="top">critbit</th> <th align="left" width="16%" valign="top">map</th> <th align="left" width="16%" valign="top">unordered_map</th> </tr> </thead> <tbody> <tr> <td align="left" width="16%" valign="top"><p>insert</p></td> <td align="left" width="16%" valign="top"><p>2.675797</p></td> <td align="left" width="16%" valign="top"><p>3.048633</p></td> <td align="left" width="16%" valign="top"><p>2.683036</p></td> <td align="left" width="16%" valign="top"><p>3.175450</p></td> <td align="left" width="16%" valign="top"><p>1.457466</p></td> </tr> <tr> <td align="left" width="16%" valign="top"><p>get</p></td> <td align="left" width="16%" valign="top"><p>2.328416</p></td> <td align="left" width="16%" valign="top"><p>2.477317</p></td> <td align="left" width="16%" valign="top"><p>2.510085</p></td> <td align="left" width="16%" valign="top"><p>2.913757</p></td> <td align="left" width="16%" valign="top"><p>0.919519</p></td> </tr> <tr> <td 
align="left" width="16%" valign="top"><p>allprefixed</p></td> <td align="left" width="16%" valign="top"><p>0.296389</p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p>0.303059</p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p></p></td> </tr> <tr> <td align="left" width="16%" valign="top"><p>iterate</p></td> <td align="left" width="16%" valign="top"><p>0.917041</p></td> <td align="left" width="16%" valign="top"><p>0.171143</p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p>0.193081</p></td> <td align="left" width="16%" valign="top"><p>0.131707</p></td> </tr> <tr> <td align="left" width="16%" valign="top"><p>delete</p></td> <td align="left" width="16%" valign="top"><p>2.352568</p></td> <td align="left" width="16%" valign="top"><p>2.522654</p></td> <td align="left" width="16%" valign="top"><p>2.250305</p></td> <td align="left" width="16%" valign="top"><p>0.299835</p></td> <td align="left" width="16%" valign="top"><p>0.316262</p></td> </tr> <tr> <td align="left" width="16%" valign="top"><p>overhead</p></td> <td align="left" width="16%" valign="top"><p>79999984</p></td> <td align="left" width="16%" valign="top"><p>128000048</p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p></p></td> <td align="left" width="16%" valign="top"><p></p></td> </tr> </tbody> </table> </div> <p>Try BLT now!</p> <ul> <li> <p> <a href="https://github.com/blynn/blt">https://github.com/blynn/blt</a> </p> </li> </ul> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-8145550492928780402013-07-21T21:16:00.000-07:002013-07-21T21:16:45.059-07:00Inheritance Quiz<a name="preamble"></a> <ol type="1"> <li> <p> This code uses implementation inheritance. Where’s the bug? 
</p> <div> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>class Point {<br /> int x, y;<br /> Point(int x, int y) {<br /> this.x = x;<br /> this.y = y;<br /> display();<br /> }<br /> void display() {<br /> System.out.println(x + " " + y);<br /> }<br />}<br /><br />class CPoint extends Point {<br /> Color c;<br /> CPoint(int x, int y, Color c) {<br /> super(x, y);<br /> this.c = c;<br /> }<br /> void display() {<br /> System.out.println(x + " " + y + " " + c.name());<br /> }<br />}</pre></td></tr></table> </div> </li> <li> <p> This code uses implementation inheritance. Where’s the bug? </p> <div> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>/** A version of Hashtable that lets you do<br /> * table.put("dog", "canine");, and then have<br /> * table.get("dogs") return "canine". **/<br /><br />public class HashtableWithPlurals extends Hashtable {<br /><br /> /** Make the table map both key and key + "s" to value. **/<br /> public Object put(Object key, Object value) {<br /> super.put(key + "s", value);<br /> return super.put(key, value);<br /> }<br />}</pre></td></tr></table> <p><strong>Hint</strong>: After <tt>put("dog", foo)</tt>, although "dogs" is in the table as expected, sometimes "dogss" is also in the table. 
Why?</p> </div> </li> <li> <p> Consider this object buffer in Java: </p> <div> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>public class Buffer {<br /> protected Object[] buf;<br /> protected int MAX;<br /> protected int current = 0;<br /><br /> Buffer(int max) {<br /> MAX = max;<br /> buf = new Object[MAX];<br /> }<br /> public synchronized Object get()<br /> throws Exception {<br /> while (current<=0) { wait(); }<br /> current--;<br /> Object ret = buf[current];<br /> notifyAll();<br /> return ret;<br /> }<br /> public synchronized void put(Object v)<br /> throws Exception {<br /> while (current>=MAX) { wait(); }<br /> buf[current] = v;<br /> current++;<br /> notifyAll();<br /> }<br />}</pre></td></tr></table> <p>Use inheritance to extend this class to support the <tt>gget()</tt> method, which is identical to <tt>get()</tt> except it cannot be executed immediately after a <tt>get()</tt>. In other words, it blocks until a <tt>put()</tt> or a <tt>gget()</tt> finishes.</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>public class HistoryBuffer extends Buffer {<br /> // What goes here?<br />}</pre></td></tr></table> </div> </li> </ol> <hr> <h2><a name="_answers"></a>Answers</h2> <ol type="1"> <li> <p> See "<a href="http://www.cs.cornell.edu/andru/papers/masked-types.html">Masked types for sound object initialization</a>" for the answer. Did you know initialization of superclasses in Java and C++ is unsound? 
</p> <div> <p>In Go, which lacks inheritance, we might write:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>type Point struct {<br /> x, y int<br />}<br /><br />func NewPoint(x, y int) *Point {<br /> p := &Point{x, y}<br /> p.Display()<br /> return p<br />}<br /><br />func (p Point) Display() {<br /> println(p.x, p.y)<br />}<br /><br />type CPoint struct {<br /> x, y int<br /> c Color<br />}<br /><br />func NewCPoint(x, y int, c Color) *CPoint {<br /> p := &CPoint{x, y, c}<br /> p.Display()<br /> return p<br />}<br /><br />func (p CPoint) Display() {<br /> println(p.x, p.y, p.c.Name())<br />}</pre></td></tr></table> <p>We must write factory methods instead of constructors. However, the language makes it easy to avoid the bug.</p> </div> </li> <li> <p> This question was taken from "<a href="http://norvig.com/java-iaq.html">The Java IAQ</a>", which also explains the answer. </p> <div> <p>In Go, which lacks inheritance, we might write:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>type PluralsTable struct {<br /> tab *Table<br />}<br /><br />func (t *PluralsTable) Put(key string, value interface{}) interface{} {<br /> t.tab.Put(key + "s", value)<br /> return t.tab.Put(key, value)<br />}<br /><br />func (t *PluralsTable) Get(key string) interface{} {<br /> return t.tab.Get(key)<br />}<br /><br />...</pre></td></tr></table> <p>We must define a wrapper for each method of Table that we want PluralsTable to support. Additionally, if we have code that should work on PluralsTable and Table, we must define an <tt>interface</tt> with Put and Get. However, the language makes it easy to avoid the surprise recursion bug.</p> <p>We also avoid other kinds of bugs. 
For example, suppose we add PutFoo() to Table, which inserts "foo" into the table in a fast but strange way: the hash of "foo" is precomputed, so that the entry can be directly inserted into the underlying array.</p> <p>With inheritance, calling PutFoo() on a PluralsTable will silently succeed, but neglect to insert "foos", a bug that might only be noticed long after a release. Without inheritance, the program will fail to compile when code calls PutFoo() on a PluralsTable, at which point a human can intervene and supply the PluralsTable edition of PutFoo().</p> </div> </li> <li> <p> Did you manage it without rewriting the entire class? If so, congratulations on solving <a href="http://eprints.soton.ac.uk/262297/1/anomalySurvey.pdf">the inheritance anomaly</a>! Otherwise, you probably wrote something akin to the solution given in that paper: </p> <div> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>public class HistoryBuffer extends Buffer {<br /> boolean afterGet = false;<br /><br /> public HistoryBuffer(int max) { super(max); }<br /><br /> public synchronized Object gget()<br /> throws Exception {<br /> while ((current<=0)||(afterGet)) {<br /> wait();<br /> }<br /> afterGet = false;<br /> return super.get();<br /> }<br /> public synchronized Object get()<br /> throws Exception {<br /> Object o = super.get();<br /> afterGet = true;<br /> return o;<br /> }<br /> public synchronized void put(Object v)<br /> throws Exception {<br /> super.put(v);<br /> afterGet = false;<br /> }<br />}</pre></td></tr></table> <p>Implementation inheritance mixes poorly with concurrent programming.</p> <p>In Go, this question takes a different character because concurrent object buffers are built-in types:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>type Buffer chan interface{}<br /><br />func NewBuffer(max int) Buffer {<br /> return make(chan interface{}, max)<br />}<br /><br />func (buf Buffer) Get() interface{} 
{<br /> return <-buf<br />}<br /><br />func (buf Buffer) Put(v interface{}) {<br /> buf <- v<br />}</pre></td></tr></table> <p>One solution to the HistoryBuffer problem is:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>type HistoryBuffer struct {<br /> ch chan interface{}<br /> get, gget, put, done chan bool<br />}<br /><br />func NewHistoryBuffer(max int) *HistoryBuffer {<br /> buf := &HistoryBuffer{<br /> make(chan interface{}, max),<br /> make(chan bool), make(chan bool), make(chan bool), make(chan bool),<br /> }<br /> go func() { // Synchronization logic.<br /> maybe_gget := buf.gget<br /> for {<br /> select {<br /> case <-maybe_gget:<br /> case <-buf.put:<br /> maybe_gget = buf.gget<br /> case <-buf.get:<br /> maybe_gget = nil<br /> }<br /> <-buf.done<br /> }<br /> }()<br /> return buf<br />}<br /><br />func (buf *HistoryBuffer) Get() interface{} {<br /> buf.get <- true<br /> v := <-buf.ch<br /> buf.done <- true<br /> return v<br />}<br /><br />func (buf *HistoryBuffer) Put(v interface{}) {<br /> buf.put <- true<br /> buf.ch <- v<br /> buf.done <- true<br />}<br /><br />func (buf *HistoryBuffer) GGet() interface{} {<br /> buf.gget <- true<br /> v := <-buf.ch<br /> buf.done <- true<br /> return v<br />}</pre></td></tr></table> <p>Most of the synchronization logic resides in the anonymous function. Outside, we only have channel writes at the beginning and end of each method, comparable to Java’s <tt>synchronized</tt> keyword. Thus behaviour is decoupled from synchronization.</p> <p>We can effortlessly and independently change the synchronization logic. Suppose now GGet() can only be called after 3 Put() calls and 2 Get() calls have completed in some order since the previous GGet(), or, if there are no previous GGet() calls, since the program started. 
We simply change the anonymous function:</p> <table border="0" bgcolor="#e8e8e8" width="100%" cellpadding="10"><tr><td> <pre>go func() {<br /> var maybe_gget chan bool<br /> p, g := 0, 0<br /> for {<br /> select {<br /> case <-maybe_gget:<br /> p, g = 0, 0<br /> maybe_gget = nil<br /> case <-buf.put:<br /> p++<br /> case <-buf.get:<br /> g++<br /> }<br /> if p >= 3 && g >= 2 {<br /> maybe_gget = buf.gget<br /> }<br /> <-buf.done<br /> }<br />}()</pre></td></tr></table> <p>The <a href="http://golang.org/doc/effective_go.html">official Go documentation</a> suggests channels of ints instead of bools, which leads to slightly smaller code (mainly because we can replace <tt>true</tt> with <tt>1</tt>), and probably results in the same compiled binaries.</p> </div> </li> </ol> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com1tag:blogger.com,1999:blog-4222267598459829544.post-43089638311811701692013-03-14T00:04:00.000-07:002013-03-14T00:04:45.571-07:00MathJax<a name="preamble"></a> <p>About a decade ago, I began putting <a href="http://crypto.stanford.edu/pbc/notes/">my notes</a> on my homepage for the reasons cloud computing proponents love to spout (though I did it without uttering any buzzwords).</p> <p>But I hit a snag. How do I put equations on the web? Among the many awful workarounds, I picked the one which I thought was noblest: MathML. My pages would be static content, operable without JavaScript. Text is far slimmer than images, and far more agreeable to things like searching. As for PDF? Over my dead <body> element!</p> <p>I was optimistic back then. Mozilla supported MathML provided you also downloaded a font or two, and despite the crushing dominance of Internet Explorer, I felt that righteous Free Software would ultimately triumph. One day, I hoped, a typical browser would render my site perfectly, out of the box.</p> <p>Turns out my predictions were half right. The web broke free of Internet Explorer’s chokehold.
Now, more often than not, we use open source browsers. And one of them, Firefox, supports MathML out of the box.</p> <p>However, my mathematics notes still render incorrectly on most browsers. Popular search engines appear to shun them, possibly because I zealously followed the arcane XHTML 1.1 plus MathML guidelines. And everything supports JavaScript.</p> <p>Maybe they’re all going to support MathML real soon, but ten years is too long for me. I switched to <a href="http://www.mathjax.org">MathJax</a>, a clever JavaScript library that figures out what your system can do, then renders the equations using an appropriate technique. It just works.</p> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-80518521699425683852013-02-18T01:32:00.000-08:002018-11-16T13:09:17.688-08:00Probability Made Less Uneasy<a name="preamble"></a> <p>I’ve been leafing through a few books on probability, a subject which I’ve mostly avoided since undergrad. I originally thought I’d just refresh what I had already learned, but to my surprise I was led to reconsider fundamental beliefs. What follows is my journey told via book reviews.</p> <hr> <h2><a name="_em_hexaflexagons_and_other_mathematical_diversions_em_by_martin_gardner"></a><em>Hexaflexagons and Other Mathematical Diversions</em> by Martin Gardner</h2> <p>As a kid, I devoured this book and the others in the series, which I later learned were collections of <em>Mathematical Games</em> columns from <em>Scientific American</em> magazine. I didn’t always understand the material, and the puzzles were often too difficult, but Gardner’s writing skill kept me reading on.</p> <p>Among the many fascinating chapters was “Probability Paradoxes”. Gardner’s ability to communicate was so strong that after many years I still remember much of the content. In particular, he asked:</p> <blockquote> <p>Mr. Smith says, "I have two children and at least one of them is a boy."
What is the probability that the other child is a boy?</p> <p align="right"> </p> </blockquote> <p>and his explanation of 1/3 being the correct answer not only stuck in my mind, but shaped my early views on probability. For the details, see this <a href="http://www.newscientist.com/article/dn18950-magic-numbers-a-meeting-of-mathemagical-tricksters.html"><em>New Scientist</em> article on a Martin Gardner convention</a>.</p> <p>Only a few years ago, after a debate with a friend, did I reconsider the reasoning. It turns out <a href="http://www.sciencenews.org/view/generic/id/60598/description/When_intuition_and_math_probably_look_wrong">Gardner’s statement of the problem is ambiguous</a>. This revelation sparked a desire to hit the books and brush up on probability one day.</p> <hr> <h2><a name="_em_a_primer_of_statistics_em_by_m_c_phipps_and_m_p_quine"></a><em>A Primer of Statistics</em> by M.C. Phipps and M.P. Quine</h2> <p>The second edition of this slim volume was the textbook for my first course on probability. I used it to cram for exams. For this purpose, it was good: I got decent grades.</p> <p>Sadly, it wasn’t as good in other respects. I acquired a distaste for the subject. Why did Probability and Statistics seem like a bag of ad hoc tricks, with few explanations given? Do I have poor intuition for it? Or is it glorified guesswork that seems to work well enough with real-life data? Whatever the reason, I decided that for the rest of my degree I’d steer towards the Pure Mathematics offerings.</p> <hr> <h2><a name="_em_the_signal_and_the_noise_why_so_many_predicitons_fail_8201_8212_8201_but_some_don_8217_t_em_by_nate_silver"></a><em>The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t</em> by Nate Silver</h2> <p>My renewed interest in probability was also sparked by the United States presidential election of 2012, or rather, its aftermath.
Many had predicted its outcome but few were accurate.</p> <p>It was only then that I read about Nate Silver, who turned out to have been famous for his prowess with predictions for quite some time. Eager to learn more, I thumbed through his bestseller.</p> <p>Though the book is necessarily light on theory, the equations that do appear are correct and lucidly explained. Also, the pages are packed with interesting data sets and anecdotes. General pronouncements are often backed up with concrete tables and graphs, though, as Silver readily admits, some qualities are difficult to quantify, resulting in potentially dubious but novel yardsticks (such as measuring scientific progress by average research and development expenditure per patent).</p> <p>But most of all, I was intrigued by the tale of an ongoing conflict that I never knew existed, with frequentists on one side and Bayesians on the other. They never told me this in school!</p> <p>I soon found out why: Silver states that Fisher may be almost single-handedly to blame for the dominance of frequentism, the ideology foisted upon me when I was just out of high school. Sure enough, I went back and confirmed that Phipps and Quine listed Fisher in the bibliography.</p> <hr> <h2><a name="_em_against_the_gods_the_remarkable_story_of_risk_em_by_peter_l_bernstein"></a><em>Against the Gods: The Remarkable Story of Risk</em> by Peter L. Bernstein</h2> <p>My dad told me about this book. Technical details are scant as it is also aimed at the general public. But in contrast to Silver’s work, what little appears is laughably erroneous. In some sections, I felt the author was trying to trick himself into believing fallacies.</p> <p>The misinformation might be mostly harmless.
Those with weak mathematical ability are going to skip the equations out of fear, and those with strong mathematical ability are probably also going to skip them because they already know them.</p> <p>But conceivably this book could be a gifted reader’s first introduction to probability, and it’d be a shame to start off on the wrong foot. As a sort of public service, I’ll explain some of the gaffes.</p> <p><strong>Exercises</strong></p> <p>Chapter 6 contains an example expected value calculation involving a coin flip.</p> <blockquote> <p>We multiply 50% by one for heads and do the same for the tails, take the sum---100%---and divide by two. The expected value of betting on a coin toss is 50%. You can expect either heads or tails, with equal likelihood.</p> <p align="right"> </p> </blockquote> <p>Why is this wrong? How can we fix it?</p> <p>The next example involves rolling two dice.</p> <blockquote> <p>If we add the 11 numbers that might come up…the total works out to 77. The expected value of rolling two dice is 77/11, or exactly 7.</p> <p align="right"> </p> </blockquote> <p>Why is this wrong? How can we fix it?</p> <p><strong>What’s the difference?</strong></p> <p>Bernstein and Silver offer competing reasons why modern civilization differs from the past. Bernstein singles out our relatively newfound ability to quantify risk, and also suggests that key intermediate steps could only have occurred at certain points in history due to the overall mood of the era.</p> <p>In contrast, Silver seems to place most importance on the printing press. In an early chapter, Silver suggests that after some teething trouble (lasting 330 years), the printing press paved the way for modern society. Apart from distribution of knowledge, perhaps more importantly the printing press helped with the preservation of knowledge; previously, writing would often be lost before it could be copied.</p> <p>I’m inclined to side with Silver, partly because of Bernstein’s basic technical mistakes. 
After observing how fast and loose Bernstein was playing with mathematics, I’m tempted to believe some of his statements are gut feelings.</p> <p>There is another glaring difference. Bernstein’s book lacks any mention of the frequentist-Bayesian war. Fisher’s name is conspicuously absent.</p> <p><strong>For or Against?</strong></p> <p><em>Against the Gods</em> is riveting. My favourite feature is the backstories of famous scholars. For some of them, before reading the book, the only thing I knew about them was their names, and I would have known even less if their names weren’t attached to their most famous discoveries (or at least, discoveries vaguely connected with them). Learning about their lives, motivations, temperaments, beliefs, and so on was illuminating. An intellectually superior form of gossip, I suppose.</p> <p>However, the elementary mathematical mistakes ultimately cast a cloud of suspicion over the book. How reliable are the author’s assertions in general? Although I heartily recommend <em>Against the Gods</em>, I also recommend thorough fact-checking before using it as a reference.</p> <p>So a tip for bestseller authors: if a section is technical, then ask an expert, be an expert, or cut it out. Too many howlers make readers like me wary of the whole, no matter how well-written and accurate the non-technical parts are.</p> <p><strong>Answers to exercises</strong></p> <p>As Bernstein himself implies, an expected value is a weighted average. We need weights, <em>and</em> we need numbers to sum. It takes two to tango; the expected value dance can only proceed if probabilities are accompanied by values.</p> <p>One example neglects the values, and the other neglects the probabilities. The author only computes the sum of the weights for the coin flip, and the sum of the values for the dice roll.
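</p>
<p>For contrast, the correct recipe is short enough to write down in code. Here is a minimal Go sketch of a weighted average; the fair and loaded two-dice figures are my own illustration, not from the book:</p>

```go
package main

import "fmt"

// expected returns the weighted average of values, where probs[i] is the
// probability of observing values[i]. The probabilities must sum to 1.
func expected(values, probs []float64) float64 {
	sum := 0.0
	for i, v := range values {
		sum += probs[i] * v
	}
	return sum
}

func main() {
	// Sum of two fair dice: outcomes 2..12 with the usual triangular weights
	// (1/36, 2/36, ..., 6/36, ..., 1/36).
	values := []float64{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
	fair := []float64{1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1}
	for i := range fair {
		fair[i] /= 36
	}
	fmt.Printf("fair: %.2f\n", expected(values, fair))

	// Two loaded dice that almost always show 6: all weight sits on outcome 12.
	loaded := []float64{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}
	fmt.Printf("loaded: %.2f\n", expected(values, loaded))
}
```

<p>The fair dice give 7.00 while the loaded pair gives 12.00: same values, very different weights, so the weights cannot be ignored.</p>
<p>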
In both cases the author divides by the number of outcomes, which might be considered another error: we already divided by the number of outcomes to compute the weights (probabilities) in the first place.</p> <p>Why are these blunders amusing? For the coin example, let’s ignore that the expected value is confused with a probability. Instead of a coin, consider winning the lottery. The probability of winning the lottery plus the probability of not winning the lottery sums to 100%. Dividing this by the number of outcomes, i.e. 2, yields 50%, so apparently we win or lose the lottery with equal likelihood! It’s almost like saying “either it happens or it doesn’t happen, so the chances it happens is 50%”.</p> <p>For the dice example, imagine rolling 2 loaded dice, both of which almost always show 6. The expected value should be close to 12, but because the probabilities are completely ignored, the author’s procedure leads to the same expected value of 7. Surely your calculation should change if the dice are loaded?</p> <p>How do we fix these problems? For the dice example, the author supplies the correct method in the very next paragraph. At last, both the probabilities and values are taken into account. Unfortunately, the author then concludes:</p> <blockquote> <p>The expected value…is exactly 7, confirming our calculation of 77/11. Now we can see why a roll of 7 plays such a critical role in the game of craps.</p> <p align="right"> </p> </blockquote> <p>This should have never been written. The first sentence suggests both methods for computing the expected value are valid, when of course it just so happens the wrong method leads to the right answer.</p> <p>The second sentence is difficult to interpret. Perhaps uncharitably, I’m guessing the sentence is an upgraded version of: “Look! Here’s a 7! Didn’t we see a 7 earlier?” What would have been written if we rolled a single die? 
The expected value is 3.5, but a roll of 3.5 obviously has no role in any game we play with one die.</p> <p>As for fixing the coin example: computing an expected value requires us to attach a numerical value to each outcome. One does not simply plow ahead with “heads” versus “tails”. We need numbers; any numbers. We could assign 42 to heads, and 1001 to tails; here, the expected value of a fair coin toss would be 50% of 42 plus 50% of 1001, which is 521.5. Typically we pick values relevant to the problem at hand: for instance, in a game where we earn a dollar for flipping heads, and lose a dollar for tails, we’d assign the values 1 and -1 (here, our expected winnings would be 0).</p> <p>[It may be possible to reinterpret the coin example as assigning the value 1 to both heads and tails. But if this were done, the expected value should also be 1, not “50%”. Furthermore, we learn nothing if the outcomes are indistinguishable.]</p> <hr> <h2><a name="_em_probability_theory_the_logic_of_science_em_by_e_t_jaynes"></a><em>Probability Theory: The Logic of Science</em> by E. T. Jaynes</h2> <p>If only Jaynes' book had been my introduction to probability. Like a twist ending in a movie, reading it was a thought-provoking eye-opening earth-shattering experience that compelled me to re-evaluate what I thought I knew.</p> <p>Whereas Silver presents whimsical examples that demonstrate the Bayesian approach, Jaynes forcefully argues for its theoretical soundness. From a few simple intuitive “desiderata” (too ill-defined to be axioms), Jaynes shows step-by-step how they imply more familiar probability axioms, and why the Bayesian approach is the natural choice. And all this happens within <a href="http://bayes.wustl.edu/etj/prob/book.pdf">the first 3 chapters, which are free online</a>.</p> <p>I had been uneasy about probability because I thought it was a collection of mysterious hacks, perhaps because it had to deal with the real world. 
I was flabbergasted to learn probability could be put on the same footing as formal logic. All those hacks can be justified after all. Probability is not just intuition and duct tape: it can be as solid as any branch of mathematics.</p> <p>Since there still exist competing philosophies of probability, presumably others find fault with Jaynes' arguments. I’m still working through it, but I’m convinced for now. If there’s another twist in this story, I’ll need another great book to show it to me.</p> <p>Washington University in St. Louis maintains <a href="http://bayes.wustl.edu/">a page dedicated to Jaynes</a>. It’s a shame he died before he finished writing. The remaining holes have been papered over with exercises, which explains their depth and difficulty.</p> <p>It’s also a shame Jaynes left Stanford University many years ago. Had he stayed, with luck I would have discovered his work earlier, or even have met him. <a href="http://bayes.wustl.edu/etj/articles/backward.look.pdf"><em>A backward look to the future</em></a> describes his reasons for departure.</p> <p>In short, Jaynes felt the “publish or perish” culture of academia was harmful and was taking over Stanford. I can’t tell if Jaynes was right because by the time I got into the game, this culture seemed universally well-established. I had no idea an alternative ever existed.</p> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com2tag:blogger.com,1999:blog-4222267598459829544.post-88931346649571590632012-09-16T15:48:00.000-07:002012-09-16T15:48:22.738-07:00Programming Dominion<a name="preamble"></a> <p>I recently learned to play <a href="http://en.wikipedia.org/wiki/Dominion_(card_game)">Dominion</a>, a game that spawned a genre known as deck-building card games. I’m a terrible player. While suffering defeats at the hands of a simple AI, I realized I might have more fun writing a Dominion-playing program.</p> <p>Implementing just the basic rules is a boring exercise. 
Luckily, Dominion is a self-modifying game. For example, each turn, you’re supposed to start with one Action and 5 cards in your hand, but there are ways of increasing your Action count, or changing the number of cards in your hand.</p> <p>Moreover, rule modifications interact with one another, further increasing complexity. For example, playing Witch causes other players to gain a Curse card, but not if the supply of Curse cards is exhausted, or a player is holding a Moat. Or take Throne Room, which plays another Action card twice. How can we design software to handle so many special cases?</p> <p>Of course, sufficient spaghetti can get anything working. But we should try to minimize mess; ideally the logic for each card should be as isolated as possible. It’d be awful if, say, Throne Room required us to bury code somewhere in the Action-playing routine so it runs twice instead of once.</p> <p><strong>Dominion in Go</strong></p> <p>I’m reasonably pleased with my first attempt. For the simplest cards, the logic is completely contained in a string, in a tiny domain-specific language:</p> <pre>Village,3,Action,+C1,+A2<br />Woodcutter,3,Action,+B1,$2</pre><p>Less trivial cards require a bit more:</p> <pre>case "Feast":<br /> add(func(game *Game) {<br /> p := game.NowPlaying()<br /> game.trash = append(game.trash, p.played[len(p.played)-1])<br /> p.played = p.played[:len(p.played)-1]<br /> pickGain(game, 5)<br /> })</pre><p>And that’s it! To add a card, just one string, and maybe one block of code. As time passed, it became easier to add new cards. For some cards, it was more like data entry than programming.</p> <p>Moat is an exception. It is the only Reaction card in the Base set, so rather than figure out a clean way to implement it, I sprinkle ad hoc code here and there to get it working. If I were to add more Reaction cards, I’d factor out the common parts. There’s no reason to do so pre-emptively.
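</p>
<p>As an aside, to give a flavour of the little DSL above, here is a hypothetical Go sketch of how such card strings might be decoded. The names and structure are my own guesses, not the repo’s actual parser; I read the codes as +C for cards, +A for actions, +B for buys, and $ for coins:</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Card is a hypothetical decoded form of one card string; the real repo's
// representation may differ.
type Card struct {
	Name                        string
	Cost                        int
	Kind                        string
	Cards, Actions, Buys, Coins int
}

// parseCard decodes a line like "Village,3,Action,+C1,+A2".
func parseCard(s string) (Card, error) {
	f := strings.Split(s, ",")
	if len(f) < 3 {
		return Card{}, fmt.Errorf("want at least 3 fields: %q", s)
	}
	cost, err := strconv.Atoi(f[1])
	if err != nil {
		return Card{}, err
	}
	c := Card{Name: f[0], Cost: cost, Kind: f[2]}
	for _, eff := range f[3:] {
		var dst *int // field the effect modifies
		var num string
		switch {
		case strings.HasPrefix(eff, "+C"):
			dst, num = &c.Cards, eff[2:]
		case strings.HasPrefix(eff, "+A"):
			dst, num = &c.Actions, eff[2:]
		case strings.HasPrefix(eff, "+B"):
			dst, num = &c.Buys, eff[2:]
		case strings.HasPrefix(eff, "$"):
			dst, num = &c.Coins, eff[1:]
		default:
			return Card{}, fmt.Errorf("unknown effect %q in %q", eff, s)
		}
		if *dst, err = strconv.Atoi(num); err != nil {
			return Card{}, err
		}
	}
	return c, nil
}

func main() {
	for _, line := range []string{"Village,3,Action,+C1,+A2", "Woodcutter,3,Action,+B1,$2"} {
		c, err := parseCard(line)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", c)
	}
}
```

<p>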
In fact, that’s what happened with other cards: I would only refactor once there was duplicate code to eliminate.</p> <p>Intrepid readers can browse my git repo: <a href="https://github.com/blynn/gominion.git">https://github.com/blynn/gominion.git</a></p> <p>But beware. It’s all in one untidy monolithic file, the UI is horrible, and the AI is stupid, though it still beats me when I get too greedy with Action cards! The game state is shared by all players. If network play were added, to prevent cheating, information would need to be more tightly controlled.</p> <p>I have no plans to work much more on this, as many mature implementations already exist, and Rio Grande Games plans to release an official online version soon. All the same, I highly recommend learning to play Dominion, and then trying to program it. Both are enlightening experiences.</p> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-58285654033401757132012-08-26T21:01:00.000-07:002012-08-26T21:01:47.478-07:00Smashing the non-executable stack for fun and profit<a name="preamble"></a> <p>In 1996, Elias Levy ("Aleph One") published "Smashing The Stack For Fun And Profit" in Phrack magazine. The article showed how to overflow a buffer to launch a shell.</p> <p>I’m almost ashamed I never took a closer look for over a decade. My background would suggest I’d be one of the early adopters. As a kid, I loved messing with assembly language and poking around the system. I collected computer viruses. I bypassed copy protection systems. I knew how to make free phone calls. In grad school, my advisor and my colleagues taught <a href="http://crypto.stanford.edu/cs155/">a computer security class</a>, where rooting a system by smashing the stack was a homework assignment.</p> <p>With pride, and relief, I can now announce that at long last, in 2012, I have exploited a buffer overflow. 
Moreover, I have written a truly marvelous step-by-step guide to this, which this post is too narrow to contain. (I’m afraid of overflowing it.) I took notes because I encountered difficulties with other tutorials:</p> <ul> <li> <p> 32-bit systems are often assumed. My system is 64-bit. </p> </li> <li> <p> Various countermeasures are now enabled on stock installs. </p> </li> <li> <p> I wanted to try a newer variant of the attack known as return-oriented programming, which defeats one of the countermeasures. </p> </li> </ul> <p>Luckily my website has ample room. <a href="http://cs.stanford.edu/~blynn/rop/">Read now</a>, and get a bonus <a href="http://cs.stanford.edu/~blynn/rop/rop.sh">shell script</a> that demonstrates the attack!</p> Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-53481887886931948612012-08-08T12:12:00.000-07:002012-08-08T12:12:37.396-07:00Isn't Algebra Necessary?<a name="preamble"></a> <p>A recent <a href="http://www.nytimes.com/2012/07/29/opinion/sunday/is-algebra-necessary.html">New York Times article ponders if we should downgrade mathematics taught to high school and college students</a>, and in particular, cut basic algebra.</p> <p>Seriously? A horizontal line may represent an unknown word in those fill-in-the-blank primary school comprehension tests ("The dog’s name is <em>__</em>."), but a letter should never represent an unknown number lest it cause undue mental stress?</p> <p>Among my first thoughts was that the article was a professional troll posting. After all, <a href="http://www.nytimes.com/2012/07/27/business/media/the-new-york-times-co-posts-a-loss.html">The New York Times is sadly going through a rough patch</a>, and I sympathize if they must occasionally stoop lower to catch some extra cash. (If it is a troll posting, hats off! 
You got me.)</p> <p>But the truth is probably mundane; it seems the author genuinely believes that algebra should be dropped.</p> <p>On the one hand, this benefits me. If the article is taken seriously, and algebra is withheld from the masses, then those of us who know it possess formidable advantages. (The conspiracy theorist in me wonders if the author actually finds elementary algebra, well, elementary, and the true intent is to get ahead by encouraging everyone else to dumb down.)</p> <p>On the other hand, the piece smacks of ignorance-is-strength propaganda, and thus is worth smacking down.</p> <p><strong>Inflation</strong></p> <p>The article suggests that, instead of algebra, classes should perhaps focus on how the Consumer Price Index is computed. I agree studying this is important: for example, I feel more attention should be drawn to the 1996 recommendations of the Boskin commission. If the Fed did indeed repeat <a href="http://www.economist.com/node/387789">the mistakes of the 1970s</a>, then I should bump up the official US inflation rate when analyzing my finances. However, this stuff belongs to disciplines outside mathematics.</p> <p>More importantly, what use is the CPI without algebra? Take a simple example: say I owe you $1000, and the inflation rate is 5%. If all you care about is keeping up with inflation, is it fair if I pay you back $120 annually for 10 years? If not, what is the right amount?</p> <p>Without algebra, you might be able to figure that $1000 today is the same as 1000×(1.05)<sup>10</sup> = $1628.89 in 10 years. But how are you going to figure out that the yearly payment should be 0.05×1628.9/(1.05<sup>10</sup> - 1)? The easiest way to arrive here is to temporarily treat 1.05 as an abstract symbol. In other words, elementary algebra.
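</p>
<p>To check the arithmetic, here is a small Go sketch of the same annuity formula, with payments assumed to fall at the end of each year (my own illustration, not from the article):</p>

```go
package main

import (
	"fmt"
	"math"
)

// payment returns the level annual payment that repays principal over years
// at the given interest rate, with payments at the end of each year.
func payment(principal, rate float64, years int) float64 {
	growth := math.Pow(1+rate, float64(years))
	// The future value of the debt must equal the future value of the
	// payment stream: principal*growth = p*(growth-1)/rate. Solve for p.
	return rate * principal * growth / (growth - 1)
}

func main() {
	fmt.Printf("%.2f\n", payment(1000, 0.05, 10)) // about 129.50
}
```

<p>Running it prints 129.50, so the $120 offer above short-changes you by nearly $10 a year.</p>
<p>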
One does need to play this ballgame for personal finance after all.</p> <p>You might counter that an amortized loan calculator can work out the answer for you; there’s no need to understand how it works, right?</p> <p><strong>Ignorance begets fraud</strong></p> <p>In the above calculation, do I make my first payment today, or a year from now? Don’t worry, I’ll figure it out for you. Or perhaps I’ll claim you’re using the wrong mode on the calculator and helpfully retrieve the "right" formula for you.</p> <p>Maybe you’d avoid these shenanigans by entrusting an accountant to oversee deals like this. Okay, but what if it’s not a loan? Say you’re making a policy recommendation and I’m a disingenuous lobbyist: can you tell if I’m fudging my figures?</p> <p>I heard a story about Reagan’s SDI program. Scientists estimated a space laser required 10<sup>20</sup> units of energy, and current technology could generate 10<sup>10</sup> units. They got funding by saying they were halfway there.</p> <p>I hope this tale is apocryphal. Nevertheless, one can gouge the mathematically challenged just as unscrupulous salesmen rip off unwitting buyers. Unfortunately, with finance and government policy, damage caused by bad decisions can be far worse and longer lasting.</p> <p><strong>Fermat’s Last … Dilemma?</strong></p> <p>One bright spot in the article was the mention of "the history and philosophy of [mathematics], as well as its applications in early cultures". While not required to solve problems, knowing the background to famous discoveries makes a subject more fun.</p> <p>It is inspiring that within a few short school years we enjoy the fruits of thousands of years of labour. Perhaps a student struggling with negative numbers would feel better knowing that it took many generations for them to be socially acceptable.
For instance, the Babylonians were forced to divide the quadratic equation into different cases because they rejected negative numbers on philosophical grounds.</p> <p>But at the same time, we see a mention of "Fermat’s dilemma", which charitably is a creative renaming of "Fermat’s Last Theorem" (though more likely there was some confusion with the "Prisoner’s Dilemma" from game theory). The author chose this example poorly, because the history of Fermat’s Last Theorem actually bolsters the case for algebra. It shows how a little notation goes a long way.</p> <p>For Fermat did not use symbolic algebra to state his famous conjecture. Instead, he wrote:</p> <blockquote> <p>Cubum autem in duos cubos, aut quadrato-quadratum in duos quadrato-quadratos, et generaliter nullam in infinitum ultra quadratum potestatem in duos eiusdem nominis fas est dividere cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet.</p> <p align="right"> </p> </blockquote> <p>(If it took him that many words to state the theorem, no wonder he had no space for a proof!)</p> <p>We have it easy today. Mathematics would be considerably harder if you had to compute amortized loan payments with Latin sentences instead of algebra.</p> <p>How could a writer fail to appreciate algebra? Strunk taught that "vigorous writing is concise." Which is more concise: the above, or "x<sup>n</sup> + y<sup>n</sup> = z<sup>n</sup> has no positive integer solutions for n > 2"?</p> <p><strong>What should we learn?</strong></p> <p>Some time ago, I arrived at the opposite conclusion of the author, after reading confessions of professional academic ghostwriters. 
Algebra is fine; <a href="http://crypto.stanford.edu/~blynn/essay.html">the courses that need reform are those far removed from mathematics</a>.</p> <p>According to "Ed Dante", who is hopefully exaggerating, <a href="http://chronicle.com/article/The-Shadow-Scholar/125329/">you can pass such courses so long as you have Amazon, Google, Wikipedia, and a decent writing ability</a>. You get the same results and save money by paying for an internet connection instead of university tuition.</p> <p>I suppose I should also end on a positive note: I propose introducing ghostwriting courses, where the goal is to bluff your way through another course in the manner "Ed Dante" describes. The library would be off-limits, and you must not have previously studied the target subject. Perhaps the first 3 assignments can be admissions essays: one each for undergraduate, master’s and doctoral programs. Grading would be easy: if they fall for it, you get a good score.</p> <p>With luck, universities would be forced to either beef up the victim degrees (perhaps by assessing students with something besides essays, or by teaching something that cannot be immediately learned from the web), or withdraw them. Additionally, the students would learn the importance of writing, and be harder to fool.</p>Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com0tag:blogger.com,1999:blog-4222267598459829544.post-17843132002649263492012-08-05T17:39:00.000-07:002012-08-05T17:40:44.708-07:00Keeping up with yesterday<br />
My to-do list has grown frighteningly large. Perhaps I'll be more motivated to tackle it by publicly announcing a few of its entries.<br />
<br />
<ul>
<li>Apologies to those who sent me patches to my Git tutorial, or are awaiting email responses about the PBC library. I'll try to get around to them soon. And perhaps I'll even get back to working on the second edition of the printed version, which I originally planned to release 2 years ago!</li>
<li>I took notes on return-oriented programming on 64-bit Linux that I want to put up on my site somewhere. They've been almost ready for months.</li>
<li>Months ago, I also coded a logic puzzle solver that takes its input in a concise format. It's about ready for release.</li>
<li>In general, I want to rant and rave more over petty technical issues.</li>
</ul>
<br />
I'd better stop here, otherwise this list may also become too scary for me to look at.Ben Lynnhttp://www.blogger.com/profile/09117417699962852340noreply@blogger.com2