Benchmarking is Hard

kNN in F# (Benchmarking is hard)

I was reading over some old blog posts and stumbled upon a series of blog posts at Comparing a Machine Learning Algorithm Implemented in F# and OCaml and kNN Algorithm in golang and Haskell on "kNN example code" in golang and Haskell. Others on the /r/haskell reddit have already pointed out the apples-to-peaches comparison going on here, but I think this is a fine way to illustrate a few gotchas / features of F# that may not be super obvious.

To begin, here are some of the numbers on my machine: In the interests of "benchmarking is hard" and isolating just run time, these times do not include compile times (fsc is an order of magnitude slower) Also, the .NET runtime should be faster than Mono, but I haven't tested on a Windows machine.

$ time ./golang-k-nn
real 0m4.265s
$ time ./golang-k-nn-speedup
real 0.790s
$ time mono fsharp-k-nn.exe
real 18.803s

Optimizing this F# code:

At the core of the code, the kNN classifier just calls a distance function to compare two images, and selects the minimum value. This means any optimizations to the distance function go a long way towards net speedup of the code base. Here's the original F# code:

let distance (p1: int[]) (p2: int[]) =
    Math.Sqrt (float(Array.sum (Array.map2 ( fun a b -> (pown (a-b) 2)) p1 p2) ))

First off, pown foo 2 is going to be slower than foo * foo, F# does not optimize for that special case. Switching that out instantly gives us a 150% speed improvement:

let distance (p1: int[]) (p2: int[]) =
    Math.Sqrt (float(Array.sum (Array.map2 ( fun a b -> (a-b) * (a-b))) p1 p2) ))

$ time mono fsharp-k-nn.exe
> real 11.934s

Secondly, F# doesn't inline functions terribly well -- imperative-style code and tail-recursive functions tend to both generate much more efficient IL than and Array.sum. Here's a tail recursive version:

let distance (p1: int[]) (p2: int[]) =
    let rec iterate s = function
        | -1 -> sqrt (float s)
        | n -> let v = p1.[n] - p2.[n] in iterate (s + v * v) (n-1)
    iterate 0 (p1.Length-1)

$ time mono fsharp-k-nn.exe
> real 4.541s

Aha! We're in the ballpark of the naive golang implementation. Which makes sense, because the above code does basically the same sequence of operations as the golang version (except it should be slightly faster since it operates on int arrays instead of float arrays)

Now, lets see if we can apply the extra set of optimizations in the sped-up golang version to our F# code:


The optimized go version parallelizes the computation using goroutines, going from this:

total := 0
for _, test := range validationSample {
    if is_correct(test) {

to this:

total := 0
channel := make(chan float32)

for _, test := range validationSample {
    go func(t) {
        if is_correct(t) {
            channel <- 1
        } else {
            channel <- 0
for i := 0; i < len(validationSample); i++ {
    total += <- channel


FSharp's analog to goroutines is the async Monad computation expression. To parallelize the classification of validationSample, we change:

let num_correct =
    |> (fun p -> if (classify p.Pixels ) = p.Label then 1 else 0)
    |> Array.sum

by changing the map to return a an array of async objects, which Async.Parallel dispatches to a task queue and Async.RunSynchronously unwraps back into a simple int[] object.

let num_correct =
    |> (fun p -> async { return if (classify p.Pixels ) = p.Label then 1 else 0 })
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Array.sum

$ time mono fsharp-k-nn.exe
real    0m2.194s

Alternatively, the FSharp.Collections.ParallelSeq module from the Powerpack would be useful here, providing an in-place substitution into our original code:

let num_correct =
    |> (fun p -> if (classify p.Pixels ) = p.Label then 1 else 0)
    |> PSeq.sum

Changing our distance function further cuts the time down

let distance (p1: int[]) (p2: int[]) bailout =
    | n -> if s > bailout then s else let v = p1.[n] - p2.[n] in iterate (s + v * v) (n-1)

real 1.724s

At this point, we've improved the time from the inital F# version over ten-fold, by inlining the expensive parts of the distance function and using the same techniques that were applied to the golang implementation. While we're still a couple times slower than the golang implementation,

A Platformer Level Generator

A Platformer Level Generator

My second /r/proceduralgeneration contest entry! I spent less effort on this than the previous round, but apparently there was just a lot less participation in general because of the steep hurdle of having to create a game in order for creating a level to be particularly meaningful. Instead of using F#, I decided to go with Typescript for this round, since I wanted an interactive online demo level that people could play around with. This was my first time using Typescript but it turned out to be pretty nice to work with; The platformer/level generator wound up just shy of 300 lines of code, most of which were not involved in actual level generation.


Generating Platforms

The core of the generation function is at Github:level.ts. I basically roll a die, randomly chooseing a direction up, up-left, up-right, right, or down. Before going in any given direction, I make sure it doesn't interfere too much with the previous direction (for example, I wouldn't go up, then immediate go down.) To build the level, I start with an initial platform at 0, 0, pick a direction, build a random number of platforms out in that direction, and then repeat. In the following example code, for example, I would start with an initial platform, build right, then right again, then up-left.

    var platform = new Box(0, 0, 1, 1);
    platform = linkRight(platform, 3, 1, 1); // make a 1x1 box 3 units right from the first one
    platform = linkRight(platform, 1, 2, 10); // make a 2x10 box 1 unit away from the second box
    platform = linkUpl(platform); // make a box up and to the left from the third one

Series of Platforms


However, a level that goes in a random direction for every platform would turn out looking pretty .. random. While the random walk pattern looks nice for creating a cavelike network structure, it feels pretty aimless for a platformer level. It helps when you have a stronger direction of flow, both locally at the small scale (platforms form 'set pieces' that go in a particular direction) and at the large scale (the entire level doesn't double back on itself too often). For the first point, if I want to build platforms towards the right, I might as well build 3 or 4 platforms in a series. I encapsulate this in the repeat function, which will update the state of the platform and repeatedly call a link function.

    platform = repeat(3, linkRight);

For the second point, the generator tracks the last direction it was building new platforms in. Certain transitions are forbidden. RIGHT->RIGHT [which results in a long horizontal string of platforms towards the right] is forbidden, as well as "drops of faith" (large downward drops marked by caution blocks) when you've just been ascending up or leftwards (to prevent dropping back onto previously placed platforms).

var new_dir = randarr([RIGHT, RIGHT, RIGHT, UPRIGHT, UPLEFT, UP, DROP, DROP, DROP]);
switch (new_dir) {
    case DROP:
        if (prev_dir == UPLEFT || prev_dir == UP) continue;
        newbox.navi = "drop";
        yield * repeat(randi(1, 3), drop); break;



The level looks kind of boring when it's just a bunch of platforms, so I scatter little decorations, like rocks, bushes, and grass, randomly throughout the level. Since we want to keep the distribution uniform, the number of doodads on each block is proportional to the width of the block.

function * getDoodads() {
    for(var box of this.boxes) {
        for(var j = box.w * 2; j --> 0;)
            if (random() < 0.2)
                yield [randarr(["bush", "grass", "rock",
                    "grass", "bush", "grass",
                    "rock", "mushroomBrown", "mushroomRed"]),
                    new Point(box.x + 0.5 + random() * (box.w-1), box.t + 0.5)];

Keeping Track of Completion

Kind of a last-minute feature I threw in: save completed levels to Local Storage. 1) This gives you a sense of progress, and 2) it allows players to share particularly interesting level seeds with each other (or with me, if you want to report a bug).

Try it out!

Creating a Procedural Castle in F#

A Procedural Castle in F#


My goals here were to implement, using F# and OpenTK, a procedural castle village generator. The code should be as clean as you should expect from a month-long programming challenge and should be simple, elegant, and powerful.

Notes on the high-level design of this system

The Frontend

The "front end" of the generator are a set of functions that generate Trees, Towers, Columns, etc. As you will see from the tree example below, the front end is concerned purely with construction rules. For example, the three basic rules for producing a tower are:

In code, this looks like:

let rec create () =
    let stories = rand.Next(4, 8)
    let segments = generateSegments (Options.sides) stories
    let tower = Capped(segments, Options.topstyle ())
    let tower = match rand.Next(4) with
        | 0 -> Buttress(tower, [Capped(generateSegments 4 3, SpireStyle.ConeNoFlag)])
        | 1 -> Foundation(tower)
        | _ -> tower

The Backend

Ultimately all the high-level types like Tower get converted into a set of primitive types -- cylinders, blocks, vertex-lists, etc, in the backend understands. This provides a uniform representation of objects. Utility classes such as bounding-boxes and spatial hashes only ever need to deal with "boxes" and "lists of vertexes" instead of having separate logic for each object they interact with.

Backend Logic Example: the Convex Hull

One example of backend logic is calculating the convex hull of a given set of points. This is useful because we first generate a bunch of structures, and need to figure out where to build the walls so that they surround all the structures we've built. The algorithm implemented here is the Graham scan, an O(n log n) complexity algorithm. To understand this function, it may be helpful to translate it piece-by-piece into English.

English F#
Start with the leftmost point let firstpt = List.minBy (fun (v:Vector3) -> v.X) points
To find the next point, take all the other points ... let nextpt pt = List.except [pt] points |>
and find the one with the smallest counterclockwise angle List.reduce (fun p q -> if Matrix2(p.Xz-pt.Xz,q.Xz-pt.Xz).Determinant < 0.0f then q else p)
Keep taking the next point until you find the first point again List.unfold (fun p -> let n = nextpt p in if n = firstpt then None else Some(p, n)) firstpt
module ConvexHull =
    let FromPoints points =
        let firstpt = List.minBy (fun (v:Vector3) -> v.X) points
        let nextpt pt =
            List.except [pt] points |> List.reduce (fun p q -> if Matrix2(p.Xz-pt.Xz,q.Xz-pt.Xz).Determinant < 0.0f then q else p)
        List.unfold (fun p -> let n = nextpt p in if n = firstpt then None else Some(p, n)) firstpt

Backend Logic Notes: The placement algorithm

I used a spatial hash to detect collisions when placing objects. When placing a new object, we check for collisions, spiraling outwards from the initial point until we find a location. Not too much to say here, other than if i had more time, I would have preferred using some kind of tree structure instead for more flexibility. Since I wasn't comparing object-object collisions directly, I could only resolve collisions at the size of a spatial hash cell. This led to a tradeoff between hash resolution (lower resolution means larger gaps between objects) and performance (each insertion and collision check is O(V) where V is the volume of the inserted object)

Example Generator: Trees

A tree is one of the most common recursively generated objects out there. There is a fundamental fractal nature about trees that makes it very suitable for generation algorithms.

type Tree =
    | Leaf of Vector3
    | Branch of Vector3 * float32 * Tree list

The body of the generate function is three lines: 1 to create a Leaf node, 1 to create a random point pos, and 1 and create a Branch ending on pos, recursively containing between 2 and 5 sub-branches.

let rec generate = function
    | 0 -> Leaf(Vector3.One)
    | n -> let pos = 0.6f * Vector3.Normalize (r.NextV3(-0.5f, 0.5f, 0.2f, 0.6f, -0.5f, 0.5f))
           Branch(pos, 0.03f * float32 n, [for i in 0..r.Next(2, 6) -> generate (n-1)])

The body of the _toPolygon function is below. Leaves are converted to dark green blocks. The Branch part is slightly more involved. We find the rotation+scale matrix c based on the endpoint of the current branch, and recursively apply it to all child nodes. After the recursive step, I have a list of child nodes, comprising of a combination of rectangular Block objects and Offset matrices representing a nested transformation. Both Blocks and Offsets are primitive types that my rendering function recognizes and can convert to a triangle array for rasterization.

let rec private _toPolygon (mat:Matrix4) = function
    | Leaf(dim) -> [ Offset(mat, [Block(Color4.DarkGreen, Vector3.Zero, dim) ])]
    | Branch(vec, width, children) ->
        let b = Vector3.Cross(Vector3.UnitY, vec)
        let rot = Matrix4.CreateFromAxisAngle(b, Vector3.CalculateAngle(Vector3.UnitY, vec))
        let c = if b.Length > 0.001f then rot * Matrix4.CreateScale (vec.Length) else Matrix4.Identity
        let cc = List.collect (fun (b, a) -> _toPolygon (Matrix4.CreateTranslation (4.0f * b * Vector3.UnitY)) a) children
        [ Offset(c * mat, Block(Color4.Bisque.Mult(0.2), Vector3.Zero,  Vector3(width, 4.0f, width)) :: cc)]
let toPolygon t = match _toPolygon Matrix4.Identity t with | [ Offset(c, f) ] -> f | n -> n

Finally: A note on (un)randomness

Procedural Generation is about controlled chaos. Instead of manually designing individual elements, we have to specify a design space of what kinds of structure the program randomly generates. This theme came up multiple times during the course of this project. A couple examples:

  1. I put off implementing the spatial hash until halfway through the project. Randomly placed Points ( aren't really interesting in the long run. All the interesting details about placement happen around questions like "what is the minimum gap between two adjacent objects?" "Do nearby objects influence each other's orientation?" "do objects of similar size cluster together?" In the end, I found a pretty dumb generation strategy (generate large flat areas first, then add a few small objects to break up the space, then add large castle structures, then fill in the gaps with a final set of smaller objects" that yielded the results I wanted -- busy, but not too crowded. Since each new object had to align with at least one previously place object, the smaller units tended to clump and arrange themselves slightly on a grid.

  2. My initial stab at random towers was terrible -- when you choose a rule that says "pick 5 random tower levels and stack them on top of each other", you get what you'd expect -- a tower of barely related objects stacked on top of each other. In the final iteration, I redid the tower grammar to simply generate an N-sided cylinder and simply varied the style of the top and large-scale decorations. While this was more coherent, there should be even better rules available for generating structures closer to what you'd see in real life.

All in all, This was quite a fun project. Feel free to play with and extend the code - Even if you aren't familiar with F#, the overall structure of the code should make sense. If you're on OSX or Linux, the included Makefile should work assuming you have mono and fsharp installed. If you're on Windows, you're on your own, but try to open the included fsproj in Visual Studio.

FizzBuzz, Syntax Abuse, and the Pursuit of Perfect Code

FizzBuzz, Syntax Abuse, and the Pursuit of Perfect Code

If you're a programmer, you have likely seen the "FizzBuzz" problem at some point.

Write a program that prints the numbers from 1 to 100. But for multiples of three print "Fizz" instead of the number and for the multiples of five print "Buzz". For numbers which are multiples of both three and five print "FizzBuzz".

The nice thing about "FizzBuzz" as an interview question is that it is on the lower threshold of triviality, yet it manages to cover I/O, looping, conditionals, and arithmetic. With that said, here is how I'd write out FizzBuzz:

static void FizzBuzz() { 
	for(int i=0; i<100; i++) { 
			i % 15 == 0 ? "FizzBuzz" :
			i % 3 == 0 ? "Fizz" :
			i % 5 == 0 ? "Buzz" : 

Hmmm. For some reason, this small bit of code is already going to cause discussion. You could easily find a zillion different ways of writing FizzBuzz, the classic almost-trivial program, and everyone will have differing opinions on how to write it. As it stands, I've heard feedback such as "ugh, ternary operator is obfuscation" or "you need to comment that modulus operation since your maintainer won't understand it."

Ultimately, I can't say that these differing opinions are wrong. Everyone is a Blub Programmer, me included, So things we don't use look goddamn cryptic and things we've decided not to use look childlike in their simplicity. However, it seems to me that you'd be most productive in a team with other programmers of the same Blub-index, or at least in a team where everyone's index falls into the tolerance level for everyone else. Perhaps programmers get righteous in defending their favorite programming language or coding style in hopes that they convert others to their Blub level?

My most proficient languages are C#, F#, Javacript, Python, and Java. Perhaps it's just because it's what I write at work every day, but my Blub-level is somewhere between C# and F#. For every language that I write in, some parts feel really pleasant, like the ease of manipulating the shape of data in LINQ, or refining a DOM selection in jQuery (which Microsoft should totally have paid jresig to call Linq2DOM), or mocking an interface in Java with an anonymous instance. And other parts feel kind of awkward. But how much of that is the implicit tradeoffs we as programmers make every day versus the habits that I've built up, I'm not really sure.

[Code Bits] [Code Musing] [Language Rants]

Double Dispatch and Game logic and Message Handling

Double Dispatch and Game logic and Message Handling.

My previous survey of opinionated elf logic wasn't purely an academic exercise. I originally wrote the argument for dynamic multimethods in C# using Steve's Opinionated Elf example because I was explaining to someone how to handle Message routing between client and server in a multiplayer game. Namely, I have a set of Message types that all subclass a base Message:

class Message { }
class ChatMessage { }
class MoveMessage { }
class ActionEventMessage { }
class LoginMessage { }

and I was trying to explain how to divide up the logic between client-side and server-side components. you could write different client/server logic for individual message types.

class ClientsideMessageHandler { 
	void Route(Message m) { Handle((dynamic)m); } 
	void Handle(Message m) { /* unhandled message type ? */ }
	void Handle(ChatMessage m) { ChatWindow.Show(m.MessageSource, m.MessageText); }
	void Handle(MoveMessave m) { Entities.Find(m.EntityID).Move(m.newLocation); }
	// etc
class ServerMessageHandler { 
	void Route(Message m) { Handle((dynamic) m); }
	void Handle(Message m) { /* unhandled message type ? */ }
	void Handle(ChatMessage m) { /* route chat message to listening clients */ }
	void Handle(MoveMessage m) { /* validate then update nearby clients */ }

In my opinion this approach is better than explicit type-checking, both in terms of DRY and in terms of maintaining type heirarchies.

void Route(Message m) { 
	var c = m as ChatMessage; 
	if (c != null) { Handle(c); return; }
	.. etc..

Is DYNAMIC slow?!?!!

So after that example, lets examine the performance. With the caveat the benchmarks can be misleading, here are some numbers from running a simple MessageHandler that counts the number of different types of messages it's getting. I implemented a basic no-runtime-typing handler that maintains separate typed queues for each message type.

1 Million Message Calls:

So, a basic method dispatch takes a couple nanoseconds, adding in some basic type checking adds 10 nanoseconds, but with a caveat: this takes time proportional the number of types you're checking. for 20 or 30 types, the time would be 10 times as large. The Visitor Pattern manages to be the fastest double-dispatch mechanism, taking 17 nanoseconds regardless of how many types are used. Finally, dynamic turns out to be the slowest method, taking a full 235 nanoseconds per invocation. A quick sanity check in CPython:

>> timeit('for m in items: handler.handle(m)')

Shows that this is about the same as the cost of a dynamic method invocation in Python. So, how many messages are you going to be routing per second? 100? A 23 microsecond routing cost is fairly easy to swallow in exchange for the simple code. Routing a 100,000 messages per second? If you only have 10 microseconds to respond to each message, it might be better not to waste 200+ nanoseconds per message on the routing dispatch.