A Snippet of Functional C

MurmurHash2 is a pretty nifty bit of code: excellent performance in both the statistical and chronological senses, and under the ultra-liberal MIT license.

The kernel of the hash function comprises these 5 lines of C code. k is the next four bytes to be hashed, m and r are experimentally-found constants, and h is the accumulated hash value:

word mrmr_step(word k,
        word h, word m, word r) {
/*1*/ k *= m;
/*2*/ k ^= k >> r;
/*3*/ k *= m;
/*4*/ h *= m;
/*5*/ h ^= k;
      return h;
}

Like much low-level systems code in C, this takes advantage of mutability: the value of k in line 3 is not the same as k in line 5. But what if you wanted to do this computation without mutability, or even assignment? Even in the land of C, the rules, they are a-changin’ [pdf].

At first blush, it seems strange to do first-order functional programming in a language like C, which doesn’t provide much support for it. But it’s precisely because there’s so little magic to C that it’s such a good vehicle for seeing how things work under the covers. The real point of this post is not how to write functional code, as to explore how to compile functional code.

Anyways, back to the code. Mutability and immutability are sometimes interchangeable; for example, a += 3; a *= 5; a -= 1; does the same thing as a = ((a + 3) * 5) - 1. The key is the linear dependency chain: each mutated a value is used in precisely one place.

Examining the MurmurHash2 kernel above, most of the dependencies are linear. But there’s one branching dependency, from line 1 to line 2. There are two C-like ways of dealing with this. First, we could "copy" line 1 into line 2:

k = (k*m) ^ ((k*m) >> r);

The second way would be to introduce a new variable:

int km = k * m; km ^= km >> r;

But option 1 breaks down if line 1 has side-effects, and line 2 introduces a new assignment, which we wanted to avoid!

The trick to see is that functions provide variable bindings. Thus, we can translate the C code to functional pseudocode like so:

/*1*/ bind km  to k * m in
/*2*/ bind kx  to km ^ (km >> r) in
/*3*/ bind kxm to kx * m in
/*4*/ bind hm  to h * m in
/*5*/ hm ^ kxm;

Now, in this static single assignment form, it’s super easy to see what can be safely inlined; they are bindings that are only used once, such as hm in line 4. So we might optimize this to:

/*1*/ bind km to k * m in
/*2*/ bind kx to km ^ (km >> r) in
/*5*/ (h * m) ^ (kx * m);

I’ve left line 2 alone for clarity, but it’s easy to see that it could be trivially substituted into line 5. Now, how to translate back to C? In a language with anonymous functions and closures, it’s easy. Just like Scheme translates let into lambda, we could translate bind varname to arg in expr to JavaScript like (function (varname) { return expr; })(arg). So let’s do that to start:

(function          line2(km) {
  return (function line5(kx) {
/*5*/  return (h*m) ^ (kx*m);
/*2*/ })(km ^ (km >> r));
/*1*/ })(k * m);

Note that the code now reads, in some sense, bottom-to-top. Notice also that we’ve created closures, because the line marked 5 references h, which is defined in a parent scope (not shown). So this code isn’t valid C, because C does not have first-class functions. Instead, all C functions must be declared at the top level. Luckily for us, translating the above code to C is easy. We move the function definitions to the top level, and add parameters for their free variables. (This process is called closure conversion, or sometimes lambda lifting.) We’re left with this purely functional C code:

word line1(word km, word r) {
  return km ^ (km >> r);
}
word line5(word h, word kx, word m) {
  return (h * m) ^ (kx * m);
}
word mrmr_step_f(word k, word h,
                 word m, word r) {
  return line5(h, line2(k*m, r), m);
}

This is the code that would be generated by a compiler, given input like the 3-line pseudocode from above. Now, the interesting bit is: how efficient is this version, compared to the original? Since this is just C, we can use gcc‘s -S flag to compile both versions to optimized assembly and compare:


_Z11mrmr_step_fiiii:
.LFB970:
        imull   %edx, %edi
        movl    %edi, %eax
        sarl    %cl, %eax
        xorl    %edi, %eax
        imull   %edx, %eax
        imull   %esi, %edx
        xorl    %edx, %eax
        ret

_Z9mrmr_stepiiii:
.LFB971:
        imull   %edx, %edi
        imull   %edx, %esi
        movl    %edi, %eax
        sarl    %cl, %eax
        xorl    %edi, %eax
        imull   %edx, %eax
        xorl    %esi, %eax
        ret

Pretty cool! The assembly from the purely functional version is essentially identical to the low-level imperative code. And a modern out-of-order CPU will ensure that the dynamic execution is, in fact, identical for both versions.

Update 2010/02/11

There are two ways of handling the “extra” arguments added closure conversion. The way above, where free variables become parameters, is easy to do manually, and clearer for small numbers of parameters. The alternative is to create a separate structure holding the free variables, and pass a pointer to the structure. GCC is strong enough to compile this example to the same assembly as well. Not bad!

struct mrmr_step2_env { word r; };
static inline word
step2(mrmr_step2_env* env, word km) {
  return km ^ (km >> env->r);
}

struct mrmr_step5_env { word h; word m; };
static inline word
step5(mrmr_step5_env* env, word kx) {
  return (env->h * env->m) ^ (kx * env->m);
}

word mrmr_step_f(word k, word h,
                 word m, word r) {
  mrmr_step2_env e2; e2.r = r;
  mrmr_step5_env e5; e5.h = h; e5.m = m;
  return step5(&e5, step2(&e2, k*m));
}

3 thoughts on “A Snippet of Functional C

  1. Speed of murmur comes from the fact how it interacts within a loop. It is not the fact that mrmr_step is fast, but how pipelining interacts on two consecutive mrmr_step executions (inlined of course) in the loop.

    Interesting to know that gcc optimalises pure version to the same code 🙂

  2. What do you mean by saying that the value of `k` in line 3 is not the same as in line 5? `k` is not being modified in between.

    • Ah! Yes, it’s a bit confusing as written. The value of k *in* line 3 is the value set by line two; the value of k *in* line 5 is the value set by line 3. It is line 3 itself that modifies the variable in between the evaluation of line 3 and line 5!

Leave a comment