Adblock History and an Unhatched Easter Egg

The first browser I ever used was Netscape Navigator, and that allegiance continued over the years with the Mozilla Suite, then Camino, then Phoenix/Firebird/Firefox. Thinking about Camino makes me both happy and sad. It was far ahead of its time in terms of user experience, but it stagnated due to lack of attention, as its developers were plucked away to work on Safari or Phoenix. In 2002, it was the browser of 2005; but in 2008, it was still the browser of 2005.

Anyways, I digress. Around 2004, when I was in high school, I was a user of Firefox nightly builds. This was before the existence of extension versioning or AMO; the closest equivalent to AMO was mozdev, but that’s a bit like comparing the App Store to GitHub. The mozillazine forums were the watering hole for users, extension developers, and browser core devs alike. At one point, a nightly build broke my favorite plugin, Adblock. After a shockingly long time (two whole days!) went by without a fix from the extension’s author, I decided to poke around and see if I could hack together a fix myself. Well, it worked; after a few more such tweaks, I wound up on an IRC channel with the project’s then-current maintainer/developer, who called himself “rue.” We spent most of the next year collaborating on the Next Big Release of the extension.

Unfortunately, we got sucked into Second System Syndrome, without having had the benefit of having fully built the first version ourselves. Basically, every rule of thumb that Joel Spolsky wrote about, we violated (unintentionally). We tried to rewrite the technical infrastructure, while simultaneously adding many ambitious new features. We were aiming for an Apple-style Grand Curtain Drop on a shiny new toy. Tempting, but foolish. While we were toiling away in secret, pressure was mounting for something, anything, to improve upon the actively-bitrotting public version. Eventually, a previous developer of Adblock, Wladimir Palant, returned to the scene and forked the project to create Adblock Plus.

One interesting side effect of an extremely long development cycle is that it allows for meandering diversions from the yellow brick road. The other developer, rue, found that Firefox extensions could alter the requested URL without having the alteration appear in the browser’s address bar. This held the potential for a great deal of mischief. On the more innocuous end, an April Fools’ Day prank might have swapped the URLs of rival sites (NFL/NHL, etc.). On the less innocuous end, rue prototyped a module that would, randomly with small probability, replace Amazon referral codes. It would have been the Office Space penny-shaving scheme, more or less. I objected on grounds of skeeviness and/or illegality, but it was a moot point, since we never released a completed version. Adblock Plus ended up taking a bolder, more direct approach to monetization.

Anyways, while rue was busy poking at Firefox internals and de-obfuscating Gmail’s client-side code, I amused myself by devising an easter egg, and trying to conceal its presence in a non-obvious way. When it comes to such things, Firefox extensions have a distinct advantage over traditional software: they can put UI on a web server, so that a review of the extension’s source will see, at most, an innocuous-looking URL. While normal content served over the web is sandboxed, extensions can collude with a page to do things that a normal page cannot do. So I made a little recreation of the “Wake up, Neo…” scene from the Matrix.

Here is the easter egg in action; you’ll need to view it fullscreen to see the text:

If you have the sound up high enough, you’ll hear a clicky noise as each green character is “typed.” It’s actually the same noise used in the original movie! That’s called craftsmanship. 😉 One other subtle point: Adblock’s UI is in the lower-right corner, as a statusbar item. The modern convention of using toolbar logo buttons wasn’t yet established at that time.

Had anyone used a packet sniffer or a lucky guess to figure out the URL for the easter egg and visited it directly, they would have seen this instead:

[Screenshot: the “cypher” decoy page, viewed in Internet Explorer]

The easter egg also altered Adblock’s entry in the Extensions list, changing the version from 0.6 to “The 1.0”, replacing the crosshair-style logo with an image of Agent Smith, and altering the description to include parodies of quotes from Agent Smith and Morpheus.

[Screenshots: the altered Extensions list entry, captured in a VirtualBox VM]

Final useless random historical fact: Adblock was the 10th extension added to AMO. I have no idea how many there are now, probably in the mid thousands.

Email Spelunking

Going through some old email archives, it’s been interesting to dip into the not-so-distant past. Unfortunately, sent messages were very sparsely archived, so I have only the incoming half of most email conversations. And sadly, all my email before 2002 appears to have been lost or corrupted. Oh well.

A few bits and pieces:

  • Xbox Live beta emails in Oct 2002 – “You still could get in!”
  • iTunes Music Store launches on Apr 29 2003: “Your music. Your way.”
  • Emails about the beta for the (recently defunct) Unison client by Panic.
  • Emails from, before they switched to (I joined in June 2005 – soon to be ten years!). As late as August 2005, your Wall was a single mutable text field, so emails said “X has changed your wall.” By September, the terminology had changed to “X has written on your wall” as the list-of-posts model was adopted. Techie bonus: mail was sent with a User-Agent header of “ZuckMail”. Also, Wirehog.
  • At one point, I purchased the domain ILBEN.COM… so that I could write my email address as   …………… facepalm
  • Registered on June 10, 2002.
  • I seemed to have sent a lot of email to random internet folks:
    • Adam Iser, about how much I liked Adium
    • Dan Gillmor, about using Disk Copy to make disk images from DVDs
    • Nick Zitzmann, about adding Applescript support to Battlegram (a chat client for
    • Damien Gallop, about using RDC instead of VirtualPC…???
    • Rob Griffiths, about featuring LaunchBar as a Product of the Week on
    • I distinctly remember writing to Andy Ihnatko but those emails appear to have been lost to the sands of time.
    • Norbert Heger, about his experiences as an independent developer producing LaunchBar, to which he responded: 

      Success as a shareware developer depends on so many different factors, so it’s a bit difficult to give a general answer. You need a good idea, a good implementation, good PR and marketing, good support, etc. Failing in just one of these areas can make a huge difference in terms of success. The only advice I can give is to try it yourself.

  • Correspondence with Nigel McFarlane. Extremely bittersweet for me: he bravely and generously took a chance on me to write a few pages and get my name in a physical book… and committed suicide three months after it was published. Even though I barely knew him, the sense of loss and pain hasn’t diminished in the last ten years.
  • A handful of emails from my uncle, mostly followups to spirited arguments we had over things like whether the world would be better off if every television set vanished into the ether. In retrospect, seeing what I wrote makes me cringe – I was an ass, and being fifteen isn’t much of an excuse. I didn’t notice/appreciate it at the time, but with twelve years of elevated perspective, there was a great deal of wisdom (of the life-experience variety) shining through his words. But he understood what I did not:

    There are, unfortunately, some things that take years to be put into perspective, and even being extraordinarily bright provides no shortcut to such insights nor any guarantee of happiness.

On LLVM’s GC Infrastructure

LLVM’s documentation suggests that its getelementptr instruction is one of the most commonly misunderstood aspects of the system. In my experience, GEP is only number two on the list; number one is LLVM’s GC support: what it provides, what it doesn’t provide, and how to use it effectively.

The first two are the easy ones: LLVM does support stack maps, read and write barriers, fully-precise collection, and relocating collectors. It does not require a shadow stack, though it does provide one, which is helpful for first-time GC clients to get started with. LLVM’s overall GC design supports concurrent collectors and mutators, but the implementation lacks a few key ingredients to minimize the performance impact of GC in such situations.

There are a few missing features on the implementation side which obviously don’t need major design overhauls to fix. The big three on my list are code emission at safe points, safe points on back edges, and support for register-allocation of GC roots.

The last one is sort of interesting, because I’ve seen several proposals to change the design of LLVM’s GC infrastructure in a major way to obtain it. The reasoning goes like this: we currently have llvm.gcroot, which marks a stack slot address as being a GC root. But we want to store roots in machine registers, not stack slots, so we need a new intrinsic to annotate an SSA register as being a GC root as well. With such an intrinsic, LLVM would automatically store and load SSA registers to/from stack slots “behind the scenes.”

So, for example, suppose we start off by acquiring a GC-traced pointer:

%root = alloca (gcroot)
%p = … allocate a pointer …
store %p into %root

If we had a non-copying collector, we could get away with code like this:

%r = … maybe trigger GC…
… use %p …

But with LLVM’s current GC support, a copying collector must write this less-legible IR instead:

store %p into %root
%r = … maybe trigger GC…
%p2 = load %root
… use %p2 …

When repeated over and over, it definitely makes the IR harder to read. Wouldn’t it be nice if LLVM would automatically do all that stuff for you? Like this:

… (no alloca needed!)
%p = … allocate a pointer …
call llvm.gc.reg.root(%p)
%r = … maybe trigger GC…
… use %p …

It’s a natural desire; the first time I encountered LLVM’s GC support, I wanted the same thing. But it’s a wolf in sheep’s clothing. Remember: the fundamental property of LLVM is that an SSA register is an immutable value. Not just “observably equivalent as far as my high-level program can see,” but equivalent under any observation possible at the IR level!

The problem is that under the hypothetical gc.reg.root scheme, LLVM is free to do constant propagation before the rewrite to the verbose version, and if it does so, then the program’s semantics will have changed! By being marked as a register root, %p has been transmuted from a constant to a variable. In doing so, we’ve taken the key benefit of LLVM’s SSA representation (the power it gets thanks to immutable bindings) and thrown it right out the window!


Take another look at the “verbose” pseudo-IR listed above. The common interpretation of this snippet is “spill %p into a stack slot, perform some operation, and then reload %p from the stack slot.” This is reasoning about the IR according to its current implementation. But I’d argue that it’s better to think about the abstract semantics of the IR. Abstractly, we’re just encoding the fact that the value we think of as %p might change at some point in the IR; beforehand, we called it %p, and afterwards, we called it %p2. Since phi nodes aren’t really viable for most frontends, the only way LLVM allows us to encode mutability is via memory – i.e. alloca. But the real point isn’t to set up stack slots, it’s just to reflect pointer updates that the GC might be making. As long as LLVM can verify that the “stack slot” address doesn’t escape, it is free to back that mutable storage by symbolically addressed memory: namely, machine registers! It won’t do that analysis now, but it could. That’s the point: no major design tweaks are needed to get register allocation of GC roots, just a bit of extra analysis internal to LLVM.
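For concreteness, here is roughly what the verbose pattern looks like as real LLVM IR, using the llvm.gcroot intrinsic and the built-in shadow-stack strategy; @alloc_cell and @maybe_gc are invented stand-ins for a runtime’s allocator and a function that may trigger collection:

```llvm
declare void @llvm.gcroot(i8** %slot, i8* %metadata)
declare i8* @alloc_cell()        ; hypothetical GC-managed allocation
declare void @maybe_gc()         ; hypothetical operation with a safe point

define i8* @example() gc "shadow-stack" {
entry:
  %root = alloca i8*
  call void @llvm.gcroot(i8** %root, i8* null)
  %p = call i8* @alloc_cell()
  store i8* %p, i8** %root       ; root must be in memory at the safe point
  call void @maybe_gc()          ; the collector may move the object here
  %p2 = load i8*, i8** %root     ; reload the (possibly relocated) pointer
  ret i8* %p2
}
```

Note that llvm.gcroot must be called in the function’s entry block on an alloca, and the function needs a gc attribute for the strategy to apply.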

A separate question is how much of a difference this would make in practice. Effectively, the lack of register allocation means that GC roots are always kept in stack slots rather than machine registers; maybe that’s a performance-killer, or maybe it’s a wash. To the best of my knowledge, nobody has ever carefully looked at the impact of naïve versus varying degrees of sophisticated treatment of GC roots in LLVM-compiled code.


Stepping back even more, I think the reason there’s so much confusion about LLVM’s GC support is that there’s a philosophical impedance mismatch with the rest of the project. Elsewhere, LLVM tries hard to allow an unsophisticated frontend to get the same code quality as a more sophisticated one. The canonical example is that with mem2reg, a frontend needn’t worry about SSA conversion or dominance frontiers or anything like that. In general, LLVM’s philosophy is to move as much work from the frontend to the common LLVM infrastructure as possible. But LLVM’s GC support is different. There’s no automatic transformation from “dead simple” to “efficient” use of GC primitives. Instead, it’s up to each client to do whatever analysis is needed for the most efficient use of the existing GC-related primitives. In practice, this means that clients need to do data-flow analysis if they want to reuse dead root slots. This is in stark contrast to the negligible level of Dragon-book sophistication otherwise expected of a frontend.


In terms of how to use LLVM’s GC support effectively, I’ve found it clearest to think of the process as a transformation between separate intermediate languages. The goal of the transformation is to add a minimal selection of root slots, along with just enough loads and stores to preserve correctness. The crucial invariant is that at each potential GC point, the live roots must reside in (abstract) memory. So: backwards analysis to determine liveness, and a forwards rewrite to add the loads and stores. Interestingly, this combination of a backwards analysis with a forwards program rewrite is not well-supported by existing dataflow tools or their underlying theoretical frameworks. Anyways, to implement the given invariant, you’ll need to know which types require tracing, and which operations may end up invoking the GC. You can sort of encode this information via metadata and pointer address spaces, if using LLVM IR as your source IR, but I think it’s easier to think of the two IRs as being completely separate.
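To make the backwards-analysis-plus-forwards-rewrite idea concrete, here’s a toy sketch (my own illustration, not LLVM’s actual implementation) over straight-line code, where instructions are plain tuples and ('gc',) marks a potential GC point:

```python
def lower_gc_roots(instrs, gc_ptrs):
    """instrs: list of ('def', v), ('use', v), or ('gc',) tuples.
    Returns a new list with ('store', v)/('load', v) inserted so that
    every GC-traced value live across a ('gc',) resides in a root slot."""
    # Backwards liveness: which GC pointers are still used after each point?
    live_after = [set() for _ in instrs]
    live = set()
    for i in reversed(range(len(instrs))):
        live_after[i] = set(live)
        op = instrs[i]
        if op[0] == 'use' and op[1] in gc_ptrs:
            live.add(op[1])
        elif op[0] == 'def':
            live.discard(op[1])
    # Forwards rewrite: spill live roots to memory around each safe point.
    out = []
    for i, op in enumerate(instrs):
        if op[0] == 'gc':
            roots = sorted(live_after[i] & gc_ptrs)
            out += [('store', v) for v in roots]
            out.append(op)
            out += [('load', v) for v in roots]   # reload possibly-moved ptrs
        else:
            out.append(op)
    return out
```

A pointer that is dead after the safe point gets no store/load at all, which is exactly the kind of minimality the prose above asks for.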

Date-Preserving Patch Folding with Mercurial Queues

Mercurial Queues are a powerful tool for organizing changes into logically coherent commits. My current workflow often goes something like this:

  1. Make some changes implementing feature A (as a patch).
  2. Generate a series of patches moving on to feature B.
  3. Realize, after working on feature B, that feature A needs a small tweak, or a better commit message.

The relevant mq commands will get the job done in a jiffy. However, there’s a slight downside: after all that qpushing and qpopping, the patches in the series will lose their original dates, which means that qfinish will produce a tightly clustered series of patches that don’t actually reflect the true chronology of their development.

Fortunately, Mercurial provides the tools needed to preserve a meaningful chronological order. We’ll exploit hg log’s configurable formatting to make it generate a shell script that’ll set the dates for our patch series automatically. A pretty little hack, no? Here are the relevant incantations, assuming you want to modify a patch named P with revision number N:

  1. hg log -r N:tip --template "hg qref -d '{date|date}'; hg qpush;\n" > fix-dates.sh (the script name is arbitrary).
  2. Run hg qgoto P, then make whatever changes you need.
  3. Run bash fix-dates.sh, which will restore the original dates for all patches in the queue following the modified one.

Agrajag Rediscovered

Going through the files on an old family computer, I discovered a long-lost email exchange with Douglas Adams:

Subject: Re: Agrajag
Date: Fri, 11 Jun 1999 17:52:48 -0700
From: "Douglas Adams" <askdna@TDV.COM>
To: Martin Karel <karel@BELLATLANTIC.NET>

Irene –
I’m pretty bad at planning what’s going to happen next week. Alex is exactly right.

Douglas Adams
The Digital Village

Please – help save the last 650 mountain gorillas

> Hi,
> Please settle a debate between me and my brother. I held forth that
> you must have already had Agrajag in mind when you wrote the bit about
> the pot of flowers that, when falling, said "Oh no, not again!" But
> Alex thinks that you probably just randomly stuck that line in there,
> and came up with Agrajag later and it just happened to fit. If I’m
> right, then we are very impressed with your degree of planning ahead,
> seeing as how it was four years between the publishing dates of
> "Hitchhiker’s Guide" and "Life, the Universe, and Everything."
>            – Irene Karel,
>               an adoring fan

Perhaps worth noting: the saved text file was 1337 bytes.

Firefox Add-On Compatibility

It looks like Firefox is moving to a compatible-by-default model for extensions. I find this interesting, since I proposed this back in 2004 to no avail.

To be clear, I don’t mean to say that compatible-by-default would have been the right choice in 2004… Predictions about the past are almost as hard as predictions about the future, and at least two major factors have changed since then. First, the state of the world for extensions now is more stable than it was in the Firefox 1.0 timeframe. And second, Firefox’s rapid release cycle exacerbates spurious compatibility problems, making this proposal significantly more attractive.

The Lessons of Lucasfilm’s Habitat

We were initially our own worst enemies in this undertaking, victims of a way of thinking to which we engineers are dangerously susceptible. This way of thinking is characterized by the conceit that all things may be planned in advance and then directly implemented according to the plan’s detailed specification. For persons schooled in the design and construction of systems based on simple, well-defined and well-understood foundation principles, this is a natural attitude to have.

I wonder what Intel is up to?

“A proposed long term solution: a new functional language that features implicit parallelism, dependent typing, and an effects type system”

“We are also working with an ISV on a compiler for a parallel functional language
– Strict, but not call by value for implicit parallelism (maybe
with lightweight annotations)
– Arrays and array comprehensions for data parallelism
– Effects system to contain impure features
– Atomic for state updates”

“For five years Intel’s Programming Systems Lab (PSL) has been collaborating with an external partner on a new functional programming language designed for productivity on many-core processors. While the language is not yet public, this talk outlines motivations behind the language and describes our experiences in implementing it using a variety of functional languages. The reference interpreter is written in Haskell and compiled with GHC while PSL’s performance implementation is written in SML and compiled with Mlton. We have also generated Scheme code compiled with PLT Scheme as part of a prototyping effort.”

A Small Example Illustrating Why Unification-Based Type Inference Is Not Always The Most User-Friendly Choice

let base = 10
let bitsPerDigit = logBase 2.0 (fromIntegral base)
let activeBits = fromIntegral $
  ceiling (bitsPerDigit * (Prelude.length "123"))

There's a missing fromIntegral above, which causes GHC to underline logBase and complain that it has no instance for (Floating Int). This is, on the face of things, completely mystifying. Both parameters are “obviously” floating! A second error points to ceiling, saying no instance for (RealFrac Int).

The second error is the more suggestive one, since the problem is that the type of * causes GHC to equate the type of bitsPerDigit with that of (Prelude.length "123"), i.e. Int. But it’s more than a little strange to force the programmer to reason about their code in the same order that the type checker does!
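For reference, here is one way to repair the snippet: keep the Int-to-Floating conversion inside the multiplication, so bitsPerDigit stays at a Floating type (I’ve dropped the outer fromIntegral for simplicity):

```haskell
let base = 10
let bitsPerDigit = logBase 2.0 (fromIntegral base)
let activeBits = ceiling (bitsPerDigit * fromIntegral (Prelude.length "123"))
-- activeBits == 10, since logBase 2 10 ≈ 3.32 and 3 * 3.32 ≈ 9.97
```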

Comment on Comments


if (vars.empty()) { 
  // Store null env pointer if environment is empty 
      /* isVolatile= */ false);



if (vars.empty()) {

Citation Chains

Here’s a fun game to play on Google Scholar:

  1. Search for a relatively old paper in your field of choice. It will, presumably, be the top result.
  2. Note the number N in the “Cited by N” link under the top result’s abstract. Also note the year of publication.
  3. Click the “Cited by N” link.
  4. GOTO 2.

For example, here’s what results from searching for Luca Cardelli’s paper A polymorphic lambda-calculus with Type : Type:

Paper Name | Pub. Year | Cited by | Cites/Year
A polymorphic lambda-calculus with Type: Type | 1986 | 72 | 4
The calculus of constructions | 1986 | 1053 | 44
A framework for defining logics | 1993 | 1155 | 68
Proof-carrying code | 1997 | 1873 | 145
Xen and the art of virtualization | 2003 | 2375 | 340
Globus toolkit version 4: Software for service-oriented systems | 2006 | 1179 | 295
A taxonomy of data grids for distributed data sharing, management, and processing | 2006 | 175 | 88
A toolkit for modelling and simulating data Grids: an extension to GridSim | 2008 | 38 | 20
Service and utility oriented distributed computing systems […] | 2008 | 11 | 4


It’s sort of nifty to see the “arc” of citations, and citations/year, over time.

Other interesting searches:

  • program slicing
  • featherweight

Not very interesting:

  • grain elevators

Coroutine Performance

I was curious this afternoon about how fast (or slow) coroutines can be. So I wrote a small microbenchmark in C, using libcoro’s handwritten-assembly implementation of stack switching for symmetric coroutines.

The benchmark builds a complete binary tree of configurable size, then visits it twice. First, a conventional recursively-written function takes the tree root and a function pointer, and calls the function pointer to process each value stored in the tree.

The benchmark next re-visits the tree with the iteration encapsulated by a Lua-style asymmetric coroutine class, which yield()s tree node values back to the main routine.

One interesting operational consequence of doing the recursive visiting inside the coroutine is that the main program stack remains at constant depth for every processed node. This constant-stack-space property is also an effect of CPS transformations – not surprising, since both continuations and coroutines are general control abstractions.

Anyways, I found that when compiled with -O2 (gcc 4.4.3) and run on a Core 2, the overhead of using resume()/yield() compared to call/return was about 110 cycles per yield(). That cost is on par with loading a single word from RAM. Stack switching is an impressively efficient way of implementing a feature that lends significant flexibility and power to a language!
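For a self-contained illustration of the resume()/yield() structure (this is not the benchmark itself), here’s the same asymmetric-coroutine shape built on POSIX ucontext rather than libcoro’s assembly; the function names are my own, and ucontext switching is considerably slower than a hand-tuned stack switch:

```c
#include <ucontext.h>

static ucontext_t main_ctx, gen_ctx;
static int yielded_value;
static char gen_stack[64 * 1024];

/* Transfer control from the coroutine back to its caller. */
static void gen_yield(int v) {
    yielded_value = v;
    swapcontext(&gen_ctx, &main_ctx);
}

/* The coroutine body: yields 1, 2, 3, then a "done" sentinel. */
static void generator(void) {
    for (int i = 1; i <= 3; i++)
        gen_yield(i);
    gen_yield(-1);
}

/* Drive the coroutine to completion, summing the yielded values.
   Note the caller's stack depth is constant across every yield. */
int coroutine_sum(void) {
    getcontext(&gen_ctx);
    gen_ctx.uc_stack.ss_sp = gen_stack;
    gen_ctx.uc_stack.ss_size = sizeof gen_stack;
    gen_ctx.uc_link = &main_ctx;
    makecontext(&gen_ctx, generator, 0);

    int sum = 0;
    for (;;) {
        swapcontext(&main_ctx, &gen_ctx);   /* resume() */
        if (yielded_value < 0)
            break;
        sum += yielded_value;
    }
    return sum;
}
```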

ANTLR Grammar Tip: LL(*) and Left Factoring

Suppose you wish to have ANTLR recognize non-degenerate tuples of expressions, like (x, y) or (f(g), a, b) but not (f) or (). Trailing commas are not allowed. The following formulation of such a rule will likely elicit a complaint (“rule tuple has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2; …”):

tuple : '(' (expr ',')+ expr ')' ;

ANTLR’s error message seems unhelpful at first glance, but it’s trying to point you in the right direction. What ANTLR is saying is that after it reaches a comma, it doesn’t know whether it should expect expr ')' or expr ',', and parsing expr could require looking at an arbitrary number of tokens. ANTLR recommends left-factoring, but the structure of the rule as written doesn’t make it clear what needs to be factored. Rewriting the rule to make it closer to ANTLR’s view makes it clearer what the problem is:

tuple_start : '(' tuple_continue;
tuple_continue
    :    ( expr ',' tuple_continue
    |      expr ')');

Now it’s more obvious where the left-factoring needs to be applied, and how to do it by writing expr once, leaving only the choice of seeing a comma or a close-paren:

tuple_start : '(' tuple_continue;
tuple_continue
    :    expr ( ',' tuple_continue
              | ')'  );

The rewritten expansion corresponds to an alternative structuring of the tuple rule that will satisfy ANTLR:

tuple : '(' expr (',' expr)+ ')';

The ANTLR wiki has a page on removing global backtracking from ANTLR grammars, which has a nice presentation of how to do left-factoring when the opportunity is obvious.


I was reviewing Ras Bodik’s slides on parallel browsers and noticed in the section on parallel parsing that chart parsers like CYK and Earley are theoretically O(n^3) in time, but in practice it’s more like O(n^1.2).

I wondered: wait, what does O(n^1.2) mean for practical input sizes? Obviously for large enough n, any exponential will be worse than, say, O(n log n), but what’s the cutoff? So, I turned to the handy Online Function Grapher and saw this:

n^1.2 vs n lg n

That is, for n less than 5 million, n^1.2 < n lg n, and they remain close up to about n = 9 million. So if Bodik’s estimate holds, chart parsing could indeed be a viable parsing strategy for documents of up to several megabytes. Happily, this limit size matches the current and expected sizes of web pages to a T.
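A quick numeric check of that crossover claim (the 1.2 exponent is Bodik’s empirical estimate; the function names here are mine):

```python
import math

# Growth-rate estimates: chart parsing at ~n^1.2 vs. a linearithmic algorithm.
def chart(n):
    return n ** 1.2

def linearithmic(n):
    return n * math.log2(n)

# Bisect for the point (beyond small n) where n^1.2 overtakes n*lg(n).
lo, hi = 16.0, 1e8
for _ in range(200):
    mid = (lo + hi) / 2
    if chart(mid) < linearithmic(mid):
        lo = mid
    else:
        hi = mid
crossover = (lo + hi) / 2
print(f"n^1.2 overtakes n*lg(n) near n = {crossover:,.0f}")
```

The bisection lands between five and nine million, consistent with eyeballing the plot.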

The blue line at the bottom shows linear growth. It’s all too easy to think of linearithmic growth as being “almost as good” as linear growth. To quote Jason Evans discussing how replacing his memory allocator’s log-time red-black tree operations with constant-time replacements yielded appreciable speedups:

“In essence, my initial failure was to disregard the difference between a O(1) algorithm and a O(lg n) algorithm. Intuitively, I think of logarithmic-time algorithms as fast, but constant factors and large n can conspire to make logarthmic time not nearly good enough.”