Adblock History and an Unhatched Easter Egg

The first browser I ever used was Netscape Navigator, and that allegiance continued over the years with the Mozilla Suite, then Camino, then Phoenix/Firebird/Firefox. Thinking about Camino makes me both happy and sad. It was far ahead of its time in terms of user experience, but it stagnated due to lack of attention, as its developers were plucked away to work on Safari or Phoenix. In 2002, it was the browser of 2005; but in 2008, it was still the browser of 2005.

Anyways, I digress. Around 2004, when I was in high school, I was a user of Firefox nightly builds. This was before the existence of extension versioning or The closest equivalent to AMO was mozdev, but that’s a bit like comparing the App Store to github. The mozillazine forums were the watering hole for users, extension developers, and browser core devs alike. At one point, a nightly build broke my favorite plugin, Adblock. After a shockingly long time (two whole days!) went by without a fix by the extension’s author, I decided to poke around and see if I could hack together a fix myself. Well, it worked; after a few more such tweaks, I wound up on an IRC channel with the project’s then-current maintainer/developer, who called himself “rue.” We spent most of the next year collaborating on the Next Big Release of the extension.

Unfortunately, we got sucked into Second System Syndrome, without having had the benefit of having fully built the first version ourselves. Basically, every rule of thumb that Joel Spolsky wrote about, we violated (unintentionally). We tried to rewrite the technical infrastructure, while simultaneously adding many ambitious new features. We were aiming for an Apple-style Grand Curtain Drop on a shiny new toy. Tempting, but foolish. While we were toiling away in secret, pressure was mounting for something, anything, to improve upon the actively-bitrotting public version. Eventually, a previous developer of Adblock, Wladimir Palant, returned to the scene and forked the project to create Adblock Plus.

One interesting side effect of an extremely long development cycle is that it allows for meandering diversions from the yellow brick road. The other developer, rue, found that Firefox extensions could alter the requested URL without having the alteration appear in the browser’s address bar. This held the potential for a great deal of mischief. On the more innocuous end, an April Fools Day prank might have reversed the URLs for and, or NFL/NHL, etc.  On the less innocuous end, rue prototyped a module that would, randomly with small probability, replace Amazon referral codes. It would have been the Office Space penny-shaving scheme, more or less. I objected on grounds of skeeviness and/or illegality, but it was a moot point, since we never released a completed version.  Adblock Plus ended up taking a bolder, more direct approach to monetization.

Anyways, while rue was busy poking at Firefox internals and de-obfuscating Gmail’s client-side code, I amused myself by devising an easter egg, and trying to conceal its presence in a non-obvious way. When it comes to such things, Firefox extensions have a distinct advantage over traditional software: they can put UI on a web server, so that a review of the extension’s source will see, at most, an innocuous-looking URL. While normal content served over the web is sandboxed, extensions can collude with a page to do things that a normal page cannot do. So I made a little recreation of the “Wake up, Neo…” scene from the Matrix.

Here is the easter egg in action; you’ll need to view it fullscreen to see the text:

If you have the sound up high enough, you’ll hear a clicky noise when each green character is “typed.” It’s actually the same noise used in the original movie! That’s called craftsmanship. ;-)  One other subtle point: Adblock’s UI is in the lower-right corner, as a statusbar item. The modern convention of using toolbar logo buttons wasn’t yet established at that time.

Had anyone used a packet sniffer or a lucky guess to figure out the URL for the easter egg and visited it directly, they would have seen this instead:

cypher - Internet Explorer_2015-02-02_19-30-44

The easter egg also altered Adblock’s entry in the Extensions list, changing the version from 0.6 to “The 1.0”, replacing the crosshair-style logo to an image of Agent Smith, and altering the description to include parodies of quotes from Agent Smith and Morpheus.

IE7 [Running] - Oracle VM VirtualBox_2015-02-02_20-23-56IE7 [Running] - Oracle VM VirtualBox_2015-02-02_22-18-41

Final useless random historical fact: Adblock was the 10th extension added to AMO. I have no idea how many there are now, probably in the mid thousands.

Email Spelunking

Going through some old files emails, it’s been interesting to dip into the not-so-distant past. Unfortunately sent messages were very sparsely archived, so I have only the incoming half of most email conversations. And sadly, all my email before 2002 appears to have been lost or corrupted. Oh well.

A few bits and pieces:

  • Xbox Live beta emails in Oct 2002 – “You still could get in!”
  • iTunes Music Store launches on Apr 29 2003: “Your music. Your way.”
  • Emails about the beta for the (recently defunct) Unison client by Panic.
  • Emails from, before they switched to (I joined in June 2005 – soon to be ten years!).   As late as August 2005, your Wall was a single mutable text field, so emails said “X has changed your wall.” By September, the terminology was changed to “X has written on your wall” as the list-of-posts model was adopted. Techie bonus: mail was sent with a user agent header of “ZuckMail”. Also, Wirehog.
  • At one point, I purchased the domain ILBEN.COM… so that I could write my email address as   …………… facepalm
  • Registered on June 10, 2002.
  • I seemed to have sent a lot of email to random internet folks:
    • Adam Iser, about how much I liked Adium
    • Dan Gillmor, about using Disk Copy to make disk images from DVDs
    • Nick Zitzmann, about adding Applescript support to Battlegram (a chat client for
    • Damien Gallop, about using RDC instead of VirtualPC…???
    • Rob Griffiths, about featuring LaunchBar as a Product of the Week on
    • I distinctly remember writing to Andy Ihnatko but those emails appear to have been lost to the sands of time.
    • Nobert Heger, about his experiences as an independent developer producing LaunchBar, to which he responded: 

      Success as a shareware developer depends on so many different factors, so it’s a bit difficult to give a general answer. You need a good idea, a good implementation, good PR and marketing, good support, etc. Failing in just one of these areas can make a huge difference in terms of success. The only advice I can give is to try it yourself.

  • Correspondence with Nigel McFarlane. Extremely bittersweet for me: he bravely and generously took a chance on me to write a few pages and get my name in a physical book… and committed suicide three months after it was published. Even though I barely knew him, the sense of loss and pain hasn’t diminished in the last ten years.
  • A handful of emails from my uncle, mostly followups to spirited arguments we had over things like whether the world would be better off if every television set vanished into the ether. In retrospect, seeing what I wrote makes me cringe – I was an ass, and being fifteen isn’t much of an excuse. I didn’t notice/appreciate it at the time, but with twelve years of elevated perspective, there was a great deal of wisdom (of the life-experience variety) shining through his words. But he understood what I did not:

    There are, unfortunately, some things that take years to be put into perspective, and even being extraordinarily bright provides no shortcut to such insights nor any guarantee of happiness.

On LLVM’s GC Infrastructure

LLVM’s documentation suggests that its getelementptr instruction is one of the most commonly misunderstood aspects of the system. In my experience, GEP is only number two on the list; number one is LLVM’s GC support: what it provides, what it doesn’t provide, and how to use it effectively.

The first two are the easy ones: LLVM does support stack maps, read and write barriers, fully-precise collection, and relocating collectors. It does not require a shadow stack, though it does provide one, which is helpful for first-time GC clients to get started with. LLVM’s overall GC design supports concurrent collectors and mutators, but the implementation lacks a few key ingredients to minimize the performance impact of GC in such situations.

There are a few missing features on the implementation side which obviously don’t need major design overhauls to fix. The big three on my list are code emission at safe points, safe points on back edges, and support for register-allocation of GC roots.

The last one is sort of interesting, because I’ve seen several proposals to change the design of LLVM’s GC infrastructure in a major way to obtain it. The reasoning goes like this: we currently have llvm.gcroot, which marks a stack slot address as being a GC root. But we want to store roots in machine registers, not stack slots, so we need a new intrinsic to annotate an SSA register as being a GC root as well. With such an intrinsic, LLVM would automatically store and load SSA registers to/from stack slots “behind the scenes.”

So, for example, suppose we start off by acquiring a GC-traced pointer:

%root = alloca (gcroot)
%p = … allocate a pointer …
store %p into %root

If we had a non-copying collector, we could get away with code like this:

%r = … maybe trigger GC…
… use %p …

But with LLVM’s current GC support, a copying collector must write this less-legible IR instead:

store %p into %root
%r = … maybe trigger GC…
%p2 = load %root
… use %p2 …

When repeated over and over, it definitely makes the IR harder to read. Wouldn’t it be nice if LLVM would automatically do all that stuff for you? Like this:

… (no alloca needed!)
%p = … allocate a pointer …
call llvm.gc.reg.root(%p)
%r = … maybe trigger GC…
… use %p …

It’s a natural desire; the first time I encountered LLVM’s GC support, I wanted the same thing. But it’s a wolf in sheep’s clothing. Remember: the fundamental property of LLVM is that an SSA register is an immutable value. Not just “observably equivalent as far as my high-level program can see,” but equivalent under any observation possible at the IR level!

The problem is that under the hypothetical gc.reg.root scheme, LLVM is free to do constant propagation before the rewrite to the verbose version, and if it does so, then the program’s semantics will have changed! The problem is that %p has been transmuted from a constant to a variable by virtue of being marked as a register root. But in doing so, we’ve taken the key benefit of LLVM’s SSA representation—the power its gets thanks to immutable bindings—and thrown it right out the window!


Take another look at the “verbose” pseudo-IR listed above. The common interpretation of this snippet is “spill %p into a stack slot, perform some operation, and then reload %p from the stack slot.”  This is reasoning about the IR according to its current implementation. But I’d argue that it’s better to think about the abstract semantics of the IR. Abstractly, we’re just encoding the fact that the value we think of as %p might change at some point in the IR; beforehand, we called it %p, and afterwards, we called it %p2. Since phi nodes aren’t really viable for most frontends, the only way LLVM allows us to encode mutability is via memory – i.e. alloca.  But the real point isn’t to set up stack slots, it’s just to reflect pointers updates that the GC might be making. As long as LLVM can verify that the “stack slot” address doesn’t escape, it is free to back that mutable storage by symbolically addressed memory: namely, machine registers! It won’t do that analysis now, but it could. That’s the point: no major design tweaks are needed to get register allocation of GC roots, just a bit of extra analysis internal to LLVM.

A separate question is how much of a difference this would make in practice. Effectively, the lack of register allocation means that GC roots are stored in caller-saved machine registers; maybe that’s a performance-killer, or maybe it’s a wash. To the best of my knowledge, nobody has ever carefully looked at the impact of naïve versus varying degrees of sophisticated treatment of GC roots in LLVM-compiled code.


Stepping back even more, I think the reason there’s so much confusion about LLVM’s GC support is that there’s a philosophical impedance mismatch with the rest of the project. Elsewhere, LLVM tries hard to allow unsophisticated frontends to get equivalent code quality as a more sophisticated frontend. The canonical example is that with mem2reg, a frontend needn’t worry about SSA conversion or dominance frontiers or anything like that. In general, LLVM’s philosophy is to move as much work from the front-end to the common LLVM infrastructure as possible.  But LLVM’s GC support is different. There’s no automatic transformation from “dead simple” to “efficient” use of GC primitives. Instead, it’s up to each client to do whatever analysis is needed for the most efficient use of the existing GC-related primitives. In practice, this means that clients need to do data-flow analysis if they want to reuse dead root slots. This is in stark contrast to the negligible level of Dragon-book sophistication otherwise expected of a front-end.


In terms of how to use LLVM’s GC support effectively, I’ve found it clearest to think of the process as a transformation between separate intermediate languages. The goal of the transformation is to add a minimal selection of root slots, along with just enough loads and stores to preserve correctness. The crucial invariant is that at each potential GC point, the live roots must reside in (abstract) memory. So: backwards analysis to determine liveness, and a forwards rewrite to add the loads and stores. Interestingly, this combination of a backwards analysis with a forwards program rewrite is not well-supported by existing dataflow tools or their underlying theoretical frameworks. Anyways, to implement the given invariant, you’ll need to know which types require tracing, and which operations may end up invoking the GC. You can sort of encode this information via metadata and pointer address spaces, if using LLVM IR as your source IR, but I think it’s easier to think of the two IRs as being completely separate.

Date-Preserving Patch Folding with Mercurial Queues

Mercurial Queues are a powerful tool for organizing changes into logically coherent commits. My current workflow often goes something like this:

  1. Make some changes implementing feature A (as a patch).
  2. Generate a series of patches moving on to feature B.
  3. Realize, after working on feature B, that feature A needs a small tweak, or a better commit message.

The relevant mq commands will get the job done in a jiffy. However, there’s a slight downside: after all that qpushing and qpopping, the patches in the series will lose their original dates, which means that qfinish will produce a tightly clustered series of patches that don’t actually reflect the true chronology of their development.

Fortunately, Mercurial provides the tools needed to preserve a meaningful chronological order. We’ll exploit hg log’s configurable formatting to make it generate a shell script that’ll set the dates for our patch series automatically. A pretty little hack, no? Here are the relevant incantations, assuming you want to modify a patch named P with revision number N:

  1. hg log -r[[N]]:tip --template "hg qref -d '{date|date}'; hg qpush;\n" >
  2. Run hg qgoto [[P]], then make whatever changes you need.
  3. Run bash, which will restore the original dates for all patches in the queue following the modified one.

An Apple Easter Egg, Of Sorts

Looks like someone at Apple has a bit of a sense of humor:


 sosumi for the legal section, eh?

Apparently they did the same thing for the 10 billion download promotion, too.

Agrajag Rediscovered

Going through the files on an old family computer, I discovered a long-lost email exchange with Douglas Adams:

Subject: Re: Agrajag
Date: Fri, 11 Jun 1999 17:52:48 -0700
From: "Douglas Adams" <askdna@TDV.COM>
To: Martin Karel <karel@BELLATLANTIC.NET>

Irene –
I’m pretty bad at planning what’s going to happen next week. Alex is exactly right.

Douglas Adams | <http:>;
The Digital Village | <http:>;

Please – help save the last 650 mountain gorillas

> Hi,
> Please settle a debate between me and my brother. I held forth that
> you must have already had Agrajag in mind when you wrote the bit about
> the pot of flowers that, when falling, said "Oh no, not again!" But
> Alex thinks that you probably just randomly stuck that line in there,
> and came up with Agrajag later and it just happened to fit. If I’m
> right, then we are very impressed with your degree of planning ahead,
> seeing as how it was four years between the publishing dates of
> "Hitchhiker’s Guide" and "Life, the Universe, and Everything."
>            – Irene Karel,
>               an adoring fan

Perhaps worth noting: the saved text file was 1337 bytes.

Firefox Add-On Compatibility

It looks like Firefox is moving to a compatible-by-default model for extensions. I find this interesting, since I proposed this back in 2004 to no avail.

To be clear, I don’t mean to say that compatible-by-default would have been the right choice in 2004… Predictions about the past are almost hard as predictions about the future, and at least two major factors have changed since then. First, the state of the world for extensions now is more stable than it was in the Firefox 1.0 timeframe. And second, Firefox’s rapid release cycle exacerbates spurious compatibility problems, making this proposal significantly more attractive.