On LLVM’s GC Infrastructure

LLVM’s documentation suggests that its getelementptr instruction is one of the most commonly misunderstood aspects of the system. In my experience, GEP is only number two on the list; number one is LLVM’s GC support: what it provides, what it doesn’t provide, and how to use it effectively.

The first two are the easy ones: LLVM does support stack maps, read and write barriers, fully-precise collection, and relocating collectors. It does not require a shadow stack, though it does provide one, which is helpful for first-time GC clients to get started with. LLVM’s overall GC design supports concurrent collectors and mutators, but the implementation lacks a few key ingredients to minimize the performance impact of GC in such situations.

There are a few missing features on the implementation side which obviously don’t need major design overhauls to fix. The big three on my list are code emission at safe points, safe points on back edges, and support for register-allocation of GC roots.

The last one is sort of interesting, because I’ve seen several proposals to change the design of LLVM’s GC infrastructure in a major way to obtain it. The reasoning goes like this: we currently have llvm.gcroot, which marks a stack slot address as being a GC root. But we want to store roots in machine registers, not stack slots, so we need a new intrinsic to annotate an SSA register as being a GC root as well. With such an intrinsic, LLVM would automatically store and load SSA registers to/from stack slots “behind the scenes.”

So, for example, suppose we start off by acquiring a GC-traced pointer:

%root = alloca (gcroot)
%p = … allocate a pointer …
store %p into %root

If we had a non-copying collector, we could get away with code like this:

%r = … maybe trigger GC…
… use %p …

But with LLVM’s current GC support, a copying collector must write this less-legible IR instead:

store %p into %root
%r = … maybe trigger GC…
%p2 = load %root
… use %p2 …

When repeated over and over, it definitely makes the IR harder to read. Wouldn’t it be nice if LLVM would automatically do all that stuff for you? Like this:

… (no alloca needed!)
%p = … allocate a pointer …
call llvm.gc.reg.root(%p)
…
%r = … maybe trigger GC…
… use %p …

It’s a natural desire; the first time I encountered LLVM’s GC support, I wanted the same thing. But it’s a wolf in sheep’s clothing. Remember: the fundamental property of LLVM is that an SSA register is an immutable value. Not just “observably equivalent as far as my high-level program can see,” but equivalent under any observation possible at the IR level!

The problem is that under the hypothetical gc.reg.root scheme, LLVM is free to do constant propagation before the rewrite to the verbose version, and if it does so, then the program’s semantics will have changed! The problem is that %p has been transmuted from a constant to a variable by virtue of being marked as a register root. But in doing so, we’ve taken the key benefit of LLVM’s SSA representation—the power its gets thanks to immutable bindings—and thrown it right out the window!

 

Take another look at the “verbose” pseudo-IR listed above. The common interpretation of this snippet is “spill %p into a stack slot, perform some operation, and then reload %p from the stack slot.”  This is reasoning about the IR according to its current implementation. But I’d argue that it’s better to think about the abstract semantics of the IR. Abstractly, we’re just encoding the fact that the value we think of as %p might change at some point in the IR; beforehand, we called it %p, and afterwards, we called it %p2. Since phi nodes aren’t really viable for most frontends, the only way LLVM allows us to encode mutability is via memory – i.e. alloca.  But the real point isn’t to set up stack slots, it’s just to reflect pointers updates that the GC might be making. As long as LLVM can verify that the “stack slot” address doesn’t escape, it is free to back that mutable storage by symbolically addressed memory: namely, machine registers! It won’t do that analysis now, but it could. That’s the point: no major design tweaks are needed to get register allocation of GC roots, just a bit of extra analysis internal to LLVM.

A separate question is how much of a difference this would make in practice. Effectively, the lack of register allocation means that GC roots are stored in caller-saved machine registers; maybe that’s a performance-killer, or maybe it’s a wash. To the best of my knowledge, nobody has ever carefully looked at the impact of naïve versus varying degrees of sophisticated treatment of GC roots in LLVM-compiled code.

 

Stepping back even more, I think the reason there’s so much confusion about LLVM’s GC support is that there’s a philosophical impedance mismatch with the rest of the project. Elsewhere, LLVM tries hard to allow unsophisticated frontends to get equivalent code quality as a more sophisticated frontend. The canonical example is that with mem2reg, a frontend needn’t worry about SSA conversion or dominance frontiers or anything like that. In general, LLVM’s philosophy is to move as much work from the front-end to the common LLVM infrastructure as possible.  But LLVM’s GC support is different. There’s no automatic transformation from “dead simple” to “efficient” use of GC primitives. Instead, it’s up to each client to do whatever analysis is needed for the most efficient use of the existing GC-related primitives. In practice, this means that clients need to do data-flow analysis if they want to reuse dead root slots. This is in stark contrast to the negligible level of Dragon-book sophistication otherwise expected of a front-end.

 

In terms of how to use LLVM’s GC support effectively, I’ve found it clearest to think of the process as a transformation between separate intermediate languages. The goal of the transformation is to add a minimal selection of root slots, along with just enough loads and stores to preserve correctness. The crucial invariant is that at each potential GC point, the live roots must reside in (abstract) memory. So: backwards analysis to determine liveness, and a forwards rewrite to add the loads and stores. Interestingly, this combination of a backwards analysis with a forwards program rewrite is not well-supported by existing dataflow tools or their underlying theoretical frameworks. Anyways, to implement the given invariant, you’ll need to know which types require tracing, and which operations may end up invoking the GC. You can sort of encode this information via metadata and pointer address spaces, if using LLVM IR as your source IR, but I think it’s easier to think of the two IRs as being completely separate.

Date-Preserving Patch Folding with Mercurial Queues

Mercurial Queues are a powerful tool for organizing changes into logically coherent commits. My current workflow often goes something like this:

  1. Make some changes implementing feature A (as a patch).
  2. Generate a series of patches moving on to feature B.
  3. Realize, after working on feature B, that feature A needs a small tweak, or a better commit message.

The relevant mq commands will get the job done in a jiffy. However, there’s a slight downside: after all that qpushing and qpopping, the patches in the series will lose their original dates, which means that qfinish will produce a tightly clustered series of patches that don’t actually reflect the true chronology of their development.

Fortunately, Mercurial provides the tools needed to preserve a meaningful chronological order. We’ll exploit hg log’s configurable formatting to make it generate a shell script that’ll set the dates for our patch series automatically. A pretty little hack, no? Here are the relevant incantations, assuming you want to modify a patch named P with revision number N:

  1. hg log -r[[N]]:tip --template "hg qref -d '{date|date}'; hg qpush;\n" > d.sh
  2. Run hg qgoto [[P]], then make whatever changes you need.
  3. Run bash d.sh, which will restore the original dates for all patches in the queue following the modified one.

An Apple Easter Egg, Of Sorts

Looks like someone at Apple has a bit of a sense of humor:

 sosumi

 sosumi for the legal section, eh?

Apparently they did the same thing for the 10 billion download promotion, too.

Agrajag Rediscovered

Going through the files on an old family computer, I discovered a long-lost email exchange with Douglas Adams:

Subject: Re: Agrajag
Date: Fri, 11 Jun 1999 17:52:48 -0700
From: "Douglas Adams" <askdna@TDV.COM>
To: Martin Karel <karel@BELLATLANTIC.NET>

Irene –
I’m pretty bad at planning what’s going to happen next week. Alex is exactly right.

Best,
———————————————————–
Douglas Adams | <http: http://www.douglasadams.com>;
———————————————————–
The Digital Village | <http: http://www.h2g2.com>;
———————————————————–

Please – help save the last 650 mountain gorillas
http://www.gorillas.org/index.html

> Hi,
>
> Please settle a debate between me and my brother. I held forth that
> you must have already had Agrajag in mind when you wrote the bit about
> the pot of flowers that, when falling, said "Oh no, not again!" But
> Alex thinks that you probably just randomly stuck that line in there,
> and came up with Agrajag later and it just happened to fit. If I’m
> right, then we are very impressed with your degree of planning ahead,
> seeing as how it was four years between the publishing dates of
> "Hitchhiker’s Guide" and "Life, the Universe, and Everything."
>
>            - Irene Karel,
>               an adoring fan

Perhaps worth noting: the saved text file was 1337 bytes.

Firefox Add-On Compatibility

It looks like Firefox is moving to a compatible-by-default model for extensions. I find this interesting, since I proposed this back in 2004 to no avail.

To be clear, I don’t mean to say that compatible-by-default would have been the right choice in 2004… Predictions about the past are almost hard as predictions about the future, and at least two major factors have changed since then. First, the state of the world for extensions now is more stable than it was in the Firefox 1.0 timeframe. And second, Firefox’s rapid release cycle exacerbates spurious compatibility problems, making this proposal significantly more attractive.

The Lessons of Lucasfilm’s Habitat

We were initially our own worst enemies in this undertaking, victims of a way of thinking to which we engineers are dangerously susceptible. This way of thinking is characterized by the conceit that all things may be planned in advance and then directly implemented according to the plan’s detailed specification. For persons schooled in the design and construction of systems based on simple, well-defined and well-understood foundation principles, this is a natural attitude to have.

http://www.fudco.com/chip/lessons.html

I wonder what Intel is up to?

 

http://cufp.galois.com/2007/slides/AnwarGhuloum.pdf

“A proposed long term solution: a new functional language that features implicit parallelism, dependent typing, and an effects type system”

“We are also working with an ISV on a compiler for a parallel functional language
- Strict, but not call by value for implicit parallelism (maybe
with lightweight annotations)
- Arrays and array comprehensions for data parallelism
- Effects system to contain impure features
- Atomic for state updates”

 

http://cufp.org/conference/sessions/2010/functional-language-compiler-experiences-intel

“For five years Intel’s Programming Systems Lab (PSL) has been collaborating with an external partner on a new functional programming language designed for productivity on many-core processors. While the language is not yet public, this talk outlines motivations behind the language and describes our experiences in implementing it using a variety of functional languages. The reference interpreter is written in Haskell and compiled with GHC while PSL’s performance implementation is written in SML and compiled with Mlton. We have also generated Scheme code compiled with PLT Scheme as part of a prototyping effort.”