I came across a very nice notation in the book Basics of Compiler Design that greatly clarified the various choices for bootstrapping a compiler. The notation was originally created by Harvey Bratman in 1961 (!) , but every other reference I’ve found on the subject is just terrible for conveying understanding. Mogensen’s presentation, on the other hand, is refreshingly clear.
The rules for T-diagrams are very simple. A compiler written in some language “C” (could be anything from machine code on up) that translates programs in language A to language B looks like this (these diagrams are from Torben Mogensen’s freely-available book on compilers):
Now suppose you have a machine that can directly run HP machine code, and a compiler from ML to HP machine code, and you want to get a ML compiler running on different machine code P. You can start by writing an ML-to-P compiler in ML, and compile that to get an ML-to-P compiler in HP:
From there, feed the new ML-to-P compiler to itself, running on the HP machine, and you end up with an ML-to-P compiler that runs on a P machine!
The T-diagram notation can also easily be extended with interpreters, which are simply vertical boxes that can be inserted between any two languages. For example, here’s a diagram depicting a machine D running an interpreter for language C that is, in turn, running a compiler in language C to translate a program from language A to language B:
After thinking about bootstrapping in an ad-hoc way, I just love the structure that these diagrams provide! Unfortunately, while the book itself is freely available online, there’s no HTML version of the book, only a PDF version. The easiest way to see the full presentation is probably to use Scribd and search for “t-diagram”.