ANTLR Grammar Tip: LL(*) and Left Factoring

Suppose you wish to have ANTLR recognize non-degenerate tuples of expressions, like (x, y) or (f(g), a, b) but not (f) or (). Trailing commas are not allowed. The following formulation of such a rule will likely elicit a complaint (“rule tuple has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2; …”):

tuple : '(' (expr ',')+ expr ')' ;

ANTLR’s error message seems unhelpful at first glance, but it’s trying to point you in the right direction. What ANTLR is saying is that after it reaches a comma, it doesn’t know whether it should expect expr ')' or expr ',', and parsing expr could requiring looking at an arbitrary number of tokens. ANTLR recommends left-factoring, but the structure of the rule as written doesn’t make it clear what needs to be factored. Re-writing the rule to make it closer to ANTLR’s view makes it clearer what the problem is:

tuple_start : '(' tuple_continue;
    :    ( expr ',' tuple_continue
    |      expr ')');

Now it’s more obvious where the left-factoring needs to be applied, and how to do it by writing expr once, leaving only the choice of seeing a comma or a close-paren:

tuple_start : '(' tuple_continue;
    :    expr ( ',' tuple_continue
              | ')'  );

The rewritten expansion corresponds to an alternative structuring of the tuple rule that will satisfy ANTLR:

tuple : '(' expr (',' expr)+ ')';

The ANTLR wiki has a page on removing global backtracking from ANTLR grammars, which has a nice presentation of how to do left-factoring when the opportunity is obvious.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s