Intuitive hygienic macros

Of the many differences between Lisp and Scheme, that between their respective facilities for defining macros is one of the most glaring and controversial. If I may be permitted to generalize for the sake of illustration, Schemers detest the unhygienic nature of Lisp macros, while Lispers find Scheme's syntax-rules confusing and inflexible. Both arguments have merit, and several efforts have been made to remedy the situation, most notably the syntax-case macro system available in several Schemes.

Other solutions notwithstanding, I'd like to present my particular contribution to this problem: a macro system which is hygienic, yet almost identical in usage to Lisp-style quasiquotation-based macros. I have implemented this system as a modification of Arc, Paul Graham's mzscheme-based lisp; but it's not necessary to know Arc or understand its internals to understand this article. In fact, Arc is clear enough that it should be almost self-evident what I'm doing, but just in case, here's an example macro definition in plain Arc:

(mac iflet (variable expression then . else)
  (w/uniq temporary
    `(let ,temporary ,expression
       (if ,temporary
         (let ,variable ,temporary ,then)
         (do ,@else)))))

This defines the iflet macro, which tests whether expression is true; if so, it binds the value to variable and executes then, and otherwise it executes else as a do body. There are a few Arc-specific things of note here: Arc's do is Common Lisp's progn and Scheme's begin, rather than the usual Lisp/Scheme do loop; let binds a single variable to a value and takes no parentheses around the binding, unlike let in other Lisps; and w/uniq is an Arc macro that calls uniq - Arc's equivalent of gensym - to generate unique symbols for use in macroexpansion.
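To make that concrete, here is a small usage sketch, assuming iflet is defined as above. The table ages and its contents are made up for illustration, and gs1 in the comment stands in for whatever symbol uniq actually returns.

```arc
; Hypothetical data: a table mapping names to ages.
(= ages (obj alice 31 bob 27))

(iflet age (ages 'alice)
  (prn "alice is " age)
  (prn "no entry for alice")
  (prn "giving up"))

; The call expands to roughly:
; (let gs1 (ages 'alice)
;   (if gs1
;     (let age gs1 (prn "alice is " age))
;     (do (prn "no entry for alice") (prn "giving up"))))
```

Since alice is in the table, the then branch runs with age bound to 31; the two trailing forms only run when the lookup returns nil.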

As you can see, macros in Arc are quasiquotation-based, and they behave for our purposes exactly like their counterparts in Common Lisp. Now, here's how you'd define that same macro using my hygienic quasiquotes:

(mac iflet (variable expression then . else)
  #`(let temporary #,expression
      (if temporary
        (let #,variable temporary #,then)
        (do #,@else))))

At first sight, it seems I've just prefixed all the quasiquotation symbols with "#". On second look, though, there's no use of w/uniq - the symbol temporary is automatically mangled to avoid variable collision: this form of quasiquotation is hygienic. Of course, this form of hygiene is easily emulated with gensyms; if it weren't, Lispers wouldn't have a leg to stand on in arguments for and against hygienic macros. But hygienic macros don't just avoid local symbol collisions; they avoid global collisions, too. Consider the following macro:

(mac square (x)
  (w/uniq tmp
    `(let ,tmp ,x
       (* ,tmp ,tmp))))

And suppose I called the macro as follows:

(let * 2 (square *))

Suddenly, everything falls apart. The * function in the macroexpansion of square, supposed to refer to the globally-bound multiplication function, is instead bound to the number 2. Admittedly, this example is highly contrived; square is better written as a function, and nobody would use * as a local variable name. But it serves to illustrate a principle that affects even perfectly useful and necessary macros: locally-bound symbols can shadow global variables.
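Writing the expansion out by hand shows where the capture happens. This is only a sketch: gs1 stands in for the symbol uniq generates, and it assumes the macro's argument is unquoted (,x) in the template.

```arc
(let * 2 (square *))

; expands to roughly:
; (let * 2
;   (let gs1 *          ; the argument: the caller's *, which is 2, as intended
;     (* gs1 gs1)))     ; the macro's own *: this also resolves to the local 2
;
; so the body attempts to call 2 as a function instead of multiplying.
```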

Consider: what if the macro in question is part of some module's API, but the function it invokes is not? A collision in this case is much more plausible than if the function is, like *, part of the core language. Common Lisp solves this problem by way of its package system, which is based on symbol-renaming. This reduces the problem to avoiding collisions within a package, which is eminently pragmatic, but uglier than a general solution: it "tacks on", as it were, a solution to a problem with unhygienic macros to a solution to a totally different problem (namespacing).
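A sketch of that more plausible case follows. The names scale-by-two and doubled are hypothetical, invented purely for illustration; imagine the first as a library's internal helper and the second as its public macro.

```arc
; Hypothetical library code: the helper is internal, the macro is the API.
(def scale-by-two (n) (* 2 n))

(mac doubled (x)
  `(scale-by-two ,x))

; Client code that happens to reuse the helper's name for a local variable.
(let scale-by-two 10
  (doubled 3))

; (doubled 3) expands to (scale-by-two 3) inside the let, so the call
; tries to apply the number 10 rather than the library function.
```

The client never mentions the helper deliberately, which is what makes this kind of collision hard to spot.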

Unfortunately, Arc lacks a module system, so although the principle behind the implementation of my hygienic quasiquotation could be applied to such, I am unable to demonstrate it. However, it at least handles the problem of local variables shadowing global ones, which is to say that square as defined below will not break if you invoke it after locally rebinding *:

(mac square (x)
  #`(let tmp #,x
      (* tmp tmp)))

Alright, enough already about how lovely my hygienic macro system is. How does it work?

Syntactic closures

The idea behind hygienic macros is that the bindings of symbols in the quasiquoted code are separate from those in the code from which the macro is invoked. In order to track this information, while still maintaining the lisp model in which a macro simply manipulates code as data, an extra layer of indirection is necessary.

Lisp has two datatypes which behave specially when understood as code: lists and symbols. Lists represent function calls, macro invocations, or special forms. Symbols represent variables. To implement hygienic macros, I add another such type: syntactic closures. Syntactic closures consist of a "syntactic environment" and a piece of code. When evaluated, the code is evaluated within that environment.

If this sounds familiar, that's because it is; a "closure" usually refers to a combination of a function and an environment. Although usually "closure" is used to refer only to functions which require a concrete representation of some part of their environment at runtime, it makes sense more generally to say that every function has an associated environment and forms a closure; when you invoke a function, the code in the function is not invoked with whatever symbol-to-value bindings are present at the moment, but with the bindings present where it was defined, augmented by binding the values passed to it to its parameters.

It is possible to do without this sort of environmental juggling entirely, and in fact there is a name for the distinction: with closures, one has lexical scoping; without them, one has dynamic scoping. The latter is much easier to implement, and used to be the norm for lisps; Emacs Lisp was dynamically scoped by default for most of its history. Lexical scoping is generally agreed to be preferable, and for good reasons - reasons analogous to the arguments for hygienic macros. In fact, hygiene is very much the equivalent of lexical scoping for macros.
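A minimal illustration of the difference, in plain Arc. The names make-adder and add5 are just examples.

```arc
(def make-adder (n)
  (fn (x) (+ x n)))   ; n is captured from the defining environment

(= add5 (make-adder 5))

(let n 100
  (add5 1))

; Under lexical scoping (which Arc has), the n inside the closure is
; still 5, so the call returns 6.
; Under dynamic scoping, the call would see the n bound by the let
; and return 101.
```

The unhygienic square macro misbehaves in much the same way as the dynamically-scoped version here: a name in its body is resolved where the code ends up running, not where it was written.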

The difference between syntactic closures and their functional counterparts is that, just as a macro operates on code at compile-time while a function operates on values at run-time, a syntactic closure is an element of lisp code and its environment tracks bindings of symbols to variables (and is hence called "syntactic"), while a function closure is a value and its environment tracks bindings of variables to values.

On a side note, syntactic closures are not quite as simple as I make them seem. They don't just have special meaning when evaluated; they also alter the semantics of code when one appears as the parameter in a let-binding or a function. In such a case, the "code" they enclose represents the parameter much as it would normally, but instead of introducing a symbol-to-variable binding in the syntactic environment of the let-expression or function, it introduces it in the environment of the closure.
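The binder case is easier to see on an example. Below is an informal annotation of what the hygienic iflet from earlier would produce. The <def: ...> and <use: ...> markers are shorthand invented for this illustration, not anything the reader or compiler accepts, and foo and bar are placeholder functions.

```arc
(iflet x (foo) (bar x))

; expands to roughly the following, where <def: ...> and <use: ...> mark
; syntactic closures over the macro-definition and macro-use environments:
;
; <def: (let temporary <use: (foo)>
;         (if temporary
;           (let <use: x> temporary <use: (bar x)>)
;           (do)))>
```

The <use: x> in parameter position introduces its binding in the use environment, so the x inside <use: (bar x)> sees it; temporary, let, if and do are all resolved in the definition environment.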

I imagine that if Arc (and lisps in general) were less homogeneous languages - if there were more "kinds" of Arc code than just expressions and "binders" such as function- and let-parameters - then there would be more special cases like this hanging around the usage of hygienic macros.

Hygienic quasiquotation

Although it's clear that syntactic closures are what we want, it may still be less than obvious how to apply them to solve our problem, and moreover how to put an intuitive quasiquotation-like interface on this solution. Thankfully, the solution's implementation follows fairly naturally from considering the semantics of syntactic closures as compared to the desired behavior of hygienic macros.

When we invoke a macro, we expect most of the code of the macroexpansion to be evaluated in a separate syntactic environment; in particular, in the environment the macro was defined in. Hence, we enclose the resultant code in the macro's defining environment. The exceptions to this rule mostly involve the code which was passed into the macro, which we expect to maintain its source environment; so we must enclose it in the caller's environment, that is, the environment of the code that invoked the macro.

Representing this solution in terms of quasiquoting is similarly simple: make the hygienic quasiquoter enclose its argument in the syntactic environment of the macro definition, and make the hygienic unquoter enclose its evaluated argument in the caller's syntactic environment. There are, however, two exceptional cases we must consider.

First, what if the macro wants, as an anaphoric macro might, to introduce a binding in the caller's environment whose symbol is not one of the arguments passed to it? The solution here is to hygienically unquote the desired symbol's quotation; it will then be enclosed in the caller's environment by the unquoter.

Second, what if the macro wants to generate and insert code via unquotation that is not supposed to be enclosed in the caller's environment? The solution here is to use the normal, unhygienic unquoter from within the hygienic quasiquoter. Any code within the generated code which does need to be in the caller's environment can be hygienically unquoted as usual.
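As a sketch of the first case, here is how an anaphoric if might look when written with hygienic quasiquotes. It follows the description above and has not been checked against the hygiene branch; my-aif is a hypothetical name, chosen to avoid clobbering Arc's own aif.

```arc
; Binds the test's value to `it` in the environment of the macro call,
; so the then-expression written there can refer to it.
(mac my-aif (test then . else)
  #`(let tmp #,test
      (if tmp
        (let #,'it tmp #,then)   ; #,'it: quote the symbol, then hygienically unquote it
        (do #,@else))))

(my-aif (+ 1 2)
  (prn it))
```

Note that tmp needs no such treatment: it is meant to stay private to the macro, so it is left for the quasiquoter to enclose in the defining environment.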

Implementation details

At this point, we will delve more deeply into the internals of the arc compiler, found in the file ac.scm. Feel free to skip this section if you only wanted the theory.

I have yet to fully flesh out this section. The basic idea, however, is this: where the original arc compiler merely keeps track of a list of the symbols currently lexically bound, the hygienic compiler keeps track of the current syntactic environment and of a mapping from conses of a lexically-bound arc symbol and its syntactic environment to mzscheme symbols. When it compiles a reference to a bound arc symbol, it inserts the corresponding mzscheme symbol in its place. Arc symbols which are in the default environment are always bound to themselves, but symbols in other environments are bound to gensyms.
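The lookup just described can be modelled in a few lines of Arc. This is a toy model written for this explanation, not code from ac.scm: the names hyg-bind and hyg-lookup, the use of plain symbols to stand for syntactic environments, and the alist representation are all assumptions.

```arc
; A binding list is an alist whose keys are (symbol . env) conses and whose
; values are the symbols to emit into the compiled code.
; 'default stands for the default syntactic environment.

(def hyg-bind (sym env bindings)
  (cons (cons (cons sym env)
              (if (is env 'default) sym (uniq)))   ; default env: bound to itself
        bindings))

(def hyg-lookup (sym env bindings)
  (let hit (find [iso (car _) (cons sym env)] bindings)
    (if hit (cdr hit))))   ; nil means: not lexically bound, treat as a global

; Model (let * 2 (square *)): the user binds * in the default environment,
; and the macro template binds tmp in its own environment.
(= bs (hyg-bind 'tmp 'macro-env (hyg-bind '* 'default nil)))

(hyg-lookup '* 'default bs)       ; the user's local *
(hyg-lookup 'tmp 'macro-env bs)   ; a fresh symbol from uniq
(hyg-lookup '* 'macro-env bs)     ; nil: the macro's * is still the global one
```

Because the environment is part of the key, the user's local * and the * written in the macro template never resolve to the same binding, which is the property the square example relies on.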

Using the hygienic version of Arc

As posted on the arc forums, the hygienic arc compiler is maintained as an anarki branch. It should be fully compatible with the master anarki branch, but as I'm not completely sure of this and moreover I'm not sure people would appreciate the massive changes it makes to ac.scm, I won't merge it into the main branch. In order to use the hygiene branch, from your local copy of the anarki repo, run:

git checkout --track -b hygiene origin/hygiene

After doing this, you will be on the hygiene branch. You should only need to run the above once. If you want to go back to the master branch, run "git checkout master"; if you want to go again to the hygiene branch, run "git checkout hygiene". For more information on branching in git, read the docs.