A Macro Facility for Lua using Token Filters

lhf's [tokenf patch] (see also [this writeup]) provides a simple but powerful hook into the stream of tokens that the Lua compiler sees. (In Lua, for a given module, compilation into bytecode and execution are distinct phases.) You provide a global function called FILTER, which will be called in two very different ways. First, it is called with two arguments: a function which you can use to get the next token (a 'getter') and the source file. Thereafter, it is called with no arguments and is expected to return three values. (This is confusing at first, and these two functions should probably be given different names.)

The get function returns three values: line, token and value. The token has a few special values such as '<name>' (any symbol), '<string>', '<number>' and '<eof>', but otherwise it is the actual keyword or operator, such as 'function', '+', '~=', '...', etc. If the token is one of the special cases, then the value of the token is returned as the third value. (There is an instructive example with the tokenf distribution, called fdebug, which simply prints out these values.)
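
The two calling conventions can be tried out in stock Lua with a stand-in for the lexer. The sketch below is an identity filter that passes every token through unchanged. Note that make_getter and the hand-written token list are hypothetical scaffolding and not part of tokenf:

```lua
-- Identity filter: passes every token through unchanged.
local getter                      -- remembered from the initial call

function FILTER(get, source)
    if get then                   -- initial call: FILTER(get, source)
        getter = get
        return
    end
    return getter()               -- later calls: return line, token, value
end

-- Hypothetical stand-in for the lexer, so the filter can run in stock Lua.
local function make_getter(tokens)
    local i = 0
    return function()
        i = i + 1
        -- once the list is exhausted, keep returning the last token (<eof>)
        local t = tokens[i] or tokens[#tokens]
        return t[1], t[2], t[3]
    end
end

-- Tokens for the chunk:  x = 10
FILTER(make_getter{
    {1, '<name>', 'x'},
    {1, '='},
    {1, '<number>', 10},
    {1, '<eof>'},
}, 'example.lua')

local seen = {}
repeat
    local line, token, value = FILTER()
    seen[#seen + 1] = value and (token .. ' ' .. tostring(value)) or token
until token == '<eof>'

print(table.concat(seen, ' | '))
```

Only the three special tokens that carry a value show a third return value here; the '=' operator comes back as the token itself.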

Token filters read and write tokens one at a time. Coroutines make it possible to maintain complex state; otherwise you would have to manage a state machine, which isn't such a fun form of programming.
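
As a rough illustration of why coroutines help, here is a sketch of a filter that expands the name TWO into the three tokens ( 2 ). The coroutine remembers where it is inside the expansion, so no explicit state variable is needed. Again, make_getter, make_filter and the name TWO are made up for this example:

```lua
-- Hypothetical stand-in for the lexer (not part of tokenf).
local function make_getter(tokens)
    local i = 0
    return function()
        i = i + 1
        local t = tokens[i] or tokens[#tokens]
        return t[1], t[2], t[3]
    end
end

-- Wrap a getter in a coroutine; each call to the result yields one token.
local function make_filter(get)
    return coroutine.wrap(function()
        while true do
            local line, token, value = get()
            if token == '<name>' and value == 'TWO' then
                -- one input token becomes three output tokens
                coroutine.yield(line, '(')
                coroutine.yield(line, '<number>', 2)
                coroutine.yield(line, ')')
            else
                coroutine.yield(line, token, value)
            end
        end
    end)
end

-- Tokens for:  x = TWO
local next_token = make_filter(make_getter{
    {1, '<name>', 'x'},
    {1, '='},
    {1, '<name>', 'TWO'},
    {1, '<eof>'},
})

local out = {}
repeat
    local line, token, value = next_token()
    out[#out + 1] = value ~= nil and tostring(value) or token
until token == '<eof>'

print(table.concat(out, ' '))
```

Written as a plain function returning one token per call, the same expansion would need a counter or a queue to remember which of the three tokens comes next.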

The macro facility described here is pretty similar to the C preprocessor, although it works on an already predigested token stream and is not a separate program through which Lua code is passed.

A simple macro that takes two parameters is this:

macro.define('PLUS',{'L','C'},@ ((L)+(C)) @)

The stuff between a pair of @'s is a token literal. It evaluates as a table containing predigested tokens. Although this may sound fancy and perhaps over-clever, it is really just laziness; if the substitution were supplied as a string, it would need to be tokenized separately, which is awkward. This way, Lua does the tokenizing of the substitutions up front. (If people feel strongly about this, it would not be difficult to add a simple tokenizer in Lua.)

The following is a simple equivalent to a C-style assert, where the actual expression is converted into a string to form the optional second argument of assert() using the 'stringizing' function _STR():

macro.define('ASSERT',{'x'},@assert(x,_STR(x))@)
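
To make the substitutions concrete, here is what the compiler would presumably see for PLUS(10,20) and ASSERT(2 > 4), written out by hand as ordinary Lua. The text produced by _STR is assumed to be "2 > 4", and the pcall is only there so that the failure can be inspected:

```lua
-- PLUS(10,20) after substituting L and C:
local sum = ((10)+(20))
print(sum)

-- ASSERT(2 > 4) after substitution; pcall is used only to capture the error.
local ok, err = pcall(function()
    assert(2 > 4, "2 > 4")
end)
print(ok, err)
```

The extra parentheses in PLUS serve the same purpose as in C macros: they keep operator precedence intact when the arguments are themselves expressions.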

Testing your Macros

Macro definitions are placed in a separate file from the code to be preprocessed, and must be on your module path:

$ lua -lmacro -lmacro-defs test-macro.lua

Loading 'macro-defs' as a module means that 'macro-defs.lua' is compiled and executed before 'test-macro.lua', which is crucial because macros work in the pre-compilation phase.

Your macros can be tested interactively like this (note that errors are given at the 'correct' line):

D:\stuff\lua\tokenf>lua -lmacro -lmacro-defs -i
Lua 5.1.2  Copyright (C) 1994-2007 Lua.org, PUC-Rio
> = PLUS(10,20)
30
> = PLUS(10)
=stdin:1: PLUS expects 2 parameters, received 1
> ASSERT(2 > 4)
stdin:1: 2 > 4
stack traceback:
        [C]: in function 'assert'
        stdin:1: in main chunk
        [C]: ?

The substitution may be a function - this is where things get interesting:

macro.define('__FILE__',nil,function(ls) return macro.string(ls.source) end)

The nil second argument indicates that we have no parameters, and the third argument is a function which always receives a table containing the lexical state: source, line and get (the getter function currently being used). This function is expected to return a token list: in this case, {'<string>',ls.source}. Three convenience functions, macro.string(), macro.number() and macro.name(), are available for building such lists.

In general, the substitution function receives all parameters passed to the macro:

   local mstring = macro.string
   local value_of = macro.value_of
   local define = macro.define

   define('_CAT',{'x','y'},function(ls,x,y)
      return mstring(value_of(x)..value_of(y))
   end)

This is the only way to handle variable length parameter lists, since otherwise the number of formal and actual parameters must match. Bear in mind that the parameters always come in the form of token lists, which have a particular abbreviated format. For example, {'<name>','A','+','<name>','B','*','<number>',2.3} .
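
The abbreviated format is easy to walk: the three value-carrying token types are followed by their value, and everything else stands for itself. As a sketch, the hypothetical helper tokens_to_source below (not part of the macro module) turns such a list back into readable source text:

```lua
-- Token types that are followed by a value slot in the list.
local has_value = { ['<name>'] = true, ['<string>'] = true, ['<number>'] = true }

-- Hypothetical helper: render an abbreviated token list as source text.
local function tokens_to_source(tl)
    local parts, i = {}, 1
    while i <= #tl do
        local token = tl[i]
        if has_value[token] then
            local value = tl[i + 1]
            if token == '<string>' then
                value = string.format('%q', value)   -- re-quote string values
            end
            parts[#parts + 1] = tostring(value)
            i = i + 2                                -- skip token and value
        else
            parts[#parts + 1] = token                -- keyword or operator
            i = i + 1
        end
    end
    return table.concat(parts, ' ')
end

print(tokens_to_source{'<name>','A','+','<name>','B','*','<number>',2.3})
```

A helper like this can be handy when debugging a substitution function, since it shows what a parameter actually contains.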

Please note that macro definitions are Lua modules and so you are free to define local variables and functions.

Implementing a Try/Except Statement

You can also define a handler which provides parameters if a macro is intended to be called without a parameter list. With handle_parms = true in the parameter table, the handler is passed as the fourth argument to define(). As a genuinely useful example, here is how 'try' and 'except' can be defined as syntactic sugar around pcall():

---- implementing try...except
local stack = {}
local push = table.insert
local pop = table.remove
local global = macro.global
local name = macro.name
local define = macro.define

define('try',{'L1','L2',handle_parms = true},
	@ local L1,L2 = pcall(function() @,
	function(ls)
		local L1 = global()
		local L2 = global()
		push(stack,{L1,L2})
		return name(L1),name(L2)
	end)

define('except',{'L1','L2',handle_parms = true},
	@ end) if not L1 then local e = L2 @,
	function(ls)
		local t = pop(stack)
		if not t then macro.error("mismatched try..except",ls.line) end
		return name(t[1]),name(t[2])
	end)

So, given code like this:

a = nil
try
  print(a.x)
except
  print('exception:',e)
end

The compiler would see the following code:

a = nil
local _1ML,_2ML = pcall(function()
 print(a.x)
end) if not _1ML then local e = _2ML
 print('exception:',e)
end

The smartness of these macros (note that we can here keep track of nested try..except statements) means that we can try out new syntax proposals with a little work, without having to patch Lua itself. And writing macros in Lua is certainly an order of magnitude easier than writing syntax extensions in C!
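
To see why the stack matters, here is a nested try..except written out by hand as the compiler might see it. The generated names _3ML and _4ML for the inner statement are an assumption; the point is only that each try needs its own pair of names, and that each except must pick up the pair belonging to the matching try:

```lua
local log = {}
local a = nil

-- outer try
local _1ML,_2ML = pcall(function()
    -- inner try
    local _3ML,_4ML = pcall(function()
        log[#log + 1] = a.x           -- raises: attempt to index a nil value
    end) if not _3ML then local e = _4ML
        log[#log + 1] = 'inner caught'
        error('rethrown', 0)          -- level 0: no position information added
    end
end) if not _1ML then local e = _2ML
    log[#log + 1] = 'outer caught: ' .. e
end

print(table.concat(log, '; '))
```

One consequence of wrapping the body in a function is worth keeping in mind: a return inside the try block leaves only that anonymous function, not the function containing the try.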

As an example of more elaborate code generation, here is a 'using' macro which works rather like the C++ statement of the same name. There is no true module scope in Lua, so a common trick is to 'unroll' a table:

local sin = math.sin
local cos = math.cos
...

Not only do we get nice unqualified names, but accessing local function references is faster than looking up functions in a table. Here is a macro that can generate the above code automatically:

macro.define('using',{'tbl'},
    function(ls,n)
        local tbl = _G[n[2]]
        local subst,put = macro.subst_putter()
        for k,v in pairs(tbl) do
            put(macro.replace({'f','T'},{macro.name(k),n},
                @ local f = T.f; @))
        end
        return subst
    end)

Here the substitution is a function, which is passed a name token (like {'<name>','math'}), assumes it refers to a globally available table, and then iterates over that table dynamically generating the required local assignments. subst_putter() gives you a token list and a put function; you can use the put function to fill the token list, which is then returned and actually substituted into the token stream. replace generates a new token list by replacing all occurrences of the formal parameters (first argument) with actual parameter values (second argument) in a token list. To use this, put the macro call at the start of your module:

  using (math)
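
The effect can be previewed without the macro machinery. The sketch below builds, as text, the local declarations that such a macro would insert as tokens. using_source is a hypothetical helper; unlike the macro above it skips keys that are not plain identifiers and sorts the names, since the order of pairs() is unspecified:

```lua
-- Hypothetical runtime analogue of the 'using' macro.
local function using_source(tname, tbl)
    local names = {}
    for k in pairs(tbl) do
        -- keep only keys that can be written as plain identifiers
        if type(k) == 'string' and k:match('^[%a_][%w_]*$') then
            names[#names + 1] = k
        end
    end
    table.sort(names)   -- stable output regardless of traversal order
    local lines = {}
    for _, k in ipairs(names) do
        lines[#lines + 1] = ('local %s = %s.%s;'):format(k, tname, k)
    end
    return table.concat(lines, '\n')
end

local geometry = { area = function(r) return math.pi * r * r end, unit = 1 }
print(using_source('geometry', geometry))
```

Standard Lua builds limit a function to 200 local variables, so unrolling is best kept to tables of modest size.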
   

Implementing Type Annotations

A common issue with dynamic languages is this: values are strongly typed, but the type held by a variable is only known at runtime. In particular, you cannot look at a function definition and immediately deduce the parameter types, unless somebody has been kind and used comments. We cannot always depend on the kindness of strangers, but it is straightforward to define a set of macros which act as explicit type annotations. We want to define functions like this:

   function rep (String s, Number k)
     return s:rep(k)
   end

But what the compiler actually sees is this:

   function rep (s, k)
     _assert_type(s,'string','s')
     _assert_type(k,'number','k')
     return s:rep(k)
   end

Where _assert_type can be simply defined as:

  function _assert_type(value,typestr,parm)
    local t = type(value)
    if t ~= typestr then
        error(
            ("Argument '%s' expects a type of '%s', got '%s'"):format(parm,typestr,t),
            2)
    end
  end

In this way, we achieve two things: first, the function is more self-documenting, and second, the contract is enforced at runtime.
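
The expanded form is ordinary Lua, so the runtime behaviour can be checked directly. In this sketch _assert_type is repeated from above to keep the example self-contained, and the call with a bad argument is wrapped in pcall so that the message can be printed:

```lua
-- _assert_type as defined above, repeated so the sketch runs on its own.
function _assert_type(value,typestr,parm)
    local t = type(value)
    if t ~= typestr then
        error(
            ("Argument '%s' expects a type of '%s', got '%s'"):format(parm,typestr,t),
            2)
    end
end

-- What the compiler would see for:  function rep (String s, Number k)
local function rep(s, k)
    _assert_type(s,'string','s')
    _assert_type(k,'number','k')
    return s:rep(k)
end

print(rep('ab', 3))

-- A wrongly typed second argument is rejected before s:rep is reached.
local ok, err = pcall(rep, 'ab', 'x')
print(ok, err)
```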

local subst_putter = macro.subst_putter
local set_trigger = macro.set_trigger
local insert_tokens = macro.insert_tokens
local replace = macro.replace
local mstring = macro.string
local subst = nil
local put

local function type_checker_macro(Fname,Tname)
    macro(Fname,{'arg',handle_parms = macro.next_token_grabber('<name>')},
        function(ls,arg)
            local line = ls.line
            if not subst then -- first argument in a list
                subst,put = subst_putter()
                set_trigger(')',true,function()
                    insert_tokens(line,subst)
                    subst = nil
                end)
            end
            -- in any case, put the assertion into the token list
            put(replace({'arg','tname'},{arg,mstring(Tname)},
                @ _assert_type(arg,tname,_STR(arg)); @))
            return arg
    end)
end

type_checker_macro('String','string')
type_checker_macro('Number','number')
type_checker_macro('Table','table')
type_checker_macro('Function','function')
type_checker_macro('Boolean','boolean')

This works as follows: we specify macro.next_token_grabber('<name>') as our parameter grabber; it will return the symbol following the macro. The substitution is a function; it just returns the argument it receives, which is that symbol. But the interesting stuff happens as a side-effect; we define a trigger, which fires on the end of the argument list and inserts all the assertions into the code immediately following the argument list.
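
The interplay between the macro and the trigger can be simulated in a few lines of stock Lua. This is only a sketch of the idea, not how the macro module is implemented: tokens are plain strings here rather than real token lists, and annotate and type_names are names invented for the example:

```lua
-- Hypothetical simulation of the annotation macros and their trigger.
local type_names = { String = 'string', Number = 'number' }

local function annotate(tokens)
    local out, pending = {}, nil
    local i = 1
    while i <= #tokens do
        local tok = tokens[i]
        local tname = type_names[tok]
        if tname then
            -- the 'macro': grab the following name and queue an assertion
            local arg = tokens[i + 1]
            pending = pending or {}
            pending[#pending + 1] =
                ("_assert_type(%s,'%s','%s');"):format(arg, tname, arg)
            out[#out + 1] = arg          -- the macro itself expands to the name
            i = i + 2
        else
            out[#out + 1] = tok
            if tok == ')' and pending then
                -- the 'trigger': flush the queued assertions once, right
                -- after the parenthesis that closes the argument list
                for _, a in ipairs(pending) do out[#out + 1] = a end
                pending = nil
            end
            i = i + 1
        end
    end
    return table.concat(out, ' ')
end

print(annotate{'function','rep','(','String','s',',','Number','k',')',
               'return','s',':','rep','(','k',')','end'})
```

Because the queue is cleared when the trigger fires, the later ')' in s:rep(k) passes through untouched.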

Notice how families of related macros can be generated with similar code. It would not be difficult to generalize this scheme to handle a proper object-oriented hierarchy.

Implementing List Comprehensions

In PythonLists, FabienFleutot discusses a list comprehension syntax modelled on the Python one.

    x = {i for i = 1,5}

    {1,2,3,4,5}

Such a statement does not actually require much transformation to be valid Lua. We use anonymous functions:

   x = (function() local ls={}; for i = 1,5 do ls[#ls+1] = i end; return ls end)()

However, to make it work as a macro, we need to choose a name (here 'L') since macros are not triggered on arbitrary tokens.

local token_append = macro.token_append

local function grab_counting_braces(get,endtoken)
    local level = 1 -- used to count { and }
    local tl = {}
    while true do
        local line,token,value = get()
        if token == '<eof>' then return end
        -- test for the end token before adjusting the brace level, so that
        -- a '}' closing a nested brace is not mistaken for the end token
        if token == endtoken and level == 1 then
            return tl
        end
        if token == '{' then
            level = level + 1
        elseif token == '}' then
            level = level - 1
        end
        token_append(tl,token,value)
    end
end

macro.define('L',{'expr','loop_part',handle_parms=true},
    @ ((function() local t = {}; for loop_part do t[#t+1] = expr end; return t end)()) @,
    function(ls)
        local get = ls.getter
        local line,t = get()
        if t ~= '{' then macro.error("syntax: L{<expr> for <loop-part>}") end
        local expr = grab_counting_braces(get,'for')
        local loop_part = grab_counting_braces(get,'}')
        return expr,loop_part
    end)

The substitution is pretty straightforward, but we need a custom parameter grabber. This first needs to grab up to 'for', and then grab up to '}', keeping track of the brace level. By doing this, nested comprehensions work as expected:

  x = L{L{j for j=1,3} for i=1,3}
  
  {{1,2,3},{1,2,3},{1,2,3}}
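
For reference, here is a nested comprehension of this kind expanded by hand into plain Lua, following the substitution pattern of the 'L' macro. This is a sketch of the expected expansion, assuming the inner comprehension is expanded in the same way as the outer one:

```lua
-- Outer loop over i; each element is itself built by an inner loop over j.
local x = ((function() local t = {}; for i=1,3 do t[#t+1] =
        ((function() local t = {}; for j=1,3 do t[#t+1] = j end; return t end)())
    end; return t end)())

-- Render the nested table for display.
local rows = {}
for i, row in ipairs(x) do
    rows[i] = '{' .. table.concat(row, ',') .. '}'
end
print('{' .. table.concat(rows, ',') .. '}')
```

Each anonymous function has its own local t, so the inner table never interferes with the outer one.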

A particularly cool idiom is to grab the whole of standard input in one line:

   lines = L{line for line in io.lines()}

The source code for Lua Macro is available here: http://mysite.mweb.co.za/residents/sdonovan/lua/luamacro.zip

Lua 5.1

-- SteveDonovan