[Metalua-list] Offline globals protection with metalua?

Alexander Gladysh agladysh at gmail.com
Thu Dec 13 20:31:15 GMT+3 2007


Hi, all!

First of all, please excuse the long and watery post. The practical
part of the question starts two paragraphs above the code sample
below.

OK, maybe metalua is not the right tool to do what I want, but (1)
I've finally got a slice of time to implement some offline global
protection for our project and (2) for a long time I have wanted to
learn metalua -- so I want to give it a chance.

By global(s) protection I mean a system that protects the global
environment. The Lua distribution already provides `/etc/strict.lua'
as an example of runtime protection and `/test/globals.lua' as one of
offline static validation.

We already use runtime global protection a la `strict.lua' in our
(relatively large) Lua codebase. The only difference is that in our
system every globally accessible variable (for both reads and writes)
must be declared explicitly via a call to `declare()' (for a single
symbol) or to `exports()' (for a list of symbols; used most often).
Like this:

  declare 'global_function_1'
  exports { 'global_function_2', 'global_function_3' }
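
For readers unfamiliar with the `strict.lua' style, here is a minimal
sketch of how such a runtime check could be wired up. It is not our
production code: the helper name `protect' and the error messages are
made up for illustration, and the demo runs on a fresh table rather
than on the real global environment.

```lua
-- Hypothetical helper: installs strict.lua-style protection on `env`
-- and returns the two declaration functions.
local function protect(env)
  local declared = {}

  local function declare(name)
    declared[name] = true
  end

  local function exports(names)
    for _, name in ipairs(names) do declare(name) end
  end

  setmetatable(env, {
    -- Called only when the key is absent from `env`.
    __index = function(_, name)
      if not declared[name] then
        error("read of undeclared global '" .. tostring(name) .. "'", 2)
      end
      return nil
    end,
    -- Called only on the first assignment to an absent key.
    __newindex = function(t, name, value)
      if not declared[name] then
        error("write to undeclared global '" .. tostring(name) .. "'", 2)
      end
      rawset(t, name, value)
    end,
  })

  return declare, exports
end

-- Demonstrated on a fresh table; passing _G instead would guard real globals.
local env = {}
local declare, exports = protect(env)
declare 'global_function_1'
exports { 'global_function_2', 'global_function_3' }

env.global_function_1 = function() return 1 end         -- declared: allowed
local ok_write = pcall(function() env.typo = 1 end)     -- undeclared write
local ok_read  = pcall(function() return env.tpyo end)  -- undeclared read
```

Both `pcall' results are false here, because neither name was
declared. Keys that already exist in the table (such as the standard
library in `_G') never reach the metamethods, so they stay accessible.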

A runtime global protection system (no matter the declaration style)
has proved invaluable. It saves us from countless headaches. Our
first large project with Lua didn't include any protection, so I know
what I'm talking about. (I believe I'm saying obvious things here, but
please bear with me.) However, runtime validation is runtime
validation -- from time to time our global protection chokes in dark
and rarely executed spots of production code on an undeclared global
variable access. (Of course, that is a million times better than the
dreaded hidden logic flaw caused by global variable entanglement that
would happen in that place without global protection.) We write
autotests, of course, but there is always that little piece of code we
have not covered yet. Hence the need for some static (offline)
validation.

Due to the dynamic nature of the Lua language, I believe it is hardly
possible to implement a full static check that would filter out all
imaginable global protection violations -- some of them can appear
only at runtime. After all, I may want to have two kinds of global
environments with different sets of allowed globals (say, one for
system code and another as a restrictive sandbox for end-user code).
Or I may want to generate some functions at runtime and assign them to
globals whose names are also generated at runtime. Actually, we have
both of these cases in our system. So the static validator must
coexist with the runtime one.

Static validation actually consists of two main tasks: first, to
detect which globals are allowed to be accessed, and second, to detect
which disallowed globals are accessed and where -- and to report such
places to the user. The static search for global accesses itself does
not pose a large problem -- one might simply take the `globals.lua'
approach. The problem is how to get the list of `allowed' variables.
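
The second task can be sketched roughly as follows, assuming the
listing format printed by `luac -p -l' in Lua 5.1 (the GETGLOBAL and
SETGLOBAL opcodes do not exist in later Lua versions). The function
names and the sample listing are made up for illustration; the
`allowed' set is simply taken as given.

```lua
-- Collect global accesses from a Lua 5.1 `luac -p -l` listing.
-- Returns a list of { name = ..., line = ..., write = boolean }.
local function global_accesses(listing)
  local found = {}
  for text in listing:gmatch("[^\n]+") do
    -- e.g. "\t1\t[1]\tGETGLOBAL\t0 -1\t; print"
    local line, op, name =
      text:match("%[(%d+)%]%s+([GS])ETGLOBAL.-;%s+(%S+)")
    if line then
      found[#found + 1] =
        { name = name, line = tonumber(line), write = (op == "S") }
    end
  end
  return found
end

-- Keep only the accesses whose name is not in the `allowed` set.
local function violations(listing, allowed)
  local bad = {}
  for _, access in ipairs(global_accesses(listing)) do
    if not allowed[access.name] then bad[#bad + 1] = access end
  end
  return bad
end

-- Hand-written listing fragment in the Lua 5.1 format (illustrative only).
local sample = table.concat({
  "\t1\t[1]\tGETGLOBAL\t0 -1\t; print",
  "\t2\t[1]\tLOADK    \t1 -2\t; \"hi\"",
  "\t3\t[1]\tCALL     \t0 2 1",
  "\t4\t[2]\tSETGLOBAL\t0 -3\t; conter",
}, "\n")

local allowed = { print = true, counter = true }
for _, v in ipairs(violations(sample, allowed)) do
  print(v.name, v.line, v.write and "write" or "read")
end
```

In this sample only the misspelled `conter' is reported, as a write
on line 2, since `print' is in the allowed set.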

Which variables do we want to be allowed? Of course, the variables we
now mark with `declare()' and `exports()' calls. Then come the
standard global variables from Lua -- OK, consider them a fixed set
and hardcode them into an exception list. And, worst of all, globals
from assorted 3rd-party C-side libraries and from the `engine' itself
-- OK, maybe hardcode them too. This would bloat our static checker's
exception list in a very ugly way, but it is bearable -- we use only a
limited number of 3rd-party Lua modules and rarely add something new
there, and we rarely change the API of our engine. As for 3rd-party
Lua modules -- the list of globals they use may be fetched from their
bytecode with the same `globals.lua' approach.

The main issue is to be able to conveniently `declare' existing
globals in our Lua code, and to be able to add new globals as easily
as we do now. And, hopefully, to declare each variable only once --
simultaneously for both the static and runtime validators. The best
solution would simply reuse the existing `declare()' and `exports()'
calls.

By the way, I think that the problem with `symbols' from 3rd-party
modules may be solved by forcing our code to declare not only the
variables it `exports' but also the global variables it `imports'
(uses) -- a kind of `import' keyword, with autodeclaring all specified
imports as locals as a bonus. This may allow a better (stricter)
offline validator implementation...

Back to the point. To find all declared globals, we need a parser
that would capture all `declare()' and `exports()' calls, in all the
syntax variations possible in Lua, with string literal arguments,
either skipping or failing on any non-literals. I think, for
simplicity and coherency, it should fail, and some other function
(like `declare_runtime()') should be used in the rare dynamic cases
instead. Of all the existing solutions I have considered -- metalua,
token filters, lpeg-based parsers (lua-fish etc.) -- metalua is the
closest to my heart. However, I have little or no experience with, or
even much knowledge of, any of these solutions -- including metalua --
so it is only an intuition.
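
To make the requirement concrete, here is a deliberately naive sketch
of the capture step using plain Lua string patterns. It is not a real
parser and not one of the tools listed above: it strips only short
comments, ignores long comments and `--' inside strings, and the
function name `scan_declares' is made up for illustration. It fails
on non-literal arguments, as proposed.

```lua
-- Returns a list of declared names, or nil plus an error message when a
-- `declare` call has a non-literal argument.
local function scan_declares(source)
  -- Naive: drop short comments only; long comments and "--" inside
  -- string literals are not handled.
  source = source:gsub("%-%-[^\n]*", "")
  local names = {}
  local pos = 1
  while true do
    -- Frontier patterns keep `declare_runtime` and `predeclare` from matching.
    local s, e = source:find("%f[%w_]declare%f[^%w_]", pos)
    if not s then break end
    local rest = source:sub(e + 1)
    -- Accept "..." , '...' and [[...]] literals, with optional "(".
    local name = rest:match('^%s*%(?%s*"([^"\n]*)"')
              or rest:match("^%s*%(?%s*'([^'\n]*)'")
              or rest:match("^%s*%(?%s*%[%[(.-)%]%]")
    if not name then
      return nil, "non-literal argument to declare at offset " .. s
    end
    names[#names + 1] = name
    pos = e + 1
  end
  return names
end

local sample = [==[
declare "A"
declare("B")
declare 'C'
declare('D')
declare [[E]]
declare([[F]])
-- declare("invisible")
declare_runtime(some_name)
]==]

local names = scan_declares(sample)
print(table.concat(names, " "))
```

A real parser is still preferable, since this sketch can be fooled by
the word `declare' appearing inside a string or a long comment.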

Another issue to consider is that we cannot afford to switch our
production code from Lua to any other language or to another Lua VM
code generator. Not just to enhance our global protection system,
anyway. So the only option here is to use metalua (or any other
solution) as a part of our validation toolchain. This has positive
effects as well -- the tool is not required to be `beautiful' or
top-notch efficient (though it should be fast enough to be usable in a
precommit hook) -- it may be any quick-and-dirty `hackish' solution
that does the job. (Though I'd, of course, prefer something beautiful
and efficient...)

So, the promised practical part:

I've got the impression that metalua is backwards compatible with Lua
code. That is, any Lua source file is a metalua source file too. So, I
think, I'll write some metalua macro stuff to capture all those
declaration calls from the code, accumulate them in a file, then feed
this file and the output of `luac -p -l' to a `globals.lua'-based
script. I've even written some draft code to try out and learn metalua
-- it should capture all calls to `declare'. I have not found how to
match on function calls, so I've decided to make `declare' a keyword.

I must note, by the way, that writing that small piece of code was a
pleasure. Even considering the usual frustration of a blind struggle
with a yet-unknown complicated system, the resulting impression of the
system is quite pleasant.

        -{ block:
          mlp.lexer:add { "declare" }
          mlp.expr:add {
            name = "global symbol declaration";
            "declare", gg.optkeyword("("), mlp.string, gg.optkeyword(")") ;
            builder = function(symbol)
              print("declare", symbol[2][1])
              return +{ declare( -{symbol[2]} ) }
            end
          }
        }

        declare "A"
        declare("B")
        declare 'C'
        declare('D')
        declare [[E]]
        declare([[F]])
        -- declare("invisible")

        return function() declare("G") end

        declare("H") -- Note this would have no effect in runtime
                     -- global protection, but I can bear with that
                     -- static one considers that G as declared --
                     -- this is a corner case anyway.

It works (or seems to, at least), but this is my first program in
metalua, so I'm sure I've missed a lot. One thing that bothers me is
that mlc compiles my parsed results into bytecode -- I do not need
that. To get the list of declarations, I need only the parse step, no
code generation. Another thing is that I'm reinventing the wheel here
-- metalua already knows how to parse function calls (OK, my case is
somewhat stricter -- I allow string literal arguments only).

Anyway. Next I plan to add reading the code to be parsed from stdin
and writing the output to stdout -- to be shell-friendly. And, of
course, to add a parser for `exports' calls...

So, the question: would it work, and is it worth it? Maybe I'm
missing something? Is there a better approach?

...But now I think: what if it were possible to implement the second
part of static validation in metalua too? (That is, the detection of
global environment read and write accesses.) The approach closest to
`globals.lua' would work on the code generation level, I guess... but,
as far as I can see, metalua does not provide any API to the code
generation part at the required abstraction level. So, how hard would
it be to implement such validation on the AST level? In which fashion
should it be implemented on the AST level? Is it worth it?
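
To illustrate what an AST-level check might involve, here is a rough
sketch in plain Lua. It assumes AST nodes are tables with a `tag'
field (`Id', `Local', `Localrec', `Set', `Function', `Fornum',
`Forin', `Repeat', `Do') and that blocks are untagged lists; these
shapes are an assumption about metalua's AST and should be checked
against its manual. All function names are made up for illustration.

```lua
-- A scope is a table of local names that falls back to its parent scope.
local function child_scope(parent)
  return setmetatable({}, { __index = parent })
end

local function bind(scope, ids)
  for _, id in ipairs(ids) do
    if id.tag == "Id" then scope[id[1]] = true end  -- skips vararg nodes
  end
end

local walk

local function walk_list(list, scope, report)
  for _, node in ipairs(list) do walk(node, scope, report) end
end

function walk(node, scope, report)
  if type(node) ~= "table" then return end
  local tag = node.tag
  if tag == nil then                        -- block or expression list
    walk_list(node, child_scope(scope), report)
  elseif tag == "Id" then
    if not scope[node[1]] then report(node[1], "read") end
  elseif tag == "Set" then
    for _, lhs in ipairs(node[1]) do
      if lhs.tag ~= "Id" then walk(lhs, scope, report)
      elseif not scope[lhs[1]] then report(lhs[1], "write") end
    end
    walk(node[2], scope, report)
  elseif tag == "Local" then
    walk(node[2], scope, report)            -- initializers see outer names only
    bind(scope, node[1])
  elseif tag == "Localrec" then
    bind(scope, node[1])                    -- name is visible in its own body
    walk(node[2], scope, report)
  elseif tag == "Function" then
    local inner = child_scope(scope)
    bind(inner, node[1])
    walk(node[2], inner, report)
  elseif tag == "Fornum" then
    for i = 2, #node - 1 do walk(node[i], scope, report) end
    local inner = child_scope(scope)
    bind(inner, { node[1] })
    walk(node[#node], inner, report)
  elseif tag == "Forin" then
    walk(node[2], scope, report)
    local inner = child_scope(scope)
    bind(inner, node[1])
    walk(node[3], inner, report)
  elseif tag == "Repeat" then
    local inner = child_scope(scope)
    walk_list(node[1], inner, report)       -- `until` sees the body's locals
    walk(node[2], inner, report)
  elseif tag == "Do" then
    walk_list(node, child_scope(scope), report)
  else
    walk_list(node, scope, report)          -- generic: visit every child
  end
end

-- Hand-built AST for:  local x = y;  z = x
local ast = {
  { tag = "Local", { { tag = "Id", "x" } }, { { tag = "Id", "y" } } },
  { tag = "Set",   { { tag = "Id", "z" } }, { { tag = "Id", "x" } } },
}
local seen = {}
walk(ast, {}, function(name, kind) seen[#seen + 1] = name .. ":" .. kind end)
print(table.concat(seen, " "))
```

The walker only tracks lexical scope; comparing the reported names
against the declared set would be a separate step.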

Thanks in advance,
Alexander.


