[Metalua-list] Offline globals protection with metalua?

Fabien Fleutot fleutot at gmail.com
Fri Dec 14 11:03:56 GMT+3 2007


On Dec 14, 2007 12:31 AM, Alexander Gladysh <agladysh at gmail.com> wrote:

> Hi, all!


Welcome aboard :)


> OK, maybe metalua is not the right tool to do what I want, but (1)
> I've finally got a slice of time to implement some offline global
> protection for our project and (2) for a long time I wanted to learn
> metalua -- so I want to give it a chance.


Analyzing code is definitely half of metalua's purpose, and yes, analyzing
without generating makes a lot of sense, even if that's not what most
current samples are about (analysis tends to look less impressive from an
outsider's point of view).


> By global(s) protection I mean some system for protection of global
> environment. Lua distribution already provide `/etc/strict.lua' as an
> example for runtime protection and `/test/global.lua' as one for
> offline static validation.


So what you want is to extract the free variables (those not bound as a
function parameter, a for-loop variable, a local variable, etc.) and check
that they are authorized. This is a frequent need, and I actually needed it
for a hack of my own, so I've just written down that part of the code; feel
free to share it (consider it MIT-licensed):

require 'std'
require 'walk'

-{ extension 'match' }

--------------------------------------------------------------------------------
-- Scope handling: ':push()' saves the current scope, ':pop()' restores the
-- previously saved one. ':add(identifiers_list)' adds identifiers to the
-- current scope. Current scope is stored in '.current', as a string->boolean
-- hashtable.
--------------------------------------------------------------------------------

scope = { }
scope.__index = scope

function scope:new()
   local ret = { current = { } }
   ret.stack = { ret.current }
   setmetatable (ret, self)
   return ret
end

function scope:push()
   self.current = table.shallow_copy (self.current)
   table.insert (self.stack, self.current)
end

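function scope:pop()
   -- Drop the scope created by the matching ':push()', then go back to the
   -- one that was current before it.
   table.remove (self.stack)
   self.current = self.stack[#self.stack]
end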

function scope:add (vars)
   for id in values (vars) do
      match id with `Id{ x } -> self.current[x] = true end
   end
end

--------------------------------------------------------------------------------
-- Return the string->boolean hash table of the names of all free variables
-- in 'term'.
--------------------------------------------------------------------------------
function freevars (term)
   local freevars = { }
   local scope    = scope:new()
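   -- "Repeat" must be listed in 'stat.pred', otherwise its case in
   -- 'cfg.stat.down()' is never reached.
   local cfg = { expr  = { pred = { "Function", "Id" } },
                 stat  = { pred = { "Forin", "Fornum", "Local", "Localrec",
                                    "Repeat" } },
                 block = { pred = true } }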


   -----------------------------------------------------------------------------
   -- Check identifiers; add functions parameters to scope
   -----------------------------------------------------------------------------
   function cfg.expr.down(x)
      match x with
      | `Id{ name } -> if not scope.current[name] then freevars[name] = true end
      | `Function{ params, _ } -> scope:push(); scope:add (params)
      end
   end



   -----------------------------------------------------------------------------
   -- Close the function scope opened by 'down()'
   -----------------------------------------------------------------------------
   function cfg.expr.up(x)
      match x with
      | `Function{ ... } -> scope:pop()
      | `Id{ ... }       -> -- pass
      end
   end


   -----------------------------------------------------------------------------
   -- Create a new scope and register loop variable[s] in it
   -----------------------------------------------------------------------------
   function cfg.stat.down(x)
      match x with
      | `Forin{ vars, ... }    -> scope:push(); scope:add(vars)
      | `Fornum{ var, ... }    -> scope:push(); scope:add{var}
      | `Localrec{ vars, ... } -> scope:add(vars)
      | `Local{ ... }          -> -- pass
      | `Repeat{ block, cond } -> -- 'cond' is in the scope of 'block'
         scope:push()
         for s in values (block) do walk.stat(cfg)(s) end -- no new scope
         walk.expr(cfg)(cond)
         scope:pop()
         return 'break' -- No automatic walking of subparts
      end
   end


   -----------------------------------------------------------------------------
   -- Close the scopes opened by 'down()'
   -----------------------------------------------------------------------------
   function cfg.stat.up(x)
      match x with
      | `Forin{ ... } | `Fornum{ ... } -> scope:pop()
      | `Local{ vars, ... }            -> scope:add(vars)
      | `Localrec{ ... }               -> -- pass
      -- `Repeat never happens, it cancels the 'up()' call by returning
      -- 'break'.
      end
   end


   -----------------------------------------------------------------------------
   -- Create a separate scope for each block
   -----------------------------------------------------------------------------
   function cfg.block.down() scope:push() end
   function cfg.block.up()   scope:pop()  end

   walk.block(cfg)(term)
   return freevars
end

It's based on:
- the 'match' extension, which is the ultimate tool for easy AST
manipulation. You really want to learn it as soon as you do anything
serious with tree-like structures.
- walk, a library that generates AST reading and manipulation functions.

Both are currently under-documented; I'll try to fix that very soon (maybe
with a quick&dirty blog post). Match is pretty simple to use; walk is much
trickier to master (but the problem it addresses is way tougher: reliable
code walker generation is still considered unsolved in Lisp). The latter is
more or less documented through comments in the source, if it helps.
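
To make the idea behind freevars concrete without match or walk, here is a
minimal plain-Lua sketch. It assumes that AST nodes are ordinary tables
carrying a 'tag' field (so that `Id{ "x" } is { tag = "Id", "x" }), it only
understands a handful of node kinds, and the names 'free_names' and 'copy'
are made up for this illustration. It is not the walker above, just the same
scoping logic written by hand:

```lua
-- Shallow copy, so that an inner scope never leaks into the outer one.
local function copy (t)
   local r = { }
   for k, v in pairs (t) do r[k] = v end
   return r
end

-- Return the name->true table of free variables in a block (list of nodes).
local function free_names (block)
   local free = { }
   local walk_block, walk_node

   function walk_node (node, bound)
      if type (node) ~= "table" then return end
      if node.tag == "Id" then
         if not bound[node[1]] then free[node[1]] = true end
      elseif node.tag == "Function" then
         -- Parameters are bound inside the function body only.
         local inner = copy (bound)
         for _, p in ipairs (node[1]) do inner[p[1]] = true end
         walk_block (node[2], inner)
      elseif node.tag == "Local" then
         -- Right-hand sides are visited before the new names become visible.
         for _, e in ipairs (node[2] or { }) do walk_node (e, bound) end
         for _, v in ipairs (node[1]) do bound[v[1]] = true end
      else
         for _, child in ipairs (node) do walk_node (child, bound) end
      end
   end

   function walk_block (stats, bound)
      local scope = copy (bound)   -- each block gets its own scope
      for _, s in ipairs (stats) do walk_node (s, scope) end
   end

   walk_block (block, { })
   return free
end

-- Hand-built AST for:
--    local x = y
--    print (x, function (a) return a + z end)
local ast = {
   { tag = "Local", { { tag = "Id", "x" } }, { { tag = "Id", "y" } } },
   { tag = "Call", { tag = "Id", "print" }, { tag = "Id", "x" },
     { tag = "Function", { { tag = "Id", "a" } },
       { { tag = "Return",
           { tag = "Op", "add", { tag = "Id", "a" }, { tag = "Id", "z" } } } } } },
}

local result = free_names (ast)
```

Here 'result' holds 'y', 'print' and 'z'; 'x' and 'a' are bound by the local
statement and the function parameter respectively.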

For what you want to do, you might use freevars to get all free vars, then
re-parse to extract all declarations, and finally compare both sets; or you
can hack freevars to keep track of all declaration calls: `Call{ `Id
'declare', `String{ ? } }.
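
The "compare both sets" step could look roughly like the plain-Lua sketch
below. It assumes all three inputs are name->true tables, as returned by
freevars; the function name 'undeclared' and the 'builtins' whitelist are
hypothetical:

```lua
-- Return the sorted list of names that are free in the source but neither
-- declared nor whitelisted. All three arguments are name->true tables.
local function undeclared (free, declared, builtins)
   local missing = { }
   for name in pairs (free) do
      if not declared[name] and not builtins[name] then
         table.insert (missing, name)
      end
   end
   table.sort (missing)   -- deterministic order for error reports
   return missing
end

local free     = { print = true, counter = true, helper = true }
local declared = { counter = true }
local builtins = { print = true, pairs = true }

local missing = undeclared (free, declared, builtins)
for _, name in ipairs (missing) do
   print ("undeclared global: " .. name)
end
```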

You can use walk to find all declarations:

-{ extension 'match' }

require 'mlc'
require 'walk'

--------------------------------------------------------------------------------
-- Where declared vars will be accumulated
--------------------------------------------------------------------------------
vars = { }

--------------------------------------------------------------------------------
-- Walker config: catch `Call statements, and insert the declared
-- variable into table 'vars'
--------------------------------------------------------------------------------
cfg = { stat = { pred = 'Call';
                 down = function (ast)
                    match ast with
                    | `Call{ `Id 'declare', `String{v} } -> vars[v] = true
                    | `Call{ `Id 'declare', ... } ->
                       error "Invalid 'declare()' form"
                    | _ -> -- pass
                    end
                 end } }

--------------------------------------------------------------------------------
-- Walker itself
--------------------------------------------------------------------------------
xtract_decl = walk.block(cfg)

(This approach doesn't take into account the case where a local variable
'declare' shadows the declaration operator. I'm not sure what you want to do
in such a case; probably stop and return an error?)

You might want to do slightly more than that, though: when this code finds
a call to, say, table.foreach(), it checks that 'table' is declared
(through 'declare()' or in a constants list), but it won't check that
'foreach' actually exists in module 'table'. It would be interesting to
handle that as well.
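
A rough plain-Lua sketch of such a member check follows. It assumes that a
static 'module.field' access shows up in the AST as an Index node holding an
Id and a String, and that you have some table ('known' here) describing which
fields each module exposes; both helper names are hypothetical:

```lua
-- 'known' maps module names to tables whose keys are the allowed fields.
local function check_member (known, module_name, field_name)
   local fields = known[module_name]
   if not fields then
      return false, "unknown module '" .. module_name .. "'"
   end
   if fields[field_name] == nil then
      return false, "'" .. field_name .. "' is not in module '"
                    .. module_name .. "'"
   end
   return true
end

-- Check one AST node; anything that isn't a static 'module.field' access
-- is accepted, since it cannot be verified offline.
local function check_index (known, node)
   if node.tag == "Index" and node[1].tag == "Id"
      and node[2].tag == "String" then
      return check_member (known, node[1][1], node[2][1])
   end
   return true
end

-- Here the running interpreter's own libraries serve as the reference.
local known = { table = table, string = string }

local ok1       = check_member (known, "table", "insert")
local ok2, err2 = check_member (known, "table", "no_such_function")
local ok3       = check_index (known,
   { tag = "Index", { tag = "Id", "string" }, { tag = "String", "format" } })
```

Using the live 'table' and 'string' libraries as the reference is only a
shortcut; for your own modules the exported lists discussed below would play
that role.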

From these bricks, I think you can design the export system you want.
I'd suggest using table.print() to save exported variable lists in transient
files, so that you don't reparse your source files needlessly when you want
to safely import the module's API.
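
As an illustration of the caching idea, here is a plain-Lua sketch that turns
a name->true set into a Lua chunk and loads it back. It uses a small
hand-written serializer ('dump_set', a made-up name) rather than
table.print(), so as not to assume anything about the latter's output format:

```lua
-- Serialize a name->true set as a Lua chunk that returns the same set.
local function dump_set (set)
   local names = { }
   for name in pairs (set) do table.insert (names, name) end
   table.sort (names)   -- stable output, so unchanged APIs give equal files
   local lines = { "return {" }
   for _, name in ipairs (names) do
      table.insert (lines, string.format ("   [%q] = true,", name))
   end
   table.insert (lines, "}")
   return table.concat (lines, "\n")
end

-- 'loadstring' is the Lua 5.1 name; from 5.2 on, 'load' accepts strings.
local load_chunk = loadstring or load

local exported = { foo = true, bar = true }
local text     = dump_set (exported)
local reloaded = assert (load_chunk (text)) ()
```

Writing 'text' to a file with io.open() and reading it back with dofile()
would give the transient file; it only needs to be regenerated when the
source file changes.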

>        -{ block:
>          mlp.lexer:add { "declare" }
>          mlp.expr:add {
>            name = "global symbol declaration";
>            "declare", gg.optkeyword("("), mlp.string, gg.optkeyword(")") ;
>            builder = function(symbol)
>              print("declare", symbol[2][1])
>              return +{ declare( -{symbol[2]} ) }
>            end
>          }
>        }


This approach is more complex, and harder to make robust, than necessary. As
suggested above, since you don't actually want to change the syntax, you'd
better leave the parser alone and analyze the resulting AST, which is more
abstract and easier to manipulate.


> One thing that disturbs me is
> that mlc compiles my parsed results into bytecode -- I do not need
> that.


What you want is mlc.ast_of_luafile (filename) or mlc.ast_of_string (source):
they give you the AST, without compiling it to bytecode.

> So, the question: would it work and does it worth it?

It definitely does! And if you're allowed to contribute your work back to
the community, it's likely to interest other people in situations similar to
yours, and help disseminate metalua :)

I'm very interested in feedback, as always.

-- Fabien.