[Metalua-list] Re: metalua source parsing

Fabien Fleutot fleutot at gmail.com
Sat Sep 13 20:21:59 GMT+2 2008


> To me [being used as a lua code analyzer] seems to be the most
> direct way for metalua to spread and to mature.


I wholeheartedly agree! What a shame that it isn't always the most fun kind
of code to write ;-)

Coming back to your concerns with AST->SRC transformation, let me repeat
that with the sources and the AST, you do have both a manipulable form and
its source representation:

src = "for i = 1, --[[FOO]] 10 do\n\tbar(i)\nend; print 'hi'"
ast = mlc.ast_of_luastring (src)

-- ast looks like this, omitting lineinfo:
-- { `Fornum{ `Id "i", `Number 1, `Number 10,
--            { `Call{ `Id "bar", `Id "i" } } },
--   `Call{ `Id "print", `String "hi" } }

-- What's the loop body's structure?
loop_body = ast[1][4]
table.print(loop_body, 'nohash')

-- What's the source representation of the loop's body?
print(src:sub(loop_body.lineinfo.first[3], loop_body.lineinfo.last[3]))

-- What's the kind of quote used for the 'hi' string in the final statement?
hi_string = ast[2][2]
local i = hi_string.lineinfo.first[3]
hi_string_delimiter = src:sub(i, i)

Really, I don't think that detailed information in a 'hairy field' would be
more practical. I would personally rather extract whatever info I need from
the sources with lineinfo, as above, than remember how the 'hairy field'
describes the difference between "function foo() end" and "foo =
function() end".
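
For instance, telling those two forms apart is a one-liner against the
source. A rough sketch (uses_function_sugar is a made-up name, and it assumes
the statement node's lineinfo covers the whole statement):

-- Was a function definition written with the 'function foo() ... end' sugar,
-- or as an explicit 'foo = function() ... end' assignment?
function uses_function_sugar (stat, src)
   local stat_src = src:sub (stat.lineinfo.first[3], stat.lineinfo.last[3])
   return stat_src :match "^function%s" ~= nil
end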

I can't help requiring that people know the AST structure to do interesting
stuff, but I really don't want to force them to memorize any additional
baroque API.

However, developing some helper lib(s) that let you cleanly splice bits of
transformed code into an existing source base is definitely a useful and
interesting project. Here's probably its first and main building block:

-- Retrieve the source code of a node
function get_src (ast, src)
   return src:sub(ast.lineinfo.first[3], ast.lineinfo.last[3])
end
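
The obvious second brick, sketched under the same lineinfo convention
(splice_src is just a name I'm making up here), grafts new text over a node's
span while leaving the rest of the source untouched:

-- Replace the source text covered by a node with new text.
function splice_src (ast, src, new_text)
   local first, last = ast.lineinfo.first[3], ast.lineinfo.last[3]
   return src:sub (1, first - 1) .. new_text .. src:sub (last + 1)
end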

> 2.3. And this info is already partially used. For example, comments
> are marked as long or short.


This is kept because I thought it might be useful to people who use comments
to embed metadata in a Lua-compatible way. They might want to support
different additional syntaxes in long and short comments. In retrospect, I'm
not sure it was worth it. Simply letting them recover that info from the
sources might be enough.

Supposing that the comment kind field wasn't available, and that a comment's
lineinfo pointed at the whole comment (including delimiters) rather than
only at its content (a design mistake I'm going to fix), we could do this:

-- Return the kind of a comment: either 'short', or the number of '=' signs
-- used in the brackets of a long comment.
function comment_kind (comment, src)
   local txt, first, last = unpack (comment)
   local src_com = src:sub (first, last)
   local eqs = src_com :match "^%-%-%[(=*)%["
   if eqs then return #eqs else return 'short' end
end

> 3.4.1. Whitespace kinds (space vs. tab), trailing whitespaces and
> line-endings (CR vs. CRLF etc.). I think that we can afford to lose
> this info.


Again, keeping and using the sources elegantly solves this otherwise
pythonistically hairy problem :)
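
For instance (rough, untested sketches, helper names made up for the
occasion), those details are trivially recoverable from the source string:

-- Which line-ending convention does the file use?
function line_ending (src)
   return src:find ("\r\n", 1, true) and "CRLF" or "LF"
end

-- Which character indents the first indented line: tab, space, or none?
function indent_char (src)
   local c = src:match "\n([ \t])"
   return c == "\t" and "tab" or c == " " and "space" or "none"
end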


> Also we should handle stuff like do ;;;;;; end.


There's only one way to handle that :)

fabien at MacFabien:~/scratch$ lua
Lua 5.1.3  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> do ;;; end
stdin:1: unexpected symbol near ';'
> ^D
fabien at MacFabien:~/scratch$

> Perfect code analysis library would let its user to change the AST and
> then would automagically generate adequate output given a set of code
> guidelines for new code generation. It is a question of complexity of
> that "automagic" for the both cases.


Depending on your rule set and the level of OCD of its author:
* Either it's a beautifier, which simply fixes and changes a couple of points
(then you modify the appropriate nodes and weave them back in place, leaving
the rest of the code intact, as sketched below),
* or it's a canonizer, and then it can be computed from the AST without
looking at the sources at all.
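
In the beautifier case, the weaving step is essentially the splice_src sketch
above applied to each modified node. For instance, on the running example:

-- Illustration only: beautify the quoting of the final call, leaving the
-- comment, the tab and the loop untouched.
local print_call = ast[2]
src = splice_src (print_call, src, 'print("hi")')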


> I agree, the simplest solution here would probably be to write
> specialized code to meddle directly with source. Rename all global
> variables, then update any aliases on the top with new ones. However,
> I'm afraid of need to write such specialized code for each arbitrary
> rule in my analyzer...


You "only" need to invest a bit of time to get proficient with the walker
library. It's a complex lib, but that's because it does intrinsically
complex stuff. The walk.id sub-library is even scarier at first, but it's
variable-aware: it properly handles variable scopes and shadowing, knows
which vars are local and which are global, etc. That's the kind of thing you
won't get right with a naive source muncher.
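
To make the contrast concrete, here is what a scope-blind rename looks like
(a deliberately naive sketch in plain Lua, not using the walker at all); it
happily renames locals that merely shadow the global, which is exactly the
mistake the variable-aware walker avoids:

-- Rename every `Id node carrying old_name, ignoring scopes entirely.
function naive_rename (ast, old_name, new_name)
   if type (ast) ~= 'table' then return end
   if ast.tag == 'Id' and ast[1] == old_name then ast[1] = new_name end
   for _, child in ipairs (ast) do naive_rename (child, old_name, new_name) end
end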

-- Fabien.

