[Metalua-list] Re: metalua source parsing

Alexander Gladysh agladysh at gmail.com
Sat Sep 13 19:19:07 GMT+2 2008


1. We're discussing the ultimate task of exact source-AST-source
transformation of plain Lua code.

The same task is indeed interesting for full metalua itself. But from
my point of view, working with plain Lua is much more interesting.
While I cannot afford to use metalua in production code until it is
"out of beta", I can and am willing to use it in tools. My main use
case for metalua is writing an in-house lint / code inspector / code
beautifier tool for plain Lua. (By code inspector I mean a generator
of call graphs, object trees etc. for use with, say, some
interactive documentation system.)

To me, such tools seem to be the most direct way for metalua to spread
and to mature. However, with Fabien being the only major contributor
(as I see it from here at least), metalua can develop in only one
direction at a time. I think that the discussed set of features is not
the most important direction.

Furthermore, of course, the exactness of the transformation is not
mandatory. Also, if we drop the whole, less important, beautifier
part, the rest may be done fine without any code generation at all.
Not too fancy, but still very useful. So, I'm not saying that such a
feature is a "must have". Still, I'm interested in discussing it --
for fun if nothing else.

2. Is exact source-AST-source transformation a task for metalua at all?

2.1. I do see that the AST loses a lot of info from the lexer: from
brace and "secondary" keyword positions (like "in") and operator loss
(like ~= vs. not ==) to sugar-inflicted ambiguity (like function bar()
end vs. bar = function() end).

2.2. But the lexer does have all the needed info.

2.3. And this info is already partially used. For example, comments
are marked as long or short.

2.4. Also, writing with metalua is such fun! I'd like to write my code
analysis tool in it. :-)
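
To make the ambiguity in 2.1 concrete, here is a small plain-Lua
illustration. The names differs_a, differs_b and M are made up for this
example. Each pair behaves identically at run time, so an exact
source-AST-source tool would have to remember which spelling was
actually written.

```lua
-- Two spellings of "not equal" (function names are illustrative only).
local function differs_a(x, y) return x ~= y end
local function differs_b(x, y) return not (x == y) end

-- Two spellings of a function definition on a table field.
local M = {}
function M.sugared() return "ok" end       -- sugared form
M.plain = function() return "ok" end       -- explicit assignment form
```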

3. What I'm suggesting:

3.1. Do not generate any "hairy mess" in the AST unless explicitly
asked. Fork the AST generator if feasible.

3.2. Put all the "hairy mess" into a special hairy-mess-place field in
the AST node, where it would not interfere with other stuff. That place
would hold flags describing the exact syntax variants used, lineinfos
for the "secondary" keywords etc.
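
A minimal sketch of what 3.2 could look like, written as plain Lua
tables. The node shape follows the description above of ~= being kept
as not ==; the source_details field and the render function are
hypothetical names invented for this example and are not part of
metalua.

```lua
-- AST for `a ~= b`, stored as not (a == b).
-- `source_details` is a hypothetical side field holding syntax flags.
local node = {
  tag = "Op", "not",
  { tag = "Op", "eq", { tag = "Id", "a" }, { tag = "Id", "b" } },
  source_details = { spelling = "~=" },
}

-- Hypothetical printer for this tiny subset of nodes. It consults the
-- side field when present and falls back to a canonical form otherwise.
local function render(n)
  if n.tag == "Id" then
    return n[1]
  elseif n.tag == "Op" and n[1] == "eq" then
    return render(n[2]) .. " == " .. render(n[3])
  elseif n.tag == "Op" and n[1] == "not" then
    local details, inner = n.source_details, n[2]
    if details and details.spelling == "~="
       and inner.tag == "Op" and inner[1] == "eq" then
      return render(inner[2]) .. " ~= " .. render(inner[3])
    end
    return "not (" .. render(inner) .. ")"
  end
  error("unsupported node: " .. tostring(n.tag))
end
```

Code that does not care about exact syntax can simply ignore the side
field; the positional part of the node is unchanged.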

3.3. Things I do not know what to do with yet:

3.3.1. Whitespace kinds (space vs. tab), trailing whitespace and
line endings (CR vs. CRLF etc.). I think that we can afford to lose
this info.

3.3.2. Optional semicolon delimiters between statements. They may be
attributed to statements (with positions). Also we should handle stuff
like do ;;;;;; end.

3.3.3. Table constructor field delimiters (comma, semicolon or none).
They may also be attributed, along with positions, to table elements
or even to the `Table itself.

3.3.4. Comments. They cannot be AST nodes because they may appear
inside an otherwise atomic node:

for k, v -- comment
  in pairs(t) -- another
    do end

Comments must be kept because they are used to store metainformation
in plain Lua sources.
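
As a rough illustration of keeping comments outside the tree, here is a
naive line-based scan that records short comments with their
positions. It is only a sketch: it is not metalua's lexer, the function
name is made up, and it ignores long comments and "--" inside string
literals.

```lua
-- Hypothetical helper: collect `-- ...` comments with line and column.
-- Naive: does not understand strings or long comments.
local function collect_short_comments(source)
  local comments = {}
  local line_no = 0
  for line in (source .. "\n"):gmatch("(.-)\r?\n") do
    line_no = line_no + 1
    local col, _, text = line:find("%-%-%s*(.*)$")
    if col then
      comments[#comments + 1] = { line = line_no, column = col, text = text }
    end
  end
  return comments
end

local sample = "for k, v -- comment\n  in pairs(t) -- another\n    do end\n"
local found = collect_short_comments(sample)
```

Each record could later be attached to the nearest node's side field
instead of becoming a node of its own.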

4. What about splicing unchanged subtrees from original source?

I have to think about it.

A perfect code analysis library would let its user change the AST and
then would automagically generate adequate output given a set of code
guidelines for new code generation. It is a question of how complex
that "automagic" is in both cases.

It is getting late here, so I will start thinking tomorrow, but I
want to present a simple use case from practice. I already have a
working prototype in metalua which reports violations of the rules
below with recommended changes (in terms of paste and
search-and-replace), but does not change any source code. It was
surprisingly quick and fun to write.

4.1. Use case:
4.1.1. Rules:
   1. One may use only allowed global variables.
   2. Global variables must be aliased into locals at the very top of
the file (as a guard against sandboxing).
   3. Accesses to standard tables like math.min must be aliased as
math_min at the same place.
4.1.2. Generate new source (or a diff against the old one, it does not
matter), where the above rules are applied: globals are aliased at the
top of the file (and arranged neatly by groups), and variable names are
replaced where needed.
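
To show the kind of output meant in 4.1.2, here is a minimal sketch
that builds the alias block from a list of names. It assumes some
earlier analysis pass has already produced that list; the function
name make_alias_header and the grouping (plain globals first, then
table fields) are choices made for this example only.

```lua
-- Hypothetical helper: build the alias block for the top of a file.
-- `used` is a list of names such as "print" or "math.min".
local function make_alias_header(used)
  local globals, fields = {}, {}
  for _, name in ipairs(used) do
    if name:find(".", 1, true) then      -- plain find: dotted access
      fields[#fields + 1] = name
    else
      globals[#globals + 1] = name
    end
  end
  table.sort(globals)
  table.sort(fields)

  local lines = {}
  for _, name in ipairs(globals) do
    lines[#lines + 1] = ("local %s = %s"):format(name, name)
  end
  if #globals > 0 and #fields > 0 then
    lines[#lines + 1] = ""               -- blank line between groups
  end
  for _, name in ipairs(fields) do
    -- math.min becomes math_min, as in rule 3
    local alias = name:gsub("%.", "_")
    lines[#lines + 1] = ("local %s = %s"):format(alias, name)
  end
  return table.concat(lines, "\n")
end

local header = make_alias_header({ "print", "math.min", "assert", "math.max" })
```

Replacing math.min with math_min in the body of the file is the part
that still needs either the AST or direct source editing.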

I agree, the simplest solution here would probably be to write
specialized code that works directly on the source: rename all global
variables, then update the aliases at the top with the new ones.
However, I'm afraid of having to write such specialized code for each
arbitrary rule in my analyzer...

Alexander.


