[Metalua-list] Re: metalua source parsing

Alexander Gladysh agladysh at gmail.com
Sun Sep 14 06:07:59 GMT+2 2008


On Sun, Sep 14, 2008 at 2:21 AM, Fabien Fleutot <fleutot at gmail.com> wrote:
>
>> To me [being used as a Lua code analyzer] seems to be the most
>> direct way for metalua to spread and to mature.
>
> I wholeheartedly agree! What a shame that it isn't always the most fun kind
> of code to write ;-)

:-) Please note that I'm not saying that you should abandon
everything and start writing AST-to-source transformation support. I'm
not even implying that you should write anything on the subject anytime
soon. We're merely discussing the feature.

> Coming back to your concerns with AST->SRC transformation, let me repeat
> that with the sources and the AST, you do have both a manipulable form and
> its source representation:
>
> src = "for i = 1, --[[FOO]] 10 do\n\tbar(i)\nend; print 'hi'"
> ast = mlc.ast_of_luastring (src)
>
> -- ast looks like this, omitting lineinfo:
> -- { `Fornum{ `Id "i", `Number 1, `Number 10, { `Call{ `Id "bar", `Id "i" } } },
> --   `Call{ `Id "print", `String "hi" } }
>
> -- What's the loop body's structure?
> loop_body = ast[1][4]
> table.print(loop_body, 'nohash')
>
> -- What's the source representation of the loop's body?
> print(src:sub(loop_body.lineinfo.first[3], loop_body.lineinfo.last[3]))
>
> -- What's the kind of quote used for the 'hi' string in the final statement?
> hi_string = ast[2][2]
> local i = hi_string.lineinfo.first[3]
> hi_string_delimiter = src:sub(i, i)

Okay, maybe it is not that scary. :) One could probably even annotate
the AST on the user side right after the code is loaded (not sure yet
whether that would be a good idea). A question: may I take an arbitrary
AST node and store some user data in it? What is the "official"
approach for this?

Note, BTW, that to get, say, the position of "do" in the loop, the user
would have to re-parse the source, and metalua would not help, as it
does not store the needed info. (Maybe one could override the lexer
somehow?) What if I've changed the loop expression and want "do" to
stay where it was, if possible?
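To make the concern concrete, here is a minimal sketch of the hand-written scanning a user would need to find "do", assuming the only help available is a byte offset just past the loop's last header expression (for instance its `lineinfo.last[3]`, by analogy with the quoted example). `skip_blanks` and `find_keyword` are hypothetical names, not metalua functions; the scanner only knows about whitespace and comments, which is all that can appear between that expression and "do".

```lua
-- Hypothetical helpers (not part of metalua): hand-written scanning that a
-- user would need to locate a keyword such as "do" between two AST nodes.

-- Skip whitespace and Lua comments starting at byte offset `i` of `src`;
-- return the offset of the first significant character.
local function skip_blanks(src, i)
  while true do
    local _, e = src:find("^%s+", i)
    if e then i = e + 1 end
    if src:sub(i, i + 1) ~= "--" then return i end
    local eqs = src:match("^%-%-%[(=*)%[", i)
    if eqs then
      -- Long comment: --[[ ... ]] or --[==[ ... ]==]
      local _, ce = src:find("]" .. eqs .. "]", i, true)
      if not ce then return #src + 1 end -- unterminated comment
      i = ce + 1
    else
      -- Short comment: runs to the end of the line.
      local nl = src:find("\n", i, true)
      if not nl then return #src + 1 end
      i = nl + 1
    end
  end
end

-- Return the offset of keyword `kw` if it is the first token at or after
-- offset `from`, or nil otherwise.
local function find_keyword(src, from, kw)
  local i = skip_blanks(src, from)
  local after = src:sub(i + #kw, i + #kw)
  if src:sub(i, i + #kw - 1) == kw and not after:match("[%w_]") then
    return i
  end
  return nil
end

local src = "for i = 1, --[[FOO]] 10 do\n\tbar(i)\nend; print 'hi'"
-- The limit expression `10` ends at offset 23 (its lineinfo.last[3]).
print(find_keyword(src, 24, "do")) --> 25
```

Every rule here (long brackets, line ends, identifier boundaries) duplicates knowledge the real lexer already has.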

> Really, I don't think that detailed information in a 'hairy field' would be
> more practical. I would personally rather extract whatever info I need from
> the sources with lineinfo, as above, than remember how the 'hairy
> field' describes the difference between "function foo() end" and
> "foo = function() end".
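For instance, assuming both spellings produce the same AST node, telling them apart from the sources takes a single anchored match at the statement's start offset. `is_function_sugar` is a hypothetical helper, not part of metalua, and the offset is assumed to come from something like `stat.lineinfo.first[3]`:

```lua
-- Hypothetical helper: `offset` is where a statement starts in `src`.
-- Returns true if the statement was written with the `function foo() end`
-- sugar rather than as `foo = function() end`.
local function is_function_sugar(src, offset)
  -- '^' anchors the match at `offset`; [^%w_] rejects names like "functional".
  return src:find("^function[^%w_]", offset) ~= nil
end

print(is_function_sugar("function foo() end", 1))   --> true
print(is_function_sugar("foo = function() end", 1)) --> false
```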

I intend to do some prototyping in my next free metalua time slot;
then I will hopefully come up with some arguments for one side or the
other. :-)

> I can't help requiring that people know the AST structure to do interesting
> stuff, but I really don't want to force them to memorize any additional
> baroque API.

Note that this API would be required for special cases only -- that
is, for *beautifying* AST-to-source transformation. For all other
cases, the existing data should be enough.

What would I expect from such an API? All I need is a configurable
AST-to-source renderer (note that I would expect the same from a decent
splice-based API):

1. Old nodes are rendered as they were in the original.
2. I can change the representation of old nodes if needed (perhaps by
resetting them to "new").
3. I can fully control the representation of new nodes if I wish.
4. Or I can let the renderer render the given nodes according to
configured "coding guidelines" (positioning) rules. If I need
canonicalization, I simply reset positioning on the whole tree and
re-render it with the needed config.
5. Relative positioning inside old node subtrees stays the same if I
move a subtree somewhere else in the code. I can also indent/unindent
a subtree freely.
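A rough sketch of a renderer covering points 1 to 4, under loudly stated assumptions: nodes are tables with a `tag` field and positional children, as in metalua's AST; old nodes keep a `lineinfo` whose `first[3]`/`last[3]` are byte offsets (as in the quoted `src:sub` example); the `rules` table and the `space_before_paren` option are hypothetical stand-ins for configurable coding guidelines. Point 5 (re-indenting moved subtrees) is not handled.

```lua
-- Hypothetical hybrid renderer: not a metalua API.
local render -- forward declaration, used recursively by the rules

-- Rendering rules for new nodes (only a few tags, for illustration).
local rules = {
  Id = function(n) return n[1] end,
  Number = function(n) return tostring(n[1]) end,
  String = function(n) return string.format("%q", n[1]) end,
  Call = function(n, src, cfg)
    local args = {}
    for i = 2, #n do args[#args + 1] = render(n[i], src, cfg) end
    local open = cfg.space_before_paren and " (" or "("
    return render(n[1], src, cfg) .. open .. table.concat(args, ", ") .. ")"
  end,
}

-- Old nodes (with lineinfo) are copied verbatim from the source; new nodes,
-- or old ones whose lineinfo was reset to nil, are rendered by `rules`.
function render(node, src, cfg)
  local li = node.lineinfo
  if li then return src:sub(li.first[3], li.last[3]) end
  local rule = rules[node.tag] or error("no rule for tag " .. tostring(node.tag))
  return rule(node, src, cfg)
end

-- Usage: reuse the old 'hi' string node from `print 'hi'` in a new call.
local src = "print 'hi'"
local hi = { tag = "String", "hi", lineinfo = { first = { 1, 7, 7 }, last = { 1, 10, 10 } } }
local call = { tag = "Call", { tag = "Id", "bar" }, hi }
print(render(call, src, { space_before_paren = true }))  --> bar ('hi')
hi.lineinfo = nil -- "reset to new"
print(render(call, src, {}))                             --> bar("hi")
```

Copying old nodes verbatim keeps their original quoting, while clearing `lineinfo` is one simple way to mark a node as "new" so that the configured rules apply.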

Note that with these features available (along with the rest of
metalua), one could create a great refactoring-aware Lua editor!

<...>

> Supposing that the comment kind field wasn't available, and that a comment's
> lineinfo pointed to the whole comment (including delimiters) rather than
> only to its content (a design mistake I'm going to fix), we could do this:
>
> -- Return the kind of a comment: either 'short',
> -- or a number indicating the number of '=' signs used in a long comment.
> function comment_kind(comment, src)
>    local txt, first, last = unpack(comment)
>    local src_com = src:sub(first, last)
>    -- Long comments start with "--[", zero or more '=', then "[".
>    local eqs = src_com:match "^%-%-%[(=*)%["
>    if eqs then return #eqs else return 'short' end
> end

It should work, but I worry about having to re-parse the sources with
*custom-written* user code. I could accept double parsing in general
for simplicity's sake (even though it would affect performance), but
forcing the user to duplicate existing lexer code...

<...>

>> Also we should handle stuff like do ;;;;;; end.
>
> There's only one way to handle that :)
>
> fabien at MacFabien:~/scratch$ lua
> Lua 5.1.3  Copyright (C) 1994-2008 Lua.org, PUC-Rio
>> do ;;; end
> stdin:1: unexpected symbol near ';'

Great! All hail the Lua authors!

>> A perfect code analysis library would let its user change the AST and
>> would then automagically generate adequate output, given a set of code
>> guidelines for new code generation. The question is how complex that
>> "automagic" would be in each case.

> Depending on your rule set and the level of OCD of its author:
> * Either it's a beautifier which simply fixes and changes a couple of points
> (then, you modify the appropriate nodes and weave them back in place,
> leaving the rest of the code intact),
> * or it's a canonizer, and then it can be computed from AST without looking
> at the sources at all.
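The beautifier case can be sketched as a splice over the original source. The patch format `{ first, last, text }` and the function name `weave` are assumptions for illustration, not metalua APIs; the offsets below are those of `10` and `bar` in the `src` string from the quoted loop example.

```lua
-- Beautifier "weave" sketch: replace the source spans of modified nodes,
-- leaving everything else (comments, spacing) untouched. Each patch is a
-- hypothetical { first = offset, last = offset, text = string } record,
-- with offsets taken e.g. from the modified node's lineinfo. Patches are
-- assumed not to overlap.
local function weave(src, patches)
  local sorted = {}
  for i, p in ipairs(patches) do sorted[i] = p end -- don't mutate the caller's list
  -- Apply from the end of the source backwards so that the offsets of the
  -- remaining patches stay valid after each replacement.
  table.sort(sorted, function(a, b) return a.first > b.first end)
  for _, p in ipairs(sorted) do
    src = src:sub(1, p.first - 1) .. p.text .. src:sub(p.last + 1)
  end
  return src
end

local src = "for i = 1, --[[FOO]] 10 do\n\tbar(i)\nend; print 'hi'"
print(weave(src, {
  { first = 22, last = 23, text = "n" },   -- the `Number 10 node
  { first = 29, last = 31, text = "baz" }, -- the `Id "bar" node
}))
```

The `--[[FOO]]` comment survives untouched because it lies outside every patched span.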

These are two different tasks, both needed at the same time. Note that
the beautifying part could also be used in a refactoring-aware editor.

>> I agree, the simplest solution here would probably be to write
>> specialized code that meddles directly with the source: rename all global
>> variables, then update any aliases at the top with the new ones. However,
>> I'm afraid of having to write such specialized code for each arbitrary
>> rule in my analyzer...
>
> You "only" need to invest a bit of time to get proficient with the walker
> library. It's a complex lib, but that's because it does intrinsically
> complex stuff. The walk.id sub-library is even scarier at first, but it's
> variable-aware: it properly handles variable scopes and shadowing, knows
> which vars are local or global, etc. These are the kinds of things that
> you won't get right with a naive source muncher.
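To illustrate what "variable-aware" means, here is a toy scope tracker. It is not the walk.id API; the node shapes (`Local`, `Fornum`, `Call`, `Id`, ...) are loosely modeled on metalua's AST and cover only a tiny subset of Lua. It shows why an alias such as `local print = print` needs scope tracking: the right-hand `print` is a global, while later uses refer to the local alias.

```lua
-- Collect the names of global variables referenced in a mini AST, tracking
-- local scopes and shadowing. Hypothetical node shapes: `Local{ {ids}, {exprs} },
-- `Fornum{ id, e1, e2, [e3], block }, `Call{ f, args... }, `Id{ name },
-- `Number{ n }, `String{ s }; a block is an untagged list of statements.
local function find_globals(block)
  local globals = {}

  -- Search the scope stack from innermost to outermost.
  local function lookup(scopes, name)
    for i = #scopes, 1, -1 do
      if scopes[i][name] then return true end
    end
    return false
  end

  local walk_expr, walk_block

  function walk_expr(e, scopes)
    if e.tag == "Id" then
      if not lookup(scopes, e[1]) then globals[e[1]] = true end
    elseif e.tag == "Call" then
      for i = 1, #e do walk_expr(e[i], scopes) end
    end -- Number, String: nothing to do
  end

  local function walk_stat(s, scopes)
    if s.tag == "Local" then
      -- Right-hand sides are evaluated before the new locals exist.
      for _, e in ipairs(s[2] or {}) do walk_expr(e, scopes) end
      for _, id in ipairs(s[1]) do scopes[#scopes][id[1]] = true end
    elseif s.tag == "Fornum" then
      for i = 2, #s - 1 do walk_expr(s[i], scopes) end
      -- The loop variable is only visible inside the body.
      scopes[#scopes + 1] = { [s[1][1]] = true }
      walk_block(s[#s], scopes)
      scopes[#scopes] = nil
    elseif s.tag == "Call" then
      walk_expr(s, scopes)
    end
  end

  function walk_block(b, scopes)
    scopes[#scopes + 1] = {}
    for _, s in ipairs(b) do walk_stat(s, scopes) end
    scopes[#scopes] = nil
  end

  walk_block(block, {})
  return globals
end

-- Usage: `local print = print; for i = 1, n do print(i, x) end`
local ast = {
  { tag = "Local", { { tag = "Id", "print" } }, { { tag = "Id", "print" } } },
  { tag = "Fornum", { tag = "Id", "i" }, { tag = "Number", 1 }, { tag = "Id", "n" },
    { { tag = "Call", { tag = "Id", "print" }, { tag = "Id", "i" }, { tag = "Id", "x" } } } },
}
local names = {}
for name in pairs(find_globals(ast)) do names[#names + 1] = name end
table.sort(names)
print(table.concat(names, ", ")) --> n, print, x
```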

I'm not scared by walking the tree. I'm (was) scared by rendering it
back to sources.

Alexander.


