[Metalua-list] Re: metalua source parsing

Fabien Fleutot fleutot at gmail.com
Sat Sep 13 17:06:01 GMT+2 2008


Context: Alexander and I have had a private discussion which might interest
more people than the two of us. Here's a summary, which I hope is reasonably
faithful.

Fabien wrote:

The lineinfo fields, which describe the positions of lexer tokens in the
sources as well as their surrounding comments, have been significantly
changed in the HEAD version of metalua. This is intended to help the
development of (Lua or metalua) source analysis tools with metalua.

Current lineinfo format:
* Each lexer token and AST node has a lineinfo field, which contains two
entries 'first' and 'last', each describing a source position.
* A source position has the usual 'line', 'column', 'char' fields, plus the
name of the source file, and optionally a 'comments' field.
* lineinfo.first.comments describes the comments before the node, and
lineinfo.last.comments describes the comments after the node. This way you
can access comments after a node as well as those before it.

Now an unimplemented proposal:

I'm thinking that maybe it should be pushed even a bit further. One could
simply think in terms of 'spacings', which would be the (possibly empty)
sequence of spaces and comments between tokens. A spacing would be described
by:
- its initial and final positions, in terms of line / column / char /
filename,
- plus the comments embedded in that space.
Each token and AST node would then reference the spacing before and after
it.

Current version:

`Foo{ "bar", lineinfo = {
   first = { 1, 2, 3, "src.lua", comments = { ... } },
   last = { 4, 5, 6, "src.lua", comments = { ... } } } }
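
To make the current format concrete, here is a minimal sketch in plain Lua,
so that it runs without metalua. The node is written as an ordinary table
with a 'tag' field, which is what the `Foo{ ... } notation stands for. The
helper name comments_around and the comment strings are made up for the
example, and the real layout of a 'comments' entry may well differ.

```lua
-- Hand-built node mimicking the current lineinfo format shown above.
-- The comment strings are illustrative placeholders, not real lexer output.
local node = {
   tag = "Foo", "bar",
   lineinfo = {
      first = { 1, 2, 3, "src.lua", comments = { "comment before" } },
      last  = { 4, 5, 6, "src.lua", comments = { "comment after" } } } }

-- Hypothetical helper: return the comments before and after a node,
-- falling back to empty tables when the optional 'comments' field is absent.
local function comments_around(n)
   local li = n.lineinfo
   return li.first.comments or { }, li.last.comments or { }
end

local before, after = comments_around(node)
print(before[1], after[1])
```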

Proposed version:

`Foo{ "bar", lineinfo = {
   before = { -- spacing structure
      first = { 1, 2, 3, "src.lua" },
      last = { 4, 5, 6, "src.lua" },
      comments = { ... } },
   after = { -- spacing structure
      first = { 1, 2, 3, "src.lua" },
      last = { 4, 5, 6, "src.lua" },
      comments = { ... } } } }
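
As a rough illustration of how a spacing could be derived from the current
format, the sketch below builds the spacing between two adjacent tokens.
Everything in it is an assumption: the helper name spacing_between is
invented, the positions of the neighbouring tokens are copied as they are
rather than shifted to the first and last characters of the gap, and the
comments are taken from the 'first' position of the following token.

```lua
-- Hypothetical helper: build the spacing located between two adjacent
-- tokens whose lineinfo uses the current first/last format.
local function spacing_between(prev, next_)
   local p, n = prev.lineinfo.last, next_.lineinfo.first
   return {
      first = { p[1], p[2], p[3], p[4] },  -- where the previous token ends
      last  = { n[1], n[2], n[3], n[4] },  -- where the next token starts
      comments = n.comments or { } }
end

-- Two hand-built tokens with a comment sitting between them.
local a = { tag = "Id", "x", lineinfo = {
   first = { 1, 1, 1, "src.lua" },
   last  = { 1, 1, 1, "src.lua" } } }
local b = { tag = "Id", "y", lineinfo = {
   first = { 3, 1, 20, "src.lua", comments = { "about y" } },
   last  = { 3, 1, 20, "src.lua" } } }

local s = spacing_between(a, b)
print(s.first[1], s.last[1], s.comments[1])
```

With this shape, the 'after' spacing of one token and the 'before' spacing
of the next could be the very same table.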

Alexander wrote:

I think that the ultimate goal is to allow the user to reconstruct the
original source file from the AST as precisely as possible. (BTW, that
implies at least storing quote types and the number of equal signs
inside long string/comment delimiters.)

For example, I'd like to automatically generate patches with my
(limited) lint tool for the cases where the fix for a broken rule is
obvious. As things stand, I would have to read the file again by hand
and do some non-trivial manipulations.

If all I had to do was change the AST and then 'render' it back to
source -- that would simplify my task greatly.

Think also about source code beautifiers like Artistic Style for C. It
would be great to only have to write some AST data meddler to get such
a tool.

However, such advanced code transformation functionality is usually
not needed when all we want is to do some AST transformation before
byte-code generation (I mean some metalua macro). It should either be
lightweight enough or be possible to switch off.

Couldn't we include the 'whitespace' stuff (including comments) in the
AST as some (pseudo-)nodes?

[About the alternative proposal]

From a practical point of view:

1. I like those before and after fields.
2. I'd join the first and last tables, and move the source field somewhere
higher. A single tag cannot start in one source and end in another,
right?
3. The comments field looks fine at first glance.

So:

> `Foo{ "bar", lineinfo = {
>    src = "src.lua",
>    before = { -- spacing structure
>       1, 2, 3,
>       4, 5, 6,
>       comments = { ... } },
>    after = { -- spacing structure
>       1, 2, 3,
>       4, 5, 6,
>       comments = { ... } } } }

Position info is slightly less readable now. If this is a problem,
maybe we should use some string keys? Say:

> `Foo{ "bar", lineinfo = {
>    src = "src.lua",
>    before = { -- spacing structure
>       fl = 1, fc = 2, fn = 3,
>       ll = 4, lc = 5, ln = 6,
>       comments = { ... } },
>    after = { -- spacing structure
>       fl = 1, fc = 2, fn = 3,
>       ll = 4, lc = 5, ln = 6,
>       comments = { ... } } } }

(Or use more verbose names.)
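
The step from the nested layout to the string-keyed one can be sketched in
plain Lua as follows. The helper names flatten_spacing and
flatten_lineinfo are invented for the example, and the sketch assumes that
both spacings of a node share one source name, as suggested in point 2.

```lua
-- Hypothetical helper: turn a spacing in the nested first/last form into
-- the string-keyed form (fl, fc, fn, ll, lc, ln).
local function flatten_spacing(s)
   return {
      fl = s.first[1], fc = s.first[2], fn = s.first[3],
      ll = s.last[1],  lc = s.last[2],  ln = s.last[3],
      comments = s.comments }
end

-- Hypothetical helper: flatten both spacings and hoist the source name,
-- assuming it is the same everywhere within one lineinfo.
local function flatten_lineinfo(li)
   return {
      src    = li.before.first[4],
      before = flatten_spacing(li.before),
      after  = flatten_spacing(li.after) }
end

local li = {
   before = { first = { 1, 2, 3, "src.lua" },
              last  = { 4, 5, 6, "src.lua" },
              comments = { } },
   after  = { first = { 7, 1, 40, "src.lua" },
              last  = { 7, 3, 42, "src.lua" },
              comments = { } } }

local flat = flatten_lineinfo(li)
print(flat.src, flat.before.fl, flat.after.ln)
```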

Still, I'd think more about why spacings should be represented this way
rather than some other way, and how comfortable such an implementation
would be for the intended use cases.


I think that maybe even some `Comment tag would be enough. We may
assume that the kind of whitespace (tabs vs. spaces), trailing whitespace
and line ending types are irrelevant enough not to be saved anywhere.
We're not writing makefiles after all...
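
One way to picture the `Comment idea, and the wish that it can be switched
off, is the plain-Lua sketch below. The `Comment node shape and the helper
strip_comments are hypothetical; the sketch only handles one statement list
and does not recurse into nested blocks.

```lua
-- Hypothetical block in which comments appear as `Comment pseudo-nodes
-- among ordinary statements (plain-table form of the AST).
local block = {
   { tag = "Comment", "initialise the counter" },
   { tag = "Local", { { tag = "Id", "n" } }, { { tag = "Number", 0 } } },
   { tag = "Comment", "done" } }

-- Return a copy of a statement list without the pseudo-nodes, for
-- consumers (such as a byte-code generator) that have no use for them.
local function strip_comments(stats)
   local out = { }
   for _, stat in ipairs(stats) do
      if stat.tag ~= "Comment" then out[#out + 1] = stat end
   end
   return out
end

local stripped = strip_comments(block)
print(#block, #stripped)
```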

[...]

If we added a `Comment tag and some additional info on sources (like an
optional quote type field on strings), I think it would be possible to
implement a precise enough source-AST-source conversion.

Storing quote types and the number of equal signs would make life
simpler anyway, because otherwise one would have to parse the contents
of a string or comment in order to render it. For strings the built-in
%q format can help, but there is no such tool for comments.

[...]

[ Some reflections follow about the numerous cases where it would be hard
to keep enough information in the AST to allow the regeneration of the
original source file ]

My answers in the next mail :)