Re: compiler and metadata, request opinions...

On Sat, 25 Apr 2009 13:15:03 -0700, "cr88192" <...@hotmail.com

"Hans-Peter Diettrich" <...@aol.com
writing 3 parsers would mean maintaining 3 parsers, which is unnecessary
since most of the syntax is common between the languages...

I have mostly been developing a "common superset" approach.

there are actually several different types of classes and structs:
struct/union: good old C struct/union;
struct/union(1): shared between C++ classes/structs, and C# structs
(currently N/A in Java);
class: C#/Java class, '__gc class' (or '__class') in C++ (C++ defaults to
'__nogc class');
interface: C#/Java interface, exported as '__interface' in C++.

1: these use the same tags at present, but are structurally different (using
different tags may be a good idea here, but at present they are recognized
by the structural difference in the ASTs).
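The four kinds above, including the footnoted case where C structs and C++/C# structs share a tag and are told apart by AST shape, could be modeled roughly as follows. This is a minimal sketch, not the actual compiler: the `Kind` enum, the `TypeDecl` record, and the rule that a struct with methods or base types is the C++/C# variant are all hypothetical, since the post does not say which structural difference is used.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    C_STRUCT = "struct"          # good old C struct/union
    CLASS_STRUCT = "struct(1)"   # C++ class/struct or C# struct (same AST tag)
    CLASS = "class"              # C#/Java class, '__gc class' in C++
    INTERFACE = "interface"      # C#/Java interface, '__interface' in C++


@dataclass
class TypeDecl:
    tag: str                     # tag as it appears in the AST (assumed names)
    name: str
    fields: list = field(default_factory=list)
    methods: list = field(default_factory=list)
    bases: list = field(default_factory=list)


def classify(decl: TypeDecl) -> Kind:
    """Map an AST type declaration onto one of the four kinds."""
    if decl.tag == "interface":
        return Kind.INTERFACE
    if decl.tag == "class":
        return Kind.CLASS
    if decl.tag in ("struct", "union"):
        # Both struct kinds share a tag, so look at the shape instead.
        # Hypothetical rule: methods or base types mean C++/C#-style.
        if decl.methods or decl.bases:
            return Kind.CLASS_STRUCT
        return Kind.C_STRUCT
    raise ValueError(f"unknown type tag: {decl.tag!r}")


print(classify(TypeDecl("struct", "Point", fields=["x", "y"])))
print(classify(TypeDecl("struct", "Vec", fields=["x"], methods=["Length"])))
```

Giving the C++/C# variant its own tag, as the footnote suggests, would reduce `classify` to a plain tag lookup.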

there are different flags and flag semantics, which have not yet been
addressed.

this area is the point of greatest divergence in the current parsing and
processing logic...
another such area is the handling of namespaces (not fully resolved thus far).

at present, the compiler will allow some things which are technically
not allowed in the respective languages:
using namespaces as an import mechanism in C++ (though, unless supported
explicitly, this would not allow importing types);
declaration of top-level and namespace-scoped variables and functions in C#;
a textual preprocessor is available for both Java and C#;
...

by metadata, I mean information which describes things like:
all of the namespaces, classes (and class layouts), interfaces, functions
and signatures, ...

all of this stuff needs to be available for the runtime and compilers to
work properly (in part because C# and Java do not use the "include
a-crapload-of-text" approach taken by C and C++...).
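One way to picture the kind of metadata listed above (namespaces, classes with their layouts, interfaces, functions and signatures) is as a set of plain records that a compiler, linker, or runtime could load without the object code. The record names and the signature strings below are made up for illustration; neither .NET assemblies nor Java class files use this exact shape.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FieldInfo:
    name: str
    type_name: str


@dataclass
class ClassInfo:
    name: str
    namespace: str
    kind: str                                   # "class", "interface", ...
    superclass: str | None = None
    interfaces: list[str] = field(default_factory=list)
    fields: list[FieldInfo] = field(default_factory=list)
    methods: dict[str, str] = field(default_factory=dict)  # name -> signature


@dataclass
class Metadata:
    namespaces: dict[str, list[str]] = field(default_factory=dict)
    classes: dict[str, ClassInfo] = field(default_factory=dict)
    functions: dict[str, str] = field(default_factory=dict)  # name -> signature

    def add_class(self, info: ClassInfo) -> None:
        """Register a class under its qualified name and its namespace."""
        self.classes[f"{info.namespace}.{info.name}"] = info
        self.namespaces.setdefault(info.namespace, []).append(info.name)


md = Metadata()
md.add_class(ClassInfo("Foo", "demo", "class",
                       fields=[FieldInfo("a", "int")],
                       methods={"get": "int()"}))
md.functions["demo.main"] = "void(string[])"
print(md.namespaces)
```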

it is the same sort of thing which .NET drags along with its assemblies.
in Java (in the "proper"/JVM sense), this info is usually stored in the
class files along with the bytecode.

originally, I had wanted to store all of this in the object files, so that
when linked, all this info would be conveniently embedded in the image along
with all the other code and data.

but, as a consequence of certain things being done at link time, and linking
being incremental in my framework, this approach could not be used (the
metadata needs to be in a form which can be accessed without first having
to link the image).

note that unlike in a more traditional C++ compile/link process, a lot of
info (such as the physical in-memory layout of objects) is not directly
handled by the compiler, but is instead deferred to dynamic link time (OTOH,
C structs/unions are fixed at compile time).
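To make the link-time layout point concrete: if the linker computes field offsets from metadata instead of the compiler baking them in, a base class in a library can grow without forcing derived classes to be recompiled. The sketch below assumes natural alignment (alignment equal to size) and a 64-bit target; the type names and sizes in `SIZES` are illustrative, not taken from the post.

```python
SIZES = {"byte": 1, "short": 2, "int": 4, "long": 8, "double": 8, "ref": 8}


def align_up(n, a):
    """Round n up to a multiple of a."""
    return (n + a - 1) // a * a


def layout(fields, base=(0, 1)):
    """Assign field offsets after a base part given as (size, alignment).

    Returns (offsets, (size, alignment)) so the result can in turn be
    used as the base of a derived class.
    """
    off, max_align = base
    offsets = {}
    for name, ty in fields:
        size = SIZES[ty]
        off = align_up(off, size)        # natural alignment: align == size
        offsets[name] = off
        off += size
        max_align = max(max_align, size)
    return offsets, (align_up(off, max_align), max_align)


derived_fields = [("y", "double")]

# Version 1 of a library base class: just an int.
_, base_v1 = layout([("x", "int")])
print(layout(derived_fields, base_v1))

# Version 2 adds a long; the derived offsets are simply recomputed.
_, base_v2 = layout([("x", "int"), ("id", "long")])
print(layout(derived_fields, base_v2))
```

A fixed-layout C struct corresponds to running the same computation once at compile time and freezing the offsets into the code.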

I think something like this is likely needed in order to compile a Java- or
C#-like language to native-code object files (either that, or to create a
custom object format which behaves similarly to Java class files, rather
than acting like good old COFF or ELF...).

actually, I could embed a lot of this kind of data in COFF or ELF files via
special-purpose sections, but this would require a little work (and the
creation of special linking tools, as linking it with something like GNU-LD
would almost invariably mess everything up...). as is, partial linking via
LD would be allowed (although there is not much reason to do so...), but at
the cost that if the tables are misplaced, it may not be possible to
properly link or load the code...
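As a rough idea of what a table destined for a special-purpose COFF/ELF section might look like, here is a hypothetical length-prefixed format with a magic number, so that a loader can at least detect a misplaced or mangled table instead of silently misreading it. The `MDT1` magic, the entry kinds, and the record layout are invented for this sketch; producing the actual section (and keeping a generic linker from disturbing it) is a separate problem.

```python
import struct

MAGIC = b"MDT1"  # hypothetical magic identifying the metadata table


def pack_metadata(entries):
    """Serialize (kind, name) pairs: magic, u32 count, then per entry
    u8 kind, u16 name length, UTF-8 name bytes (all little-endian)."""
    out = bytearray(MAGIC)
    out += struct.pack("<I", len(entries))
    for kind, name in entries:
        raw = name.encode("utf-8")
        out += struct.pack("<BH", kind, len(raw))
        out += raw
    return bytes(out)


def unpack_metadata(blob):
    """Parse a blob produced by pack_metadata, rejecting foreign data."""
    if blob[:4] != MAGIC:
        raise ValueError("not a metadata table (bad magic)")
    (count,) = struct.unpack_from("<I", blob, 4)
    pos = 8
    entries = []
    for _ in range(count):
        kind, length = struct.unpack_from("<BH", blob, pos)
        pos += 3
        entries.append((kind, blob[pos:pos + length].decode("utf-8")))
        pos += length
    return entries


CLASS, FUNCTION = 1, 2  # hypothetical entry kinds
blob = pack_metadata([(CLASS, "demo.Foo"), (FUNCTION, "demo.main")])
print(unpack_metadata(blob))
```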

actually, by the time most of the metadata comes into play, the compiler is
out of the process (the compiler runs, and spews out object code and
tables).

the linker and runtime use this information, but are physically disjoint
from the compiler.
(the compiler may also use some of this info from libraries, but mostly to
answer really basic questions like "is 'Foo' a class?", "what is the type of
Bar.z?", ...).
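The "really basic questions" the compiler asks of library metadata can be answered with simple lookups. This sketch uses a made-up dictionary layout for the metadata and walks superclass links when looking up a field; how the real tables are organized is not described in the post.

```python
METADATA = {
    "Foo": {"kind": "class", "super": None, "fields": {"a": "int"}},
    "Bar": {"kind": "class", "super": None, "fields": {"z": "double"}},
    "Baz": {"kind": "class", "super": "Bar", "fields": {"w": "int"}},
    "IShape": {"kind": "interface", "super": None, "fields": {}},
}


def is_class(md, name):
    """Answer 'is <name> a class?' from library metadata."""
    info = md.get(name)
    return info is not None and info["kind"] == "class"


def field_type(md, cls, fld):
    """Answer 'what is the type of <cls>.<fld>?', searching superclasses."""
    while cls is not None:
        info = md.get(cls)
        if info is None:
            return None
        if fld in info["fields"]:
            return info["fields"][fld]
        cls = info["super"]
    return None


print(is_class(METADATA, "Foo"), field_type(METADATA, "Baz", "z"))
```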

this issue, however, does make implementing templates/generics look a little
scary (since it is not entirely clear how to instantiate a generic without
having to call back into the compiler, which I regard as ugly...).

but, at least on the upside:
by the time the machinery is in place for instantiating generics, the
machinery will also be in place for handling expression-level eval (at
present, 'eval' can only be done at the module or function level...).



On Mon, 27 Apr 2009 14:08:53 +0200, Hans-Peter Diettrich <...@aol.com

cr88192 wrote:

Parsers usually have to deal with semantics as well (for disambiguation...),
in particular with context-sensitive C-ish languages.
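The classic C example of this is `a * b;`, which is a pointer declaration if `a` names a typedef and a multiplication otherwise; the usual fix (often called the "lexer hack") is to have the parser track typedef names as it goes. The toy function below handles only these two statement shapes, and its tuple output format is hypothetical; it is meant purely as an illustration.

```python
def parse_statement(tokens, typedef_names):
    """Parse a tiny subset of C statements, using known typedef names.

    Supports 'typedef T name ;' and 'a * b ;' only.
    """
    if len(tokens) == 4 and tokens[0] == "typedef" and tokens[3] == ";":
        _, target, name, _ = tokens
        typedef_names.add(name)          # later statements see this name
        return ("typedef", name, target)
    if len(tokens) == 4 and tokens[1] == "*" and tokens[3] == ";":
        first, _, second, _ = tokens
        if first in typedef_names:
            return ("decl", ("ptr", first), second)   # declares b as 'a *'
        return ("expr", ("mul", first, second))       # multiplies a by b
    raise SyntaxError("statement not supported by this sketch")


names = set()
print(parse_statement(["a", "*", "b", ";"], names))
print(parse_statement(["typedef", "int", "a", ";"], names))
print(parse_statement(["a", "*", "b", ";"], names))
```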

DoDi

On Tue, 28 Apr 2009 22:46:54 -0700, "cr88192" <...@hotmail.com

"Hans-Peter Diettrich" <...@aol.com
My parser does close to the minimum required to get the code parsed
(it handles declarations and typedefs, but little beyond
this). everything else goes into the AST, which in my case is
represented in an XML-based form (fairly similar to DOM). (I actually
prefer context-independent ASTs, but C can't be parsed in a
context-independent manner).
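For a feel of what a DOM-like XML AST might look like, here is a sketch that builds one for `int x = y + 1;` using Python's standard `xml.etree.ElementTree`. The element and attribute names (`var`, `type`, `value`, `binary`, `ref`, `int`) and the helper functions are hypothetical; the post does not describe the actual schema.

```python
import xml.etree.ElementTree as ET


def ref(name):
    """Reference to a variable."""
    return ET.Element("ref", name=name)


def int_lit(v):
    """Integer literal."""
    return ET.Element("int", value=str(v))


def binary(op, lhs, rhs):
    """Binary operator node with two operand children."""
    node = ET.Element("binary", op=op)
    node.append(lhs)
    node.append(rhs)
    return node


def make_decl(type_name, var_name, init):
    """Variable declaration 'type_name var_name = init;'."""
    var = ET.Element("var", name=var_name)
    ET.SubElement(var, "type", name=type_name)
    value = ET.SubElement(var, "value")
    value.append(init)
    return var


decl = make_decl("int", "x", binary("+", ref("y"), int_lit(1)))
print(ET.tostring(decl, encoding="unicode"))
```

Note that the tree records only the syntax; whether `y` is a variable or something else is left for the upper compiler to work out.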

a lot of the remaining issues (semantics, ...) are handled by the upper
compiler, which converts the ASTs into the IL (an RPN-based IL I call
RPNIL). the ASTs received by the upper compiler are still mostly language
specific (apart from the large number of common features shared between
the languages involved), and the upper compiler is aware of which input
language is being used.

RPNIL no longer knows or cares what the input language is, as by this
point the semantics are presumably normalized...
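A post-order walk is enough to turn a normalized expression AST into a stack-based (RPN) IL: emit the operands, then the operator. RPNIL's real opcodes are not described here, so the `load`/`store` tokens and tuple-shaped AST nodes below are stand-ins, and the small interpreter exists only to check that the emitted code computes the right value.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def lower(node, out):
    """Append RPN tokens for an AST node to 'out' (post-order walk)."""
    kind = node[0]
    if kind == "int":
        out.append(str(node[1]))
    elif kind == "ref":
        out.append(f"load {node[1]}")
    elif kind == "binary":
        _, op, lhs, rhs = node
        lower(lhs, out)
        lower(rhs, out)
        out.append(op)
    elif kind == "assign":
        _, name, value = node
        lower(value, out)
        out.append(f"store {name}")
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    return out


def run(code, env):
    """Execute RPN tokens on a value stack, updating 'env' in place."""
    stack = []
    for tok in code:
        if tok.startswith("load "):
            stack.append(env[tok[5:]])
        elif tok.startswith("store "):
            env[tok[6:]] = stack.pop()
        elif tok in OPS:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[tok](lhs, rhs))
        else:
            stack.append(int(tok))
    return env


# x = y + 2 * z
ast = ("assign", "x", ("binary", "+", ("ref", "y"),
                       ("binary", "*", ("int", 2), ("ref", "z"))))
code = lower(ast, [])
print(code)
print(run(code, {"y": 1, "z": 3}))
```

Since the front end has already resolved the language-specific semantics, the same `lower` walk would serve any of the input languages.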