This appendix provides a short introduction to the source code of the
Asterix compiler. Figure 2 shows the structure of the compiler,
which is in part generated by tools (flex and yacc), and the transformation
process from Asterix source to executable. The operator
denotes compilation and linking by the C compiler (gcc).
int main(int argc, char **argv) { int yyparse(void); init(argc, argv); yyparse(); semantic_analyses(); if (!error_seen) code_generation(); terminate(); return error_seen ? 1 : 0; }
Since the lexical analysis code is called by the parser, the first relevant call in main() is the one to yyparse(), the parser generated by Bison. The result of the semantic actions during parsing is an AST, containing a list of global declarations, which is stored in the variable global_declarations. Next, semantic_analyses() (invoking type_check_declaration_list(global_declarations)) and code_generation() (invoking generate_declaration_list(global_declarations)) are called to perform the semantic analyses and code generation, respectively. These functions start a traversal of the AST by recursively calling other functions to process the branches. The code for the type_check_ and generate_ functions is stored in a number of .c files, for instance, declaration.c for everything to do with declarations. These files also contain functions to create and delete the structs that make up the AST.
For the first assignment it is important to understand the way Flex and Bison depend on each other. This is summarized in the following table.
Tool | Source | Generates | Depends on |
---|---|---|---|
Flex | lex.l | lex.yy.c | yylval variable |
containing yylex() | token #define's | ||
and yytext | |||
Bison | grammar.y | grammar.tab.c | yylex() |
containing yyparse() | |||
and yylval | |||
grammar.tab.h | |||
containing token #define's |
The code generated by Bison (yyparse()) calls the yylex() function, which is generated by Flex. Flex also creates the yytext variable, containing the string representation of the last recognized token. In a bottom up parser like Bison, all tokens in a rule are consumed before the rule is reduced. Since we may need the actual text of the token (or the position in the source file), Bison creates the variable yylval, in which the C code in the lex file can store this information. The new_token() function in lex.l allocates a structure containing the line number and text of the token. Bison then stores this in the $n parameters, which can be used in the Bison code. <445>> In a top down parser, tokens are consumed one by one as we process a rule, so we do not need this mechanism. We can just use new_token() where we need and yytext will contain the last recognized token. Apart from this, the picture when we replace Bison with LLnextgen is similar:
Tool | Source | Generates | Depends on |
---|---|---|---|
Flex | lex.l | lex.yy.c | token #define's |
containing yylex() | |||
and yytext | |||
LLnextgen | grammar.g | grammar.c | yylex() |
containing yyparse() | |||
grammar.h | |||
containing token #define's |
Note that the names of the generated files are different.
#VERSION = Bison | |
VERSION = LLnextgen |
Note that the Makefile assumes the grammar file for LLnextgen to be called grammar.g.
This results in two files containing C code: cb.c and cb.h. We can compile these with a normal C compiler like gcc to create an executable file. The compiler also has a run time system, containing code on which cb.c depends. The runtime system consists of the files rts.* and libcb.*. We need to link these files when compiling cb.c. For convenience a script called cbc is available that performs both steps. It compiles an Asterix source file into a.out.
<439>>
rulea( RuleANode **node ) { RuleBNode *nodeb; }: ruleb( &nodeb ) { *node = new_ruleanode( nodeb ); } ;
An exception to this approach are rules that produce a list. The easiest way to implement these is using two rules, one for the list and one for each individual item. The list rule creates a List node, and passes this to the item rule, which appends the item to the list instead of returning it through a pointer parameter.
rulelist( List **list ): { *list = new_list(); } ruleitem( *list )* ; ruleitem( List *list ) { RuleItemNode *item; } : { item = new_rule_item(); list_append( list, item ); } ;
See list.h for details on the available list operations.
Finally, the following table summarizes the purpose of each source file.
lex.l | Flex source file |
lex.yy.c | Flex output file |
grammar.y | Bison source file |
grammar.tab.* | Bison output files |
grammar.g | LLnextgen source file |
grammar.[ch] | LLnextgen output files |
declaration.* | Compiler sources |
expression.* | |
list.* | |
statement.* | |
type.* | |
scope.* | |
token.* | |
error.* | |
io.* | |
salloc.* | |
bool.h | |
lex.h | |
main.* | |
libcb.* | Run time system |
rts.* |
K.G. Langendoen 2006-12-12