# Copyright (C) 2008-2009, Parrot Foundation. # $Id: pdd29_compiler_tools.pod 38816 2009-05-16 04:08:22Z coke $ =head1 [DRAFT] PDD 29: Compiler Tools =head2 Version $Revision: 38816 $ =head2 Maintainer Will "Coke" Coleda Klaas-Jan Stol =head2 Definitions =head3 Compiler In this document, when we speak of a I<compiler>, we mean PCT-based compilers. =head3 HLL A High-Level Language. Examples are: Perl, Ruby, Python, Lua, Tcl, etc. =head2 Abstract This PDD specifies the Parrot Compiler Tools (PCT). =head2 Synopsis Creating a PCT-based compiler can be done as follows: =begin PIR .sub 'onload' :anon :load :init load_bytecode 'PCT.pbc' $P0 = get_hll_global ['PCT'], 'HLLCompiler' $P1 = $P0.'new'() $P1.'language'('Foo') $P1.'parsegrammar'('Foo::Grammar') $P1.'parseactions'('Foo::Grammar::Actions') .end .sub 'main' :main .param pmc args $P0 = compreg 'Foo' $P1 = $P0.'command_line'(args) .end =end PIR {{ this is the most important part; is this enough? }} The Parrot distribution contains a Perl script to generate a compiler stub, containing all necessary files. This generated compiler will compile out of the box. It is highly suggested to use this script to get started with the PCT. The script is located in C<tools/dev/mk_language_shell.pl>. {{ Not sure whether the mk_language_shell.pl script should be mentioned long term. In a sense, this script can also be considered part of the parrot compiler "tools", as it is used to create a compiler. }} =head3 Parser Synopsis grammar Foo is PCT::Grammar; rule TOP { <statement>* {*} } rule statement { <ident> '=' <expression> {*} } rule expression is optable { ... } proto infix:<+> is precedence('1') is pirop('n_add') { ... } rule 'term:' is tighter(infix:<+>) is parsed(&term) { ... } rule term { | <ident> {*} #= ident | <number> {*} #= number } =head3 Actions Synopsis {{ Is this a good idea? }} class Foo::Grammar::Actions; method TOP($/) { my $past := PAST::Block.new( :blocktype('declaration'), :node($/) ); for $<statement> { $past.push( $( $_ ) ); } make $past; } method statement($/) { make PAST::Op.new( $( $<ident> ), $( $<expression> ), :pasttype('bind'), :node($/) ); } method expression($/, $key) { ... } method term($/, $key) { make $( $/{$key} ); } =head3 Running the compiler Running the compiler is then done as follows: $ parrot foo.pbc [--target=[parse|past|post|pir]] <file> {{ other options? Maybe --target=pbc in the future, once PBC can be generated? }} =head2 Description The Parrot Compiler Tools are specially designed to easily create a compiler targeting the Parrot Virtual Machine. The tools themselves run on Parrot, which implies that no other external programs are needed. The PCT is a set of libraries and programs, to: =over 4 =item * create a parser =item * create an intermediate data structure (Abstract Syntax Tree) =item * generate executable Parrot code =back {{ Maybe just say it's used to create parrot-targeting compilers, not list these as =items }} =head2 Implementation The PCT is made up of the following libraries and programs: =over 4 =item * Parrot Grammar Engine (PGE) =item * Parrot Abstract Syntax Tree (PAST) classes =item * Parrot Opcode Syntax Tree (POST) classes =item * PCT::HLLCompiler class =item * PCT::Grammar class =back Although strictly speaking it is not part of the PCT, the Not Quite Perl (6) (NQP) language is typically used in all PCT-based compilers. NQP is a subset of the Perl 6 language, and is a high-level language as an alternative for PIR. =head3 Compilation phases A PCT-based compiler has by default four compilation phases, or I<transformations>. Phases can be removed and added through the API of the C<HLLCompiler> class. These are: =over 4 =item * source to parse tree The source is read, parsed and stored in a parse tree. =item * parse tree to PAST The parse tree is converted into a Parrot Abstract Syntax Tree. =item * PAST to POST The PAST is converted into a Parrot Opcode Syntax Tree. =item * POST to PIR The POST is converted into executable Parrot Intermediate Representation. =back =head4 Source to Parse Tree The first stage of a PCT-based compiler is done by the C<parser>. The parser is defined as a set of Perl 6 Rules, which is processed by the Perl 6 Rules compiler. This results in a generated PIR file that implements the parser. {{ Doesn't this make the Perl 6 rules compiler part of the PCT? }} During the first stage, the source (input string) is parsed, resulting in a C<parse tree>. =head4 Parse tree to PAST The second stage converts the parse tree into a Parrot Abstract Syntax Tree (PAST). PAST is a data structure consisting of PAST nodes, each of which represents a common HLL construct. While all languages differ in syntax, many constructs in different HLLs map to the same semantics. This second transformation is executed during the parse stage. The transformations of the parse tree nodes into PAST nodes is done by so-called parse actions, which are methods of a class that is specified through the C<parseactions> attribute of the HLLCompiler. Such classes are implemented in NQP. {{ How do we say that this is not obligatory; you could also use PIR, and in the future maybe other languages. }} =head4 PAST to POST The third transformation converts the PAST into a Parrot Opcode Syntax Tree (POST). PAST nodes represent HLL constructs, which are transformed into a set of low-level POST nodes. A POST node is a low-level node, representing a single instruction, label, or a subroutine. While a PAST is very close to a HLL program, a POST is much closer to PIR code. =head4 POST to PIR The last transformation generates PIR code from the POST. The generated PIR is then fed into the Parrot executable, and processed into Parrot Byte Code (PBC) by the PIR compiler. =head3 Parrot Grammar Engine The Parrot Grammar Engine (PGE) is a component that I<executes> regular expressions. Besides I<classic> regular expressions, it also understands Perl 6 Rules. Such rules are special regular expressions to define a grammar. The I<start> symbol in a grammar is named C<TOP>; this is the top-level rule that is executed when the parser is invoked. =head4 Operator precedence parsing {{ insert stuff about using an operator prec. table here }} =head3 Parrot Abstract Syntax Tree The PCT includes a set of PAST classes. PAST classes represent common language constructs, such as a C<while statement>. These are described extensively in L<docs/pdds/pdd26_ast.pod>. =head3 Parrot Opcode Syntax Tree =head4 POST::Node POST::Node is the base class for all other POST classes. =head4 POST::Op =head4 POST::Ops =head4 POST::Label =head4 POST::Sub =head3 PCT::Grammar The class C<PCT::Grammar> is a built-in grammar class that can be used as a parent class for a custom grammar. This class defines a number of rules and tokens that are inherited by child classes. Note that the concept of C<class> and C<grammar> are equivalent. The following rules are predefined: {{ is this necessary, or just a reference to the file? }} =over 4 =item ident =item ws =back =head3 PCT::HLLCompiler All PCT-based compilers use a HLLCompiler object as a compiler driver. It acts as a I<facade> for the compiler. This object invokes the different compiler phases. =head4 HLLCompiler API Methods {{ TODO: complete this }} =over 4 =item language =begin PIR_FRAGMENT $P0.'language'('Foo') =end PIR_FRAGMENT =item parsegrammar =begin PIR_FRAGMENT $P0.'parsegrammar'('Foo::Grammar') =end PIR_FRAGMENT =item parseactions =begin PIR_FRAGMENT $P0.'parseactions'('Foo::Grammar::Actions') =end PIR_FRAGMENT =item commandline_prompt =begin PIR_FRAGMENT $P0.'commandline_prompt'($S0) =end PIR_FRAGMENT sets the string in C<$S0> as a commandline prompt on the compiler in C<$P0>. The prompt is the text that is shown on the commandline before a command is entered when the compiler is started in interactive mode. =item commandline_banner =begin PIR_FRAGMENT $P0.'commandline_banner'($S0) =end PIR_FRAGMENT sets the string in C<$S0> as a commandline banner on the compiler in C<$P0>. The banner is the first text that is shown when the compiler is started in interactive mode. This can be used for a copyright notice or other information. =back =head2 Attachments None. =head2 References docs/pdd26_ast.pod http://dev.perl.org/perl6/doc/design/syn/S05.html =cut __END__ Local Variables: fill-column:78 End: