Sophie: parrot-doc-1.6.0-1mdv2010.0 i586

parrot-doc-1.6.0-1mdv2010.0.i586.rpm

# Copyright (C) 2008, Parrot Foundation.
# $Id: tutorial_episode_8.pod 36833 2009-02-17 20:09:26Z allison $

=head1 Episode 8: Hashtables and Arrays

Welcome to Episode 8! This is the second-last episode in this tutorial.
After this episode, we'll have a complete implementation of our Squaak language.
This episode focuses on aggregate data structures: arrays and hashtables. We'll
discuss the syntax to assign to them and to construct them. We'll see that
implementing the action methods is really easy, almost trivial. After that,
we'll make some notes on aggregates as arguments, and how they differ from the
basic data types when passing them around as subroutine arguments.

=head2 Arrays and Hashtables

Besides basic data types such as integer, floating-point and string, Squaak has
two aggregate data types: array and hashtable. An array is an object that can
store a sequence of values. The values in this sequence can be of different
types, unlike some languages that require all elements of an array to be the
same type. An example of using arrays is shown below:

 grades[0] = "A"
 grades[1] = "A+"
 grades[2] = "B+"
 grades[3] = "C+"

A hashtable stores key-value pairs; the key is used as index to store a value.
Keys must be string constants, but the value can be of any type. An example is
shown below:

 lastnames{"larry"}   = "wall"
 lastnames{"allison"} = "randal"

=head3 Array constructors

Just as there are integer literals (42) and string literals ("hello world")
that can be assigned to variables, you can have array literals. Below is the
grammar rule for this:

 rule array_constructor {
     '[' [ <expression> [',' <expression>]*]? ']'
     {*}
 }

Some examples are shown below:

 foo = []
 bar = [1, "hi", 3.14]
 baz = [1, [2, 3, 4] ]

The first example creates an empty array and assigns this to foo. The second
example shows the construction of three elements, assigning the array to bar.
Note that the elements of one array can be of different types. The third example
shows the construction of nested arrays. This means that element baz[1][0]
evaluates to the value 2 (indexing starts at 0).

=head3 Hashtable constructors

Besides array literals, Squaak supports hashtable literals, that can be
constructed through a hashtable constructor. The syntax for this is expressed
below:

 rule hash_constructor {
     '{' [<named_field> [',' <named_field>]* ]? '}'
     {*}
 }

 rule named_field {
     <string_constant> '=>' <expression>
     {*}
 }

Some examples are shown below:

    foo = {}
    bar = { "larry" => "wall", "allison" => "randal" }
    baz = { "a" => { "b" => 42} }

The first line creates an empty hashtable and assigns this to foo. The second
creates a hashtable with two fields: "larry" and "allison". Their respective
values are: "wall" and "randal". The third line shows that hashtables can be
nested, too. There, a hashtable is constructed that has one field, called "a",
and its value is another hashtable, containing a field "b" that has the value
42.

=head2 Implementation

You might think implementing support for arrays and hashtables looks rather
difficult. Well, it's not. Actually, the implementation is rather
straightforward. First, we're going to update the grammar rule for primary:

 rule primary {
     <identifier> <postfix_expression>*
     {*}
 }

 rule postfix_expression {
     | <index> {*}   #= index
     | <key> {*}     #= key
 }

 rule index {
     '[' <expression> ']'
     {*}
 }

 rule key {
     '{' <expression> '}'
     {*}
 }

A primary object is now an identifier followed by any number of
postfix-expressions. A postfix expression is either a hashtable key or an array
index. Allowing any number of postfix expressions allows to nest arrays and
hashtables in each other, allowing us to write, for instance:

 foo{"key"}[42][0]{"hi"}

Of course, you as a Squaak programmer must make sure that foo is actually a
hashtable, and that foo{"key"} yields an array, and so forth. Implementing this
is actually quite simple. First, let us see how to implement the action method
index.

 method index($/) {
     my $index := $( $<expression> );
     my $past  := PAST::Var.new( $index,
                                 :scope('keyed'),
                                 :viviself('Undef'),
                                 :vivibase('ResizablePMCArray'),
                                 :node($/) );

     make $past;
 }

First, we retrieve the PAST node for expression. Then, we create a keyed
variable access operation, by creating a PAST::Var node and setting its scope
to C<keyed>. If a C<PAST::Var> node has keyed scope, then the first child is
evaluated as the aggregate object, and the second child is evaluated as the
index on that aggregate.

But wait! The C<PAST::Var> node we just created has only one child!

Here's where the updated action method for primary comes in.
This is shown below.

 method primary($/) {
     my $past := $( $<identifier> );

     for $<postfix_expression> {
         my $expr := $( $_ );
         $expr.unshift( $past );
         $past := $expr;
     }

     make $past;
 }

First, the PAST node for identifier is retrieved. Then, for each
postfix-expression, we get the PAST node, and unshift the (current) C<$past>
onto it. Effectively, the (current) $past is set as the first child of C<$expr>.

And you know what $expr contains: that's the keyed variable access node, that
was created in the action method index.
After that, C<$past> is set to C<$expr>; either there's another
postfix-expression, in which case this $past will be set as the first child of
that next postfix-expression, or, the current $past is set as the result object.

=head2 Implementing Constructors

To implement the array and hashtable constructors, we're going to take advantage
of Parrot's Calling Conventions (PCC). The PCC supports, amongst others,
optional parameters, named parameters and slurpy parameters. If you're Dutch,
you might think that slurpy parameters make a lot of noise ("slurpen" is a
Dutch verb meaning drinking carefully, which you usually do if your beverage
is hot, making noise in the process), but you would be wrong. Slurpy parameters
will store all remaining arguments that have not yet been stored in other
parameters (implying that there can only be one slurpy (positional) parameter,
and it should come after all normal (positional) parameters). Parrot will
automatically create an aggregate to store these remaining arguments. Besides
positional slurpy parameters, you can also define a named slurpy parameter,
which will store all remaining named parameters, after all normal (named)
arguments have been stored.

You might be confused by now.

Let's look at an example, as this issue is worth a few brain cells to store.

 .sub foo
     .param pmc a
     .param pmc b
     .param pmc c :slurpy
     .param pmc k :named('x')
     .param pmc l :named('y')
     .param pmc m :named :slurpy

 .end

 foo(1, 2, 3, 4, 6 :named('y'), 5 :named('x'), 7 :named('p'), 8 :named('q') )

This will result in the following mapping:

 a: 1
 b: 2
 c: {3, 4}
 k: 5
 l: 6
 m: {"p"=>7, "q"=>8}

So, after the positional parameters (a, b), c is declared as a slurpy
parameters, storing all remaining positional parameters. Parameters k and l are
declared as named parameters, which have the respective names "x" and "y".
Using these names, values can be passed. After the named parameters, there's
the parameter m, which is both flagged as named and slurpy. This parameter will
store all remaining named arguments that have not yet been stored by the normal
named parameters.

The interesting parameters for us are "c" and "m". For the positional slurpy
parameter, Parrot creates an array, while for the named slurpy parameter a
hashtable is created. This happens to be exactly what we need! Implementing
the array and hash constructors becomes trivial:

 .sub '!array'
     .param pmc fields :slurpy
     .return (fields)
 .end

 .sub '!hash'
     .param pmc fields :named :slurpy
     .return (fields)
 .end

Array and hashtable constructors can then be compiled into subroutine calls to
the respective Parrot subroutines, passing all fields as arguments. (Note that
these names start with a "!", which is not a valid Squaak identifier. This
prevents us from calling these subs in normal Squaak code).

=head2 Basic data types and Aggregates as arguments

All data types, both basic and aggregate data types are represented by Parrot
Magic Cookies (PMCs). The PMC is one of the four built-in data types that Parrot
can handle; the others are integer, floating-point and string. Currently, the
PCT can only generate code to handle PMCs, not the other basic data types.
Parrot has registers for each its four built-in data types. The integer,
floating-point and string registers store the actual data value, while PMC
registers store a reference to the PMC object. This has consequences for how
PMCs are handled when passing them as arguments. When passing a PMC as an
argument, the invoked subroutine gets access to the PMC reference; in other
words, PMCs are passed by reference. This means that the subroutine can change
the original argument that was passed by the caller. Of course, it depends what
instructions are being generated, what the invoked subroutine does to the
references.
In Squaak, when passing basic data values, these cannot be changed by the
invoked subroutine. When assigning a new value to a parameter, a whole new
object is created and bound to the parameter identifier. No changes are made to
the original argument.
Aggregate data types are handled differently, however. When an invoked
subroutine assigns to an index or hashtable field of a parameter, then the
original argument is affected.
In other words, basic data types have by value semantics, while aggregate data
types have by reference semantics. A short example to demonstrate this:

 sub foo(a,b,c)
     a       = 42
     b[0]    = 1
     c{"hi"} = 2
 end

 var a = 0
 var b = []
 var c = {}
 foo(a,b,c)
 print(a, b[0], c{"hi"} ) # prints 0, 1, 2

=head2 What's Next?

This was the last episode to discuss implementation details to make Parrot
(run) Squaak. After doing this episode's exercises, your implementation should
be fairly complete. Next episode will be the last of this series, in which
we'll recap what we did, and demonstrate our language with a nice demo program.

=head2 Exercises

=over 4

=item *

We've shown how to implement keyed variable access for arrays, by implementing
the action method for index. The same principle can be applied to keyed access
for hashtables. Implement the action method for key.

=item *

Implement the action methods for array_constructor and hash_constructor. Use a
C<PAST::Op> node and set the pasttype to 'call'. Use the "name" attribute to
specify the names of the subs to be invoked (e.g., C<:name("!array")> ). Note
that all hash fields must be passed as named arguments. Check out PDD26 for
doing this, and look for a "named " method.

=item *

We'd like to add a little bit of syntactic sugar for accessing hashtable keys.
Instead of writing foo{"key"}, I'd like to write foo.key. Of course, this only
works for keys that do not contain spaces and such. Add the appropriate grammar
rule (call it "member") that enables this syntax, and write the associated
action method. Make sure this member name is converted to a string.

Hint: use a C<PAST::Val> node for the string conversion.

=back


=head2 Solutions to the exercises

=over 4

=item 1

    method key($/) {
    my $key := $( $<expression> );

    make PAST::Var.new( $key, :scope('keyed'),
                              :vivibase('Hash'),
                              :viviself('Undef'),
                              :node($/) );
    }

=item 2

    method named_field($/) {
        my $past := $( $<expression> );
        my $name := $( $<string_constant> );
        ## the passed expression is in fact a named argument,
        ## use the named() accessor to set that name.
        $past.named($name);
        make $past;
    }

    method array_constructor($/) {
        ## use the parrot calling conventions to
        ## create an array,
        ## using the "anonymous" sub !array
        ## (which is not a valid Squaak name)
        my $past := PAST::Op.new( :name('!array'),
                                  :pasttype('call'),
                                  :node($/) );
        for $<expression> {
            $past.push($($_));
        }
        make $past;
    }

    method hash_constructor($/) {
        ## use the parrot calling conventions to
        ## create a hash, using the "anonymous" sub
        ## !hash (which is not a valid Squaak name)
        my $past := PAST::Op.new( :name('!hash'),
                                  :pasttype('call'),
                                  :node($/) );
        for $<named_field> {
            $past.push($($_));
        }
        make $past;
    }

=item 3

    rule postfix_expression {
        | <key> {*}         #= key
        | <member> {*}      #= member
        | <index> {*}       #= index
    }

    rule member {
        '.' <identifier>
        {*}
    }

    method member($/) {
        my $member := $( $<identifier> );
        ## x.y is syntactic sugar for x{"y"},
        ## so stringify the identifier:
        my $key    := PAST::Val.new( :returns('String'),
                                     :value($member.name()),
                                     :node($/) );

        ## the rest of this method is the same
        ## as method key() above.
        make PAST::Var.new( $key, :scope('keyed'),
                                  :vivibase('Hash'),
                                  :viviself('Undef'),
                                  :node($/) );
    }

=back


=cut