Sophie: parrot-doc-1.6.0-1mdv2010.0 i586

parrot-doc-1.6.0-1mdv2010.0.i586.rpm

# $Id: intro.pod 38809 2009-05-16 01:33:10Z coke $

# One first version of this article was published on TPR 2.3
#
# Please feel free to edit it to suit latest parrot developments,
# and to be a good starting point for beginners.

=head1 Writing PIR

PIR (Parrot Intermediate Representation) is a way to program the
parrot virtual machine that is easier to use than PASM (Parrot
Assembler). PASM notation is like any other assembler-like format and
can be used directly, but it is more verbose and gives too much power
to the user. PIR abstracts common operations and conventions into a
syntax that more closely resembles a high-level language. PIR allows
the programmer to write code that more naturally expresses their
intent without worrying about setting up the exact details that PASM
requires to function properly.

This article will show the basics on programming in PIR. More advanced
topics will appear in later articles.

=head2 Getting Parrot

In order to test the PIR and PASM code in this article, a parrot virtual
machine is needed (henceforth just "parrot"). Parrot is available from
L<http://parrot.org>. Just download the latest release, or checkout
the current development version from the SVN tree. The programs in this
article were tested with Parrot 0.8.1.

Parrot is very easy to compile on unix-like and Microsoft Windows
operating systems: just run C<perl Configure.pl && make> in the root
directory of the parrot source and, if everything works correctly, a
C<parrot> executable should appear. At the moment of writing, the
C<make install> target does not work properly, so in this and other
articles it is assumed that the parrot executable is invoked from the
parrot root directory.

If you do not want to compile your own Parrot you can download a pre-compiled
binary from http://www.parrot.org/source.html.

=head2 Parrot Virtual Machine overview

Before we get started with the examples, here's a quick overview of
parrot's architecture.

Parrot is a register-based virtual machine. It provides 4 types of
registers. The register types are:

=over 4

=item I - integer

=item N - floating point

=item S - string

=item P - polymorphic container (PMC)

=back

In order to designate a register in PASM, use the character indicating
the type (C<I>, C<N>, C<S> or C<P>) and the register number. For instance,
in order to use register 10 of type integer, you'd write C<I10>. In this
series of articles, we will mainly focus on programming PIR.

In PIR, you would type the C<$> character in front of the register, to
indicate a I<virtual> register. For instance, the integer registers are
C<$I0>, C<$I1> and so on. The PMC registers hold arbitrary data objects
and are parrot's mechanism for implementing more complex behavior than the
ones that can be expressed using the other 3 register types alone.

A virtual register is mapped to an actual register by the register allocator.
You can use as many registers as you want, and the register allocator will
allocate them as needed.

PMCs will be covered in more detail in a future article. Examples in
this article will focus on the first 3 register types.

=head2 Simple Operators

Let me start with a simple and typical example:

=begin PIR

  .sub main :main
       print "hello world\n"
  .end

=end PIR

To run it, save the code in a C<hello.pir> file and pass it to the
parrot virtual machine:

   ./parrot hello.pir

Note that I am using a relative path to parrot given that I didn't
install it into the system.

The keywords starting with a dot (C<.sub> and C<.end>) are PIR directives.
They are used together to define subroutines. After the C<.sub> keyword
I use the name of the subroutine. The keyword that starts with a colon
(C<:main>) is a pragma that tells parrot that this is the main body of the
program and that it should start by executing this subroutine.
By the way, I could use C<.sub foo :main> and Parrot will use the C<foo>
subroutine as the main body of the program. The actual name of the
subroutine does not matter as long as it has the C<:main> pragma. If you
don't specify the <:main> pragma on any subroutine, then parrot will start
executing the first subroutine in the source file.
The full set of pragmas are defined in L<docs/pdds/pdd19_pir.pod>.

Before going into more details about subroutines and calling
conventions, let's compare some PIR syntax to the equivalent PASM.

If I want to add two integer registers using PASM I would use the
Parrot C<set> opcode to put values into registers, and the C<add>
opcode to add them, like this:

=begin PIR_FRAGMENT

   set $I1, 5
   set $I2, 3
   add $I0, $I1, $I2   # $I0 yields 5+3

=end PIR_FRAGMENT

PIR includes infix operators for these common opcodes. I could write
this same code as

=begin PIR_FRAGMENT

   $I1 = 5
   $I2 = 3
   $I0 = $I1 + $I2

=end PIR_FRAGMENT

There are the four arithmetic operators as you should be expecting, as
well as the six different comparison operators, which return a boolean
value:

=begin PIR_FRAGMENT

   $I1 = 5
   $I2 = 3
   $I0 = $I1 <= $I2   # $I0 yields 0 (false)

=end PIR_FRAGMENT

I can also use the short accumulation-like operators, like C<+=>.

Another PIR perk is that local variable names may be declared and used
instead of register names. For that I just need to declare the
variable using the C<.local> keyword with any of the four data
types available on PIR: C<int>, C<string>, C<num> and C<pmc>:

=begin PIR_FRAGMENT

   .local int size
   size = 5

=end PIR_FRAGMENT

Note that all registers, both numbered and named, are consolidated by
the Parrot register allocator, assigning these "virtual registers" to
actual registers as needed.  The register allocator even coalesces two
virtual names onto the same physical register when it can
prove that they have non-overlapping lifetimes, so there is no need to
be stingy with register names.  To see the actual registers used, use
C<pbc_disassemble> on the C<*.pbc> output. You can generate a Parrot
Byte Code (PBC) file as follows:

   ./parrot -o foo.pbc --output-pbc foo.pir

Then, use C<pbc_disassemble> in order to disassemble it:

   ./pbc_disassemble foo.pbc

=head2 Branching

Another simplification of PASM are branches. Basically, when I want to
test a condition and jump to another place in the code, I would write
the following PASM code:

=begin PASM

   le I1, I2, LESS_EQ
   # ...
 LESS_EQ:

=end PASM

Meaning, if C<$I1> is less or equal than C<$I2>, jump to label
C<LESS_EQ>. In PIR I would write it in a more legible way:

=begin PIR_FRAGMENT

   if $I1 <= $I2 goto LESS_EQ
   # ...
 LESS_EQ:

=end PIR_FRAGMENT

PIR includes the C<unless> keyword as well.

=head2 Calling Functions

Subroutines can easily be created using the C<.sub> keyword shown
before. If you do not need parameters, it is just as simple as I show in
the following code:

=begin PIR

  .sub main :main
     hello()
  .end

  .sub hello
    print "Hello World\n"
  .end

=end PIR

Now, I want to make my C<hello> subroutine a little more useful, such
that I can greet other people. For that I will use the C<.param>
keyword to define the parameters C<hello> can handle:

=begin PIR

  .sub main :main
     hello("leo")
     hello("chip")
  .end

  .sub hello
     .param string person
     print "Hello "
     print person
     print "\n"
  .end

=end PIR


If I need more parameters I just need to add more C<.param> lines.

To return values from PIR subroutines I use the C<.return> keyword,
followed by one or more arguments, just like this:

=begin PIR

  .sub _
    .return (10, 20, 30)
  .end

=end PIR

The calling subroutine can accept these values. If you want to retrieve
only one value (or only the first value, in case multiple values are
returned), write this:

=begin PIR_FRAGMENT

  $I0 = compute_it($I8, $I9)

=end PIR_FRAGMENT

To accept multiple values from such a function, use a parenthesized
results list:

=begin PIR_FRAGMENT

  ($I1, $I2, $I3) = compute_it($I8, $I9)

=end PIR_FRAGMENT

=head2 Factorial Example

Now, for a little more complicated example, let me show how I would
code Factorial subroutine:

=begin PIR

  .sub main :main
     $I1 = factorial(5)
     print $I1
     print "\n"
  .end

  .sub factorial
     .param int i
     if i > 1 goto recur
     .return (1)
  recur:
     $I1 = i - 1
     $I2 = factorial($I1)
     $I2 *= i
     .return ($I2)
  .end

=end PIR

This example also shows that PIR subroutines may be recursive just as in
a high-level language.

=head2 Named Arguments

As some other languages as Python and Perl support named arguments,
PIR supports them as well.

As before, I need to use C<.param> for each named argument, but you need to
specify a flag indicating the parameter is named:

=begin PIR

  .sub func
    .param int a :named("foo")
  .end

=end PIR

The subroutine will receive an integer named "foo", and inside of the
subroutine that integer will be known as "a".

When calling the function, I need to pass the names of the
arguments. For that there are two syntaxes:

=begin PIR_FRAGMENT

  func( 10 :named("foo") )    # or
  func( "foo" => 10 )

=end PIR_FRAGMENT

Note that with named arguments, you may rearrange the order of your
parameters at will.

=begin PIR

  .sub foo
    .param string a :named('name')
    .param int    b :named('age')
    .param string c :named('gender')
    # ...
  .end

=end PIR

This subroutine may be called in any of the following ways:

=begin PIR_FRAGMENT

  foo( "Fred", 35, "m" )
  foo( "gender" => "m", "name" => "Fred", "age" => 35 )
  foo( "age" => 35, "gender" => "m", "name" => "Fred" )
  foo( "m" :named("gender"), 35 :named("age"), "name" => "Fred" )

=end PIR_FRAGMENT

and any other permutation you can think of as long as you use the named
argument syntax. Note that any positional parameters must be passed
before the named parameters. So, the following is allowed:

=begin PIR

  .sub main
    .param int a
    .param int b :named("age")
    # ...
  .end

=end PIR

Whereas the following is not:

=begin PIR_INVALID

  .sub main
    .param int a :named("name")
    .param int b # cannot declare positional parameter after a named parameter
    # ...
  .end

=begin PIR_INVALID

It's also possible to use named syntax when returning values from
subroutines. Into the C<.return> command I'll use:

=begin PIR

  .return ( "bar" => 20, "foo" => 10)

=end PIR

and when calling the function, I will do:

=begin PIR

  ("foo" => $I0, "bar" => $I1) = func()

=end PIR

And C<$I0> will yield 10, and C<$I1> will yield 20, as expected.

=head2 Concluding

To conclude this first article on PIR and to let you test what you
learned, let me show you how to do input on PASM (hence, also in
PIR). There is a C<read> opcode to read from standard input.  Just
pass it a string register or variable where you wish the characters
read to be placed and the number of characters you wish to read:

=begin PIR

  read $S1, 100

=end PIR

This line will read 100 characters (or until the end of the line) and
put the read string into C<$S1>. In case you need a number, just
assign the string to the correct register type:

=begin PIR

  read $S1, 100
  $I1 = $S1

=end PIR

With the PIR syntax shown in this article you should be able to start
writing simple programs. Next article we will look into available
Polymorphic Containers (PMCs), and how they can be used.

=head2 Author

Alberto SimÃµes

=head2 Thanks

=over 4

=item * Jonathan Scott Duff

=back

=cut