# Copyright (C) 2001-2006, Parrot Foundation.
# $Id: jit.pod 37201 2009-03-08 12:07:48Z fperrad $

=head1 NAME

docs/jit.pod - Parrot JIT Subsystem

=head1 ABSTRACT

This PDD describes the Parrot Just In Time compilation subsystem.

=head1 DESCRIPTION

The Just In Time, or JIT, subsystem converts a bytecode file to native machine
code instructions and executes the generated instruction sequence directly.

=head1 IMPLEMENTATION

The JIT currently works on B<ALPHA>, B<Arm>, B<Intel x86>, B<PPC>, and B<SPARC
version 8> processor systems, on most operating systems.  Currently only 32-bit
INTVALs are supported.

The initial step in generating native code is to invoke B<Parrot_jit_begin>,
which generally provides architecture specific preamble code.  For each parrot
opcode in the bytecode, either a generic or opcode specific sequence of native
code is generated.  The F<.jit> files provide functions that generate native
code for specific opcode functions, for a given instruction set architecture.
If a function is not provided for a specific opcode, a generic sequence of
native code is output which calls the interpreter C function that implements
the opcode.  Such opcodes are handled by B<Parrot_jit_normal_op>.

If the opcode can cause a control flow change, as in the case of a branch or
call opcode, an extended or modified version of this generic code is used that
tracks changes in the bytecode program counter with changes in the hardware
program counter.  This type of opcode is handled by B<Parrot_jit_cpcf_op>.

While generating native code, certain offsets and absolute addresses may not be
available.  This occurs with forward opcode branches, as the native code
corresponding to the branch target has not yet been generated.  On some
platforms, function calls are performed using program-counter relative
addresses.  Since the location of the buffer holding the native code may move
as code is generated (due to growing of the buffer), these relative addresses
may only be calculated once the buffer is guaranteed to no longer move.  To
handle these instances, the JIT subsystem uses "fixups", which record locations
in native code where adjustments to the native code are required.
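
The fixup idea can be illustrated with a minimal sketch in C. None of the names
below (C<fixup_t>, C<jit_sketch_t>, C<emit_forward_branch>, C<apply_fixups>) come
from the Parrot sources; they are hypothetical. The sketch assumes a fixed-size
buffer and a 4-byte branch displacement measured from the end of the
displacement field, as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixup record: where to patch, and which opcode is the target. */
typedef struct {
    size_t native_offset;  /* offset of the 4-byte displacement in the buffer */
    size_t target_opcode;  /* index of the bytecode opcode being branched to */
} fixup_t;

#define MAX_FIXUPS 16

typedef struct {
    uint8_t code[256];          /* native code buffer (fixed size in this sketch) */
    size_t  used;               /* bytes emitted so far */
    size_t  op_map[32];         /* opcode index -> native offset */
    fixup_t fixups[MAX_FIXUPS];
    size_t  n_fixups;
} jit_sketch_t;

/* Emit a placeholder displacement and remember that it must be patched. */
static void emit_forward_branch(jit_sketch_t *j, size_t target_opcode)
{
    assert(j->n_fixups < MAX_FIXUPS && j->used + 4 <= sizeof j->code);
    j->fixups[j->n_fixups].native_offset = j->used;
    j->fixups[j->n_fixups].target_opcode = target_opcode;
    j->n_fixups++;
    memset(j->code + j->used, 0, 4);
    j->used += 4;
}

/* Once all code is generated, fill in every recorded displacement. */
static void apply_fixups(jit_sketch_t *j)
{
    for (size_t i = 0; i < j->n_fixups; i++) {
        size_t  at   = j->fixups[i].native_offset;
        int32_t disp = (int32_t)j->op_map[j->fixups[i].target_opcode]
                     - (int32_t)(at + 4);
        memcpy(j->code + at, &disp, sizeof disp);
    }
}
```

The placeholder is written while the target offset is still unknown, and
C<apply_fixups> runs only after C<op_map> has been completely filled in.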

=head1 FILES

=over 4

=item jit/${jitcpuarch}/jit_emit.h

This file defines B<Parrot_jit_begin>, B<Parrot_jit_dofixup>,
B<Parrot_jit_normal_op>, B<Parrot_jit_cpcf_op>, B<Parrot_jit_restart_op> and
optionally B<Parrot_jit_vtable*_op>.  In addition, this file defines the macros
and static functions used in F<.jit> files to produce binary representations of
native instructions.

For moving registers from processor to parrot and vice versa, the
B<Parrot_jit_emit_mov*> functions have to be implemented.

=item jit/${jitcpuarch}/core.jit

The functions to generate native code for core parrot opcodes are specified
here. To simplify the maintenance of these functions, they are specified in a
format that is pre-processed by F<jit2c.pl> to produce a valid C source file,
F<jit_cpu.c>. See L<Format of .jit Files> below.

=item src/jit.h

This file contains definitions of generic structures used by the JIT subsystem.

The B<op_jit> array of B<jit_fn_info_t> structures provides, for each opcode, a
pointer to the function that generates native code for the opcode, whether the
generic B<Parrot_jit_normal_op> or B<Parrot_jit_cpcf_op> functions or an opcode
specific function. B<Parrot_jit_restart_op> is like B<Parrot_jit_cpcf_op> with
an additional check for a zero program counter. The B<Parrot_jit_vtable*_op>
functions are defined as B<Parrot_jit_normal_op> or B<Parrot_jit_cpcf_op> and
may be implemented to do native vtable calls (see F<jit/i386/jit_emit.h> for an
example).

The B<Parrot_jit_fixup> structure records the offset in native code where a
fixup must be applied, the type of fixup required and the parameters needed to
perform the fixup.  Currently, a fixup parameter is either an B<opcode_t> value
or a function pointer.

The B<Parrot_jit_info> structure holds data used while producing and executing
native code.  An important piece of data in this structure is the B<op_map>
array, which maps from opcode addresses to native code addresses.

=item src/jit.c

B<parrot_build_asm>() is the main routine of the code generator, which loops
over the parrot bytecode, calling the code generating routines for each opcode
while filling in the B<op_map> array.  This array is used by the JIT subsystem
to perform certain types of fixups on native code, as well as by the native
code itself to convert bytecode program counter values (opcode_t *'s) to
hardware program counter values.

The bytecode is considered an array of B<opcode_t> sized elements, with
parallel entries in B<op_map>.  B<op_map> is initially populated with the
offsets into the native code corresponding to the opcodes in the bytecode. Once
code generation is complete and fixups have been applied, the native code
offsets are converted to absolute addresses.  This trades the low up-front cost
of converting all offsets once, for the unknown cost of repeatedly converting
these offsets while executing native code.

See F<src/jit/skeleton/jit_emit.h> for details.

=item tools/build/jit2c.pl

Preprocesses the .jit files to produce F<jit_cpu.c>.

=back
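
The conversion of B<op_map> entries from offsets to absolute addresses,
described under F<src/jit.c> above, might look roughly like the following
sketch. The type and function names are hypothetical, C<long> stands in for
C<opcode_t>, and the union is used only to make the two interpretations of an
entry explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op_map entry: an offset during code generation,
 * an absolute native address afterwards. */
typedef union {
    size_t         offset;
    unsigned char *ptr;
} op_map_entry_t;

/* Convert every entry once, after the native buffer can no longer move. */
static void op_map_to_absolute(op_map_entry_t *op_map, size_t n_ops,
                               unsigned char *native_base)
{
    assert(op_map != NULL && native_base != NULL);
    for (size_t i = 0; i < n_ops; i++)
        op_map[i].ptr = native_base + op_map[i].offset;
}

/* Translate a bytecode program counter into a native address
 * with a single table lookup. */
static unsigned char *native_pc(const op_map_entry_t *op_map,
                                const long *code_start, const long *pc)
{
    return op_map[pc - code_start].ptr;
}
```

After the one-time conversion, a branch executed in native code needs only the
lookup in C<native_pc> and no further arithmetic on the buffer base.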

=head1 Defines in jit_emit.h

The architecture specific F<jit_emit.h> file communicates some defines and
tables with F<jit.c> and F<languages/imcc/imc.c>. The structure of the file and
the defines must therefore follow a specific syntax.

=head2 Overall structure

    #if JIT_EMIT

    ... emit code

    #else

    ... defines
    static const jit_arch_info arch_info = {
       ... initialization of maps
       ... and possibly private static functions
    }

    #endif

See F<src/jit/skeleton/jit_emit.h> for a more detailed explanation.

=head2 Defines

XXX most are moved into C<jit_arch_info> now.

=over 4

=item INT_REGISTERS_TO_MAP

This is the number of integer registers to be mapped to processor registers.
The corresponding B<intval_map[]> has to contain exactly this many register
numbers. A register with the value of zero cannot be in the list.

=item FLOAT_REGISTERS_TO_MAP

When this is defined, it works like above for floating point registers.

=item PRESERVED_INT_REGS

When this is defined, it is the number of integer registers that are preserved
over function calls. These preserved registers have to be first in
B<intval_map>. When this is not defined, it is assumed that B<all> registers
are preserved over function calls.

=item PRESERVED_FLOAT_REGS

Same for floating point registers.

=item jit_emit_noop(pc)

=item JUMP_ALIGN

If these are defined, B<JUMP_ALIGN> should be a small number such that the
desired alignment of jump targets is B<< 1 << JUMP_ALIGN >>.  B<jit_emit_noop>
gets called with the unaligned B<pc> repeatedly, until the B<pc> has the
desired alignment. So the function can either emit a one byte B<noop>
instruction, or a B<noop> like instruction (sequence) with the desired size, to
achieve the necessary padding.  The emitted code must not have any side
effects.

=item ALLOCATE_REGISTERS_PER_SECTION

Normally F<jit.c> does register allocation per section, but there is a somewhat
experimental feature to allocate registers per basic block.

=item MAP

Jit code generated by the F<imcc> JIT optimizer used negative numbers for
mapped registers and positive numbers for non-mapped parrot registers. To use
this feature, the definition of mapped registers can be redefined like so:

    #define MAP(i) OMAP(i)
    #undef MAP
    #define MAP(i) ((i) >= 0 ? 0 : OMAP(i))

=item Parrot_jit_emit_get_base_reg_no(pc)

This macro should return the register number of the register
base pointer. 

=back
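
The padding behaviour described for B<JUMP_ALIGN> and B<jit_emit_noop> can be
sketched as follows. This is an illustration only: C<emit_noop> and
C<align_jump_target> are hypothetical names, the value 4 for B<JUMP_ALIGN> is
an arbitrary choice, and alignment is computed on the offset from the buffer
start rather than on the absolute address so that the result is easy to check.

```c
#include <assert.h>
#include <stdint.h>

#define JUMP_ALIGN 4            /* hypothetical: align jump targets to 1 << 4 = 16 */
#define X86_NOP    0x90         /* one-byte x86 no-op */

/* Emit a single one-byte no-op and return the advanced pc. */
static uint8_t *emit_noop(uint8_t *pc)
{
    *pc++ = X86_NOP;
    return pc;
}

/* Pad with no-ops until the offset from base is a multiple of 1 << JUMP_ALIGN. */
static uint8_t *align_jump_target(uint8_t *base, uint8_t *pc)
{
    const uintptr_t mask = ((uintptr_t)1 << JUMP_ALIGN) - 1;
    assert(pc >= base);
    while (((uintptr_t)(pc - base)) & mask)
        pc = emit_noop(pc);
    return pc;
}
```

A multi-byte no-op sequence would reduce the number of emitted instructions,
at the cost of a slightly more involved C<emit_noop>.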

See F<src/jit/i386/jit_emit.h> for actual usage of these defines.

=head1 Format of .jit Files

Jit files are interpreted as follows:

=over 4

=item I<op-name> { \n I<body> \n }

Where I<op-name> is the name of the Parrot opcode, and I<body> consists of C
syntax code which may contain any of the identifiers listed in the following
section.

The closing curly brace has to be in the first column.

=item Comment lines

Comments are marked with a I<;> in the first column. These and empty lines are
ignored.

=item Identifiers

In general, prefixing an identifier with I<&> yields an address.  The I<*>
prefix specifies a value.  Since Parrot register values vary during code
execution, their values cannot be obtained through identifier substitution
alone; therefore offsets are used for accessing registers.

To obtain register offsets, a set of macros exists that have C<OFFS> in
their names:

B<REG_OFFS_INT(reg_no)> ...

B<ROFFS_INT(n)> ...

B<INT_CONST[n]>

Gets replaced by the C<INTVAL> constant specified in the I<n>th argument.

B<NUM_CONST[n]>

Gets replaced by the C<FLOATVAL> constant specified in the I<n>th argument.

B<MAP[n]>

The I<n>th integer or floating processor register, mapped in this section.

Note: The register with the physical number zero cannot be mapped.

=begin unimp

B<STRING_CONST_strstart[n]>

Gets replaced by C<strstart> of the C<STRING> constant specified in the I<n>th
argument.

B<STRING_CONST_buflen[n]>

Gets replaced by C<buflen> of the C<STRING> constant specified in the I<n>th
argument.

B<STRING_CONST_flags[n]>

Gets replaced by C<flags> of the C<STRING> constant specified in the I<n>th
argument.

B<STRING_CONST_strlen[n]>

Gets replaced by C<strlen> of the C<STRING> constant specified in the I<n>th
argument.

B<STRING_CONST_encoding[n]>

Gets replaced by C<encoding> of the C<STRING> constant specified in the I<n>th
argument.

B<STRING_CONST_type[n]>

Gets replaced by C<type> of the C<STRING> constant specified in the I<n>th
argument.

B<STRING_CONST_language[n]>

Gets replaced by C<language> of the C<STRING> constant specified in the I<n>th
argument.

=end unimp

B<NATIVECODE>

Gets replaced by the current native program counter.

B<*CUR_OPCODE[n]>

Gets replaced by the address of the current opcode in the Parrot bytecode.

B<ISRn> B<FSRn>

The I<n>th integer or floating point scratch register.


=item B<TEMPLATE> I<template-name> { \n I<body> \n }

Defines a template for similar functions, e.g. all the binary ops taking three
variable parameters.

=item I<template-name> I<perl-subst> ...

Take a template and do all substitutions to generate the implementation for
this jit function.

Example:

    TEMPLATE Parrot_set_x_ic {
    if (MAP[1]) {
        jit_emit_mov_ri<_N>(NATIVECODE, MAP[1], <typ>_CONST[2]);
    }
    else {
        jit_emit_mov_mi<_N>(NATIVECODE, &INT_REG[1], <typ>_CONST[2]);
    }
    }

    Parrot_set_i_ic {
    Parrot_set_x_ic s/<_N>/_i/ s/<typ>/*INT/
    }

    Parrot_set_n_ic {
    Parrot_set_x_ic s/<_N>/_ni/ s/<typ>/&INT/ s/INT_R/NUM_R/
    }

The jit function B<Parrot_set_i_ic> is based on the template
B<Parrot_set_x_ic>, the I<s/x/y/> are substitutions on the template body, to
generate the actual function body. These substitutions are done before the
other substitutions.

See F<jit/i386/core.jit> for more.

=back
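
The effect of the template substitutions can be approximated by a plain string
replacement. The sketch below is not how F<jit2c.pl> works internally (that
tool is written in Perl and applies Perl substitutions); C<subst_all> is a
hypothetical helper that handles literal strings only and replaces every
occurrence, which the B<Parrot_set_x_ic> example above appears to require.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every occurrence of `from` in `src` with `to`, writing the
 * result to `dst`. Returns 0 on success, -1 if `dst` is too small. */
static int subst_all(const char *src, const char *from, const char *to,
                     char *dst, size_t dst_size)
{
    size_t from_len = strlen(from), to_len = strlen(to), out = 0;
    assert(from_len > 0);
    if (dst_size == 0)
        return -1;
    while (*src) {
        if (strncmp(src, from, from_len) == 0) {
            if (out + to_len >= dst_size)
                return -1;
            memcpy(dst + out, to, to_len);
            out += to_len;
            src += from_len;
        } else {
            if (out + 1 >= dst_size)
                return -1;
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return 0;
}
```

Applying C<< <_N> >> to C<_i> and then C<< <typ> >> to C<*INT> on one template
line yields the corresponding line of B<Parrot_set_i_ic>.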

=head2 Naming convention for jit_emit functions

To make it easier to share F<core.jit> files between machines of similar
architecture, the jit_emit functions B<should> follow this syntax:

jit_emit_I<<op>>_I<<args>>_I<<type>>

=over 4

=item I<<op>>

This is the operation like B<mov>, B<add> or B<bxor>. In normal cases this is
the PASM name of the op.

=item I<<args>>

B<args> specify the arguments of the function in the PASM sequence B<dest>,
B<source> ... The B<args> consist of one letter per argument:

=over 4

=item B<r>

A mapped processor register.

=item B<m>

A memory operand, the address of the parrot register.

=item B<i>

An immediate operand, i.e. an integer constant.

=back

=item I<<type>>

Specifies if this operation works on integer or floating point arguments. If
all arguments are of the same type, only one type specifier is needed.

=over 4

=item B<i>

An integer argument.

=item B<n>

A float argument.

=back

Examples:

=over 4

=item B<jit_emit_sub_rm_i>

Subtract integer at memory from integer processor register.

=item B<jit_emit_mov_ri_ni>

Move integer constant (immediate) to floating point register.

=back

=back
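
To make the naming convention concrete, here is a small sketch of two emitters
for 32-bit x86. The function names follow the convention above, but the
signatures (a C<pc> passed in and the advanced C<pc> returned) and the
C<REG_*> constants are simplifying assumptions, not the actual Parrot macros.

```c
#include <assert.h>
#include <stdint.h>

/* x86 hardware register numbers (assumed for this sketch). */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3,
       REG_ESI = 6, REG_EDI = 7 };

/* op = mov, args = r (register), i (immediate), type = i (integer).
 * Encodes "mov $imm32, %reg" as 0xB8+reg followed by the little-endian
 * immediate, and returns the advanced pc. */
static uint8_t *jit_emit_mov_ri_i(uint8_t *pc, int reg, int32_t imm)
{
    uint32_t u = (uint32_t)imm;
    assert(reg >= 0 && reg <= 7);
    *pc++ = (uint8_t)(0xB8 + reg);
    for (int i = 0; i < 4; i++)
        *pc++ = (uint8_t)(u >> (8 * i));
    return pc;
}

/* op = mov, args = r, r, type = i: "mov %src, %dest" as 0x89 /r. */
static uint8_t *jit_emit_mov_rr_i(uint8_t *pc, int dest, int src)
{
    assert(dest >= 0 && dest <= 7 && src >= 0 && src <= 7);
    *pc++ = 0x89;
    *pc++ = (uint8_t)(0xC0 | (src << 3) | dest);
    return pc;
}
```

Both emitters take their arguments in the PASM order I<dest>, I<source>,
regardless of the operand order the assembler syntax of the platform uses.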

=head1 ALPHA Notes

Access to Parrot registers is done relative to C<$6>; all other memory
access is done relative to C<$27>. To access float constants relative to C<$7>,
you must precede the instruction with I<ldah $7,0($27)>.

=head1 i386 Notes

Only 32 bit INTVALs are supported. Long double FLOATVALs are ok.

There are four mapped integer registers B<%edi>, B<%esi>, B<%ecx>, and B<%edx>.
The first two of these are callee saved, so they preserve their value around
external function calls.

For floating point operations the registers B<ST1> ... B<ST4> are mapped and
considered as preserved over function calls.

The register C<%ebx> holds the register frame pointer.
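
The integer register mapping described in these notes could be expressed with
the B<INT_REGISTERS_TO_MAP>, B<PRESERVED_INT_REGS> and B<intval_map> defines
roughly as below. The C<R_*> constants use x86 hardware numbering and the two
helper functions are hypothetical; the actual definitions in the i386
F<jit_emit.h> may differ.

```c
#include <assert.h>

/* Hypothetical register identifiers using x86 hardware numbering. */
enum { R_ECX = 1, R_EDX = 2, R_ESI = 6, R_EDI = 7 };

#define INT_REGISTERS_TO_MAP 4
#define PRESERVED_INT_REGS   2

/* Preserved (callee-saved) registers must come first. */
static const char intval_map[INT_REGISTERS_TO_MAP] =
    { R_EDI, R_ESI, R_ECX, R_EDX };

/* Does mapped slot n keep its value across an external C function call? */
static int slot_is_preserved(int n)
{
    assert(n >= 0 && n < INT_REGISTERS_TO_MAP);
    return n < PRESERVED_INT_REGS;
}

/* Count how many of the first `used` mapped slots an external call
 * may clobber under the C calling convention. */
static int slots_clobbered_by_call(int used)
{
    int count = 0;
    for (int n = 0; n < used; n++)
        if (!slot_is_preserved(n))
            count++;
    return count;
}
```

Placing the preserved registers first lets a single comparison against
B<PRESERVED_INT_REGS> decide whether a slot survives a call.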

=head1 EXAMPLE

Let's see how this works:

B<Parrot Assembly:>

 set I0,8
 set I2,I0
 print I2
 end

B<Parrot Bytecode:> (only the bytecode segment is shown)

 +--------------------------------------+
 | 73 | 0 | 8 | 72 | 2 | 0 | 21 | 2 | 0 |
 +-|------------|------------|--------|-+
   |            |            |        |
   |            |            |        +----------- end (no arguments)
   |            |            +-------------------- print_i (1 argument)
   |            +--------------------------------- set_i_i (2 arguments)
   +---------------------------------------------- set_i_ic (2 arguments)

Please note that the opcode numbers used might have already changed.  Also
generated assembly code might be different.
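
The layout of the bytecode segment shown above can be checked with a short
walk over the array. The opcode numbers and argument counts are taken from the
diagram; the table, the type and the function names are hypothetical and exist
only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Opcode numbers and argument counts as shown in the diagram above;
 * real opcode numbers may differ between Parrot versions. */
typedef struct {
    long        number;
    const char *name;
    int         n_args;
} op_info_t;

static const op_info_t ops[] = {
    {  0, "end",      0 },
    { 21, "print_i",  1 },
    { 72, "set_i_i",  2 },
    { 73, "set_i_ic", 2 },
};

/* Find the table entry for an opcode number, or NULL if it is unknown. */
static const op_info_t *lookup_op(long number)
{
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (ops[i].number == number)
            return &ops[i];
    return NULL;
}

/* Walk the bytecode, storing the index at which each opcode starts.
 * Returns the number of opcodes found, or -1 on an unknown opcode
 * or when `starts` is too small. */
static int find_op_starts(const long *code, size_t len,
                          size_t *starts, size_t max_starts)
{
    size_t pc = 0;
    int    n  = 0;
    assert(code != NULL && starts != NULL);
    while (pc < len) {
        const op_info_t *op = lookup_op(code[pc]);
        if (op == NULL || (size_t)n >= max_starts)
            return -1;
        starts[n++] = pc;
        pc += 1 + (size_t)op->n_args;
    }
    return n;
}
```

The start indices found this way are exactly the positions for which the JIT
needs B<op_map> entries, since only opcode starts can be branch targets.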

B<Intel x86 assembly version of the Parrot ops:>

B<Parrot_jit_begin>

    0x817ddd0 <jit_func>:   push   %ebp
    0x817ddd1 <jit_func+1>: mov    %esp,%ebp
    0x817ddd3 <jit_func+3>: push   %ebx
    0x817ddd4 <jit_func+4>: push   %esi
    0x817ddd5 <jit_func+5>: push   %edi

  normal function header till here, now push interpreter

    0x817ddd6 <jit_func+6>: push   $0x8164420

  get jit function table to %ebp and
  jump to first instruction

    0x817dddb <jit_func+11>:    mov    0xc(%ebp),%eax
    0x817ddde <jit_func+14>:    mov    $0x81773f0,%ebp
    0x817dde3 <jit_func+19>:    sub    $0x81774a8,%eax
    0x817dde9 <jit_func+25>:    jmp    *%ds:0x0(%ebp,%eax,1)

B<set_i_ic>

    0x817ddee <jit_func+30>:    mov    $0x8,%edi

B<set_i_i>

    0x817ddf3 <jit_func+35>:    mov    %edi,%ebx

B<Parrot_jit_save_registers>

    0x817ddf5 <jit_func+37>:    mov    %edi,0x8164420
    0x817ddfb <jit_func+43>:    mov    %ebx,0x8164428

B<Parrot_jit_normal_op>

    0x817de01 <jit_func+49>:    push   $0x81774c0
    0x817de06 <jit_func+54>:    call   0x804be00 <Parrot_print_i>
    0x817de0b <jit_func+59>:    add    $0x4,%esp

B<Parrot_jit_end>

    0x817de0e <jit_func+62>:    add    $0x4,%esp
    0x817de14 <jit_func+68>:    pop    %edi
    0x817de16 <jit_func+70>:    pop    %ebx
    0x817de18 <jit_func+72>:    pop    %esi
    0x817de1a <jit_func+74>:    pop    %ebp
    0x817de1c <jit_func+76>:    ret

Please note the reverse argument direction. PASM and JIT notations use
I<dest,src,src>, while F<gdb> and the internal macros in F<jit_emit.h> have
I<src,dest>.

=head1 Debugging

The above listing was generated by F<gdb>, the GNU debugger, with a little help
from Parrot_jit_debug, which generates a symbol file in I<stabs> format; see
B<info stabs> for more (or less :-()

The following script calls F<ddd> (the graphical debugger frontend) and attaches
the symbol file after it has been built in F<parrot_build_asm>.

    # dddp
    # run ddd parrot with given file
    # gdb confirmations should be off
    parrot -o $1.pbc -d1 $1.pasm
    echo "b runops_jit
    r -D4 -R jit $1.pbc
    n
    add-symbol-file $1.o 0
    s
    " > .ddd

    ddd --command .ddd parrot &

Run this with e.g. I<dddp t/op/jit_2>, then turn on the register status,
I<step> or I<nexti> through the source, or set break points as with any other
language.

You can examine parrot registers via the debugger or even set them, and you can
always step into external opcodes and look at I<*interpreter>.

The tests F<t/op/jit*.t> have some test cases for testing register allocation.
These tests are written for a mapping of 4 processor registers. If your
processor architecture has more mapped registers, reduce them to 4 and run
these tests.

=head2 Example for a debug session

  $ cat j.pasm
        set I0, 10
        set N1, 1.1
        set S2, "abc"
        print "\n"
        end
  $ dddp j

ddd shows the above source code and assembly (startup code snipped):

    0x815de46 <jit_func+30>:    mov    $0xa,%ebx
    0x815de4b <jit_func+35>:    fldl   0x81584c0
    0x815de51 <jit_func+41>:    fstp   %st(2)
    0x815de53 <jit_func+43>:    mov    %ebx,0x8158098
    0x815de59 <jit_func+49>:    fld    %st(1)
    0x815de5b <jit_func+51>:    fstpl  0x8158120
    0x815de61 <jit_func+57>:    push   $0x815cd90
    0x815de66 <jit_func+62>:    call   0x804db90 <Parrot_set_s_sc>
    0x815de6b <jit_func+67>:    add    $0x4,%esp
    0x815de6e <jit_func+70>:    push   $0x815cd9c
    0x815de73 <jit_func+75>:    call   0x804bcd0 <Parrot_print_sc>
    0x815de78 <jit_func+80>:    add    $0x4,%esp
    0x815de7b <jit_func+83>:    add    $0x4,%esp
    0x815de81 <jit_func+89>:    pop    %edi
    0x815de83 <jit_func+91>:    pop    %ebx
    0x815de85 <jit_func+93>:    pop    %esi
    0x815de87 <jit_func+95>:    pop    %ebp
    0x815de89 <jit_func+97>:    ret
  (gdb) n
  (gdb) n
  (gdb) n
  (gdb) p I0
  $1 = 10
  (gdb) p N1
  $2 = 1.1000000000000001
  (gdb) p *S2
  $3 = {bufstart = 0x815ad30, buflen = 15, flags = 336128, bufused =
  3, strstart = 0x815ad30 "abc"}
  (gdb) p &I0
  $4 = (INTVAL *) 0x8158098

XXX (p)rinting register contents as shown above is currently not supported.

=head1 SEE ALSO

F<docs/dev/jit_i386.pod>, F<jit/skeleton/jit_emit.h>