Sophie

Sophie

distrib > Mandriva > 2010.0 > i586 > media > contrib-release > by-pkgid > 27c66b1329522147a7556322a54c1562 > files > 11

php-amf-0.9.1-11mdv2009.1.i586.rpm


Project: AMF extension for PHP for encoding and decoding ActionScript messages
Author: Emanuele Ruffaldi emanuele.ruffaldi@gmail.com
License: PHP License 3.0 
Revision: 1.3
Last Revision: 13th Feb 2007

This document describes the encoding and decoding details used by AMF for PHP. Please check for ISSUE and TODO for open aspects

TODO: charsets in recordsets

=Encoding and Decoding details=

==Encoding==
This is the mapping from PHP types to AMF types.

Every string is represented in UTF8. When it is not the case conversion is needed. In the implementation we provide
this using AMF_TRANSLATE_CHARSET and AMF_TRANSLATE_CHARSET_NOASCII. That generate a AMFE_TRANSLATE_CHARSET event
both in encoding and decoding.

===AMF0===
This is the encoding by PHP type.

IS_BOOL: AMF0_BOOLEAN
IS_NULL: AMF0_NULL
IS_LONG: AMF0_NUMBER converted into double
IS_DOUBLE: AMF0_NUMBER 
IS_STRING: AMF0_STRING or AMF0_LONGSTRING
IS_OBJECT: AMF0_REFERENCE if cached. Otherwise handled as following:
    # if classname is "stdclass" then AMF0_OBJECT as anonymous object
    # if no callback is specified then AMF0_TYPEDOBJECT with the classname
    # else invoke the callback. The callback should return (value, type, classname). When type is not specified it is interpreted as AMFC_TYPEDOBJECT. When classname is not specified it is the same classname of the original.
        AMFC_XML: encode value as an XML string AMF0_XML.
        AMFC_OBJECT: encode value as a AMF0_OBJECT. Value can be an object or an array
        AMFC_TYPEDOBJECT: encode value as a AMF0_TYPEDOBJECT with specified classname. Value can be an object or an array
        AMFC_ANY: encode value as a mixed type
        AMFC_ARRAY: encode value as array. Value can be an object or an array
        AMFC_NONE: AMF0_UNDEFINED
        AMFC_RAW: use the string as direct encoding 
        AMFC_BYTEARRAY: encoded just as a string
        
        TODO AMFC_RECORDSET
        
    Note: private and protected members of an object are not serialized. They are marked with a key starting with \0
    
IS_RESOURCE: as IS_OBJECT using empty classname. If it is not handled by the callback generates AMF0_UNDEFINED and an error
IS_ARRAY: AMF0_REFERENCE if cached. Otherwise handled as following:
    The keys are analized in one pass for understanding the type of data. For this purpose
    negative indices are considered string keys.
    
    # only string keys => AMF0_OBJECT 
    # string keys and numeric keys => AMF0_MIXEDARRAY
    # only numeric keys contiguous => AMF0_ARRAY
    # only numeric keys and not contiguous => AMF0_MIXEDARRAY 

    Special Case - support for recordsets. A recordset can be optimized with a compact class representation
    A recordset is triggered by an array with a key __amf_recordset__=1
    
    Recordset array("__amf_recordset__" => 1, rows => ... , cols => array("c1","c2",...), id => "")
    
    The key "rows" is an array of rows each with the same number of elements as the "cols" array.
    
    In AMF0 it is represented with a typed object of class RecordSet
    
    typedobject class=RecordSet
        "serverInfo" => object(
            "totalCount" => total rows count
            "initialData" => array of rows. Each row is an array of column values
            "cursor" => 1
            "serviceName" => "PageAbleResult"
            "columnNames" => array(c1,...,cN)
            "version" => 1
            "id" => the same as the id field of the above array, if present otherwise ""
        )
===AMF3===
This is the encoding by PHP type.

IS_BOOL: AMF3_TRUE or AMF3_FALSE
IS_NULL: AMF3_NULL
IS_LONG: AMF3_INT if it can be represented in 29 bits: $d >= -268435456 && $d <= 268435455 otherwise it is a double
IS_DOUBLE: AMF3_NUMBER
IS_STRING: AMF3_STRING with caching
IS_OBJECT: cached reference or following handling:
    
    # if classname is "stdclass" then generate AMF3_OBJECT with the same classname
    # if no callback is specified then AMF0_TYPEDOBJECT with the classname
    # else invoke the callback. The callback should return (value, type, classname). When type is not specified it is interpreted as AMFC_TYPEDOBJECT. When classname is not specified it is the same classname of the original.
    
        AMFC_XML: encode value as an XML string AMF3_XML. ISSUE should be this string cached?
        AMFC_OBJECT: encode value as AMF3_OBJECT with anonymous class. Value can be an object or an array
        AMFC_TYPEDOBJECT: encode value as a AMF3_OBJECT with specified classname. Value can be an object or an array
        AMFC_ANY: encode value as a mixed type
        AMFC_ARRAY: encode value as array. Value can be an object or an array
        AMFC_NONE: AMF3_UNDEFINED
        AMFC_RAW: use the string as direct encoding 
        AMFC_BYTEARRAY: AMF3_BYTEARRAY
        
        TODO special AMFC_RECORDSET
           
    Note: private and protected members of an object are not serialized. They are marked with a key starting with \0
    
IS_RESOURCE: as IS_OBJECT using empty classname. If it is not handled by the callback generates AMF0_UNDEFINED and an error

IS_ARRAY: cached reference or following handling:
    The keys are analized in one pass for understanding the type of data. For this purpose
    negative indices are considered string keys. 
    
    # only string keys or not contiguos numeric keys => AMF3_OBJECT anonymous
    # others => AMF3_ARRAY
    
    ISSUE maybe an array with not contiguos numeric keys could be an AMF3_ARRAY with no numeric values and only keys

    Special Case - support for recordsets. A recordset can be optimized with a compact class representation extremely efficient in AMF3. A recordset is triggered by an array with a key __amf_recordset__=1
    
    Recordset array("__amf_recordset__" => 1, rows => ... , cols => array("c1","c2",...) )
    
    The key "rows" is an array of rows each with the same number of elements as the "cols" array.
    
    This special array is encoded with an anonymous class that has a fixed number of fields as the columns of the recordset. Then
    the data is represented by an array of objects of this anonymous types containing all the data. Such representation is 
    extremely packed and efficient, without the names of the fields The pseudo representation of such recordset is as following.
    
        AMF3_ARRAY with length rowcount 
            AMF0_OBJECT with inline class definition (name="", static object, members = "c1","c2",...) and data "a00","a01","a02",...
            AMF0_OBJECT of last class definition and packed data "a10","a11",...
            AMF0_OBJECT of last class definition and packed data "a20","a21",...
            AMF0_OBJECT of last class definition and packed data "a30","a31",...
            
    TODO if ArrayCollection mode then prepend with object "flex.messaging.io.ArrayCollection" as externalizable class

    
    
ISSUE predefined class members in class definitions are not used, except for the case of Recordset. Eventually it could be optimized by using
PHP class definition


==Decoding==

When a string is going to be received there is a charset encoding process from the Flash UTF8 format to the PHP encoding.
Such encoding option can be activated using the flag AMF_TRANSLATE_CHARSET

===AMF0===

AMF0_UNDEFINED: NULL
AMF0_NULL: NULL
AMF0_BOOL: true or false
AMF0_NUMBER: double
AMF0_STRING: string
AMF0_LONGSTRING: string
AMF0_XML: string  Invoke AMFE_POST_DECODE_XML callback for transforming XML data into object
AMF0_AMF3: AMF3 mode
AMF0_MOVIECLIP, AMF0_UNSUPPORTED,AMF0_RECORDSET: unsupported
AMF0_MIXEDARRAY: array with string and integer keys. A string key that can be a negative integer is treated as an integer key.
AMF0_ARRAY: array with only positive numeric keys
AMF0_OBJECT,AMF0_TYPEDOBJECT: 

    # invoke the callback for mapping the classname to an object or array. Only if the classname is not empty. If the callback returns NULL then continue otherwise use this value
    # otherwise if the AMF_ASSOCIATIVE_DECODE is set the object is decoded as array, with the additional key _explicitType with the classname
    # otherwise if the AMF_ASSOCIATIVE_DECODE is NOT set the decoder tries to instantiate the object with the classname or "stdcase" if the class is not existant or it is an anonymous class
    
    If the callback has been provided and the flags AMF_POST_DECODE is present invoke the post decode callback for transformation of the object or replacement

===AMF3===

AMF3_UNDEFINED: NULL
AMF3_NULL: NULL
AMF3_FALSE: false
AMF3_TRUE: true
AMF3_INTEGER: integer
AMF3_NUMBER: double
AMF3_STRING: string
AMF3_BYTEARRAY: string Invoke AMFE_POST_BYTEARRAY callback for possibly transforming the data
AMF3_XMLSTRING and AMF3_XML: string  Invoke AMFE_POST_DECODE_XML callback for transforming XML data into object
AMF3_DATE: double TODO invoke postdecode callback?
AMF3_ARRAY: array with strink and numeric keys. Converts strings of negative keys into numbers
AMF3_OBJECT: 

    If the class is not marked as externalizable
    
    # invoke the callback for mapping the classname to an object or array using AMFE_MAP, only if the classname is not empty
    # otherwise if the AMF_ASSOCIATIVE_DECODE is set the object is decoded as array, with the additional key _explicitType with the classname
    # otherwise if the AMF_ASSOCIATIVE_DECODE is not set the decoder tries to instantiate the object with the classname or "stdcase" if the class is not existant or it is an anonymous class
    
    If the class is marked as externalizable - the callback is invoked with AMFE_MAP_EXTERNALIZABLE. If the callback returns NULL or it is not present then the decoder tries
    to read another value from the decoding stream. If the callback returns not NULL then the value is used for the decoding.
    
    TODO externalizable should receive the input data stream process it some way
    
    If the callback has been provided and the flags AMF_POST_DECODE is present invoke the post decode callback for
    transformation of the object or replacement


=Callbacks=

These are some comments on the typical callbacks in PHP

==Typical Encoding Callback ==

    if(PHP5 && $classname == "domdocument") return ($value->saveXml(), AMFC_XML);
    if(!PHP5 && $classname == "domdocument") return ($value->dump_mem(), AMFC_XML);
    if(!PHP5 && $classname == "simplexmlelement") return ($value->asXML(), AMFC_XML);

==Typical Decoding Callback==

This is the typical decoding callback for PHP.

AMFE_MAP:
    An adapter mapping from Database data should build an array array("__amf_recordset__"=> 1, "rows" => array(r1,...,rN)        
    
AMFE_MAP_EXTERNALIZABLE:
    $classname = $arg;
    if($classname == "flex.messaging.io.ArrayCollection") return NULL; // that is AMFC_ANY
    elsif($classname == "flex.messaging.io.ObjectProxy") return NULL; // that is AMFC_ANY
    else ERROR!
    
AMFE_POST_DECODE:
    if($isObject && method_exists($value, 'init'))
        $obj->init();
    
AMFE_POST_XML:
    parse XML

    
=Optimization=

These are some of the optimization performed by the encoder and decoder.

Caching: during encoding objects are cached using the effective address as a key. If the long type has the same length of a pointer it is possible to use the pointer as the key instead of a string representation of the address

Output: the encoder generates a string. Such string can be short or extremely long. Moreover it could contains the content of other
PHP strings that are possibily long (like for text representation of XML data). In the initial version the PHP smartstr structure was
used but it was extremely inefficient. We have optimized the output generation using a linked list of buffers that are allocated
on demanding. The size of the buffer starts with 64 and grows exponentially to a maximum of 128k. The content of each buffer is made
of a sequence of raw strings or references to zval containing strings. A PHP string is stored as a reference only if its length is bigger
than a certain threshold (128) because the fragmentation of the buffer reduces the performances.

For example

    part - "..." ref "......" ref end
    part - ref "..." ref ref ref "... ... " end 
    
 ISSUE a possibile optimization to be adopted is to avoid the caching of long strings
 
=Recordset=

Apart the special handling of __amf_recordset__ it could be interesting to generate the data directly from the database with
reduced memory usage
       
AMFC_RECORDSET specifies a recordset. The value is an array that describes the recordset. The recordset can be fully specified or by parts. 
    
In fully specified we need to provide all the data at once

value = array( "rows" => array(r1, ..., rN), "cols" => array(c1, ... cN) )  

TODO In callback mode the data is provided by callback in multiple iterations. Initially it just returns the structure and on following
invocations it should return some rows

Initial: value = array( "rows" => rowcount, "cols" => array(c1, ... cN), ... )
Following: value = array( "rows" => array(r1,...,rN)) 
Last: when the rowcount is reached or the callback returns NULL the iteration is terminated.