Sophie

Sophie

distrib > Mandriva > 2010.0 > i586 > media > contrib-release > by-pkgid > 6acad63121842f76ef0a00597605dae2 > files > 8

libsyck0-0.55-7mdv2010.0.i586.rpm

$Id: README.EXT,v 1.5 2005/05/19 04:51:31 why Exp $

This is the documentation for libsyck and describes how to extend it.

= Overview =

Syck is designed to take a YAML stream and a symbol table and move
data between the two.  Your job is to simply provide callback functions which
understand the symbol table you are keeping.

Syck also includes a simple symbol table implementation.

== About the Source ==

The Syck distribution is laid out as follows:

  lib/             libsyck source (core API)
    bytecode.re    lexer for YAML bytecode (re2c)
    emitter.c      emitter functions
    gram.y         grammar for YAML documents (bison)
    handler.c      internal handlers which glue the lexer and grammar
    implicit.re    lexer for builtin YAML types (re2c)
    node.c         node allocation and access
    syck.c         parser funcs, central funcs
    syck.h         libsyck definitions
    syck_st.c      symbol table functions
    syck_st.h      symbol table definitions
    token.re       lexer for YAML plaintext (re2c)
    yaml2byte.c    simple bytecode emitter
  ext/             ruby, python, php, cocoa extensions
  tests/           unit tests for libsyck
    YTS.c.rb       generates YAML Testing Suite unit test
                   (use: ruby YTS.c.rb > YTS.c)
    Basic.c        allocation and buffering tests
    Parse.c        parser sanity
    Emit.c         emitter sanity

== Using SyckNodes ==

The SyckNode is the structure which YAML data is loaded into
while parsing.  It's also a good structure to use while emitting,
however you may choose to emit directly from your native types
if your extension is very small.

SyckNodes are designed to be used in conjunction with a symbol
table.  More on that in a moment.  For now, think of a symbol
table as a library which stores nodes, assigning each node a
unique identifier.

This identifier is called the SYMID in Syck.  Nodes refer to
each other by SYMIDs, rather than pointers.  This way, the
nodes can be free'd as the parser goes.

To be honest, SYMIDs are used because this is the way Ruby
works.  And this technique means Syck can use Ruby's symbol
table directly.  But the included symbol table is lightweight,
solves the problem of keeping too much data in memory, and
simply pairs SYMIDs with your native object type (such as
PyObject pointers.)

Three kinds of SyckNodes are available:

1. scalar nodes (syck_str_kind):
   These nodes store a string, a length for the string
   and a style (indicating the format used in the YAML
   document).

2. sequence nodes (syck_seq_kind):
   Sequences are YAML's array or list type.
   These nodes store a list of items, which allocation
   is handled by syck functions.

3. mapping nodes (syck_map_kind):
   Mappings are YAML's dictionary or hashtable type.
   These nodes store a list of pairs, which allocation
   is handled by syck functions.

The syck_kind_tag enum specifies the above enumerations,
which can be tested against the SyckNode.kind field.

PLEASE leave the SyckNode.shortcut field alone!!  It's
used by the parser to workaround parser ambiguities!!

=== Node API ===

  SyckNode *
  syck_alloc_str()
  syck_alloc_seq()
  syck_alloc_str()

    Allocates a node of a given type and initializes its 
    internal union to emptiness.  When left as-is, these
    nodes operate as a valid empty string, empty sequence
    and empty map.

    Remember that the node's id (SYMID) isn't set by the
    allocation functions OR any other node functions herein.
    It's up to your handler function to do that.

  void
  syck_free_node( SyckNode *n )

    While the Syck parser will free nodes it creates, use
    this to free your own nodes.  This function will free
    all of its internals, its type_id and its anchor.  If
    you don't need those members free, please be sure they
    are set to NULL.

  SyckNode *
  syck_new_str( char *str, enum scalar_style style )
  syck_new_str2( char *str, long len, enum scalar_style style )

    Creates scalar nodes from C strings.  The first function
    will call strlen() to determine length.

  void
  syck_replace_str( SyckNode *n, char *str, enum scalar_style style )
  syck_replace_str2( SyckNode *n, char *str, long len, enum scalar_style style )

    Replaces the string content of a node `n', while keeping
    the node's type_id, anchor and id.

  char *
  syck_str_read( SyckNode *n )

    Returns a pointer to the null-terminated string inside scalar node
    `n'.  Normally, you might just want to use:

      char *ptr = n->data.str->ptr
      long len = n->data.str->len

  SyckNode *
  syck_new_map( SYMID key, SYMID value )

    Allocates a new map with an initial pair of nodes.

  void
  syck_map_empty( SyckNode *n )

    Empties the set of pairs for a mapping node.

  void
  syck_map_add( SyckNode *n, SYMID key, SYMID value )

    Pushes a key-value pair on the mapping.  While the ordering
    of pairs DOES affect the ordering of pairs on output, loaded
    nodes are deliberately out of order (since YAML mappings do
    not preserve ordering.)

    See YAML's builtin !omap type for ordering in mapping nodes.

  SYMID
  syck_map_read( SyckNode *n, enum map_part, long index )

    Loads a specific key or value from position `index' within
    a mapping node.  Great for iteration:

      for ( i = 0; i < syck_map_count( n ); i++ ) {
        SYMID key = sym_map_read( n, map_key, i );
        SYMID val = sym_map_read( n, map_value, i );
      }

  void
  syck_map_assign( SyckNode *n, enum map_part, long index, SYMID id )

    Replaces a specific key or value at position `index' within
    a mapping node.  Useful for replacement only, will not allocate
    more room when assigned beyond the end of the pair list.

  long
  syck_map_count( SyckNode *n )

    Returns a count of the pairs contained by the mapping node.

  void
  syck_map_update( SyckNode *n, SyckNode *n2 )

    Combines all pairs from mapping node `n2' into mapping node
    `n'.

  SyckNode *
  syck_new_seq( SYMID val )

    Allocates a new seq with an entry `val'.

  void
  syck_seq_empty( SyckNode *n )

    Empties a sequence node `n'.

  void
  syck_seq_add( SyckNode *n, SYMID val )

    Pushes a new item `val' onto the end of the sequence.

  void
  syck_seq_assign( SyckNode *n, long index, SYMID val )

    Replaces the item at position `index' in the sequence
    node with item `val'.  Useful for replacement only, will not allocate
    more room when assigned beyond the end of the pair list.

  SYMID
  syck_seq_read( SyckNode *n, long index )

    Reads the item at position `index' in the sequence node.
    Again, for iteration:

      for ( i = 0; i < syck_seq_count( n ); i++ ) {
        SYMID val = sym_seq_read( n, i );
      }

  long
  syck_seq_count( SyckNode *n )

    Returns a count of items contained by sequence node `n'.

== YAML Parser ==

Syck's YAML parser is extremely simple.  After setting up a
SyckParser struct, along with callback functions for loading
node data, use syck_parse() to start reading data.  Since
syck_parse() only reads single documents, the stream can be
managed by calling syck_parse() repeatedly for an IO source.

The parser has four callbacks: one for reading from the IO
source, one for handling errors that show up, one for
handling nodes as they come in, one for handling bad
anchors in the document.  Nodes are loaded in the order they
appear in the YAML document, however nested nodes are loaded 
before their parent.

=== How to Write a Node Handler ===

Inside the node handler, the normal process should be:

1. Convert the SyckNode data to a structure meaningful
   to your application.

2. Check for the bad anchor caveat described in the
   next section.

3. Add the new structure to the symbol table attached
   to the parser.  Found at parser->syms.

4. Return the SYMID reserved in the symbol table.

=== Nodes and Memory Allocation ===

One thing about SyckNodes passed into your handler:
Syck WILL free the node once your handler is done with it.
The node is temporary.  So, if you plan on keeping a node
around, you'll need to make yourself a new copy.

And you'll probably need to reassign all the items
in a sequence and pairs in a map.  You can do this
with syck_seq_assign() and syck_map_assign().  But, before
you do that, you might consider using your own node structure
that fits your application better.

=== A Note About Anchors in Parsing ===

YAML anchors can be recursive.  This means deeper alias nodes
can be loaded before the anchor.  This is the trickiest part
of the loading process.

Assuming this YAML document:

  --- &a [*a]

The loading process is:

1. Load alias *a by calling parser->bad_anchor_handler, which
   reserves a SYMID in the symbol table.

2. The `a' anchor is added to Syck's own anchor table,
   referencing the SYMID above.

3. When the anchor &a is found, the SyckNode created is
   given the SYMID of the bad anchor node above.  (Usually
   nodes created at this stage have the `id' blank.)

4. The parser->handler function is called with that node.
   Check for node->id in the handler and overwrite the
   bad anchor node with the new node.

=== Parser API ===

 See <syck.h> for layouts of SyckParser and SyckNode.

 SyckParser * 
 syck_new_parser()

  Creates a new Syck parser.

 void
 syck_free_parser( SyckParser *p )

  Frees the parser, as well as associated symbol tables
  and buffers.

 void
 syck_parser_implicit_typing( SyckParser *p, int on )

  Toggles implicit typing of builtin YAML types.  If
  this is passed a zero, YAML builtin types will be
  ignored (!int, !float, etc.)  The default is 1.

 void
 syck_parser_taguri_expansion( SyckParser *p, int on )

  Toggles expansion of types in full taguri.  This
  defaults to 1 and is recommended to stay as 1.
  Turning this off removes a layer of abstraction
  that will cause incompatibilities between YAML
  documents of differing versions.

 void 
 syck_parser_handler( SyckParser *p, SyckNodeHandler h )

  Assign a callback function as a node handler.  The
  SyckNodeHandler signature looks like this:

    SYMID node_handler( SyckParser *p, SyckNode *n )

 void
 syck_parser_error_handler( SyckParser *p, SyckErrorHandler h )

  Assign a callback function as an error handler.  The
  SyckErrorHandler signature looks like this:

   void error_handler( SyckParser *p, char *str )

 void
 syck_parser_bad_anchor_handler( SyckParser *p, SyckBadAnchorHandler h ) 

  Assign a callback function as a bad anchor handler.
  The SyckBadAnchorHandler signature looks like this:

   SyckNode *bad_anchor_handler( SyckParser *p, char *anchor )

 void
 syck_parser_file( SyckParser *p, FILE *f, SyckIoFileRead r )

   Assigns a FILE pointer as an IO source and a callback function
   which handles buffering of that IO source.

   The SyckIoFileRead signature looks like this:

     long SyckIoFileRead( char *buf, SyckIoFile *file, long max_size, long skip );

   Syck comes with a default FILE handler named `syck_io_file_read'.  You
   can assign this default handler explicitly or by simply passing in NULL
   as the `r' parameter.

  void
  syck_parser_str( SyckParser *p, char *ptr, long len, SyckIoStrRead r )

    Assigns a string as the IO source with a callback function `r'
    which handles buffering of the string.

    The SyckIoStrRead signature looks like this:

      long SyckIoFileRead( char *buf, SyckIoStr *str, long max_size, long skip );

   Syck comes with a default string handler named `syck_io_str_read'.  You
   can assign this default handler explicitly or by simply passing in NULL
   as the `r' parameter.

  void
  syck_parser_str_auto( SyckParser *p, char *ptr, SyckIoStrRead r )

    Same as the above, but uses strlen() to determine string size.

  
  SYMID
  syck_parse( SyckParser *p )

    Parses a single document from the YAML stream, returning the SYMID for
    the root node.

== YAML Emitter ==

Since the YAML 0.50 release, Syck has featured a new emitter API.  The idea
here is to let Syck figure out shortcuts that will clean up output, detect
builtin YAML types and -- especially -- determine the best way to format
outgoing strings.

The trick with the emitter is to learn its functions and let it do its
job.  If you don't like the formatting Syck is producing, please get in
contact the author and pitch your ideas!!

Like the YAML parser, the emitter has a couple of callbacks: namely,
one for IO output and one for handling nodes.  Nodes aren't necessarily
SyckNodes.  Since we're ultimately worried about creating a string, SyckNodes
become sort of unnecessary.

=== The Emitter Process ===

1. Traverse the structure you will be emitting, registering all nodes
   with the emitter using syck_emitter_mark_node().  This step will
   determine anchors and aliases in advance.

2. Call syck_emit() to begin emitting the root node.

3. Within your emitter handler, use the syck_emit_* convenience methods
   to build the document.

4. Call syck_emit_flush() to end the document and push the remaining
   document to the IO stream.  Or continue to add documents to the output
   stream with syck_emit().

=== Emitter API ===

 See <syck.h> for the layout of SyckEmitter.

 SyckEmitter * 
 syck_new_emitter()

  Creates a new Syck emitter.

 SYMID
 syck_emitter_mark_node( SyckEmitter *e, st_data_t node )

  Adds an outgoing node to the symbol table, allocating an anchor
  for it if it has repeated in the document and scanning the type
  tag for auto-shortcut.

 void
 syck_output_handler( SyckEmitter *e, SyckOutputHandler out )

  Assigns a callback as the output handler.

    void *out_handler( SyckEmitter *e, char * ptr, long len ); 

  Receives the emitter object, pointer to the buffer and a count
  of bytes which should be read from the buffer.

 void
 syck_emitter_handler( SyckEmitter *e, SyckEmitterHandler


 void
 syck_free_emitter