\documentstyle{article}
\begin{document}

\newcommand{\programtext}[1]{{\tt #1}}

\title{Wise2 API (version 2.1.19b)}
\author{Ewan Birney\\
Sanger Centre\\
Wellcome Trust Genome Campus\\
Hinxton, Cambridge CB10 1SA,\\
England.\\
Email: birney@sanger.ac.uk}

\maketitle
 
\newpage
\tableofcontents
\newpage

\section{Overview}

This document describes the API of the Wise2 system. The API
(application programming interface) allows other programmers
to use the functionality in the Wise2 package directly, rather
than treating the executables as a black box through which
you get ASCII output.

If you want to learn more about the Wise2 package itself, the
algorithms in it or what it is used for, look for the Wise2 
documentation (available as postscript), probably in the same
place that you found this documentation!

The API is accessible in three different ways. First, as C function
calls made inside the Wise2 package namespace: this is the way
the current executables (eg, genewise) access the API. Second, as C
function calls made from outside the Wise2 package namespace: this
is for people writing C programs with their own set of functions who
do not want name clashes on things like ``Sequence'' (in this API
the name is exported as ``Wise2\_Sequence''). Third, as a Perl
API, using XS extension code, in which C functions that are
dynamically loaded into the Perl interpreter can be executed as if
they were standard Perl commands.

Probably the most usable is the Perl API. Perl is a very forgiving
language, and it is easier to learn for novice programmers: in
particular, memory management is handled for you. For people who want
to use the Wise2 API from inside their own C program, I would use the
external API. For people who want to extend Wise2 programs to do other
things, I would use the internal API.

\section{WARNING - still in alpha}

After playing around with the API for a while, I have realised that
a number of things are not clean enough in the interface. I do not
currently consider the API of the 2.1.x series stable. An aim
for the 2.2 series is to make a stable and useful API that is
well documented.

However, this API does work, and there is this documentation for it,
so it may be worthwhile for people who like this sort of thing to play
around with it. Anyone who uses the API gets huge guru points from me...
 
\section{API generation}

The API is not manually generated but rather is generated by the
Dynamite compiler. Dynamite is a language which I wrote specifically
for the Wise2 project: it is a cranky but useful language based heavily
on C (it converts its source code to C), with a portion dedicated to
dynamic programming code (a common algorithm in bioinformatics). It also
has a lightweight object model that supports scalars and lists of types.
Because the API is generated through Dynamite, you can expect consistent
documentation and memory handling of all the functions and objects. 

\section{Getting Started for the impatient}

Here are three different ways of using the Wise2 API to reverse complement a
sequence: once in Perl, once using the namespace-protected external API, and
once using the internal API.

These three programs all produce the same output, using the same underlying
code. It is only how the programming is presented to the user (once in
Perl, twice in C) which changes.

\subsection{Perl reverse complement}

\begin{verbatim}

#!/usr/local/bin/perl


use Wise2; # loads in Wise2 api

$file = shift; # first argument

if( !defined $file ) {
    print "You must give a file to revcom for a reverse to work!\n";
    exit(1);
}

$seq = &Wise2::Sequence::read_fasta_file_Sequence($file);
$rev = $seq->revcomp(); 

print "Original sequence\n\n";
$seq->write_fasta(STDOUT);
print "Reversed sequence\n\n";
$rev->write_fasta(STDOUT);
\end{verbatim}

\subsection{Wise2 external API calls}

\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>
#include "dyna_api.h"

int main(int argc,char ** argv)
{
  Wise2_Sequence * seq;
  Wise2_Sequence * rev;

  /* exactly one argument is expected: the fasta file to read */
  if( argc != 2 ) {
    fprintf(stderr,"have to give an argument for a file\n");
    exit(1);
  }

  seq = Wise2_read_fasta_file_Sequence(argv[1]);

  if( seq == NULL ) {
    fprintf(stderr,"Unable to read fasta file in %s\n",argv[1]);
    exit(1);
  }
  
  rev = Wise2_reverse_complement_Sequence(seq);

  printf("Original sequence\n\n");
  Wise2_write_fasta_Sequence(seq,stdout);
  printf("Revcomp sequence\n\n");
  Wise2_write_fasta_Sequence(rev,stdout);
 
  /* release both objects */
  Wise2_free_Sequence(seq);
  Wise2_free_Sequence(rev);

  return 0;
}

\end{verbatim}

\subsection{Wise2 internal API calls}

\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>
#include "dyna.h"

int main(int argc,char ** argv)
{
  Sequence * seq;
  Sequence * rev;

  /* exactly one argument is expected: the fasta file to read */
  if( argc != 2 ) {
    fprintf(stderr,"have to give an argument for a file\n");
    exit(1);
  }

  seq = read_fasta_file_Sequence(argv[1]);

  if( seq == NULL ) {
    fprintf(stderr,"Unable to read fasta file in %s\n",argv[1]);
    exit(1);
  }
  
  rev = reverse_complement_Sequence(seq);

  printf("Original sequence\n\n");
  write_fasta_Sequence(seq,stdout);
  printf("Revcomp sequence\n\n");
  write_fasta_Sequence(rev,stdout);
 
  /* release both objects */
  free_Sequence(seq);
  free_Sequence(rev);

  return 0;
}

\end{verbatim}

\section{Navigating the source code}

The Wise2 api has a bewildering number of objects and
functions, and the biggest problem in using the API is
knowing which objects can be made from what. This next
section walks you through at an object level how to
do some common tasks. This list is in no way complete, but
it is better than just browsing around the index.

A very good place to start is to read the scripts in the 
perl/scripts area (halfwise.pl does not use the Wise2 API
but all the others do). 

\subsection{Making a translation of a DNA sequence}
\begin{itemize}
\item Build a codon table object from a file (\ref{object_CodonTable})
\item Build a sequence object, from a file or strings (\ref{object_Sequence})
\item Use the translate function on the Sequence object
\end{itemize}
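
As a rough illustration of what these steps compute, here is a minimal,
self-contained C sketch of codon-table translation. It does not use the
Wise2 API: the table is the standard genetic code hard-wired as a string,
rather than a CodonTable object read from a file, and the function name
toy\_translate is hypothetical.

\begin{verbatim}
#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
   with bases ranked T=0, C=1, A=2, G=3. '*' marks a stop codon. */
static const char standard_code[] =
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/* Map a base to its rank, or -1 for anything else (eg, N) */
static int base_rank(char base)
{
  switch( toupper((unsigned char)base) ) {
  case 'T': return 0;
  case 'C': return 1;
  case 'A': return 2;
  case 'G': return 3;
  default:  return -1;
  }
}

/* Translate frame 0 of dna into out; out needs strlen(dna)/3 + 1 bytes.
   Codons containing an unknown base become 'X'.
   Returns the length of the translation. */
int toy_translate(const char * dna, char * out)
{
  size_t len = strlen(dna);
  size_t i;
  int n = 0;

  for( i = 0; i + 3 <= len; i += 3 ) {
    int a = base_rank(dna[i]);
    int b = base_rank(dna[i+1]);
    int c = base_rank(dna[i+2]);

    if( a < 0 || b < 0 || c < 0 )
      out[n++] = 'X';
    else
      out[n++] = standard_code[16*a + 4*b + c];
  }
  out[n] = '\0';
  return n;
}

int main(void)
{
  char protein[16];

  toy_translate("ATGGCCTAA", protein);
  printf("%s\n", protein);   /* prints MA* */
  return 0;
}
\end{verbatim}

In Wise2 itself the table comes from a codon table file, so genetic codes
other than the standard one can be used without changing any code.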

\subsection{Comparing two sequences using Smith-Waterman}
\begin{itemize}
\item Build a Comparison matrix object from a file (\ref{object_CompMat})
\item Build two Sequence objects, from a file or strings (\ref{object_Sequence})
\item Optionally convert the Sequence objects into Protein objects (\ref{object_Protein}). This ensures you have proteins
\item Read in the comparison matrix using CompMat (\ref{object_CompMat})
\item Use one of the algorithm calls in the sw\_wrap module (\ref{module_sw_wrap})
\item Show the alignment using a call in the seqaligndisplay module (\ref{module_seqaligndisplay})
\end{itemize}
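
To show the kind of calculation behind a Smith-Waterman comparison, here
is a small self-contained C sketch of the recurrence. It is not the Wise2
implementation: it uses a fixed match and mismatch score instead of a
CompMat comparison matrix, a linear gap penalty, and it only returns the
best local score rather than an alignment. All names and score values in
it are made up for this illustration.

\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MATCH     2   /* score for identical residues (assumed) */
#define TOY_MISMATCH -1   /* score for different residues (assumed) */
#define TOY_GAP      -2   /* linear penalty per gapped residue (assumed) */

static int max2(int a, int b) { return a > b ? a : b; }

/* Best local alignment score of a vs b, keeping only two rows of
   the DP matrix. Returns -1 if memory cannot be allocated. */
int toy_smith_waterman(const char * a, const char * b)
{
  size_t la = strlen(a);
  size_t lb = strlen(b);
  size_t i, j;
  int best = 0;
  int * prev = calloc(lb + 1, sizeof(int));
  int * curr = calloc(lb + 1, sizeof(int));
  int * tmp;

  if( prev == NULL || curr == NULL ) {
    free(prev);
    free(curr);
    return -1;
  }

  for( i = 1; i <= la; i++ ) {
    curr[0] = 0;
    for( j = 1; j <= lb; j++ ) {
      int s = (a[i-1] == b[j-1]) ? TOY_MATCH : TOY_MISMATCH;
      int score = max2(0, prev[j-1] + s);        /* pair, or restart at 0 */
      score = max2(score, prev[j] + TOY_GAP);    /* gap in b */
      score = max2(score, curr[j-1] + TOY_GAP);  /* gap in a */
      curr[j] = score;
      best = max2(best, score);
    }
    tmp = prev; prev = curr; curr = tmp;         /* reuse the two rows */
  }

  free(prev);
  free(curr);
  return best;
}

int main(void)
{
  printf("score = %d\n", toy_smith_waterman("HEAGAWGHEE", "PAWHEAE"));
  return 0;
}
\end{verbatim}

Because every cell is floored at zero, an alignment can start and end
anywhere in either sequence, which is what makes the comparison local.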

\subsection{Running a Smith-Waterman search of a single protein sequence vs a db}
\begin{itemize}
\item Read in a sequence object and convert it to a protein object (\ref{object_Protein},\ref{object_Sequence})
\item Make a protein database from the single protein object (\ref{object_ProteinDB})
\item Make a protein database from a single fasta file (\ref{object_ProteinDB})
\item Using one of the calls to the sw\_wrap module, make an Hscore object (\ref{module_sw_wrap})
\item Show the Hscore object using a show function (\ref{object_Hscore})
\item Retrieve individual protein objects from the database by taking out the
DataEntry objects (\ref{object_DataEntry}) and passing them into the ProteinDB object (\ref{object_ProteinDB}), giving you a protein object
\item Optionally align them as in the section above
\end{itemize}

\subsection{Running a genewise on a single protein vs a single DNA sequence}
See the script genewise.pl in the distribution
\begin{itemize}
\item Make a Sequence object from a string or a file (\ref{object_Sequence})
\item Make that a protein object (\ref{object_Protein})
\item Make a Sequence object from a string or a file (\ref{object_Sequence})
\item Make that a Genomic object (\ref{object_Genomic})
\item Add any additional repeat areas from external information to the genomic object
\item Read in gene frequency counts (\ref{object_GeneFrequency})
\item Read in a codon table (\ref{object_CodonTable})
\item Make a random DNA model (\ref{object_RandomModelDNA})
\item Make an algorithm type (\ref{module_gwrap})
\item Build an entire parameter set for genewise using Wise2::GeneParameter21\_wrap (\ref{module_gwrap})
\item Run the actual algorithm (\ref{module_gwrap})
\item Show the alignment using genedisplay (\ref{module_gwrap})
\end{itemize}

\section{Concepts and overview of the API}

The API is organised in the following way. There are four main areas of source
code in the Wise2 package:
\begin{itemize}
\item wisebase - base memory, string and error handling libraries
\item dynlibsrc - generic bioinformatics objects
\item models - specific Wise2 objects
\item HMMer2 - HMMER 2 (Sean Eddy's HMM package)
\end{itemize}

The API is mainly derived from the dynlibsrc and models directories. The
API makes no distinction between one directory and the other.

\section{The reference section}

The reference section is built automatically from the 
Dynamite source. This means that the function names,
argument lists and in nearly all cases the documentation
should be completely up to date with whatever version
you got this documentation from.

The code is divided up into modules: each module potentially has
a number of objects in it and a number of
free standing functions (factory functions). The documentation
lists each object and the fields in the object which are
accessible by the Perl API and the external API (more fields
may be accessible by the internal API, but generally these are
not fields that you are expected to use). Fields can be either
scalar or list types. In either case the scalar or list 
can hold either a basic type or another object type. The access
methods available for scalar and list fields are as follows.

\subsection{Accessing fields in objects}
\label{accessing_fields}

In both the external API and the Perl API you can access all the
fields via function calls. In Perl these function calls are placed
in the correct namespace to be called using the OOP syntax
of Perl.

\subsubsection{Perl scalar accessors}

\begin{itemize}
\item \programtext{\$obj->\emph{fieldname}()} gets the value of this field
\item \programtext{\$obj->set\_\emph{fieldname}(\emph{new value})} sets the value of this field
\end{itemize}

For example
\begin{verbatim}

   $name = $seq->name();       # get the name of a sequence
   $seq->set_name('NewName');  # set the name of a sequence

\end{verbatim}

\subsubsection{External C scalar accessors}

\begin{itemize}
\item Wise2\_access\_\emph{fieldname}\_\emph{ObjectName}(obj) gets the value of this field
\item Wise2\_replace\_\emph{fieldname}\_\emph{ObjectName}(obj,\emph{new value}) sets the value of this field
\end{itemize}

For example
\begin{verbatim}

   char * name;
   Wise2_Sequence * seq;

   /* ... get a sequence object somehow ... */
	
   name = Wise2_access_name_Sequence(seq);
   
   Wise2_replace_name_Sequence(seq,"NewName");

\end{verbatim}

\subsubsection{Perl List accessors}

\begin{itemize}
\item \programtext{\$obj->each\_\emph{fieldname}()} Gives a Perl array of all the items in a list
\item \programtext{\$obj->length\_\emph{fieldname}()} Length of the list
\item \programtext{\$obj->\emph{fieldname}(\$i)} The ith member of the list
\item \programtext{\$obj->add\_\emph{fieldname}(\$another\_obj)} Adds another object to the list
\item \programtext{\$obj->flush\_\emph{fieldname}()} Destroys all the items in a list, sets list size to zero
\end{itemize}

\subsubsection{External API List accessors}

\begin{itemize}
\item Wise2\_access\_\emph{fieldname}\_\emph{ObjectName}(obj,i) access the ith position in the list
\item Wise2\_flush\_\emph{fieldname}\_\emph{ObjectName}(obj) Flushes the list
\item Wise2\_add\_\emph{fieldname}\_\emph{ObjectName}(obj,added\_object) Adds an object onto the end of the list
\end{itemize}

\subsection{Object Construction and handling}

The good news is that in the Perl API \emph{all} the memory handling is managed
between the Perl memory handling method and the Wise2 handling method. Basically
you can completely forget about these things and code normally in Perl, and
all the memory is handled for you.

In the C external API, as in any C program, the programmer is responsible for the
memory, and you need to read the documentation as to whether the objects you 
receive from function calls need explicit frees or not.

\subsubsection{Low level Object Constructing}

In both Perl and in C you have the possibility of making a new object from scratch.

\begin{itemize}
\item In Perl it is \$obj = new Wise2::\emph{ObjectName};
\item In C it is obj = \emph{ObjectName}\_alloc() for objects with no lists, and obj = \emph{ObjectName}\_alloc\_std()
for objects with lists (I know this inconsistency is a mistake in the API).
\end{itemize}

However, I would carefully read the documentation for an object first, as in some cases the
objects have to be made through specific functions. These are likely to be things like
new\_\emph{ObjectName} or similar. They are likely to be ``factory'' functions, that is
functions not attached to any object. 

\subsubsection{Object deconstructors}

In Perl you don't have to worry about this (heaven).


In the C API you have two functions to handle the memory of objects. 
The objects have a reference counted memory: when the free function is
called it decrements the object reference count and if this count
hits 0 then the object itself is free'd. To up the reference count you
call the hard\_link\_\emph{ObjectName} function.

\begin{itemize}
\item free\_\emph{ObjectName}(obj) Releases this pointer on this object
\item obj = hard\_link\_\emph{ObjectName}(obj) Adds this pointer to the object, increasing the
reference count
\end{itemize}
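
The reference counting scheme can be sketched with a toy object. The type
ToyObject and the functions below are hypothetical and only mimic the
naming pattern described above; the real functions are generated by
Dynamite and may differ in detail (for example, in what the free function
returns).

\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical object; real Wise2 objects are generated by Dynamite */
typedef struct ToyObject {
  int refcount;  /* number of pointers currently holding this object */
  int value;
} ToyObject;

/* Allocate an object held by one pointer (the caller's) */
ToyObject * ToyObject_alloc(void)
{
  ToyObject * obj = malloc(sizeof(ToyObject));

  if( obj == NULL )
    return NULL;
  obj->refcount = 1;
  obj->value = 0;
  return obj;
}

/* Register another pointer to obj and return it */
ToyObject * hard_link_ToyObject(ToyObject * obj)
{
  obj->refcount++;
  return obj;
}

/* Release one pointer. The memory is only freed when the count
   reaches 0. Returns 1 if the memory was freed, 0 otherwise
   (this return convention is a choice made for the sketch). */
int free_ToyObject(ToyObject * obj)
{
  if( obj == NULL )
    return 0;
  if( --obj->refcount > 0 )
    return 0;
  free(obj);
  return 1;
}

int main(void)
{
  ToyObject * first = ToyObject_alloc();
  ToyObject * second;

  if( first == NULL )
    return 1;

  second = hard_link_ToyObject(first);  /* count is now 2 */
  free_ToyObject(first);                /* count drops to 1, still valid */
  printf("count after first free: %d\n", second->refcount);
  free_ToyObject(second);               /* count hits 0, memory released */
  return 0;
}
\end{verbatim}

The practical rule this illustrates is that every pointer you keep,
whether from an allocation or from a hard link, needs exactly one matching
call to the free function.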

\section{Wise2 Specific Modules}

There are a number of modules which are specific to Wise2 algorithms.
These should be the starting point for how to use Wise2 algorithms:
try to find a function in these modules which provides the functionality
that you want. Then figure out how to make the appropriate objects
to use this functionality.

\begin{description}
\item[gwrap] \ref{module_gwrap} The gwrap module has the main entry points
for the genewise algorithm and how to build parameters for it
\item[estwrap] \ref{module_estwrap} The estwrap module has the main entry
points for the estwise algorithm
\item[sw\_wrap] \ref{module_sw_wrap} The sw\_wrap module has the main entry points for
the Smith-Waterman algorithm
\item[genedisplay] \ref{module_genedisplay} The pretty ASCII output used
for genewise and estwise output
\item[seqaligndisplay] \ref{module_seqaligndisplay} The pretty ASCII output used
for Smith-Waterman alignments
\item[threestatemodel] \ref{module_threestatemodel} profile-HMM support
\item[threestatedb] \ref{module_threestatedb} profile-HMM database support
\item[genefrequency] \ref{module_genefrequency} Raw counts for the genewise model
\item[geneparameter] \ref{module_geneparameter} probabilities for the genewise model
\item[cdparser] \ref{module_cdparser} probabilities for the estwise model
\end{description}

\section{Dynamite library modules}

\subsection{Sequence modules}
\begin{description}
\item[sequence] \ref{module_sequence} Basic sequences
\item[sequencedb] \ref{module_sequencedb} Basic sequence database
\item[protein] \ref{module_protein} Protein specific type
\item[proteindb] \ref{module_proteindb} Protein database
\item[genomic] \ref{module_genomic} Genomic specific type
\item[genomicdb] \ref{module_genomicdb} Genomic database
\item[cdna] \ref{module_cdna} cDNA specific type
\item[cdnadb] \ref{module_cdnadb} cDNA database
\end{description}

\subsection{Generic probabilistic modelling support}

\begin{description}
\item[probability] \ref{module_probability} Probability to log space conversions
\item[codon] \ref{module_codon} Codon Table support
\item[compmat] \ref{module_compmat} Protein Comparison matrix support
\item[codonmat] \ref{module_codonmat} Codon comparison matrix
\item[codonmapper] \ref{module_codonmapper} Codon bias/substitution errors support
\end{description}

\subsection{Generic Database Searching}

\begin{description}
\item[hscore] \ref{module_hscore} High Score list
\item[histogram] \ref{module_histogram} Extreme Value distribution fitting 
\item[dbimpl] \ref{module_dbimpl} Database Implementation 
\end{description}

\subsection{Generic Dynamite algorithm support}

\begin{description}
\item[aln] \ref{module_aln} Label alignments
\item[packaln] \ref{module_packaln} Raw (low level) alignments
\item[basematrix] \ref{module_basematrix} Memory management for DP implementations
\end{description}