\chapter[\pxd] {\pxd: Definition Of Polymer Chemistries} 

\label{chap:polyxdef}

After completing this chapter you will be able to take the very first
steps needed to make the best use of the \pxm\ framework's features.
Before the program can be used, the polymer chemistry on which you
would like to experiment must be defined according to a number of
rules that are detailed in the remaining sections of this chapter.

The \pxd\ module is easily called by pulling down the ``\pxd'' menu
item from the \pxm program's menu. The user may accomplish two
different tasks in the \pxd module:

\begin{itemize}
\item Edit an atom definition;
\item Edit a polymer chemistry definition.
\end{itemize}

\renewcommand{\sectitle}{Editing an atom definition}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}



The editing of an atom definition is performed through the user
interface that shows up when the user selects one of the two submenu
items shown in Figure~\vref{fig:polyxdef-atomdef-menu}.


\begin{figure}
  \begin{center}
    \includegraphics [scale=3]
    {figures/raster/polyxdef-atomdef-menu.png} 
  \end{center}
  \caption[\pxd atom definition menu]{\textbf{\pxd atom definition
      menu} The user might ask that an atom definition file be opened
    for editing or that a new atom definition be started empty for
    \textit{ex nihilo} editing.}
  \label{fig:polyxdef-atomdef-menu}
\end{figure}

When the user asks that an existing atom definition file be found, a
``chooser window'' shows up like the one shown on
Figure~\vref{fig:polyxdef-atomdef-open-def-wnd}.

\begin{figure}
  \begin{center}
    \includegraphics [scale=3]
    {figures/raster/polyxdef-atomdef-open-def-wnd.png} 
  \end{center}
  \caption[\pxd atom definition choosing window]{\textbf{\pxd atom
      definition choosing window} The user might either select an atom
    definition already registered to the \pxm software suite (upper
    frame) or select an atom definition that is not registered (lower
    frame).}
  \label{fig:polyxdef-atomdef-open-def-wnd}
\end{figure}


When the atom definition editor shows up, the user sees an interface
that allows the addition/removal of isotopes or atoms. This interface
(Figure~\vref{fig:polyxdef-atomdef-definition-wnd}) makes it trivial
to edit to the highest level of refinement the definitions of the
atoms to be used in the \pxm software suite.

\begin{figure}
  \begin{center}
    \includegraphics [scale=3]
    {figures/raster/polyxdef-atomdef-definition-wnd.png}
  \end{center}
  \caption[\pxd atom definition window]{\textbf{\pxd atom definition
      window} The atom items must contain isotope items, otherwise the
    atom does not have any ``raison d'\^etre''.}
  \label{fig:polyxdef-atomdef-definition-wnd}
\end{figure}

Using the atom definition window is easy. The main idea is
that an atom does not exist as something valuable for doing chemistry
until it has at least one isotope defined as part of it.
This means that to define a new atom, the \guilabel{Add Atom} button
should be clicked, which triggers the creation of a new empty item in
the treeview shown in
Figure~\vref{fig:polyxdef-atomdef-definition-wnd}. At this point, the
user must first name the new atom and give it a symbol (to edit a
cell, just click onto it, make the required editing and validate by
pressing \kbdEnterKey). Next, the user adds an isotope to that atom
item.  Clicking onto the \guilabel{Add Isotope} button will trigger
the creation of an empty isotope. The user fills the \guilabel{Mono
  Mass} monoisotopic mass field of the newly created empty item. The
same has to be done for the \guilabel{Abundance} isotopic abundance
field.

Each time a new monoisotopic mass/isotopic abundance pair is either
edited, added or removed from an atom item, the average mass of that
atom is recomputed and shown in the \guilabel{Avg Mass} atom average
mass cell.
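
The recomputation described above can be pictured as an
abundance-weighted mean. The formula below is a sketch under that
assumption (the symbols $m_i$ and $a_i$, standing for the mass and the
abundance of isotope $i$, are notation introduced here and are not
taken from the program):

\[
  m_{\mathrm{avg}} = \frac{\sum_i a_i \, m_i}{\sum_i a_i}
\]

For carbon, with the commonly tabulated isotopic data, this gives:

\[
  m_{\mathrm{avg}}(\mathrm{C}) = 12.000000 \times 0.9893 + 13.003355
  \times 0.0107 \approx 12.0107
\]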

\begin{figure}
  \begin{center}
    \includegraphics [scale=3]
    {figures/raster/polyxdef-atomdef-error-check-wnd.png}
  \end{center}
  \caption[\pxd atom syntax-checking window]{\textbf{\pxd atom
      syntax-checking window} The atom items must contain isotope
    items, otherwise the atom does not have any ``raison d'\^etre''.
    Here, the syntax-checking function has found an error, and the
    message is displayed in the window overlaid onto the definition
    window}
  \label{fig:polyxdef-atomdef-error-check-wnd}
\end{figure}

The user may ---at any moment--- ask that the syntactic validity of
the atoms in the definition be checked. For that, clicking onto the
\guilabel{Check Syntax} button is enough. If something goes wrong, a
window shows up to describe the error(s) that were encountered. In our
example of Figure~\vref{fig:polyxdef-atomdef-error-check-wnd}, we see
that the syntax-checking function has detected that atom ``Carbon''
has no isotopic data whatsoever; and that is a real error, as we were
mentioning earlier.

Once the atom definition is completed, the user has to register it to
the \pxm software suite. This task is described in a later chapter
about the configuration/data files hierarchy of the \pxm software.


\renewcommand{\sectitle}{Editing a polymer chemistry definition}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

Editing a polymer chemistry definition is performed using the
carefully crafted user interface that shows up when the user selects
one of the two submenu items shown in
Figure~\ref{fig:polyxdef-polchemdef-menu}.


\begin{figure}
  \begin{center}
    \includegraphics [scale=3]
    {figures/raster/polyxdef-polchemdef-menu.png} 
  \end{center}
  \caption[\pxd polymer chemistry definition menu]{\textbf{\pxd
      polymer chemistry definition menu} The user might ask that a
    polymer chemistry definition file be opened for editing or that a
    new polymer chemistry definition be started empty for \textit{ex
      nihilo} editing.}
  \label{fig:polyxdef-polchemdef-menu}
\end{figure}

When the user asks that an existing polymer chemistry definition file
be found, a ``chooser window'' shows up like the one shown on
Figure~\vref{fig:polyxdef-polchemdef-open-def-wnd}.

\begin{figure}
  \begin{center}
    \includegraphics [scale=3]
    {figures/raster/polyxdef-polchemdef-open-def-wnd.png} 
  \end{center}
  \caption[\pxd polymer chemistry definition choosing
  window]{\textbf{\pxd polymer chemistry definition choosing window}
    The user might either select a polymer chemistry definition
    already registered to the \pxm software suite (upper frame) or
    select a polymer chemistry definition that is not registered
    (lower frame).}
  \label{fig:polyxdef-polchemdef-open-def-wnd}
\end{figure}


When the polymer chemistry definition editor shows up, the user sees
an interface that allows the addition/removal of a number of chemical
items that define the polymer chemistry
(Figure~\vref{fig:polyxdef-polchemdef-whole-definition-wnd}). For
example, the user might define any number of monomers to be later used
in order to create polymer sequences. Equally important is the ability
to define any kind of chemical modification
(Figure~\vref{fig:polyxdef-polchemdef-modifs-definition-wnd}). Doing
chemical or enzymatic cleavages on polymer sequences is something
rather common in experimental laboratories, and the user can model any
kind of chemical/enzymatic cleavage
(Figure~\vref{fig:polyxdef-polchemdef-cleavages-definition-wnd}).
Also, it is of crucial importance that the user be able to define any
kind of gas phase fragmentations for his newly-defined polymer
chemistry
(Figure~\vref{fig:polyxdef-polchemdef-fragmentations-definition-wnd}).

\begin{figure}
  \begin{center}
    \includegraphics [scale=2.9]
    {figures/raster/polyxdef-polchemdef-whole-definition-wnd.png}
  \end{center}
  \caption[\pxd polymer chemistry definition window]{\textbf{\pxd
      polymer chemistry definition window} The window lets the user
    define with great flexibility the chemical entities that
    characterize the polymer chemistry being defined. Here the monomer
    definition treeview is displayed.}
  \label{fig:polyxdef-polchemdef-whole-definition-wnd}
\end{figure}


\begin{figure}
  \begin{center}
    \includegraphics [scale=2]
    {figures/raster/polyxdef-polchemdef-modifs-definition-wnd.png}
  \end{center}
  \caption[\pxd chemical modifications definition]{\textbf{\pxd
      chemical modifications definition} The user may define any
    number of chemical modifications to be later applied to the whole
    polymer sequence or onto any individual monomer.}
  \label{fig:polyxdef-polchemdef-modifs-definition-wnd}
\end{figure}

\begin{figure}
  \begin{center}
    \includegraphics [scale=2]
    {figures/raster/polyxdef-polchemdef-cleavages-definition-wnd.png}
  \end{center}
  \caption[\pxd cleavages definition]{\textbf{\pxd cleavages
      definition} The user may define any number of chemical/enzymatic
    cleavages to be later applied to the polymer sequence.}
  \label{fig:polyxdef-polchemdef-cleavages-definition-wnd}
\end{figure}

\begin{figure}
  \begin{center}
    \includegraphics [scale=2]
    {figures/raster/polyxdef-polchemdef-fragmentations-definition-wnd.png}
  \end{center}
  \caption[\pxd fragmentations definition]{\textbf{\pxd fragmentations
      definition} The user may define any number of gas-phase
    fragmentation patterns to be later applied to the whole polymer
    sequence or onto any polymer selection (oligomer).}
  \label{fig:polyxdef-polchemdef-fragmentations-definition-wnd}
\end{figure}


Now that we have made a quick overview of what a polymer chemistry
definition looks like, we have to go through some details. 

First off, we should immediately explain what the reference to an atom
definition file is for, at the top of
Figure~\vref{fig:polyxdef-polchemdef-whole-definition-wnd} (under the
label \guilabel{Atom Definition To Use}): \pxm is now able to cope
with different atom definitions. Each polymer chemistry definition
must unequivocally state what atom definition it has to work with. The
combobox list item that is shown on the figure reads \guival{basic}.
This is where the user should state which atom definition file is to
be used with the polymer chemistry definition that is being
worked on. The combobox list widget lists all the available atom
definitions at the time the window was opened. In the figure it only
lists one item: \guival{basic}, which is the basic atom definition
file that is installed by the \pxm-common essential \pxm package.

Note that the user is given the opportunity to select an atom
definition file that is not yet registered to the \pxm system. To
locate such a file on disk, the user should just use the
\guilabel{Locate Atom Definition} button. Once the user chooses a file
on disk, its name will be shown in the text entry widget below the
combobox list.

Telling what atom definition file is to be used by any given polymer
chemistry definition is of primary importance, because any
mass-related computation will be performed by looking at the formulae
of each chemical entity in the polymer sequence; the transformation of
a chemical formula to a molecular mass is based upon the lookup of
what a given atom symbol should weigh. This lookup step is done by
going into the atom definition file looking for the atom of the proper
symbol, and checking what its isotope(s) is(are). \textsl{Thus, we say
  that the resolution of the \pxm mass spectrometric software suite is
  isotopic.}
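
The lookup described above can be sketched in a few lines of Python.
This is only an illustration, not the code of the \pxm\ suite: the
\verb|ATOMS| table is a hypothetical, truncated stand-in for an atom
definition file, the function names are invented for this example, and
the sketch assumes that the monoisotopic mass of an atom is that of
its most abundant isotope.

```python
import re

# Hypothetical, truncated atom definition: symbol -> [(isotope mass, abundance)].
# The figures are commonly tabulated isotope data, rounded to six decimals.
ATOMS = {
    "H": [(1.007825, 0.999885), (2.014102, 0.000115)],
    "C": [(12.000000, 0.9893), (13.003355, 0.0107)],
    "N": [(14.003074, 0.99636), (15.000109, 0.00364)],
    "O": [(15.994915, 0.99757), (16.999132, 0.00038), (17.999160, 0.00205)],
}


def atom_masses(symbol):
    """Return (monoisotopic mass, average mass) for one atom symbol."""
    isotopes = ATOMS[symbol]
    # Assumption: the monoisotopic mass is that of the most abundant isotope.
    mono = max(isotopes, key=lambda iso: iso[1])[0]
    total_abundance = sum(abundance for _, abundance in isotopes)
    avg = sum(mass * abundance for mass, abundance in isotopes) / total_abundance
    return mono, avg


def formula_masses(formula):
    """Sum the atom masses of a simple formula such as 'C2H3NO'."""
    mono_total = avg_total = 0.0
    # Each match is an atom symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]*)(\d*)", formula):
        n = int(count) if count else 1
        mono, avg = atom_masses(symbol)
        mono_total += n * mono
        avg_total += n * avg
    return mono_total, avg_total


if __name__ == "__main__":
    # Glycyl residue, formula C2H3NO.
    print(formula_masses("C2H3NO"))
```

Changing the isotopes listed for an atom changes every mass computed
from a formula that contains this atom, which is why each polymer
chemistry definition must state which atom definition it relies on.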

It is necessary to let the \pxd module know the contents of that
selected atom definition file, so that they can be used by the polymer
chemistry definition being elaborated in this session. This is
achieved by clicking onto the \guilabel{Read Atom Definition} button.
Clicking that button triggers the parsing of the file whose name is
displayed in the text entry widget sitting right above the buttons. If
an error occurs while parsing the atom definition file, then a message
is displayed to inform the user.

It is only when the \pxd module has successfully completed the parsing
of the atom definition file (the result is displayed for a short time
in the messages text entry widget at the bottom of the
window) that the user can start defining the new polymer chemistry.
We will review that process in detail below.

The atom definition that is associated to the polymer chemistry must
be registered to the \pxm software suite at the time the polymer
chemistry definition is used. The way this association is performed
will be described in a later chapter.



\renewcommand{\sectitle}{Various Identification And Singular Data}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle} 

``Identification data'' are pieces of information that should be
defined in order to describe the polymer chemistry (these are
non-chemical pieces of information). For example, an identification
datum is the polymer chemistry definition type. ``Singular data'' are
pieces of information that are not present in more than one copy in
the polymer definition. An example of a singular datum is the string
that describes how the elongating polymer sequence should be left- or
right-capped so that it gets to its ``finished state'', after the
polymerization has terminated.

Looking at Figure~\ref{fig:polyxdef-polchemdef-whole-definition-wnd}
while reading the following paragraphs will help. This and subsequent
figures illustrate the process by which a polymer chemistry definition
``protein'' is defined.

As the reader can see, there are a number of identification and
singular data to be entered at the top of the polymer chemistry
definition window; these are described in the list below:

\begin{itemize}
\item \guilabel{Polymer Definition Type} \guival{protein} String
  describing the type of the new polymer chemistry definition being
  elaborated;
\item \guilabel {Polymer Endings' Chemistry (Caps)} Description of the
  chemical capping reaction that should happen either on the left end
  (\guilabel{Left Cap}) or on the right end (\guilabel{Right Cap}) of
  the polymer sequence, once it is successfully polymerized. As shown,
  this chemistry is divided into two pieces of information:

  \begin{itemize}
  \item \guilabel{Left Cap} \guival{+H} String describing the actform
    that should be applied to the left end of the elongating polymer
    sequence;
  \item \guilabel{Right Cap} \guival{+OH} String describing the
    actform that should be applied to the right end of the elongating
    polymer sequence;
  \end{itemize}

\item \guilabel {Maximum Number of Allowed Characters For A Monomer
    Code} \guival{1} This integer value indicates the maximum number
  of characters that may be used to describe monomer codes. See below
  for details about this critical value;
\item \guilabel {Polymer Ionization Rule} This rule describes the
  manner in which the polymer sequence should be ionized by default,
  when the mass is calculated. This rule actually holds two elements:
  \begin{itemize} 
  \item \guilabel {Actform} \guival{+H} String describing what
    chemical reaction should be applied to the polymer in order to
    ionize it.  Here we ask that all the polymer sequences of polymer
    chemistry definition ``protein'' be protonated once by default;
  \item \guilabel {Charge} \guival{1} Signed numerical value
    indicating what charge the polymer will hold once the ionization
    rule's actform has been applied to it. Here, it is asked that the
    proteins bear one positive charge after the default
    mono-protonation mentioned above has taken place.
  \end{itemize}
\end{itemize}
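
As a rough illustration of how these singular data could enter a mass
calculation, and assuming that the caps and the ionization actform are
simply added to the summed monomer masses (the symbols $M$, $m(\cdot)$
and $z$ are notation introduced here, not names used by the program),
the ``protein'' values listed above would give, for the capped polymer
(left cap ``+H'', right cap ``+OH''):

\[
  M = \sum_{k} m(\mathrm{monomer}_k) + m(\mathrm{H}) + m(\mathrm{OH})
\]

and, after the default ionization (actform ``+H'', charge 1):

\[
  m/z = \frac{M + m(\mathrm{H})}{1}
\]

The electron mass is neglected in this sketch.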

Now that we have defined the identification and singular data for the
polymer, we will go on with another type of data: ``plural data''.
Contrary to what was said previously about singular data, plural data
are pieces of information that can be present in more than one copy in
the polymer chemistry definition. An example of plural data is the
data pertaining to the monomers.


\renewcommand{\sectitle}{Various Plural Data}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}



\subsection*{The Monomers}



\label{sect:monomers}

The monomers are the constitutive blocks of the polymer sequence.
Their definition should be done with great care, as all the mass
calculations are based on the formulae of the defined monomers.
Remember that in \pxm\ jargon, ``monomer'' stands \emph{not} for
the molecule that you bought from the chemicals vendor in order to
synthesize the polymer; it stands for this molecule \emph{less} the
chemical group(s) that left it when the polymerization occurred. If
this sounds strange to you, you definitely should read
chapter~\vref{chap:basics-polymer-chemistry} for a detailed
explanation of the \pxm\ specialized words.

The lower part of
Figure~\vref{fig:polyxdef-polchemdef-whole-definition-wnd} shows how
easy it is to define a new monomer: it only takes entering three
strings, one in each column of a row (which may be created by clicking
the \guilabel {Add} button). Note that neither the \guilabel{Name}
nor the \guilabel{Formula} string is limited in size.

The case of the \guilabel{Code} string is a bit more complicated and
depends on the value that is entered in the \guilabel {Maximum Number
  of Allowed Characters For A Monomer Code} field. In our example,
this value is \guival{1}, which means that we are allowed to use only
one character to describe a monomer's code. Thus, we can see in the
figure that all the monomers have a single-character code. It is
possible however, to use another value, for example 3. In this case
there is a general rule which is enforced in \pxd: 

\begin{center}
  \fbox{\parbox{0.9\textwidth}{\textsl {``The first character of a
        monomer code must be uppercase, while the remaining characters
        (if any) must be lowercase.''} That means that ---in our
      example of 3-character codes--- `A', ``Al'', ``Ala'' would be
      perfectly fine, while ``Alan'', ``AL'', `a', ``AlA'' would be
      wrong.}}
  \end{center}
  

The mechanism here is more sophisticated than it may look, because you
have to imagine what goes on in the different \pxm\
modules, in particular in the polymer sequence editor (\pxe): how are
monomer codes keyed-in if `A' and ``Ala'' are both valid monomer codes
in a polymer chemistry definition? The magic is described in the
chapter about \pxe. Not conforming to the rule above will yield
unpredictable results.
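
The monomer code rule stated above is simple enough to be written down
as a short check. The Python function below is a sketch written for
this explanation (its name and signature are assumptions, not part of
\pxd); it only restates the rule and the maximum length.

```python
import re


def is_valid_monomer_code(code, max_chars):
    """Check a monomer code against the rule quoted above.

    The first character must be uppercase, the remaining characters (if
    any) must be lowercase, and the code must not be longer than
    max_chars characters.
    """
    if not 1 <= len(code) <= max_chars:
        return False
    return re.fullmatch(r"[A-Z][a-z]*", code) is not None


if __name__ == "__main__":
    for code in ("A", "Al", "Ala", "Alan", "AL", "a", "AlA"):
        print(code, is_valid_monomer_code(code, 3))
```

With a maximum of 3 characters, the function accepts `A', ``Al'' and
``Ala'' and rejects ``Alan'', ``AL'', `a' and ``AlA'', as in the
examples given in the box.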


\subsection*{The Modifications}


\label{sect:modifications}

Oft-times a polymer will be modified chemically by the user. This is
especially true when the user tries to mimic polymer chemical
modifications that arise in biochemical processes, in particular
regulatory modifications, like protein phosphorylations, for example.
Indeed, a biopolymer is modified more often than not. A modification
can be a phosphorylation onto a protein residue (on an alcohol
function-bearing residue) like a seryl residue, for example, or an
acetylation onto an amino function-bearing residue, like a lysyl
residue. The \pxm\ mass spectrometry framework gives the user the
entire freedom to define any number of modifications. Let us see how;
once again, looking at
Figure~\vref{fig:polyxdef-polchemdef-modifs-definition-wnd} will help.
Indeed, this figure shows, amongst others, how a
\emph{Phosphorylation} modification is defined. Most evidently, a
modification is defined by a \guilabel{Name} string (of unlimited
length) and by an \guilabel{Actform} string (of unlimited length). The
syntax of an actform should by now be somewhat familiar to the reader.
In the \emph{Phosphorylation} case, it can be read like this:
\textsl{``The polymer loses a proton and gains H2PO3''}.  When the
polymer is modified with this modification, its masses will change by
the mass corresponding to this ``reaction''. Of course, the fact that
the actform is written in this way is related to the fact that a
chemist always thinks in terms of ``leaving'' and ``entering'' groups.
However, a user might perfectly write ``+HPO3'' instead of
``-H+H2PO3'', or even more precisely ``-H+H3PO4-OH''. All of these
actforms are exactly equivalent from a molecular mass point of view
(and thus also from the perspective of \pxm).
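
That the three actforms above are equivalent can be verified by
reducing each of them to its net elemental change. The Python sketch
below does this for simple actforms made of signed formulae; the
function name and the parsing approach are assumptions made for this
illustration and say nothing about how \pxm\ parses actforms
internally.

```python
import re
from collections import Counter


def actform_delta(actform):
    """Return the net elemental change of an actform such as '-H+H2PO3'."""
    delta = Counter()
    # An actform is read as a series of signed formulae: '-H', '+H2PO3', ...
    for sign, formula in re.findall(r"([+-])([A-Za-z0-9]+)", actform):
        factor = 1 if sign == "+" else -1
        for symbol, count in re.findall(r"([A-Z][a-z]*)(\d*)", formula):
            delta[symbol] += factor * (int(count) if count else 1)
    # Atoms whose gains and losses cancel out are dropped.
    return {symbol: n for symbol, n in delta.items() if n != 0}


if __name__ == "__main__":
    for actform in ("+HPO3", "-H+H2PO3", "-H+H3PO4-OH"):
        print(actform, actform_delta(actform))
```

All three actforms reduce to a net gain of one H, one P and three O,
hence to the same mass difference.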


\subsection*{The Cleavage Specifications}

\label{sect:cleavespecif}

It is common practice ---in biopolymer chemistry, at least--- to cut a
polymer into pieces using molecular scissors like the following:

\begin{itemize}
\item proteases, for proteins;
\item nucleases, for nucleic acids;
\item glycosidases, for saccharides\dots
\end{itemize}

For each different polymer type, the molecular scissors are specific.
Indeed, a protease will not cleave a polysaccharide. The
specificity of a cleaving enzyme is thus something that should be
described in each polymer chemistry definition, since this specificity
is indeed polymer chemistry-specific. Here we show the way that the
user can define the cleavage specificity of a molecular scissor. As
usual, looking at
Figure~\vref{fig:polyxdef-polchemdef-cleavages-definition-wnd} might
help in reading the following paragraphs.

By looking at this figure, it should be obvious that defining a
cleavage specification gets a little more involved than what we saw
earlier for modifications. This is true only for certain chemical
reagents that modify the substrate they cleave, which is not that
frequent. In
Figure~\vref{fig:polyxdef-polchemdef-cleavages-definition-wnd}, the
first cleavage specification is ``CyanogenBromide'' (note that there
is no space between \emph{Cyanogen} and \emph{Bromide} in the
\guilabel{Name} column entry).

Let us analyze the data entered by the user in order to fully qualify
this cleavage agent (which, unlike the other ones listed in the
\guilabel {Name} column of the treeview shown in the figure, is not a
protease but a chemical reagent):

\begin{itemize}
\item \guilabel{Name} \guival{CyanogenBromide} This is merely the name
  of the cleavage agent;
\item \guilabel{Pattern} \guival{M/} This tells the \pxm\ framework
  where to cleave in the polymer sequence when a CyanogenBromide
  cleavage is asked. The syntax of the cleavage pattern is detailed
  below;
\item \guilabel{Left Code} and \guilabel{Left Actform} (Empty) This is
  a special case for those cleavage agents that not only cut a polymer
  sequence (usually it is a hydrolysis) but that also modify the
  substrate in such a way that must be taken into account by \pxm\ so
  that it computes correct molecular masses for the resulting
  oligomers. These rules are optional. However, if \guilabel{Left
    Code} is filled with something, then it is compulsory that
  \guilabel {Left Actform} be filled with something valid also, and
  conversely;
\item \guilabel{Right Code} and \guilabel {Right Actform} \guival{M}
  and \guival{-CH2S+O}, respectively. Same explanation as above.
  Here, what we say is that each oligomer resulting from the cleavage
  of the polymer sequence at a `M' monomer should be modified using
  the \guilabel {Right Actform} actform. Since the cleavage occurs
  right of `M', it is logical that a `M' is found right of the
  oligomer that was generated upon a ``CyanogenBromide'' cleavage. A
  special case in which a `M' may be found at the right end of an
  oligomer, without resulting from a polymer sequence cleavage, is if
  the `M' was at the right end of the polymer sequence. Of course
  this case is evaluated and, if it is found, the actform is not
  applied.
\end{itemize}

\noindent In order to best explicate the cleavage specification
pattern syntax I shall provide below some examples:

\begin{itemize}
\item \textbf{Trypsin = K/;R/;-K/P} \textsl {``Trypsin cuts right of a
    `K' and right of a `R'. But it does not cut right of a `K' if this K
    is immediately followed by a P''};
\item \textbf{EndoAspN = /D} \textsl {``EndoAspN cuts left of a D''};
\item \textbf{Hypothetical = T/YS; PGT/HYT; /MNOP; -K/MNOP} \textsl
  {``Hypothetical cuts after `T' if it is followed by YS and also cuts
    after `T' if preceded by PG and followed by HYT. Also, Hypothetical
    cuts prior to `M' if `M' is followed by NOP and if `M' is not preceded
    by K''}.
\end{itemize}

\noindent Please, \emph{do} note that the letters above correspond to
monomer codes and \emph{not} to monomer names. If, for example, we
were defining a ``Trypsin'' cleavage specification pattern ---in a
protein polymer chemistry definition with the standard 3-character
monomer codes--- we would have defined it this way: ``Trypsin =
Lys/;Arg/;-Lys/Pro''. 
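
To make the pattern syntax more tangible, here is a small Python sketch
that interprets such patterns for 1-character monomer codes and lists
the positions at which a sequence would be cut. The function names are
invented for this example, and the interpretation (a motif prefixed
with `-' forbids a cut that another motif would allow) is the one
suggested by the examples above, not a description of the actual \pxm\
parser.

```python
def parse_pattern(pattern):
    """Split 'K/;R/;-K/P' into cut motifs and no-cut motifs.

    Each motif is returned as a (left, right) pair of strings, the
    slash marking the position of the cleavage.
    """
    cut, no_cut = [], []
    for motif in pattern.split(";"):
        motif = motif.strip()
        if not motif:
            continue
        target = no_cut if motif.startswith("-") else cut
        left, right = motif.lstrip("-").split("/")
        target.append((left, right))
    return cut, no_cut


def cleavage_sites(sequence, pattern):
    """Return the positions p at which the sequence is cut.

    A position p means that the bond between sequence[p - 1] and
    sequence[p] is cleaved (1-character monomer codes only).
    """
    cut, no_cut = parse_pattern(pattern)

    def matches(p, motifs):
        return any(
            sequence[:p].endswith(left) and sequence[p:].startswith(right)
            for left, right in motifs
        )

    return [
        p for p in range(1, len(sequence))
        if matches(p, cut) and not matches(p, no_cut)
    ]


if __name__ == "__main__":
    print(cleavage_sites("AKRPKPGR", "K/;R/;-K/P"))  # trypsin-like pattern
    print(cleavage_sites("ADGD", "/D"))              # EndoAspN-like pattern
```

In the first call, the `K' that is immediately followed by a `P' is
skipped, while the first `K' and the `R' that follows it are cleavage
sites.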

\medskip

Now comes the time to explain in more detail what the \guilabel{Left
  Code} and \guilabel {Left Actform} (along with the \guilabel {Right}
siblings) are for. For this, we shall consider that we have the
following polymer sequence (1-character monomer codes):

\[\mathrm{THIS{\bf M}WILL{\bf M}BECUT{\bf M}ANDTHAT{\bf M}ALSO}\]

If we cleave this polymer using ``CyanogenBromide'' and if the
cleavage is total,\footnote{Cleavage occurs at every possible
  position, right of each monomer `M'.} we shall get the following
oligomers:

\[\mathrm{THIS{\bf M}\ WILL{\bf M}\ BECUT{\bf M}\ ANDTHAT{\bf M}\ ALSO}\]
 
But if there is a partial cleavage, we would \emph{also} get one or
more of these oligomers: 

\[\mathrm{THISMWILL{\bf M}\ BECUTMANDTHAT{\bf M}\ ALSO\ WILLMBECUT{\bf M}\
  ANDTHATMALSO}\] and so on\dots \vspace {\baselineskip}

\noindent Now, the biochemist knows that when a protein is cleaved
with cyanogen bromide, the cleavage occurs effectively right of
monomer `M' (this we also know already) \emph{and} that the `M'
monomer that underwent the cleavage is changed from a methionyl
residue to a homoseryl residue (this chemical change involves this
actform: ``-CH2S+O''). The following two lines of oligomers should
definitely ``undergo the actform'', one time only for each oligomer:

\[\mathrm{THIS{\bf M},\ WILL{\bf M},\ BECUT{\bf M},\ ANDTHAT{\bf M}}\] 

and

\[\mathrm{THISMWILL{\bf M},\ BECUTMANDTHAT{\bf M},\ WILLMBECUT{\bf M}}\] 

while the two oligomers shown below should not ``undergo the actform''
because (even if one of them does contain a `M' monomer) the
cleavage \emph{did not occur} at this `M' monomer:

\[\mathrm{ALSO\ ANDTHATMALSO}\]

This example should clarify why we clearly indicate --in the cleavage
specification for ``CyanogenBromide''-- that the oligomers resulting
from this cleavage should ``undergo the `-CH2S+O' actform''
\textsl{only if they have a `M' as their right end monomer code}.
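
The reasoning above can be replayed with a short Python sketch. It
generates the oligomers of the example sequence for a total cleavage
and for one missed cleavage, and tells which of them should undergo the
right-end actform. The function names and the \verb|max_missed|
parameter are assumptions made for this illustration, not \pxm\
identifiers.

```python
def oligomers(sequence, sites, max_missed=0):
    """Return (start, end) index pairs of the oligomers.

    sites are the cleavage positions; max_missed is the maximum number
    of cleavage sites left uncut inside one oligomer (0 means total
    cleavage).
    """
    bounds = [0] + sorted(sites) + [len(sequence)]
    result = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + max_missed, len(bounds))
        for j in range(i + 1, last):
            result.append((bounds[i], bounds[j]))
    return result


def needs_right_actform(sequence, start, end, right_code="M"):
    """Tell whether the right-end cleavage rule applies to an oligomer.

    The rule applies only if the oligomer ends with right_code and if
    its right end was produced by a cleavage, that is, if it is not the
    right end of the polymer sequence itself.
    """
    return end < len(sequence) and sequence[start:end].endswith(right_code)


if __name__ == "__main__":
    seq = "THISMWILLMBECUTMANDTHATMALSO"
    # Cleavage right of each 'M' that is not the last monomer.
    sites = [i + 1 for i, code in enumerate(seq)
             if code == "M" and i + 1 < len(seq)]
    for start, end in oligomers(seq, sites, max_missed=1):
        print(seq[start:end], needs_right_actform(seq, start, end))
```

The oligomers ``ALSO'' and ``ANDTHATMALSO'' are reported as not
needing the actform, in agreement with the discussion above.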

This would be of crucial importance, if we had a cleavage agent that
would cleave not only right of `M' but at some other places: we
really would need to specify these rules in a careful way. For
example, imagine you had noted, in your many cyanogen bromide
experiments, that cyanogen bromide would quite often
cleave right of `C' (cysteine) residues, but with no chemical
modification of the `C' monomer.\footnote {This is a purely
  hypothetical situation that I never observed personally!} In this
case, you would be glad that the possibility is given to you to
specify that the generated oligomers should ``undergo the `-CH2S+O'
actform'' only if they have a `M' as their right end monomer, so
that `C'-terminated oligomers are not chemically modified. You would
thus safely define this pattern:~``M/;C/''\dots\ The logical
conditions that the user can set forth for a cleavage reaction are
called (in an intuitive manner) \emph{cleavage rule}s.

Now that we got trained to think in an abstract way with these
cleavage rules, we can proceed to yet meatier stuff: the fragmentation
specifications. A polymer chemistry definition can hold as many
fragmentation specifications as necessary. A fragmentation
specification holds a number of pieces of information, amongst which
there is a compound datum describing logical conditions similar to,
but more complex than cleavage rules: \emph{fragmentation rules}. Each
fragmentation specification might have zero or more (with no
limitation) fragmentation rules. We review this complex matter in the
next section.

\subsection*{The Fragmentation Specifications}

\label{sect:fragspecif}

As you might have noticed reading page~\pageref
{sect:polymer-fragmentation}, the fragmentation specification is a
tricky business.  Figure~\ref
{fig:polyxdef-polchemdef-fragmentations-definition-wnd} shows examples
of protein fragmentation specifications for fragment types \emph{a, b,
  c, z, y, x, imm}.

Let's concentrate on the fragmentation specification of type \emph{a}.
While the first row of this fragmentation specification is effectively
valid (for a ``protein'' polymer chemistry definition, at least), the
lower two rows (describing fragmentation rules named \emph {a-fgr-1}
and \emph {a-fgr-2}) are fake, only to show the way fully qualified
fragmentation specifications can be created.

Let us analyze the data that the user entered to fully qualify this
\emph {a} fragmentation specification:

\begin{itemize}
\item \guilabel{Name} \guival{a} This is the name of the fragmentation
  specification. Fragments obtained with this specification will be
  named according to the following naming scheme: ``a-\emph{i}'', with
  \emph{a} being the fragmentation name and \emph{i} being the
  position --in the precursor polymer ion-- of the monomer at which
  the fragmentation occurred (see page~\pageref
  {sect:polymer-fragmentation});
\item \guilabel {End} \guival{LE} This is the end of the precursor
  polymer that is to be found in the fragment. Accepted values are
  ``LE'' (left end), ``RE'' (right end) and ``NE'' (no end). We have
  previously seen --for proteins and nucleic acids-- that fragments
  \emph{a, b, c} include the left end (``LE'') of the precursor
  polymer, while ``RE'' applies to fragmentation specifications that
  lead to fragments that contain the right end of the precursor
  polymer (for example, fragments \emph{x, y, z}). Special cases, like
  proteinaceous immonium ions, do not bear any end of the precursor
  polymer, in which case ``NE'' (for no end) should be written here
  instead of ``LE''.
  
  This \guilabel {End} piece of information is important for two
  reasons: 1) because it tells the fragmentation engine from which end
  it should iterate (in the precursor polymer sequence) when making
  all the fragments of a given fragment ion series and 2) because it
  guides \pxm\ to apply the conventional naming scheme using \emph{i}
  with the proper value.  Therefore, the smallest fragment of the
  \emph{a} series is \emph{a-\textbf{1}} (note subscript 1), which is
  the left end monomer of the precursor polymer. The smallest fragment
  of the \emph{x} series is \emph{x-\textbf{1}} (note that subscript
  is also 1). This time, the \emph{x-\textbf{1}} fragment, however,
  corresponds to the right end monomer of the polymer sequence. This
  is because the numbering of the fragments always starts at the
  precursor polymer's end that was specified by the \guilabel {End}
  piece of data from the polymer chemistry definition;
\item \guilabel{Actform} \guival{-C1O1} Optional. This is the chemical
  reaction that will actually change a monomer chain into the proper
  fragment. Indeed, the calculation of the fragment's mass is
  performed by summing the mass of the monomers running from the
  \emph{end} of the precursor polymer up to the position where the
  fragmentation occurs, plus adding the mass of the end's cap as
  specified in the polymer chemistry definition. But, for the \emph{a}
  fragments, this is not enough, as it does not lead to a correct
  mass. It is required that the actform ``-C1O1'' be applied to the
  monomer chain so that it is of the correct mass (after having added
  the mass corresponding to the left cap; see below). This actform is
  optional, because for some fragments (for example, fragments \emph
  {b} in the protein polymer chemistry) there is no need for any
  actform besides adding the masses of the monomers and adding the
  mass corresponding to the left cap of the polymer chemistry
  definition. As can be seen in the picture, ``-H0'' (which leaves
  the mass unchanged) is set as the actform for \emph {b} fragments.
  Again, see
  page~\pageref{sect:polymer-fragmentation};
\item \guilabel {Comment} (Empty) Optional. This is simply a comment,
  if the user wants to set any. \textit{Ad libitum}.
\end{itemize}
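
As a compact restatement of the \guilabel{Actform} description above,
the mass of the fragment of rank \emph{i} in a left-end series can be
sketched as follows. The notation is introduced here for illustration
only and is not part of the program: $m(\cdot)$ denotes a mass, $M_k$
the monomer at position $k$ counted from the left end,
$\mathrm{cap_{LE}}$ the left cap of the polymer chemistry definition,
and $\Delta m_{\mathrm{actform}}$ the net mass change brought by the
actform of the fragmentation specification (ionization is left out of
this sketch):

\[
m(\mathit{fragment}_i) = \sum_{k=1}^{i} m(M_k) + m(\mathrm{cap_{LE}})
+ \Delta m_{\mathrm{actform}}
\]

With the values shown in the example, $\Delta m_{\mathrm{actform}} =
-\,m(\mathrm{C}) - m(\mathrm{O})$ for the \emph{a} fragments (actform
``-C1O1''), and $\Delta m_{\mathrm{actform}} = 0$ for the \emph{b}
fragments (actform ``-H0'').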

\noindent A fragmentation specification can include zero or more
fragmentation rule(s) that help model --in a highly detailed manner--
complex fragmentation patterns. Let's see what it takes to define a
fragmentation rule:

\begin{itemize}
\item \guilabel {Name} \guival{a-fgr-1} This is the name of the
  fragmentation rule.  It should be self-explanatory and should
  somehow provide a hint to the fact that this fragrule belongs to the
  \emph{a} fragmentation specification;
\item \guilabel {Prev} \guival{E} Optional. This is one of the logical
  conditions that must be verified for the actform to be applied to
  the fragment currently generated. In our example, we are saying
  that if --in the precursor ion sequence-- the monomer preceding the
  one that is currently fragmented is of code `E', then this
  condition is verified and the \guival{+H200} actform should be
  applied to the resulting fragment;
\item \guilabel{This} \guival{D} Optional. This condition is analogous
  to the one above, except that it applies to the monomer that is
  actually being fragmented;
\item \guilabel{Next} \guival{F} Optional. This is a similar condition,
  except that it applies to the monomer that is one position forward
  in the precursor ion sequence, with respect to the presently
  fragmented position;
\item \guilabel{Actform} \guival{+H200} This is the chemical action
  with which the fragment will actually be challenged if the set of
  logical conditions above is verified. This actform is the
  \emph{raison d'\^etre} of the fragmentation rule, so it is
  compulsory;
\item \guilabel{comment} \guival{comment here!} Optional.  \textit{Ad
    libitum}.
\end{itemize}
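
Assuming that fragmentation rules are stored in the polymer chemistry
definition file with the same \texttt{<fgr>} element layout as in the
oligonucleotide example given later in this section, the rule
described above might be written roughly as follows. The
\texttt{<prev-mnm-code>} and \texttt{<next-mnm-code>} element names,
and the order of the elements, are assumptions made by analogy with
\texttt{<this-mnm-code>}; they should be checked against a file
actually saved by the program:

\begin{verbatim}
      <fgr>
        <name>a-fgr-1</name>
        <actform>+H200</actform>
        <prev-mnm-code>E</prev-mnm-code>
        <this-mnm-code>D</this-mnm-code>
        <next-mnm-code>F</next-mnm-code>
        <comment>comment here!</comment>
      </fgr>
\end{verbatim}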

A fragmentation rule is a set of one or more logical conditions that
(if verified) determine a user-specified chemical actform to be
applied to the fragment that was generated in the first place by
fragmenting the precursor polymer using the fragmentation
specification to which the fragrule itself belongs. As can be seen in
the example figure, the fragmentation specification for fragments
\emph{a} (fragmentation specification \emph{a}) contains two
fragmentation rules, but it could have contained as many of them as
necessary to finely describe experimentally observed fragmentation
events\dots

The following paragraph will explain thoroughly how fragmentation
rules modify the way fragments are generated, for a given
fragmentation pattern.

We have seen, in our example of a fragmentation specification named
\emph {a}
(Figure~\ref{fig:polyxdef-polchemdef-fragmentations-definition-wnd}),
that it should generate fragments starting from the left end of the
precursor polymer. Now we see that the fragmentation specification
includes a fragmentation rule: \guilabel {This} is set to `D', which
means that this fragmentation rule is evaluated further \emph{only} if
the monomer currently fragmented is indeed a `D'. If not, the whole
fragmentation rule is skipped. If \guilabel {Prev} is set to something
(for us: `E'), then the fragmentation rule is evaluated further only
if the monomer at position [current -1] is an `E'. If not, the
fragmentation rule is skipped. If \guilabel {Next} is set to something
(for us: `F'), then the fragmentation rule is evaluated further only
if the monomer at position [current +1] is an `F'. If not, the
fragmentation rule is skipped.

What is called a position \textbf{[current +1]} and a position
\textbf{[current -1]} depends on the kind of fragmentation
specification: if the fragmentation specification states that
\guilabel {End} (seen earlier) is ``LE'' (or ``NE''), then the
position [current +1] refers to the position right of the currently
fragmented monomer (in the standard left-to-right polar horizontal
representation of a polymer); if the fragmentation specification
states that \guilabel{End} is ``RE'', then the position [current +1]
refers to the position left of the currently fragmented monomer. This
has to do with the way the fragmentations are normally described: the
fragment numbering scheme starts at the right end of the precursor
polymer for ``RE'' fragments and at the left end of the precursor
polymer for ``LE'' fragments.  This is also true here: for a fragment
of the series \emph{a}, the fragmentation rule that we have described
would effectively be applied to the following sequence:

\[\mathrm {MYNAMEIS\textbf {EDF}FIL}\] 

\emph{only} upon generation of the $\mathrm{MYNAMEIS\textbf{ED}}$
fragment. 

\bigskip\noindent If we were using the same fragmentation rule for a
fragment of the series \emph{x} (for which \guilabel {End} is ``RE''),
the fragmentation rule would never have been evaluated.  Instead, for
the following sequence:

\[\mathrm {MYNAMEIS\textbf {FDE}FIL}\] 

it would have been evaluated upon fragmentation at the `D' monomer,
and thus would have generated the fragment
$\mathrm{\textbf{DE}FIL}$.

\bigskip\noindent Now, what about internal fragment specifications,
like the immonium ions' case, where the \guilabel {End} is defined to
be ``NE'' in the polymer chemistry definition? \pxm\ evaluates the
conditions from left to right, so they are evaluated exactly as in
the ``LE'' case.
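
The position rules above can be summarized as follows, where $s_j$ is
notation introduced here (it is not used by the program) for the
monomer at position $j$ of the precursor sequence, numbered from left
to right, and $j$ is the position currently being fragmented:

\[
(\mathit{prev},\, \mathit{this},\, \mathit{next}) = \left\{
\begin{array}{ll}
(s_{j-1},\, s_j,\, s_{j+1}) & \mbox{if End is ``LE'' or ``NE''} \\
(s_{j+1},\, s_j,\, s_{j-1}) & \mbox{if End is ``RE''}
\end{array}
\right.
\]

For instance, with $j$ pointing at the `D' of
$\mathrm{MYNAMEIS\textbf{EDF}FIL}$, an ``LE'' specification sees
\emph{prev}~=~`E' and \emph{next}~=~`F', whereas an ``RE''
specification sees \emph{prev}~=~`F' and \emph{next}~=~`E'.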

Another important thing to figure out: how are the logical conditions
tested? The main condition (entered as \guilabel {This}) is evaluated
first, because this is the simplest evaluation: the value of the
\guilabel {This} monomer can be compared with the currently fragmented
monomer code without depending on the \guilabel {End} value. If the
monomer context complies with this condition (in our example that
would mean that we are actually fragmenting at a `D' monomer), other
conditions (if any) are evaluated. Thus, in logic terminology the
conditions are \emph{AND}ed one with the other: as soon as a condition
is stated it must be verified.  If \emph{any} condition is not
verified, no fragment is created and the other fragmentation rules are
analysed (if any).

If there is more than one fragmentation rule in a fragmentation
specification, each fragmentation rule is evaluated separately. If the
monomeric context (previous/this/next monomer codes) complies with the
logical conditions stated in the evaluated fragmentation rule, a new
fragment is generated. When a fragmentation rule is found not to
comply with the monomer context, then it is simply skipped (no
fragment is generated).

It should be noted that the presence of a fragmentation rule in a
fragmentation specification is not exclusive, in the sense that if the
fragmentation rule contains logical condition(s) that are never
satisfied,\footnote{Such as if ``this monomer's code'' is `Y',
  ``next monomer's code'' is `Y' and ``previous monomer's code'' is
  `Y' and there is no ``YYY'' sequence element in the polymer, for
  example.} a single fragment is still generated, which corresponds
to the fragmentation specification without taking into account any
fragmentation rule.

The fact that each fragmentation rule --that has logical conditions
which are verified in the sequence-- yields a new fragment implies
that the fragmentation rules are not summative: a fragment is not
generated by applying onto it the actform of each validated
fragmentation rule in a fragmentation specification. Each
fragmentation rule, in a given fragmentation specification, gives rise
to a fragment ion that results from the application of
both the actform specified in the fragmentation specification (if any)
and the actform specified in the fragmentation rule (this one is
compulsory).  Next, when another fragmentation rule of the same
fragmentation specification is evaluated, a brand new fragment is
generated according to the same process as the one just described.
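
In other words, using notation that is introduced here for
illustration only: if $m_0$ is the mass obtained from the
fragmentation specification alone (monomers, end cap and the
specification's actform, if any), and $\Delta_r$ is the net mass
change brought by the actform of a validated fragmentation rule $r$,
then each validated rule yields its own fragment of mass

\[
m_r = m_0 + \Delta_r
\]

and no fragment of mass $m_0 + \sum_r \Delta_r$ is produced. When no
rule can be validated, the single fragment of mass $m_0$ mentioned
above is the one that is generated.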

As an example of how the fragmentation rules might be added to a given
fragmentation specification, let's take the example of the `a' and
``a-B'' fragments from the oligonucleotide chemistry. The `a'
fragments might exist \textit{per se}; however, it might happen that
a further fragmentation event removes the nucleobase off the
nucleotide that undergoes the fragmentation. The generated fragments
are called
``a-B''. Let us see how we could model this chemistry: \\

First look at the conventional `a' fragmentation pattern: \\

\begin{verbatim}
  <fragspecs>
    <fgs>
      <name>a</name>
      <end>LE</end>
      <actform>-O</actform>
    </fgs>
\end{verbatim}

And now look at the ``a-B'' abasic corresponding fragmentation
pattern: \\

\begin{verbatim}
    <fgs>
      <name>a-B</name>
      <end>LE</end>
      <actform>-O</actform>
      <fgr>
        <name>a-B-c</name>
        <actform>-C4H4N3O</actform>
        <this-mnm-code>C</this-mnm-code>
        <comment>a-cytosine</comment>
      </fgr>
      <fgr>
        <name>a-B-a</name>
        <actform>-C5H4N5</actform>
        <this-mnm-code>A</this-mnm-code>
        <comment>a-adenine</comment>
      </fgr>
      <fgr>
        <name>a-B-t</name>
        <actform>-C5H5N2O2</actform>
        <this-mnm-code>T</this-mnm-code>
        <comment>a-thymine</comment>
      </fgr>
      <fgr>
        <name>a-B-g</name>
        <actform>-C5H5N5O</actform>
        <this-mnm-code>G</this-mnm-code>
        <comment>a-guanine</comment>
      </fgr>
      <comment>abasic a fragment</comment>
    </fgs>
\end{verbatim}

For each of the four bases, the model instructs the fragmentation
engine to remove the formula of the base (minus a proton) if the
fragmentation occurs precisely at such a base in the oligonucleotide.


\renewcommand{\sectitle}{Saving A Polymer Chemistry Definition}
\section*{\sectitle}
\addcontentsline{toc}{section}{\numberline{}\sectitle}

\label{sect:save-polymer-chemistry-definiton}

Once the polymer chemistry definition is completed, the user can save
it to a file. Prior to actually writing to the file, the program
checks the syntax validity of the elements that the user has entered
in the window. This check can be triggered manually by clicking
the \guilabel{Check Syntax} button. If an error is found in the
polymer chemistry definition being worked on, that error is displayed
in a window so that the user may identify the problem and fix it. 

When saving a polymer chemistry definition to a file, if no error is
detected, the program proceeds with writing the polymer chemistry
definition to an \fileformat {XML} file.

The location where the file should be saved, and the manner in which
it may be made available to the whole \pxm framework, are described
in a later chapter.

In that chapter, the user will be instructed on how to ensure that the
newly-made polymer chemistry definition uses the proper atom
definition. 

Indeed, \pxm is a rather powerful framework, wholly designed to be
modular. But this modularity and power have a cost: complexity. A well
configured system is the key to a powerful program running smoothly.
It is thus very important to grasp the \pxm framework configuration
data hierarchy so that the program knows at each instant where to find
the configuration and data files required to perform properly both the
polymer sequence display and the mass calculations.

But for now go on with the polymer chemistry definition-aware
calculator: \pxc!

\cleardoublepage


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "polyxmass"
%%% End: