\chapter[\pxd] {\pxd: Definition Of Polymer Chemistries} \label{chap:polyxdef} After having completed this chapter you will be able to accomplish the very first steps needed to make the best use of the \pxm\ framework's features. Indeed, before the program can be used, the polymer chemistry you want to experiment with must be defined according to a number of rules that are detailed in the remaining sections of this chapter. The \pxd\ module is called by pulling down the ``\pxd'' menu item from the \pxm\ program's menu. The user may accomplish two different tasks in the \pxd\ module: \begin{itemize} \item Edit an atom definition; \item Edit a polymer chemistry definition. \end{itemize} \renewcommand{\sectitle}{Editing an atom definition} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} An atom definition is edited through the user interface that shows up when the user selects one of the two submenu items shown in Figure~\vref{fig:polyxdef-atomdef-menu}. \begin{figure} \begin{center} \includegraphics [scale=3] {figures/raster/polyxdef-atomdef-menu.png} \end{center} \caption[\pxd\ atom definition menu]{\textbf{\pxd\ atom definition menu.} The user might ask that an atom definition file be opened for editing or that a new, empty atom definition be started for \textit{ex nihilo} editing.} \label{fig:polyxdef-atomdef-menu} \end{figure} When the user asks to open an existing atom definition file, a ``chooser window'' like the one shown in Figure~\vref{fig:polyxdef-atomdef-open-def-wnd} shows up.
\begin{figure} \begin{center} \includegraphics [scale=3] {figures/raster/polyxdef-atomdef-open-def-wnd.png} \end{center} \caption[\pxd\ atom definition choosing window]{\textbf{\pxd\ atom definition choosing window.} The user might either select an atom definition already registered with the \pxm\ software suite (upper frame) or select an atom definition that is not registered (lower frame).} \label{fig:polyxdef-atomdef-open-def-wnd} \end{figure} When the atom definition editor shows up, the user sees an interface that allows the addition/removal of isotopes or atoms. This interface (Figure~\vref{fig:polyxdef-atomdef-definition-wnd}) makes it straightforward to edit, down to the finest detail, the definitions of the atoms to be used in the \pxm\ software suite. \begin{figure} \begin{center} \includegraphics [scale=3] {figures/raster/polyxdef-atomdef-definition-wnd.png} \end{center} \caption[\pxd\ atom definition window]{\textbf{\pxd\ atom definition window.} The atom items must contain isotope items, otherwise the atom does not have any ``raison d'\^etre''.} \label{fig:polyxdef-atomdef-definition-wnd} \end{figure} Using the atom definition window is easy. The main idea is that an atom is of no use for doing chemistry as long as it does not have at least one isotope defined as part of it. Thus, to define a new atom, the user clicks the \guilabel{Add Atom} button, which creates a new empty item in the treeview shown in Figure~\vref{fig:polyxdef-atomdef-definition-wnd}. At this point, the user must first name the new atom and give it a symbol (to edit a cell, just click it, make the required edits and validate by pressing \kbdEnterKey). Next, the user adds an isotope to that atom item: clicking the \guilabel{Add Isotope} button creates an empty isotope. The user then fills in the \guilabel{Mono Mass} (monoisotopic mass) field of the newly created item.
The same has to be done for the \guilabel{Abundance} (isotopic abundance) field. Each time a monoisotopic mass/isotopic abundance pair is edited, added or removed from an atom item, the average mass of that atom is recomputed and shown in the \guilabel{Avg Mass} (atom average mass) cell. \begin{figure} \begin{center} \includegraphics [scale=3] {figures/raster/polyxdef-atomdef-error-check-wnd.png} \end{center} \caption[\pxd\ atom syntax-checking window]{\textbf{\pxd\ atom syntax-checking window.} The atom items must contain isotope items, otherwise the atom does not have any ``raison d'\^etre''. Here, the syntax-checking function has found an error, and the message is displayed in a window overlaid onto the definition window.} \label{fig:polyxdef-atomdef-error-check-wnd} \end{figure} At any moment, the user may ask that the syntactic validity of the atoms in the definition be checked, simply by clicking the \guilabel{Check Syntax} button. If something goes wrong, a window shows up describing the error(s) that were encountered. In the example of Figure~\vref{fig:polyxdef-atomdef-error-check-wnd}, the syntax-checking function has detected that atom ``Carbon'' has no isotopic data whatsoever, which, as mentioned earlier, is a real error. Once the atom definition is completed, the user has to register it with the \pxm\ software suite. This task is described in a later chapter about the configuration/data files hierarchy of the \pxm\ software. \renewcommand{\sectitle}{Editing a polymer chemistry definition} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} A polymer chemistry definition is edited using the user interface that shows up when the user selects one of the two submenu items shown in Figure~\ref{fig:polyxdef-polchemdef-menu}.
\begin{figure} \begin{center} \includegraphics [scale=3] {figures/raster/polyxdef-polchemdef-menu.png} \end{center} \caption[\pxd\ polymer chemistry definition menu]{\textbf{\pxd\ polymer chemistry definition menu.} The user might ask that a polymer chemistry definition file be opened for editing or that a new, empty polymer chemistry definition be started for \textit{ex nihilo} editing.} \label{fig:polyxdef-polchemdef-menu} \end{figure} When the user asks to open an existing polymer chemistry definition file, a ``chooser window'' like the one shown in Figure~\vref{fig:polyxdef-polchemdef-open-def-wnd} shows up. \begin{figure} \begin{center} \includegraphics [scale=3] {figures/raster/polyxdef-polchemdef-open-def-wnd.png} \end{center} \caption[\pxd\ polymer chemistry definition choosing window]{\textbf{\pxd\ polymer chemistry definition choosing window.} The user might either select a polymer chemistry definition already registered with the \pxm\ software suite (upper frame) or select a polymer chemistry definition that is not registered (lower frame).} \label{fig:polyxdef-polchemdef-open-def-wnd} \end{figure} When the polymer chemistry definition editor shows up, the user sees an interface that allows the addition/removal of a number of chemical items that define the polymer chemistry (Figure~\vref{fig:polyxdef-polchemdef-whole-definition-wnd}). For example, the user might define any number of monomers to be used later to create polymer sequences. Equally important is the ability to define any kind of chemical modification (Figure~\vref{fig:polyxdef-polchemdef-modifs-definition-wnd}). Chemical or enzymatic cleavage of polymer sequences is common practice in experimental laboratories, and the user can model any kind of chemical/enzymatic cleavage (Figure~\vref{fig:polyxdef-polchemdef-cleavages-definition-wnd}).
Finally, it is crucial that the user be able to define any kind of gas-phase fragmentation for the newly defined polymer chemistry (Figure~\vref{fig:polyxdef-polchemdef-fragmentations-definition-wnd}). \begin{figure} \begin{center} \includegraphics [scale=2.9] {figures/raster/polyxdef-polchemdef-whole-definition-wnd.png} \end{center} \caption[\pxd\ polymer chemistry definition window]{\textbf{\pxd\ polymer chemistry definition window.} The window lets the user define with great flexibility the chemical entities that characterize the polymer chemistry being defined. Here the monomer definition treeview is displayed.} \label{fig:polyxdef-polchemdef-whole-definition-wnd} \end{figure} \begin{figure} \begin{center} \includegraphics [scale=2] {figures/raster/polyxdef-polchemdef-modifs-definition-wnd.png} \end{center} \caption[\pxd\ chemical modifications definition]{\textbf{\pxd\ chemical modifications definition.} The user may define any number of chemical modifications to be applied later to the whole polymer sequence or to any individual monomer.} \label{fig:polyxdef-polchemdef-modifs-definition-wnd} \end{figure} \begin{figure} \begin{center} \includegraphics [scale=2] {figures/raster/polyxdef-polchemdef-cleavages-definition-wnd.png} \end{center} \caption[\pxd\ cleavages definition]{\textbf{\pxd\ cleavages definition.} The user may define any number of chemical/enzymatic cleavages to be applied later to the polymer sequence.} \label{fig:polyxdef-polchemdef-cleavages-definition-wnd} \end{figure} \begin{figure} \begin{center} \includegraphics [scale=2] {figures/raster/polyxdef-polchemdef-fragmentations-definition-wnd.png} \end{center} \caption[\pxd\ fragmentations definition]{\textbf{\pxd\ fragmentations definition.} The user may define any number of gas-phase fragmentation patterns to be applied later to the whole polymer sequence or to any polymer selection (oligomer).} \label{fig:polyxdef-polchemdef-fragmentations-definition-wnd} \end{figure} Now that we
have taken a quick look at what a polymer chemistry definition looks like, we need to go through some details. First, we should explain what the reference to an atom definition file at the top of Figure~\vref{fig:polyxdef-polchemdef-whole-definition-wnd} (under the label \guilabel{Atom Definition To Use}) is for: \pxm\ is able to cope with different atom definitions, so each polymer chemistry definition must unequivocally state which atom definition it has to work with. The combobox list item shown in the figure reads \guival{basic}. This is where the user specifies which atom definition file is to be used with the polymer chemistry definition being worked on. The combobox list widget lists all the atom definitions that were available at the time the window was opened. In the figure it lists only one item: \guival{basic}, which is the basic atom definition file installed by the essential \pxm-common package. Note that the user may also select an atom definition file that is not yet registered with the \pxm\ system. To locate such a file on disk, the user just uses the \guilabel{Locate Atom Definition} button. Once the user chooses a file on disk, its name is shown in the text entry widget below the combobox list. Stating which atom definition file is to be used by a given polymer chemistry definition is of primary importance, because every mass-related computation is performed by looking at the formulae of each chemical entity in the polymer sequence: the conversion of a chemical formula into a molecular mass relies on looking up what a given atom symbol should weigh. This lookup is performed by searching the atom definition for the atom with the proper symbol and reading its isotope(s).
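The lookup just described can be sketched in a few lines of Python. This is not \pxm's code: the dictionary \texttt{ATOM\_DEF} is a hypothetical, in-memory stand-in for an atom definition file, holding a few approximate isotope masses and abundances for illustration only, and we assume, as is conventional, that the monoisotopic mass of an atom is the mass of its most abundant isotope. The \texttt{avg\_mass()} function computes an abundance-weighted mean of the isotope masses, the kind of value shown in the \guilabel{Avg Mass} cell of the atom definition editor.

```python
import re

# Hypothetical stand-in for an atom definition file: each atom symbol maps
# to its isotopes as (monoisotopic mass, abundance in percent) pairs.
# The values are approximate and for illustration only.
ATOM_DEF = {
    "C": [(12.0, 98.93), (13.0033548, 1.07)],
    "H": [(1.0078250, 99.9885), (2.0141018, 0.0115)],
    "O": [(15.9949146, 99.757), (16.9991317, 0.038), (17.99916, 0.205)],
}

# One element token: an uppercase letter, optional lowercase letters, optional count.
TOKEN = re.compile(r"([A-Z][a-z]*)(\d*)")


def mono_mass(symbol):
    """Mass of the most abundant isotope (assumed to be the monoisotopic mass)."""
    return max(ATOM_DEF[symbol], key=lambda iso: iso[1])[0]


def avg_mass(symbol):
    """Abundance-weighted mean of the isotope masses."""
    isotopes = ATOM_DEF[symbol]
    total = sum(abundance for _, abundance in isotopes)
    return sum(mass * abundance for mass, abundance in isotopes) / total


def parse_formula(formula):
    """Turn a formula such as "C2H6O" into {"C": 2, "H": 6, "O": 1}."""
    counts = {}
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        symbol = match.group(1)
        if symbol not in ATOM_DEF:
            raise KeyError(f"atom {symbol!r} is not in the atom definition")
        counts[symbol] = counts.get(symbol, 0) + int(match.group(2) or 1)
        pos = match.end()
    return counts


def formula_mass(formula, mass_of=mono_mass):
    """Sum the per-atom masses looked up in the atom definition."""
    return sum(mass_of(symbol) * n for symbol, n in parse_formula(formula).items())


print(round(formula_mass("H2O"), 4))            # monoisotopic mass
print(round(formula_mass("H2O", avg_mass), 4))  # average mass
```

Because every mass is derived from the isotopes listed for each symbol, changing the atom definition (for instance, adding an isotope) changes every mass computed from it.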
\textsl{Thus, we say that the resolution of the \pxm\ mass spectrometric software suite is isotopic.} The \pxd\ module must be made aware of the contents of the selected atom definition file, so that they can be used by the polymer chemistry definition being elaborated in this session. This is achieved by clicking the \guilabel{Read Atom Definition} button, which triggers the parsing of the file whose name is displayed in the text entry widget sitting right above the buttons. If an error occurs while parsing the atom definition file, a message is displayed to inform the user. Only once the \pxd\ module has successfully parsed the atom definition file (the result is displayed for a short time in the messages text entry widget at the bottom of the window) can the user start defining the new polymer chemistry. We review that process in detail below. The atom definition associated with the polymer chemistry must be registered with the \pxm\ software suite by the time the polymer chemistry definition is used. The way this association is performed is described in a later chapter. \renewcommand{\sectitle}{Various Identification And Singular Data} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} ``Identification data'' are pieces of information that should be defined in order to describe the polymer chemistry (these are non-chemical pieces of information). For example, an identification datum is the polymer chemistry definition type. ``Singular data'' are pieces of information that are present in no more than one copy in the polymer chemistry definition. An example of a singular datum is the string that describes how the elongating polymer sequence should be left- or right-capped so that it reaches its ``finished state'' once the polymerization has terminated.
Looking at Figure~\ref{fig:polyxdef-polchemdef-whole-definition-wnd} while reading the following paragraphs will help. This and subsequent figures illustrate the process by which a ``protein'' polymer chemistry definition is defined. As the reader can see, a number of identification and singular data are to be entered at the top of the polymer chemistry definition window; they are described in the list below: \begin{itemize} \item \guilabel{Polymer Definition Type} \guival{protein} String describing the type of the new polymer chemistry definition being elaborated; \item \guilabel {Polymer Endings' Chemistry (Caps)} Description of the chemical capping reaction that should happen on the left end (\guilabel{Left Cap}) and on the right end (\guilabel{Right Cap}) of the polymer sequence, once it is successfully polymerized. As shown, this chemistry is divided into two pieces of information: \begin{itemize} \item \guilabel{Left Cap} \guival{+H} String describing the actform that should be applied to the left end of the elongating polymer sequence; \item \guilabel{Right Cap} \guival{+OH} String describing the actform that should be applied to the right end of the elongating polymer sequence; \end{itemize} \item \guilabel {Maximum Number of Allowed Characters For A Monomer Code} \guival{1} This integer value indicates the maximum number of characters that may be used to describe monomer codes. See below for details about this critical value; \item \guilabel {Polymer Ionization Rule} This rule describes the manner in which the polymer sequence should be ionized by default when its mass is calculated. This rule actually holds two elements: \begin{itemize} \item \guilabel {Actform} \guival{+H} String describing what chemical reaction should be applied to the polymer in order to ionize it.
Here we ask that all the polymer sequences of the ``protein'' polymer chemistry definition be protonated once by default; \item \guilabel {Charge} \guival{1} Signed numerical value indicating what charge the polymer will hold once the ionization rule's actform has been applied to it. Here, we ask that the proteins bear one positive charge after the default mono-protonation mentioned above has taken place. \end{itemize} \end{itemize} Now that we have defined the identification and singular data for the polymer, we move on to another type of data: ``plural data''. In contrast to singular data, plural data are pieces of information that can be present in more than one copy in the polymer chemistry definition. An example of plural data is the data pertaining to the monomers. \renewcommand{\sectitle}{Various Plural Data} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} \subsection*{The Monomers} \label{sect:monomers} The monomers are the constitutive blocks of the polymer sequence. Their definition should be done with great care, as all the mass calculations are based on the formulae of the defined monomers. Remember that in \pxm\ jargon, ``monomer'' does \emph{not} stand for the molecule that you bought from the chemicals vendor in order to synthesize the polymer; it stands for this molecule \emph{less} the chemical group(s) that left it when the polymerization occurred. If this sounds strange to you, you should definitely read chapter~\vref{chap:basics-polymer-chemistry} for a detailed explanation of the \pxm\ specialized terms. The lower part of Figure~\vref{fig:polyxdef-polchemdef-whole-definition-wnd} shows how easy it is to define a new monomer: it is as easy as entering one string in each of the three columns of a row (a row is created by clicking the \guilabel {Add} button). Note that neither the \guilabel{Name} nor the \guilabel{Formula} string is limited in size.
The case of the \guilabel{Code} string is a bit more complicated and depends on the value entered in the \guilabel {Maximum Number of Allowed Characters For A Monomer Code} field. In our example, this value is \guival{1}, which means that only one character may be used to describe a monomer's code. Thus, we can see in the figure that all the monomers have a single-character code. It is possible, however, to use another value, for example 3. In this case a general rule is enforced in \pxd: \begin{center} \fbox{\parbox{0.9\textwidth}{\textsl {``The first character of a monomer code must be uppercase, while the remaining characters (if any) must be lowercase.''} That means that, in our example of 3-character codes, `A', ``Al'' and ``Ala'' would be perfectly fine, while ``Alan'', ``AL'', `a' and ``AlA'' would be wrong.}} \end{center} The mechanism here is more sophisticated than it may look, because you have to imagine what goes on in the different \pxm\ modules, in particular in the polymer sequence editor (\pxe): how are monomer codes keyed in if `A' and ``Ala'' are both valid monomer codes in a polymer chemistry definition? The magic is described in the chapter about \pxe. Not conforming to the instructions above will yield unpredictable results. \subsection*{The Modifications} \label{sect:modifications} Often, the user will want to modify a polymer chemically. This is especially true when the user tries to mimic polymer chemical modifications that arise in biochemical processes, in particular regulatory modifications such as protein phosphorylations. Indeed, a biopolymer is modified more often than not. A modification can be a phosphorylation of an alcohol function-bearing protein residue, like a seryl residue, or an acetylation of an amino function-bearing residue, like a lysyl residue.
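Whatever the modification, its effect on masses is expressed as an actform. As a rough sketch only (this is not \pxm's parser), the Python code below assumes that an actform is a sequence of signed formulae such as ``-H+H2PO3'', reduces it to a net elemental composition, and converts that composition into a mass change using a small, hypothetical table of approximate monoisotopic atom masses. The function names \texttt{actform\_net()} and \texttt{actform\_mass()} are illustrative. The sketch makes it easy to check that ``+HPO3'', ``-H+H2PO3'' and ``-H+H3PO4-OH'' all describe the same net change.

```python
import re
from collections import Counter

# Hypothetical table of approximate monoisotopic atom masses (illustration only).
MONO = {
    "H": 1.0078250,
    "C": 12.0,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
    "S": 31.9720707,
}

ELEMENT = re.compile(r"([A-Z][a-z]*)(\d*)")  # e.g. "H2", "P", "O3"
TERM = re.compile(r"([+-])([A-Za-z0-9]+)")   # e.g. "-H", "+H2PO3"


def formula_counts(formula):
    """Count atoms in a plain formula such as "H2PO3"."""
    counts = Counter()
    pos = 0
    while pos < len(formula):
        match = ELEMENT.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse formula {formula!r}")
        counts[match.group(1)] += int(match.group(2) or 1)
        pos = match.end()
    return counts


def actform_net(actform):
    """Net elemental change described by an actform such as "-H+H2PO3"."""
    net = Counter()
    pos = 0
    while pos < len(actform):
        match = TERM.match(actform, pos)
        if match is None:
            raise ValueError(f"cannot parse actform {actform!r}")
        sign = 1 if match.group(1) == "+" else -1
        for symbol, n in formula_counts(match.group(2)).items():
            net[symbol] += sign * n
        pos = match.end()
    # Elements whose net count is zero (cancelled terms, or "H0") are dropped.
    return {symbol: n for symbol, n in net.items() if n != 0}


def actform_mass(actform):
    """Mass change of the actform, using the monoisotopic table above."""
    return sum(MONO[symbol] * n for symbol, n in actform_net(actform).items())


for actform in ("+HPO3", "-H+H2PO3", "-H+H3PO4-OH"):
    print(actform, actform_net(actform), round(actform_mass(actform), 4))
```

Comparing net compositions rather than strings is what makes the three phosphorylation actforms equivalent: only the atoms that are gained or lost in the end matter for the mass.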
The \pxm\ mass spectrometry framework gives the user complete freedom to define any number of modifications. Let us see how; once again, looking at Figure~\vref{fig:polyxdef-polchemdef-modifs-definition-wnd} will help. This figure shows, amongst others, how a \emph{Phosphorylation} modification is defined. A modification is defined by a \guilabel{Name} string (of unlimited length) and by an \guilabel{Actform} string (of unlimited length). The syntax of an actform should by now be somewhat familiar to the reader. In the \emph{Phosphorylation} case, it can be read like this: \textsl{``The polymer loses a hydrogen atom and gains H2PO3''}. When the polymer is modified with this modification, its masses change by the mass corresponding to this ``reaction''. The actform is written this way because a chemist always thinks in terms of ``leaving'' and ``entering'' groups. However, a user might perfectly well write ``+HPO3'' instead of ``-H+H2PO3'', or even, more explicitly, ``-H+H3PO4-OH''. All of these actforms are strictly equivalent from a molecular mass point of view (and thus also from \pxm's perspective). \subsection*{The Cleavage Specifications} \label{sect:cleavespecif} It is common practice, in biopolymer chemistry at least, to cut a polymer into pieces using molecular scissors like the following: \begin{itemize} \item proteases, for proteins; \item nucleases, for nucleic acids; \item glycosidases, for saccharides\dots \end{itemize} The molecular scissors are specific to each polymer type: a protease will not cleave a polysaccharide. The specificity of a cleaving agent should thus be described in each polymer chemistry definition, since this specificity is polymer chemistry-specific. Here we show how the user can define the cleavage specificity of a molecular scissor.
As usual, looking at Figure~\vref{fig:polyxdef-polchemdef-cleavages-definition-wnd} might help in reading the following paragraphs. This figure makes it obvious that defining a cleavage specification is a little more involved than defining a modification. The extra complexity, however, is needed only for those cleavage agents that chemically modify the substrate they cleave, which is not that frequent. In Figure~\vref{fig:polyxdef-polchemdef-cleavages-definition-wnd}, the first cleavage specification is ``CyanogenBromide'' (note that there is no space between \emph{Cyanogen} and \emph{Bromide} in the \guilabel{Name} column entry). Let us analyze the data entered by the user in order to fully qualify this cleavage agent (which, unlike the other ones listed in the \guilabel {Name} column of the treeview shown in the figure, is not a protease but a chemical reagent): \begin{itemize} \item \guilabel{Name} \guival{CyanogenBromide} This is merely the name of the cleavage agent; \item \guilabel{Pattern} \guival{M/} This tells the \pxm\ framework where to cleave in the polymer sequence when a CyanogenBromide cleavage is requested. The syntax of the cleavage pattern is detailed below; \item \guilabel{Left Code} and \guilabel{Left Actform} (Empty) This is a special case for those cleavage agents that not only cut a polymer sequence (usually by hydrolysis) but also modify the substrate in a way that must be taken into account by \pxm\ so that it computes correct molecular masses for the resulting oligomers. These rules are optional. However, if \guilabel{Left Code} is filled in, then \guilabel {Left Actform} must be filled in with something valid too, and conversely; \item \guilabel{Right Code} and \guilabel {Right Actform} \guival{M} and \guival{-CH2S+O}, respectively. Same explanation as above.
Here, what we say is that each oligomer resulting from the cleavage of the polymer sequence at an `M' monomer should be modified using the \guilabel {Right Actform} actform. Since the cleavage occurs right of `M', an `M' is logically found at the right end of each oligomer generated upon a ``CyanogenBromide'' cleavage. One special case in which an `M' may be found at the right end of an oligomer without resulting from a polymer sequence cleavage is when that `M' was at the right end of the polymer sequence itself. Of course, this case is checked for and, if it is found, the actform is not applied. \end{itemize} \noindent To best explain the cleavage specification pattern syntax, here are some examples: \begin{itemize} \item \textbf{Trypsin = K/;R/;-K/P} \textsl {``Trypsin cuts right of a `K' and right of an `R'. But it does not cut right of a `K' if this `K' is immediately followed by a `P'.''}; \item \textbf{EndoAspN = /D} \textsl {``EndoAspN cuts left of a `D'.''}; \item \textbf{Hypothetical = T/YS; PGT/HYT; /MNOP; -K/MNOP} \textsl {``Hypothetical cuts after `T' if it is followed by YS and also cuts after `T' if it is preceded by PG and followed by HYT. Also, Hypothetical cuts before `M' if `M' is followed by NOP and if `M' is not preceded by `K'.''}. \end{itemize} \noindent Please \emph{do} note that the letters above correspond to monomer codes and \emph{not} to monomer names. If, for example, we were defining a ``Trypsin'' cleavage specification pattern in a protein polymer chemistry definition with the standard 3-character monomer codes, we would define it this way: ``Trypsin = Lys/;Arg/;-Lys/Pro''. \medskip Now comes the time to explain in more detail what the \guilabel{Left Code} and \guilabel {Left Actform} (along with their \guilabel {Right} siblings) are for.
For this, let us consider the following polymer sequence (1-character monomer codes): \[\mathrm{THIS\mathbf{M}WILL\mathbf{M}BECUT\mathbf{M}ANDTHAT\mathbf{M}ALSO}\] If we cleave this polymer using ``CyanogenBromide'' and if the cleavage is total,\footnote{Cleavage occurs at every possible position, right of each monomer `M'.} we get the following oligomers: \[\mathrm{THIS\mathbf{M}\ WILL\mathbf{M}\ BECUT\mathbf{M}\ ANDTHAT\mathbf{M}\ ALSO}\] But if the cleavage is partial, we would \emph{also} get one or more of these oligomers: \[\mathrm{THISMWILL\mathbf{M}\ BECUTMANDTHAT\mathbf{M}\ ALSO\ WILLMBECUT\mathbf{M}\ ANDTHATMALSO}\] and so on\dots \vspace {\baselineskip} \noindent Now, the biochemist knows that when a protein is cleaved with cyanogen bromide, the cleavage effectively occurs right of monomer `M' (this we know already) \emph{and} that the `M' monomer at which the cleavage occurred is changed from a methionyl residue into a homoseryl residue (this chemical change corresponds to this actform: ``-CH2S+O''). The oligomers of the following two lines should definitely ``undergo the actform'', once only for each oligomer: \[\mathrm{THIS\mathbf{M},\ WILL\mathbf{M},\ BECUT\mathbf{M},\ ANDTHAT\mathbf{M}}\] and \[\mathrm{THISMWILL\mathbf{M},\ BECUTMANDTHAT\mathbf{M},\ WILLMBECUT\mathbf{M}}\] while the two oligomers shown below should not ``undergo the actform'' because (even if one of them does contain an `M' monomer) the cleavage \emph{did not occur} at that `M' monomer: \[\mathrm{ALSO\ ANDTHATMALSO}\] This example should clarify why we explicitly state, in the cleavage specification for ``CyanogenBromide'', that the oligomers resulting from this cleavage should ``undergo the `-CH2S+O' actform'' \textsl{only if they have an `M' as their right end monomer code}. This would become of crucial importance if we had a cleavage agent that cleaved not only right of `M' but also at other places: we would then really need to specify these rules carefully.
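The cleavage logic discussed so far can be sketched in Python. The sketch is hypothetical and is not \pxm's cleavage engine: it assumes single-character monomer codes (so that a sequence is a plain string), reads a pattern such as ``K/;R/;-K/P'' as a list of motifs in which `/' marks the cut position and a leading `-' marks an exception, and simulates partial cleavage with an illustrative \texttt{missed} parameter, the number of consecutive cut sites that may be skipped inside one oligomer. The \texttt{right\_code} test mirrors the \guilabel{Right Code}/\guilabel{Right Actform} rule: the actform applies only to oligomers that end at a genuine cut and whose right end monomer matches.

```python
def parse_pattern(spec):
    """Split e.g. "K/;R/;-K/P" into (is_exception, left, right) motifs."""
    motifs = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        is_exception = item.startswith("-")
        body = item[1:] if is_exception else item
        left, slash, right = body.partition("/")
        if not slash:
            raise ValueError(f"motif {item!r} has no '/' cut mark")
        motifs.append((is_exception, left, right))
    return motifs


def cut_sites(seq, spec):
    """Positions i such that the sequence is cut between seq[i-1] and seq[i]."""
    motifs = parse_pattern(spec)

    def matches(i, left, right):
        return seq[:i].endswith(left) and seq[i:].startswith(right)

    sites = []
    # Positions 0 and len(seq) are the polymer ends, never cleavage sites.
    for i in range(1, len(seq)):
        cut = any(matches(i, lt, rt) for exc, lt, rt in motifs if not exc)
        vetoed = any(matches(i, lt, rt) for exc, lt, rt in motifs if exc)
        if cut and not vetoed:
            sites.append(i)
    return sites


def digest(seq, spec, right_code="", missed=0):
    """Return (oligomer, right_actform_applies) pairs, allowing missed cleavages."""
    bounds = [0] + cut_sites(seq, spec) + [len(seq)]
    result = []
    for a in range(len(bounds) - 1):
        for b in range(a + 1, min(a + 2 + missed, len(bounds))):
            start, end = bounds[a], bounds[b]
            oligomer = seq[start:end]
            # Apply the right actform only if the oligomer ends at a real cut
            # (not at the polymer's own right end) and ends with right_code.
            applies = bool(right_code) and end < len(seq) and oligomer.endswith(right_code)
            result.append((oligomer, applies))
    return result


for oligomer, applies in digest("THISMWILLMBECUTMANDTHATMALSO", "M/", "M", missed=1):
    print(oligomer, "-CH2S+O" if applies else "")
```

Run on the example sequence with one allowed missed cleavage, the sketch flags exactly the seven `M'-terminated oligomers listed above, and leaves ``ALSO'' and ``ANDTHATMALSO'' unmodified. With the pattern ``M/;C/'', oligomers ending at a `C' cut would likewise be left untouched, because they do not end with the right code `M'.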
For example, imagine you had noted, in your many cyanogen bromide experiments, that cyanogen bromide quite often cleaves right of `C' (cysteine) residues, but without any chemical modification of the `C' monomer.\footnote {This is a purely hypothetical situation that I never observed personally!} In this case, you would be glad to be able to specify that the generated oligomers should ``undergo the `-CH2S+O' actform'' only if they have an `M' as their right end monomer, so that `C'-terminated oligomers are not chemically modified. You could thus safely define this pattern:~``M/;C/''\dots\ The logical conditions that the user can set forth for a cleavage reaction are called (intuitively) \emph{cleavage rules}. Now that we are trained to think abstractly in terms of these cleavage rules, we can proceed to yet meatier stuff: the fragmentation specifications. A polymer chemistry definition can hold as many fragmentation specifications as necessary. A fragmentation specification holds a number of pieces of information, amongst which there is a compound datum describing logical conditions similar to, but more complex than, cleavage rules: \emph{fragmentation rules}. Each fragmentation specification might have any number (zero or more) of fragmentation rules. We review this complex matter in the next section. \subsection*{The Fragmentation Specifications} \label{sect:fragspecif} As you might have noticed when reading page~\pageref {sect:polymer-fragmentation}, fragmentation specification is a tricky business. Figure~\ref {fig:polyxdef-polchemdef-fragmentations-definition-wnd} shows examples of protein fragmentation specifications for fragment types \emph{a, b, c, z, y, x, imm}. Let us concentrate on the fragmentation specification of type \emph{a}.
While the first row of this fragmentation specification is genuinely valid (for a ``protein'' polymer chemistry definition, at least), the lower two rows (describing fragmentation rules named \emph {a-fgr-1} and \emph {a-fgr-2}) are fictitious, included only to show how fully qualified fragmentation specifications can be created. Let us analyze the data that the user entered to fully qualify this \emph {a} fragmentation specification: \begin{itemize} \item \guilabel{Name} \guival{a} This is the name of the fragmentation specification. Fragments obtained with this specification are named according to the following naming scheme: ``a-\emph{i}'', with \emph{a} being the fragmentation name and \emph{i} being the position, in the precursor polymer ion, of the monomer at which the fragmentation occurred (see page~\pageref {sect:polymer-fragmentation}); \item \guilabel {End} \guival{LE} This is the end of the precursor polymer that is to be found in the fragment. Accepted values are ``LE'' (left end), ``RE'' (right end) and ``NE'' (no end). We have previously seen, for proteins and nucleic acids, that fragments \emph{a, b, c} include the left end (``LE'') of the precursor polymer, while ``RE'' applies to fragmentation specifications that lead to fragments containing the right end of the precursor polymer (for example, fragments \emph{x, y, z}). Special cases, like proteinaceous immonium ions, do not bear any end of the precursor polymer, in which case ``NE'' (for no end) should be written here instead of ``LE''. This \guilabel {End} piece of information is important for two reasons: 1) it tells the fragmentation engine from which end of the precursor polymer sequence it should iterate when making all the fragments of a given fragment ion series and 2) it guides \pxm\ in applying the conventional naming scheme, using \emph{i} with the proper value.
Therefore, the smallest fragment of the \emph{a} series is \emph{a-\textbf{1}} (note the index 1), which is the left end monomer of the precursor polymer. The smallest fragment of the \emph{x} series is \emph{x-\textbf{1}} (note that the index is also 1). This time, however, the \emph{x-\textbf{1}} fragment corresponds to the right end monomer of the polymer sequence. This is because the numbering of the fragments always starts at the precursor polymer's end specified by the \guilabel {End} piece of data in the polymer chemistry definition; \item \guilabel{Actform} \guival{-C1O1} Optional. This is the chemical reaction that actually changes a monomer chain into the proper fragment. Indeed, the fragment's mass is calculated by summing the masses of the monomers running from the \emph{end} of the precursor polymer up to the position where the fragmentation occurs, and then adding the mass of the end's cap as specified in the polymer chemistry definition. But, for the \emph{a} fragments, this is not enough, as it does not lead to the correct mass: the actform ``-C1O1'' must be applied to the monomer chain so that it has the correct mass (after the mass corresponding to the left cap has been added; see below). This actform is optional because, for some fragments (for example, fragments \emph {b} in the protein polymer chemistry), no actform is needed besides adding the masses of the monomers and the mass corresponding to the left cap of the polymer chemistry definition. As can be seen in the picture, ``-H0'' (which removes zero hydrogen atoms, that is, a null actform) is set as the actform for \emph {b} fragments. Again, see page~\pageref{sect:polymer-fragmentation}; \item \guilabel {Comment} (Empty) Optional. This is simply a comment, should the user want to set one. \textit{Ad libitum}. \end{itemize} \noindent A fragmentation specification can include zero or more fragmentation rules that help model, in a highly detailed manner, complex fragmentation patterns.
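The mass calculation just described can be sketched as follows, for the ``LE'' and ``RE'' cases only. The sketch is hypothetical: it assumes single-character monomer codes, takes the per-monomer masses, the end cap mass and the actform mass change as plain numbers supplied by the caller (for the \emph{a} series, the ``-C1O1'' actform corresponds to a monoisotopic change of about $-27.9949$), and generates fragments with \emph{i} running from 1 to one less than the length of the precursor, which is an assumption of the sketch rather than documented \pxm\ behaviour. The function name \texttt{fragment\_series()} is illustrative.

```python
def fragment_series(seq, name, end, monomer_mass, cap_mass, actform_delta=0.0):
    """Return (label, fragment, mass) tuples for one fragment ion series.

    seq            precursor sequence, one character per monomer code
    name           fragmentation specification name, e.g. "a"
    end            "LE" (fragments keep the left end) or "RE" (right end)
    monomer_mass   dict mapping each monomer code to its mass
    cap_mass       mass of the cap of the end kept in the fragment
    actform_delta  mass change of the specification's actform (0 if none)
    """
    if end not in ("LE", "RE"):
        raise ValueError("this sketch only handles 'LE' and 'RE' series")
    series = []
    for i in range(1, len(seq)):
        # Numbering always starts at the end that is kept in the fragment.
        fragment = seq[:i] if end == "LE" else seq[-i:]
        mass = sum(monomer_mass[code] for code in fragment) + cap_mass + actform_delta
        series.append((f"{name}-{i}", fragment, mass))
    return series


# Toy, made-up masses, only to show the shape of the result.
toy = {"A": 1.0, "B": 2.0, "C": 4.0}
for label, fragment, mass in fragment_series("ABC", "x", "RE", toy, cap_mass=0.5):
    print(label, fragment, mass)
```

With \texttt{end="RE"}, \texttt{x-1} is the right end monomer alone, as stated above; switching to \texttt{end="LE"} makes \texttt{a-1} the left end monomer instead.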
Let us see what it takes to define a fragmentation rule: \begin{itemize} \item \guilabel {Name} \guival{a-fgr-1} This is the name of the fragmentation rule. It should be self-explanatory and should somehow hint that this fragrule belongs to the \emph{a} fragmentation specification; \item \guilabel {Prev} \guival{E} Optional. This is one of the logical conditions that can be set to be verified so that the actform can be applied to the fragment currently generated. In our example, we are saying that if, in the precursor ion sequence, the monomer preceding the one currently fragmented is of code `E', then this condition is verified and the \guival{+H200} actform should be applied to the resulting fragment; \item \guilabel{This} \guival{D} Optional. This is a condition analogous to the one above, except that it applies to the monomer actually being fragmented; \item \guilabel{Next} \guival{F} Optional. This is a similar condition, except that it applies to the monomer one position forward in the precursor ion sequence, with respect to the presently fragmented position; \item \guilabel{Actform} \guival{+H200} This is the chemical action that is actually applied to the fragment if the set of logical conditions above is verified. This actform is the \emph{raison d'\^etre} of the fragmentation rule, so it is compulsory; \item \guilabel{comment} \guival{comment here!} Optional. \textit{Ad libitum}. \end{itemize} A fragmentation rule is a set of one or more logical conditions that, if verified, cause a user-specified chemical actform to be applied to the fragment that was generated in the first place by fragmenting the precursor polymer using the fragmentation specification to which the fragrule itself belongs.
As can be seen in the example figure, the fragmentation specification for the \emph{a} fragments (fragmentation specification \emph{a}) contains two fragmentation rules, but it could contain as many of them as necessary to finely describe experimentally observed fragmentation events. The following paragraphs explain in detail how fragmentation rules modify the way fragments are generated for a given fragmentation pattern. We have seen, in our example of a fragmentation specification named \emph{a} (Figure~\ref{fig:polyxdef-polchemdef-fragmentations-definition-wnd}), that it generates fragments starting from the left end of the precursor polymer. Now we see that the fragmentation specification includes a fragmentation rule whose \guilabel{This} condition is set to `D': this fragmentation rule is evaluated further \emph{only} if the monomer currently fragmented is indeed a `D'. If not, the whole fragmentation rule is skipped. If \guilabel{Prev} is set (for us: `E'), the fragmentation rule is evaluated further only if the monomer at position [current -1] is an `E'; if not, the fragmentation rule is skipped. If \guilabel{Next} is set (for us: `F'), the fragmentation rule is evaluated further only if the monomer at position [current +1] is an `F'; if not, the fragmentation rule is skipped. What is called position \textbf{[current +1]} and position \textbf{[current -1]} depends on the kind of fragmentation specification: if the fragmentation specification states that \guilabel{End} (seen earlier) is ``LE'' (or ``NE''), then position [current +1] refers to the position right of the currently fragmented monomer (in the standard left-to-right horizontal representation of a polar polymer); if the fragmentation specification states that \guilabel{End} is ``RE'', then position [current +1] refers to the position left of the currently fragmented monomer.
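The orientation rule just stated can be expressed as a small Python function. This is a sketch under our own assumptions (0-based indices into a string of one-letter monomer codes; the function names are hypothetical): it returns the codes found at positions [current -1] and [current +1] for a given \guilabel{End} value, or \texttt{None} past either end of the sequence.

\begin{verbatim}
def neighbour_indices(index, end):
    """Return the (prev, next) indices around the fragmented monomer."""
    if end in ("LE", "NE"):
        # Numbering runs left to right: [current -1] is on the left.
        return index - 1, index + 1
    if end == "RE":
        # Numbering runs right to left: [current -1] is on the right.
        return index + 1, index - 1
    raise ValueError(f"unknown End value {end!r}")


def neighbour_codes(sequence, index, end):
    """Monomer codes at [current -1] and [current +1]; None past either end."""
    def code_at(i):
        return sequence[i] if 0 <= i < len(sequence) else None

    prev_i, next_i = neighbour_indices(index, end)
    return code_at(prev_i), code_at(next_i)


if __name__ == "__main__":
    seq = "MYNAMEISEDFFIL"
    at_d = seq.index("D")
    print(neighbour_codes(seq, at_d, "LE"))
    print(neighbour_codes(seq, at_d, "RE"))
\end{verbatim}

With the sequence MYNAMEISEDFFIL fragmented at its `D', the previous and next codes are `E' and `F' for an ``LE'' specification, but `F' and `E' for an ``RE'' one.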
This has to do with the way fragmentations are normally described: the fragment numbering scheme starts at the right end of the precursor polymer for ``RE'' fragments and at the left end of the precursor polymer for ``LE'' fragments. This holds here too: for a fragment of the series \emph{a}, the fragmentation rule that we have described would effectively be applied to the following sequence: \[\mathrm{MYNAMEIS\textbf{EDF}FIL}\] \emph{only} upon generation of the $\mathrm{MYNAMEIS\textbf{ED}}$ fragment. \bigskip\noindent If we were using the same fragmentation rule for a fragment of the series \emph{x} (for which \guilabel{End} is ``RE''), the conditions of the fragmentation rule would never have been met in that sequence, because the monomer at position [current -1] of the `D' would then be the `F' on its right. Instead, for the following sequence: \[\mathrm{MYNAMEIS\textbf{FDE}FIL}\] they would have been met, and the fragmentation rule would thus have been applied upon generation of the $\mathrm{\textbf{DE}FIL}$ fragment. \bigskip\noindent Now, what about internal fragment specifications, like the immonium ions' case, where \guilabel{End} is defined to be ``NE'' in the polymer chemistry definition? \pxm\ evaluates the conditions from left to right, so the conditions are evaluated as in the ``LE'' case. Another important point: how are the logical conditions tested? The main condition (entered as \guilabel{This}) is evaluated first, because it is the simplest evaluation: the value of the \guilabel{This} monomer can be compared with the code of the currently fragmented monomer without depending on the \guilabel{End} value. If the monomer context complies with this condition (in our example, this would mean that we are actually fragmenting at a `D' monomer), the other conditions (if any) are evaluated. Thus, in logic terminology, the conditions are \emph{AND}ed with one another: every condition that is stated must be verified. If \emph{any} condition is not verified, no fragment is created for that fragmentation rule and the other fragmentation rules (if any) are analysed.
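The evaluation order and the \emph{AND} logic described above can be sketched as follows. This Python code is an illustration only, not the \pxm\ implementation: sequences are strings of one-letter codes, indices are 0-based, an unset condition is \texttt{None}, and the function names are hypothetical. The demonstration checks the rule against the two sequences used above.

\begin{verbatim}
def rule_applies(sequence, index, end, this_code=None, prev_code=None,
                 next_code=None):
    """AND-evaluate the conditions of a fragrule at the fragmented index."""
    # "This" is tested first: it does not depend on the End value.
    if this_code is not None and sequence[index] != this_code:
        return False
    # [current -1]/[current +1] follow the numbering; NE is read like LE.
    step = -1 if end == "RE" else 1

    def code_at(i):
        return sequence[i] if 0 <= i < len(sequence) else None

    if prev_code is not None and code_at(index - step) != prev_code:
        return False
    if next_code is not None and code_at(index + step) != next_code:
        return False
    return True  # every condition that was set is verified


def end_fragment(sequence, index, end):
    """Monomers of the LE or RE fragment produced by cleaving at index."""
    if end == "LE":
        return sequence[: index + 1]
    if end == "RE":
        return sequence[index:]
    raise ValueError("only LE and RE fragments are anchored to an end")


if __name__ == "__main__":
    rule = {"this_code": "D", "prev_code": "E", "next_code": "F"}
    for seq, end in (("MYNAMEISEDFFIL", "LE"), ("MYNAMEISEDFFIL", "RE"),
                     ("MYNAMEISFDEFIL", "RE")):
        i = seq.index("D")
        if rule_applies(seq, i, end, **rule):
            print(seq, end, "->", end_fragment(seq, i, end))
        else:
            print(seq, end, "-> rule skipped")
\end{verbatim}

Cleaving MYNAMEISFDEFIL at its `D' with \guilabel{End} set to ``RE'', the sketch finds the rule applicable and the \emph{x} fragment covers the monomers DEFIL.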
If there is more than one fragmentation rule in a fragmentation specification, each fragmentation rule is evaluated separately. If the monomer context (previous/this/next monomer codes) complies with the logical conditions stated in the evaluated fragmentation rule, a new fragment is generated. When a fragmentation rule is found not to comply with the monomer context, it is simply skipped (no fragment is generated for it). It should be noted that the presence of fragmentation rules in a fragmentation specification is not exclusive: even if the fragmentation rules contain logical conditions that are never satisfied,\footnote{For example, if the \guilabel{This}, \guilabel{Prev} and \guilabel{Next} codes are all `Y' and the polymer contains no ``YYY'' sequence element.} a fragment is still generated that corresponds to the fragmentation specification alone, without taking into account any fragmentation rule. Since each fragmentation rule whose logical conditions are verified in the sequence yields a new fragment, fragmentation rules are not summative: a fragment is never generated by applying onto it the actforms of all the validated fragmentation rules of a fragmentation specification. Instead, each validated fragmentation rule gives rise to its own fragment ion, resulting from the application of both the actform of the fragmentation specification (if any) and the actform of the fragmentation rule (which is compulsory). Next, when another fragmentation rule of the same fragmentation specification is evaluated, a brand new fragment is generated by the same process. As an example of how fragmentation rules might be added to a given fragmentation specification, let's take the `a' and ``a-B'' fragments of the oligonucleotide chemistry.
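The non-summative behaviour can be made explicit with the following Python sketch. It assumes, as we read the text above, that the rule-free fragment of the specification is always produced; specifications and rules are plain dictionaries whose keys are our own choice, the monomer context is the (previous, this, next) triple of codes, and the second rule \emph{a-fgr-2} is hypothetical. Each fragment is returned together with the list of actforms applied to it.

\begin{verbatim}
def matches(rule, context):
    """AND of the conditions set in rule; a missing one matches anything."""
    keys = ("prev", "this", "next")
    return all(rule.get(key) in (None, code) for key, code in zip(keys, context))


def fragments_at_cleavage(spec, context):
    """Fragments yielded by one fragmentation specification at one cleavage.

    spec has a "name", an optional "actform" and a list of "rules"; each rule
    has a "name", a compulsory "actform" and optional "prev"/"this"/"next".
    """
    base = [spec["actform"]] if spec.get("actform") else []
    # The specification alone always yields its own, rule-free fragment.
    fragments = [(spec["name"], base)]
    for rule in spec.get("rules", []):
        if matches(rule, context):
            # Not summative: the spec actform plus this rule's actform only.
            fragments.append((rule["name"], base + [rule["actform"]]))
    return fragments


A_SPEC = {
    "name": "a",
    "actform": "-C1O1",
    "rules": [
        {"name": "a-fgr-1", "prev": "E", "this": "D", "next": "F",
         "actform": "+H200"},
        {"name": "a-fgr-2", "this": "D", "actform": "-H1"},  # hypothetical
    ],
}

if __name__ == "__main__":
    for context in (("E", "D", "F"), ("Y", "Y", "Y")):
        print(context, fragments_at_cleavage(A_SPEC, context))
\end{verbatim}

In the (E, D, F) context both rules are verified, yet no fragment receives both \texttt{+H200} and \texttt{-H1}: three distinct fragments are produced, the first one being the rule-free fragment.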
The `a' fragments might exist \textit{per se}; however, a further fragmentation event may remove the nucleobase from the nucleotide that undergoes the fragmentation. The generated fragments are called ``a-B''. Let us see how we could model this chemistry. \\ First look at the conventional `a' fragmentation pattern: \\ \begin{verbatim}
<fragspecs>
  <fgs>
    <name>a</name>
    <end>LE</end>
    <actform>-O</actform>
  </fgs>
\end{verbatim} And now look at the corresponding ``a-B'' abasic fragmentation pattern: \\ \begin{verbatim}
  <fgs>
    <name>a-B</name>
    <end>LE</end>
    <actform>-O</actform>
    <fgr>
      <name>a-B-c</name>
      <actform>-C4H4N3O</actform>
      <this-mnm-code>C</this-mnm-code>
      <comment>a-cytosine</comment>
    </fgr>
    <fgr>
      <name>a-B-a</name>
      <actform>-C5H4N5</actform>
      <this-mnm-code>A</this-mnm-code>
      <comment>a-adenine</comment>
    </fgr>
    <fgr>
      <name>a-B-t</name>
      <actform>-C5H5N2O2</actform>
      <this-mnm-code>T</this-mnm-code>
      <comment>a-thymine</comment>
    </fgr>
    <fgr>
      <name>a-B-g</name>
      <actform>-C5H4N5O</actform>
      <this-mnm-code>G</this-mnm-code>
      <comment>a-guanine</comment>
    </fgr>
    <comment>abasic a fragment</comment>
  </fgs>
\end{verbatim} For each of the four bases, the model instructs the fragmentation engine to remove the formula of the base minus one hydrogen atom if the fragmentation occurs precisely at that base in the oligonucleotide. \renewcommand{\sectitle}{Saving A Polymer Chemistry Definition} \section*{\sectitle} \addcontentsline{toc}{section}{\numberline{}\sectitle} \label{sect:save-polymer-chemistry-definiton} Once the polymer chemistry definition is completed, the user can save it to a file. Before actually writing to the file, the program checks the syntactic validity of the elements that the user has entered in the window. This check can also be triggered manually by clicking the \guilabel{Check Syntax} button.
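We do not detail here the exact list of checks that \pxm\ performs. Purely as an illustration, the following Python sketch (using the standard \texttt{xml.etree.ElementTree} and \texttt{re} modules) runs a few plausible checks on a trimmed copy of the \emph{a-B} fragmentation specification shown earlier: the specification has a name, its end is one of LE, RE or NE, its optional actform is well formed, and each of its fragmentation rules carries the compulsory actform. The actform grammar encoded in the regular expression, and the function names, are our assumptions.

\begin{verbatim}
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the a-B specification (two of its four rules).
FGS_XML = """<fgs>
  <name>a-B</name>
  <end>LE</end>
  <actform>-O</actform>
  <fgr>
    <name>a-B-c</name>
    <actform>-C4H4N3O</actform>
    <this-mnm-code>C</this-mnm-code>
    <comment>a-cytosine</comment>
  </fgr>
  <fgr>
    <name>a-B-a</name>
    <actform>-C5H4N5</actform>
    <this-mnm-code>A</this-mnm-code>
    <comment>a-adenine</comment>
  </fgr>
  <comment>abasic a fragment</comment>
</fgs>"""

# Assumed actform grammar: one or more signed blocks of element symbols,
# each with an optional count (e.g. "-O", "-C1O1", "+H2O1-C1").
ACTFORM = re.compile(r"(?:[+-](?:[A-Z][a-z]?\d*)+)+")
ENDS = ("LE", "RE", "NE")


def text_of(element, tag):
    """Stripped text of the direct child tag, or an empty string."""
    return (element.findtext(tag) or "").strip()


def check_fgs(fgs):
    """Return human-readable problems found in one fgs element."""
    errors = []
    name = text_of(fgs, "name")
    if not name:
        errors.append("fragmentation specification without a name")
    if text_of(fgs, "end") not in ENDS:
        errors.append(f"{name}: end must be one of {', '.join(ENDS)}")
    actform = text_of(fgs, "actform")
    if actform and not ACTFORM.fullmatch(actform):  # optional for an fgs
        errors.append(f"{name}: malformed actform {actform!r}")
    for fgr in fgs.findall("fgr"):
        rule = text_of(fgr, "name") or "unnamed fgr"
        if not ACTFORM.fullmatch(text_of(fgr, "actform")):  # compulsory
            errors.append(f"{name}/{rule}: missing or malformed actform")
    return errors


if __name__ == "__main__":
    print(check_fgs(ET.fromstring(FGS_XML)) or "no error found")
\end{verbatim}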
If an error is found in the polymer chemistry definition being worked on, that error is displayed in a window so that the user may identify the problem and fix it. When saving a polymer chemistry definition, if no error is detected, the program proceeds with writing it to an \fileformat{XML} file. The location where the file should be saved, and the manner in which it may be made available to the whole \pxm\ framework, are described in a later chapter. In that chapter, the user will be instructed on how to ensure that the newly made polymer chemistry definition uses the proper atom definition. Indeed, \pxm\ is a rather powerful framework, wholly designed to be modular. But this modularity and power come at a cost: complexity. A well configured system is the key to a powerful program running smoothly. It is thus very important to grasp the \pxm\ framework configuration data hierarchy, so that the program knows at each instant where to find the configuration and data files required to perform both the polymer sequence display and the mass calculations properly. But for now, let's move on to the polymer chemistry definition-aware calculator: \pxc!

\cleardoublepage

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "polyxmass"
%%% End: