<html><head><title>gp_cdndev</title>

<link rev="made" href="mailto:january@bioinformatics.org">
</head>
<body bgcolor="#FFFFFF" link="#FF0000">

<hr>

<h1>gp_cdndev</h1>
<h2>GP</h2>
<h2>2000</h2>


     
<h2>NAME</h2>
    gp_cdndev - calculate the codon bias of sequence(s)
<p><h2>SYNOPSIS</h2>
    
    <strong>gp_cdndev</strong> [options] &lt;codon usage file&gt; [inputfile] [outputfile]
<p><h2>OPTIONS</h2>
    
<p><dl>
<p><p></p><dt><strong>-o</strong><dd> Show bias for all ORFs read
<p><p></p><dt><strong>-t</strong><dd> Show the total bias for the set of ORFs read
<p><p></p><dt><strong>-b</strong><dd> Both of the above
<p><p></p><dt><strong>-c file</strong><dd> Read the alternate genetic code from <strong>file</strong>
<p><p></p><dt><strong>-v</strong><dd> Prints the version information.
<p><p></p><dt><strong>-d</strong><dd> Prints lots of debugging information.
<p><p></p><dt><strong>-h</strong><dd> Shows usage information.
<p><p></p><dt><strong>codon usage file</strong><dd> file containing a certain codon usage distribution
<p><p></p><dt><strong>inputfile</strong><dd> file to process; if not given, will use standard input
<p><p></p><dt><strong>outputfile</strong><dd> file to write the data to; if not given, will use
    standard output
<p></dl>
<p><h2>DESCRIPTION</h2>
    
<p>Codon usage is related to the levels of protein expression. It is possible
		to predict expression of an ORF or a set of ORFs by comparing codon
		usage of those ORFs to the codon usage of genes with known levels of protein
		expression. There are different methods to measure this codon bias; the one
		used by <strong>gp_cdndev</strong> is described by S. Karlin and J. Mrazek, 2000, J.
		Bact. 182:5238-5250. Basically, it is a sum of absolute differences in the
		codon frequencies of the reference and input set, weighted by the frequencies
		of respective amino acids. Refer to the above paper for more details.
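<p>As a rough illustration of the measure described above, here is a minimal Python
		sketch. It is not taken from the <strong>gp_cdndev</strong> source: the function names,
		the normalization of codon frequencies within each amino acid, the use of the
		standard genetic code and the handling of stop codons as an ordinary group are
		all assumptions made for the example.
<p><pre>
from collections import defaultdict
from itertools import product

# Standard genetic code in TCAG order (assumed; gp_cdndev can read another with -c).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(seq):
    """Count codons three nucleotides at a time, with no ORF validation."""
    counts = defaultdict(float)
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in GENETIC_CODE:  # skip triplets with ambiguous symbols
            counts[codon] += 1.0
    return counts


def per_amino_acid(counts):
    """Return codon frequencies summing to 1 per amino acid, plus amino acid totals."""
    totals = defaultdict(float)
    for codon, n in counts.items():
        totals[GENETIC_CODE[codon]] += n
    freqs = {c: n / totals[GENETIC_CODE[c]] for c, n in counts.items() if n}
    return freqs, totals


def codon_bias(input_counts, reference_counts):
    """Sum of absolute codon frequency differences, weighted by the
    amino acid frequencies of the input set (assumed reading of the measure)."""
    f, aa_totals = per_amino_acid(input_counts)
    c, _ = per_amino_acid(reference_counts)
    grand_total = sum(aa_totals.values())
    bias = 0.0
    for codon, aa in GENETIC_CODE.items():
        if aa_totals.get(aa, 0.0) == 0.0:
            continue  # amino acid absent from the input set: weight is zero
        weight = aa_totals[aa] / grand_total
        bias += weight * abs(f.get(codon, 0.0) - c.get(codon, 0.0))
    return bias
</pre>
<p>Under these assumptions, identical input and reference sets give a bias of 0.0,
		and two sets that use disjoint codons for every amino acid give 2.0, the
		largest value this normalization allows.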
<p>First you will have to record the codon usage of your reference sequence
		set, for example using the <a href="gp_cusage.html">gp_cusage(1)</a> program.
<p>The format of the codon table is as follows: each line contains a codon and
		its frequency (trailing information on the line is skipped); empty
		lines and lines starting with a hash ('#') are ignored. The file does not
		have to contain information about all codons: it is enough to specify the
		codons that have a frequency greater than 0.0. Here is an example:
<p><pre>

			# comment -- this line will be ignored
			GCC 1.35
			# the codon GCC has the frequency of 1.35
		
</pre>
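<p>The table format described above is simple enough to read in a few lines. The
		following Python sketch is only an illustration: <code>parse_codon_table</code> is a
		hypothetical helper, and details such as splitting on any whitespace and
		converting codons to upper case are assumptions, not documented behaviour of
		<strong>gp_cdndev</strong>.
<p><pre>
def parse_codon_table(lines):
    """Return a {codon: frequency} dict from an iterable of text lines."""
    table = {}
    for line in lines:
        line = line.strip()
        # Empty lines and lines starting with a hash are ignored.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # First field is the codon, second its frequency; the rest is skipped.
        table[fields[0].upper()] = float(fields[1])
    return table


sample = """
# comment -- this line will be ignored
GCC 1.35
# the codon GCC has the frequency of 1.35
GCT 0.40 trailing text is skipped
"""

table = parse_codon_table(sample.splitlines())
# Codons missing from the file are treated as having frequency 0.0.
gca_frequency = table.get("GCA", 0.0)
</pre>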

<p>Note that <strong>gp_cdndev</strong> does not check whether the sequence is a valid ORF
		or not. It just takes three nucleotides at a time, checks what they code for,
		and records the respective codon frequency.
<p><h2>EXAMPLES</h2>
    
<p>1. Highly expressed ribosomal sequences are stored in the file <strong>ribo.fasta</strong>;
	some unknown ORFs are stored in <strong>mystery.fasta</strong>. Now you'd like to calculate
	the codon bias of the sequences in mystery.fasta with respect to the set of
	ribosomal sequences. Here is how you do it:
<p>a. Calculate the codon usage of ribosomal sequences and write it to a file
	called <strong>ribo.cdu</strong>:
<p><code>gp_cusage ribo.fasta ribo.cdu</code> 
<p>or
<p><code>gp_cusage ribo.fasta &gt; ribo.cdu</code>
<p>b. Calculate the bias of <strong>mystery.fasta</strong>:
<p><code>gp_cdndev ribo.cdu mystery.fasta</code>
<p>2. Just like the example above, but you'd like to know the total bias of the
	set of sequences stored in <strong>mystery.fasta</strong>:
<p><code>gp_cusage ribo.fasta &gt; ribo.cdu</code>
<p><code>gp_cdndev -t ribo.cdu mystery.fasta</code>
<p><h2>SEE ALSO</h2>
    
<a href="index.html">Genpak(1)</a> 
<a href="gp_acc.html">gp_acc(1)</a> 
<a href="gp_adjust.html">gp_adjust(1)</a> 
<a href="gp_cusage.html">gp_cusage(1)</a> 
<a href="gp_digest.html">gp_digest(1)</a> 
<a href="gp_dimer.html">gp_dimer(1)</a> 
<a href="gp_findorf.html">gp_findorf(1)</a> 
<a href="gp_gc.html">gp_gc(1)</a> 
<a href="gp_getseq.html">gp_getseq(1)</a> 
<a href="gp_map.html">gp_map(1)</a> 
<a href="gp_matrix.html">gp_matrix(1)</a> 
<a href="gp_mkmtx.html">gp_mkmtx(1)</a> 
<a href="gp_pattern.html">gp_pattern(1)</a> 
<a href="gp_primer.html">gp_primer(1)</a> 
<a href="gp_qs.html">gp_qs(1)</a> 
<a href="gp_randseq.html">gp_randseq(1)</a> 
<a href="gp_seq2prot.html">gp_seq2prot(1)</a> 
<a href="gp_slen.html">gp_slen(1)</a> 
<a href="gp_tm.html">gp_tm(1)</a> 
<a href="gp_trimer.html">gp_trimer(1)</a> 
<p><h2>DIAGNOSTICS</h2>
    
<p>All <strong>Genpak</strong> programs complain in situations where you would also complain,
such as when they cannot find a sequence you gave them or the sequence is not
valid. 
<p>The <strong>Genpak</strong> programs do not write over existing files. I have found this
feature very useful :-)
<p><h2>BUGS</h2>
    
<p>I'm sure there are plenty left, so please mail me if you find them. I tried
to clean up every bug I could find.
<p><h2>AUTHOR</h2>
    
<p>January Weiner III
		<a href="mailto:january@bioinformatics.org">&lt;january@bioinformatics.org&gt;</a>    
</body>
</html>