<html>
<head>
<!-- This file has been generated by unroff 1.0, 03/11/09 12:56:14. -->
<!-- Do not edit! -->
<STYLE TYPE="text/css">
<!--
        A:link{text-decoration:none}
        A:visited{text-decoration:none}
        A:active{text-decoration:none}
        OL,UL,P,BODY,TD,TR,TH,FORM { font-family: arial,helvetica,sans-serif; font-size:small; color: #333333; }

        H1 { font-size: x-large; font-family: arial,helvetica,sans-serif; }
        H2 { font-size: large; font-family: arial,helvetica,sans-serif; }
        H3 { font-size: medium; font-family: arial,helvetica,sans-serif; }
        H4 { font-size: small; font-family: arial,helvetica,sans-serif; }
-->
</STYLE>
<title>ploticus: input data formats</title>
<body bgcolor=D0D0EE vlink=0000FF>
<br>
<br>
<center>
<table cellpadding=2 bgcolor=FFFFFF width=550><tr>
<td>
  <table cellpadding=2 width=550><tr>
  <td><br><h2>Input data formats</h2></td>
  <td align=right>
  <small>
  <a href="../doc/welcome.html"><img src="../doc/ploticus.gif" border=0></a><br>
  Version 2.41 Mar2009
  </small></td></tr></table>
</td></tr>
<td>
<br>
<br>

<title>Manual page for Input_data_formats(PL)</title>
</head>
<body>

<p>
Ploticus can read tabular ASCII data from files, from the output of commands, or from standard input.
If you're using prefabs, specify your data source on the command line, e.g. <tt>data=myfile.dat</tt>.
If you're writing scripts,
<a href="getdata.html">
 proc getdata
</a>
is used to read data. Data can also be embedded directly into scripts.

<br><br>
<h2>See also</h2>
<a href="getdata.html">
 proc getdata
</a>



<br><br><br>
<h2>Plotting from data fields</h2>
<p>
Plotting and data display operations are done
using fields.  Suppose we have a data set like this in the file <tt>myfile.dat</tt>:
<pre>
   F1 2.43 0.47 PF7955
   F2 2.79 0.28 PT2705
   F3 2.62 0.37 PB2702
</pre>
Suppose we want to draw a bar graph using the values in field 2,
and draw error bars using the values in field 3.<tt> </tt>
Fields can be specified by number, so 
we could use this command: 
<pre>
   pl -prefab vbars  data=myfile.dat  y=2   err=3
</pre>
If your data set has a field name header (field names in the first row) you can
reference fields using those names if you want to.  For example:
<pre>
   test level se    case_id
   F1   2.43  0.47  PF7955
   F2   2.79  0.28  PT2705
   F3   2.62  0.37  PB2702
</pre>
Given this data set, we could use this command:
<pre>
    pl -prefab vbars  data=myfile.dat  header=yes   y=level  err=se
</pre>
The field name header must use the same delimitation as the data proper.<tt> </tt>
Field names are like variable names; they
cannot contain embedded white space, comma, or quote characters.<tt> </tt>
Script writers can use field names by setting the
<a href="getdata.html">
 fieldnameheader option
</a>
to <tt>yes</tt>.<tt> </tt>
Script writers can also assign field names explicitly if desired.<tt> </tt>

<br><br><br>

<h2>Recognized data formats</h2>
Data files or streams should be plain ASCII text, not binary, and should be organized as a
collection of rows having one or more fields.<tt> </tt>
Fields may have numeric or alphanumeric content and may be delimited in one of these ways:


<br><br><br>
<dl>
<dt> <dd>
<b>whitespace delimited</b> 
<br>
<pre>
	F1 2.43 0.47 Jane_Doe     PF7955   
	F2 2.79 0.28 John_Smith   PT2705
	F3 2.62 0.37 Ken_Brown    PB2702
	F4  -    -   Bud_Flippner PX7205
	...
</pre>
Fields are delimited by any mixture of one or more spaces or tabs.<tt> </tt>
No quote processing is done.<tt> </tt>
Blank fields must be represented using a nonblank code, and
alphanumeric fields cannot contain white space.<tt> </tt>
Embedded spaces must be represented some other way, such as with underscores.<tt> </tt>

<br><br><br>
<dt> <dd>
<b>spacequote delimited</b> 
<br>
<pre>
	F1 2.43 0.47 "Jane Doe"   PF7955
	F2 2.79 0.28 "John Smith" PT2705
	F3 2.62 0.37 "Ken Brown"  PB2702
	F4 "" "" "Bud Flippner"   PX7205
</pre>
This is a variant of whitespace delimitation where
fields may be enclosed in double quotes ("), and quoted fields may have 
embedded white space.  Blank fields may be represented as shown or using a code.<tt> </tt>

<br><br><br>
<dt> <dd>
<b>tab delimited</b> 
<br>
<pre>
	F1	2.43	0.47	Jane Doe	PF7955
	F2	2.79	0.28	John Smith	PT2705
	F3	2.62	0.37	Ken Brown	PB2702
	F4			Bud Flippner	PX7205
	...
</pre>
Fields are separated by a single tab.  
Zero length fields are taken to be blank.<tt> </tt>
Data fields may not contain embedded tabs.<tt> </tt>
The first field must start at the very beginning of the line.<tt> </tt>
The last field in a row may be terminated by a tab or not.<tt> </tt>

<br><br><br>
<dt> <dd>
<b>bar delimited</b>
<br>
<pre>
	F1|2.43|0.47|Jane Doe|PF7955
	F2|2.79|0.28|John Smith|PT2705
	F3|2.62|0.37|Ken Brown|PB2702
	F4|||Bud Flippner|PX7205
	...
</pre>
Fields are separated by a single bar character (|).
Data fields may not contain embedded bar characters.<tt> </tt>
The first field must start at the very beginning of the line.<tt> </tt>
The last field in a row may be terminated by a bar character or not.<tt> </tt>
Available in versions 2.40+.<tt> </tt>

<br><br><br>
<dt> <dd>
<b>comma delimited</b> 
<pre>
	"F1",2.43,0.47,"Jane Doe"
	"F2",2.79,0.28,"John Smith"
	"F3",2.62,0.37,"Ken Brown"
	"F4",,,"Hello""world"
	...
</pre>
Also known as comma-quote delimited or CSV.  Fields are separated by commas.
Alphanumeric fields may be enclosed in double quotes; quoting matters only when
data fields contain whitespace or comma characters.
Zero length fields and fields containing "" are taken to be blank.<tt> </tt>
An embedded double quote is represented using ("") as seen in row F4 above.<tt> </tt>
The first field must start at the very beginning of the line.<tt> </tt>
No whitespace is allowed before or after fields (although this
apparently is tolerated in the CSV spec).<tt> </tt>


<br><br><br>
</dl>
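<p>
As a rough illustration of the delimitation rules above, the following Python sketch splits one row into fields.
It is not ploticus code: the function name <tt>split_fields</tt> and the mode names it accepts are hypothetical,
and ploticus itself may treat edge cases (for example, a row whose last field is genuinely blank) differently.

```python
import csv
import re

# Matches either a double-quoted field (possibly empty) or a run of non-blanks.
_SPACEQUOTE_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_fields(line, delim="whitespace"):
    """Split one data row into a list of fields (illustrative only)."""
    line = line.rstrip("\r\n")
    if delim == "whitespace":
        # Any mixture of spaces and tabs separates fields; no quote processing.
        return line.split()
    if delim == "spacequote":
        # Like whitespace, but "quoted fields" may contain embedded spaces.
        return [m.group(1) if m.group(1) is not None else m.group(2)
                for m in _SPACEQUOTE_TOKEN.finditer(line)]
    if delim in ("tab", "bar"):
        sep = "\t" if delim == "tab" else "|"
        # The last field may or may not be followed by a terminating delimiter.
        if line.endswith(sep):
            line = line[:-1]
        # Zero-length fields are kept as blank fields.
        return line.split(sep)
    if delim == "comma":
        # The csv module strips enclosing quotes and turns "" into one quote.
        return next(csv.reader([line]))
    raise ValueError("unknown delimitation: " + delim)
```

<p>
For instance, <tt>split_fields('F4|||Bud Flippner|PX7205', 'bar')</tt> yields five fields, the second and third of which are blank.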
<h2>Notes regarding data input and parsing</h2>
<p>
<b>Numeric values in scientific notation</b> - as of 2.30 these should be handled transparently.<tt> </tt>
<p>
<b>Empty rows and commented rows</b> are ignored.<tt> </tt>
The default comment symbol is <tt>//</tt> and it may be used only at the beginning of a line.<tt> </tt>
An alternate comment symbol can be specified if desired.<tt> </tt>
<p>
<b>Number of data fields per row (record):</b>
Proc getdata needs to get a fixed idea of the number of data fields per row to be parsed and stored.<tt> </tt>
In most cases the number of fields is uniform on all rows, so this isn't an issue.<tt> </tt>
But where the number of fields differs between rows, or where the number of field names disagrees with
the number of data fields, the expected number of fields is determined as follows.
If <tt>nfields</tt> is set, it gives the expected number of fields explicitly.
Otherwise, if data field names exist, the expected number of data fields is the number of field names.
Otherwise, the first usable data row dictates the expected number of data fields per row.
If a data row has <b>more</b> than the expected number of fields, the extra fields are silently ignored.
If a row has <b>fewer</b> than the expected number of fields, blank fields are silently added
until the record has the same number of fields as other records.
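<p>
The field-count behavior just described can be sketched in Python as follows.
This is a simplified model, not the ploticus implementation; the function names and the
<tt>field_names</tt> argument are hypothetical, and rows are assumed to be already split into lists of strings.

```python
def expected_field_count(rows, nfields=None, field_names=None):
    """Decide how many fields every row should have."""
    if nfields is not None:
        return nfields              # explicit setting takes precedence
    if field_names:
        return len(field_names)     # otherwise follow the field name header
    return len(rows[0]) if rows else 0  # otherwise the first usable row decides


def normalize_rows(rows, nfields=None, field_names=None):
    """Truncate long rows and pad short rows with blank fields."""
    n = expected_field_count(rows, nfields, field_names)
    return [row[:n] + [""] * (n - len(row)) for row in rows]
```

<p>
With no <tt>nfields</tt> and no field names, a three-field first row causes a later two-field row to gain one blank field and a later four-field row to lose its last field.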
<p>
<b>Rows may be conditionally selected</b> at the time of reading by specifying a <tt>select</tt> condition.<tt> </tt>
Rows not meeting the condition will be skipped.<tt> </tt>
<p>
<b>Leading white space</b> is allowed when using <tt>whitespace</tt> or <tt>spacequoted</tt> delimitation.<tt> </tt>
It is not allowed on the other types.  This applies regardless of whether data are specified in script or coming from
file or command.<tt> </tt>
<p>
<b>Comments / empty lines in comma-delimited data files:</b> The comment symbol must
be at the beginning of the line, and empty lines may not contain any whitespace.
<p>
<b>Row termination:</b> Each row, including the last one, should be terminated with a newline or CR/LF.<tt> </tt>
<p>
<b>When specifying data within the ploticus script:</b>
Data may be subject to side effects of the script interpreter, such as evaluation or expansion of
constructs that resemble script operators or variables.


<br><br><br>
<h2>Missing data</h2>
Missing data values may be represented using a code or by a zero-length field, depending on the delimitation method.<tt> </tt>
A value is considered missing if it is non-plottable: when plotting numerics, any non-numeric value is considered
missing data; when plotting dates, any value that isn't a date (in the current format) is considered missing data.
When plotting, missing values are generally skipped over, but exactly what occurs depends on
what kind of plot operation is being done.  
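<p>
To make the "non-plottable means missing" rule concrete for numeric plotting, here is a minimal Python sketch.
It only approximates the idea: Python's <tt>float()</tt> accepts some strings (such as <tt>nan</tt> or <tt>inf</tt>)
that ploticus may not treat as numbers, and the function names are hypothetical.

```python
def to_number(value):
    """Return value as a float, or None if it is not plottable as a number."""
    try:
        return float(value)
    except ValueError:
        return None


def plottable_points(rows, xfield, yfield):
    """Yield (x, y) pairs, skipping rows where either value is missing."""
    for row in rows:
        x = to_number(row[xfield - 1])  # fields are numbered from 1
        y = to_number(row[yfield - 1])
        if x is not None and y is not None:
            yield (x, y)
```

<p>
Both a missing-data code such as <tt>-</tt> and a zero-length field fail the numeric conversion, so rows containing them are skipped.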


<a name=set></a>
<br><br><br>
<h2>Embedded #set statements</h2>
Data files may contain embedded <tt>#set</tt> statements for setting prefab parameters and ploticus
variables directly from the data file.  

The syntax is:
<dl>
<dt> <dd>
<tt>#set VARIABLE = value</tt>.<tt> </tt>
<br>
or <tt>#set </tt><i>parametername</i><tt> = </tt><i>value</i>.<tt> </tt>
</dl>
<p>
The #set statements may be given before, after, or intermingled with the data rows.<tt> </tt>
All tokens are separated by whitespace and quoting is never used.<tt> </tt>
Here's an example of a data file with embedded #set statements:
<dl>
<dt> <dd>
<pre>
  #set mytitle = Orders processed on Tue 8 Jul '03
  #set ymax = 40
  ABC	3	4	11	42.3
  DEF	5	2	48	27.4
  GHI	9	1	79	37.3
  ...
</pre>
</dl>
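<p>
The following Python sketch shows one way a reader of such a file might separate <tt>#set</tt> statements from data rows.
It is an illustration under stated assumptions, not ploticus code: the function name is hypothetical,
comment lines are not handled, and runs of whitespace inside a value are collapsed to single spaces.

```python
def split_set_statements(lines):
    """Return (settings, data_rows) for an iterable of text lines."""
    settings = {}
    data_rows = []
    for line in lines:
        tokens = line.split()  # tokens are whitespace separated, never quoted
        if len(tokens) >= 3 and tokens[0] == "#set" and tokens[2] == "=":
            # Everything after the "=" token is the value.
            settings[tokens[1]] = " ".join(tokens[3:])
        elif tokens:
            # Any other non-empty line is kept as a data row.
            data_rows.append(line.rstrip("\r\n"))
    return settings, data_rows
```

<p>
Because statements are recognized line by line, they may appear before, after, or between data rows.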
<p>
Some prefab parameters (those that control data input or that are accessed by the prefab 
before the data are read) cannot be #set within the data file, or will be problematic when 
set this way.  If a prefab parameter doesn't seem to be working correctly when set from within 
a data file, try setting it on the command line instead.  Another way to determine this is to
run the prefab with -echo and check the script output to see if the parameter is accessed before the 
proc getdata.<tt> </tt>

<br><br><br>

<h2>Other possibilities</h2>
<p>
Since ploticus can read data on standard input, there are many possibilities for getting 
data for plotting.  To get data out of an SQL database, use your
database's command line tool to extract tabular ASCII data.<tt> </tt>
Or, to get data across the internet using a URL, use a utility like
<a href="http://www.acme.com/software">
 Jeff Poskanzer's http_get.<tt> </tt>
</a>
Be sure to set <b>delim</b> appropriately.<tt> </tt>
These examples illustrate:
<dl>
<dt> <dd>
<tt>mysql acars &lt; mycommand.sql | pl -prefab ... data=stdin delim=tab..</tt>
<br>
<tt>http_get "http://abc.net/delta/jan28.dat" | pl -prefab ... data=stdin ..</tt>
</dl>
<p>
If you are developing ploticus scripts and your data need additional processing
before they can be plotted, you may be able to do that processing
within ploticus.  
To select certain fields, reformat fields, concatenate fields, etc., try using a
<a href="getdata.html">
 proc getdata filter.<tt> </tt>
</a>
To perform accumulation, tabulation and counting, rewriting
as percents, computation of totals, reversing record order,
rotation of row/column matrix, break processing, etc.,
<a href="processdata.html">
 proc processdata
</a>
may be useful (it operates on the data after they have been read in).<tt> </tt>
<p>
Script writers wishing to embed large amounts of data directly into a script
may be interested in
<a href="trailer.html">
 proc trailer,
</a>
which allows the data to be given
at the end of the script file, to get it out of the way.<tt> </tt>

<a name=currentds></a>
<br><br><br>

<h2>The current data set</h2>
Within a script, 
<a href="getdata.html">
 proc getdata
</a>
can be invoked any number of times to read in data.  
However there can only be one active data set at any one time.<tt> </tt>
This is referred to as the "current data set".<tt> </tt>
<p>
Note that
<a href="getdata.html">
 proc getdata
</a>
isn't the only way that the current data set can be filled.
<a href="processdata.html">
 proc processdata
</a>
and
<a href="tabulate.html">
 proc tabulate
</a>
can be used to derive new data (the result then becomes the current data set).<tt> </tt>
<p>
Data sets and derivations are managed as a stack (it is possible to make a derivation of a derivation).<tt> </tt>
The most recently created data set is generally the "current" one.  
You can return to your original data set (or "pop back" to an earlier derivation) by using
<a href="usedata.html">
 proc usedata
</a>.
<p>
When 
<a href="getdata.html">
 proc getdata
</a>
is used to acquire new data, the stack of derived data sets is cleared.
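<p>
The stack behavior described in this section can be pictured with a small Python model.
This is a conceptual sketch only: the class, its method names, and the <tt>pop</tt> and <tt>original</tt>
arguments are hypothetical, and it assumes that popping back discards the more recent derivations.

```python
class DataSetStack:
    """Toy model of the current data set and its derivations."""

    def __init__(self):
        self._stack = []

    def getdata(self, rows):
        # Reading new data clears any derived data sets.
        self._stack = [rows]

    def derive(self, rows):
        # A derivation (e.g. from processing) becomes the current data set.
        self._stack.append(rows)

    def usedata(self, pop=0, original=False):
        # Return to the original data set or pop back some derivations.
        if not self._stack:
            raise RuntimeError("no data set has been read yet")
        keep = 1 if original else max(1, len(self._stack) - pop)
        del self._stack[keep:]

    @property
    def current(self):
        # The most recently created data set is the current one.
        return self._stack[-1]

    @property
    def depth(self):
        return len(self._stack)
```

<p>
In this model, calling <tt>getdata</tt> after several derivations leaves a stack of depth one holding only the newly read data.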

<br><br><br>

<h2>Examples</h2>
Here are some example data files and scripts:
<br>
<a href="../gallery/scat7.dat">
 scat7.dat
</a>
(white-space delimited)
<br>
<a href="../gallery/stock.csv">
 stock.csv
</a>
(comma delimited)
<br>
<a href="../gallery/timeline3.htm">
 timeline3
</a>
(data specified within script)
<br>
<a href="../gallery/km2.htm">
 km2
</a>
(data specified within script).<tt> </tt>


<br>
<br>
</td></tr>
<td align=right>
<a href="../doc/welcome.html">
<img src="../doc/ploticus.gif" border=0></a><br><small>data display engine &nbsp; <br>
<a href="../doc/Copyright.html">Copyright Steve Grubb</a>
<br>
<br>
<center>
<img src="../gallery/all.gif"> 
</center>
</td></tr>
</table>
<br>
<center>
<table><tr><td>
Ploticus is hosted at http://ploticus.sourceforge.net</td><td> &nbsp; </td><td>
<a href="http://sourceforge.net/projects/ploticus"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=38453&type=12" 
width="120" height="30" border="0" 
alt="Get ploticus data display engine at SourceForge.net. Fast, secure and Free Open Source software downloads" /></a>
</td></tr></table>


</center>
<p><hr>
Markup created by <em>unroff</em> 1.0, March 11, 2009.
</body>
</html>