Sophie: gap-system-4.4.12-5mdv2010.0 i586

gap-system-4.4.12-5mdv2010.0.i586.rpm

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%W  tutorial.tex           ParGAP documentation            Gene Cooperman
%%
%H  $Id: tutorial.tex,v 1.3 2001/11/16 15:57:00 gap Exp $
%%
%Y  Copyright (C) 1999-2001  Gene Cooperman
%Y    See included file, COPYING, for conditions for copying
%%

\Chapter{Tutorial}

Section~"Trivial Parallelism" covers trivial  parallelism  (the  simplest
and most common form  of  parallel  application).  This  is  followed  in
Section~"Using ParGAP interactively" by  a  description  of  how  to  use
{\ParGAP}  interactively,  and  Section~"Streaming"   illustrates   these
principles with a short implementation of parallel  streaming.  Streaming
refers to  simultaneously  running  different  algorithms  for  the  same
problem in different ``streams'' or processes, and  accepting  the  first
answer that returns. The remaining processes for the other algorithms are
then terminated. {\ParGAP} allows those processes to reside on  a  single
CPU or on different CPUs. Hence, in the case of streaming,  {\ParGAP}  is
potentially useful even if one has  only  a  single  computer  available,
since multiple streams can still reside locally in separate processes.

For more sophisticated (non-trivial)  parallel  applications,  the  TOP-C
model  of  parallelism  is  recommended,   and   it   is   described   in
Section~"TOP-C model for non-trivial parallelism".

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Trivial Parallelism}

In parallel computation, perhaps 80\% of the applications fall under  the
heading of ``trivial parallelism''. These are  situations  in  which  one
must compute many unrelated cases. For  example,  perhaps  200  different
cases must be computed and results returned for each case and  each  case
has no interaction with any other one. In the absence of shared data,  it
would be common to start 20 {\GAP} jobs on 20 distinct workstations in  a
student computer laboratory. If there are 200 cases, then one  writes  20
different ``batch jobs'', each batch job handling 10 distinct cases.

{\ParGAP}  provides   strong   support   for   this   common   situation.
Conceptually, one is always talking to {\ParGAP} on a master process (the
local workstation), and the master process talks to each of various slave
processes. In its simplest form, one merely generates a  list  of  inputs
for the cases, and  writes  a  suitable  function  to  provide  the  case
information. Effectively, trivial  parallelism  can  be  expressed  by  a
parallel version of {\GAP}'s `List()'  function  (see~"ref:List"  in  the
Reference  Manual),  which  is  called  `ParList()'  (see~"ParList")   in
{\ParGAP}.

The following is  an  example  of  a  {\ParGAP}  session  that  uses  the
`ParList()'   function.   Observe    the    use    of    `BroadcastMsg()'
(see~"BroadcastMsg")      to      ensure      that      the      function
`AnalyzePrimitivePermGroupsOfOrder()' is known to the slaves.

\beginexample
gap> AnalyzePrimitivePermGroupsOfOrder := function(ord)
>      #returns a list of data records of all primitive 
>      #permutation groups of order ord that are represented 
>      #as permutation groups in GAP's primitive group database
>      
>      local PrimitiveGroupsOfOrder, AnalyzePrimitivePermGroup;
>    
>      PrimitiveGroupsOfOrder :=
>        ord -> List( [1..NrPrimitiveGroups(ord)],
>                     i -> PrimitiveGroup(ord,i) );
>    
>      AnalyzePrimitivePermGroup :=
>        grp -> rec( order := Size(grp),
>                    degree := NrMovedPoints(grp),
>                    baseSize := Size(BaseOfGroup(grp)) );
>    
>      return List( Filtered( PrimitiveGroupsOfOrder(ord),
>                             IsPermGroup ),
>                   AnalyzePrimitivePermGroup );
>    end;;
gap> #Define AnalyzePrimitivePermGroupsOfOrder on slaves
gap> BroadcastMsg( 
>      PrintToString("AnalyzePrimitivePermGroupsOfOrder := ",
>                    AnalyzePrimitivePermGroupsOfOrder ) );
gap> ParList( [2..256], AnalyzePrimitivePermGroupsOfOrder );
[ [ rec( order := 2, degree := 2, baseSize := 1 ) ], 
  [ rec( order := 3, degree := 3, baseSize := 1 ), 
  <...many lines of output omitted...>
\endexample

As one expects, the answer  is  computed  in  parallel  in  a  time  that
decreases linearly with the number of processors. Hence a Beowulf cluster
of 16 processors (1 {\ParGAP} master and 15 {\ParGAP} slaves) will return
this information approximately 15 times as fast as  a  single  processor,
greatly facilitating interactive exploration of the properties of various
groups.

If each single case can be executed quickly, then the execution time  for
`ParList()' can be dominated by the network overhead imposed  by  message
passing between the master and slaves. In this case,  an  optional  third
parameter to `ParList()' specifies the number of cases to be submitted to
a slave in a single message (modulo how many cases are left, of  course).
As an example, suppose we are interested in finding those  of  the  first
100,000 integers that have many distinct prime factors, but to reduce the
network overhead we wish to batch 1000 cases at a  time  to  the  slaves.
Then one is able to execute the following.

\beginexample
gap> ParList([1..100000],
>            i -> Length(PrimePowersInt(i))/2, 1000);
\endexample

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Using ParGAP interactively}

Having seen an example where {\ParGAP} can be useful, one next  needs  to
know what is involved in setting up and using {\ParGAP}.  {\ParGAP}  runs
only on UNIX, due to its dependence on an MPI  subset  (included  in  the
distribution) that calls UNIX sockets. It  should  be  possible  to  port
{\ParGAP} to Windows, either through the use of Cygwin (UNIX emulation on
Windows) or through a native Windows MPI implementation.  Similarly,  the
code has been written in a ``vanilla'' manner, so that in  principle,  it
should be easy to port to another interactive language, such as MAGMA  or
LISP. An older version was in fact written in LISP~\cite{Coo95}, although
the current version is not implemented in LISP. In the following,  recall
that {\ParGAP}  employs  a  master-slave  architecture,  with  the  local
process being the ``master'' process, and all others being ``slaves''.

{\ParGAP} is installed in the same way as other  {\GAP}  packages.  After
extracting the {\ParGAP} files, one invokes `configure' and `make'.  (See
Section~"Installing ParGAP" for the details.) One next writes a file with
a list of  slave  processors  to  attach,  and  an  indication  of  where
{\ParGAP} is located on that processor; the usual way  is  to  name  this
file `procgroup' and have it in the directory  one  starts  up  {\ParGAP}
(see Section~"Running ParGAP" for details). An example  `procgroup'  file
is:

\begintt
# This is a sample ParGAP procgroup file.
# Lines starting with a # are comments.
# Blank lines are also treated as comments.

# The following line generates the MASTER process
local 0

# Each of the following `localhost' lines generates 
# a SLAVE process on the same machine as the MASTER
localhost 1 /proj/sparc/gap4r3/pkg/pargap/bin/pargap.sh
localhost 1 /proj/sparc/gap4r3/pkg/pargap/bin/pargap.sh

# The following line generates a SLAVE process on a remote
# machine: `iron.ccs.neu.edu' (called via rsh or ssh or ...)
iron.ccs.neu.edu 1 /proj/alpha/gap4r3/pkg/pargap/bin/pargap.sh
\endtt

Next, instead of invoking `gap', one  then  invokes  the  supplied  shell
script,  `pargap.sh',  (possibly  modified  and/or  renamed;  again   see
Section~"Running ParGAP" for the details). Observe that unlike most other
{\GAP} packages, one should not  call  `RequirePackage()'  function  from
within a {\GAP} session to start {\ParGAP}. As usual,  master  and  slave
processes read a  `.gaprc'  file  in  the  home  directory,  if  present.
{\ParGAP} looks  for  a  file  named  `procgroup'  file  in  the  current
directory, but can be directed  to  use  a  different  file  through  the
`-p4pg' switch.

The following example assumes that `pargap' is a symbolic link  or  alias
for the full pathname of `pargap.sh'. Hence, if one always uses {\ParGAP}
with the slave processes specified in `myprocgroup.big', one may wish  to
set up an alias or else a  symbolic  link  to  a  script  containing  the
equivalent of:

\){\kernttindent}pargap.sh -p4pg <PATH>/myprocgroup.big

Observe also that heterogeneous computing  is  supported,  in  that  each
slave machine may be a different dialect of UNIX running under  different
hardware.

Within {\ParGAP}, the standard message-passing commands  (send,  receive,
probe,  etc.)  are  available.  The  slaves  are  numbered  consecutively
starting at~1. If no slave  parameter  is  supplied  for  `SendMsg()'  or
`SendRecvMsg()', then the default slave is~1. Most other commands take  a
default  slave  parameter,  `MPI_ANY_SOURCE'  (meaning  ```ANY_SLAVE''').
`PingSlave()' sends a ``null message'', and waits for an  acknowledgement
that the slave is alive.

The command `SendMsg(<string>, <i>);' sends the (string) message <string>
to slave <i>. Slave~<i> then evaluates <string> as a {\GAP} command,  and
returns an answer. If a command has no answer, (such as `"x:=3;"'),  then
the string `"<no_return_val>"' is  returned.  If  multiple  commands  are
sent, such as `"x:=3; y:=4; x+y;"', then only the last  return  value  is
returned. Note that the final semicolon, normally required by {\GAP},  is
optional in a command string for a message.

Most of the other commands are clear from context. Every `SendMsg()' to a
slave must be balanced  by  a  `RecvMsg()',  except  where  the  commands
`FlushAllMsgs()' and `ParReset()' are used to flush any pending  messages
from a slave. A `SendRecvMsg(<string>, <i>)' command is equivalent to the
consecutive commands `SendMsg(<string>,  <i>)'  and  `RecvMsg(<i>)'.  The
command `BroadcastMsg()' sends its string argument  to  each  slave,  and
there is no reply message. `ParEval()' is like `BroadcastMsg()',  but  it
also evaluates on the master process. If one is unsure whether there is a
pending message, one can invoke `ProbeMsgNonBlocking()' or  `ProbeMsg()';
if no slave parameter is provided, the default is  `MPI_ANY_SOURCE'.  The
command `ParRead()' is a parallel version of {\GAP}'s  `Read()'  command;
it is useful for ensuring that the slaves  and  master  share  a  similar
environment. Other examples of commands that are  analogous  to  {\GAP}'s
built-in      commands       include:       `ParReread',       `ParEval',
`ParInstallGlobalFunction'    and     `ParInstallGlobalFunction'.     See
Section~"Slave Listener Commands" for more complete descriptions  of  the
above commands.

An illustration of the usage of  the  commands  mentioned  above  with  a
sample {\ParGAP} session, for which a `procgroup' file  has  defined  two
slaves is the first example given  in  Section~"Extended  Example".  It's
recommended that you review that example at this point.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{Streaming}

Next, we demonstrate a  simple  implementation  of  ``streaming''  as  an
example of how easy it is to use  the  {\ParGAP}  tools  to  provide  new
functionality.  To  the  best  of  this  author's  knowledge,  the   term
``streaming'' originated with Charles Leedham-Green at the conference  on
which this proceedings is based, where he  made  a  general  request  for
streaming functionality to be implemented in some software system.

The idea of streaming is that one may have two algorithms for  solving  a
problem. One would like to to solve the  problem  simultaneously  on  two
processors, each using a distinct algorithm.  Whichever  processor  finds
the solution first should report back to the master process.  The  master
process should then interrupt the other slave, so as to make it available
for further  computation.  This  is  useful  when  one  has  a  heuristic
available that may finish early with the correct answer, or may  continue
for a very long time. The heuristic can then  be  ``streamed''  alongside
the standard algorithm, in full confidence that  the  standard  algorithm
will provide a reasonable upper bound on the time to compute an answer.

One way to provide ``streaming'' functionality is by  the  implementation
of a function we describe below that is inspired  by  {\GAP}'s  `First()'
function (see~"ref:First" in the {\GAP} Reference Manual).  The  function
`First()' takes a list and boolean function as arguments, and returns the
first element of the list for which the boolean function returns  `true'.
In   contrast,   we   define   a   corresponding   {\ParGAP}    function,
`ParFirstResult()', which takes  a  list  and  a  (general)  function  as
arguments. Messages are sent to the slaves causing  the  $i$th  slave  to
evaluate the function on the $i$th list element. The  value  returned  is
the value obtained from whichever slave  finishes  first.  Note  that  in
consistency with the goal of streaming, the function signals an error  if
asked to evaluate a list with more entries than the number of slaves.

\beginexample
gap> ParFirstResult := function( list, fnc )
>   local i, result;
>   if Length(list) > TOPCnumSlaves then
>     Error("too few slaves");
>   fi;
>   for i in [1..Length(list)] do
>     SendMsg( PrintToString(
>                   "fnc :=", fnc, "; fnc(", list[i], ");"),
>              i );
>   od;
>   result := RecvMsg(); # default is MPI_ANY_SOURCE
>   ParReset(); # Interrupt all other slaves
>   return result;
> end;;
\endexample

We now demonstrate the use of `ParFirstResult()'  in  ``streaming''.  For
our example we need the \package{FactInt} package by Stefan Kohl. To  get
this package if you don't have it, visit
\URL{http://www-gap.dcs.st-and.ac.uk/gap/Share/factint.html}

or the equivalent at one of {\GAP}'s mirror sites, and  follow  the  easy
installation instructions.

If one hasn't already included a  `RequirePackage("factint");'  statement
in one's `.gaprc' file then it is necessary to do:

\beginexample
gap> ParEval("RequirePackage(\"factint\")");

Loading FactInt 1.1 (Routines for Integer Factorization),
by Stefan.Kohl@cip.mathematik.uni-stuttgart.de

true
\endexample

so  that  each  slave  (not  just   the   master)   is   aware   of   the
\package{FactInt} functions.

Above we defined `ParFirstResult()' on the master process. We will assume
that we have two slaves.

\beginexample
gap> StreamingFactInt := function(i, x)
>   local alg;
>   alg :=
>    [ x -> [ "MPQS ALGORITHM",  FactorsMPQS(Factorial(x)+1) ],
>      x -> [ "CFRAC ALGORITHM", FactorsCFRAC(Factorial(x)+1) ]
>    ];
>   return alg[i](x);
> end;;
\endexample

Both `StreamingFactInt(1, x);' and `StreamingFactInt(2,  x);'  factor  an
integer, but one uses the multiple polynomial quadratic  sieve  algorithm
(MPQS), and the other uses the continued fraction algorithm (CFRAC);  the
functions `FactorsMPQS' and `FactorsCFRAC' that perform these  algorithms
are  defined  by  the  \package{FactInt}  package.  We  demonstrate   the
``streaming'' of these algorithms in  determining  the  factorization  of
$35!+1$.

\beginexample
gap> # Now, define StreamingFactInt() on each of the slaves.
gap> BroadcastMsg( PrintToString( "StreamingFactInt :=",
>                                 StreamingFactInt ) );
gap> ParFirstResult([1,2], i->StreamingFactInt(i, 35) );
... resetting ...
[ "MPQS ALGORITHM", [ 137, 379, 17839, 340825649, 32731815563800396289317 ] ]
\endexample

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Section{TOP-C model for non-trivial parallelism}

There are many examples  where  trivial  parallelism  does  not  suffice.
Typically, this happens either when there are global variables that  must
be ``read'' by each slave process, and possibly updated, or else when the
input for the next slave process depends on the result that arrived  from
the last slave process. Here we provide only the basics of  the  parallel
model. The parallel model was described in more  detail  in~\cite{Coo97},
and a still more detailed  description  is  contained  in  Chapter~"Basic
Concepts for the TOP-C model (MasterSlave)".

For non-trivial parallelism, {\ParGAP} uses the TOP-C model~\cite{Coo96}.
{\ParGAP} typically invokes the TOP-C model via a command such as

\){\kernttindent}MasterSlave( <GenerateTaskInput>, <DoTask>, <CheckTaskResult>,
\){\kernttindent}             <UpdateSharedData> );

The four arguments, <GenerateTaskInput>, <DoTask>, <CheckTaskResult>, and
<UpdateSharedData> are ``callback'' functions written by the  application
programmer, and the names of those callback functions are arbitrary. (The
manual for \package{ParGAP/MPI}, the  earlier  incarnation  of  {\ParGAP}
used slightly different names for its examples, but the  purpose  of  the
arguments  of  `MasterSlave()'  has  not  changed  --  only  the   naming
convention has.)

The only {\ParGAP}  command  above  is  `MasterSlave()'.  A  task  is  an
arbitrary  function  (here  called  `<DoTask>()')  executed  on  a  slave
process, that takes input from the master process and returns its  output
to the master process. The  diagram  in  Section~"Other  TOP-C  Commands"
gives some idea of the flow of  control  as  a  task  is  processed.  The
diagram there is meant to represent the three main  abstractions  of  the
TOP-C model: (1)~the task, (2)~the shared data, and (3)~the  action.  The
*shared data* consists of any  globally  shared  data  (which  should  be
readable by all processes). The *task* is a procedure executed on a slave
process that takes a *task input*, and the shared data,  and  produces  a
*task output*, which depends only on the  task  input  and  shared  data.
Finally, the task input and task output are sent to the  master  process,
which must then decide upon an *action*.

Typical actions are `NO_ACTION' (and the master process  might  save  the
task output in a private list of results), `UPDATE_SHARED_DATA'  (send  a
message to all slaves to update the local copy  of  the  globally  shared
data), and `REDO' (re-do the computation in  case  the  shared  data  was
changed by another slave process).  (Earlier  incarnations  of  {\ParGAP}
used `UPDATE'. Now the alternative `UPDATE_SHARED_DATA' is offered and we
standardize on this here.)

An example invocation of `MasterSlave()' is shown below,  where  we  pass
the four application functions as direct  arguments  of  `MasterSlave()'.
The routine below implements a  simplified  version  of  the  `ParList()'
function described in Section~"Trivial Parallelism".

\begintt
ParInstallTOPCGlobalFunction( "MyParList",
function( list, fnc )
  local result, iter;
  result := [];
  # Each invocation of GAP's iterator
  # returns next element of list.
  iter := Iterator(list);
  MasterSlave(
      # GenerateTaskInput():
      function() if IsDoneIterator(iter) then return NOTASK;
                 else return NextIterator(iter); fi; end,
      # DoTask():
      fnc,
      # CheckTaskResult():
      function(input,output) result[input] := output; 
                             return NO_ACTION; end,
      # UpdateSharedData():
      Error # We never see action:  UPDATE_SHARED_DATA
      );
  return result;
end );
\endtt

The function `ParInstallTOPCGlobalFunction()' installs `MyParList' on all
{\ParGAP} processes. It also defines the version of `MyParList()' on  the
master differently from on a slave, so that a call,  `MyParList([1..100],
x->x^2);',  on  the  master  automatically  causes   `MyParList([1..100],
x->x^2);' to be invoked on the slave with the  same  arguments.  This  is
required  for  the  TOP-C  model.  Note  that  the   distinct   function,
`ParInstallGlobalFunction()', exists for the equivalent of `BroadcastMsg(
"InstallGlobalFunction( MyParList, function ... end)");'. Both  functions
should be called only from the master (possibly from  inside  a  `Read()'
command).

The shared data consists only of the variable~`fnc', which is read by all
processes, but in this case is never ``updated''. Note that one need  not
explicitly declare variables that are in the shared data. A TOP-C  shared
data variable is defined as a variable whose value is read by  more  than
one process, and which is modified only through a call to the application
routine `<UpdateSharedData>()'. If we would like to see the messages,  as
they are  passed  back  and  forth  between  master  and  slave,  we  can
optionally set the variable  `ParTrace'  (see~"ParTrace").  We  are  then
ready to execute our new parallel function.

\beginexample
gap>  ParTrace := true;;
gap>  MyParList([2..256], AnalyzePrimitivePermGroupsOfOrder);
\endexample

The example above in fact requires only trivial  parallelism.  Hence  the
`<CheckTaskResult>()' parameter to `MasterSlave()' is especially  simple.
Since the action is never `REDO',  we  achieve  the  maximum  concurrency
among slave processes, resulting in a  speedup  that  is  linear  in  the
number of slaves.

In  general,  the  parallel  efficiency  of  an  application,  with   its
concommitant decisions about concurrency of  slave  tasks  are  typically
determined by the actions chosen  by  `<CheckTaskResult>()'.  The  sample
below is a standard {\ParGAP} ``idiom'' that allows one to easily set  up
reasonable concurrency in a parallel application.

\beginexample
CheckTaskResult := function( taskInput, taskOutput )
  if taskOutput = fail then return NO_ACTION;
  elif not IsUpToDate() then return REDO_ACTION;
  else return UPDATE_ACTION;
  fi;
end;
\endexample

The function `IsUpToDate()' is a boolean function provided  by  {\ParGAP}
that returns `true' if there  has  been  no  `UPDATE_SHARED_DATA'  action
since the time that the ``corresponding'' task input was generated  by  a
call to the user-provided  function  `<GenerateTaskInput>()'.  Otherwise,
`IsUpToDate()' returns `false'. The ``corresponding'' task input  is  the
task input associated with that task output which was most recently  seen
on the master process. In  the  context  of  the  user-provided  function
`<CheckTaskResult>()', it is the first argument to that function.

% \begintt
% SeqMasterSlave :=
%   function(GenerateTaskInput, DoTask, CheckTaskResult, UpdateSharedData)
%   local taskInput, taskOutput, action;
%   while true do
%     taskInput := GenerateTaskInput();
%     if taskInput = NOTASK then break; fi;
%     repeat
%       taskOutput := DoTask( taskInput );
%       action := CheckTaskResult( taskInput, taskOutput );
%     until action <> REDO_ACTION;
%     if action = UPDATE_ACTION then
%       # Modify the global, shared data structures here
%       # Called on all processes, master and slaves
%       UpdateSharedData( taskInput, taskOutput );
%     fi;
%   od;
% end;
% \endtt
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%
%E