                           The ParGAP package

The ParGAP (Parallel GAP) package provides  a  way  of  writing  parallel
programs using the  GAP  language.  Former  names  of  the  package  were
ParGAP/MPI and GAP/MPI; the word MPI refers to Message Passing Interface,
a well-known standard  for  parallelism.  ParGAP  is  based  on  the  MPI
standard, and this distribution includes a subset implementation of  MPI,
to provide a portable layer with a high level interface to  BSD  sockets.
Since knowledge of MPI is not required for use of this software,  we  now
refer to the package as simply ParGAP. For  more  information  visit  the
author's ParGAP home page at:

  http://www.ccs.neu.edu/home/gene/pargap.html

ParGAP works only under UNIX. (Cygwin is a possible  option  on  Windows,
but you will have to port it yourself.)

ParGAP may be obtained as `pargap-XXX.zoo' (for some version number  XXX)
from the same places as GAP. The main FTP servers are:

  ftp://ftp-gap.dcs.st-and.ac.uk/pub/gap/gap4/share/
  ftp://ftp.math.rwth-aachen.de/pub/gap4/share/
  ftp://ftp.ccs.neu.edu/pub/mirrors/ftp-gap.dcs.st-and.ac.uk/pub/gap/gap4/share/
  ftp://pell.anu.edu.au/pub/algebra/gap4/share/

`pargap-XXX.zoo' is also available via the GAP www page at

  http://www-gap.dcs.st-and.ac.uk/~gap/Info4/share.html
  http://aldebaran.math.rwth-aachen.de/~GAP/Info4/share.html
  http://mirrors.ccs.neu.edu/GAP/NEU/Info4/share.html
  http://wwwmaths.anu.edu.au/research.groups/algebra/GAP/www/Info4/share.html

or, alternatively, `pargap-XXX.tar.gz' (which is  assured  to  be  the
most recent version) can be obtained from the author's ftp site:

  ftp://ftp.ccs.neu.edu/pub/people/gene/pargapmpi/

ParGAP has been tested on Linux (ELF), Solaris 2.6 and OSF 1 (alpha).


                      Installing the ParGAP package

To  install  the  ParGAP  package,  move  the  file  `pargap-XXX.zoo'  or
`pargap-XXX.tar.gz' into the `pkg' directory in which you plan to install
ParGAP. Usually, this will be the directory `pkg'  in  the  hierarchy  of
your version of GAP 4. If your version of GAP 4 is earlier than  GAP  4.3
then there are  a  couple  of  adjustments  to  GAP's  `lib/init.g'  file
required (see item 0. of the next section). Also note that currently it
is not possible to have the `pkg' directory separate from GAP's `pkg'
directory; we hope to remedy this in future versions of ParGAP, so that
it will also be possible to keep an additional `pkg' directory in your
private directories. (Section "ref:Installing GAP Packages" of the GAP 4
reference manual gives details on how to do this, when it's possible.)
If you are not a system administrator, your system administrator won't
install ParGAP for you on the system, and you don't have enough disk
space in your own directory to create a whole new GAP, what you can do is
create the illusion of having a complete version of GAP in your own
directory using symbolic links. (Sorry! Currently that's all we can
offer.)

Now change into the `pkg' directory in which you plan to install  ParGAP.
If you got a `.zoo' file, unpack it with:

  unzoo -x pargap-XXX

If you got a `.tar.gz' file and  your  `tar'  command  supports  the  `z'
option, unpack it with:

  tar zxf pargap-XXX.tar.gz

or otherwise unpack in two steps with:

  gunzip pargap-XXX.tar.gz
  tar xvf pargap-XXX.tar

Whether you got the `.zoo' or `.tar.gz' archive you should now have a new
directory `pargap'. As for a generic GAP package, do:

  cd pargap
  ./configure ../..
  make
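
As a rough sketch of the unpacking choice described above, the following
shell function prints the command that matches an archive's suffix. The
name `unpack_command' is hypothetical and not part of ParGAP; the
function only prints a command and does not run it, and for `.tar.gz'
files it always prints the one-step `tar zxf' form.

```shell
#!/bin/sh
# Sketch only: unpack_command is a hypothetical helper, not part of ParGAP.
# It prints the unpack command for a ParGAP archive, chosen by file suffix.
unpack_command() {
  case "$1" in
    *.zoo)    echo "unzoo -x ${1%.zoo}" ;;   # unzoo is given the name without `.zoo'
    *.tar.gz) echo "tar zxf $1" ;;           # assumes tar supports the `z' option
    *)        echo "unknown archive type: $1" >&2; return 1 ;;
  esac
}

unpack_command pargap-XXX.zoo
unpack_command pargap-XXX.tar.gz
```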

Your ParGAP should now be ready to use. In the `bin'  subdirectory  there
will be a script

  pargap.sh

which you should use to start ParGAP. Edit the script if necessary,  copy
it to a standard path and rename it according to how you intend  to  call
ParGAP (e.g. rename it: `pargap'). Also, in the `bin' subdirectory  is  a
sample `procgroup' file which defines the master and slave processes that
will be used by ParGAP. When ParGAP is started it looks for a file called
`procgroup' in the current directory, unless the `-p4pg' option is  used.
Thus if you renamed your shell script `pargap', the following  are  valid
ways of starting ParGAP:

  pargap

(if current directory contains the file: `procgroup'), or

  pargap -p4pg myprocgroupfile

(where `myprocgroupfile' is the complete path of your  procgroup  file  -
there is no restriction on how you name it).
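
The lookup rule above can be sketched in shell as follows. This is only
an illustration of the rule as stated in this file, not ParGAP's own
code; the function name `procgroup_path' is hypothetical. It may help to
check which procgroup file a given command line would select before
starting ParGAP.

```shell
#!/bin/sh
# Sketch only: procgroup_path is a hypothetical helper, not part of ParGAP.
# It prints the procgroup file implied by the arguments: the path that
# follows `-p4pg' if that option is present, else `./procgroup'.
procgroup_path() {
  while [ $# -gt 0 ]; do
    if [ "$1" = "-p4pg" ] && [ $# -ge 2 ]; then
      echo "$2"
      return 0
    fi
    shift
  done
  echo "./procgroup"
}

# Report whether the selected file is readable.
pg=$(procgroup_path "$@")
if [ -r "$pg" ]; then
  echo "procgroup file: $pg"
else
  echo "no readable procgroup file at: $pg" >&2
fi
```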

If you had trouble installing ParGAP, please see the next section of this
file. Otherwise, try it out:

gap> # This assumes your procgroup file includes two slave processes.
gap> PingSlave(1); #a `true' response indicates Slave 1 is alive
true
gap> # Print() on slave appears on standard output 
gap> # i.e. after the master's prompt.
gap> SendMsg( "Print(3+4)" );
gap> 7
gap> # A <return> was input above to get a fresh prompt.
gap> #
gap> # To get special characters (including newline: `\n')
gap> # into a string, escape them with a `\'.
gap> SendMsg( "Print(3+4,\"\\n\")" );
gap> 7

gap> # Again, a <return> was input above after the 7 and new-line
gap> # were printed to get a fresh prompt.
gap> #
gap> # Each SendMsg() is normally balanced by a RecvMsg().
gap> SendMsg( "3+4", 2);
gap> RecvMsg( 2 );
7
gap> # The following is equivalent to the two previous commands.
gap> SendRecvMsg( "3+4", 2);
7
gap> # Flush any messages that are pending. The response is
gap> # the number of messages flushed. (Above, the two
gap> # SendMsg("Print...") (to the default slave: 1) did not
gap> # have a corresponding RecvMsg() command.)
gap> FlushAllMsgs();
2
gap> # As with Print() the result of Exec() appears on standard
gap> # output. Print() and Exec() are each `no-value' functions,
gap> # and so the result of a RecvMsg() in these cases
gap> # is "<no_return_val>".
gap> SendRecvMsg( "Exec(\"pwd\")" ); # Your pwd will differ :-)
/home/gene
"<no_return_val>"
gap> # Put default slave into an infinite loop.
gap> SendMsg("while true do od");
gap> # Default slave can't execute the next command until it's 
gap> # finished with the previous command.
gap> SendMsg("Print(\"WAKE UP\\n\")");
gap> # Check to see if a message is waiting to be collected but
gap> # return immediately (i.e. don't get blocked by waiting for
gap> # a message to appear). A `false' response indicates the
gap> # infinite loop hasn't terminated and produced a value yet!
gap> ProbeMsgNonBlocking();
false
gap> # Send an interrupt to each slave, slave 1 will see the
gap> # following command and print `WAKE UP', and then all
gap> # pending messages are flushed.
gap> ParReset();
... resetting ...
WAKE UP
0
gap> # The return value, 0, from ParReset() indicates there
gap> # were 0 pending messages flushed, confirming correctness
gap> # of ProbeMsgNonBlocking() when it returned "false"
gap> SendRecvMsg( "a:=45; 3+4", 1 );
7
gap> # Note "a" is defined on slave 1, not slave 2.
gap> SendMsg( "a", 2 ); # Slave prints error, output on master
gap>  Variable: 'a' must have a value
gap> # <return> entered to get fresh prompt.
gap> RecvMsg( 2 ); # No value for last SendMsg() command
"<no_return_val>"
gap> RecvMsg( 1 );
45
gap> myfnc := function() return 42; end;;
gap> # Use PrintToString() to define myfnc on all slave processes
gap> BroadcastMsg( PrintToString( "myfnc := ", myfnc ) );
gap> SendRecvMsg( "myfnc()", 1 );
42
gap> FlushAllMsgs(); # There are no messages pending.
0
gap> # Execute analogue of GAP's List() in parallel on slaves.
gap> squares := ParList( [1..100], x->x^2 );
[ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 
  289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 
  900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 
  1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 
  2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 
  3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 
  5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 
  7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 
  9216, 9409, 9604, 9801, 10000 ]
gap> # Ensure problem environment is read into master and slaves.
gap> # Try one of your GAP program files instead.
gap> ParRead( "/home/gene/myprogram.g");


The ParGAP package was designed and written by:

  Gene Cooperman
  College of Computer Science
  Northeastern University, Boston, MA, U.S.A.

If you use ParGAP to solve a problem then please send a  short  email  to
`gene@ccs.neu.edu' about it, and reference the ParGAP package as follows:

\bibitem[Coo99]{Coo99}
      Cooperman, Gene,
      {\sl Parallel GAP/MPI (ParGAP/MPI)}, Version 1,
      College of Computer Science, Northeastern University, 1999,
      \verb|http://www.ccs.neu.edu/home/gene/pargapmpi.html|.

=========================================================================

                             Troubleshooting

0.  In versions of GAP earlier than GAP 4.3 some ParGAP ``hooks'' need to
    be added to GAP's `lib/init.g' file. Please add:

       PAR_GAP_SLAVE_START := fail;

    before the line:

       READ(GAP_RC_FILE);

    and add:

       if PAR_GAP_SLAVE_START <> fail then PAR_GAP_SLAVE_START(); fi;

    at the end of the file.

1.  Do you have enough swap space to support multiple  GAP  processes?  A
    simple way to check this is with the UNIX command, `top'.  The  Linux
    version of `top' sorts by memory usage if you type `M'.

2.  `make' tries to automatically create:

       pkg/pargap/bin/pargap.sh
       
    and copy the parameters from `<GAP_ROOT>/bin/gap.sh'. <GAP_ROOT>  was
    specified when  you  executed  `./configure  <GAP_ROOT>'  to  install
    ParGAP. This can be error-prone if your site has an unusual setup. If
    you execute `<GAP_ROOT>/bin/gap.sh', does gap come up? If so, compare
    it with `pargap.sh' and check for correct settings in
    `.../pkg/pargap/bin/pargap.sh'.

3.  Did ParGAP find your `procgroup' file?
    [It looks in the current directory for `procgroup', or for:

          ... -p4pg PATH/procgroup

     on the command line.]

4.  Were the remote slave processes able to start up? If so,  could  they
    connect back to  the  master?  To  test  connectivity  problems,  try
    manually starting a remote slave by executing a line in  the  script.
    Try a simple `rsh remote_hostname'  to  see  if  the  issue  is  with
    security.

5.  If  the  previous  step  failed  due  to  security  issues,  such  as
    requesting a password, you have several options. `man rshd' tells you
    the security model at your site (or possibly `man  ssh'  if  you  use
    that). Then read "Problems with Passwords (Getting Around  Security)"
    in the ParGAP manual in the `doc' directory.

6.  Is the `procgroup' file in your current directory set correctly?
    Test it.  If you are calling it on a remote host, manually type:

       rsh <HOSTNAME> <ParGAP>

    where <HOSTNAME> and <ParGAP> appear exactly as in `procgroup', e.g.
    
       rsh denali.ccs.neu.edu /usr/local/gap4r3/bin/pargap.sh

    In some cases, `exec' is used to save process overhead. Also try:

       rsh <HOSTNAME> exec <ParGAP>

    If you plan to call it on localhost, try just:   <ParGAP>

    Note that if not all the slave processes succeed in connecting
    to the master, then ParGAP writes out a file:

       /tmp/pargapmpi--rsh.$$
       
    where $$ is replaced by the process id of the ParGAP process.

7.  Is `pargap' listed in `.../pkg/ALLPKG'?
    [It's needed to autostart slaves.]

8.  Inside ParGAP, has MPI been successfully initialized?
    Try:  
    
    gap> MPI_Initialized();

9.  A remote (slave) ParGAP process starts in  your  home  directory  and
    tries to cd to a directory of the same name as your local  directory.
    Check your assumptions about the remote machine. Try:

    gap> SendRecvMsg("Exec(\"pwd\")"); SendRecvMsg("UNIX_Hostname()");
    gap> SendRecvMsg("UNIX_Getpid()");

10. If the connection dies at random, after some period of time:
    You can experiment with SO_KEEPALIVE and variants.  (man  setsockopt)
    This periodically sends *null messages* so the  remote  machine  does
    not think that the originating  machine  is  dead.  However,  if  the
    remote machine fails to reply, the  local  process  sends  a  SIGPIPE
    signal to notify current processes of a broken  socket,  even  though
    there might have been only a temporary lapse in connectivity.
    `ssh' specifies `KeepAlive yes' by default, but setting `KeepAlive no'
    might get you through some transient lapses in  connectivity  due  to
    high congestion. 
    You may also want to experiment with: `setenv RSH "rsh -n"'

11. Read the documentation for further possible problems.
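
The manual test in item 6 can be sketched as a small shell function that
prints one test command per slave entry of a procgroup file. The line
layout is an assumption, since this file does not describe it: a first
line beginning with `local', then lines of the form
`<HOSTNAME> <count> <ParGAP>'. The name `rsh_test_commands' is
hypothetical. The function only prints the commands, so you can run them
by hand one at a time.

```shell
#!/bin/sh
# Sketch only: rsh_test_commands is a hypothetical helper, not part of ParGAP.
# ASSUMED procgroup layout (not stated in this README):
#   local 0
#   <HOSTNAME> <count> <ParGAP>
rsh_test_commands() {
  while read -r host count prog rest; do
    case "$host" in
      ''|'#'*|local) continue ;;          # skip blank, comment and `local' lines
      localhost) echo "$prog" ;;          # on localhost, try just <ParGAP>
      *) echo "rsh $host $prog" ;;        # remote host: rsh <HOSTNAME> <ParGAP>
    esac
  done < "$1"
}

# Usage (prints commands only):
#   rsh_test_commands ./procgroup
```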

=========================================================================

                               Final Notes

Note that this package modifies  the  GAP  `src'  and  `bin'  files,  and
creates a  new  GAP  kernel.  This  new  GAP  kernel  can  be  shared  by
traditional users of the old, sequential GAP kernel, and by  those  doing
parallel processing.

The GAP kernel will have identical behavior to the old  GAP  kernel  when
invoked through the gap.sh script or the `bin/@GAParch@/gap' binary.  The
new ParGAP variables will appear to the end user _ONLY_ if the GAP binary
was invoked as `pargapmpi': a symbolic link to the actual GAP binary. The
script, `pargap.sh', does this.
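
The idea of one binary behaving differently depending on the name it was
invoked under can be illustrated with a short shell sketch. This is not
the GAP kernel's code, only an analogy; the function name `mode_for_name'
and the words `parallel' and `sequential' are made up for the example.

```shell
#!/bin/sh
# Sketch only: mode_for_name is a hypothetical illustration of dispatching
# on the invocation name; it is not how the GAP kernel is implemented.
mode_for_name() {
  case "$(basename "$1")" in
    pargapmpi) echo "parallel" ;;     # invoked through the symbolic link
    *)         echo "sequential" ;;   # invoked under any other name
  esac
}

mode_for_name /usr/local/gap4r3/bin/pargapmpi
mode_for_name /usr/local/gap4r3/bin/gap
```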

So, in a multi-user environment, traditional users can  continue  to  use
`gap.sh'  without  noticing  any  difference.  Only  an   invocation   as
`pargap.sh' will add the new features.

Comments and contributions to a ParGAP user library, or any other type of
assistance, are gratefully accepted.

							Gene Cooperman
							gene@ccs.neu.edu