%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% %W tutorial.tex ParGAP documentation Gene Cooperman %% %H $Id: tutorial.tex,v 1.3 2001/11/16 15:57:00 gap Exp $ %% %Y Copyright (C) 1999-2001 Gene Cooperman %Y See included file, COPYING, for conditions for copying %% \Chapter{Tutorial} Section~"Trivial Parallelism" covers trivial parallelism (the simplest and most common form of parallel application). This is followed in Section~"Using ParGAP interactively" by a description of how to use {\ParGAP} interactively, and Section~"Streaming" illustrates these principles with a short implementation of parallel streaming. Streaming refers to simultaneously running different algorithms for the same problem in different ``streams'' or processes, and accepting the first answer that returns. The remaining processes for the other algorithms are then terminated. {\ParGAP} allows those processes to reside on a single CPU or on different CPUs. Hence, in the case of streaming, {\ParGAP} is potentially useful even if one has only a single computer available, since multiple streams can still reside locally in separate processes. For more sophisticated (non-trivial) parallel applications, the TOP-C model of parallelism is recommended, and it is described in Section~"TOP-C model for non-trivial parallelism". %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \Section{Trivial Parallelism} In parallel computation, perhaps 80\% of the applications fall under the heading of ``trivial parallelism''. These are situations in which one must compute many unrelated cases. For example, perhaps 200 different cases must be computed and results returned for each case and each case has no interaction with any other one. In the absence of shared data, it would be common to start 20 {\GAP} jobs on 20 distinct workstations in a student computer laboratory. If there are 200 cases, then one writes 20 different ``batch jobs'', each batch job handling 10 distinct cases. {\ParGAP} provides strong support for this common situation. Conceptually, one is always talking to {\ParGAP} on a master process (the local workstation), and the master process talks to each of various slave processes. In its simplest form, one merely generates a list of inputs for the cases, and writes a suitable function to provide the case information. Effectively, trivial parallelism can be expressed by a parallel version of {\GAP}'s `List()' function (see~"ref:List" in the Reference Manual), which is called `ParList()' (see~"ParList") in {\ParGAP}. The following is an example of a {\ParGAP} session that uses the `ParList()' function. Observe the use of `BroadcastMsg()' (see~"BroadcastMsg") to ensure that the function `AnalyzePrimitivePermGroupsOfOrder()' is known to the slaves. \beginexample gap> AnalyzePrimitivePermGroupsOfOrder := function(ord) > #returns a list of data records of all primitive > #permutation groups of order ord that are represented > #as permutation groups in GAP's primitive group database > > local PrimitiveGroupsOfOrder, AnalyzePrimitivePermGroup; > > PrimitiveGroupsOfOrder := > ord -> List( [1..NrPrimitiveGroups(ord)], > i -> PrimitiveGroup(ord,i) ); > > AnalyzePrimitivePermGroup := > grp -> rec( order := Size(grp), > degree := NrMovedPoints(grp), > baseSize := Size(BaseOfGroup(grp)) ); > > return List( Filtered( PrimitiveGroupsOfOrder(ord), > IsPermGroup ), > AnalyzePrimitivePermGroup ); > end;; gap> #Define AnalyzePrimitivePermGroupsOfOrder on slaves gap> BroadcastMsg( > PrintToString("AnalyzePrimitivePermGroupsOfOrder := ", > AnalyzePrimitivePermGroupsOfOrder ) ); gap> ParList( [2..256], AnalyzePrimitivePermGroupsOfOrder ); [ [ rec( order := 2, degree := 2, baseSize := 1 ) ], [ rec( order := 3, degree := 3, baseSize := 1 ), <...many lines of output omitted...> \endexample As one expects, the answer is computed in parallel in a time that decreases linearly with the number of processors. Hence a Beowulf cluster of 16 processors (1 {\ParGAP} master and 15 {\ParGAP} slaves) will return this information approximately 15 times as fast as a single processor, greatly facilitating interactive exploration of the properties of various groups. If each single case can be executed quickly, then the execution time for `ParList()' can be dominated by the network overhead imposed by message passing between the master and slaves. In this case, an optional third parameter to `ParList()' specifies the number of cases to be submitted to a slave in a single message (modulo how many cases are left, of course). As an example, suppose we are interested in finding those of the first 100,000 integers that have many distinct prime factors, but to reduce the network overhead we wish to batch 1000 cases at a time to the slaves. Then one is able to execute the following. \beginexample gap> ParList([1..100000], > i -> Length(PrimePowersInt(i))/2, 1000); \endexample %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \Section{Using ParGAP interactively} Having seen an example where {\ParGAP} can be useful, one next needs to know what is involved in setting up and using {\ParGAP}. {\ParGAP} runs only on UNIX, due to its dependence on an MPI subset (included in the distribution) that calls UNIX sockets. It should be possible to port {\ParGAP} to Windows, either through the use of Cygwin (UNIX emulation on Windows) or through a native Windows MPI implementation. Similarly, the code has been written in a ``vanilla'' manner, so that in principle, it should be easy to port to another interactive language, such as MAGMA or LISP. An older version was in fact written in LISP~\cite{Coo95}, although the current version is not implemented in LISP. In the following, recall that {\ParGAP} employs a master-slave architecture, with the local process being the ``master'' process, and all others being ``slaves''. {\ParGAP} is installed in the same way as other {\GAP} packages. After extracting the {\ParGAP} files, one invokes `configure' and `make'. (See Section~"Installing ParGAP" for the details.) One next writes a file with a list of slave processors to attach, and an indication of where {\ParGAP} is located on that processor; the usual way is to name this file `procgroup' and have it in the directory one starts up {\ParGAP} (see Section~"Running ParGAP" for details). An example `procgroup' file is: \begintt # This is a sample ParGAP procgroup file. # Lines starting with a # are comments. # Blank lines are also treated as comments. # The following line generates the MASTER process local 0 # Each of the following `localhost' lines generates # a SLAVE process on the same machine as the MASTER localhost 1 /proj/sparc/gap4r3/pkg/pargap/bin/pargap.sh localhost 1 /proj/sparc/gap4r3/pkg/pargap/bin/pargap.sh # The following line generates a SLAVE process on a remote # machine: `iron.ccs.neu.edu' (called via rsh or ssh or ...) iron.ccs.neu.edu 1 /proj/alpha/gap4r3/pkg/pargap/bin/pargap.sh \endtt Next, instead of invoking `gap', one then invokes the supplied shell script, `pargap.sh', (possibly modified and/or renamed; again see Section~"Running ParGAP" for the details). Observe that unlike most other {\GAP} packages, one should not call `RequirePackage()' function from within a {\GAP} session to start {\ParGAP}. As usual, master and slave processes read a `.gaprc' file in the home directory, if present. {\ParGAP} looks for a file named `procgroup' file in the current directory, but can be directed to use a different file through the `-p4pg' switch. The following example assumes that `pargap' is a symbolic link or alias for the full pathname of `pargap.sh'. Hence, if one always uses {\ParGAP} with the slave processes specified in `myprocgroup.big', one may wish to set up an alias or else a symbolic link to a script containing the equivalent of: \){\kernttindent}pargap.sh -p4pg <PATH>/myprocgroup.big Observe also that heterogeneous computing is supported, in that each slave machine may be a different dialect of UNIX running under different hardware. Within {\ParGAP}, the standard message-passing commands (send, receive, probe, etc.) are available. The slaves are numbered consecutively starting at~1. If no slave parameter is supplied for `SendMsg()' or `SendRecvMsg()', then the default slave is~1. Most other commands take a default slave parameter, `MPI_ANY_SOURCE' (meaning ```ANY_SLAVE'''). `PingSlave()' sends a ``null message'', and waits for an acknowledgement that the slave is alive. The command `SendMsg(<string>, <i>);' sends the (string) message <string> to slave <i>. Slave~<i> then evaluates <string> as a {\GAP} command, and returns an answer. If a command has no answer, (such as `"x:=3;"'), then the string `"<no_return_val>"' is returned. If multiple commands are sent, such as `"x:=3; y:=4; x+y;"', then only the last return value is returned. Note that the final semicolon, normally required by {\GAP}, is optional in a command string for a message. Most of the other commands are clear from context. Every `SendMsg()' to a slave must be balanced by a `RecvMsg()', except where the commands `FlushAllMsgs()' and `ParReset()' are used to flush any pending messages from a slave. A `SendRecvMsg(<string>, <i>)' command is equivalent to the consecutive commands `SendMsg(<string>, <i>)' and `RecvMsg(<i>)'. The command `BroadcastMsg()' sends its string argument to each slave, and there is no reply message. `ParEval()' is like `BroadcastMsg()', but it also evaluates on the master process. If one is unsure whether there is a pending message, one can invoke `ProbeMsgNonBlocking()' or `ProbeMsg()'; if no slave parameter is provided, the default is `MPI_ANY_SOURCE'. The command `ParRead()' is a parallel version of {\GAP}'s `Read()' command; it is useful for ensuring that the slaves and master share a similar environment. Other examples of commands that are analogous to {\GAP}'s built-in commands include: `ParReread', `ParEval', `ParInstallGlobalFunction' and `ParInstallGlobalFunction'. See Section~"Slave Listener Commands" for more complete descriptions of the above commands. An illustration of the usage of the commands mentioned above with a sample {\ParGAP} session, for which a `procgroup' file has defined two slaves is the first example given in Section~"Extended Example". It's recommended that you review that example at this point. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \Section{Streaming} Next, we demonstrate a simple implementation of ``streaming'' as an example of how easy it is to use the {\ParGAP} tools to provide new functionality. To the best of this author's knowledge, the term ``streaming'' originated with Charles Leedham-Green at the conference on which this proceedings is based, where he made a general request for streaming functionality to be implemented in some software system. The idea of streaming is that one may have two algorithms for solving a problem. One would like to to solve the problem simultaneously on two processors, each using a distinct algorithm. Whichever processor finds the solution first should report back to the master process. The master process should then interrupt the other slave, so as to make it available for further computation. This is useful when one has a heuristic available that may finish early with the correct answer, or may continue for a very long time. The heuristic can then be ``streamed'' alongside the standard algorithm, in full confidence that the standard algorithm will provide a reasonable upper bound on the time to compute an answer. One way to provide ``streaming'' functionality is by the implementation of a function we describe below that is inspired by {\GAP}'s `First()' function (see~"ref:First" in the {\GAP} Reference Manual). The function `First()' takes a list and boolean function as arguments, and returns the first element of the list for which the boolean function returns `true'. In contrast, we define a corresponding {\ParGAP} function, `ParFirstResult()', which takes a list and a (general) function as arguments. Messages are sent to the slaves causing the $i$th slave to evaluate the function on the $i$th list element. The value returned is the value obtained from whichever slave finishes first. Note that in consistency with the goal of streaming, the function signals an error if asked to evaluate a list with more entries than the number of slaves. \beginexample gap> ParFirstResult := function( list, fnc ) > local i, result; > if Length(list) > TOPCnumSlaves then > Error("too few slaves"); > fi; > for i in [1..Length(list)] do > SendMsg( PrintToString( > "fnc :=", fnc, "; fnc(", list[i], ");"), > i ); > od; > result := RecvMsg(); # default is MPI_ANY_SOURCE > ParReset(); # Interrupt all other slaves > return result; > end;; \endexample We now demonstrate the use of `ParFirstResult()' in ``streaming''. For our example we need the \package{FactInt} package by Stefan Kohl. To get this package if you don't have it, visit \URL{http://www-gap.dcs.st-and.ac.uk/gap/Share/factint.html} or the equivalent at one of {\GAP}'s mirror sites, and follow the easy installation instructions. If one hasn't already included a `RequirePackage("factint");' statement in one's `.gaprc' file then it is necessary to do: \beginexample gap> ParEval("RequirePackage(\"factint\")"); Loading FactInt 1.1 (Routines for Integer Factorization), by Stefan.Kohl@cip.mathematik.uni-stuttgart.de true \endexample so that each slave (not just the master) is aware of the \package{FactInt} functions. Above we defined `ParFirstResult()' on the master process. We will assume that we have two slaves. \beginexample gap> StreamingFactInt := function(i, x) > local alg; > alg := > [ x -> [ "MPQS ALGORITHM", FactorsMPQS(Factorial(x)+1) ], > x -> [ "CFRAC ALGORITHM", FactorsCFRAC(Factorial(x)+1) ] > ]; > return alg[i](x); > end;; \endexample Both `StreamingFactInt(1, x);' and `StreamingFactInt(2, x);' factor an integer, but one uses the multiple polynomial quadratic sieve algorithm (MPQS), and the other uses the continued fraction algorithm (CFRAC); the functions `FactorsMPQS' and `FactorsCFRAC' that perform these algorithms are defined by the \package{FactInt} package. We demonstrate the ``streaming'' of these algorithms in determining the factorization of $35!+1$. \beginexample gap> # Now, define StreamingFactInt() on each of the slaves. gap> BroadcastMsg( PrintToString( "StreamingFactInt :=", > StreamingFactInt ) ); gap> ParFirstResult([1,2], i->StreamingFactInt(i, 35) ); ... resetting ... [ "MPQS ALGORITHM", [ 137, 379, 17839, 340825649, 32731815563800396289317 ] ] \endexample %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \Section{TOP-C model for non-trivial parallelism} There are many examples where trivial parallelism does not suffice. Typically, this happens either when there are global variables that must be ``read'' by each slave process, and possibly updated, or else when the input for the next slave process depends on the result that arrived from the last slave process. Here we provide only the basics of the parallel model. The parallel model was described in more detail in~\cite{Coo97}, and a still more detailed description is contained in Chapter~"Basic Concepts for the TOP-C model (MasterSlave)". For non-trivial parallelism, {\ParGAP} uses the TOP-C model~\cite{Coo96}. {\ParGAP} typically invokes the TOP-C model via a command such as \){\kernttindent}MasterSlave( <GenerateTaskInput>, <DoTask>, <CheckTaskResult>, \){\kernttindent} <UpdateSharedData> ); The four arguments, <GenerateTaskInput>, <DoTask>, <CheckTaskResult>, and <UpdateSharedData> are ``callback'' functions written by the application programmer, and the names of those callback functions are arbitrary. (The manual for \package{ParGAP/MPI}, the earlier incarnation of {\ParGAP} used slightly different names for its examples, but the purpose of the arguments of `MasterSlave()' has not changed -- only the naming convention has.) The only {\ParGAP} command above is `MasterSlave()'. A task is an arbitrary function (here called `<DoTask>()') executed on a slave process, that takes input from the master process and returns its output to the master process. The diagram in Section~"Other TOP-C Commands" gives some idea of the flow of control as a task is processed. The diagram there is meant to represent the three main abstractions of the TOP-C model: (1)~the task, (2)~the shared data, and (3)~the action. The *shared data* consists of any globally shared data (which should be readable by all processes). The *task* is a procedure executed on a slave process that takes a *task input*, and the shared data, and produces a *task output*, which depends only on the task input and shared data. Finally, the task input and task output are sent to the master process, which must then decide upon an *action*. Typical actions are `NO_ACTION' (and the master process might save the task output in a private list of results), `UPDATE_SHARED_DATA' (send a message to all slaves to update the local copy of the globally shared data), and `REDO' (re-do the computation in case the shared data was changed by another slave process). (Earlier incarnations of {\ParGAP} used `UPDATE'. Now the alternative `UPDATE_SHARED_DATA' is offered and we standardize on this here.) An example invocation of `MasterSlave()' is shown below, where we pass the four application functions as direct arguments of `MasterSlave()'. The routine below implements a simplified version of the `ParList()' function described in Section~"Trivial Parallelism". \begintt ParInstallTOPCGlobalFunction( "MyParList", function( list, fnc ) local result, iter; result := []; # Each invocation of GAP's iterator # returns next element of list. iter := Iterator(list); MasterSlave( # GenerateTaskInput(): function() if IsDoneIterator(iter) then return NOTASK; else return NextIterator(iter); fi; end, # DoTask(): fnc, # CheckTaskResult(): function(input,output) result[input] := output; return NO_ACTION; end, # UpdateSharedData(): Error # We never see action: UPDATE_SHARED_DATA ); return result; end ); \endtt The function `ParInstallTOPCGlobalFunction()' installs `MyParList' on all {\ParGAP} processes. It also defines the version of `MyParList()' on the master differently from on a slave, so that a call, `MyParList([1..100], x->x^2);', on the master automatically causes `MyParList([1..100], x->x^2);' to be invoked on the slave with the same arguments. This is required for the TOP-C model. Note that the distinct function, `ParInstallGlobalFunction()', exists for the equivalent of `BroadcastMsg( "InstallGlobalFunction( MyParList, function ... end)");'. Both functions should be called only from the master (possibly from inside a `Read()' command). The shared data consists only of the variable~`fnc', which is read by all processes, but in this case is never ``updated''. Note that one need not explicitly declare variables that are in the shared data. A TOP-C shared data variable is defined as a variable whose value is read by more than one process, and which is modified only through a call to the application routine `<UpdateSharedData>()'. If we would like to see the messages, as they are passed back and forth between master and slave, we can optionally set the variable `ParTrace' (see~"ParTrace"). We are then ready to execute our new parallel function. \beginexample gap> ParTrace := true;; gap> MyParList([2..256], AnalyzePrimitivePermGroupsOfOrder); \endexample The example above in fact requires only trivial parallelism. Hence the `<CheckTaskResult>()' parameter to `MasterSlave()' is especially simple. Since the action is never `REDO', we achieve the maximum concurrency among slave processes, resulting in a speedup that is linear in the number of slaves. In general, the parallel efficiency of an application, with its concommitant decisions about concurrency of slave tasks are typically determined by the actions chosen by `<CheckTaskResult>()'. The sample below is a standard {\ParGAP} ``idiom'' that allows one to easily set up reasonable concurrency in a parallel application. \beginexample CheckTaskResult := function( taskInput, taskOutput ) if taskOutput = fail then return NO_ACTION; elif not IsUpToDate() then return REDO_ACTION; else return UPDATE_ACTION; fi; end; \endexample The function `IsUpToDate()' is a boolean function provided by {\ParGAP} that returns `true' if there has been no `UPDATE_SHARED_DATA' action since the time that the ``corresponding'' task input was generated by a call to the user-provided function `<GenerateTaskInput>()'. Otherwise, `IsUpToDate()' returns `false'. The ``corresponding'' task input is the task input associated with that task output which was most recently seen on the master process. In the context of the user-provided function `<CheckTaskResult>()', it is the first argument to that function. % \begintt % SeqMasterSlave := % function(GenerateTaskInput, DoTask, CheckTaskResult, UpdateSharedData) % local taskInput, taskOutput, action; % while true do % taskInput := GenerateTaskInput(); % if taskInput = NOTASK then break; fi; % repeat % taskOutput := DoTask( taskInput ); % action := CheckTaskResult( taskInput, taskOutput ); % until action <> REDO_ACTION; % if action = UPDATE_ACTION then % # Modify the global, shared data structures here % # Called on all processes, master and slaves % UpdateSharedData( taskInput, taskOutput ); % fi; % od; % end; % \endtt %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% %E