Sophie: parrot-doc-1.6.0-1mdv2010.0 i586

parrot-doc-1.6.0-1mdv2010.0.i586.rpm

# Copyright (C) 2001-2007, Parrot Foundation.
# $Id: pdd25_concurrency.pod 38816 2009-05-16 04:08:22Z coke $

=head1 PDD 25: Concurrency

=head2 Abstract

This document defines the requirements and implementation strategy for
Parrot's concurrency models.

=head2 Version

$Revision: 38816 $

=head2 Description

=over 4

=item - Parrot supports multiple concurrency models, including POSIX
threads, event-based programming, and asynchronous I/O.

=item - A concurrency scheduler manages all concurrent tasks

=item - Each interpreter has its own concurrency scheduler

=item - Concurrency schedulers for different interpreters
communicate and can share tasks

=item - A concurrency scheduler may link to other schedulers as a
parent, a child, or an equal

=item - A task is a concurrent unit of work

=item - All tasks support a standard interface used by the concurrency
scheduler, but otherwise have a great deal of flexibility in their
implementation

=item - Tasks can share PMC variables

=back

=head2 Definitions

Concurrency is a parallel execution of units of code (on multiprocessor
machines), or a flexible ordering of serial units of code (on single processor
machines). For certain problem spaces, concurrency offers significant speed
gains by parceling out processor-intensive activity or by ensuring that a wait
for input or system resources doesn't hold up the entire application.

A task is a unit of code that can be executed concurrently.

=head2 Implementation

Rather than defining a single canonical threading model, Parrot defines
an infrastructure that supports multiple concurrency models and provides
for interaction between the various models. Parrot already uses multiple
concurrency models for events, threads, async I/O, and exceptions, a
trend that will only continue as we support multiple HLLs and external
threading libraries like Intel's Threading Building Blocks. Designing
for multiple concurrency models also gives Parrot more room to grow as
future models are researched and developed.

To avoid conflicts between concurrency models, Parrot provides a single
central concurrency scheduler for each interpreter instance. Each
concurrency model defines a Task PMC that supports a standard minimal
interface. The scheduler can interact with tasks from different models
without direct access to the details of each model.

On multiprocessor systems, the scheduler is responsible for allocating
tasks to processors, or for delegating that allocation to the underlying
OS.

For the most part, when we talk about concurrency, we mean concurrency
across an interpreter pool. An interpreter pool is a set of interpreter
instances that share common resources: the memory pools, arenas, and
global namespace--pretty much everything except what's in the
interpreter structure itself. They're essentially threads in the OS
sense.

Another form of concurrency is between completely independent
interpreter instances, each with their own memory pools, arenas,
namespaces, etc. Independent interpreters may run as separate processes
on the same machine, or even as separate processes on different machines
(in a clustering environment, for example). The concerns of
shared-interpreter concurrency and independent-interpreter concurrency
are similar, and in Parrot both use the same central concurrency
scheduler. This PDD doesn't directly address independent-interpreter
concurrency, but does include occasional notes on how it integrates with
shared-interpreter concurrency.

=head3 Supported Concurrency Models

The following are a few of the concurrency models Parrot intends to
support. The biggest differences between them are in how they handle
variables shared across concurrent tasks. But the design is such that
each of the different models can run simultaneously, coordinated through the
central concurrency scheduler.

=head4 Mutex/Lock Concurrency

In this model, before performing an operation on a shared variable, you
must first acquire a lock on it. Once a lock is acquired, any other
concurrent tasks that attempt to acquire a lock on that variable will
block until the existing lock is released.

A mutex is a thing that can be locked. They are not directly exposed to
users. They're non-recursive, non-read/write, exclusive things. When a
concurrent task gets a mutex, any other attempt to get that mutex will
block until the owning task releases the mutex. Mutexes are implemented
using the platform-native lock construct.

The first thing that any vtable function of a shared PMC must do is to
acquire the mutex of the PMCs in its parameter list (in ascending
address order). In this model only PMCs can be shared.

=head4 STM Concurrency

Parrot's preferred model of concurrency is based on Software
Transactional Memory. In this model, rather than locking a shared
variable while performing a series of operations on it, the changes are
bundled into a transaction that acts as an atomic unit.

Within the transaction, STM creates a "hypothetical" copy of the
variable, logs the changes made to that copy, and at the end of the
transaction performs some validation steps to decide whether to save the
hypothetical value back to the real variable (a commit) or discard the
hypothetical value (a roll back). One common validation step is to check
whether the value of the real variable was changed during the execution
of the transaction (possibly by another concurrent task).

STM tasks can read/write shared variables from mutex/lock tasks, as they
appear to the mutex/lock task as a single atomic operation. Mutex/lock
tasks can read shared variables from STM tasks, but they cannot write
them, as the STM tasks will not respect the lock and may commit a new
value in the middle of a complex operation that requires the lock. As a
safety mode, STM tasks may be configured to fail validation on any
transaction attempting to commit to a variable locked by a mutex/lock
task.

=head4 POSIX Concurrency

This is the POSIX "share-everything" style of threading, such as is used
in Perl 5's "pthread" model, as well as the thread models for Ruby and
Python. [Recommended reading: "Programming with POSIX Threads" by Dave
Butenhof.]

=head4 Process-type Concurrency

This is the Perl 5 "iThreads" threading model. In this model no data is
shared implicitly, and all sharing must be done on purpose and
explicitly. It resembles the Unix
fork-process-with-shared-memory-segment model, not a surprise as it was
originally developed with emulation of Unix's fork system in mind.

=head4 Independent Concurrency

Independent tasks have no contact with the internal data of any other
task in the current process. These are implemented as STM concurrency
but only use transactions for the shared interpreter globals.

Note that independent tasks may still communicate back and forth by
passing either atomic things (ints, floats, and pointers) or static
buffers that can become the property of the destination thread.

=head4 Intel Threading Building Blocks

Threading Building Blocks (TBB) is a library of tools for data-parallel
programming, dividing large data sets into small pieces so that
operations on those data-sets can be parallelized across multiple
processors.

Parrot will provide two levels of integration with TBB: an interface for
TBB's scheduling to interact with the central concurrency scheduler, and
an interface for developers to access the TBB routines from within
PIR/PASM.

Like Parrot, TBB is task-based. Since TBB performs its own scheduling,
TBB tasks in Parrot will be given a lightweight scheduler that only has
the responsibility of passing messages, events, etc, back and forth
between the TBB task and the central scheduler. TBB tasks will not share
variables with any other types of concurrent tasks in Parrot.

Note that since TBB is a C++ library, it is only available when Parrot
is compiled with a C++ compiler.

=head3 Concurrency Scheduler API

The concurrency scheduler has two parts, a Scheduler PMC, which has an
instance stored in the interpreter struct, and a set of core routines in
F<src/scheduler.c>.

An instance of the Scheduler PMC has 5 internal attributes, which are:

=over 4

=item 1

An unique ID for the scheduler

=item 2

The current highest assigned task ID

=item 3

The task list

=item 4

The task priority index

=item 5

The list of handlers

=back

The unique ID of the scheduler is used by other schedulers to pass messages.
With a small set of identifying information (including process ID, interpreter
ID, scheduler ID, and possibly a URL/hostname) a scheduler can address other
schedulers, both local to the current interpreter and remote.

The task list is a simple unordered integer indexed data structure, currently
implemented as a hash. Each task in the list has an integer ID assigned when
it is first inserted into the list. A task retains the same ID throughout its
lifetime, and the ID is not reused once a task is finalized. (The data
structure is currently implemented as a hash so that it only takes up the
memory required for currently living tasks. A task list for a particular
program may use thousands of task IDs, but only need memory allocated for a
handful of elements at any given moment.)

The task rank index is calculated based on the type, priority rating, age of
the tasks in the task list. The index is a simple array, and in general the
top (zeroth) element in the array is the next one to receive attention. The
index is recalculated regularly as new tasks are inserted into the task list,
existing tasks are modified or completed, and as time progresses so the age of
some tasks pushes them to a higher priority. Because of the regular
recalculation, the rank index may cache some frequently-accessed and rarely
changing data from the tasks (though it is not required to do so). (As a later
optimization, some data structure other than an array may be used to speed up
rank recalculation. For example, with a hash of hashes of arrays keyed on task
attributes, the process of inserting new tasks at a relative priority to other
existing tasks could be performed without shifting the rank of all lower
ranked tasks.)

The list of handlers is a simple stack of handler PMCs currently waiting for
an appropriate task (event, exception). See PDD 24 on Events for more details
on event handlers.

=head4 Flags

PMC flags 0-7 are reserved for private use by a PMC. The scheduler uses flag 0
to indicate whether the priority index is currently valid or needs to be
recalculated before the next use.

=head4 Vtable Functions

=over 4

=item push_pmc

Add an entry to the task list.

=item pop_pmc

Pull the next entry (the highest ranked entry in the task priority index) off
the task list. If there are no tasks remaining in the task list, return null.

=back

=head4 Methods

=over 4

=item add_handler

=begin PIR_FRAGMENT

$P1.'add_handler'($P2)

=end PIR_FRAGMENT

Add an event or exception handler to the scheduler's list of handlers.

=item find_handler

=begin PIR_FRAGMENT

$P1 = $P2.'find_handler'($P3)

=end PIR_FRAGMENT

Search for an event or exception handler $P1, in scheduler $P2, for the task
$P3. Returns a null PMC if an appropriate handler is not found.

=back

=head3 Task PMC API

The interface of the Task PMC is also the minimum required interface for all
subclasses, extensions, and alternate implementations of a task.

An instance of the Task PMC has 7 internal attributes, which are:

=over 4

=item 1

The task ID

=item 2

The type of the task

=item 3

The subtype of the task

=item 4

The priority of the task

=item 5

The birthtime stamp of the task

=item 6

The status of the task

=item 7

The code block of the task (optional)

=item 8

An interpreter structure for the task (optional)

=back

Types of tasks include 'event', 'exception', 'io', and 'code'. The subtype of
a task is used by events and exceptions to identify appropriate handlers.
Possible status values for tasks include 'created', 'invoked', 'inprocess',
and 'completed'. The final state of a task is 'destroyed', but is never
marked (the task PMC is removed from the task list and at some later point
destroyed by GC). The priority of a task is an integer value between 0 and
100, with 0 as the lowest priority.

The birthtime stamp is the point at which the task was inserted into the task
list, and is used for calculating the age of tasks.

The code block is optional and only for tasks that are associated with a
simple code block. The interpreter structure is also optional and only used
for thread-like tasks that maintain their own interpreter state.

=head4 Vtable Functions

=over 4

=item get_attr_str

Retrieve an attribute of the task.

=item set_attr_str

Set an attribute of the task.

=back

=head3 Opcodes

=over 4

=item new

=begin PIR_FRAGMENT

$P1 = new 'Task'

=end PIR_FRAGMENT

Creates a new task. (The Scheduler PMC is never instantiated directly, it is
only used by Parrot internals.)

=item schedule

=begin PIR_FRAGMENT

$P0 = new 'Task'
# set attributes
schedule $P0

=end PIR_FRAGMENT

=item join

=begin PIR_FRAGMENT_INVALID

$P0 = new 'Task'
# ... schedule the task, etc.
join $P0

=end PIR_FRAGMENT_INVALID

Wait for a particular task to complete.

=item kill

=begin PIR_FRAGMENT_INVALID

$P0 = new 'Task'
# ... schedule the task, etc.
kill $P0

=end PIR_FRAGMENT_INVALID

Kill a task without waiting for it to complete.

=back

=head2 Attachments

None.

=head2 Footnotes

None.

=head2 References

Dec 2003 - (Dan ponders threads based on POSIX and Perl 5 experience)
<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/e64b22ab7de0a7a6/889b5d8c4cd267b7?lnk=gst&q=threads&rnum=3#889b5d8c4cd267b7>

Dec. 2003 - "threads and shared interpreter data structures"
<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/e64ea4ff287e04fd/b71333e282d3d187?lnk=gst&q=threads&rnum=9#b71333e282d3d187>

Jan. 2004 - "Threads Design. A Win32 perspective."
<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/3209629b23306029/52ba9d37425ba015?lnk=gst&q=threads&rnum=8#52ba9d37425ba015>

Jan. 2004 - "Start of threads proposal"
<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/4c7de440da84d5c6/04cfb70b0d81dfba?tvc=1&q=threads#04cfb70b0d81dfba>

Sept. 2005 - "consider using OS threads"
<http://groups.google.com/group/perl.perl6.internals/browse_frm/thread/40b50e3aa9255f8e/036a87b5d2b5ed2c?lnk=gst&q=threads&rnum=2#036a87b5d2b5ed2c>

Aug. 2007 - "multi-threading a work in progress"
<http://perlmonks.org/?node_id=636466>

Concurrency as Futures -
<http://www.cincomsmalltalk.com/userblogs/mls/blogView?showComments=true&entry=3336838959>

Io language - <http://www.iolanguage.com/about/>

Java memory and concurrency - http://www.cs.umd.edu/~pugh/java/memoryModel/

=cut

__END__
Local Variables:
fill-column:78
End:
vim: expandtab shiftwidth=4: