Sophie: waf-1.5.9-1mdv2010.0 noarch

<?xml version='1.0'?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
>
<chapter id="task_system">
	<title>The Task system</title>
	<section id="task_creation">
		<title>Task creation and execution</title>
		<para>
			Waf does not execute tasks at the moment they are created. A task may not need to run at all (for example, an application does not have to be rebuilt if its source files have not changed), and it may depend on other tasks that have not been created yet. For these reasons, Waf delays the execution of tasks until all build functions have been executed.
		</para>
		<para>
			Creating all tasks by hand is a tedious process that the task generators (<xref  linkend="task_gen"/>) can automate. Before starting the build, Waf asks each task generator to produce the corresponding tasks. If Waf is launched from a subfolder of the source directory, it tries, as an optimization, to avoid creating the tasks that are not relevant to that subfolder.
		</para>
		<para>
			Once the tasks are created, Waf reviews them one by one to decide whether to execute them. The process is summarized in the following diagram:
			<graphic format="png" fileref="task_execution.png" align="center"/>
		</para>
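		<para>
			The deferred execution described above can be pictured with a minimal sketch in plain Python. It does not use the Waf API: the class <emphasis>ToyBuild</emphasis> and its methods <emphasis>add_task</emphasis> and <emphasis>compile_all</emphasis> are names invented for this illustration, and the <emphasis>up_to_date</emphasis> flag stands in for the real decision procedure.
		</para>

```python
# Illustrative sketch only: plain Python, not the Waf API.
# ToyBuild, add_task and compile_all are hypothetical names.

class ToyBuild(object):
    def __init__(self):
        self.tasks = []   # tasks recorded so far, none executed yet
        self.log = []     # what happened once the build really started

    def add_task(self, name, up_to_date=False):
        # Creating a task only records it
        self.tasks.append((name, up_to_date))

    def compile_all(self):
        # Called after all build functions have returned:
        # review each task and run only those that need it
        for name, up_to_date in self.tasks:
            if up_to_date:
                self.log.append('skip ' + name)
            else:
                self.log.append('run ' + name)
        return self.log

def build(bld):
    bld.add_task('compile main.c')
    bld.add_task('compile util.c', up_to_date=True)
    bld.add_task('link app')

bld = ToyBuild()
build(bld)                       # the build function only declares tasks
log_before = list(bld.log)       # still empty at this point
log_after = bld.compile_all()    # the review and the execution happen here
```

		<para>
			In this sketch nothing is logged while <emphasis>build</emphasis> runs; the tasks are reviewed only when <emphasis>compile_all</emphasis> is called.
		</para>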
	</section>

	<section id="task_execution">
		<title>Task execution</title>
		<para>
			Executing a task consists of calling the method <emphasis>run</emphasis> on that task and setting the task execution state.
			The following diagram summarizes the process:
			<graphic format="png" fileref="task_run.png" align="center"/>
			The method <emphasis>post_run</emphasis> can be used to check that the files have been produced; it must raise an OSError if the task has not completed properly.
		</para>
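		<para>
			As a rough illustration of the sequence above, the following plain-Python sketch calls <emphasis>run</emphasis>, then <emphasis>post_run</emphasis>, and records a state. It is not the Waf implementation: the class <emphasis>ToyTask</emphasis>, the function <emphasis>execute</emphasis> and the state labels are assumptions made for this example.
		</para>

```python
# Illustrative sketch only: plain Python, not the Waf API.
# ToyTask, execute and the state labels are hypothetical.
import os
import tempfile

SUCCESS, CRASHED, MISSING = 'success', 'crashed', 'missing'

class ToyTask(object):
    def __init__(self, output, produce=True):
        self.output = output      # file the task is expected to create
        self.produce = produce    # set to False to simulate a silent failure
        self.state = None

    def run(self):
        # Return 0 on success, like a shell command
        if self.produce:
            with open(self.output, 'w') as f:
                f.write('data')
        return 0

    def post_run(self):
        # Check that the expected file has been produced
        if not os.path.exists(self.output):
            raise OSError('missing output: ' + self.output)

def execute(task):
    # Call run, then post_run, and set the execution state
    if task.run() != 0:
        task.state = CRASHED
        return task.state
    try:
        task.post_run()
    except OSError:
        task.state = MISSING
    else:
        task.state = SUCCESS
    return task.state

tmpdir = tempfile.mkdtemp()
good = ToyTask(os.path.join(tmpdir, 'a.out'))
bad = ToyTask(os.path.join(tmpdir, 'b.out'), produce=False)
states = [execute(good), execute(bad)]
```

		<para>
			The second task returns 0 from <emphasis>run</emphasis> but creates no file, so only the check in <emphasis>post_run</emphasis> reveals the failure.
		</para>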
	</section>

	<section id="task_parallel">
		<title>Task execution in parallel</title>
		<para>
			Tasks may be executed in parallel to take advantage of the hardware (multi-core) or of the environment (distributed builds). By default, Waf does not immediately execute the tasks that are ready. Instead, tasks are added to a queue which is consumed by threads. Waf detects the number of installed processors: on a uniprocessor machine only one task is executed at a time, on a dual-processor machine two tasks are executed at a time, and so on. To disable task parallelization, use the option <emphasis>-j1</emphasis>. To change the level of parallelization, use the option <emphasis>-j</emphasis> with the number of consumers:
			<programlisting language="sh">
$ waf -j3
			</programlisting>
		</para>
		<para>
			By default, Waf does not allow consumer threads to access the tasks directly, for the following reasons:
			<itemizedlist>
				<listitem>There is little need to parallelize the computation of the next task to execute, because choosing the next task is fast enough</listitem>
				<listitem>The threading issues are limited to a very small section of the code</listitem>
				<listitem>The producer-consumer scheme prevents <emphasis>busy waiting</emphasis> for the next task</listitem>
				<listitem>A simple global error handler can be used to process errors and to decide whether to stop the build</listitem>
			</itemizedlist>
			The following diagram illustrates the producer-consumer relationship used for builds:
			<graphic format="png" fileref="parallel.png" align="center"/>
		</para>
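		<para>
			The producer-consumer scheme can be sketched with the standard Python modules <emphasis>queue</emphasis> and <emphasis>threading</emphasis>. This is a simplified illustration, not the Waf implementation: the queue names, the number of consumers and the toy tasks (integers to be squared) are assumptions made for the example.
		</para>

```python
# Illustrative sketch only: plain Python, not the Waf implementation.
import queue
import threading

NUM_CONSUMERS = 3        # plays the role of the -j value
ready = queue.Queue()    # tasks that are ready to run
done = queue.Queue()     # results sent back to the producer

def consumer():
    while True:
        task = ready.get()       # blocks until a task arrives: no busy waiting
        if task is None:         # sentinel value: stop this thread
            break
        done.put((task, task * task))

threads = [threading.Thread(target=consumer) for _ in range(NUM_CONSUMERS)]
for t in threads:
    t.start()

# Only the producer decides which task comes next
tasks = [1, 2, 3, 4, 5]
for task in tasks:
    ready.put(task)

# Collect exactly one result per task
results = dict(done.get() for _ in tasks)

# Shut the consumers down, one sentinel per thread
for _ in threads:
    ready.put(None)
for t in threads:
    t.join()
```

		<para>
			Because the consumers only read from one queue and write to another, the code shared between threads is limited to the two queue objects.
		</para>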
	</section>

	<section id="task_execution_order">
		<title>Task execution order</title>
			<para>
				Running tasks in parallel is a simple problem in theory, but in practice it is more complicated:
				<itemizedlist>
					<listitem>Dependencies can be discovered during the build (dynamic task creation)</listitem>
					<listitem>New ordering constraints can be discovered after files are compiled</listitem>
					<listitem>The number of tasks and ordering constraints (graph size) can be huge, and performance may become a problem</listitem>
				</itemizedlist>

				To simplify the problem, it is split into separate concerns, and ordering constraints can be given at three different levels:
				<orderedlist>
					<listitem>groups of tasks: a group may run only after another group of tasks has finished running. This represents a strict sequential order between groups of tasks, for example when a compiler is built in one group and used by the tasks in the next group</listitem>
					<listitem>task types: the instances of a task type run after the instances of other task types, for example linking object files may only occur after compiling the source files</listitem>
					<listitem>task instances: specific constraints for task instances that can only run after a few other task instances</listitem>
				</orderedlist>
			</para>
			<sect2>
				<title>Task groups</title>
				<para>
					In some circumstances it is necessary to build a compiler and all its dependencies before using it to execute some other tasks (bootstrapping). The following demonstrates how to declare groups of tasks to be executed after other groups of tasks:
					<programlisting language="python">
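def build(bld):
	# first group: build the compiler
	bld.new_task_gen(features='cc cprogram', source='main.c', target='mycompiler')
	# start a new group: the tasks below run only after the first group is complete
	bld.add_group()
	bld.new_task_gen(features='cc cprogram', source='user.c', target='someotherapp')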
					</programlisting>
					The effect of task groups when running tasks in parallel is illustrated by the following diagram. Three groups of tasks have been added, and the execution of the next group only starts when the execution of the tasks in the previous group is complete.
					<graphic format="png" fileref="output-ADDGROUP.png" align="center"/>
				</para>
				<para>
					It is possible to create groups at any point in the scripts, and to add the task generators to any group previously created. Adding groups for specific folders or scripts enables a behaviour similar to projects organized in recursive Makefiles.
					<programlisting language="python">
def build(bld):

	# groups are executed in the order in which they are declared
	bld.add_group('test1')
	bld.add_group('test2')
	bld.add_group('test3')
	bld.add_group('test4')

	print('adding task generators')

	# task generators created from now on are added to the group 'test3'
	bld.set_group('test3')
	bld.new_task_gen(features='cxx cprogram', source='main3.c', target='g3')

	bld.set_group('test1')
	bld.new_task_gen(features='cxx cprogram', source='main1.c', target='g1')

	bld.set_group('test2')
	obj2 = bld.new_task_gen(features='cxx cprogram', source='main2.c', target='g2')

	bld.set_group('test4')
	obj2.clone('debug')
					</programlisting>
					Because task groups limit parallelization, they reduce performance. On the other hand, they make projects more structured and easier to maintain.
				</para>
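				<para>
					The barrier effect of groups can be sketched in plain Python with a thread pool: the tasks of one group may run in parallel, but the next group starts only when the current one is complete. This is not Waf code; the task names and the function <emphasis>run_task</emphasis> are hypothetical.
				</para>

```python
# Illustrative sketch only: plain Python, not the Waf API.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical groups of task names, in their execution order
groups = [
    ['build mycompiler'],
    ['compile user.c', 'compile lib.c'],
    ['link someotherapp'],
]

def run_task(name):
    # A real task would do some work here
    return name

finished = []   # one list of completed tasks per group

with ThreadPoolExecutor(max_workers=2) as pool:
    for group in groups:
        # list() consumes every result of the group, so the loop does not
        # move on to the next group before all tasks of this one are done
        finished.append(list(pool.map(run_task, group)))
```

				<para>
					With two workers, the two tasks of the second group can overlap, whereas the first and the third groups never overlap with any other group.
				</para>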
			</sect2>

			<sect2>
				<title>Precedence constraints</title>
				<para>
					The attributes <emphasis>before</emphasis> and <emphasis>after</emphasis> are used to declare ordering constraints between tasks:
					<programlisting language="python">
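import Task
class task_test_a(Task.TaskBase):
	# instances of task_test_a run before instances of task_test_b
	before = 'task_test_b'
class task_test_b(Task.TaskBase):
	# same constraint, declared from the other side
	after = 'task_test_a'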
					</programlisting>
					Another way to declare precedence constraints is to declare the file extensions that the tasks consume and produce, for example:
					<programlisting language="python">
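import Task
class task_test_a(Task.TaskBase):
	# consumes '.c' files, so it runs after the tasks producing them
	ext_in = '.c'
class task_test_b(Task.TaskBase):
	# produces '.c' files, so it runs before task_test_a
	ext_out = '.c'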
					</programlisting>
					The extensions have to match to add a valid precedence constraint, but they are only annotations: they do not mean that the tasks actually have to produce files of that type.
				</para>
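				<para>
					The way such class-level declarations translate into an ordering can be sketched as follows. The code is plain Python and only approximates the idea; it is not the comparison performed by Waf. The helpers <emphasis>to_list</emphasis> and <emphasis>must_run_before</emphasis> and the three example classes are hypothetical.
				</para>

```python
# Illustrative sketch only: plain Python, not the Waf implementation.
# to_list and must_run_before are hypothetical helpers.

def to_list(value):
    # Accept space-separated strings as well as lists
    if isinstance(value, str):
        return value.split()
    return list(value)

def must_run_before(first, second):
    """True if task class 'first' has to run before task class 'second'."""
    if second.__name__ in to_list(getattr(first, 'before', [])):
        return True
    if first.__name__ in to_list(getattr(second, 'after', [])):
        return True
    # An output extension of 'first' matches an input extension of 'second'
    outs = set(to_list(getattr(first, 'ext_out', [])))
    ins = set(to_list(getattr(second, 'ext_in', [])))
    return bool(outs.intersection(ins))

class generate_source(object):
    ext_out = '.c'

class compile_source(object):
    ext_in = '.c'
    ext_out = '.o'

class link_objects(object):
    after = 'compile_source'

checks = [
    must_run_before(generate_source, compile_source),  # matching extensions
    must_run_before(compile_source, link_objects),     # 'after' attribute
    must_run_before(link_objects, generate_source),    # no constraint
]
```

				<para>
					The first check succeeds because of the matching extension, the second because of the <emphasis>after</emphasis> attribute, and the third finds no constraint in that direction.
				</para>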
			</sect2>

			<sect2>
				<title>Precedence constraints on task instances</title>
				<para>
					The method <emphasis>set_run_after</emphasis> is used to declare ordering constraints between task instances:
					<programlisting language="python">
# task1 is executed only after task2 has completed
task1.set_run_after(task2)
					</programlisting>
					Unlike the previous constraints, it is used on instances of the class <emphasis>Task</emphasis>, which is a subclass of the class <emphasis>TaskBase</emphasis>.
				</para>
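				<para>
					The effect of instance-level constraints can be pictured with a small scheduler written in plain Python. It is a sketch under simplifying assumptions, not the Waf scheduler: the class <emphasis>ToyTask</emphasis> and the function <emphasis>schedule</emphasis> are invented for this example, and tasks are released in successive rounds instead of being sent to consumer threads.
				</para>

```python
# Illustrative sketch only: plain Python, not the Waf implementation.
# ToyTask and schedule are hypothetical names.

class ToyTask(object):
    def __init__(self, name):
        self.name = name
        self.run_after = []   # task instances that must complete first

    def set_run_after(self, other):
        self.run_after.append(other)

def schedule(tasks):
    """Return the task names in an order that respects run_after."""
    order = []
    done = set()
    pending = list(tasks)
    while pending:
        # A task is ready when all the tasks it waits for are done
        ready = [t for t in pending
                 if all(dep in done for dep in t.run_after)]
        if not ready:
            raise ValueError('cycle in the ordering constraints')
        for t in ready:
            order.append(t.name)
            done.add(t)
            pending.remove(t)
    return order

task1 = ToyTask('task1')
task2 = ToyTask('task2')
task3 = ToyTask('task3')
task1.set_run_after(task2)   # task1 waits for task2
task2.set_run_after(task3)   # task2 waits for task3

order = schedule([task1, task2, task3])
```

				<para>
					Although the tasks are submitted as task1, task2, task3, the constraints reverse the order of execution.
				</para>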
			</sect2>

	</section>

	<section id="execution_tracking">
		<title>Executing tasks only when something changes</title>
		<para>The direct instances of TaskBase are quite limited and do not track changes to the source files. The class <emphasis>Task</emphasis> provides the features needed for the most common builds, in which source files are used to produce target files. The idea is to compute a unique signature for each task, and to represent the dependencies on files or on other tasks by including them in the signature. A hashing function is used to compute the signature; by default it is md5.
		</para>
		<para>
			The following diagram illustrates the task processing including the signature. It is only valid for Task instances (not for TaskBase instances):
			<graphic format="png" fileref="task_signature.png" align="center"/>
		</para>
		<para>
			The signature computation uses the following data:
			<orderedlist>
				<listitem>explicit dependencies: input files and dependencies set explicitly using task.deps_man or bld.add_manual_dependency</listitem>
				<listitem>implicit dependencies: dependencies searched by the task itself (like source files included from other source files).</listitem>
				<listitem>parameters: compilation flags and command-line parameters.</listitem>
			</orderedlist>
			Here is an example illustrating the different kinds of dependencies:
			<programlisting language="python">
import Task
class task_demo(Task.Task):
	vars = ['CXXFLAGS', 'LINKFLAGS'] <co id="vars-co" linkends="vars"/>
	def scan(self): <co id="scan-co" linkends="scan"/>
		return [[self.inputs[0].parent.find_resource('.svn/entries')], []]

task = task_demo()
task.inputs = [bld.path.find_resource('test.cxx')] <co id="expl-co" linkends="expl"/>
task.deps_man = [bld.path.find_resource('wscript')] <co id="expl2-co" linkends="expl2"/>

bld.add_manual_dependency('main.c', 'an arbitrary string value') <co id="expl3-co" linkends="expl3"/>
bld.add_manual_dependency(
		bld.path.find_or_declare('test_c_program'),
		bld.path.find_resource('bbb')) <co id="expl4-co" linkends="expl4"/>
			</programlisting>
			<calloutlist>
				<callout arearefs="vars-co" id="vars">
					<para>Environment variable dependencies (compilation flags)</para>
				</callout>
				<callout arearefs="scan-co" id="scan">
					<para>Implicit dependencies: a method returns a list containing the list of additional nodes to take into account, and the list of the files that could not be found (cache)</para>
				</callout>
				<callout arearefs="expl-co" id="expl">
					<para>Explicit dependencies as input files (nodes)</para>
				</callout>
				<callout arearefs="expl2-co" id="expl2">
					<para>Explicit dependencies as manual dependencies</para>
				</callout>
				<callout arearefs="expl3-co" id="expl3">
					<para>Manual dependencies on source files: the second parameter can be a string, a node object or a function returning a string</para>
				</callout>
				<callout arearefs="expl4-co" id="expl4">
					<para>Manual dependencies with nodes: the first node represents a target (which may or may not exist in the build), and the second parameter represents a file in the source directory.</para>
				</callout>
			</calloutlist>

		</para>
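		<para>
			To make the role of the signature more concrete, here is a minimal sketch in plain Python that hashes the three kinds of data with md5 and compares the result with a previously stored value. It is not the Waf implementation: the functions <emphasis>signature</emphasis> and <emphasis>must_run</emphasis> are hypothetical, and file contents are passed directly as byte strings instead of being read from nodes.
		</para>

```python
# Illustrative sketch only: plain Python, not the Waf implementation.
# signature and must_run are hypothetical helpers.
import hashlib

def signature(explicit, implicit, params):
    """Hash the dependency contents and the parameters into one value."""
    m = hashlib.md5()
    for content in explicit:      # e.g. contents of the input files
        m.update(content)
    for content in implicit:      # e.g. contents of the included headers
        m.update(content)
    for key in sorted(params):    # e.g. compilation flags
        m.update(key.encode())
        m.update(params[key].encode())
    return m.hexdigest()

def must_run(old_sig, new_sig):
    # Run the task when no signature is stored or when it has changed
    return old_sig is None or old_sig != new_sig

src = [b'int main() { return 0; }']
hdr = [b'#define N 1']

sig1 = signature(src, hdr, {'CXXFLAGS': '-O2'})
sig2 = signature(src, hdr, {'CXXFLAGS': '-O2'})             # nothing changed
sig3 = signature(src, hdr, {'CXXFLAGS': '-g'})              # flag changed
sig4 = signature(src, [b'#define N 2'], {'CXXFLAGS': '-O2'})  # header changed
```

		<para>
			In this sketch, a change in a flag or in an included file yields a different signature, so the task would be executed again, while identical inputs yield the same signature and the task would be skipped.
		</para>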
	</section>

</chapter>