This walkthrough is intended for new users who want so get a quick overview of the Pegasus concepts and system. A preconfigured virtual machine is provided so that no software installation (except Nimbus cloud-client) is required. The walkthrough covers creating a workflow, submitting, monitoring, debugging, and generating run statistics at a high level. As concepts and tools are introduced, links to the main Pegasus documentation are provided.
Please refer to chapter "Starting The Virtual Machine" to start a virtual machine on FutureGrid.
That chapter also includes the first instruction, how to log into
your virtual machine instance as user tutorial using
ssh. Please do so now.
Pegasus takes in an abstract workflow (DAX) and generates an executable workflow (DAG) that is run in an environment. For the purposes of this walkthrough, we will demonstrate the characteristics and structure of workflows by generating a workflow with a bit of Python code that uses the DAX API to generate a DAX. For a detailed description of workflows, how to create them, and how they are used in Pegasus see Creating Workflows
The workflow we will be creating is called Black Diamond because its shape. It is a made up workflow which has 4 jobs, files (f.*) passed between the jobs, and it is an interesting example as it shows data and job dependencies. The workflow looks like:
All the exercises in this chapter will be run from the $HOME/walkthrough/ directory . All the files that are required reside in this directory.
Change the directory to $HOME/walkthrough:
$ cd $HOME/walkthrough
The piece of code which generates the DAX is called a DAX generator, and in this case the code is written in Python. (The proper version of Python is pre-installed on the VM you are on.) APIs are also available for Java and Perl, and if that does not fit in your tool chain, you can also write the DAX XML directly.
Open the file create_diamond_dax.py
$ nano create_diamond_dax.py
The code has 6 logical sections:
Imports and Pegasus location setup. We need to know the location of Pegasus because we need to import the DAX3 Pegasus Python module, and we need to know where on the file system the pegasus-keg executable is.
A new abstract dag (DAX) is created. This is the main DAX object that we will add data, jobs and flow information to.
A replica catalog is set up. Replica catalogs tell Pegasus where to find data. In this example workflow, we only have one input, f.a, and thus the replica catalog only has one entry. For larger workflows, it is not uncommon to have thousands of entries in the replica catalog, and sometimes mulitple physical locations for the same logical filename so that Pegasus can pick which replica to use. Replica catalogs can either be included in the DAX (like in this example) or be standalone files or services. For more information about replica catalogs, see the Data Discovery chapter.
Executables are added. Just like we the replica catalog informs Pegasus on where to find data, the transformation catalog tells Pegasus where to find the executables for the workflow. The transformation catalog can exist inside the DAX (as in this example) or in a standalone file. You can list the same logical exectuable existing on multiple resources, and Pegasus will pick the appropiate one when the workflow is planned. More information can be found in the Executable Discovery chapter.
Jobs are added. The 4 jobs in the Black Diamond in the picture above are added. Arguments are defined, and uses clauses are added to list input and output files. This is an imporant step, as it allows Pegasus to track the files, and stage the data if necessary.
Control flows are set up. This is the edges in the picture, and defines parent/child relationships between the jobs. When the workflow is executing, this is the order the jobs will run in.
Close the file (Ctrl+X) and execute it. Since Pegasus 4.0,
all Pegasus tools are installed according to the LFS with a prefix of
pegasus- before binaries.
$ ./create_diamond_dax.py
The output is the DAX XML:
<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2012-03-08 18:00:14.935638 -->
<!-- generated by: tutorial -->
<!-- generator: python -->
<adag xmlns="http://pegasus.isi.edu/schema/DAX"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.3.xsd" version="3.3" name="diamond">
<file name="f.a">
<pfn url="file:///home/tutorial/walkthrough/f.a" site="PegasusVM"/>
</file>
<executable name="preprocess" namespace="diamond" version="4.0" arch="x86_64" os="linux" installed="true">
<pfn url="file:///usr/bin/pegasus-keg" site="PegasusVM"/>
</executable>
<executable name="analyze" namespace="diamond" version="4.0" arch="x86_64" os="linux" installed="true">
<pfn url="file:///usr/bin/pegasus-keg" site="PegasusVM"/>
</executable>
<executable name="findrange" namespace="diamond" version="4.0" arch="x86_64" os="linux" installed="true">
<pfn url="file:///usr/bin/pegasus-keg" site="PegasusVM"/>
</executable>
<job id="ID0000001" namespace="diamond" name="preprocess" version="4.0">
<argument>-a preprocess -T60 -i <file name="f.a"/> -o <file name="f.b1"/> <file name="f.b2"/></argument>
<uses name="f.b1" link="output"/>
<uses name="f.a" link="input"/>
<uses name="f.b2" link="output"/>
</job>
<job id="ID0000002" namespace="diamond" name="findrange" version="4.0">
<argument>-a findrange -T60 -i <file name="f.b1"/> -o <file name="f.c1"/></argument>
<uses name="f.c1" link="output"/>
<uses name="f.b1" link="input"/>
</job>
<job id="ID0000003" namespace="diamond" name="findrange" version="4.0">
<argument>-a findrange -T60 -i <file name="f.b2"/> -o <file name="f.c2"/></argument>
<uses name="f.c2" link="output"/>
<uses name="f.b2" link="input"/>
</job>
<job id="ID0000004" namespace="diamond" name="analyze" version="4.0">
<argument>-a analyze -T60 -i <file name="f.c1"/> <file name="f.c2"/> -o <file name="f.d"/></argument>
<uses name="f.c2" link="input"/>
<uses name="f.d" link="output" register="true"/>
<uses name="f.c1" link="input"/>
</job>
<child ref="ID0000002">
<parent ref="ID0000001"/>
</child>
<child ref="ID0000003">
<parent ref="ID0000001"/>
</child>
<child ref="ID0000004">
<parent ref="ID0000002"/>
<parent ref="ID0000003"/>
</child>
</adag>
Note
We broke the overly long line of the XML root element for this document.
We need the DAX in a file to give to Pegasus, so run the same command again, but redirect it to file:
$ ./create_diamond_dax.py > diamond.dax
More information about creating workflows can be found in the Creating Workflows chapter.
Submitting a Pegasus workflow consists of two steps, planning and submitting, but are often made by one single command for convenience. However, the planning stage is where Pegasus is doing powerful transformations to your workflow, so it is important to at least have an idea on what is happening under the covers. Planning includes, but is not limited to:
Adding remote create dir jobs
Adding stage in jobs to transfer data into the remote work directory
Adding cleanup jobs to clean up the work directory as the workflow progresses
Adding stage out jobs to transfer data to the final output location
Adding registration jobs to register the data in a replica catalog
Clusters job together - useful if you have many short tasks
Adds wrappers to the jobs to collect provenance information - this is so statistics and plots can be created after a run
In the case of our black diamond workflow, here is what it looks like after the Pegasus planner has processed the DAX into a directed acyclic graph (DAG) of jobs:
In the above image, find the tasks from the abstract diamond DAX as the purple nodes in this concrete DAG.
Please note that the exact shape of your DAG may be slightly different. This is an effect of certain randomization during job clustering that have no impact on the correctness of the workflow. The minor differences may also be expressed in the total number of jobs you see later in this chapter.
To plan and submit the workflow, run:
$ pegasus-plan --conf pegasusrc --dir runs --sites PegasusVM --output local \
--dax diamond.dax --submit
The output will look something like:
2012.03.08 18:07:15.936 EST: Submitting job(s). 2012.03.08 18:07:15.945 EST: 1 job(s) submitted to cluster 1. 2012.03.08 18:07:15.963 EST: 2012.03.08 18:07:15.969 EST: ----------------------------------------------------------------------- 2012.03.08 18:07:15.978 EST: File for submitting this DAG to Condor : diamond-0.dag.condor.sub 2012.03.08 18:07:15.989 EST: Log of DAGMan debugging messages : diamond-0.dag.dagman.out 2012.03.08 18:07:16.001 EST: Log of Condor library output : diamond-0.dag.lib.out 2012.03.08 18:07:16.025 EST: Log of Condor library error messages : diamond-0.dag.lib.err 2012.03.08 18:07:16.033 EST: Log of the life of condor_dagman itself : diamond-0.dag.dagman.log 2012.03.08 18:07:16.041 EST: 2012.03.08 18:07:16.049 EST: ----------------------------------------------------------------------- 2012.03.08 18:07:16.057 EST: 2012.03.08 18:07:16.065 EST: Your Workflow has been started and runs in base directory given below 2012.03.08 18:07:16.073 EST: 2012.03.08 18:07:16.081 EST: cd/home/tutorial/walkthrough/runs/tutorial/pegasus/diamond/run00012012.03.08 18:07:16.089 EST: 2012.03.08 18:07:16.097 EST: *** To monitor the workflow you can run *** 2012.03.08 18:07:16.109 EST: 2012.03.08 18:07:16.117 EST: pegasus-status -l/home/tutorial/walkthrough/runs/tutorial/pegasus/diamond/run00012012.03.08 18:07:16.125 EST: 2012.03.08 18:07:16.133 EST: *** To remove your workflow run *** 2012.03.08 18:07:16.141 EST: pegasus-remove/home/tutorial/walkthrough/runs/tutorial/pegasus/diamond/run00012012.03.08 18:07:16.149 EST: 2012.03.08 18:07:16.157 EST: Time taken to execute is 2.213 seconds
Tip
The work directory created by Pegasus is where the concrete workflow exists, and the directory is also the handle for Pegasus commands acting on that instance. Using this handle will be covered in the next section.
Further information about planning and submitting workflows can be found in the Running Workflows chapter. Information about the files in the work directory can be found in the Submit Directory Details chapter.
Once the workflow has been submitted, you can check status of it
with the pegasus-status tool. Use the directory
handle (which is different for every workflow you submit) from the
previous step and run it with the -l flag:
$ pegasus-status -l /home/tutorial/walkthrough/runs/tutorial/pegasus/diamond/run0001
STAT IN_STATE JOB
Run 03:06 diamond-0
Run 00:46 |-findrange_ID0000002
Run 00:44 |-findrange_ID0000003
Idle 00:32 \_clean_up_preprocess_ID0000001
Summary: 4 Condor jobs total (I:1 R:3)
UNRDY READY PRE IN_Q POST DONE FAIL %DONE STATE DAGNAME
8 0 0 4 0 4 0 25.0 Running *diamond-0.dag
Summary: 1 DAG total (Running:1)
The first section shows jobs currently available to the Condor scheduling sub-system. Pegasus builds on top of Condor to manage jobs. The second section shows a summary of the state of the workflow. Keep on checking the workflow with pegasus-status until it is 100% done.
Once the workflow has finished, you may use the pegasus-analyzer to debug the workflow. This is obviously most useful when workflows have failed for some reason. pegasus-analyzer will show you which jobs failed and the output of those jobs. Our simple black diamond should finish successfully, and pegasus-analyzer output should look like:
$ pegasus-analyzer /home/tutorial/walkthrough/runs/tutorial/pegasus/diamond/run0001
pegasus-analyzer: initializing...
************************************Summary*************************************
Total jobs : 15 (100.00%)
# jobs succeeded : 15 (100.00%)
# jobs failed : 0 (0.00%)
# jobs unsubmitted : 0 (0.00%)
To get detailed run statistics, use the pegasus-statistics tool:
$ pegasus-statistics /home/tutorial/walkthrough/runs/tutorial/pegasus/diamond/run0001
**********************************************SUMMARY***********************************************
#legends
Workflow summary - Summary of the workflow execution. It shows total
tasks/jobs/sub workflows run, how many succeeded/failed etc.
In case of hierarchical workflow the calculation shows the
statistics across all the sub workflows.It shows the following
statistics about tasks, jobs and sub workflows.
* Succeeded - total count of succeeded tasks/jobs/sub workflows.
* Failed - total count of failed tasks/jobs/sub workflows.
* Incomplete - total count of tasks/jobs/sub workflows that are
not in succeeded or failed state. This includes all the jobs
that are not submitted, submitted but not completed etc. This
is calculated as difference between 'total' count and sum of
'succeeded' and 'failed' count.
* Total - total count of tasks/jobs/sub workflows.
* Retries - total retry count of tasks/jobs/sub workflows.
* Total Run - total count of tasks/jobs/sub workflows executed
during workflow run. This is the cumulative of retries,
succeeded and failed count.
Workflow wall time - The walltime from the start of the workflow execution
to the end as reported by the DAGMAN.In case of rescue dag the value
is the cumulative of all retries.
Workflow cumulative job wall time - The sum of the walltime of all jobs as
reported by kickstart. In case of job retries the value is the
cumulative of all retries. For workflows having sub workflow jobs
(i.e SUBDAG and SUBDAX jobs), the walltime value includes jobs from
the sub workflows as well.
Cumulative job walltime as seen from submit side - The sum of the walltime of
all jobs as reported by DAGMan. This is similar to the regular
cumulative job walltime, but includes job management overhead and
delays. In case of job retries the value is the cumulative of all
retries. For workflows having sub workflow jobs (i.e SUBDAG and
SUBDAX jobs), the walltime value includes jobs from the sub workflows
as well.
-------------------------------------------------------------------------------------------------------------------------------------------------
Type Succeeded Failed Incomplete Total Retries Total Run (Retries Included)
Tasks 4 0 0 4 || 0 4
Jobs 15 0 0 15 || 0 15
Sub Workflows 0 0 0 0 || 0 0
-------------------------------------------------------------------------------------------------------------------------------------------------
Workflow wall time : 5 mins, 17 secs, (total 317 seconds)
Workflow cumulative job wall time : 4 mins, 0 secs, (total 240 seconds)
Cumulative job walltime as seen from submit side : 4 mins, 7 secs, (total 247 seconds)
Summary : /home/tutorial/walkthrough/runs/tutorial/pegasus/diamond/run0001/statistics/summary.txt
****************************************************************************************************
More information about monitoring, debugging and statistics can be found in the Monitoring, Debugging and Statistics chapter.
Please refer to chapter "Terminate Your Virtual Machine" to shut down the currently running virtual machine.
However, if you feel like going on, skip the shutdown, go to the next chapter, and skip the start-up and log-in at the beginning of the next chapter.




