Chapter 9. Example Workflows

These examples are included in the Pegasus distribution and can be found under share/pegasus/examples in your Pegasus install (/usr/share/pegasus/examples for native packages)

Note

These examples are intended to be a starting point for when you want to create your own workflows and want to see how other workflows are set up. The example workflows will probably not work in your environment without modifications. Site and transformation catalogs contain site and user specifics such as paths to scratch directories and installed software, and at least minor modificiations are required to get the workflows to plan and run.

9.1. Grid Examples

These examples assumes you have access to a cluster with Globus installed. A pre-ws gatekeeper and gridftp server is required. You also need Globus and Pegasus installed, both on the machine you are submitting from, and the cluster.

9.1.1. Black Diamond

Pegasus is shipped with 3 different Black Diamond examples for the grid. This is to highlight the available DAX APIs which are Java, Perl and Python. The examples can be found under:

share/pegasus/examples/grid-blackdiamond-java/
share/pegasus/examples/grid-blackdiamond-perl/
share/pegasus/examples/grid-blackdiamond-python/

The workflow has 4 nodes, layed out in a diamond shape, with files being passed between them (f.*):

The binary for the nodes is a simple "mock application" name keg ("canonical example for the grid") which reads input files designated by arguments, writes them back onto output files, and produces on STDOUT a summary of where and when it was run. Keg ships with Pegasus in the bin directory.

This example ships with a "submit" script which will build the replica catalog, the transformation catalog, and the site catalog. When you create your own workflows, such a submit script is not needed if you want to maintain those catalogs manually.

Note

The use of ./submit scripts in these examples are just to make it more easy to run the examples out of the box. For a production site, the catalogs (transformation, replica, site) may or may not be static or generated by other tooling.

To test the examples, edit the submit script and change the cluster config to the setup and install locations for your cluster. Then run:

$ ./submit

The workflow should now be submitted and in the output you should see a work dir location for the instance. With that directory you can monitor the workflow with:

$ pegasus-status [workdir]

Once the workflow is done, you can make sure it was sucessful with:

$ pegasus-analyzer -d [workdir]

9.1.2. NASA/IPAC Montage

This example can be found under

share/pegasus/examples/grid-montage/

The NASA IPAC Montage (http://montage.ipac.caltech.edu/) workflow projects/montages a set of input images from telescopes like Hubble and end up with images like http://montage.ipac.caltech.edu/images/m104.jpg . The test workflow is for a 1 by 1 degrees tile. It has about 45 input images which all have to be projected, background modeled and adjusted to come out as one seamless image.

Just like the Black Diamond above, this example uses a ./submit script.

The Montage DAX is generated with a tool called mDAG shipped with Montage which generates the workflow.

9.1.3. Rosetta

This example can be found under

share/pegasus/examples/grid-rosetta/

Rosetta (http://www.rosettacommons.org/) is a high resolution protein prediction and design software. Highlights in this example are:

  • Using the Pegasus Java API to generate the DAX

  • The DAX generator loops over the input PDBs and creates a job for each input

  • The jobs all have a dependency on a flatfile database. For simplicity, each job depends on all the files in the database directory.

  • Job clustering is turned on to make each grid job run longer and better utilize the compute cluster

Just like the Black Diamond above, this example uses a ./submit script.