These are simple examples that illustrate how to construct common data processing patterns in Pegasus workflows.

Each example can be planned and executed with Pegasus on any standard UNIX system. Given the sites.xml site catalog and the tc.txt transformation catalog, plan and submit a workflow by running the following command, replacing DAXFILE.dax with the DAX file of the example:

$ pegasus-plan -Dpegasus.catalog.site.file=sites.xml \
               -Dpegasus.catalog.transformation.file=tc.txt \
               -Dpegasus.catalog.replica=File \
               -Dpegasus.catalog.replica.file=rc.txt \
               -Dpegasus.register=false \
               -s local -o local --dir submit \
               --submit --dax DAXFILE.dax

Each example workflow comes with a DAX file and DAX generators written in Python, Java, and Perl.

See the Pegasus User Guide for more examples and for details on writing DAX generators. DAX APIs are available for Java, Python, and Perl.

Process

The process workflow consists of a single job that runs the `ls` command and generates a listing of the files in the `/` directory.
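
To make the shape of a single-job workflow concrete, here is a minimal sketch that builds a simplified DAX-like XML document using only the Python standard library. It is not the Pegasus DAX API, and the output is not validated against the DAX schema; the element layout, the job id, and the file name `listing.txt` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_process_dax():
    """Return a simplified <adag> element describing one `ls /` job."""
    adag = ET.Element("adag", name="process")

    # The only job in the workflow: run `ls` on the root directory.
    job = ET.SubElement(adag, "job", id="ID0000001", name="ls")
    ET.SubElement(job, "argument").text = "/"

    # Capture the listing in a file. The name listing.txt is hypothetical.
    ET.SubElement(job, "stdout", name="listing.txt", link="output")
    ET.SubElement(job, "uses", name="listing.txt", link="output")
    return adag


dax = build_process_dax()
print(ET.tostring(dax, encoding="unicode"))
```

A real generator would normally go through one of the DAX APIs rather than assemble XML by hand; the sketch only shows that a process workflow boils down to one job, its arguments, and the file it produces.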

DAX File | Python DAX Generator | Java DAX Generator | Perl DAX Generator

Pipeline

The pipeline workflow consists of two jobs linked together in a pipeline. The first job runs the `curl` command to fetch the Pegasus home page and store it as an HTML file. The result is passed to the `wc` command, which counts the number of lines in the HTML file.
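
The two-job pipeline can be sketched as a simplified DAX-like XML document built with the Python standard library. This is an illustration under stated assumptions, not the Pegasus DAX API: the element layout is simplified and unvalidated, and the URL, the job ids, and the file names `index.html` and `count.txt` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_pipeline_dax(url="http://pegasus.isi.edu"):
    """Return a simplified <adag> element for curl -> wc."""
    adag = ET.Element("adag", name="pipeline")

    # Job 1: curl stores the page in index.html (hypothetical file name).
    curl = ET.SubElement(adag, "job", id="ID0000001", name="curl")
    ET.SubElement(curl, "argument").text = "-o index.html " + url
    ET.SubElement(curl, "uses", name="index.html", link="output")

    # Job 2: wc -l reads index.html and writes the line count to count.txt.
    wc = ET.SubElement(adag, "job", id="ID0000002", name="wc")
    ET.SubElement(wc, "argument").text = "-l index.html"
    ET.SubElement(wc, "stdout", name="count.txt", link="output")
    ET.SubElement(wc, "uses", name="index.html", link="input")
    ET.SubElement(wc, "uses", name="count.txt", link="output")

    # Dependency: wc (child) may start only after curl (parent) finishes.
    child = ET.SubElement(adag, "child", ref="ID0000002")
    ET.SubElement(child, "parent", ref="ID0000001")
    return adag


pipeline = build_pipeline_dax()
print(ET.tostring(pipeline, encoding="unicode"))
```

The point of the sketch is the last step: the file `index.html` appears as an output of one job and an input of the next, and the parent/child entry records the ordering between them.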

DAX File | Python DAX Generator | Java DAX Generator | Perl DAX Generator

Split

The split workflow downloads the Pegasus home page using the `curl` command, then uses the `split` command to divide it into four pieces. Each piece is passed to its own `wc` job, which counts the number of lines in that piece.
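
The fan-out structure of this workflow can be sketched in plain Python by describing each job through the files it reads and writes, then deriving the parent/child edges from those files. This is an illustrative model only, not the Pegasus DAX API; the job names and the file names (`pegasus.html`, `part.N`, `count.N`) are hypothetical.

```python
def split_workflow(pieces=4):
    """Describe curl -> split -> wc (one wc job per piece) as plain data."""
    parts = ["part.%d" % i for i in range(pieces)]
    jobs = {
        "curl": {"inputs": [], "outputs": ["pegasus.html"]},
        "split": {"inputs": ["pegasus.html"], "outputs": parts},
    }
    # One wc job per piece produced by split.
    for i, part in enumerate(parts):
        jobs["wc_%d" % i] = {"inputs": [part], "outputs": ["count.%d" % i]}
    return jobs


def dependencies(jobs):
    """Return sorted (parent, child) pairs: a file's producer precedes its consumers."""
    producer = {f: name for name, job in jobs.items() for f in job["outputs"]}
    return sorted(
        (producer[f], name)
        for name, job in jobs.items()
        for f in job["inputs"]
        if f in producer
    )


edges = dependencies(split_workflow())
for parent, child in edges:
    print(parent, "->", child)
```

With four pieces, the model yields one edge from `curl` to `split` and four edges from `split` to the individual `wc` jobs, which is the one-to-many shape the split pattern is meant to illustrate.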

DAX File | Python DAX Generator | Java DAX Generator | Perl DAX Generator

Merge

The merge workflow runs the `ls` command on several */bin directories and passes the results to the `cat` command, which merges the files into a single listing.
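
The fan-in structure can be sketched as a simplified DAX-like XML document built with the Python standard library: one `ls` job per directory, and a single `cat` job that lists every `ls` job as a parent. This is not the Pegasus DAX API and the output is not validated against the DAX schema; the directory list, the job ids, and the file names `listing_N.txt` and `merged.txt` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_merge_dax(dirs=("/bin", "/usr/bin", "/usr/local/bin")):
    """Return a simplified <adag> element: several ls jobs feeding one cat job."""
    adag = ET.Element("adag", name="merge")
    ls_ids, listings = [], []

    # One ls job per directory, each writing its own listing file.
    for i, directory in enumerate(dirs, start=1):
        job_id = "ID%07d" % i
        out = "listing_%d.txt" % i
        job = ET.SubElement(adag, "job", id=job_id, name="ls")
        ET.SubElement(job, "argument").text = directory
        ET.SubElement(job, "stdout", name=out, link="output")
        ET.SubElement(job, "uses", name=out, link="output")
        ls_ids.append(job_id)
        listings.append(out)

    # A single cat job concatenates all listings into merged.txt.
    cat_id = "ID%07d" % (len(dirs) + 1)
    cat = ET.SubElement(adag, "job", id=cat_id, name="cat")
    ET.SubElement(cat, "argument").text = " ".join(listings)
    for out in listings:
        ET.SubElement(cat, "uses", name=out, link="input")
    ET.SubElement(cat, "stdout", name="merged.txt", link="output")
    ET.SubElement(cat, "uses", name="merged.txt", link="output")

    # Fan-in: cat is the child of every ls job.
    child = ET.SubElement(adag, "child", ref=cat_id)
    for job_id in ls_ids:
        ET.SubElement(child, "parent", ref=job_id)
    return adag


merge = build_merge_dax()
print(ET.tostring(merge, encoding="unicode"))
```

The many-to-one shape shows up in the final element: one child entry with several parents, so the `cat` job waits for every listing before it runs.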

DAX File | Python DAX Generator | Java DAX Generator | Perl DAX Generator