These are simple examples that illustrate how to construct common data processing patterns in Pegasus workflows.
Each of the examples can be planned and executed using Pegasus on any standard UNIX system. Using this sites.xml site catalog and this tc.txt transformation catalog, the workflows can be planned and executed by running:
```shell
$ pegasus-plan -Dpegasus.catalog.site.file=sites.xml \
      -Dpegasus.catalog.transformation.file=tc.txt \
      -Dpegasus.catalog.replica=File \
      -Dpegasus.catalog.replica.file=rc.txt \
      -Dpegasus.register=false \
      -s local -o local --dir submit \
      --submit --dax DAXFILE.dax
```
There is a DAX file and a Python DAX generator for each example workflow.
The process workflow consists of a single job that runs the `ls` command and generates a listing of the files in the `/` directory.
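A DAX is just an XML description of the workflow's jobs, files, and dependencies. As a rough sketch of what the process example's DAX file boils down to (the job id, arguments, and output file name here are illustrative, not copied from the shipped example):

```xml
<!-- Sketch of a one-job DAX; ids, arguments, and file names are illustrative. -->
<adag xmlns="http://pegasus.isi.edu/schema/DAX" version="3.4" name="process">
  <job id="ID0000001" name="ls">
    <argument>-l /</argument>
    <!-- The directory listing is captured as an output file of the job. -->
    <stdout name="listing.txt" link="output"/>
  </job>
</adag>
```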
The pipeline workflow consists of two jobs linked together in a pipeline. The first job runs the `curl` command to fetch the Pegasus home page and store it as an HTML file. The result is passed to the `wc` command, which counts the number of lines in the HTML file.
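The Python DAX generator for an example like this essentially emits that XML. Below is a minimal, self-contained sketch using only the standard library rather than the Pegasus DAX API, so the job ids, file names, and URL are illustrative assumptions, not the shipped generator:

```python
import xml.etree.ElementTree as ET

# Root of the abstract workflow: a DAG of jobs ("adag").
adag = ET.Element("adag", name="pipeline")

# Job 1: curl fetches the Pegasus home page into an HTML file.
curl = ET.SubElement(adag, "job", id="ID0000001", name="curl")
ET.SubElement(curl, "argument").text = "-o pegasus.html http://pegasus.isi.edu"
ET.SubElement(curl, "uses", name="pegasus.html", link="output")

# Job 2: wc counts the lines of the fetched page.
wc = ET.SubElement(adag, "job", id="ID0000002", name="wc")
ET.SubElement(wc, "argument").text = "-l pegasus.html"
ET.SubElement(wc, "uses", name="pegasus.html", link="input")

# Dependency: wc is a child of curl, since it consumes curl's output file.
child = ET.SubElement(adag, "child", ref="ID0000002")
ET.SubElement(child, "parent", ref="ID0000001")

dax = ET.tostring(adag, encoding="unicode")
```

The file shared between the two jobs is what turns them into a pipeline: the planner sees that `wc` reads what `curl` writes and orders them accordingly.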
The split workflow downloads the Pegasus home page using the `curl` command, then uses the `split` command to divide it into 4 pieces. Each piece is passed to a separate `wc` job, which counts the number of lines in that piece.
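The interesting part of the split example is the fan-out: one `split` job feeding several independent `wc` jobs that can run in parallel. A sketch of that generator loop, again with the standard library instead of the Pegasus DAX API (the `split` arguments and piece names are illustrative assumptions):

```python
import xml.etree.ElementTree as ET

adag = ET.Element("adag", name="split")

# One split job divides the fetched page into four pieces
# (arguments and piece names here are illustrative).
split_job = ET.SubElement(adag, "job", id="ID0000001", name="split")
ET.SubElement(split_job, "argument").text = "-n 4 -a 1 pegasus.html part."

pieces = ["part.a", "part.b", "part.c", "part.d"]
for p in pieces:
    ET.SubElement(split_job, "uses", name=p, link="output")

# Fan-out: one independent wc job per piece, each a child of the split job.
for i, p in enumerate(pieces, start=2):
    wc = ET.SubElement(adag, "job", id="ID%07d" % i, name="wc")
    ET.SubElement(wc, "argument").text = "-l " + p
    ET.SubElement(wc, "uses", name=p, link="input")
    child = ET.SubElement(adag, "child", ref="ID%07d" % i)
    ET.SubElement(child, "parent", ref="ID0000001")

dax = ET.tostring(adag, encoding="unicode")
```

Because the four `wc` jobs share a parent but not any files among themselves, the planner is free to schedule them concurrently.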