4. Example Workflows

PegasusHub (https://pegasushub.github.io/) is a gallery of Pegasus workflows created and published by the Pegasus user community. It contains workflows ranging from simple examples suitable for beginners (e.g., Diamond Workflow) to more complex ones that showcase advanced Pegasus capabilities (e.g., Montage Workflow). Every workflow on PegasusHub resides in its own GitHub repository and can be cloned using Git.

We have selected a few of the workflows available on PegasusHub and, through a command-line tool called pegasus-init, enable users to fetch them in a ready-to-execute state for some of the most common execution environments that Pegasus supports.

Note

These examples are intended as a starting point for creating your own workflows and for seeing how other workflows are set up. The example workflows might not work in your environment without modifications. Site and transformation catalogs may contain site- and user-specific details, such as paths to scratch directories and installed software, and minor modifications might be required to get the workflows to plan and run.

4.1. PegasusHub

4.1.1. Contributing Workflows

Any user can contribute their Pegasus workflow to PegasusHub! To add your own Pegasus-enabled workflow GitHub repository, submit a pull request to PegasusHub’s development repository (https://github.com/pegasushub/pegasushub.github.io) that adds your workflow repository details to the _data/workflows.yml file:

- organization: your-github-organization
  repo_name: your-workflow-repository

4.1.2. Process Workflow

The process workflow (https://github.com/pegasus-isi/process-workflow) has a single node that consumes no inputs and produces one output file, listing.txt, containing the output of the ls command.

Process Workflow Example

4.1.3. Pipeline Workflow

The pipeline workflow (https://github.com/pegasus-isi/pipeline-workflow) has two nodes in sequence, as shown in the figure below. The first node fetches a webpage using the curl command. The second node computes the number of lines in the fetched webpage and saves the result in an output file called count.txt.

Pipeline Workflow Example
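
The two-step data flow described above can be mimicked in plain Python. This is only a minimal sketch of the logic, not the Pegasus API and not the code in the repository: the function names (fetch_page, count_lines, run_pipeline) and the intermediate file name page.html are hypothetical, and urllib is used as a stand-in for curl.

```python
import urllib.request
from pathlib import Path


def fetch_page(url: str, dest: Path) -> Path:
    """Stand-in for the first node (curl): download url and save it to dest."""
    with urllib.request.urlopen(url) as response:
        dest.write_bytes(response.read())
    return dest


def count_lines(src: Path, dest: Path) -> int:
    """Stand-in for the second node: write the line count of src to dest."""
    # Counting newline characters mirrors what `wc -l` reports.
    n = src.read_bytes().count(b"\n")
    dest.write_text(f"{n}\n")
    return n


def run_pipeline(url: str, workdir: Path) -> int:
    """Run both steps in sequence; the second consumes the first's output."""
    page = fetch_page(url, workdir / "page.html")  # hypothetical file name
    return count_lines(page, workdir / "count.txt")
```

Calling run_pipeline with any reachable URL and an existing working directory leaves the line count in count.txt inside that directory. The point of the sketch is the ordering constraint: the counting step cannot start until the fetched file exists.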

4.1.4. Split Workflow

The split workflow (https://github.com/pegasus-isi/split-workflow) has five nodes, as shown in the figure below. The first node consumes an input file, pegasus.html, and splits it into 4 parts. Each part is then processed by a node that computes the number of lines in that part using the wc command.

Split Workflow Example
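
The fan-out pattern described above (one splitting step followed by one counting step per part) can be sketched in plain Python. This is an illustration under assumptions, not the Pegasus API or the repository's code: the function names and the part.N naming scheme are hypothetical, and the even split by line count is one possible splitting policy.

```python
from pathlib import Path


def split_file(src: Path, outdir: Path, parts: int = 4) -> list[Path]:
    """Stand-in for the split node: cut src into `parts` chunks of lines."""
    lines = src.read_text().splitlines(keepends=True)
    chunk = -(-len(lines) // parts)  # ceiling division
    outputs = []
    for i in range(parts):
        part = outdir / f"part.{i}"  # hypothetical naming scheme
        part.write_text("".join(lines[i * chunk:(i + 1) * chunk]))
        outputs.append(part)
    return outputs


def count_lines(path: Path) -> int:
    """Stand-in for one wc node: number of newline characters in path."""
    return path.read_bytes().count(b"\n")


def run_split(src: Path, outdir: Path) -> list[int]:
    """One split step, then one independent count per part."""
    return [count_lines(part) for part in split_file(src, outdir)]
```

Because the four counting steps depend only on their own part, they are independent of one another, which is what allows a workflow system to run them in parallel.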

4.1.5. Merge Workflow

The merge workflow (https://github.com/pegasus-isi/merge-workflow) is shown in the figure below. The first set of nodes each execute the ls command on a different location and store the output in files named bin_*.txt. The outputs of these nodes are then merged using the cat command, and the result is stored in a single file, binaries.txt.

Merge Workflow Example
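
The fan-in pattern described above can be sketched in plain Python as follows. This is a rough illustration, not the Pegasus API or the repository's code: the function names are hypothetical, sorted os.listdir output is used as a stand-in for ls, and the numeric suffix in bin_N.txt is an assumed way of filling in the bin_*.txt pattern.

```python
import os
from pathlib import Path


def list_dir(location: Path, dest: Path) -> Path:
    """Stand-in for one ls node: write the sorted entries of location to dest."""
    names = sorted(os.listdir(location))
    dest.write_text("".join(f"{name}\n" for name in names))
    return dest


def merge(parts: list[Path], dest: Path) -> Path:
    """Stand-in for the cat node: concatenate parts, in order, into dest."""
    with dest.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())
    return dest


def run_merge(locations: list[Path], workdir: Path) -> Path:
    """List every location independently, then merge all listings."""
    parts = [
        list_dir(loc, workdir / f"bin_{i}.txt")  # assumed suffix scheme
        for i, loc in enumerate(locations)
    ]
    return merge(parts, workdir / "binaries.txt")
```

The merge step is the only one that has to wait for all listings; the listing steps themselves do not depend on each other.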

4.1.6. Diamond Workflow

The diamond workflow (https://github.com/pegasus-isi/diamond-workflow) has 4 nodes, laid out in a diamond shape, with files (f.*) being passed between them. The first node represents a computation that preprocesses an input file f.a and produces two output files f.b*. Each of these output files is then analyzed by a findrange job, which produces one output file f.c*. The two f.c* outputs are then processed by a single node called analyze, which produces an output file f.d.

Diamond Workflow Example
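
The diamond shape follows directly from which job produces and which job consumes each file. The sketch below derives a valid execution order from that information using Python's standard graphlib module. It is an illustration only, not the Pegasus API: the job names preprocess, findrange_1 and findrange_2 and the concrete file names f.b1, f.b2, f.c1 and f.c2 are assumptions standing in for the f.b* and f.c* patterns above.

```python
from graphlib import TopologicalSorter

# Each job maps to (input files, output files).
# Job names and the numbered file names are illustrative assumptions.
jobs = {
    "preprocess": (["f.a"], ["f.b1", "f.b2"]),
    "findrange_1": (["f.b1"], ["f.c1"]),
    "findrange_2": (["f.b2"], ["f.c2"]),
    "analyze": (["f.c1", "f.c2"], ["f.d"]),
}


def build_dependencies(jobs):
    """A job depends on every job that produces one of its input files."""
    producer = {f: name for name, (_, outs) in jobs.items() for f in outs}
    return {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in jobs.items()
    }


deps = build_dependencies(jobs)
# static_order() yields each job only after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

In any order this produces, the preprocessing job comes first and analyze comes last, while the two findrange jobs may appear in either order because neither depends on the other.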

4.2. Pegasus Init

Pegasus Init (pegasus-init) is an interactive command-line tool that generates example workflows, ready to be executed on common execution environments. The example workflows provided are a subset of the workflows available at PegasusHub (https://pegasushub.github.io).

Note

Some of the example workflows might overwrite the configuration pegasus-init generates. Be cautious when executing commands that may alter the workflow and catalogs generated by pegasus-init.

Because it is interactive, pegasus-init prompts the user with questions whose answers customize the selected execution environment.

Example usage:

pegasus-init example-workflow

pegasus-init first asks you to select one of the execution environments.

###########################################################
###########   Available Execution Environments   ##########
###########################################################
1) Local Machine Condor Pool
2) Local SLURM Cluster
3) Local LSF Cluster
4) OLCF Summit from OLCF Headnode
5) OLCF Summit from OLCF Hosted Kubernetes Pod

select an execution environment [1]:

Afterwards, it asks you to select one of the workflow examples available for the selected execution environment.

###########################################################
###########     Available Workflow Examples      ##########
###########################################################
1) pegasus-isi/diamond-workflow
2) pegasus-isi/merge-workflow
3) pegasus-isi/pipeline-workflow
4) pegasus-isi/process-workflow
5) pegasus-isi/split-workflow

Select an example workflow [1]:

Based on your answers, pegasus-init might ask further questions to customize the execution environment’s configuration, such as your project allocation or the scheduler’s queue.

4.3. Pegasus Init Execution Environments

The execution environments supported by pegasus-init are updated dynamically, and their source code can be found in the GitHub repository https://github.com/pegasushub/pegasus-site-catalogs. pegasus-init uses the Python script Sites.py to generate the appropriate site catalog for a supported execution environment. The script can also be run standalone to scaffold a Pegasus site catalog:

python3 Sites.py \
  --execution-site CONDORPOOL \
  --project-name "" \
  --queue-name "" \
  --pegasus-home "" \
  --scratch-parent-dir ~/scratch \
  --storage-parent-dir ~/storage

Note

Use -h|--help to discover more information about the input arguments.
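
If you want to script the standalone Sites.py invocation shown above, for instance to scaffold catalogs for several sites, the argument list can be assembled in Python. This is a minimal sketch: the helper build_sites_command is hypothetical, it uses only the flags that appear in the command above, and it assumes Sites.py is in the current directory.

```python
import shlex
from pathlib import Path


def build_sites_command(
    execution_site: str,
    scratch_parent_dir: str = "~/scratch",
    storage_parent_dir: str = "~/storage",
    project_name: str = "",
    queue_name: str = "",
    pegasus_home: str = "",
    script: str = "Sites.py",  # assumed to be in the current directory
) -> list[str]:
    """Assemble the argument vector for Sites.py (hypothetical helper)."""
    return [
        "python3", script,
        "--execution-site", execution_site,
        "--project-name", project_name,
        "--queue-name", queue_name,
        "--pegasus-home", pegasus_home,
        # No shell is involved when a list is passed to subprocess,
        # so ~ has to be expanded explicitly.
        "--scratch-parent-dir", str(Path(scratch_parent_dir).expanduser()),
        "--storage-parent-dir", str(Path(storage_parent_dir).expanduser()),
    ]


cmd = build_sites_command("CONDORPOOL")
print(shlex.join(cmd))  # shell-quoted form, for inspection or logging
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the directory that contains Sites.py. Building a list rather than a single string avoids shell quoting problems with the empty-string arguments.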