2.3. What are Scientific Workflows

Scientific workflows allow users to easily express multi-step computational tasks, for example: retrieve data from an instrument or a database, reformat the data, and run an analysis. A scientific workflow describes the dependencies between the tasks. In most cases the workflow is described as a directed acyclic graph (DAG), where the nodes are tasks and the edges denote the task dependencies. A defining property of a scientific workflow is that it manages data flow. The tasks in a scientific workflow can range from short serial tasks to very large parallel tasks (MPI, for example) surrounded by a large number of small, serial tasks used for pre- and post-processing.
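To make the DAG description concrete, the following is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9 or later). It is not the Pegasus API; the task names `retrieve`, `reformat`, and `analyze` are hypothetical labels for the three steps mentioned above.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
dependencies = {
    "retrieve": set(),
    "reformat": {"retrieve"},
    "analyze": {"reformat"},
}


def execution_order(deps):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # ['retrieve', 'reformat', 'analyze']
```

If the dependencies contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is why the graph has to be acyclic for an execution order to exist.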

Workflows can vary from simple to complex. Below are some examples. In the figures below, tasks are drawn as circles or ellipses, while the files created by the tasks are drawn as rectangles. Arrows indicate task dependencies.

Process Workflow

The process workflow consists of a single task that runs the `ls` command and generates a listing of the files in the `/` directory.
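As a rough illustration of what this single task does, the sketch below runs `ls /` from Python and stores the listing in a file. It assumes a POSIX system where `ls` is on the `PATH`; the output file name `listing.txt` is a hypothetical choice, and this is not how Pegasus itself launches the task.

```python
import subprocess
import tempfile
from pathlib import Path


def run_process_workflow(output: Path) -> None:
    """Run `ls /` and store the listing in `output` (requires a POSIX `ls`)."""
    with output.open("w") as fh:
        subprocess.run(["ls", "/"], stdout=fh, check=True)


with tempfile.TemporaryDirectory() as tmp:
    listing_file = Path(tmp) / "listing.txt"  # hypothetical output file name
    run_process_workflow(listing_file)
    listing = listing_file.read_text()

print(listing)
```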

Figure 2.1. Process Workflow


Pipeline of Tasks

The pipeline workflow consists of two tasks linked together in a pipeline. The first task runs the `curl` command to fetch the Pegasus home page and store it as an HTML file. That file is passed to the `wc` command, which counts the number of lines in it.
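The data flow in this pipeline can be sketched with two Python functions that communicate through an intermediate file. This is an offline approximation: `fetch_task` writes fixed sample HTML instead of calling `curl`, and the file names `pegasus.html` and `count.txt` are hypothetical.

```python
import tempfile
from pathlib import Path

# Stand-in content; the real workflow would download this with curl.
SAMPLE_HTML = "<html>\n<body>\n<h1>Pegasus</h1>\n</body>\n</html>\n"


def fetch_task(output: Path) -> None:
    """First task: produce the HTML file (offline stand-in for curl)."""
    output.write_text(SAMPLE_HTML)


def count_task(input_file: Path, output: Path) -> None:
    """Second task: count newline characters in the input, as `wc -l` does."""
    lines = input_file.read_text().count("\n")
    output.write_text(f"{lines}\n")


def run_pipeline(workdir: Path) -> int:
    html = workdir / "pegasus.html"  # intermediate file linking the two tasks
    count = workdir / "count.txt"
    fetch_task(html)
    count_task(html, count)  # can only start once the HTML file exists
    return int(count.read_text())


with tempfile.TemporaryDirectory() as tmp:
    result = run_pipeline(Path(tmp))

print(result)  # 5
```

The second task depends on the first only through the file it reads, which is the kind of dependency an arrow in the figures represents.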

Figure 2.2. Pipeline of Tasks



Split Workflow

The split workflow downloads the Pegasus home page using the `curl` command, then uses the `split` command to divide it into 4 pieces. Each piece is passed to the `wc` command, which counts the number of lines in that piece.
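One way to see the fan-out in this pattern is to group the tasks into levels that could run at the same time. The sketch below does this with the standard-library `graphlib` module (Python 3.9 or later); the task names such as `wc_a` are hypothetical, and a real workflow system may schedule tasks differently.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the tasks it depends on.
split_workflow = {
    "curl": set(),
    "split": {"curl"},
    "wc_a": {"split"},
    "wc_b": {"split"},
    "wc_c": {"split"},
    "wc_d": {"split"},
}


def ready_levels(deps):
    """Group tasks into levels; tasks in one level do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are done
        levels.append(ready)
        sorter.done(*ready)
    return levels


levels = ready_levels(split_workflow)
for level in levels:
    print(level)
# ['curl']
# ['split']
# ['wc_a', 'wc_b', 'wc_c', 'wc_d']
```

The four `wc` tasks land in the same level because none of them depends on another, so they are candidates for parallel execution.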

Figure 2.3. Split Workflow



Merge Workflow

The merge workflow runs the `ls` command on several `*/bin` directories and passes the results to the `cat` command, which merges the files into a single listing. The merge workflow is an example of a parameter sweep over arguments.
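The fan-in in this pattern can be sketched as the same task applied to several arguments, followed by one task that combines the outputs. The example below uses temporary directories as hypothetical stand-ins for the `*/bin` directories and plain Python functions in place of `ls` and `cat`.

```python
import os
import tempfile
from pathlib import Path


def ls_task(directory: Path) -> str:
    """Stand-in for ls: one sorted file name per line."""
    return "".join(f"{name}\n" for name in sorted(os.listdir(directory)))


def cat_task(listings: list[str]) -> str:
    """Stand-in for cat: concatenate the listings in order."""
    return "".join(listings)


def run_merge(directories: list[Path]) -> str:
    # The same task runs once per directory; only the argument changes.
    listings = [ls_task(d) for d in directories]
    return cat_task(listings)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the */bin directories and their contents.
    layout = {"bin": ["ls", "cat"], "usr_bin": ["wc"]}
    dirs = []
    for dirname, files in layout.items():
        d = Path(tmp) / dirname
        d.mkdir()
        for name in files:
            (d / name).touch()
        dirs.append(d)
    merged = run_merge(dirs)

print(merged, end="")
```

Running one task definition over a list of arguments, as `run_merge` does, is the sense in which this pattern is a parameter sweep.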

Figure 2.4. Merge Workflow



Diamond Workflow

The diamond workflow combines the split and merge workflow patterns to create a more complex workflow.
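A minimal sketch of the combined pattern, assuming line counting as the per-piece work: one task fans out into several independent tasks, and a final task merges their results. The function names and the use of a thread pool are illustrative choices, not the tasks or the execution model of the actual Pegasus diamond example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(text: str, parts: int) -> list[str]:
    """Fan-out step: divide the lines into at most `parts` pieces."""
    lines = text.splitlines(keepends=True)
    size = max(1, -(-len(lines) // parts))  # ceiling division, at least 1
    return ["".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def count_task(piece: str) -> int:
    """Middle step: count the lines in one piece."""
    return piece.count("\n")


def merge_task(counts: list[int]) -> int:
    """Fan-in step: combine the per-piece results."""
    return sum(counts)


def run_diamond(text: str, parts: int = 4) -> int:
    pieces = split_task(text, parts)
    # The per-piece tasks are independent, so they may run concurrently.
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(count_task, pieces))
    return merge_task(counts)


sample = "".join(f"line {i}\n" for i in range(10))
total = run_diamond(sample)
print(total)  # 10
```

The merge step cannot start until every middle task has finished, which gives the graph its diamond shape.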

Figure 2.5. Diamond Workflow


Complex Workflows

The above examples can be used as building blocks for much more complex workflows. Some of these are showcased on the Pegasus Applications page.