Chapter 2. Tutorial

2.1. Introduction

This tutorial takes you through the steps of running simple workflows using the Pegasus Workflow Management System. Pegasus allows scientists to:

  1. Automate their scientific computational work as portable workflows. Pegasus enables scientists to construct workflows in abstract terms without worrying about the details of the underlying execution environment or the particulars of the low-level specifications required by the middleware (Condor, Globus, or Amazon EC2). It automatically locates the input data and computational resources needed for workflow execution. It also cleans up storage as the workflow executes, so that data-intensive workflows have enough space to run on storage-constrained resources.

  2. Recover from failures at runtime. When errors occur, Pegasus tries to recover by retrying tasks where possible and, when all else fails, provides a rescue workflow containing a description of only the work that remains to be done. It also enables users to move computations from one resource to another. Pegasus keeps track of what has been done (provenance), including the locations of data used and produced, and which software was used with which parameters.

  3. Debug failures in their computations using a set of system-provided debugging tools and an online workflow monitoring dashboard.
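The retry-then-rescue idea in item 2 can be illustrated with a short conceptual sketch. This is not Pegasus code and does not use any Pegasus API: the function name run_with_retries, its parameters, and the representation of a workflow as an ordered list of task names are all assumptions made purely for illustration. It assumes a strictly linear workflow, retries each failing task a fixed number of times, and, if a task still fails, returns the list of tasks that remain to be done, which plays the role of a rescue workflow.

```python
from typing import Callable, Dict, List


def run_with_retries(
    tasks: Dict[str, Callable[[], None]],
    order: List[str],
    max_retries: int = 2,
) -> List[str]:
    """Run tasks in the given order, retrying each one on failure.

    Returns the names of the tasks that still need to run (a hypothetical
    stand-in for a rescue workflow). An empty list means everything finished.
    """
    for index, name in enumerate(order):
        # One initial attempt plus up to max_retries further attempts.
        for _attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break  # the task succeeded, move on to the next one
            except Exception:
                continue  # the task failed, try again if attempts remain
        else:
            # Every attempt failed: the failed task and everything after it
            # is the work that remains to be done.
            return order[index:]
    return []


if __name__ == "__main__":
    def always_fails() -> None:
        raise RuntimeError("simulated permanent failure")

    remaining = run_with_retries(
        {"split": lambda: None, "count": always_fails},
        ["split", "count"],
    )
    print("Remaining work:", remaining)
```

A real workflow system tracks dependencies between tasks as a graph rather than a flat list; the flat list here only keeps the sketch short.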

This tutorial is intended for new users who want a quick overview of Pegasus concepts and usage. The accompanying tutorial VM comes pre-configured to run the example workflows. The instructions listed here refer mainly to the simple split workflow example. The tutorial covers:

  • submission of an already generated example workflow with Pegasus

  • use of the Pegasus Workflow Dashboard for monitoring workflows

  • use of the command-line tools for monitoring, debugging, and generating statistics

  • recovery from failures

  • creation of a workflow using the system-provided API

  • configuration of the information catalogs

More information about the topics covered in this tutorial can be found in later chapters of this user's guide.

All of the steps in this tutorial are performed on the command line. The convention we will use for command-line input and output is to put the text that you should type in a bold, monospace font, and the output you should get in a normal-weight, monospace font, like this:

[user@host dir]$ you type this
you get this

Here, [user@host dir]$ is the terminal prompt, the text you should type is "you type this", and the output you should get is "you get this". The terminal prompt will be abbreviated as $. Because some of the outputs are long, we don't always include everything. Where the output is truncated, we add an ellipsis '...' to indicate the omitted output.

If you are having trouble with this tutorial, or anything else related to Pegasus, you can contact the Pegasus Users mailing list to get help. You can also contact us in our support chatroom on HipChat.