Student notes for Pegasus tutorial

Introduction

These are the student notes for the Pegasus tutorial. They are designed to be used in conjunction with the instructor's presentation and support.

You will see two styles of machine text here:

Text like this is input that you should type.
Text like this is the output you should get.

For example:

$ date

Mon June 1 11:54:58 BST 2007

You will need to log into the tutorial machine, using an ssh client and the login name and password supplied separately.

On Linux or Mac OS X, open a terminal window and type the ssh command shown below. On Windows, PuTTY is recommended as an ssh client.

For the purpose of this tutorial, replace any instance of @viz-user@ with your viz-login username and @tg-user@ with your TeraGrid username. Use your viz password with your viz-login username and your TeraGrid password with your TeraGrid username.

$ ssh @viz-user@@viz-login.isi.edu

[welcome message]
viz-username@viz-login:~$

You will need to obtain grid credentials to run the workflows on TeraGrid.
TeraGrid provides a facility for obtaining grid credentials using MyProxy.

$ myproxy-logon -s myproxy.teragrid.org -l @tg-user@

Enter MyProxy pass phrase:
A credential has been received for user xxxxx in /tmp/x509up_u1055

Check your proxy using grid-proxy-info.

$ grid-proxy-info

subject  : /C=US/O=National Center for Supercomputing Applications/CN=Training - trainxxxx REL
issuer   : /C=US/O=National Center for Supercomputing Applications/CN=Certification Authority
identity : /C=US/O=National Center for Supercomputing Applications/CN=Training - trainxxxx REL
type     : end entity credential
strength : 1024 bits
path     : /tmp/x509up_u1055
timeleft : 2:59:24
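
If you want to check the remaining proxy lifetime from a script rather than by eye, the timeleft field can be parsed. The following is a minimal Python sketch, assuming the field has the form "timeleft : H:MM:SS" as in the output above. The helper name proxy_seconds_left is hypothetical and is not part of the Globus tools.

```python
import re


def proxy_seconds_left(info_text):
    """Return the remaining proxy lifetime in seconds, or None if no timeleft field is found.

    Assumption: the field looks like 'timeleft : H:MM:SS', as in the sample output.
    """
    match = re.search(r"timeleft\s*:\s*(\d+):(\d+):(\d+)", info_text)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Sample text copied from the grid-proxy-info output in these notes.
sample = "path : /tmp/x509up_u1055\ntimeleft : 2:59:24\n"
remaining = proxy_seconds_left(sample)
if remaining is not None and remaining < 3600:
    print("Less than one hour left: consider running myproxy-logon again")
else:
    print("Seconds left:", remaining)
```

In practice you would pass the captured output of grid-proxy-info to the helper instead of the sample string.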

Chapter 2: Running on the GRID using Pegasus

In this chapter you will be introduced to planning and running a workflow through Pegasus on a cluster. You will take a Montage workflow that has already been generated and run it on the GRID.

All the exercises in this chapter will be run from the $HOME/tutorial/ directory. All the required files reside in this directory.

$ cd $HOME/tutorial
$ 

Files for the exercise are stored in subdirectories:

$ ls

config dags dax

You may also see some other files here.

Exercise 2.1: DAX

An abstract DAG has been generated for the Montage application and written in XML format to dax/montage.dax. Open montage.dax in a file viewer:

$ cat dax/montage.dax

Inside the DAX, you should see three sections.

  1. list of all the files used in the workflow
  2. definition of all jobs - one entry for each job in the workflow
  3. list of control-flow dependencies - this section specifies a partial order in which jobs are to be executed
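
The three sections can also be pulled out programmatically. The following is a minimal Python sketch that works on a simplified, DAX-like sample document. The element and attribute names used here (filename, job, child, parent, ref) are assumptions modelled on the three sections described above; the real montage.dax uses an XML namespace and carries more attributes, so compare against the file before adapting this.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sample: NOT the real montage.dax.
SAMPLE_DAX = """
<adag name="example">
  <filename file="in.fits" link="input"/>
  <filename file="out.fits" link="output"/>
  <job id="ID1" name="mProject"/>
  <job id="ID2" name="mAdd"/>
  <child ref="ID2">
    <parent ref="ID1"/>
  </child>
</adag>
"""


def summarize_dax(xml_text):
    """Return (files, jobs, edges) for a DAX-like document."""
    root = ET.fromstring(xml_text.strip())
    # Section 1: all files used in the workflow.
    files = [f.get("file") for f in root.findall("filename")]
    # Section 2: one entry per job, keyed by job id.
    jobs = {j.get("id"): j.get("name") for j in root.findall("job")}
    # Section 3: dependencies as (parent, child) pairs.
    edges = [(p.get("ref"), c.get("ref"))
             for c in root.findall("child")
             for p in c.findall("parent")]
    return files, jobs, edges


files, jobs, edges = summarize_dax(SAMPLE_DAX)
print(len(files), "files,", len(jobs), "jobs,", len(edges), "dependencies")
```

The (parent, child) pairs express only a partial order: any job whose parents have all finished may run.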

Exercise 2.2: Setting up the Replica Catalog

In this exercise you will insert entries into the Replica Catalog. The replica catalog that we will use today is a simple file-based catalog.
We also support and recommend Globus RLS or a JDBC implementation for production runs.

A Replica Catalog maintains the mapping from logical file names (LFNs) to physical file names (PFNs) for the input files of your workflow. Pegasus queries it to determine the locations of the raw input data files required by the workflow. Additionally, all the materialized data is registered in the catalog so that it can be reused later on.

You can use the rc-client command to insert, query, and delete entries in the replica catalog.

The input data to be used for your workflow resides in the /scratch/tutorial/inputdata/0.2degree directory. We are going to insert entries into the replica catalog that point to the files in this directory.
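
To make the idea of the catalog concrete, here is a minimal in-memory Python sketch of a mapping from logical to physical file names with insert, query, and delete operations. It illustrates the concept only and is not how rc-client or the file-based catalog is implemented. The class name ReplicaCatalog, the file name example.fits, and the gsiftp URL built from it are hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog: one logical name may map to several (physical URL, site) pairs."""

    def __init__(self):
        self._entries = defaultdict(list)

    def insert(self, lfn, pfn, site):
        """Register a physical location for a logical file name."""
        self._entries[lfn].append((pfn, site))

    def lookup(self, lfn):
        """Return all known (pfn, site) pairs for lfn; empty list if unknown."""
        return list(self._entries.get(lfn, []))

    def delete(self, lfn, pfn):
        """Remove one physical location; drop the lfn when none remain."""
        remaining = [e for e in self._entries.get(lfn, []) if e[0] != pfn]
        if remaining:
            self._entries[lfn] = remaining
        else:
            self._entries.pop(lfn, None)


rc = ReplicaCatalog()
# Hypothetical entry pointing into the tutorial input directory.
url = "gsiftp://viz-login.isi.edu/scratch/tutorial/inputdata/0.2degree/example.fits"
rc.insert("example.fits", url, "local")
print(rc.lookup("example.fits"))
```

A planner resolving an input file would call lookup with the logical name and choose one of the returned locations.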

The instructors have provided:

Instructions:

Congratulations!! You have set up the replica catalog correctly. This is the catalog you will tinker with most while running Pegasus.

Exercise 2.3: Setting up the Site Catalog and the Transformation Catalog

In this exercise you will set up your Site Catalog and your Transformation Catalog.

The transformation catalog maintains information about where the application code resides on the grid. In our case, it contains the locations where the Montage code is installed on the various grid sites.

The site catalog describes the layout of the grid on which you want to run your workflows. For each site it records information such as work directories, the jobmanagers and GridFTP servers to use, and other site-wide settings such as environment variables to be set.

The instructors have provided:

$ cat config/tc.data

local   bin/mDiff       gsiftp://sukhna.isi.edu/usr/sukhna/work/montage/software/default/bin/mDiff              STATIC_BINARY   INTEL32::LINUX  ENV::MONTAGE_HOME="."
local   bin/mDiff       gsiftp://viz-login.isi.edu/nfs/software/montage/montage-3.0_beta33-ia64/bin/mDiff       STATIC_BINARY   INTEL64::LINUX  ENV::MONTAGE_HOME="."
local   bin/mFitplane   gsiftp://sukhna.isi.edu/usr/sukhna/work/montage/software/default/bin/mFitplane          STATIC_BINARY   INTEL32::LINUX  NULL
local   bin/mFitplane   gsiftp://viz-login.isi.edu/nfs/software/montage/montage-3.0_beta33-ia64/bin/mFitplane   STATIC_BINARY   INTEL64::LINUX  NULL
local   mAdd:3.0        gsiftp://sukhna.isi.edu/usr/sukhna/work/montage/software/default/bin/mAdd               STATIC_BINARY   INTEL32::LINUX  NULL
local   mAdd:3.0        gsiftp://viz-login.isi.edu/nfs/software/montage/montage-3.0_beta33-ia64/bin/mAdd        STATIC_BINARY   INTEL64::LINUX  NULL
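
Each line of the listing above has six whitespace-separated columns. The following minimal Python sketch reads such lines into records. The column labels (site, transformation, path, kind, sysinfo, profile) are names chosen here for illustration, inferred from the listing, and are not taken from the Pegasus documentation.

```python
from collections import namedtuple

# Hypothetical labels for the six columns seen in tc.data.
TCEntry = namedtuple("TCEntry", "site transformation path kind sysinfo profile")

# Two lines copied from the listing above.
SAMPLE_TC = """\
local   bin/mDiff   gsiftp://sukhna.isi.edu/usr/sukhna/work/montage/software/default/bin/mDiff   STATIC_BINARY   INTEL32::LINUX   ENV::MONTAGE_HOME="."
local   mAdd:3.0    gsiftp://viz-login.isi.edu/nfs/software/montage/montage-3.0_beta33-ia64/bin/mAdd   STATIC_BINARY   INTEL64::LINUX   NULL
"""


def parse_tc(text):
    """Parse tc.data-style text into a list of TCEntry records."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 5)  # split on runs of whitespace, six columns
        if len(fields) != 6:
            raise ValueError("expected 6 columns: " + line)
        entries.append(TCEntry(*fields))
    return entries


for entry in parse_tc(SAMPLE_TC):
    print(entry.transformation, "for", entry.sysinfo, "at", entry.path)
```

Note that the same logical transformation appears once per architecture, so a lookup has to match on both the transformation name and the architecture column.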

Open the properties file and check a few properties.

$ cat config/properties

## SELECT THE REPLICA CATALOG MODE AND URL
pegasus.catalog.replica = SimpleFile
pegasus.catalog.replica.file = ${user.home}/tutorial/config/rc.data
#pegasus.catalog.replica.url=rlsn://smarty.isi.edu

## SELECT THE SITE CATALOG MODE AND FILE
pegasus.catalog.site = XML
pegasus.catalog.site.file = ${user.home}/tutorial/config/sites.xml


## SELECT THE TRANSFORMATION CATALOG MODE AND FILE
pegasus.catalog.transformation = File
pegasus.catalog.transformation.file = ${user.home}/tutorial/config/tc.data

## SET UP THE WORK AND INVOCATION DATABASE
pegasus.catalog.work =  Database
pegasus.catalog.provenance = InvocationSchema

## Database related properties
pegasus.catalog.*.db.driver = MySQL
pegasus.catalog.*.db.url = jdbc:mysql://smarty.isi.edu/tg2007
pegasus.catalog.*.db.user = tg2007user
pegasus.catalog.*.db.password =  Teragrid2007

## USE DAGMAN RETRY FEATURE FOR FAILURES
pegasus.dagman.retry=2

## STAGE ALL OUR EXECUTABLES
pegasus.catalog.transformation.mapper = Staged

## CHECK JOB EXIT CODES FOR FAILURE
pegasus.exitcode.scope=all

## OPTIMIZE DATA & EXECUTABLE TRANSFERS
pegasus.transfer.refiner=Bundle

#STAGE DATA AND EXECUTABLES USING GRIDFTP 3rd PARTY MODE
pegasus.transfer.*.thirdparty.sites=*

## WORK AND STORAGE DIR  
## CHANGE THESE TO YOUR TERAGRID USERNAME
pegasus.dir.storage = xxxxx/storage
pegasus.dir.exec = xxxxx/exec

Edit the properties pegasus.dir.storage and pegasus.dir.exec to specify relative paths for your data storage and workflow execution directories. Change the xxxxx value to your @tg-user@ value.

$ vim config/properties
[...]
$ cat config/properties

pegasus.dir.storage = @tg-user@/storage
pegasus.dir.exec = @tg-user@/exec
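
If you prefer not to edit the file by hand, the same substitution can be scripted. The following is a minimal Python sketch, assuming the two properties still contain the xxxxx placeholder exactly as in the listing above. The function name set_user_dirs and the username train01 are hypothetical.

```python
import re


def set_user_dirs(properties_text, tg_user):
    """Replace the xxxxx placeholder in pegasus.dir.storage and pegasus.dir.exec only."""
    pattern = re.compile(
        r"^(pegasus\.dir\.(?:storage|exec)\s*=\s*)xxxxx(/.*)$",
        re.MULTILINE,
    )
    # A function is used as the replacement so the username is inserted literally.
    return pattern.sub(lambda m: m.group(1) + tg_user + m.group(2), properties_text)


sample = (
    "pegasus.dagman.retry=2\n"
    "pegasus.dir.storage = xxxxx/storage\n"
    "pegasus.dir.exec = xxxxx/exec\n"
)
print(set_user_dirs(sample, "train01"))
```

To apply it to the real file, read config/properties, pass its contents through the function, and write the result back; keeping a backup copy first is a sensible precaution.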

You can look at the site catalog and transformation catalog files to get an idea of what they look like. But for now we will move ahead and plan your workflow through Pegasus. We need to get running on the GRID fast :). Time is short!!

In production mode, the sc-client interfaces with Globus MDS to retrieve information about the various sites.
The pegasus-get-sites client can also be used to generate a site catalog and a transformation catalog for the Open Science Grid.

Exercise 2.4: Running pegasus-plan to generate a concrete workflow (Condor submit files) and pegasus-run to submit the workflow to a grid resource

In this exercise we are going to run pegasus-plan to generate a concrete workflow from the abstract workflow (montage.dax). The generated concrete workflow consists of Condor submit files, which are submitted to remote grid resources using pegasus-run.

The instructors have provided:

You will need to write some things yourself, by following the instructions below:

Instructions:

Exercise 2.5: Tracking the progress of the workflow and debugging the workflow

In this exercise we are going to list ways to track your workflow, and give some debugging hints when something goes wrong.

We will change into the directory that was mentioned in the output of the pegasus-run command.

$ cd /nfs/home/@viz-user@/tutorial/dags/@viz-user@/pegasus/montage/run0001

In this directory you will see a whole lot of files. That should not scare you. Unless things go wrong, you need to look at only a few files to track the progress of the workflow.

Exercise 2.6: Removing a running workflow

Sometimes you may want to halt the execution of a workflow or remove it permanently. You can stop a workflow by running the pegasus-remove command mentioned in the output of pegasus-run:

$ pegasus-remove /nfs/home/@viz-user@/tutorial/dags/@viz-user@/pegasus/montage/run0001

Job 2788.0 marked for removal

Exercise 2.7: Optimizing a workflow by clustering small jobs

Sometimes a workflow contains many jobs that each run for only a few seconds. In such cases the overhead of scheduling each job on a grid is too large, and the runtime of the entire workflow can be reduced by using Pegasus clustering techniques. One such technique is to cluster jobs horizontally, combining jobs on the same level into one or more sequential jobs.

$ pegasus-plan -Dpegasus.user.properties=`pwd`/config/properties --dir `pwd`/dags --sites tg_sdsc --output local \
               --nocleanup --cluster horizontal --dax `pwd`/dax/montage.dax 
[....]
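
The idea behind horizontal clustering can be illustrated outside Pegasus. The following minimal Python sketch assigns each job a level (roots are level 1, every other job is one level below its deepest parent) and then splits the jobs on each level into a fixed number of groups, each of which would run as one sequential job. This is an illustration of the technique only, not the Pegasus implementation; the parameter clusters_per_level and the job names are hypothetical.

```python
from collections import defaultdict


def job_levels(parents):
    """parents maps each job to a list of its parent jobs; returns job -> level."""
    levels = {}

    def level(job):
        if job not in levels:
            # Roots have no parents and therefore get level 1.
            levels[job] = 1 + max((level(p) for p in parents[job]), default=0)
        return levels[job]

    for job in parents:
        level(job)
    return levels


def cluster_horizontally(parents, clusters_per_level):
    """Group the jobs on each level into at most clusters_per_level clusters."""
    by_level = defaultdict(list)
    for job, lvl in job_levels(parents).items():
        by_level[lvl].append(job)
    plan = {}
    for lvl in sorted(by_level):
        jobs = sorted(by_level[lvl])
        n = min(clusters_per_level, len(jobs))
        # Deal jobs out round-robin so the clusters have similar sizes.
        plan[lvl] = [jobs[i::n] for i in range(n)]
    return plan


# Hypothetical workflow: four independent projection jobs feeding one add job.
workflow = {"p1": [], "p2": [], "p3": [], "p4": [], "add": ["p1", "p2", "p3", "p4"]}
print(cluster_horizontally(workflow, 2))
```

Jobs on the same level never depend on each other, which is why they can be merged safely; the trade-off is that jobs inside one cluster run one after another instead of in parallel.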

Exercise 2.8: Filling out the TeraGrid tutorial survey


   http://tinyurl.com/23u8bc