Chapter 1. Pegasus Tutorial Using Self-contained Virtual Machine

These are the student notes for the Pegasus WMS tutorial on the Virtual Machine that can be downloaded from the Pegasus website. They are designed to be used in conjunction with the instructor's presentation and support.

You will see two styles of machine text here:

Text like this is input that you should type.

Text like this is the output you should get.

For example:

$ date
Fri Mar 18 12:50:05 PDT 2011

1.1. Downloading and Running the VM using Virtual Box

You will need to install Virtual Box to run the virtual machine on your computer. If you already have it installed, use that installation. Otherwise, download the binary version from the Virtual Box website and install it.

The instructors have tested the image with Virtual Box 4.0.6.

1.1.1. Download the VM for Virtual Box use

Download the corresponding disk image.

  • Virtual Box Pegasus Image

    It is around 576 MB in size. We recommend using a command line tool like wget to download the image, because downloading it with a browser may sometimes corrupt it. If you are running Windows, try downloading with Firefox instead of Internet Explorer.

    $ wget http://pegasus.isi.edu/wms/download/3.0/Pegasus-3.0.2-Debian-6-x86.vbox.tar.bz2
    
    --12:43:50--  http://pegasus.isi.edu/wms/download/3.0/Pegasus-3.0.2-Debian-6-x86.vbox.tar.bz2
               => `Pegasus-3.0.2-Debian-6-x86.vbox.tar.bz2'
    Resolving pegasus.isi.edu... 128.9.64.219
    Connecting to pegasus.isi.edu|128.9.64.219|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 604,017,271 (576M) [application/x-bzip2]
    
    

    The image is a bzip2-compressed tar archive, so you will need to extract it. On Windows you may need WinZip or a similar tool to extract the VM files.

    If you have GNU tar, you can do this in one step:

    $ gtar jxvf Pegasus-3.0.2-Debian-6-x86.vbox.tar.bz2

    Otherwise, decompress and untar the archive in two steps:

    $ bunzip2 Pegasus-3.0.2-Debian-6-x86.vbox.tar.bz2
    
    $ tar xvf Pegasus-3.0.2-Debian-6-x86.vbox.tar

    After untarring, a folder named Pegasus-3.0.2-Debian-6-x86.vbox will be created that contains the vmdk files for the VM.
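The extraction step can also be scripted. The following is a minimal sketch in Python that uses only the standard tarfile module; the function name extract_image and its dest argument are illustrative and not part of the tutorial. As with any archive, extract only files you trust.

```python
import tarfile
from pathlib import Path


def extract_image(archive, dest="."):
    """Extract a bzip2-compressed tar archive and return its top-level names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:bz2" decompresses and untars in one step, like `tar jxvf`.
    with tarfile.open(archive, "r:bz2") as tar:
        top_level = sorted({member.name.split("/")[0] for member in tar.getmembers()})
        tar.extractall(dest)
    return top_level
```

Calling extract_image("Pegasus-3.0.2-Debian-6-x86.vbox.tar.bz2") in the download directory should leave the same folder behind as the tar commands above.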

1.1.2. Running the VM with Virtual Box

Launch Virtual Box on your machine. Follow these steps to add the vmdk file to Virtual Box and create a virtual machine:

  1. In the menu, click Machine and select New (Machine > New).

  2. The New Virtual Machine Wizard opens. Click Continue.

  3. In the VM Name and OS Type window, specify the name as PegasusVM-3.0.2. Select Linux as the Operating System and Debian as the Version. Click Continue.

  4. Set the base memory to 384 MB. It defaults to 512 MB. If you have more RAM on your laptop/desktop, feel free to adjust this setting. Click Continue.

  5. Now select the Virtual Hard Disk to use with the machine. Select the option box for Use Existing Hard Disk. Click the folder icon next to the list and locate the file Debian-6-x86.vmdk in the folder Pegasus-3.0.2-Debian-6-x86.vbox. Click Continue.

  6. Click Done.

  7. In Virtual Box, start the PegasusVM-3.0.2 machine.

1.2. Mapping and Executing Workflows using Pegasus

In this section you will be introduced to planning and executing a workflow through Pegasus WMS locally. You will then plan and execute a larger Montage workflow on the Grid.

When the virtual machine starts, it will automatically log you in as the user tutorial. The password for this account is pegasus.

After logging on, start a terminal. There is a shortcut on the desktop for the terminal.

$ pwd

/home/tutorial

In general, to run workflows on the Grid you will need to obtain Grid credentials. The VM already has a user certificate installed for the pegasus user. To generate the proxy (Grid credentials), run the grid-proxy-init command.

$ grid-proxy-init

Your identity: /O=edu/OU=ISI/OU=isi.edu/CN=Tutorial User
Creating proxy ............................................. Done
Your proxy is valid until: Thu Dec 23 22:41:36 2010


All the exercises in this section will be run from the $HOME/pegasus-wms/ directory. All the required files reside in this directory.

$ cd $HOME/pegasus-wms

Files for the exercise are stored in subdirectories:

$ ls

config  dax

You may also see some other files here.

1.2.1. Creating a DIAMOND DAX

We generate a 4-node diamond DAX. There is a small piece of Java code that uses the DAX API to generate the DAX. Open the file $HOME/pegasus-wms/dax/CreateDAX.java in a file editor:

$ vi dax/CreateDAX.java

There is a function Diamond(String site_handle, String pegasus_location) that constructs the DAX. Towards the end of the function there is some commented-out code:

        // Add analyze job
        // To be uncommented for exercise 2.1

        Job j4 = new Job("j4", "pegasus", "analyze", "4.0");
        j4.addArgument("-a analyze -T 60 -i ").addArgument(fc1);
        j4.addArgument(" ").addArgument(fc2);
        j4.addArgument("-o ").addArgument(fd);
        j4.uses(fc1, File.LINK.INPUT);
        j4.uses(fc2, File.LINK.INPUT);
        j4.uses(fd, File.LINK.OUTPUT);

        // add job to the DAX
        dax.addJob(j4);

        // analyze job is a child to the findrange jobs
        dax.addDependency("j2", "j4");
        dax.addDependency("j3", "j4");

        // End of commented out code for Exercise 2.1

The above snippet adds a job with the ID j4 to the DAX. It illustrates how to

  1. specify the arguments for the job

  2. specify the logical files used by the job

  3. specify the dependencies on other jobs

  4. add the job to the DAX

After uncommenting the code, compile and run the CreateDAX program.

$ cd dax

$ javac -classpath .:/opt/pegasus/default/lib/pegasus.jar CreateDAX.java

$  java -classpath .:/opt/pegasus/default/lib/pegasus.jar CreateDAX local /opt/pegasus/default ./diamond.dax

Let us view the generated diamond.dax.

$ cat diamond.dax

Inside the DAX, you should see four sections:

  1. list of input file locations

  2. list of executable locations

  3. definition of all jobs - one entry for each job in the workflow, 4 jobs in total

  4. list of control-flow dependencies - this section specifies a partial order in which jobs are to be executed
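To make the idea of a partial order concrete, here is a small sketch in Python (3.9 or later, for the standard graphlib module). It is not Pegasus code. The edges j2 -> j4 and j3 -> j4 come from the snippet above; the edges from j1 and the helper name execution_levels are assumptions based on the usual diamond shape.

```python
from graphlib import TopologicalSorter

# Map each job to the set of jobs it depends on (its parents).
# j2 -> j4 and j3 -> j4 appear in CreateDAX.java; the j1 edges are assumed.
DEPENDENCIES = {
    "j1": set(),
    "j2": {"j1"},
    "j3": {"j1"},
    "j4": {"j2", "j3"},
}


def execution_levels(dependencies):
    """Group jobs into levels; jobs within one level are unordered."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        levels.append(ready)
        sorter.done(*ready)
    return levels


print(execution_levels(DEPENDENCIES))  # [['j1'], ['j2', 'j3'], ['j4']]
```

The two jobs in the middle level have no dependency between them, which is why the order is only partial: they may run in either order or at the same time.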

1.2.2. Replica Catalog

First, let's change to the tutorial base directory.

$ cd $HOME/pegasus-wms

In this exercise you will insert entries into the Replica Catalog. The replica catalog that we will use today is a simple file-based catalog. For production runs we also support, and recommend, the following:

  • Globus RLS

  • JDBC implementation

A Replica Catalog maintains the LFN to PFN mapping for the input files of your workflow. Pegasus queries it to determine the locations of the raw input data files required by the workflow. Additionally, all the materialized data is registered in the Replica Catalog for later data reuse.

1.2.2.1. Pre Populated Replica Catalog

The instructors have provided a File based Replica Catalog configured for the tutorial exercises. The file is inside the config directory.

  • Let us see what the file looks like.

    $ cat config/rc.data
    
    
    statfile_20070529_153243_22618.tbl
         gsiftp://pegasus-vm/scratch/tutorial/inputdata/0.2degree/statfile.tbl
              pool="local"
    2mass-atlas-990502s-j1440198.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1440198.fits
              pool="local"
    2mass-atlas-990502s-j1440186.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1440186.fits
              pool="local"
     2mass-atlas-990502s-j1430092.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1430092.fits
              pool="local"
     2mass-atlas-990502s-j1420198.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1420198.fits
              pool="local"
     2mass-atlas-990502s-j1420186.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1420186.fits
              pool="local"
     cimages_20070529_153243_22618.tbl
         gsiftp://pegasus-vm/scratch/0.2degree/cimages.tbl pool="local"
     pimages_20070529_153243_22618.tbl
         gsiftp://pegasus-vm/scratch/0.2degree/pimages.tbl pool="local"
     region_20070529_153243_22618.hdr
         gsiftp://pegasus-vm/scratch/0.2degree/region.hdr pool="local"
     2mass-atlas-990502s-j1430080.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1430080.fits
              pool="local"
    
    ...
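To illustrate the LFN to PFN mapping, the following Python sketch reads entries in the layout shown above. It assumes that the real file keeps each entry on a single line (the listing wraps entries for display) in the form LFN, PFN, then key="value" attributes. The function names are hypothetical and this is not how Pegasus itself parses the file.

```python
import shlex


def parse_replica_line(line):
    """Split one catalog line into (lfn, pfn, attributes)."""
    # shlex honours the double quotes around attribute values.
    lfn, pfn, *pairs = shlex.split(line)
    attributes = dict(pair.split("=", 1) for pair in pairs)
    return lfn, pfn, attributes


def load_replica_catalog(lines):
    """Build an LFN -> [(PFN, attributes), ...] mapping, skipping blanks and comments."""
    catalog = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        lfn, pfn, attributes = parse_replica_line(line)
        catalog.setdefault(lfn, []).append((pfn, attributes))
    return catalog


sample = [
    'region_20070529_153243_22618.hdr gsiftp://pegasus-vm/scratch/0.2degree/region.hdr pool="local"',
]
print(load_replica_catalog(sample)["region_20070529_153243_22618.hdr"])
```

Each LFN maps to a list because a logical file may have replicas at more than one location.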

1.2.2.2. pegasus-rc-client ( Optional Exercise )

You can use the pegasus-rc-client command to insert, query and delete entries in the replica catalog.

Before executing any of the pegasus-rc-client exercises, let us remove the pre-populated replica catalog.

$ rm $HOME/pegasus-wms/config/rc.data

To execute the diamond DAX created in exercise 2.1, we will need to register the input file f.a in the replica catalog. The file f.a resides at /scratch/tutorial/inputdata/diamond/f.a. Let us insert a single entry into the replica catalog.

$  pegasus-rc-client  insert f.a \
          gsiftp://pegasus-vm/scratch/tutorial/inputdata/diamond/f.a pool=local

Let us now verify that f.a has been registered successfully by querying the replica catalog using pegasus-rc-client:

$ pegasus-rc-client  lookup f.a

 f.a gsiftp://pegasus-vm/scratch/tutorial/inputdata/diamond/f.a pool="local"

The pegasus-rc-client also allows bulk insertion of entries. We will insert the entries for the Montage workflow using the bulk mode. The input data to be used for the Montage workflow resides in the /scratch/tutorial/inputdata/0.2degree directory. We are going to insert entries into the replica catalog that point to the files in this directory.

The instructors have provided:

  • A file rc.in, the input data file for pegasus-rc-client, which contains the mappings that need to be populated in the Replica Catalog. The file is inside the config directory.

  • Let us see what the file looks like.

    $ cat config/rc.in
    
    
    statfile_20070529_153243_22618.tbl
         gsiftp://pegasus-vm/scratch/tutorial/inputdata/0.2degree/statfile.tbl
              pool="local"
    2mass-atlas-990502s-j1440198.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1440198.fits
              pool="local"
    2mass-atlas-990502s-j1440186.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1440186.fits
              pool="local"
     2mass-atlas-990502s-j1430092.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1430092.fits
              pool="local"
     2mass-atlas-990502s-j1420198.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1420198.fits
              pool="local"
     2mass-atlas-990502s-j1420186.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1420186.fits
              pool="local"
     cimages_20070529_153243_22618.tbl
         gsiftp://pegasus-vm/scratch/0.2degree/cimages.tbl pool="local"
     pimages_20070529_153243_22618.tbl
         gsiftp://pegasus-vm/scratch/0.2degree/pimages.tbl pool="local"
     region_20070529_153243_22618.hdr
         gsiftp://pegasus-vm/scratch/0.2degree/region.hdr pool="local"
     2mass-atlas-990502s-j1430080.fits
         gsiftp://pegasus-vm/scratch/0.2degree/2mass-atlas-990502s-j1430080.fits
              pool="local"
  • Now we are ready to run pegasus-rc-client and populate the data. Since each of you has an individual file replica catalog, all the 10 entries should be successfully registered.

    $ pegasus-rc-client  --insert config/rc.in
    
    #Successfully worked on : 12 lines
    #Worked on total number of : 12 lines.
  • Now the entries have been successfully inserted into the Replica Catalog. Let us query the replica catalog for a particular LFN.

    $ pegasus-rc-client lookup pimages_20080505_143233_14944.tbl
    
    pimages_20080505_143233_14944.tbl
             gsiftp://pegasus-vm/scratch/tutorial/inputdata/0.2degree/pimages.tbl 
               pool="local"

1.2.3. The Site Catalog

The site catalog contains information about the layout of the grid where you want to run your workflows. For each site, the following information is maintained:

  • grid gateways

  • head node filesystem

  • worker node filesystem

  • scratch and shared file systems on the head nodes and worker nodes

  • replica catalog URL for the site

  • site wide information like environment variables to be set when a job is run.
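The tutorial's site catalog is an XML file (it is shown in the next subsection). As a rough sketch of how that per-site information can be read programmatically, the Python below lists the grid gateways of each site with the standard xml.etree.ElementTree module. The inline sample is an abbreviated copy of that catalog and the function name gateways_by_site is illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <sitecatalog> root element.
NS = {"sc": "http://pegasus.isi.edu/schema/sitecatalog"}

# Abbreviated copy of the tutorial catalog, kept inline so the sketch runs on its own.
SAMPLE = """\
<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" version="3.0">
  <site handle="cluster" arch="x86" os="LINUX">
    <grid type="gt2" contact="pegasus-vm/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/>
    <grid type="gt2" contact="pegasus-vm/jobmanager-condor" scheduler="SGE" jobtype="compute"/>
  </site>
  <site handle="local" arch="x86" os="LINUX">
    <grid type="gt2" contact="localhost/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/>
    <grid type="gt2" contact="localhost/jobmanager-fork" scheduler="Fork" jobtype="compute"/>
  </site>
</sitecatalog>
"""


def gateways_by_site(xml_text):
    """Return {site handle: {jobtype: contact}} for every site in the catalog."""
    root = ET.fromstring(xml_text)
    return {
        site.get("handle"): {
            grid.get("jobtype"): grid.get("contact")
            for grid in site.findall("sc:grid", NS)
        }
        for site in root.findall("sc:site", NS)
    }


for handle, gateways in gateways_by_site(SAMPLE).items():
    print(handle, gateways)
```

The jobtype value is spelled "auxillary" in the catalog itself, so the sketch keeps that spelling as a dictionary key.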

1.2.3.1. Pre Populated Site Catalog

The instructors have provided a pre-populated site catalog for use in the tutorial in the $HOME/pegasus-wms/config directory.

Let's see the site catalog for the Pegasus VM. It refers to two sites, local and cluster.

$ cat $HOME/pegasus-wms/config/sites.xml3

<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-3.0.xsd" version="3.0">

<site  handle="cluster" arch="x86" os="LINUX" osrelease="" osversion="" glibc="">
  <grid  type="gt2" contact="pegasus-vm/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/>
  <grid  type="gt2" contact="pegasus-vm/jobmanager-condor" scheduler="SGE" jobtype="compute"/>
  <head-fs>
   <scratch>
    <shared>
     <file-server protocol="gsiftp" url="gsiftp://pegasus" mount-point="/home/tutorial/cluster-scratch"/>
     <internal-mount-point mount-point="/home/tutorial/cluster-scratch"/>
    </shared>
   </scratch>
   <storage>
    <shared>
     <file-server protocol="gsiftp" url="gsiftp://pegasus" mount-point="/home/tutorial/cluster-storage"/>
     <internal-mount-point mount-point="/home/tutorial/cluster-storage"/>
    </shared>
   </storage>
  </head-fs>
  <replica-catalog  type="LRC" url="rlsn://localhost"/>
  <profile namespace="env" key="GLOBUS_LOCATION" >/opt/globus/default</profile>
  <profile namespace="env" key="JAVA_HOME" >/usr</profile>
  <profile namespace="env" key="LD_LIBRARY_PATH" >/opt/globus/default/lib</profile>
  <profile namespace="env" key="PEGASUS_HOME" >/opt/pegasus/default</profile>
  <profile namespace="pegasus" key="clusters.num" >1</profile>
  <profile namespace="pegasus" key="stagein.clusters" >1</profile>
 </site>

 <site  handle="local" arch="x86" os="LINUX" osrelease="" osversion="" glibc="">
  <grid  type="gt2" contact="localhost/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/>
  <grid  type="gt2" contact="localhost/jobmanager-fork" scheduler="Fork" jobtype="compute"/>
  <head-fs>
   <scratch>
    <shared>
     <file-server protocol="gsiftp" url="file://" mount-point="/home/tutorial/local-scratch"/>
     <internal-mount-point mount-point="/home/tutorial/local-scratch"/>
    </shared>
   </scratch>
   <storage>
    <shared>
     <file-server protocol="gsiftp" url="file://" mount-point="/home/tutorial/local-storage"/>
     <internal-mount-point mount-point="/home/tutorial/local-storage"/>
    </shared>
   </storage>
  </head-fs>
  <replica-catalog  type="LRC" url="rlsn://localhost"/>
  <profile namespace="env" key="GLOBUS_LOCATION" >/opt/globus/default</profile>
  <profile namespace="env" key="JAVA_HOME" >/usr</profile>
  <profile namespace="env" key="LD_LIBRARY_PATH" >/opt/globus/default/lib</profile>
  <profile namespace="env" key="PEGASUS_HOME" >/opt/pegasus/default</profile>
 </site>

</sitecatalog>
 

1.2.3.2. Generating a Site Catalog for OSG

The client pegasus-sc-client can be used to generate a site catalog and transformation catalog for the Open Science Grid.

$ pegasus-sc-client --vo engage --sc ./engage-osg-sc.xml \
  --source OSGMM --grid OSG -vvvv 

2010.11.24 18:00:46.410 PST: [INFO]  Skipping site CIT_CMS_T2 
2010.11.24 18:00:46.416 PST: [INFO]  Adding site RENCI-Engagement 
2010.11.24 18:00:46.475 PST: [INFO]  Adding site Nebraska 
2010.11.24 18:00:46.476 PST: [INFO]  Adding site Prairiefire 
2010.11.24 18:00:46.476 PST: [INFO]  Adding site BNL-ATLAS 
2010.11.24 18:00:46.477 PST: [INFO]  Adding site BNL-ATLAS__1 
2010.11.24 18:00:46.478 PST: [INFO]  Adding site UFlorida-PG 
2010.11.24 18:00:46.478 PST: [INFO]  Skipping site CIT_CMS_T2__1 
2010.11.24 18:00:46.478 PST: [INFO]  Adding site RENCI-Blueridge 
2010.11.24 18:00:46.479 PST: [INFO]  Adding site Nebraska__1 
2010.11.24 18:00:46.480 PST: [INFO]  Adding site UMissHEP 
2010.11.24 18:00:46.480 PST: [INFO]  Adding site UCR-HEP 
2010.11.24 18:00:46.481 PST: [INFO]  Adding site LIGO_UWM_NEMO 
2010.11.24 18:00:46.482 PST: [INFO]  Adding site FNAL_FERMIGRID 
2010.11.24 18:00:46.482 PST: [INFO]  Adding site USCMS-FNAL-WC1 
2010.11.24 18:00:46.483 PST: [INFO]  Adding site UConn-OSG 
2010.11.24 18:00:46.484 PST: [INFO]  Adding site UFlorida-HPC 
2010.11.24 18:00:46.484 PST: [INFO]  Adding site GridUNESP_CENTRAL 
2010.11.24 18:00:46.493 PST: [INFO]  Adding site NWICG_NotreDame 
2010.11.24 18:00:46.494 PST: [INFO]  Site LOCAL . Creating default entry 
2010.11.24 18:00:46.527 PST: [INFO]  Loaded 19 sites  
2010.11.24 18:00:46.527 PST:   Writing out site catalog to /home/tutorial/pegasus-wms/./engage-osg-sc.xml 
2010.11.24 18:00:46.959 PST:   Number of SRM Properties retrieved 14 
2010.11.24 18:00:46.970 PST:   Writing out properties to /home/tutorial/pegasus-wms/./pegasus.6475454308491531036.properties 
2010.11.24 18:00:46.972 PST: [INFO]  Time taken to execute is 1.101 seconds 
2010.11.24 18:00:46.972 PST: [INFO] event.pegasus.planner planner.version 3.0.0  - FINISHED 


1.2.4. Transformation Catalog

The transformation catalog maintains information about where the application code resides on the grid. It also records additional information about each transformation, such as the system it is compiled for and the profiles or environment variables that need to be set when it is invoked.

1.2.4.1. Pre Populated Transformation Catalog

The instructors have provided a ready transformation catalog (tc.data.text) in the $HOME/pegasus-wms/config directory.

In our case, it contains the locations where the Diamond and Montage codes are installed in the Pegasus VM. We will look at the Transformation Catalog after describing its fields.

For each transformation the following information is captured:

  1. tr - A transformation identifier, normally of the form Namespace::Name:Version. The Namespace and Version are optional.

  2. pfn - URL or file path for the location of the executable. The pfn is a file path if the transformation is of type INSTALLED, and generally a URL (file:///, http:// or gsiftp://) if it is of type STAGEABLE.

  3. site - The site identifier for the site where the transformation is available.

  4. type - The type of transformation: whether it is installed ("INSTALLED") on the remote site or is available to stage ("STAGEABLE").

  5. arch, os, osrelease, osversion - The arch/os/osrelease/osversion of the transformation. osrelease and osversion are optional.

    ARCH can have one of the following values: x86, x86_64, sparcv7, sparcv9, ppc, aix. The default value for ARCH is x86.

    OS can have one of the following values: linux, sunos, macosx. The default value for OS, if none is specified, is linux.

  6. Profiles - One or many profiles can be attached to a transformation for all sites or to a transformation on a particular site.
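The Namespace::Name:Version form of the tr field can be taken apart with plain string handling. The Python sketch below shows one way to do it; the function name parse_transformation is hypothetical, and the example identifiers are taken from the catalog used in this tutorial.

```python
def parse_transformation(identifier):
    """Split 'Namespace::Name:Version' into (namespace, name, version).

    Namespace and version are optional and come back as None when absent.
    """
    namespace, separator, rest = identifier.partition("::")
    if not separator:
        # No '::' present, so the whole string is Name[:Version].
        namespace, rest = None, identifier
    name, separator, version = rest.partition(":")
    if not separator:
        version = None
    return namespace, name, version


print(parse_transformation("diamond::findrange:2.0"))  # ('diamond', 'findrange', '2.0')
print(parse_transformation("mAdd:3.0"))                # (None, 'mAdd', '3.0')
print(parse_transformation("condor::dagman"))          # ('condor', 'dagman', None)
```

Splitting on "::" first matters, because a single ":" separates the name from the version.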

$ cat $HOME/pegasus-wms/config/tc.data.text


# multiple line text-based transformation catalog: 2010-11-24T20:46:41.710-08:00
tr bin/mDiff {
        site local {
                profile env "MONTAGE_HOME" "." 
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mDiff"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr bin/mFitplane {
        site local {
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mFitplane"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr condor::dagman {
        site local {
                pfn "/usr/bin/condor_dagman"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}

tr diamond::findrange:2.0 {
        site local {
                pfn "/opt/pegasus/default/bin/keg"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}

tr diamond::preprocess:2.0 {
        site local {
                pfn "/opt/pegasus/default/bin/keg"
                arch "x86"
                os "LINUX"
                type "INSTALLED"
        }
}

tr mAdd:3.0 {
        site local {
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mAdd"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr mBackground:3.0 {
        site local {
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mBackground"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr mBgModel:3.0 {
        site local {
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mBgModel"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr mConcatFit:3.0 {
        site local {
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mConcatFit"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr mDiffFit:3.0 {
        site local {
                profile env "MONTAGE_HOME" "." 
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mDiffFit"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr mImgtbl:3.0 {
        site local {
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mImgtbl"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr mJPEG:3.0 {
        site local {
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mJPEG"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr mProjectPP:3.0 {
        site local {
                profile condor "priority" "25" 
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mProjectPP"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

tr mShrink:3.0 {
        site local {
                pfn "gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mShrink"
                arch "x86"
                os "LINUX"
                type "STAGEABLE"
        }
}

1.2.4.2. pegasus-tc-client ( Optional )

We will use the pegasus-tc-client to add the entry for the transformation dummy into the transformation catalog.

  • $ pegasus-tc-client  -a -l diamond::dummy:2.0 \
          -p /opt/pegasus/default/bin/keg -r local -t INSTALLED -s INTEL32::LINUX 
    
     2008.04.30 15:11:59.313 PDT: [INFO] Added tc entry sucessfully
    

    Let us try and query for the entry we inserted.

    $ pegasus-tc-client  -q -P -l diamond::dummy:2.0
    
    #RESID     LTX                     PFN                                           TYPE          SYSINFO
    
    local    diamond::analyze:2.0    /cluster-software/pegasus/current/bin/keg    INSTALLED    INTEL32::LINUX
    
    

    Now let us query the transformation catalog for all its entries to see what it looks like.

    $ pegasus-tc-client  -q -B          
    
    local   mDiff       
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mDiff       
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mFitplane
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mFitplane
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mAdd:3.0  
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mAdd
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mBackground:3.0
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mBackground
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mBgModel:3.0
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mBgModel
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mConcatFit:3.0 
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mConcatFit
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mDiffFit:3.0
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mDiffFit 
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mImgtbl:3.0 
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mImgtbl  
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mJPEG:3.0       
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mJPEG  
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mProject:3.0 
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mProjectPP 
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mProjectPP:3.0
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mProjectPP
                                    STAGEABLE   INTEL32::LINUX  ENV::MONTAGE_HOME="."
    local   mShrink:3.0
                 gsiftp://pegasus-vm/scratch/tutorial/software/montage/3.0/x86/bin/mShrink
                                    STAGEABLE   INTEL32::LINUX  NULL

1.2.5. Properties

Pegasus Workflow Planner is configured through Java properties. The instructors have provided a ready properties file at $HOME/.pegasusrc.

$ cat $HOME/.pegasusrc
 
##########################
# PEGASUS USER PROPERTIES 
##########################

## SELECT THE REPLICAT CATALOG MODE AND URL
pegasus.catalog.replica = File
pegasus.catalog.replica.file = ${user.home}/pegasus-wms/config/rc.data


## SELECT THE SITE CATALOG MODE AND FILE
pegasus.catalog.site = XML3
pegasus.catalog.site.file = ${user.home}/pegasus-wms/config/sites.xml3


## SELECT THE TRANSFORMATION CATALOG MODE AND FILE
pegasus.catalog.transformation = Text
pegasus.catalog.transformation.file = ${user.home}/pegasus-wms/config/tc.data.text

## USE DAGMAN RETRY FEATURE FOR FAILURES
dagman.retry=2

## CHECK JOB EXIT CODES FOR FAILURE
dagman.post.scope=all

## STAGE ALL OUR EXECUTABLES OR USE INSTALLED ONES 
pegasus.catalog.transformation.mapper = All

## WORK AND STORAGE DIR  
pegasus.dir.storage = storage
pegasus.dir.exec = exec

#JOB CATEGORIES
dagman.projection.maxjobs 2
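To see how such a properties file resolves to concrete values, here is a simplified sketch in Python. It is an illustration only: it handles the three key/value separators that appear in Java properties files (=, : and whitespace) but not line continuations or escapes, and the ${user.home} expansion is a plain string substitution that only approximates what the planner does. The name load_properties is hypothetical.

```python
import re
from pathlib import Path

# A few lines copied from the listing above.
SAMPLE = """\
pegasus.catalog.replica = File
pegasus.catalog.replica.file = ${user.home}/pegasus-wms/config/rc.data
dagman.retry=2
#JOB CATEGORIES
dagman.projection.maxjobs 2
"""


def load_properties(text, user_home=None):
    """Parse simple key/value properties and expand ${user.home}."""
    home = str(user_home if user_home is not None else Path.home())
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        # The key ends at the first '=', ':' or whitespace character.
        match = re.match(r"([^=:\s]+)\s*[=:\s]\s*(.*)", line)
        if match:
            key, value = match.groups()
            properties[key] = value.replace("${user.home}", home)
    return properties


props = load_properties(SAMPLE, user_home="/home/tutorial")
print(props["pegasus.catalog.replica.file"])
```

Note that the last line of the listing, which separates key and value with a space instead of "=", is still read as a key/value pair.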

1.2.6. Planning and Running Workflows Locally

In this exercise we are going to run pegasus-plan to generate an executable workflow from the abstract workflow (diamond.dax). The executable workflow consists of Condor submit files, which are submitted locally using pegasus-run.

The instructors have provided:

  • A dax (diamond.dax) in the $HOME/pegasus-wms/dax directory.

You will need to carry out the following steps yourself, by following the instructions below:

  • Run pegasus-plan to generate the condor submit files out of the dax.

  • Run pegasus-run to submit the workflow locally.

Instructions:

  • Let us run pegasus-plan on the diamond dax.

    $ cd ~/pegasus-wms
    $ pegasus-plan --dax `pwd`/dax/diamond.dax --force\
                   --dir dags -s local -o local --nocleanup -v

    The above command says that we need to plan the diamond DAX locally. The Condor submit files are to be generated in a directory structure whose base is dags. We are also requesting that no cleanup jobs be added, as we require the intermediate data to be saved. Here is the output of pegasus-plan.

    
    2010.12.23 10:54:02.180 PST: [INFO] event.pegasus.refinement dax.id blackdiamond_0  - STARTED 
    2010.12.23 10:54:02.189 PST: [INFO] event.pegasus.siteselection dax.id blackdiamond_0  - STARTED 
    2010.12.23 10:54:02.203 PST: [INFO] event.pegasus.siteselection dax.id blackdiamond_0  - FINISHED 
    2010.12.23 10:54:02.317 PST: [INFO]  Grafting transfer nodes in the workflow 
    2010.12.23 10:54:02.318 PST: [INFO] event.pegasus.generate.transfer-nodes dax.id blackdiamond_0  - STARTED 
    2010.12.23 10:54:02.447 PST: [INFO] event.pegasus.generate.transfer-nodes dax.id blackdiamond_0  - FINISHED 
    2010.12.23 10:54:02.449 PST: [INFO] event.pegasus.generate.workdir-nodes dax.id blackdiamond_0  - STARTED 
    2010.12.23 10:54:02.452 PST: [INFO] event.pegasus.generate.workdir-nodes dax.id blackdiamond_0  - FINISHED 
    2010.12.23 10:54:02.452 PST: [INFO] event.pegasus.generate.cleanup-wf dax.id blackdiamond_0  - STARTED 
    2010.12.23 10:54:02.453 PST: [INFO] event.pegasus.generate.cleanup-wf dax.id blackdiamond_0  - FINISHED 
    2010.12.23 10:54:02.453 PST: [INFO] event.pegasus.refinement dax.id blackdiamond_0  - FINISHED 
    2010.12.23 10:54:02.539 PST: [INFO]  Generating codes for the concrete workflow 
    2010.12.23 10:54:03.340 PST: [INFO]  Generating codes for the concrete workflow -DONE 
    2010.12.23 10:54:03.340 PST: [INFO]  Generating code for the cleanup workflow 
    2010.12.23 10:54:03.482 PST: [INFO]  Generating code for the cleanup workflow -DONE 
    2010.12.23 10:54:03.530 PST:   
    
    
    I have concretized your abstract workflow. The workflow has been entered 
    into the workflow database with a state of "planned". The next step is 
    to start or execute your workflow. The invocation required is
    
    
    pegasus-run -Dpegasus.user.properties=$HOME/.../blackdiamond/run0001/pegasus.7289539421670233327.properties 
            /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001
    
     
    2010.12.23 10:54:03.530 PST:   Time taken to execute is 1.757 seconds 
    2010.12.23 10:54:03.530 PST: [INFO] event.pegasus.planner planner.version 3.0.2  - FINISHED 
    
    
  • Now run pegasus-run as mentioned in the output of pegasus-plan. Do not copy the command below; it is for illustration only.

    [pegasus@pegasus pegasus-wms]$ pegasus-run \
     -Dpegasus.user.properties=$HOME/.../blackdiamond/run0001/pegasus.350356687577055673.properties \
           $HOME/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001
    
    -----------------------------------------------------------------------
    File for submitting this DAG to Condor           : blackdiamond-0.dag.condor.sub
    Log of DAGMan debugging messages                 : blackdiamond-0.dag.dagman.out
    Log of Condor library output                     : blackdiamond-0.dag.lib.out
    Log of Condor library error messages             : blackdiamond-0.dag.lib.err
    Log of the life of condor_dagman itself          : blackdiamond-0.dag.dagman.log
    
    -no_submit given, not submitting DAG to Condor.  You can do this with:
    "condor_submit blackdiamond-0.dag.condor.sub"
    -----------------------------------------------------------------------
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 320.
    
    Your Workflow has been started and runs in base directory given below
    
    cd /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001
    
    *** To monitor the workflow you can run ***
    
    pegasus-status -l /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001
    
    *** To remove your workflow run ***
    pegasus-remove -d 320.0
    or
    pegasus-remove /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001
    
    [pegasus@pegasus pegasus-wms]$ 
    
    

1.3. Monitoring, Debugging and Statistics

In this section, we list ways to track your workflow, debug a failed workflow, and generate statistics and plots for a workflow run.

1.3.1. Tracking the progress of the workflow and debugging the workflows.

We will change into the directory mentioned in the output of the pegasus-run command.

$ cd /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001

In this directory you will see a large number of files. Do not let that scare you. Unless things go wrong, you need to look at only a few files to track the progress of the workflow.

  • Run the command pegasus-status as mentioned by pegasus-run above to check the status of your jobs. Use the watch command to auto repeat the command every 2 seconds.

    $ watch pegasus-status /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001
    
    -- Submitter: pegasus : <172.16.80.128:40195> : pegasus
     ID      OWNER/NODENAME   SUBMITTED     RUN_TIME ST PRI SIZE CMD               
      84.0   pegasus         7/19 16:59   0+00:01:17 R  0   7.3  condor_dagman -f -
      87.0    |-preprocess_  7/19 17:00   0+00:00:31 R  10  0.1  kickstart -n diamo
    

    Tip

    watch does not end with ESC nor (q)uit, but with Ctrl+C.

    The above output shows jobs running under the main DAGMan process. Keep watching it to track whether the workflow is still running. If none of your jobs appear in the output for some time (say 30 seconds), the workflow has finished. The wait is necessary because Condor DAGMan may take a moment to release the next job into the queue after a job finishes successfully.

    If the output of pegasus-status is empty, then your workflow has either

    1. completed successfully, or

    2. stopped midway due to a non-recoverable error.

    We can now run pegasus-analyzer to analyze the workflow.

  • Using pegasus-analyzer to analyze the workflow

    $ pegasus-analyzer  -i /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001
    
    pegasus-analyzer: initializing...
    
    ************************************Summary*************************************
    
     Total jobs         :      8 (100.00%)
     # jobs succeeded   :      8 (100.00%)
     # jobs failed      :      0 (0.00%)
     # jobs unsubmitted :      0 (0.00%)
    
    **************************************Done**************************************
    
    pegasus-analyzer: end of status report
    
    
    
  • Another way to monitor the workflow is to check the jobstate.log file. This is the output file of the monitoring daemon that is parsing all the condor log files to determine the status of the jobs. It logs the events seen by Condor into a more readable form for us.

    $ more jobstate.log
    
    1290676248 INTERNAL *** MONITORD_STARTED ***
    1290676247 INTERNAL *** DAGMAN_STARTED 339.0 ***
    [..]

    At the start of jobstate.log, when the workflow has just started running, you will see many entries with status UN_READY. This means that DAGMan has just parsed the .dag file and has not yet started working on any job. Initially all the jobs in the workflow are listed as UN_READY. After some time you will see entries in jobstate.log showing that a job is being executed, and so on.

    
    1290676261 create_dir_blackdiamond_0_local SUBMIT 340.0 local - 1
    1290676266 create_dir_blackdiamond_0_local EXECUTE 340.0 local - 1
    1290676266 create_dir_blackdiamond_0_local JOB_TERMINATED 340.0 local - 1
    1290676266 create_dir_blackdiamond_0_local JOB_SUCCESS 0 local - 1
    1290676266 create_dir_blackdiamond_0_local POST_SCRIPT_STARTED 340.0 local - 1
    1290676271 create_dir_blackdiamond_0_local POST_SCRIPT_TERMINATED 340.0 local - 1
    1290676271 create_dir_blackdiamond_0_local POST_SCRIPT_SUCCESS 0 local - 1

    The above shows the job being submitted and then executed on the grid. It also shows that the job runs on the grid site local (which is your submit machine). The states the job passes through, from submission to execution to post-processing, are in UPPERCASE.

  • Successfully Completed : Let us again look at the jobstate.log. This time we need to look at the last few lines of jobstate.log

    $ tail jobstate.log
    
    1290676542 register_local_2_0 SUBMIT 347.0 local - 8
    1290676547 register_local_2_0 EXECUTE 347.0 local - 8
    1290676547 register_local_2_0 JOB_TERMINATED 347.0 local - 8
    1290676547 register_local_2_0 JOB_SUCCESS 0 local - 8
    1290676547 register_local_2_0 POST_SCRIPT_STARTED 347.0 local - 8
    1290676552 register_local_2_0 POST_SCRIPT_TERMINATED 347.0 local - 8
    1290676552 register_local_2_0 POST_SCRIPT_SUCCESS 0 local - 8
    1290676552 INTERNAL *** DAGMAN_FINISHED 0 ***
    1290676554 INTERNAL *** MONITORD_FINISHED 0 ***
    

    Looking at the last two lines, we see that DAGMan and pegasus-monitord both finished with status 0. This means the workflow ran successfully. Congratulations, you ran your workflow on the local site! The workflow generates a final output file, f.d, which is stored at /home/tutorial/local-storage/storage/f.d .

    To view the file, you can do the following

    $ cat /home/tutorial/local-storage/storage/f.d
    
    --- start f.c1 ----
      --- start f.b1 ----
        --- start f.a ----
          Input File for the Diamond Workflow.--- final f.a ----
        Timestamp Today: 20101223T105659.955-08:00 (1293130619.955;60.002)
        Applicationname: preprocess @ 10.0.2.15 (VPN)
        Current Workdir: /home/tutorial/local-scratch/exec/tutorial/pegasus/blackdiamond/run0001
        Systemenvironm.: i686-Linux 2.6.32-5-686
        Processor Info.: 1 x Intel(R) Xeon(R) CPU           E5462  @ 2.80GHz @ 2797.463
        Load Averages  : 0.646 0.192 0.060
        Memory Usage MB: 502 total, 229 free, 0 shared, 39 buffered
        Swap Usage   MB: 397 total, 397 free
        Filesystem Info: /media/cdrom0            udf,iso9660    31MB total,     0B avail
        Filesystem Info: /media/floppy0           auto  7668MB total,  5436MB avail
        Output Filename: f.b1
        Input Filenames: f.a
      --- final f.b1 ----
      Timestamp Today: 20101223T105815.334-08:00 (1293130695.334;60.003)
      Applicationname: findrange @ 10.0.2.15 (VPN)
      Current Workdir: /home/tutorial/local-scratch/exec/tutorial/pegasus/blackdiamond/run0001
      Systemenvironm.: i686-Linux 2.6.32-5-686
      Processor Info.: 1 x Intel(R) Xeon(R) CPU           E5462  @ 2.80GHz @ 2797.463
      Load Averages  : 1.444 0.509 0.177
      Memory Usage MB: 502 total, 227 free, 0 shared, 39 buffered
      Swap Usage   MB: 397 total, 397 free
      Filesystem Info: /media/cdrom0            udf,iso9660    31MB total,     0B avail
      Filesystem Info: /media/floppy0           auto  7668MB total,  5436MB avail
      Output Filename: f.c1
      Input Filenames: f.b1
    --- final f.c1 ----
    --- start f.c2 ----
      --- start f.b2 ----
        --- start f.a ----
          Input File for the Diamond Workflow.--- final f.a ----
        Timestamp Today: 20101223T105659.955-08:00 (1293130619.955;60.003)
        Applicationname: preprocess @ 10.0.2.15 (VPN)
        Current Workdir: /home/tutorial/local-scratch/exec/tutorial/pegasus/blackdiamond/run0001
        Systemenvironm.: i686-Linux 2.6.32-5-686
        Processor Info.: 1 x Intel(R) Xeon(R) CPU           E5462  @ 2.80GHz @ 2797.463
        Load Averages  : 0.646 0.192 0.060
        Memory Usage MB: 502 total, 229 free, 0 shared, 39 buffered
        Swap Usage   MB: 397 total, 397 free
        Filesystem Info: /media/cdrom0            udf,iso9660    31MB total,     0B avail
        Filesystem Info: /media/floppy0           auto  7668MB total,  5436MB avail
        Output Filename: f.b2
        Input Filenames: f.a
      --- final f.b2 ----
      Timestamp Today: 20101223T105820.478-08:00 (1293130700.478;60.001)
      Applicationname: findrange @ 10.0.2.15 (VPN)
      Current Workdir: /home/tutorial/local-scratch/exec/tutorial/pegasus/blackdiamond/run0001
      Systemenvironm.: i686-Linux 2.6.32-5-686
      Processor Info.: 1 x Intel(R) Xeon(R) CPU           E5462  @ 2.80GHz @ 2797.463
      Load Averages  : 1.409 0.517 0.182
      Memory Usage MB: 502 total, 228 free, 0 shared, 39 buffered
      Swap Usage   MB: 397 total, 397 free
      Filesystem Info: /media/cdrom0            udf,iso9660    31MB total,     0B avail
      Filesystem Info: /media/floppy0           auto  7668MB total,  5436MB avail
      Output Filename: f.c2
      Input Filenames: f.b2
    --- final f.c2 ----
    Timestamp Today: 20101223T105936.718-08:00 (1293130776.718;60.000)
    Applicationname: analyze @ 10.0.2.15 (VPN)
    Current Workdir: /home/tutorial/local-scratch/exec/tutorial/pegasus/blackdiamond/run0001
    Systemenvironm.: i686-Linux 2.6.32-5-686
    Processor Info.: 1 x Intel(R) Xeon(R) CPU           E5462  @ 2.80GHz @ 2797.463
    Load Averages  : 1.033 0.581 0.226
    Memory Usage MB: 502 total, 228 free, 0 shared, 40 buffered
    Swap Usage   MB: 397 total, 397 free
    Filesystem Info: /media/cdrom0            udf,iso9660    31MB total,     0B avail
    Filesystem Info: /media/floppy0           auto  7668MB total,  5436MB avail
    Output Filename: f.d
    Input Filenames: f.c1 f.c2
    
    
  • Unsuccessfully Completed (Workflow execution stopped midway) : Let us again look at the jobstate.log. Again we need to look at the last few lines of jobstate.log

    $ tail jobstate.log
    
    1290677127 stage_in_local_local_0 EXECUTE 352.0 local - 4
    1290677127 stage_in_local_local_0 JOB_TERMINATED 352.0 local - 4
    1290677127 stage_in_local_local_0 JOB_FAILURE 1 local - 4
    1290677127 stage_in_local_local_0 POST_SCRIPT_STARTED 352.0 local - 4
    1290677132 stage_in_local_local_0 POST_SCRIPT_TERMINATED 352.0 local - 4
    1290677132 stage_in_local_local_0 POST_SCRIPT_FAILURE 1 local - 4
    1290677132 INTERNAL *** DAGMAN_FINISHED 1 ***
    1290677134 INTERNAL *** MONITORD_FINISHED 0 ***
    
    

    Looking at the last two lines, we see that DAGMan finished with status 1, which indicates failure, while pegasus-monitord itself finished with status 0. We can easily determine which job failed: it is stage_in_local_local_0 in this case. To determine the reason for the failure, we need to look at its kickstart output file, which is named JOBNAME.out.NNN, where NNN is a three-digit number from 000 upward.
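
The check described above (read jobstate.log, find the DAGMan exit status, and identify the job that failed) can also be sketched in Python. This is a minimal illustration and not part of Pegasus: the function name summarize_jobstate is hypothetical, and the field layout (timestamp, job name, state, and so on) is assumed from the sample lines shown in this section.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def summarize_jobstate(lines: Iterable[str]) -> Tuple[Optional[int], List[str]]:
    """Return (DAGMan exit status, jobs whose last recorded state is a failure)."""
    dagman_status: Optional[int] = None
    last_state: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or elided lines such as "[..]"
        job, state = fields[1], fields[2]
        if job == "INTERNAL":
            # Layout assumed from the samples:
            # "<time> INTERNAL *** DAGMAN_FINISHED <status> ***"
            if len(fields) >= 5 and fields[3] == "DAGMAN_FINISHED":
                dagman_status = int(fields[4])
            continue
        last_state[job] = state  # later events overwrite earlier ones
    failed = sorted(j for j, s in last_state.items() if s.endswith("_FAILURE"))
    return dagman_status, failed


# The last few lines of the failed run shown above.
sample = """\
1290677127 stage_in_local_local_0 JOB_FAILURE 1 local - 4
1290677132 stage_in_local_local_0 POST_SCRIPT_FAILURE 1 local - 4
1290677132 INTERNAL *** DAGMAN_FINISHED 1 ***
1290677134 INTERNAL *** MONITORD_FINISHED 0 ***
""".splitlines()

status, failed_jobs = summarize_jobstate(sample)
print(status, failed_jobs)
```

Only the last state seen for each job is kept, so a job that fails once and then succeeds on a later attempt is not reported as failed. To run the sketch on a real file, pass it an open file object, for example summarize_jobstate(open("jobstate.log")).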

1.3.2. Debugging a failed workflow using pegasus-analyzer

In this section, we will run the diamond workflow but remove the input file so that the workflow fails during execution. This is to highlight how to use pegasus-analyzer to debug a failed workflow.

First, let us rename the input file f.a

 $ mv /scratch/tutorial/inputdata/diamond/f.a /scratch/tutorial/inputdata/diamond/f.a.old

 $ cd $HOME/pegasus-wms
 

We will now repeat exercises 2.4 and 2.5 and submit the workflow again.

Plan and submit the diamond workflow. Passing --submit to pegasus-plan submits the workflow if planning succeeds.

$  pegasus-plan --dax `pwd`/dax/diamond.dax --force \
        --dir dags -s local -o local --nocleanup --submit -v


Use pegasus-status to track the workflow and wait for it to fail

$ watch pegasus-status  /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002


-- Submitter: pegasus : <172.16.80.128:40195> : pegasus
 ID      OWNER/NODENAME   SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  96.0   pegasus         7/19 17:40   0+00:01:06 R  0   7.3  condor_dagman -f -


The --long option to pegasus-status gives more detail about a running workflow
[pegasus@pegasus pegasus-wms]$ pegasus-status -l /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002
blackdiamond-0.dag is running.
11/25 01:25:06  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/25 01:25:06   ===     ===      ===     ===     ===        ===      ===
11/25 01:25:06     1       0        1       0       0          6        0

WORKFLOW STATUS : RUNNING | 1/8 ( 12% ) | (condor processing workflow)



We can also use the --long option to pegasus-status to see the FINAL status of the workflow

$ pegasus-status -l /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002
blackdiamond-0.dag FAILED (status 1)
11/25 01:25:32  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/25 01:25:32   ===     ===      ===     ===     ===        ===      ===
11/25 01:25:32     1       0        0       0       0          6        1

WORKFLOW STATUS : FAILED | 1/8 ( 12% ) | (rescue needs to be submitted)


We will now run pegasus-analyzer on the submit directory of the failed workflow to see which job failed.

$ pegasus-analyzer  -i $HOME/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002
pegasus-analyzer: initializing...

************************************Summary*************************************

 Total jobs         :      8 (100.00%)
 # jobs succeeded   :      1 (12.50%)
 # jobs failed      :      1 (12.50%)
 # jobs unsubmitted :      6 (75.00%)

******************************Failed jobs' details******************************

=============================stage_in_local_local_0=============================

 last state: POST_SCRIPT_FAILURE
       site: local
submit file: /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002/stage_in_local_local_0.sub
output file: /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002/stage_in_local_local_0.out.002
 error file: /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002/stage_in_local_local_0.err.002

**************************************Done**************************************

pegasus-analyzer: end of status report

[pegasus@pegasus pegasus-wms]$ pegasus-analyzer  -i /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002
pegasus-analyzer: initializing...

************************************Summary*************************************

 Total jobs         :      8 (100.00%)
 # jobs succeeded   :      1 (12.50%)
 # jobs failed      :      1 (12.50%)
 # jobs unsubmitted :      6 (75.00%)

******************************Failed jobs' details******************************

=============================stage_in_local_local_0=============================

 last state: POST_SCRIPT_FAILURE
       site: local
submit file: /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002/stage_in_local_local_0.sub
output file: /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002/stage_in_local_local_0.out.002
 error file: /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002/stage_in_local_local_0.err.002

-------------------------------Task #1 - Summary--------------------------------

site        : local
hostname    : pegasus
executable  : /opt/pegasus/default/bin/pegasus-transfer
arguments   : 
exitcode    : 1
working dir : /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002

--Task #1 - pegasus::pegasus-transfer - pegasus::pegasus-transfer:1.0 - stdout--

2010-11-25 01:25:22,320    INFO:  Reading URL pairs from stdin
2010-11-25 01:25:22,321    INFO:  PATH=/usr/local/globus/default/bin:/opt/pegasus/default/bin:/usr/bin:/bin
2010-11-25 01:25:22,321    INFO:  LD_LIBRARY_PATH=/usr/local/globus/default/lib:/usr/java/jdk1.6.0_20/jre/lib/amd64
2010-11-25 01:25:22,321    INFO:  Executing cp commands
/bin/cp: cannot stat `/scratch/tutorial/inputdata/diamond/f.a': No such file or directory
2010-11-25 01:25:22,331 CRITICAL:  Command '/bin/cp -L "/scratch/tutorial/inputdata/diamond/f.a"
                 "/home/tutorial/local-scratch/exec/pegasus/pegasus/blackdiamond/run0002/f.a"' failed with error code 1

**************************************Done**************************************

pegasus-analyzer: end of status report

[pegasus@pegasus pegasus-wms]$ 

The above tells us that the stage-in job for the workflow failed, and points us to the stdout of the job. By default, all jobs in Pegasus are launched via kickstart, which captures runtime provenance of the job and helps in debugging. Hence, the stdout of the job is the kickstart stdout, which is in XML.

The kickstart record includes, among other things:

  1. the duration of the job

  2. the start time for the job

  3. the node on which the job ran

  4. the stdout/stderr of the job

  5. the arguments with which it launched the job

  6. the environment that was set for the job before it was launched

  7. the machine information about the node that the job ran on

Of the above, the dagman.out file also gives the job duration and start time, but only as a coarser-grained estimate.

1.3.3. Kickstart and Condor DAGMan format and log files

This section explains how to read kickstart output and DAGMan Condor log files.

1.3.3.1. Kickstart

Kickstart is a lightweight C executable that is shipped with the Pegasus worker package. All jobs are launched via Kickstart on the remote end, unless explicitly disabled at the time of running pegasus-plan.

Kickstart does not work with

  1. Condor Standard Universe Jobs

  2. MPI jobs

Pegasus automatically disables kickstart for the above jobs.

Kickstart captures useful runtime provenance information about the job it launches on the remote node, and puts it in an XML record that it writes to its stdout. The stdout appears in the workflow submit directory as <job>.out.00n . Some of the useful information that kickstart captures and logs is as follows

  1. the exitcode with which the job it launched exited

  2. the duration of the job

  3. the start time for the job

  4. the node on which the job ran

  5. the directory in which the job ran

  6. the stdout/stderr of the job

  7. the arguments with which it launched the job

  8. the environment that was set for the job before it was launched.

  9. the machine information about the node that the job ran on

1.3.3.1.1. Reading a Kickstart Output File

Let us look at the stdout of our failed job.

$ cat /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002/stage_in_local_local_0.out.002 

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <invocation xmlns="http://pegasus.isi.edu/schema/invocation" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://pegasus.isi.edu/schema/invocation http://pegasus.isi.edu/schema/iv-2.1.xsd" version="2.1"
  start="2010-11-29T19:10:23.862-08:00" duration="0.076" transformation="pegasus::pegasus-transfer" 
  derivation="pegasus::pegasus-transfer:1.0" resource="local" wf-label="blackdiamond" 
  wf-stamp="2010-11-29T18:57:59-08:00" interface="eth0" hostaddr="10.0.2.15" hostname="pegasus-vm.local" 
  pid="5428" uid="501" user="pegasus" gid="501" group="pegasus" umask="0022">
 
 <mainjob start="2010-11-29T19:10:23.876-08:00" duration="0.063" pid="5429">
    <usage utime="0.040" stime="0.023" minflt="2758" majflt="0" nswap="0" nsignals="0" nvcsw="5" nivcsw="20"/>
    <status raw="256"><regular exitcode="1"/></status>
    <statcall error="0">
      <file name="/opt/pegasus/default/bin/pegasus-transfer">23212F7573722F62696E2F656E762070</file>
      <statinfo mode="0100775" size="25314" inode="2022205" nlink="1" blksize="4096" blocks="64" 
               mtime="2010-11-23T13:14:52-08:00" 
             atime="2010-11-29T19:10:07-08:00" ctime="2010-11-25T00:01:52-08:00" uid="501" user="pegasus" 
               gid="501" group="pegasus"/>
    </statcall>
    <argument-vector/>
  </mainjob>
  <cwd>/home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0002</cwd>
  <usage utime="0.002" stime="0.013" minflt="475" majflt="0" nswap="0" nsignals="0" nvcsw="1" nivcsw="5"/>
  

   <machine page-size="4096">
    <stamp>2010-12-23T10:56:43.817-08:00</stamp>
    <uname system="linux" nodename="pegasus-vm" release="2.6.32-5-686" machine="i686">
      #1 SMP Fri Dec 10 16:12:40 UTC 2010</uname>
   <linux>
    <ram total="527044608" free="242290688" shared="0" buffer="41041920"/>
    <swap total="417325056" free="417325056"/>
    <boot idle="1597.500">2010-12-23T10:29:16.599-08:00</boot>
    <cpu count="1" speed="2797" vendor="GenuineIntel">Intel(R) Xeon(R) CPU E5462 @ 2.80GHz</cpu>
    <load min1="0.05" min5="0.02" min15="0.00"/>
    <proc total="88" running="1" sleeping="87" vmsize="344793088" rss="123768832"/>
    <task total="101" running="1" sleeping="100"/>
   </linux>
  </machine>
 

  <statcall error="0" id="stdin">
    <descriptor number="0"/>
    <statinfo mode="0100664" size="142" inode="2250032" nlink="1" blksize="4096" blocks="16" 
     mtime="2010-11-29T19:09:20-08:00"   atime="2010-11-29T19:10:07-08:00" ctime="2010-11-29T19:09:20-08:00" 
     uid="501" user="pegasus" gid="501" group="pegasus"/>
  </statcall>

  <statcall error="0" id="stdout">
    <temporary name="/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427/gs.out.awOX6p" descriptor="3"/>
    <statinfo mode="0100600" size="762" inode="2054511" nlink="1" blksize="4096" blocks="16" 
          mtime="2010-11-29T19:10:23-08:00" atime="2010-11-29T19:10:23-08:00" ctime="2010-11-29T19:10:23-08:00" 
             uid="501" user="pegasus" gid="501" group="pegasus"/>
    <data>2010-11-29 19:10:23,920    INFO:  Reading URL pairs from stdin
2010-11-29 19:10:23,921    INFO:  PATH=/usr/local/globus/default/bin:/opt/pegasus/default/bin:/usr/bin:/bin
2010-11-29 19:10:23,921    INFO:  LD_LIBRARY_PATH=/usr/local/globus/default/lib:/usr/java/jdk1.6.0_20/jre/lib/amd64/
2010-11-29 19:10:23,921    INFO:  Executing cp commands
/bin/cp: cannot stat `/scratch/tutorial/inputdata/diamond/f.a&apos;: No such file or directory
2010-11-29 19:10:23,932 CRITICAL:  Command &apos;/bin/cp -L &quot;/scratch/tutorial/inputdata/diamond/f.a&quot; 
    &quot;/home/tutorial/local-scratch/exec/pegasus/pegasus/blackdiamond/run0002/f.a&quot;&apos; failed with error code 1
</data>
  </statcall>

  <statcall error="0" id="stderr">
    <temporary name="/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427/gs.err.oz9MOG" descriptor="4"/>
    <statinfo mode="0100600" size="0" inode="2054512" nlink="1" blksize="4096" blocks="8" 
    mtime="2010-11-29T19:10:23-08:00"  atime="2010-11-29T19:10:23-08:00" ctime="2010-11-29T19:10:23-08:00" 
    uid="501" user="pegasus" gid="501" group="pegasus"/>
  </statcall>

  <statcall error="2" id="gridstart">
    <!-- ignore above error -->
    <file name="condor_exec.exe"/>
  </statcall>
  <statcall error="0" id="logfile">
    <descriptor number="1"/>
    <statinfo mode="0100644" size="0" inode="2250072" nlink="1" blksize="4096" blocks="8" mtime="2010-11-29T19:10:23-08:00" 
    atime="2010-11-29T19:10:23-08:00" ctime="2010-11-29T19:10:23-08:00" uid="501" user="pegasus" gid="501" group="pegasus"/>
  </statcall>
  <statcall error="0" id="channel">
    <fifo name="/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427/gs.app.qCOCwX" descriptor="5" count="0"
     rsize="0" wsize="0"/>
    <statinfo mode="010640" size="0" inode="2054524" nlink="1" blksize="4096" blocks="8" mtime="2010-11-29T19:10:23-08:00" 
     atime="2010-11-29T19:10:23-08:00" ctime="2010-11-29T19:10:23-08:00" uid="501" user="pegasus" gid="501" 
    group="pegasus"/>
  </statcall>
  <environment>
    <env key="GLOBUS_LOCATION">/usr/local/globus/default</env>
    <env key="GRIDSTART_CHANNEL">/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427/gs.app.qCOCwX</env>
    <env key="JAVA_HOME">/usr</env>
    <env key="LD_LIBRARY_PATH">/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/java/jdk1.6.0_20/jre/lib/amd64:</env>
    <env key="PEGASUS_HOME">/opt/pegasus/default</env>
    <env key="TEMP">/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427</env>
    <env key="TMP">/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427</env>
    <env key="TMPDIR">/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427</env>
    <env key="_CONDOR_ANCESTOR_4843">4862:1291085504:2790807554</env>
    <env key="_CONDOR_ANCESTOR_4862">5427:1291086623:1798288782</env>
    <env key="_CONDOR_ANCESTOR_5427">5428:1291086623:2750667008</env>
    <env key="_CONDOR_HIGHPORT">41000</env>
    <env key="_CONDOR_JOB_AD">/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427/.job.ad</env>
    <env key="_CONDOR_LOWPORT">40000</env>
    <env key="_CONDOR_MACHINE_AD">/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427/.machine.ad</env>
    <env key="_CONDOR_SCRATCH_DIR">/opt/condor/local.pegasus/spool/local_univ_execute/dir_5427</env>
    <env key="_CONDOR_SLOT">1</env>
  </environment>
  <resource>
    <soft id="RLIMIT_CPU">unlimited</soft>
    <hard id="RLIMIT_CPU">unlimited</hard>
    <soft id="RLIMIT_FSIZE">unlimited</soft>
    <hard id="RLIMIT_FSIZE">unlimited</hard>
    <soft id="RLIMIT_DATA">unlimited</soft>
    <hard id="RLIMIT_DATA">unlimited</hard>
    <soft id="RLIMIT_STACK">unlimited</soft>
    <hard id="RLIMIT_STACK">unlimited</hard>
    <soft id="RLIMIT_CORE">0</soft>
    <hard id="RLIMIT_CORE">0</hard>
    <soft id="RESOURCE_5">unlimited</soft>
    <hard id="RESOURCE_5">unlimited</hard>
    <soft id="RLIMIT_NPROC">unlimited</soft>
    <hard id="RLIMIT_NPROC">unlimited</hard>
    <soft id="RLIMIT_NOFILE">1024</soft>
    <hard id="RLIMIT_NOFILE">1024</hard>
    <soft id="RLIMIT_MEMLOCK">32768</soft>
    <hard id="RLIMIT_MEMLOCK">32768</hard>
    <soft id="RLIMIT_AS">unlimited</soft>
    <hard id="RLIMIT_AS">unlimited</hard>
    <soft id="RLIMIT_LOCKS">unlimited</soft>
    <hard id="RLIMIT_LOCKS">unlimited</hard>
    <soft id="RLIMIT_SIGPENDING">8192</soft>
    <hard id="RLIMIT_SIGPENDING">8192</hard>
    <soft id="RLIMIT_MSGQUEUE">819200</soft>
    <hard id="RLIMIT_MSGQUEUE">819200</hard>
    <soft id="RLIMIT_NICE">0</soft>
    <hard id="RLIMIT_NICE">0</hard>
    <soft id="RLIMIT_RTPRIO">0</soft>
    <hard id="RLIMIT_RTPRIO">0</hard>
  </resource>
</invocation>
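
Reading such a record by eye is tedious, so the fields of interest can be pulled out with a short script. The following is a minimal Python sketch, not a Pegasus tool: the function name summarize_invocation is hypothetical, and it assumes one invocation record per file, with the element and attribute names taken from the record shown above.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Namespace copied from the xmlns attribute of the <invocation> element above.
NS = {"inv": "http://pegasus.isi.edu/schema/invocation"}


def summarize_invocation(root: ET.Element) -> Dict[str, Any]:
    """Extract a few debugging fields from a kickstart <invocation> element."""
    regular = root.find("inv:mainjob/inv:status/inv:regular", NS)
    stdout = root.find("inv:statcall[@id='stdout']/inv:data", NS)
    duration = root.get("duration")
    return {
        "transformation": root.get("transformation"),
        "hostname": root.get("hostname"),
        "start": root.get("start"),
        "duration": float(duration) if duration is not None else None,
        # <regular exitcode="..."/> is only present for a normal exit.
        "exitcode": int(regular.get("exitcode")) if regular is not None else None,
        "stdout": (stdout.text or "") if stdout is not None else "",
    }


# A trimmed-down version of the record shown above, for illustration only.
sample = (
    '<invocation xmlns="http://pegasus.isi.edu/schema/invocation" '
    'start="2010-11-29T19:10:23.862-08:00" duration="0.076" '
    'transformation="pegasus::pegasus-transfer" hostname="pegasus-vm.local">'
    '<mainjob start="2010-11-29T19:10:23.876-08:00" duration="0.063" pid="5429">'
    '<status raw="256"><regular exitcode="1"/></status>'
    "</mainjob>"
    '<statcall error="0" id="stdout">'
    "<data>/bin/cp: cannot stat: No such file or directory</data>"
    "</statcall>"
    "</invocation>"
)

info = summarize_invocation(ET.fromstring(sample))
print(info["transformation"], info["exitcode"], info["duration"])
print(info["stdout"])
```

For a real kickstart output file, obtain the root element with ET.parse(path).getroot() and pass it to the function. The XML parser decodes entities such as &apos; and &quot;, so the captured stdout comes back as plain text.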

1.3.3.2. Condor DAGMan format and log files etc.

In this exercise we will learn about the DAG file format and some of the log files generated when the DAG runs.

  • Now take a look at the DAG file...

    $ cat $HOME/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/blackdiamond-0.dag
    
    
    ######################################################################
    # PEGASUS WMS GENERATED DAG FILE
    # DAG blackdiamond
    # Index = 0, Count = 1
    ######################################################################
    MAXJOBS projection 2
    
    JOB create_dir_blackdiamond_0_local create_dir_blackdiamond_0_local.sub
    SCRIPT POST create_dir_blackdiamond_0_local /opt/pegasus/default/bin/pegasus-exitcode 
      /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/create_dir_blackdiamond_0_local.out
    RETRY create_dir_blackdiamond_0_local 2
    
    JOB stage_in_local_local_0 stage_in_local_local_0.sub
    SCRIPT POST stage_in_local_local_0 /opt/pegasus/default/bin/pegasus-exitcode  
     /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/stage_in_local_local_0.out
    RETRY stage_in_local_local_0 2
    
    JOB preprocess_j1 preprocess_j1.sub
    SCRIPT POST preprocess_j1 /opt/pegasus/default/bin/pegasus-exitcode   
    /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/preprocess_j1.out
    RETRY preprocess_j1 2
    
    JOB findrange_j2 findrange_j2.sub
    SCRIPT POST findrange_j2 /opt/pegasus/default/bin/pegasus-exitcode   
    /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/findrange_j2.out
    RETRY findrange_j2 2
    
    JOB findrange_j3 findrange_j3.sub
    SCRIPT POST findrange_j3 /opt/pegasus/default/bin/pegasus-exitcode   
    /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/findrange_j3.out
    RETRY findrange_j3 2
    
    JOB analyze_j4 analyze_j4.sub
    SCRIPT POST analyze_j4 /opt/pegasus/default/bin/pegasus-exitcode   
    /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/analyze_j4.out
    RETRY analyze_j4 2
    
    JOB stage_out_local_local_2_0 stage_out_local_local_2_0.sub
    SCRIPT POST stage_out_local_local_2_0 /opt/pegasus/default/bin/pegasus-exitcode   
    /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/stage_out_local_local_2_0.out
    RETRY stage_out_local_local_2_0 2
    
    JOB register_local_2_0 register_local_2_0.sub
    SCRIPT POST register_local_2_0 /opt/pegasus/default/bin/pegasus-exitcode   
    /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/register_local_2_0.out
    RETRY register_local_2_0 2
    
    PARENT findrange_j2 CHILD analyze_j4
    PARENT preprocess_j1 CHILD findrange_j2
    PARENT preprocess_j1 CHILD findrange_j3
    PARENT findrange_j3 CHILD analyze_j4
    PARENT analyze_j4 CHILD stage_out_local_local_2_0
    PARENT stage_in_local_local_0 CHILD preprocess_j1
    PARENT stage_out_local_local_2_0 CHILD register_local_2_0
    PARENT create_dir_blackdiamond_0_local CHILD analyze_j4
    PARENT create_dir_blackdiamond_0_local CHILD findrange_j2
    PARENT create_dir_blackdiamond_0_local CHILD preprocess_j1
    PARENT create_dir_blackdiamond_0_local CHILD findrange_j3
    PARENT create_dir_blackdiamond_0_local CHILD stage_in_local_local_0
    ######################################################################
    # End of DAG
    ##################################################################
  • ... and the dagman.out file.

    $ cat $HOME/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/blackdiamond-0.dag.dagman.out 
    
    11/25 01:10:47 ******************************************************
    11/25 01:10:47 ** condor_scheduniv_exec.339.0 (CONDOR_DAGMAN) STARTING UP
    11/25 01:10:47 ** /opt/condor/7.4.2/bin/condor_dagman
    11/25 01:10:47 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1)
    11/25 01:10:47 ** Configuration: subsystem:DAGMAN local:<NONE> class:DAEMON
    11/25 01:10:47 ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
    11/25 01:10:47 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
    11/25 01:10:47 ** PID = 7844
    11/25 01:10:47 ** Log last touched time unavailable (No such file or directory)
    11/25 01:10:47 ******************************************************
    11/25 01:10:47 Using config source: /opt/condor/config/condor_config
    11/25 01:10:47 Using local config sources: 
    11/25 01:10:47    /opt/condor/config/condor_config.local
    11/25 01:10:47 DaemonCore: Command Socket at <172.16.80.129:40035>
    11/25 01:10:47 DAGMAN_DEBUG_CACHE_SIZE setting: 5242880
    11/25 01:10:47 DAGMAN_DEBUG_CACHE_ENABLE setting: False
    11/25 01:10:47 DAGMAN_SUBMIT_DELAY setting: 0
    11/25 01:10:47 DAGMAN_MAX_SUBMIT_ATTEMPTS setting: 6
    11/25 01:10:47 DAGMAN_STARTUP_CYCLE_DETECT setting: 0
    11/25 01:10:47 DAGMAN_MAX_SUBMITS_PER_INTERVAL setting: 5
    11/25 01:10:47 DAGMAN_USER_LOG_SCAN_INTERVAL setting: 5
    11/25 01:10:47 allow_events (DAGMAN_IGNORE_DUPLICATE_JOB_EXECUTION, DAGMAN_ALLOW_EVENTS) setting: 114
    11/25 01:10:47 DAGMAN_RETRY_SUBMIT_FIRST setting: 1
    11/25 01:10:47 DAGMAN_RETRY_NODE_FIRST setting: 0
    11/25 01:10:47 DAGMAN_MAX_JOBS_IDLE setting: 0
    11/25 01:10:47 DAGMAN_MAX_JOBS_SUBMITTED setting: 0
    11/25 01:10:47 DAGMAN_MUNGE_NODE_NAMES setting: 1
    11/25 01:10:47 DAGMAN_PROHIBIT_MULTI_JOBS setting: 0
    11/25 01:10:47 DAGMAN_SUBMIT_DEPTH_FIRST setting: 0
    11/25 01:10:47 DAGMAN_ABORT_DUPLICATES setting: 1
    11/25 01:10:47 DAGMAN_ABORT_ON_SCARY_SUBMIT setting: 1
    11/25 01:10:47 DAGMAN_PENDING_REPORT_INTERVAL setting: 600
    11/25 01:10:47 DAGMAN_AUTO_RESCUE setting: 1
    11/25 01:10:47 DAGMAN_MAX_RESCUE_NUM setting: 100
    11/25 01:10:47 DAGMAN_DEFAULT_NODE_LOG setting: null
    11/25 01:10:47 ALL_DEBUG setting: 
    11/25 01:10:47 DAGMAN_DEBUG setting: 
    ....
    11/25 01:10:47 Default node log file is:
     </home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/blackdiamond-0.dag.nodes.log>
    11/25 01:10:47 DAG Lockfile will be written to blackdiamond-0.dag.lock
    11/25 01:10:47 DAG Input file is blackdiamond-0.dag
    11/25 01:10:47 Parsing 1 dagfiles
    11/25 01:10:47 Parsing blackdiamond-0.dag ...
    11/25 01:10:47 Dag contains 8 total jobs
    11/25 01:10:47 Sleeping for 12 seconds to ensure ProcessId uniqueness
    11/25 01:10:59 Bootstrapping...
    11/25 01:10:59 Number of pre-completed nodes: 0
    11/25 01:10:59 Registering condor_event_timer...
    11/25 01:11:00 Sleeping for one second for log file consistency
    11/25 01:11:01 Submitting Condor Node create_dir_blackdiamond_0_local job(s)...
    11/25 01:11:01 submitting: condor_submit -a dag_node_name' '=' 'create_dir_blackdiamond_0_local -a 
    +DAGManJobId' '=' '339 -a DAGManJobId' '=' '339 -a submit_event_notes' '=' 'DAG' 'Node:' '
    create_dir_blackdiamond_0_local -a +DAGParentNodeNames' '=' '"" create_dir_blackdiamond_0_local.sub
    11/25 01:11:01 From submit: Submitting job(s).
    11/25 01:11:01 From submit: Logging submit event(s).
    11/25 01:11:01 From submit: 1 job(s) submitted to cluster 340.
    11/25 01:11:01  assigned Condor ID (340.0)
    11/25 01:11:01 Just submitted 1 job this cycle...
    11/25 01:11:01 Currently monitoring 1 Condor log file(s)
    11/25 01:11:01 Event: ULOG_SUBMIT for Condor Node create_dir_blackdiamond_0_local (340.0)
    11/25 01:11:01 Number of idle job procs: 1
    11/25 01:11:01 Of 8 nodes total:
    11/25 01:11:01  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
    11/25 01:11:01   ===     ===      ===     ===     ===        ===      ===
    11/25 01:11:01     0       0        1       0       0          7        0
    ....
    11/25 01:11:06 Currently monitoring 1 Condor log file(s)
    11/25 01:11:06 Event: ULOG_EXECUTE for Condor Node create_dir_blackdiamond_0_local (340.0)
    11/25 01:11:06 Number of idle job procs: 0
    11/25 01:11:06 Event: ULOG_JOB_TERMINATED for Condor Node create_dir_blackdiamond_0_local (340.0)
    11/25 01:11:06 Node create_dir_blackdiamond_0_local job proc (340.0) completed successfully.
    11/25 01:11:06 Node create_dir_blackdiamond_0_local job completed
    11/25 01:11:06 Running POST script of Node create_dir_blackdiamond_0_local...
    11/25 01:11:06 Number of idle job procs: 0
    11/25 01:11:06 Of 8 nodes total:
    11/25 01:11:06  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
    11/25 01:11:06   ===     ===      ===     ===     ===        ===      ===
    11/25 01:11:06     0       0        0       1       0          7        0
    11/25 01:11:11 Currently monitoring 1 Condor log file(s)
    11/25 01:11:11 Event: ULOG_POST_SCRIPT_TERMINATED for Condor Node create_dir_blackdiamond_0_local (340.0)
    11/25 01:11:11 POST Script of Node create_dir_blackdiamond_0_local completed successfully.
    11/25 01:11:11 Of 8 nodes total:
    11/25 01:11:11  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
    11/25 01:11:11   ===     ===      ===     ===     ===        ===      ===
    11/25 01:11:11     1       0        0       0       1          6        0
    ....
    11/25 01:15:52 Event: ULOG_POST_SCRIPT_TERMINATED for Condor Node register_local_2_0 (347.0)
    11/25 01:15:52 POST Script of Node register_local_2_0 completed successfully.
    11/25 01:15:52 Of 8 nodes total:
    11/25 01:15:52  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
    11/25 01:15:52   ===     ===      ===     ===     ===        ===      ===
    11/25 01:15:52     8       0        0       0       0          0        0
    11/25 01:15:52 All jobs Completed!
    11/25 01:15:52 Note: 0 total job deferrals because of -MaxJobs limit (0)
    11/25 01:15:52 Note: 0 total job deferrals because of -MaxIdle limit (0)
    11/25 01:15:52 Note: 0 total job deferrals because of node category throttles
    11/25 01:15:52 Note: 0 total PRE script deferrals because of -MaxPre limit (20)
    11/25 01:15:52 Note: 0 total POST script deferrals because of -MaxPost limit (20)
    11/25 01:15:52 **** condor_scheduniv_exec.339.0 (condor_DAGMAN) pid 7844 EXITING WITH STATUS 0
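The node status tables that DAGMan writes periodically to dagman.out can also be read programmatically. The following is a minimal Python sketch, assuming the table layout shown in the excerpt above (a header row, a separator row, then one row of seven counts). The helper name latest_status is hypothetical and is not part of Pegasus or Condor.

```python
import re

# Column names as they appear in the dagman.out status table header.
COLUMNS = ["Done", "Pre", "Queued", "Post", "Ready", "Un-Ready", "Failed"]

# A counts row: date, time, then exactly seven integers.
ROW = re.compile(r"^\s*\d+/\d+ \d+:\d+:\d+\s+((?:\d+\s+){6}\d+)\s*$")


def latest_status(log_text):
    """Return the last node status table in the log as a dict, or None."""
    status = None
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            counts = [int(n) for n in match.group(1).split()]
            status = dict(zip(COLUMNS, counts))
    return status


sample = """\
11/25 01:11:11 Of 8 nodes total:
11/25 01:11:11  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/25 01:11:11   ===     ===      ===     ===     ===        ===      ===
11/25 01:11:11     1       0        0       0       1          6        0
11/25 01:15:52     8       0        0       0       0          0        0
"""
print(latest_status(sample))
```

A workflow has finished cleanly when Done equals the total node count and Failed is 0.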

1.3.4. Removing a running workflow

Sometimes you may want to halt the execution of a workflow or remove it permanently. You can stop a workflow by running the pegasus-remove command mentioned in the output of pegasus-run.

$ pegasus-remove $HOME/pegasus-wms/dags/tutorial/pegasus/blackdiamond/runXXXX

Job 2788.0 marked for removal
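Because the run directory name ends in a sequence number (runXXXX), it can help to look up the most recent run directory before calling pegasus-remove. The following is a minimal Python sketch, assuming run directories are named "run" followed by digits as in this tutorial. The helper latest_run_dir is hypothetical, not a Pegasus tool.

```python
import os
import re
import tempfile


def latest_run_dir(workflow_dir):
    """Return the path of the highest-numbered runNNNN directory, or None."""
    runs = []
    for name in os.listdir(workflow_dir):
        match = re.fullmatch(r"run(\d+)", name)
        if match and os.path.isdir(os.path.join(workflow_dir, name)):
            runs.append((int(match.group(1)), name))
    if not runs:
        return None
    return os.path.join(workflow_dir, max(runs)[1])


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as base:
    for name in ("run0001", "run0002", "run0010", "notes"):
        os.mkdir(os.path.join(base, name))
    print(os.path.basename(latest_run_dir(base)))
```

The returned path can then be passed as the argument to pegasus-remove.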

1.3.5. Generating statistics and plots of a workflow run

In this section, we will use pegasus-statistics and pegasus-plots to generate statistics and plots for the diamond workflow we ran earlier.

1.3.5.1. Generating Statistics Using pegasus-statistics

pegasus-statistics generates workflow execution statistics. To generate statistics, run the command as shown below.

$ cd $HOME/pegasus-wms
$ pegasus-statistics dags/tutorial/pegasus/blackdiamond/run0001/



******************************************** SUMMARY ********************************************
#Legends
#Workflow runtime (min,sec) - the waltime from the start of the workflow execution to the end as 
                              reported by the DAGMAN.In case of rescue dag the value is the cumulative 
                              of all retries.
#Cumulative workflow runtime (min,sec) - the sum of the walltime of all jobs as reported by the DAGMan .
                                         In case of job retries the value is the cumulative of all retries.

Job summary
  Total - the total number of jobs in the workflow. The total number of jobs is calculated by parsing 
          the .dag file.  For workflows having SUBDAX jobs , the SUDBAX job is skipped , but the 
          calculation takes into account all the jobs that make up the SUBDAX sub workflow. 
          For workflows having SUBDAG jobs , the SUBDAG jobs are treated like regular jobs.
  Succeeded - the total number of succeeded jobs in the workflow . 
  Failed - the total number of failed jobs in the workflow .
  Unsubmitted - the total number of unsubmitted jobs in the workflow .
  Unknown - the total number of jobs that are submitted, but has not completed execution or the state
            is unknown in the workflow.

SUBDAX summary
  Total - the total number of SUBDAX jobs in the workflow 
  Succeeded - the total number of succeeded SUBDAX jobs in the workflow.
  Failed - the total number of failed SUBDAX jobs in the workflow.
  Unsubmitted - the total number of unsubmitted SUBDAX jobs in the workflow.
  Unknown - the total number of SUBDAX jobs that are submitted, but has not completed execution or
            the state is unknown in the workflow.

Workflow runtime                   :            5 min.  5 sec.
Cumulative workflow runtime        :            4 min.  0 sec.

          Total        Succeeded    Failed       Unsubmitted  Unknown        
Jobs      8            8            0            0            0              
SUBDAX    0            0            0            0            0              

Workflow execution statistics :
/home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/statistics/workflow.txt

Job statistics : 
/home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/statistics/jobs.txt

Workflow events with time starting from zero : 
/home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/statistics/jobstate.txt

Logical transformation statistics :
/home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/statistics/breakdown.txt
**************************************************************************************************

 

Workflow statistics table

The workflow statistics table contains information about the workflow run, such as the total execution time and the number of failed jobs.

Table 1.1. Workflow Statistics

Workflow runtime               5 min. 5 sec.
Cumulative workflow runtime    4 min. 0 sec.
Total jobs                     8
# jobs succeeded               8
# jobs failed                  0
# jobs unsubmitted             0
# jobs unknown                 0
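To compare the two runtimes numerically, the "min. / sec." strings printed by pegasus-statistics can be converted to seconds. The following is a minimal Python sketch, assuming the duration format shown in the summary above. The helper to_seconds is hypothetical.

```python
import re


def to_seconds(duration):
    """Convert a string such as '5 min.  5 sec.' to a number of seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*min\.\s*(\d+)\s*sec\.\s*", duration)
    if match is None:
        raise ValueError("unrecognized duration: %r" % duration)
    minutes, seconds = (int(part) for part in match.groups())
    return 60 * minutes + seconds


wall = to_seconds("5 min.  5 sec.")        # workflow runtime
cumulative = to_seconds("4 min.  0 sec.")  # cumulative workflow runtime
print(wall, cumulative)
```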

Job statistics table

The job statistics table contains the following details about the jobs in the workflow. A sample table is shown below.

  • Job - the name of the job

  • Site - the site where the job ran

  • Kickstart(sec.) - the actual duration of the job in seconds on the remote compute node. In case of retries, the value is cumulative across all retries.

  • Post(sec.) - the postscript time as reported by DAGMan. In case of retries, the value is cumulative across all retries.

  • DAGMan(sec.) - the time between the completion of a job's last parent job and the submission of the job. In case of retries, the value of the last retry is used.

  • CondorQTime(sec.) - the time between submission by DAGMan and the remote Grid submission. It is an estimate of the time spent in the Condor queue on the submit node. In case of retries, the value is cumulative across all retries.

  • Resource(sec.) - the time between the remote Grid submission and the start of remote execution. It is an estimate of the time the job spent in the remote queue. In case of retries, the value is cumulative across all retries.

  • Runtime(sec.) - the time spent on the resource as seen by Condor DAGMan. It is always >= Kickstart. In case of retries, the value is cumulative across all retries.

  • Seqexec(sec.) - the time taken for the completion of a clustered job. In case of retries, the value is cumulative across all retries.

  • Seqexec-Delay(sec.) - the difference between the completion time of a clustered job and the sum of the kickstart times of all its individual tasks. In case of retries, the value is cumulative across all retries.

Table 1.2. Job Statistics

Job                              Site   Kickstart  Post  DAGMan  CondorQTime  Resource  Runtime  CondorQLen  Seqexec  Seqexec-Delay
analyze_j4                       local  60.03      6.00  6.00    0.00         0.00      60.00    0           -        -
create_dir_blackdiamond_0_local  local  0.04       5.00  14.00   0.00         0.00      0.06     0           -        -
findrange_j2                     local  60.03      5.00  6.00    0.00         0.00      65.00    0           -        -
findrange_j3                     local  60.03      5.00  6.00    0.00         0.00      60.00    0           -        -
preprocess_j1                    local  60.03      5.00  6.00    0.00         0.00      60.00    0           -        -
register_local_2_0               local  0.50       5.00  6.00    0.00         0.00      0.05     0           -        -
stage_in_local_local_0           local  0.08       6.00  6.00    0.00         0.00      0.04     0           -        -
stage_out_local_local_2_0        local  0.08       5.00  6.00    0.00         0.00      0.03     0           -        -
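The job statistics are easy to post-process, for example to see how much time each job spent outside the job itself. The following is a minimal Python sketch, assuming whitespace-separated rows in the column order of Table 1.2. The actual layout of jobs.txt may differ, and the helper parse_rows is hypothetical.

```python
COLUMNS = ["Job", "Site", "Kickstart", "Post", "DAGMan", "CondorQTime",
           "Resource", "Runtime", "CondorQLen", "Seqexec", "Seqexec-Delay"]


def parse_rows(text):
    """Parse whitespace-separated job rows into dicts; '-' becomes None."""
    rows = []
    for line in text.strip().splitlines():
        row = dict(zip(COLUMNS, line.split()))
        for key in COLUMNS[2:]:
            row[key] = None if row[key] == "-" else float(row[key])
        rows.append(row)
    return rows


sample = """
analyze_j4 local 60.03 6.00 6.00 0.00 0.00 60.00 0 - -
create_dir_blackdiamond_0_local local 0.04 5.00 14.00 0.00 0.00 0.06 0 - -
"""

for row in parse_rows(sample):
    # Post-script time plus DAGMan delay: time spent outside the job itself.
    overhead = row["Post"] + row["DAGMan"]
    print(row["Job"], overhead)
```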

Logical transformation statistics table

The logical transformation statistics table contains information about each type of transformation in the workflow.

Table 1.3. Logical Transformation Statistics

Transformation           Count  Mean     Variance  Min      Max      Total
diamond::analyze:2.0     1      60.1600  0.0000    60.1600  60.1600  60.1600
diamond::findrange:2.0   2      60.3100  0.0100    60.2500  60.3700  120.6200
diamond::preprocess:2.0  1      60.4800  0.0000    60.4800  60.4800  60.4800
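The Count, Mean, Min, Max and Total columns are related in the obvious way, which can be checked for the findrange transformation. The following is a minimal Python sketch, assuming the two findrange runtimes are the Min and Max values from the table. The Variance column is left out because the tutorial does not state how it is computed, and the helper summarize is hypothetical.

```python
import math


def summarize(durations):
    """Return count, mean, min, max and total for a list of runtimes."""
    total = math.fsum(durations)
    return {
        "count": len(durations),
        "mean": total / len(durations),
        "min": min(durations),
        "max": max(durations),
        "total": total,
    }


# The two findrange runtimes implied by the Min and Max columns of Table 1.3.
stats = summarize([60.25, 60.37])
print({key: round(value, 4) for key, value in stats.items()})
```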

1.3.5.2. Generating plots using pegasus-plots

pegasus-plots generates graphs and charts to visualize workflow execution. To generate graphs and charts run the command as shown below.

$ cd $HOME/pegasus-wms 
$ pegasus-plots dags/tutorial/pegasus/blackdiamond/run0001/

******  show-job *****  
Please wait, this may take a few minutes ...
****** Finished executing show-job  ***** 
******  show-run *****  
Please wait, this may take a few minutes ...
****** Finished executing show-run  ***** 
******  dag2dot  ***** 
Please wait, this may take a few minutes ...
****** Finished executing dag2dot ***** 
******  dot  ***** 
****** Finished executing dot2png ***** 
******  dax2dot  ***** 
Please wait, this may take a few minutes ...
****** Finished executing dax2dot ***** 
******  dot  ***** 
****** Finished executing dot2png ***** 



******************************************** SUMMARY ********************************************

DAX graph - 
png format : /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/diamond-dax.png 
eps format : /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/diamond-dax.eps  

DAG graph - 
png format : /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/blackdiamond-dag.png 
eps format : /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/blackdiamond-dag.eps 

Workflow execution Gantt chart -
png format : Failed to generate png format.Application 'convert' not available.
eps format : /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/blackdiamond-2.eps

Host over time chart -
png format : Failed to generate png format.Application 'convert' not available.
eps format : /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/blackdiamond-host.eps 
**************************************************************************************************
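The summary above reports that the PNG versions of two charts could not be generated because the 'convert' application was not available. Before running pegasus-plots you can check whether the external programs it calls are on your PATH. The following is a minimal Python sketch. The tool names are taken from the output above, and the helper missing_tools is hypothetical.

```python
import shutil


def missing_tools(names):
    """Return the subset of command names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# 'convert' is named in the summary above; 'dot' appears in the progress output.
print(missing_tools(["convert", "dot"]))
```

An empty list means that every listed program was found.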



1.3.5.2.1. Abstract Workflow / DAX Image

/home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/diamond-dax.png

Figure 1.1. Black Diamond DAX Image

1.3.5.2.2. Executable Workflow / DAG Image

/home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/blackdiamond-dag.png

Figure 1.2. Black Diamond DAG Image

1.3.5.2.3. Gantt Chart of Workflow Execution

/home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/blackdiamond-2.png

X axis - time in seconds. Each tick is 60 seconds.

Y axis - job number.

Figure 1.3. Gantt Chart of Workflow Execution

1.3.5.2.4. Host over time chart

/home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0001/graph/blackdiamond-host.png

X axis - time in seconds. Each tick is 60 seconds.

Y axis - job number.

Figure 1.4. Host Over Time Chart

1.4. Planning and Executing Workflow against a Remote Resource

In this exercise we are going to run pegasus-plan to generate an executable workflow from the abstract workflow (montage.dax). The generated executable workflow consists of condor submit files that are submitted to remote grid resources using pegasus-run.

The instructors have provided:

  • A dax (montage.dax) in the $HOME/pegasus-wms/dax/ directory.

You will need to write some things yourself, by following the instructions below:

  • Run pegasus-plan to generate the condor submit files out of the dax.

Instructions:

  • Let us run pegasus-plan on the montage dax on the cluster site. If multiple sites are available, you can provide them as a comma-separated list, for example tg_ncsa,viz.

    $ cd $HOME/pegasus-wms
    $ pegasus-plan -Dpegasus.schema.dax=/opt/pegasus/default/etc/dax-2.1.xsd \
                   --dir dags --sites cluster --output local --force \
                   --nocleanup --dax `pwd`/dax/montage.dax --submit -v

    The above command says that we need to plan the montage dax on the cluster site. The cluster site in the VM is managed by SGE, which runs inside the VM. The jobs for this workflow will be submitted to jobmanager-condor in the VM. The output data needs to be transferred back to the local host. The condor submit files are generated in a directory structure whose base is dags. We also request that no cleanup jobs be added, as we require the intermediate data on the remote host. Here is the output of pegasus-plan.

    
    2010.11.24 18:20:10.948 PST: [INFO] event.pegasus.parse.dax dax.id /home/tutorial/pegasus-wms/dax/montage.dax   
    2010.11.24 18:20:11.309 PST: [INFO] event.pegasus.parse.dax dax.id /home/tutorial/pegasus-wms/dax/montage.dax 
    2010.11.24 18:20:11.350 PST: [INFO] event.pegasus.refinement dax.id montage_0  - STARTED 
    2010.11.24 18:20:11.360 PST: [INFO] event.pegasus.siteselection dax.id montage_0  - STARTED 
    2010.11.24 18:20:11.416 PST: [INFO] event.pegasus.siteselection dax.id montage_0  - FINISHED 
    2010.11.24 18:20:11.504 PST: [INFO]  Grafting transfer nodes in the workflow 
    2010.11.24 18:20:11.505 PST: [INFO] event.pegasus.generate.transfer-nodes dax.id montage_0  - STARTED 
    2010.11.24 18:20:11.655 PST: [INFO] event.pegasus.generate.transfer-nodes dax.id montage_0  - FINISHED 
    2010.11.24 18:20:11.657 PST: [INFO] event.pegasus.generate.workdir-nodes dax.id montage_0  - STARTED 
    2010.11.24 18:20:11.660 PST: [INFO] event.pegasus.generate.workdir-nodes dax.id montage_0  - FINISHED 
    2010.11.24 18:20:11.660 PST: [INFO] event.pegasus.generate.cleanup-wf dax.id montage_0  - STARTED 
    2010.11.24 18:20:11.661 PST: [INFO] event.pegasus.generate.cleanup-wf dax.id montage_0  - FINISHED 
    2010.11.24 18:20:11.661 PST: [INFO] event.pegasus.refinement dax.id montage_0  - FINISHED 
    2010.11.24 18:20:11.715 PST: [INFO]  Generating codes for the concrete workflow 
    2010.11.24 18:20:12.406 PST: [INFO]  Generating codes for the concrete workflow -DONE 
    2010.11.24 18:20:12.406 PST: [INFO]  Generating code for the cleanup workflow 
    2010.11.24 18:20:12.528 PST: [INFO]  Generating code for the cleanup workflow -DONE 
    2010.11.24 18:20:12.672 PST:    
    2010.11.24 18:20:12.679 PST:   ----------------------------------------------------------------------- 
    2010.11.24 18:20:12.685 PST:   File for submitting this DAG to Condor           : montage-0.dag.condor.sub 
    2010.11.24 18:20:12.691 PST:   Log of DAGMan debugging messages                 : montage-0.dag.dagman.out 
    2010.11.24 18:20:12.704 PST:   Log of Condor library output                     : montage-0.dag.lib.out 
    2010.11.24 18:20:12.711 PST:   Log of Condor library error messages             : montage-0.dag.lib.err 
    2010.11.24 18:20:12.726 PST:   Log of the life of condor_dagman itself          : montage-0.dag.dagman.log 
    2010.11.24 18:20:12.731 PST:    
    2010.11.24 18:20:12.762 PST:   -no_submit given, not submitting DAG to Condor.  You can do this with: 
    2010.11.24 18:20:12.792 PST:   "condor_submit montage-0.dag.condor.sub" 
    2010.11.24 18:20:12.798 PST:   ----------------------------------------------------------------------- 
    2010.11.24 18:20:12.804 PST:   Submitting job(s). 
    2010.11.24 18:20:12.815 PST:   Logging submit event(s). 
    2010.11.24 18:20:12.821 PST:   1 job(s) submitted to cluster 275. 
    2010.11.24 18:20:13.504 PST:    
    2010.11.24 18:20:13.510 PST:   Your Workflow has been started and runs in base directory given below 
    2010.11.24 18:20:13.519 PST:    
    2010.11.24 18:20:13.530 PST:   cd /home/tutorial/pegasus-wms/dags/tutorial/pegasus/montage/run0001 
    2010.11.24 18:20:13.535 PST:    
    2010.11.24 18:20:13.542 PST:   *** To monitor the workflow you can run *** 
    2010.11.24 18:20:13.555 PST:    
    2010.11.24 18:20:13.562 PST:   pegasus-status -l /home/tutorial/pegasus-wms/dags/tutorial/pegasus/montage/run0001 
    2010.11.24 18:20:13.570 PST:    
    2010.11.24 18:20:13.578 PST:   *** To remove your workflow run *** 
    2010.11.24 18:20:13.585 PST:   pegasus-remove -d 275.0 
    2010.11.24 18:20:13.592 PST:   or 
    2010.11.24 18:20:13.604 PST:   pegasus-remove /home/tutorial/pegasus-wms/dags/tutorial/pegasus/montage/run0001 
    2010.11.24 18:20:13.610 PST:    
    2010.11.24 18:20:13.617 PST:   Time taken to execute is 3.76 seconds 
    2010.11.24 18:20:13.617 PST: [INFO] event.pegasus.planner planner.version 3.0.0  - FINISHED 
    
  • If you get any errors while running pegasus-plan, you can add -vvvvv to the pegasus-plan command to enable maximum verbosity.

The above command submits the workflow to Condor DAGMan/CondorG. After submission, it starts a monitoring daemon, pegasus-monitord, that parses the condor log files to update the status of the jobs and pushes it into a work database.

Monitor the workflow using the commands provided in the output above and the other commands explained earlier.

If the workflow runs successfully, it generates a single output file, /home/tutorial/local-storage/storage/montage.jpg.

The grid workflow will take time to execute on the VM. On the instructor's Mac Pro desktop it took about 30 minutes to run.
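If you plan workflows from a script, the pegasus-plan invocation used in this exercise can be assembled as an argument list. The following is a minimal Python sketch that uses only the options shown in this section. The -Dpegasus.schema.dax property is omitted for brevity, and the function plan_command and its parameter names are hypothetical.

```python
import os
import shlex


def plan_command(dax, sites, output_site="local", submit_dir="dags",
                 force=True, cleanup=False, submit=True):
    """Assemble the pegasus-plan argument list used in this exercise."""
    args = ["pegasus-plan",
            "--dir", submit_dir,           # base of the submit directory tree
            "--sites", ",".join(sites),    # comma-separated execution sites
            "--output", output_site]       # site that receives the output data
    if force:
        args.append("--force")
    if not cleanup:
        args.append("--nocleanup")         # keep intermediate data
    args += ["--dax", os.path.abspath(dax)]
    if submit:
        args.append("--submit")
    args.append("-v")
    return args


print(" ".join(shlex.quote(a) for a in plan_command("dax/montage.dax", ["cluster"])))
```

The resulting list can be passed to subprocess.run if you want to invoke the planner from Python.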

1.5. Advanced Exercises

1.5.1. Optimizing a workflow by clustering small jobs (To Be Done offline)

Sometimes a workflow has a large number of jobs that each run for only a few seconds. In such cases the overhead of scheduling each job on a grid is large compared to the job's runtime, and the runtime of the entire workflow can be reduced by using Pegasus clustering techniques. One such technique is to cluster jobs horizontally, that is, to group jobs on the same level into one or more sequential jobs.

$ cd $HOME/pegasus-wms
$ pegasus-plan -Dpegasus.schema.dax=/opt/pegasus/default/etc/dax-2.1.xsd \
            --dir `pwd`/dags --sites cluster --output local --nocleanup --force\
            --cluster horizontal --dax `pwd`/dax/montage.dax -v

After clustering, the executable workflow will contain 26 jobs, compared to 44 in the non-clustered mode.
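The idea behind horizontal clustering can be illustrated with a toy example: the jobs on one level of the workflow are distributed over a small number of clusters, and each cluster then runs its jobs sequentially as a single job. The following Python sketch is only an illustration under that assumption and is not how Pegasus implements clustering. The function cluster_level and the job names are hypothetical.

```python
def cluster_level(jobs, clusters_per_level):
    """Split the jobs of one workflow level into at most N sequential clusters."""
    count = min(clusters_per_level, len(jobs))
    clusters = [[] for _ in range(count)]
    for index, job in enumerate(jobs):
        clusters[index % count].append(job)   # round-robin assignment
    return clusters


# Hypothetical level with five short jobs of the same kind.
level = ["job_1", "job_2", "job_3", "job_4", "job_5"]
for cluster in cluster_level(level, 2):
    print(cluster)
```

Here five scheduling operations are reduced to two, at the cost of running the jobs within each cluster one after another.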

1.5.2. Data Reuse

In the DAX you can specify which output data products you want to track in the replica catalog. This is done by setting the register flag on the output files of a job. For our tutorial, we only register the final output data products. So if you were able to execute the diamond or the montage workflow successfully, we can demonstrate data reuse. Let us run pegasus-plan on the diamond workflow again. However, this time we will omit the --force option.

$ cd $HOME/pegasus-wms
$ pegasus-plan --dax `pwd`/dax/diamond.dax --dir `pwd`/dags -s local -o local --nocleanup -v

2010.11.25 01:35:11.186 PST: [INFO] event.pegasus.refinement dax.id blackdiamond_0  - STARTED 
2010.11.25 01:35:11.210 PST: [INFO] event.pegasus.reduce dax.id blackdiamond_0  - STARTED 
2010.11.25 01:35:11.211 PST: [INFO]  Nodes/Jobs Deleted from the Workflow during reduction  
2010.11.25 01:35:11.211 PST: [INFO]     analyze_j4 
2010.11.25 01:35:11.211 PST: [INFO]     findrange_j2 
2010.11.25 01:35:11.211 PST: [INFO]     findrange_j3 
2010.11.25 01:35:11.211 PST: [INFO]     preprocess_j1 
2010.11.25 01:35:11.211 PST: [INFO]  Nodes/Jobs Deleted from the Workflow during reduction  - DONE 
2010.11.25 01:35:11.212 PST: [INFO] event.pegasus.reduce dax.id blackdiamond_0  - FINISHED 
2010.11.25 01:35:11.212 PST: [INFO] event.pegasus.siteselection dax.id blackdiamond_0  - STARTED 
2010.11.25 01:35:11.219 PST: [INFO] event.pegasus.siteselection dax.id blackdiamond_0  - FINISHED 
2010.11.25 01:35:11.289 PST: [INFO]  Grafting transfer nodes in the workflow 
2010.11.25 01:35:11.290 PST: [INFO] event.pegasus.generate.transfer-nodes dax.id blackdiamond_0  - STARTED 
2010.11.25 01:35:11.370 PST: [INFO]  Adding stage out jobs for jobs deleted from the workflow 
2010.11.25 01:35:11.370 PST: [INFO]  The leaf file f.d is already at the output pool local 
2010.11.25 01:35:11.371 PST: [INFO] event.pegasus.generate.transfer-nodes dax.id blackdiamond_0  - FINISHED 
2010.11.25 01:35:11.372 PST: [INFO] event.pegasus.generate.workdir-nodes dax.id blackdiamond_0  - STARTED 
2010.11.25 01:35:11.374 PST: [INFO] event.pegasus.generate.workdir-nodes dax.id blackdiamond_0  - FINISHED 
2010.11.25 01:35:11.374 PST: [INFO] event.pegasus.generate.cleanup-wf dax.id blackdiamond_0  - STARTED 
2010.11.25 01:35:11.375 PST: [INFO] event.pegasus.generate.cleanup-wf dax.id blackdiamond_0  - FINISHED 
2010.11.25 01:35:11.375 PST: [INFO] event.pegasus.refinement dax.id blackdiamond_0  - FINISHED 
2010.11.25 01:35:11.426 PST: [INFO]  Generating codes for the concrete workflow 
2010.11.25 01:35:12.078 PST: [INFO]  Generating codes for the concrete workflow -DONE 
2010.11.25 01:35:12.083 PST:   


The executable workflow generated contains only a single NOOP job.
It seems that the output files are already at the output site. 
To regenerate the output data from scratch specify --force option.



pegasus-run -Dpegasus.user.properties=$HOME/.../blackdiamond/run0003/pegasus.4078026914028890643.properties\
 /home/tutorial/pegasus-wms/dags/tutorial/pegasus/blackdiamond/run0003

 
2010.11.25 01:35:12.083 PST:   Time taken to execute is 1.508 seconds 
2010.11.25 01:35:12.083 PST: [INFO] event.pegasus.planner planner.version 3.0.0  - FINISHED 

You can increase the debug level to see how Pegasus deletes jobs from the workflow bottom-up. To do so, pass -vvvv to the pegasus-plan command.
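The bottom-up deletion can be pictured as follows: a job is dropped when none of its outputs still has to be produced, and dropping it may in turn make its parents unnecessary. The following Python sketch is a simplified illustration of that idea, not the actual Pegasus reduction algorithm. The function reduce_workflow is hypothetical, and the intermediate file names (f.b1, f.b2, f.c1, f.c2) are assumptions; only f.a and f.d appear in the output above.

```python
def reduce_workflow(jobs, available, wanted):
    """Return the jobs that can be deleted because their outputs are not needed.

    jobs maps a job name to (input files, output files); available is the set
    of files already registered; wanted is the set of final outputs requested.
    """
    remaining = dict(jobs)
    changed = True
    while changed:
        changed = False
        for name, (_, outputs) in list(remaining.items()):
            # Files still consumed by the other jobs that remain.
            consumed = {f for other, (inputs, _) in remaining.items()
                        if other != name for f in inputs}
            must_run = any(f not in available and (f in wanted or f in consumed)
                           for f in outputs)
            if not must_run:
                del remaining[name]
                changed = True
    return sorted(set(jobs) - set(remaining))


DIAMOND = {
    "preprocess_j1": (["f.a"], ["f.b1", "f.b2"]),
    "findrange_j2": (["f.b1"], ["f.c1"]),
    "findrange_j3": (["f.b2"], ["f.c2"]),
    "analyze_j4": (["f.c1", "f.c2"], ["f.d"]),
}
# f.d already registered: every compute job becomes unnecessary.
print(reduce_workflow(DIAMOND, available={"f.a", "f.d"}, wanted={"f.d"}))
# f.d not registered: nothing can be deleted.
print(reduce_workflow(DIAMOND, available={"f.a"}, wanted={"f.d"}))
```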

1.5.3. Hierarchical Workflows

Pegasus 3.0 allows you to create workflows of workflows, i.e. your workflow can contain dax jobs that refer to sub-workflows. In this exercise, we will execute a workflow, superdiamond, that executes two diamond workflows.

Let us look at superdiamond.dax in the dax directory.

$ cat $HOME/pegasus-wms/dax/superdiamond.dax

<?xml version="1.0" encoding="UTF-8"?>
<!-- generated on: 2010-11-25T08:42:30-08:00 -->
<!-- generated by: pegasus [ ?? ] -->
<adag xmlns="http://pegasus.isi.edu/schema/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.2.xsd"
version="3.2" name="superdiamond" index="0" count="1">

<!-- Section 1: Files - Acts as a Replica Catalog (can be empty) -->

   <file name="f.a">
      <pfn url="file:///scratch/tutorial/inputdata/diamond/f.a" site="local"/>
   </file>
   
   <file name="black-1.dax">
      <pfn url="/home/tutorial/pegasus-wms/dax/black-1.dax" site="local"/>
   </file>

   <file name="black-2.dax">
      <pfn url="/home/tutorial/pegasus-wms/dax/black-2.dax" site="local"/>
   </file>


<!-- Section 2: Executables - Acts as a Transformaton Catalog (can be empty) -->


<!-- Section 3: Transformations - Aggregates executables and Files (can be empty) -->


<!-- Section 4: Job's, DAX's or Dag's - Defines a JOB or DAX or DAG (Atleast 1 required) -->

   <dax id="d1" file="black-1.dax"  >
    <argument>-s local --force -o local</argument>
   </dax>

   <dax id="d2" file="black-2.dax"  >
    <argument>-s local --force -o local</argument>
   </dax>



<!-- Section 5: Dependencies - Parent Child relationships (can be empty) -->

   <child ref="d2">
      <parent ref="d1"/>
   </child>

</adag>
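The structure of a hierarchical DAX can be inspected with any XML parser. The following is a minimal Python sketch that lists the dax jobs and their dependencies, using a trimmed copy of the listing above (file section and XML declaration omitted). The helper names sub_workflows and dependencies are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <adag> element of the DAX.
NS = {"dax": "http://pegasus.isi.edu/schema/DAX"}

# Trimmed copy of the job and dependency sections of superdiamond.dax.
SAMPLE = """\
<adag xmlns="http://pegasus.isi.edu/schema/DAX" version="3.2" name="superdiamond">
   <dax id="d1" file="black-1.dax"><argument>-s local --force -o local</argument></dax>
   <dax id="d2" file="black-2.dax"><argument>-s local --force -o local</argument></dax>
   <child ref="d2"><parent ref="d1"/></child>
</adag>
"""


def sub_workflows(root):
    """Return (job id, DAX file) pairs for every <dax> job."""
    return [(job.get("id"), job.get("file")) for job in root.findall("dax:dax", NS)]


def dependencies(root):
    """Map each child job id to the list of its parent job ids."""
    return {child.get("ref"): [p.get("ref") for p in child.findall("dax:parent", NS)]
            for child in root.findall("dax:child", NS)}


root = ET.fromstring(SAMPLE)
print(sub_workflows(root))
print(dependencies(root))
```

The dependency map shows that d2 (black-2.dax) runs only after d1 (black-1.dax) has finished.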

Now let us submit this superdiamond workflow.

$ pegasus-plan --dax `pwd`/dax/superdiamond.dax --force --submit\
               --dir dags -s local -o local --nocleanup -v

2010.11.29 21:15:49.110 PST: [INFO] event.pegasus.refinement dax.id superdiamond_0  - STARTED 
2010.11.29 21:15:49.123 PST: [INFO] event.pegasus.siteselection dax.id superdiamond_0  - STARTED 
2010.11.29 21:15:49.142 PST: [INFO] event.pegasus.siteselection dax.id superdiamond_0  - FINISHED 
2010.11.29 21:15:49.220 PST: [INFO]  Grafting transfer nodes in the workflow 
2010.11.29 21:15:49.221 PST: [INFO] event.pegasus.generate.transfer-nodes dax.id superdiamond_0  - STARTED 
2010.11.29 21:15:49.305 PST: [INFO] event.pegasus.generate.transfer-nodes dax.id superdiamond_0  - FINISHED 
2010.11.29 21:15:49.307 PST: [INFO] event.pegasus.generate.workdir-nodes dax.id superdiamond_0  - STARTED 
2010.11.29 21:15:49.312 PST: [INFO] event.pegasus.generate.workdir-nodes dax.id superdiamond_0  - FINISHED 
2010.11.29 21:15:49.312 PST: [INFO] event.pegasus.generate.cleanup-wf dax.id superdiamond_0  - STARTED 
2010.11.29 21:15:49.314 PST: [INFO] event.pegasus.generate.cleanup-wf dax.id superdiamond_0  - FINISHED 
2010.11.29 21:15:49.314 PST: [INFO] event.pegasus.refinement dax.id superdiamond_0  - FINISHED 
2010.11.29 21:15:49.371 PST: [INFO]  Generating codes for the concrete workflow 
2010.11.29 21:15:50.200 PST: [INFO]  Generating codes for the concrete workflow -DONE 
2010.11.29 21:15:50.200 PST: [INFO]  Generating code for the cleanup workflow 
2010.11.29 21:15:50.323 PST: [INFO]  Generating code for the cleanup workflow -DONE 
2010.11.29 21:15:50.496 PST:    
2010.11.29 21:15:50.502 PST:   ----------------------------------------------------------------------- 
2010.11.29 21:15:50.508 PST:   File for submitting this DAG to Condor           : superdiamond-0.dag.condor.sub 
2010.11.29 21:15:50.514 PST:   Log of DAGMan debugging messages                 : superdiamond-0.dag.dagman.out 
2010.11.29 21:15:50.521 PST:   Log of Condor library output                     : superdiamond-0.dag.lib.out 
2010.11.29 21:15:50.528 PST:   Log of Condor library error messages             : superdiamond-0.dag.lib.err 
2010.11.29 21:15:50.559 PST:   Log of the life of condor_dagman itself          : superdiamond-0.dag.dagman.log 
2010.11.29 21:15:50.578 PST:    
2010.11.29 21:15:50.588 PST:   -no_submit given, not submitting DAG to Condor.  You can do this with: 
2010.11.29 21:15:50.601 PST:   "condor_submit superdiamond-0.dag.condor.sub" 
2010.11.29 21:15:50.618 PST:   ----------------------------------------------------------------------- 
2010.11.29 21:15:50.625 PST:   Submitting job(s). 
2010.11.29 21:15:50.637 PST:   Logging submit event(s). 
2010.11.29 21:15:50.642 PST:   1 job(s) submitted to cluster 1. 
2010.11.29 21:15:51.179 PST:    
2010.11.29 21:15:51.185 PST:   Your Workflow has been started and runs in base directory given below 
2010.11.29 21:15:51.191 PST:    
2010.11.29 21:15:51.197 PST:   cd /home/tutorial/pegasus-wms/dags/tutorial/pegasus/superdiamond/run0001 
2010.11.29 21:15:51.208 PST:    
2010.11.29 21:15:51.214 PST:   *** To monitor the workflow you can run *** 
2010.11.29 21:15:51.220 PST:    
2010.11.29 21:15:51.227 PST:   pegasus-status -l /home/tutorial/pegasus-wms/dags/tutorial/pegasus/superdiamond/run0001 
2010.11.29 21:15:51.234 PST:    
2010.11.29 21:15:51.240 PST:   *** To remove your workflow run *** 
2010.11.29 21:15:51.245 PST:   pegasus-remove -d 1.0 
2010.11.29 21:15:51.253 PST:   or 
2010.11.29 21:15:51.261 PST:   pegasus-remove /home/tutorial/pegasus-wms/dags/tutorial/pegasus/superdiamond/run0001 
2010.11.29 21:15:51.268 PST:    
2010.11.29 21:15:51.277 PST:   Time taken to execute is 2.745 seconds 
2010.11.29 21:15:51.277 PST: [INFO] event.pegasus.planner planner.version 3.0.0  - FINISHED 

You can track the workflow using the pegasus-status command.

$ watch  pegasus-status -l /home/tutorial/pegasus-wms/dags/tutorial/pegasus/superdiamond/run0001 


After the workflow has completed, you will see the files black-1-f.d and black-2-f.d in the storage directory.

$ ls -lh /home/tutorial/local-storage/storage/black-*


-rw-r--r-- 1 pegasus pegasus 3.6K Nov 29 21:36 /home/tutorial/local-storage/storage/black-1-f.d
-rw-r--r-- 1 pegasus pegasus 3.6K Nov 29 21:41 /home/tutorial/local-storage/storage/black-2-f.d

1.5.3.1. Directory Structure for Hierarchical Workflows

Pegasus ensures that each workflow has its own submit directory and its own execution directory.

The table below lists the submit directories for all the workflows in this exercise.

Table 1.4. Submit Directory Structure for Hierarchical Workflows

superdiamond (the outer level workflow)  /home/tutorial/pegasus-wms/dags/tutorial/pegasus/superdiamond/run0001
black-1 (the first sub workflow)         /home/tutorial/pegasus-wms/dags/tutorial/pegasus/superdiamond/run0001/black-1_d1
black-2 (the second sub workflow)        /home/tutorial/pegasus-wms/dags/tutorial/pegasus/superdiamond/run0001/black-2_d2

The table below lists the execution directories (one per workflow) in this exercise.

Table 1.5. Execution Directory Structure for Hierarchical Workflows

superdiamond (the outer level workflow)  /home/tutorial/local-scratch/exec/tutorial/pegasus/superdiamond/run0001
black-1 (the first sub workflow)         /home/tutorial/local-scratch/exec/tutorial/pegasus/superdiamond/run0001/black-1_d1
black-2 (the second sub workflow)        /home/tutorial/local-scratch/exec/tutorial/pegasus/superdiamond/run0001/black-2_d2
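In both tables the directory of a sub-workflow appears to be the parent run directory followed by the DAX file name (without extension), an underscore and the id of the dax job. The following is a minimal Python sketch of that naming pattern, which is inferred from the two tables rather than from Pegasus documentation. The helper sub_workflow_dir is hypothetical.

```python
import posixpath


def sub_workflow_dir(parent_run_dir, dax_file, job_id):
    """Directory of a sub-workflow: <parent run dir>/<DAX name>_<job id>."""
    dax_name = posixpath.splitext(posixpath.basename(dax_file))[0]
    return posixpath.join(parent_run_dir, "%s_%s" % (dax_name, job_id))


submit_base = "/home/tutorial/pegasus-wms/dags/tutorial/pegasus/superdiamond/run0001"
print(sub_workflow_dir(submit_base, "black-1.dax", "d1"))
```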