11.4. Hierarchical Workflows

11.4.1. Introduction

The Abstract Workflow, in addition to containing compute jobs, can also contain jobs that refer to other workflows. This is useful for running large workflows or ensembles of workflows.

Users can embed two types of workflow jobs in the DAX:

  1. daxjob - refers to a sub workflow represented as a DAX. During planning, each DAX job is mapped to a Condor DAGMan job whose prescript is a pegasus-plan invocation on the DAX referred to in the DAX job.

    Figure 11.6. Planning of a DAX Job

  2. dagjob - refers to a sub workflow represented as a DAG. During planning, each DAG job is mapped to a Condor DAGMan job that runs the DAG file referred to in the DAG job.

    Figure 11.7. Planning of a DAG Job

11.4.2. Specifying a DAX Job in the DAX

Specifying a DAX job in a DAX is similar to specifying a normal compute job. The differences are the XML element name (dax instead of job) and the attributes specified. The DAXJob XML specification is described in detail in the chapter on the DAX API. An example DAX job in a DAX is shown below:

  <dax id="ID000002" name="black.dax" node-label="bar" >
    <profile namespace="dagman" key="maxjobs">10</profile>
    <argument>-Xmx1024 -Xms512 -Dpegasus.dir.storage=storagedir  -Dpegasus.dir.exec=execdir -o local -vvvvv --force -s dax_site </argument>
  </dax>

11.4.2.1. DAX File Locations

The name attribute in the dax element refers to the LFN (Logical File Name) of the DAX file. The location of the DAX file can be catalogued in either the

  1. Replica Catalog

  2. Replica Catalog Section in the DAX (an example is sketched below).

    Note

    Currently, only file URLs on the local site (submit host) can be specified as DAX file locations.
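
For illustration, the following sketch catalogues the DAX file location in the replica catalog section of the parent DAX. The path /workflows/black.dax is hypothetical; substitute the actual location on the submit host.

  <!-- replica catalog section of the parent DAX; the PFN path below is hypothetical -->
  <file name="black.dax">
    <pfn url="file:///workflows/black.dax" site="local"/>
  </file>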

11.4.2.2. Arguments for a DAX Job

Users can specify arguments for DAX jobs. The arguments specified for a DAX job are passed to the pegasus-plan invocation in the prescript for the corresponding Condor DAGMan job in the executable workflow.

The following options for pegasus-plan are inherited from the pegasus-plan invocation of the parent workflow. If an option is specified in the arguments section for the DAX job, it overrides the inherited value.

Table 11.2. Options inherited from parent workflow

Option Name   Description
-----------   ----------------------------
--sites       the list of execution sites
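
For example, a DAX job can override the inherited --sites option by specifying it in its argument string. The sketch below reuses the site name dax_site from the example earlier in this section:

  <dax id="ID000002" name="black.dax">
    <!-- overrides the execution sites inherited from the parent pegasus-plan invocation -->
    <argument>-s dax_site</argument>
  </dax>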

It is highly recommended that users do not specify directory-related options in the arguments section for DAX jobs. Pegasus assigns values to these options for the sub workflows automatically:

  1. --relative-dir

  2. --dir

  3. --relative-submit-dir

11.4.2.3. Profiles for DAX Job

Users can choose to specify dagman profiles with the DAX job to control the behavior of the corresponding Condor DAGMan instance in the executable workflow. In the example above, maxjobs is set to 10 for the sub workflow.

11.4.2.4. Execution of the PRE script and Condor DAGMan instance

The pegasus-plan that is invoked as part of the prescript of the Condor DAGMan job is executed on the submit host. The output log of pegasus-plan is redirected to a file (ending with the suffix pre.log) in the submit directory of the workflow that contains the DAX job. The path to pegasus-plan is determined automatically.

The DAX job maps to a Condor DAGMan job. The path to the condor_dagman binary is determined according to the following rules:

  1. the entry in the transformation catalog for condor::dagman for site local (an example entry is sketched after this list), else

  2. pick up the value of CONDOR_HOME from the environment, if specified, and set the path to condor_dagman as $CONDOR_HOME/bin/condor_dagman, else

  3. pick up the value of CONDOR_LOCATION from the environment, if specified, and set the path to condor_dagman as $CONDOR_LOCATION/bin/condor_dagman, else

  4. pick up the path to condor_dagman from the user's PATH
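
For reference, an entry for condor::dagman on site local might look like the following sketch in the text transformation catalog format. The path and system information are hypothetical, and the exact syntax may differ across Pegasus versions:

  # hypothetical transformation catalog entry for condor::dagman
  tr condor::dagman {
    site local {
      pfn "/usr/bin/condor_dagman"
      arch "x86_64"
      os "LINUX"
      type "INSTALLED"
    }
  }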

Tip

It is recommended that users specify dagman.maxpre in their properties file to control the maximum number of pegasus-plan instances launched by each running DAGMan instance.
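
For example, the following properties file entry would limit each running DAGMan instance to two concurrent prescripts; the value 2 is purely illustrative:

dagman.maxpre  2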

11.4.3. Specifying a DAG Job in the DAX

Specifying a DAG job in a DAX is similar to specifying a normal compute job. The differences are the XML element name (dag instead of job) and the attributes specified. For DAGJob XML details, see the API Reference chapter. An example DAG job in a DAX is shown below:

  <dag id="ID000003" name="black.dag" node-label="foo" >
    <profile namespace="dagman" key="maxjobs">10</profile>
    <profile namespace="dagman" key="DIR">/dag-dir/test</profile>
  </dag>

11.4.3.1. DAG File Locations

The name attribute in the dag element refers to the LFN (Logical File Name) of the DAG file. The location of the DAG file can be catalogued in either the

  1. Replica Catalog (a file-based entry is sketched below)

  2. Replica Catalog Section in the DAX.

    Note

    Currently, only file URLs on the local site (submit host) can be specified as DAG file locations.
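
For illustration, an entry for the DAG file in a file-based replica catalog might look like the following sketch; the PFN path is hypothetical:

  # hypothetical file-based replica catalog entry mapping the LFN to a local path
  black.dag  file:///workflows/black.dag  site="local"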

11.4.3.2. Profiles for DAG Job

Users can choose to specify dagman profiles with the DAG job to control the behavior of the corresponding Condor DAGMan instance in the executable workflow. In the example above, maxjobs is set to 10 for the sub workflow.

The dagman profile DIR allows users to specify the directory in which the Condor DAGMan instance should execute. In the example above, black.dag is set to be executed in the directory /dag-dir/test. This directory should be created beforehand.

11.4.4. File Dependencies Across DAX Jobs

In hierarchical workflows, if a sub workflow generates output files required by another sub workflow, there should be an edge connecting the two DAX jobs (a sketch is shown below). Pegasus ensures that the prescript for the child sub workflow has the path to the cache file generated during the planning of the parent sub workflow. The cache file in the submit directory of a workflow is a textual replica catalog that lists the locations of all the output files created in the remote workflow execution directory when the workflow executes.
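
A minimal sketch of such an edge in the DAX reuses the DAX job ID000002 from the example earlier and introduces a hypothetical child DAX job ID000004 that consumes its outputs:

  <!-- hypothetical child sub workflow that consumes outputs of ID000002 -->
  <dax id="ID000004" name="blue.dax"/>

  <!-- edge: plan and run black.dax (ID000002) before blue.dax (ID000004) -->
  <child ref="ID000004">
    <parent ref="ID000002"/>
  </child>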

This automatic passing of the cache file to a child sub workflow ensures that datasets from the same workflow run are used. However, passing the locations in a cache file also means that Pegasus prefers them over all other locations in the Replica Catalog. If you need Replica Selection to also consider locations in the Replica Catalog, set the following property:

pegasus.catalog.replica.cache.asrc  true

The above is useful when you are staging out the output files to a storage site and want the child sub workflow to stage these files from the output storage site instead of the workflow execution directory where they were originally created.

11.4.5. Recursion in Hierarchical Workflows

It is possible to add DAX jobs to a DAX that already contains DAX jobs. Pegasus does not place a limit on how many levels of recursion a user can have in their workflows. From Pegasus' perspective, recursion in hierarchical workflows ends when a DAX containing only compute jobs is encountered. However, the levels of recursion are limited by the system resources consumed by the running DAGMan processes (each level of nesting produces another DAGMan process).
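
For illustration, the sketch below shows two levels of nesting with hypothetical file names: the top-level DAX contains a DAX job referring to level1.dax, which in turn contains a DAX job referring to level2.dax.

  <!-- in the top-level DAX: a hypothetical first-level sub workflow -->
  <dax id="ID000001" name="level1.dax"/>

  <!-- in level1.dax: a hypothetical second-level sub workflow -->
  <dax id="ID000001" name="level2.dax"/>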

The figure below illustrates an example with recursion 2 levels deep.

Figure 11.8. Recursion in Hierarchical Workflows

The execution time-line of the various jobs in the above figure is illustrated below.

Figure 11.9. Execution Time-line for Hierarchical Workflows

11.4.6. Example

The Galactic Plane workflow is a hierarchical workflow of many Montage workflows. For details, see Workflow of Workflows.