5.3. Data Staging Configuration

Pegasus can be set up to run workflows in the following broad configurations:

  • Shared File System

    This setup applies when the head node and the worker nodes of a cluster share a filesystem. Compute jobs in the workflow run in a directory on the shared filesystem.

  • Non-Shared Filesystem

    This setup applies when the head node and the worker nodes of a cluster don't share a filesystem. Compute jobs in the workflow run in a local directory on the worker node.

  • Condor Pool Without a Shared Filesystem

    This setup applies to a Condor pool where the worker nodes don't share a filesystem. All data I/O is achieved using Condor File IO. This is a special case of the non-shared filesystem setup where, instead of using pegasus-transfer to transfer input and output data, Condor File IO is used.

For the purposes of data configuration, the various sites and directories involved are defined below.

  1. Submit Host

    The host from which workflows are submitted. This is where Pegasus and Condor DAGMan are installed. It is referred to as the "local" site in the site catalog.

  2. Compute Site

    The site where the jobs mentioned in the DAX are executed. There needs to be an entry in the Site Catalog for every compute site. The compute site is passed to pegasus-plan using the --sites option (see the sketch after this list).

  3. Staging Site

    The site to which the separate transfer jobs in the executable workflow (jobs with the stage_in, stage_out and stage_inter prefixes that Pegasus adds using the transfer refiners) stage the input data, and from which the output data is transferred to the final output site. In the shared filesystem setup the staging site is the same as the compute site where the jobs execute; in the non-shared filesystem setup it can be a separate site, and in the Condor IO setup it is the submit host.

  4. Output Site

    The output site is the final storage site where users want the output data from jobs to end up. The output site is passed to pegasus-plan using the --output option. The stage-out jobs in the workflow stage the data from the staging site to the final storage site.

  5. Input Site

    The site where the input data is stored. The locations of the input data are catalogued in the Replica Catalog, and the pool attribute of each location gives the site handle for the input site (see the sketch after this list).

  6. Workflow Execution Directory

    This is the directory that the create dir jobs in the executable workflow create on the staging site. There is one such directory per workflow per staging site.

  7. Worker Node Directory

    This is the directory created on the worker node for each job, usually by the job wrapper that launches the job.
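
To make these site roles concrete, the sketch below shows a Replica Catalog entry (File format) whose pool attribute names the input site, and a pegasus-plan invocation naming the compute and output sites. The site handles (condor_pool, isi_storage), the URL and the filenames are hypothetical.

    # Replica Catalog (File format): logical filename, physical filename, and
    # the pool attribute, which names the input site hosting the data
    f.input  gsiftp://dataserver.example.edu/storage/f.input  pool="isi_storage"

    # Plan the workflow: run compute jobs on site "condor_pool" and stage the
    # final outputs to the "local" site (the submit host)
    pegasus-plan --dax workflow.dax --sites condor_pool --output local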

You can specify the data configuration to use in either of the following ways:

  1. properties - Specify the global property pegasus.data.configuration.

  2. site catalog - Starting with the 4.5.0 release, you can specify a pegasus profile key named data.configuration and associate it with your compute sites in the site catalog (see the example below).
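
For example, both mechanisms are sketched below; the site handle condor_pool and the value nonsharedfs are illustrative only.

    # pegasus.properties: global setting, applies to all compute sites
    pegasus.data.configuration = nonsharedfs

    <!-- site catalog (XML): the same setting scoped to a single compute site
         (enclosing sitecatalog element omitted for brevity) -->
    <site handle="condor_pool" arch="x86_64" os="LINUX">
        <profile namespace="pegasus" key="data.configuration">nonsharedfs</profile>
    </site>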

5.3.1. Shared File System

By default, Pegasus is set up to run workflows in the shared filesystem setup, where the worker nodes and the head node of a cluster share a filesystem.

Figure 5.8. Shared File System Setup

The data flow in this case is as follows:

  1. The stage-in job executes (either on the submit host or on the head node) to stage in input data from the input sites (1..n) to a workflow-specific execution directory on the shared filesystem.

  2. The compute job starts on a worker node in the workflow execution directory and accesses the input data using POSIX I/O.

  3. The compute job executes on the worker node and writes its output data to the workflow execution directory using POSIX I/O.

  4. The stage-out job executes (either on the submit host or on the head node) to stage out output data from the workflow-specific execution directory to a directory on the final output site.

Tip

Set pegasus.data.configuration to sharedfs to run in this configuration.
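
For example, a minimal sketch of this configuration: the property selects sharedfs, and the site catalog entry declares a shared-scratch directory visible to both the head node and the worker nodes. The site handle hpcc and the path are hypothetical.

    # pegasus.properties
    pegasus.data.configuration = sharedfs

    <!-- site catalog entry: workflow-specific execution directories are
         created under this shared-scratch directory -->
    <site handle="hpcc" arch="x86_64" os="LINUX">
        <directory type="shared-scratch" path="/shared/scratch">
            <file-server operation="all" url="file:///shared/scratch"/>
        </directory>
    </site>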

5.3.2. Non-Shared Filesystem

In this setup, Pegasus runs workflows on the local filesystems of the worker nodes, which do not share a filesystem. Data transfers happen between the worker nodes and a staging / data coordination site. The staging site server can be a file server on the head node of a cluster, or it can be on a separate machine.

Setup

  • The compute site and the staging site are different.

  • The head node and worker nodes of the compute site don't share a filesystem.

  • Input Data is staged from remote sites.

  • The output site is remote, i.e. a site other than the compute site. It can be the submit host.

Figure 5.9. Non-Shared Filesystem Setup

The data flow in this case is as follows:

  1. The stage-in job executes (either on the submit host or on the staging site) to stage in input data from the input sites (1..n) to a workflow-specific execution directory on the staging site.

  2. The compute job starts on a worker node in a local execution directory and uses pegasus-transfer to fetch the input data from the staging site to the local directory on the worker node.

  3. The compute job executes in the local execution directory on the worker node.

  4. The compute job writes its output data to the local directory on the worker node using POSIX I/O.

  5. The output data is pushed from the worker node to the staging site using pegasus-transfer.

  6. The stage-out job executes (either on the submit host or on the staging site) to stage out output data from the workflow-specific execution directory to a directory on the final output site.

In this case, the compute jobs are wrapped as PegasusLite instances.

This mode is especially useful for running in cloud environments where you don't want to set up a shared filesystem between the worker nodes. Running in that mode is explained in detail here.

Tip

Set pegasus.data.configuration to nonsharedfs to run in this configuration. The staging site can be specified using the --staging-site option to pegasus-plan.
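
For example, a sketch of a nonsharedfs run, assuming a compute site condor_pool and a separate staging site staging_server, both defined in the site catalog:

    # pegasus.properties
    pegasus.data.configuration = nonsharedfs

    # plan with a staging site that is distinct from the compute site
    pegasus-plan --conf pegasus.properties \
                 --dax workflow.dax \
                 --sites condor_pool \
                 --staging-site staging_server \
                 --output local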

5.3.3. Condor Pool Without a Shared Filesystem

This setup applies to a Condor pool where the worker nodes don't share a filesystem. All data I/O is achieved using Condor File IO. This is a special case of the non-shared filesystem setup where, instead of using pegasus-transfer to transfer input and output data, Condor File IO is used.

Setup

  • The submit host and the staging site are the same.

  • The head node and worker nodes of the compute site don't share a filesystem.

  • Input Data is staged from remote sites.

  • The output site is remote, i.e. a site other than the compute site. It can be the submit host.

Figure 5.10. Condor Pool Without a Shared Filesystem

The data flow in this case is as follows:

  1. The stage-in job executes on the submit host to stage in input data from the input sites (1..n) to a workflow-specific execution directory on the submit host.

  2. The compute job starts on a worker node in a local execution directory. Before the job starts, Condor transfers the input data for the job from the workflow execution directory on the submit host to the local execution directory on the worker node.

  3. The compute job executes in the local execution directory on the worker node.

  4. The compute job writes its output data to the local directory on the worker node using POSIX I/O.

  5. When the compute job finishes, Condor transfers the output data for the job from the local execution directory on the worker node to the workflow execution directory on the submit host.

  6. The stage-out job executes on the submit host to stage out output data from the workflow-specific execution directory to a directory on the final output site.

In this case, the compute jobs are wrapped as PegasusLite instances.

This mode is especially useful for running in cloud environments where you don't want to set up a shared filesystem between the worker nodes. Running in that mode is explained in detail here.

Tip

Set pegasus.data.configuration to condorio to run in this configuration. In this mode, the staging site is automatically set to the local site.
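
For example, a minimal condorio sketch; no --staging-site option is needed since the staging site defaults to local. The site handle condor_pool is hypothetical.

    # pegasus.properties
    pegasus.data.configuration = condorio

    # the staging site is automatically the "local" site (the submit host)
    pegasus-plan --conf pegasus.properties \
                 --dax workflow.dax \
                 --sites condor_pool \
                 --output local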