The DAG is an executable (concrete) workflow that can be executed over a variety of resources. When the workflow tasks are mapped to multiple resources that do not share a file system, explicit nodes are added to the workflow to orchestrate data transfer between the tasks.
When you take the DAX workflow created in Creating Workflows and plan it for execution on a single remote grid site, here a site with the handle hpcc, without clean-up nodes, the following concrete workflow is built:
Planning augments the original abstract workflow with ancillary tasks to facilitate the proper execution of the workflow. These tasks include:
- the creation of remote working directories. These directories typically have a name chosen to avoid conflicts with other, similar workflows running at the same time. Such tasks use a job prefix of create_dir.
- the stage-in of input files before any task that requires these files. Any file consumed by a task needs to be staged to the task's site if it does not already exist on that site. Such tasks use a job prefix of stage_in. If multiple files from various sources need to be transferred, multiple stage-in jobs will be created. Additional advanced options control the size and number of these jobs, and whether multiple compute tasks can share stage-in jobs.
- the concretization of each original DAX job into a compute task in the DAG. Compute job names are a concatenation of the job's name and id attribute from the DAX file.
- the stage-out of data products to a collecting site. Data products with their transfer flag set to false will not be staged to the output site. However, they may still be eligible for staging to other, dependent tasks. Stage-out tasks use a job prefix of stage_out. If compute jobs run at different sites, an intermediary staging task with prefix stage_inter is inserted between the compute jobs in the workflow, ensuring that the data products of the parent are available to the child job.
- the registration of data products in a replica catalog. Data products with their register flag set to false will not be registered.
- the clean-up of transient files and working directories. These steps can be omitted with the --nocleanup option to the planner.
The "Reference Manual" chapter gives more detail about when and how staging nodes are inserted into the workflow.
The DAG will be found in the file diamond-0.dag, whose name is constructed from the name and index attributes found in the root element of the DAX file.
######################################################################
# PEGASUS WMS GENERATED DAG FILE
# DAG diamond
# Index = 0, Count = 1
######################################################################

JOB create_dir_diamond_0_hpcc create_dir_diamond_0_hpcc.sub
SCRIPT POST create_dir_diamond_0_hpcc /opt/pegasus/default/bin/pegasus-exitcode create_dir_diamond_0_hpcc.out

JOB stage_in_local_hpcc_0 stage_in_local_hpcc_0.sub
SCRIPT POST stage_in_local_hpcc_0 /opt/pegasus/default/bin/pegasus-exitcode stage_in_local_hpcc_0.out

JOB preprocess_ID000001 preprocess_ID000001.sub
SCRIPT POST preprocess_ID000001 /opt/pegasus/default/bin/pegasus-exitcode preprocess_ID000001.out

JOB findrange_ID000002 findrange_ID000002.sub
SCRIPT POST findrange_ID000002 /opt/pegasus/default/bin/pegasus-exitcode findrange_ID000002.out

JOB findrange_ID000003 findrange_ID000003.sub
SCRIPT POST findrange_ID000003 /opt/pegasus/default/bin/pegasus-exitcode findrange_ID000003.out

JOB analyze_ID000004 analyze_ID000004.sub
SCRIPT POST analyze_ID000004 /opt/pegasus/default/bin/pegasus-exitcode analyze_ID000004.out

JOB stage_out_local_hpcc_2_0 stage_out_local_hpcc_2_0.sub
SCRIPT POST stage_out_local_hpcc_2_0 /opt/pegasus/default/bin/pegasus-exitcode stage_out_local_hpcc_2_0.out

PARENT findrange_ID000002 CHILD analyze_ID000004
PARENT findrange_ID000003 CHILD analyze_ID000004
PARENT preprocess_ID000001 CHILD findrange_ID000002
PARENT preprocess_ID000001 CHILD findrange_ID000003
PARENT analyze_ID000004 CHILD stage_out_local_hpcc_2_0
PARENT stage_in_local_hpcc_0 CHILD preprocess_ID000001
PARENT create_dir_diamond_0_hpcc CHILD findrange_ID000002
PARENT create_dir_diamond_0_hpcc CHILD findrange_ID000003
PARENT create_dir_diamond_0_hpcc CHILD preprocess_ID000001
PARENT create_dir_diamond_0_hpcc CHILD analyze_ID000004
PARENT create_dir_diamond_0_hpcc CHILD stage_in_local_hpcc_0
######################################################################
# End of DAG
######################################################################
The DAG file declares all jobs and links them to a Condor submit file that describes the planned, concrete job. In the same directory as the DAG file are all Condor submit files for the jobs from the picture plus a number of additional helper files.
The various instructions that can be put into a DAG file are described in Condor's DAGMan documentation. The constituents of the submit directory are described in the "Submit Directory Details" chapter.
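To make the structure of the DAG file concrete, the following is a minimal Python sketch that reads the JOB and PARENT ... CHILD lines of such a file. The function name parse_dag and the shortened sample text are illustrative assumptions, not part of Pegasus or Condor; SCRIPT lines and all other DAGMan instructions are simply ignored.

```python
from collections import defaultdict


def parse_dag(text):
    """Return (jobs, children) read from the text of a DAG file.

    jobs maps a job name to its submit file; children maps a parent
    job name to the list of its child job names. Illustrative only.
    """
    jobs = {}
    children = defaultdict(list)
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines and comment lines.
        if not tokens or tokens[0].startswith("#"):
            continue
        if tokens[0] == "JOB" and len(tokens) >= 3:
            jobs[tokens[1]] = tokens[2]
        elif tokens[0] == "PARENT" and "CHILD" in tokens:
            split = tokens.index("CHILD")
            for parent in tokens[1:split]:
                children[parent].extend(tokens[split + 1:])
    return jobs, dict(children)


# A shortened excerpt in the style of the diamond DAG.
SAMPLE = """\
JOB preprocess_ID000001 preprocess_ID000001.sub
JOB findrange_ID000002 findrange_ID000002.sub
JOB findrange_ID000003 findrange_ID000003.sub
PARENT preprocess_ID000001 CHILD findrange_ID000002
PARENT preprocess_ID000001 CHILD findrange_ID000003
"""

jobs, children = parse_dag(SAMPLE)
print(jobs["preprocess_ID000001"])
print(children["preprocess_ID000001"])
```

A reader like this can be handy for quick sanity checks on a planned workflow, for example counting the stage-in jobs the planner added.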
During the mapping process, the abstract workflow undergoes a series of refinement steps that convert it to an executable form.
After parsing, the abstract workflow is optionally handed over to the Data Reuse Module. The Data Reuse Algorithm in Pegasus attempts to prune all the nodes in the abstract workflow whose output files already exist in the Replica Catalog. It also attempts to cascade the deletion to the parents of the deleted nodes. For example, if the output files of the leaf nodes are already registered, Pegasus prunes the entire workflow, because the output files in which the user is interested already exist in the Replica Catalog.
The Data Reuse Algorithm works in two passes:
First Pass - Determine all the jobs whose output files exist in the Replica Catalog. An output file with the transfer flag set to false is treated as equivalent to a file existing in the Replica Catalog, provided the output file is not an input to any of the children of the job that produces it.
Second Pass - The algorithm removes the jobs whose output files exist in the Replica Catalog and tries to cascade the deletion upwards to the parent jobs. The workflow is traversed breadth first, bottom up. A node is marked for deletion if
( It is already marked for deletion in Pass 1
OR
( ALL of its children have been marked for deletion
AND
Node's output files have transfer flags set to false
)
)
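The two passes can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Pegasus implementation: the workflow is a plain dictionary (a hypothetical layout), the Replica Catalog is a set of logical file names, and the cascade rule in the second pass is only applied to nodes that have at least one child.

```python
from collections import deque


def data_reuse(jobs, replica_catalog):
    """Return the set of job names that would be pruned.

    jobs maps name -> {"inputs": [lfn], "outputs": {lfn: transfer_flag},
    "children": [name]} (a hypothetical layout for this sketch).
    """
    # Pass 1: mark jobs whose outputs all count as already available.
    marked = set()
    for name, job in jobs.items():
        child_inputs = {lfn for c in job["children"] for lfn in jobs[c]["inputs"]}
        available = [
            lfn in replica_catalog or (not transfer and lfn not in child_inputs)
            for lfn, transfer in job["outputs"].items()
        ]
        if available and all(available):
            marked.add(name)

    # Pass 2: breadth-first traversal, bottom up (children before parents).
    parents = {name: [] for name in jobs}
    for name, job in jobs.items():
        for child in job["children"]:
            parents[child].append(name)
    pending = {name: len(job["children"]) for name, job in jobs.items()}
    queue = deque(name for name, count in pending.items() if count == 0)
    deleted = set()
    while queue:
        name = queue.popleft()
        job = jobs[name]
        kids = job["children"]
        # Assumption: the cascade rule needs at least one child.
        cascade = (
            bool(kids)
            and all(kid in deleted for kid in kids)
            and not any(job["outputs"].values())
        )
        if name in marked or cascade:
            deleted.add(name)
        for parent in parents[name]:
            pending[parent] -= 1
            if pending[parent] == 0:
                queue.append(parent)
    return deleted


# Diamond workflow; only the final product f.d has its transfer flag set.
DIAMOND = {
    "preprocess": {"inputs": ["f.a"], "outputs": {"f.b1": False, "f.b2": False},
                   "children": ["findrange1", "findrange2"]},
    "findrange1": {"inputs": ["f.b1"], "outputs": {"f.c1": False}, "children": ["analyze"]},
    "findrange2": {"inputs": ["f.b2"], "outputs": {"f.c2": False}, "children": ["analyze"]},
    "analyze": {"inputs": ["f.c1", "f.c2"], "outputs": {"f.d": True}, "children": []},
}

print(sorted(data_reuse(DIAMOND, {"f.a", "f.d"})))
```

With f.d already in the catalog, the leaf is marked in the first pass and the deletion cascades up to every ancestor, which matches the "prune the entire workflow" case described above.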
Tip
The Data Reuse Algorithm can be disabled by passing the --force option to pegasus-plan.
The abstract workflow is then handed over to the Site Selector module where the abstract jobs in the pruned workflow are mapped to the various sites passed by a user. The target sites for planning are specified on the command line using the --sites option to pegasus-plan. If not specified, then Pegasus picks up all the sites in the Site Catalog as candidate sites. Pegasus will map a compute job to a site only if Pegasus can
- find an INSTALLED executable on the site, OR
- find a STAGEABLE executable that can be staged to the site as part of the workflow execution.
Pegasus supports a variety of site selectors, with Random being the default.
-
Random
The jobs will be randomly distributed among the sites that can execute them.
-
RoundRobin
The jobs will be assigned in a round robin manner amongst the sites that can execute them. Since each site cannot execute every type of job, the round robin scheduling is done per level on a sorted list. The sorting is on the basis of the number of jobs a particular site has been assigned in that level so far. If a job cannot be run on the first site in the queue (due to no matching entry in the transformation catalog for the transformation referred to by the job), it goes to the next one and so on. This implementation defaults to classic round robin in the case where all the jobs in the workflow can run on all the sites.
-
Group
Jobs in the same group will be assigned to the same site that can execute them. The PEGASUS profile key group in the DAX associates a job with a particular group. Jobs that do not have the profile key associated with them are put in the default group. The jobs in the default group are handed over to the "Random" Site Selector for scheduling.
-
Heft
A version of the HEFT processor scheduling algorithm is used to schedule jobs in the workflow to multiple grid sites. The implementation assumes default data communication costs when jobs are not scheduled on to the same site. Later on this may be made more configurable.
The runtime for the jobs is specified in the transformation catalog by associating the pegasus profile key runtime with the entries.
The number of processors in a site is picked up from the attribute idle-nodes associated with the vanilla jobmanager of the site in the site catalog.
-
NonJavaCallout
Pegasus will call out to an external site selector. In this mode a temporary file containing the job information is prepared and passed to the site selector as an argument when it is invoked. The path to the site selector is specified by setting the property pegasus.site.selector.path. The environment variables that need to be set to run the site selector can be specified using properties with a pegasus.site.selector.env. prefix. The temporary file contains information about the job that needs to be scheduled. It contains key-value pairs, each pair on a new line with key and value separated by a =.
The following pairs are currently generated for the site selector temporary file that is generated in the NonJavaCallout.
Table 5.1. Key-value pairs currently generated for the site selector temporary file in the NonJavaCallout mode.
| Key | Value |
| version | is the version of the site selector API, currently 2.0. |
| transformation | is the fully-qualified definition identifier for the transformation (TR), namespace::name:version. |
| derivation | is the fully-qualified definition identifier for the derivation (DV), namespace::name:version. |
| job.level | is the job's depth in the tree of the workflow DAG. |
| job.id | is the job's ID, as used in the DAX file. |
| resource.id | is a pool handle, followed by whitespace, followed by a gridftp server. Typically, each gridftp server is enumerated once, so you may have multiple occurrences of the same site. There can be multiple occurrences of this key. |
| input.lfn | is an input LFN, optionally followed by whitespace and the file size. There can be multiple occurrences of this key, one for each input LFN required by the job. |
| wf.name | is the label of the DAX, as found in the DAX's root element. |
| wf.index | is the DAX index, which is incremented for each partition in case of deferred planning. |
| wf.time | is the mtime of the workflow. |
| wf.manager | is the name of the workflow manager being used, e.g. condor. |
| vo.name | is the name of the virtual organization that is running this workflow. It is currently set to NONE. |
| vo.group | is unused at present and is set to NONE. |
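As an illustration of how an external site selector might read this temporary file, the following Python sketch parses the key=value lines and keeps every occurrence of repeated keys such as resource.id and input.lfn. The function name read_job_info and all sample values (host names, file names, sizes) are hypothetical; the sketch does not cover how the selector reports its decision back to Pegasus.

```python
from collections import defaultdict


def read_job_info(text):
    """Parse key=value lines into a dict that maps each key to a list.

    Lists are used because keys such as resource.id and input.lfn
    may occur several times in the same file.
    """
    info = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        info[key.strip()].append(value.strip())
    return dict(info)


# Hypothetical file content; real values depend on the workflow.
SAMPLE = """\
version=2.0
transformation=example::keg:1.0
job.level=1
job.id=ID000001
resource.id=hpcc gsiftp://hpcc.example.org
resource.id=isi gsiftp://isi.example.org
input.lfn=f.a 1024
wf.name=diamond
wf.index=0
"""

job_info = read_job_info(SAMPLE)
candidate_sites = [entry.split()[0] for entry in job_info["resource.id"]]
print(candidate_sites)
```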
-
Tip
The site selector to use for site selection can be specified by setting the property pegasus.selector.site.
After site selection, the workflow is optionally handed over to the job clustering module, which clusters jobs that are scheduled to the same site. Clustering is usually done on short-running jobs in order to reduce the remote execution overheads associated with a job. Clustering is described in detail in the Reference Manual chapter.
Tip
The job clustering is turned on by passing the --cluster option to pegasus-plan.
After job clustering, the workflow is handed to the Data Transfer module, which adds data stage-in, inter-site and stage-out nodes to the workflow. Data stage-in nodes transfer input data required by the workflow from the locations specified in the Replica Catalog to a directory on the execution site where the job executes. If multiple locations are specified for the same input file, the location from which to stage the data is selected using a Replica Selector. Replica Selection is described in detail in the Replica Selection section of the Reference Manual.
The process of adding the data stage-in and data stage-out nodes is handled by Transfer Refiners. All data transfer jobs in Pegasus are executed using pegasus-transfer. The pegasus-transfer client is a Python-based wrapper around various transfer clients like globus-url-copy, lcg-copy, wget, cp and ln. It looks at the source and destination URLs and figures out automatically which underlying client to use. pegasus-transfer is distributed with Pegasus and can be found in the bin subdirectory. Pegasus Transfer Refiners are described in detail in the Transfers section of the Reference Manual. The default transfer refiner in Pegasus is the Bundle Transfer Refiner, which bundles data stage-in nodes and data stage-out nodes on the basis of certain pegasus profile keys associated with the workflow.
Data Registration Nodes may also be added to the final executable workflow to register the location of the output files on the final output site back in the Replica Catalog. An output file is registered in the Replica Catalog if the register flag for the file is set to true in the DAX.
The data is staged in to and staged out from a directory that is created on the head node by a create dir job in the workflow. In the vanilla case, the directory is visible to all the worker nodes, and compute jobs are launched in this directory on the shared filesystem. In the case where there is no shared filesystem, users can turn on worker node execution, where the data is staged from the head node directory to a directory on the worker node filesystem. This feature will be refined further for Pegasus 3.1. To use it with Pegasus 3.0 send email to pegasus-support at isi.edu.
Tip
The replica selector to use for replica selection can be specified by setting the property pegasus.selector.replica.
After the data transfer nodes have been added to the workflow, Pegasus adds create dir jobs to the workflow. Pegasus usually creates one workflow-specific directory per compute site, located on the shared filesystem of that compute site. This directory is visible to all the worker nodes, and it is where the data is staged in by the data stage-in jobs.
After addition of the create dir jobs, the workflow is optionally handed to the cleanup module. The cleanup module adds cleanup nodes to the workflow that remove data from the directory on the shared filesystem when it is no longer required by the workflow. This is useful in reducing the peak storage requirements of the workflow.
Tip
The addition of the cleanup nodes to the workflow can be disabled by passing the --nocleanup option to pegasus-plan.
The last step of the refinement process is code generation, where Pegasus writes out the executable workflow in a form understandable by the underlying workflow executor. At present Pegasus supports the following code generators:
-
Condor
This is the default code generator for Pegasus. It generates the executable workflow as a Condor DAG file and associated job submit files. The Condor DAG file is passed as input to Condor DAGMan for job execution.
-
Shell
This code generator generates the executable workflow as a shell script that can be executed on the submit host. When using this code generator, all the jobs should be mapped to the site local, i.e. specify --sites local to pegasus-plan.
Tip
To use the Shell code generator, set the property pegasus.code.generator to Shell.
pegasus-plan is the main executable that takes in the abstract workflow (DAX) and generates an executable workflow (usually a Condor DAG) by querying various catalogs and performing several refinement steps. Before users can run pegasus-plan, the following needs to be done:
-
Populate the various catalogs
-
Replica Catalog
The Replica Catalog needs to be populated with the locations of the input files required by the workflows. This can be done by using pegasus-rc-client (see the Replica section of Creating Workflows).
-
Transformation Catalog
The Transformation Catalog needs to be populated with the locations of the executables that the workflows will use. This can be done by using pegasus-tc-client (see the Transformation section of Creating Workflows).
-
Site Catalog
The Site Catalog needs to be populated with the site layout of the various sites that the workflows can execute on. A site catalog can be generated for OSG by using the client pegasus-sc-client (see the Site section of Creating Workflows).
-
-
Configure Properties
After the catalogs have been configured, the user properties file needs to be updated with the types and locations of the catalogs to use. These properties are described in the basic.properties files in the etc subdirectory (see the Properties section of the Reference chapter).
The basic properties that usually need to be set are listed below:
Table 5.2. Basic properties that need to be set
| pegasus.catalog.replica | pegasus.catalog.replica.file or pegasus.catalog.replica.url |
| pegasus.catalog.transformation | pegasus.catalog.transformation.file |
| pegasus.catalog.site | pegasus.catalog.site.file |
To execute pegasus-plan, the user usually needs to specify the following options:
--dax the path to the DAX file that needs to be mapped.
--dir the base directory where the executable workflow is generated
--sites comma separated list of execution sites.
--output the output site where to transfer the materialized output files.
--submit boolean value whether to submit the planned workflow for execution after planning is done.
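The options above can be assembled into a command line as in the following Python sketch. It only builds the argument list and does not run the planner. The helper name plan_command and the sample paths are hypothetical, and treating --submit as a bare flag (rather than an option that takes a value) is an assumption of this sketch.

```python
def plan_command(dax, submit_dir, sites, output_site, submit=False):
    """Build an argument list for pegasus-plan from the usual options."""
    argv = [
        "pegasus-plan",
        "--dax", dax,                # path to the DAX file to map
        "--dir", submit_dir,         # base directory for the executable workflow
        "--sites", ",".join(sites),  # comma separated list of execution sites
        "--output", output_site,     # site that receives the output files
    ]
    if submit:
        # Assumption: --submit is passed as a bare flag.
        argv.append("--submit")
    return argv


command = plan_command("diamond.dax", "dags", ["hpcc"], "local", submit=True)
print(" ".join(command))
```

The resulting list could be handed to subprocess.run on a host where Pegasus is installed.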
This is the reference guide to the basic properties regarding the Pegasus Workflow Planner, and their respective default values. Please refer to the advanced properties guide to know about all the properties that a user can use to configure the Pegasus Workflow Planner. Please note that the values rely on proper capitalization, unless explicitly noted otherwise.
The default of some properties depends on the value of other properties. As a notation, the curly braces refer to the value of the named property. For instance, ${pegasus.home} means that the value depends on the value of the pegasus.home property plus any noted additions. You can use this notation to refer to other properties, though the extent of the substitutions is limited. Usually, you want to refer to a set of the standard system properties. Nesting is not allowed. Substitutions will only be done once.
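The "no nesting, substitute once" behaviour can be illustrated with a small Python sketch. It is not the Pegasus implementation; the function name substitute and the choice to leave unknown names untouched are assumptions made for the example.

```python
import re

# Matches ${name}; the name itself may not contain a closing brace.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def substitute(value, properties):
    """Replace each ${name} in value with properties[name], exactly once.

    The replaced text is not scanned again, so a reference that appears
    inside a substituted value stays as it is. Unknown names are left
    untouched (an assumption of this sketch).
    """
    return _REFERENCE.sub(
        lambda match: properties.get(match.group(1), match.group(0)), value
    )


props = {"pegasus.home": "/opt/pegasus", "a": "${b}", "b": "x"}
print(substitute("${pegasus.home}/etc/sample.rc.data", props))
print(substitute("${a}", props))
```

The second call shows the single-pass rule: ${a} expands to the literal text ${b}, which is not expanded further.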
Properties are read and evaluated in a fixed order of priority. Usually one does not need to worry about the priorities. However, it is good to know which property applies when, and how one property is able to overwrite another. The following is a mutually exclusive list (highest priority first) of property file locations.
- --conf option to the tools. Almost all of the clients that use properties have a --conf option to specify the property file to pick up.
- submit-dir/pegasus.xxxxxxx.properties file. All tools that work on the submit directory (i.e. after Pegasus has planned a workflow) pick up the pegasus.xxxxxxx.properties file from the submit directory. The location of the pegasus.xxxxxxx.properties file is picked up from the braindump file.
- The properties defined in the user property file ${user.home}/.pegasusrc have lowest priority.
Commandline properties have the highest priority. These override any property loaded from a property file. Each commandline property is introduced by a -D argument. Note that these arguments are parsed by the shell wrapper, and thus the -D arguments must be the first arguments to any command. Commandline properties are useful for debugging purposes.
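The priority rules above can be summarised in a short Python sketch: exactly one property file is chosen, and command-line properties are then layered on top. The function names and the sample values are hypothetical, and the sketch assumes the properties have already been read into dictionaries.

```python
import os


def pick_property_file(conf=None, submit_dir_file=None, home=None):
    """Choose the single property file to read, highest priority first."""
    if conf:                  # 1. file given with the --conf option
        return conf
    if submit_dir_file:       # 2. properties file in the submit directory
        return submit_dir_file
    # 3. lowest priority: the user property file in the home directory
    home = home or os.path.expanduser("~")
    return os.path.join(home, ".pegasusrc")


def effective_properties(file_properties, command_line_properties):
    """Overlay -D command-line properties on the properties from the file."""
    merged = dict(file_properties)
    merged.update(command_line_properties)
    return merged


chosen = pick_property_file(conf="/tmp/my.properties", submit_dir_file="run/pegasus.properties")
merged = effective_properties(
    {"pegasus.catalog.site": "XML3", "pegasus.catalog.replica": "File"},
    {"pegasus.catalog.replica": "JDBCRC"},
)
print(chosen, merged["pegasus.catalog.replica"])
```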
From Pegasus 3.1 release onwards, support has been dropped for the following properties that were used to signify the location of the properties file
- pegasus.properties
- pegasus.user.properties
The following example provides a sensible set of properties to be set by the user property file. These properties use mostly non-default settings. It is an example only, and will not work for you:
pegasus.catalog.replica File
pegasus.catalog.replica.file ${pegasus.home}/etc/sample.rc.data
pegasus.catalog.transformation Text
pegasus.catalog.transformation.file ${pegasus.home}/etc/sample.tc.text
pegasus.catalog.site XML3
pegasus.catalog.site.file ${pegasus.home}/etc/sample.sites.xml3
If you are in doubt about which properties are actually visible, note that during the planning of the workflow Pegasus dumps all properties, after reading and prioritizing them, into a file with the suffix properties in the submit directory.
| Systems: | all |
| Type: | directory location string |
| Default: | "$PEGASUS_HOME" |
The property pegasus.home cannot be set in the property file. This property is automatically set up by the pegasus clients internally by determining the installation directory of pegasus. Knowledge about this property is important for developers who want to invoke PEGASUS JAVA classes without the shell wrappers.
| System: | Pegasus |
| Since: | 2.0 |
| Type: | enumeration |
| Value[0]: | RLS |
| Value[1]: | LRC |
| Value[2]: | JDBCRC |
| Value[3]: | File |
| Value[4]: | MRC |
| Default: | RLS |
Pegasus queries a Replica Catalog to discover the physical filenames (PFN) for input files specified in the DAX. Pegasus can interface with various types of Replica Catalogs. This property specifies which type of Replica Catalog to use during the planning process.
- RLS
- RLS (Replica Location Service) is a distributed replica catalog, which ships with GT4. There is an index service called Replica Location Index (RLI) to which one or more Local Replica Catalogs (LRCs) report. Each LRC can contain all or a subset of the mappings. In this mode, Pegasus queries the central RLI to discover in which LRCs the mappings for an LFN reside. It then queries the individual LRCs for the PFNs. To use RLS, the user additionally needs to set the property pegasus.catalog.replica.url to specify the URL of the RLI to query. Details about RLS can be found at http://www.globus.org/toolkit/data/rls/
- LRC
- Use this mode if you do not want to query the RLI but want to query a single Local Replica Catalog directly. To use LRC, the user additionally needs to set the property pegasus.catalog.replica.url to specify the URL of the LRC to query. Details about RLS can be found at http://www.globus.org/toolkit/data/rls/
- JDBCRC
-
In this mode, Pegasus queries an SQL-based replica catalog that is accessed via JDBC. The SQL schemas for this catalog can be found in the $PEGASUS_HOME/sql directory.
To use JDBCRC, the user additionally needs to set the following properties:
- pegasus.catalog.replica.db.url
- pegasus.catalog.replica.db.user
- pegasus.catalog.replica.db.password
- File
-
In this mode, Pegasus queries a file-based replica catalog. It is neither transactionally safe, nor advised for production use in any way. Multiple concurrent accesses to the file will end up clobbering its contents. The site attribute should be specified whenever possible. The attribute key for the site attribute is "pool".
The LFN may or may not be quoted. If it contains linear whitespace, quotes, a backslash or an equality sign, it must be quoted and escaped. The same applies to the PFN. The attribute key-value pairs are separated by an equality sign without any whitespace. The value may be quoted; the quoting rules for the LFN apply.
LFN PFN
LFN PFN a=b [..]
LFN PFN a="b" [..]
"LFN w/LWS" "PFN w/LWS" [..]
To use File, the user additionally needs to specify pegasus.catalog.replica.file property to specify the path to the file based RC.
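To illustrate the line format, the following Python sketch splits one line of a file-based replica catalog into LFN, PFN and attributes. It approximates the quoting and escaping rules with the shell-style rules of Python's shlex module, which is an assumption and may differ from the Pegasus parser in corner cases. The function name parse_rc_line and the sample entries are hypothetical.

```python
import shlex


def parse_rc_line(line):
    """Split a replica catalog line into (lfn, pfn, attributes).

    Quoting is handled by shlex, as an approximation of the format.
    """
    tokens = shlex.split(line)
    if len(tokens) < 2:
        raise ValueError("expected at least an LFN and a PFN: %r" % line)
    lfn, pfn, *pairs = tokens
    attributes = {}
    for pair in pairs:
        # Attributes are key=value pairs without surrounding whitespace.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % pair)
        attributes[key] = value
    return lfn, pfn, attributes


plain = parse_rc_line('f.a file:///data/f.a pool="local"')
quoted = parse_rc_line('"my file" "/data/my file" pool=local')
print(plain)
print(quoted)
```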
- MRC
-
In this mode, Pegasus queries multiple replica catalogs to discover the file locations on the grid. To use it set
pegasus.catalog.replica MRC
Each associated replica catalog can be configured via properties as follows.
The user associates a variable name, referred to as [value], with each of the catalogs, where [value] is any legal identifier (concretely [A-Za-z][_A-Za-z0-9]*). For each associated replica catalog the user specifies the following properties.
pegasus.catalog.replica.mrc.[value] specifies the type of replica catalog.
pegasus.catalog.replica.mrc.[value].key specifies a property name key for a particular catalog.
For example, a user who wants to query two LRCs at the same time can specify the following:
pegasus.catalog.replica.mrc.lrc1 LRC
pegasus.catalog.replica.mrc.lrc1.url rls://sukhna
pegasus.catalog.replica.mrc.lrc2 LRC
pegasus.catalog.replica.mrc.lrc2.url rls://smarty
In the above example, lrc1 and lrc2 are arbitrary valid identifier names, and url is the property key that needs to be specified.
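The naming scheme can be made concrete with a short Python sketch that groups MRC properties by their identifier. The function name group_mrc and the layout of the returned dictionary are assumptions for illustration only; the sample uses two catalogs, lrc1 and lrc2, each with its own url key.

```python
PREFIX = "pegasus.catalog.replica.mrc."


def group_mrc(properties):
    """Group pegasus.catalog.replica.mrc.* properties by catalog identifier."""
    catalogs = {}
    for name, value in properties.items():
        if not name.startswith(PREFIX):
            continue
        # "lrc1" -> type entry, "lrc1.url" -> setting "url" for lrc1
        identifier, _, key = name[len(PREFIX):].partition(".")
        entry = catalogs.setdefault(identifier, {"type": None, "settings": {}})
        if key:
            entry["settings"][key] = value
        else:
            entry["type"] = value
    return catalogs


catalogs = group_mrc({
    "pegasus.catalog.replica": "MRC",
    "pegasus.catalog.replica.mrc.lrc1": "LRC",
    "pegasus.catalog.replica.mrc.lrc1.url": "rls://sukhna",
    "pegasus.catalog.replica.mrc.lrc2": "LRC",
    "pegasus.catalog.replica.mrc.lrc2.url": "rls://smarty",
})
print(catalogs["lrc1"])
```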
| System: | Site Catalog |
| Since: | 2.0 |
| Type: | enumeration |
| Value[0]: | XML3 |
| Value[1]: | XML |
| Default: | XML3 |
The site catalog file is available in the flavors listed below. The Text and XML formats for the site catalog are deprecated. Users can use the pegasus-sc-converter client to convert their site catalog to the newer XML3 format.
- THIS FORMAT IS DEPRECATED. WILL BE REMOVED IN COMING VERSIONS. USE pegasus-sc-converter to convert XML format to XML3 Format. The "XML" format is an XML-based file. The XML format reads a site catalog conforming to the old site catalog schema available at http://pegasus.isi.edu/wms/docs/schemas/sc-2.0/sc-2.0.xsd
- The "XML3" format is an XML-based file. The XML3 format reads a site catalog conforming to the site catalog schema available at http://pegasus.isi.edu/wms/docs/schemas/sc-3.0/sc-3.0.xsd
| System: | Site Catalog |
| Since: | 2.0 |
| Type: | file location string |
| Default: | ${pegasus.home.sysconfdir}/sites.xml3 or ${pegasus.home.sysconfdir}/sites.xml |
| See also: | pegasus.catalog.site |
Running things on the grid requires an extensive description of the capabilities of each compute cluster, commonly termed "site". This property describes the location of the file that contains such a site description. As the format is currently in flux, please refer to the user guide and Pegasus for details on which format is expected. The default value depends on the value specified for the property pegasus.catalog.site. If the type of site catalog used is XML3, then sites.xml3 is picked up from sysconfdir, else sites.xml.
| System: | Transformation Catalog |
| Since: | 2.0 |
| Type: | enumeration |
| Value[0]: | Text |
| Value[1]: | File |
| Default: | Text |
| See also: | pegasus.catalog.transformation.file |
- Text
-
In this mode, a multiline file-based format is understood. The file is read and cached in memory. Any modification, such as adding or deleting an entry, updates the memory representation and hence the file underneath. All queries are done against the memory representation.
The file sample.tc.text in the etc directory contains an example.
Here is a sample textual format for a transformation catalog containing one transformation on two sites:
tr example::keg:1.0 {
    #specify profiles that apply for all the sites for the transformation
    #in each site entry the profile can be overriden
    profile env "APP_HOME" "/tmp/karan"
    profile env "JAVA_HOME" "/bin/app"
    site isi {
        profile env "me" "with"
        profile condor "more" "test"
        profile env "JAVA_HOME" "/bin/java.1.6"
        pfn "/path/to/keg"
        arch "x86"
        os "linux"
        osrelease "fc"
        osversion "4"
        type "INSTALLED"
    }
    site wind {
        profile env "me" "with"
        profile condor "more" "test"
        pfn "/path/to/keg"
        arch "x86"
        os "linux"
        osrelease "fc"
        osversion "4"
        type "STAGEABLE"
    }
}
- File
- THIS FORMAT IS DEPRECATED. WILL BE REMOVED IN COMING VERSIONS. USE pegasus-tc-converter to convert File format to Text Format.
In this mode, a file format is understood. The file is read and cached in memory. Any modification, such as adding or deleting an entry, updates the memory representation and hence the file underneath. All queries are done against the memory representation. The new TC file format uses 6 columns:
- The resource ID is represented in the first column.
- The logical transformation is represented in the second column, using the colon-separated format ns::name:vs.
- The path to the application on the system is represented in the third column.
- The installation type is identified by one of the following keywords - all upper case: INSTALLED, STAGEABLE. If not specified, or NULL is used, the type defaults to INSTALLED.
- The system is of the format ARCH::OS[:VER:GLIBC]. The following arch types are understood: "INTEL32", "INTEL64", "SPARCV7", "SPARCV9". The following os types are understood: "LINUX", "SUNOS", "AIX". If unset or NULL, defaults to INTEL32::LINUX.
- Profiles are written in the format NS::KEY=VALUE,KEY2=VALUE;NS2::KEY3=VALUE3. Multiple key-values for the same namespace are separated by a comma "," and multiple namespaces are separated by a semicolon ";". If any of your profile values contains a comma, you must not use the namespace abbreviator.
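The profile column format can be illustrated with a small Python sketch. It assumes that no profile value contains a comma or a semicolon, which is exactly the case the abbreviated form is meant for; it is not the Pegasus parser. The function name parse_profiles and the sample profile keys are hypothetical.

```python
def parse_profiles(text):
    """Parse NS::KEY=VALUE,KEY2=VALUE;NS2::KEY3=VALUE3 into nested dicts.

    Assumption: values contain neither "," nor ";".
    """
    profiles = {}
    for group in text.split(";"):        # one group per namespace
        if not group.strip():
            continue
        namespace, sep, pairs = group.partition("::")
        if not sep:
            raise ValueError("missing namespace in %r" % group)
        bucket = profiles.setdefault(namespace.strip(), {})
        for pair in pairs.split(","):    # key=value pairs of that namespace
            key, eq, value = pair.partition("=")
            if not eq:
                raise ValueError("expected KEY=VALUE, got %r" % pair)
            bucket[key.strip()] = value.strip()
    return profiles


parsed = parse_profiles("env::APP_HOME=/tmp/app,JAVA_HOME=/bin/java;condor::universe=vanilla")
print(parsed)
```

Because a namespace may appear in several groups, writing NS::KEY=VALUE;NS::KEY2=VALUE2 yields the same result as the abbreviated comma form.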
| Systems: | Transformation Catalog |
| Type: | file location string |
| Default: | ${pegasus.home.sysconfdir}/tc.text or ${pegasus.home.sysconfdir}/tc.data |
| See also: | pegasus.catalog.transformation |
This property is used to set the path to the textual transformation catalogs of type File or Text. If the transformation catalog is of type Text, then the tc.text file is picked up from sysconfdir, else tc.data.
Pegasus supports a number of execution environments. An execution environment is a setup where jobs from a workflow are running.
The simplest execution environment does not involve Condor. Pegasus is capable of planning small workflows for full-local execution using a shell planner. Please refer to the examples directory in your Pegasus installation, the shell planner's documentation section, or the tutorials, for details.
A slightly more challenging setup is still all-local, but Condor manages queued jobs and Condor DAGMan the workflow. This setup permits limited parallelism with full scalability. With proper setup, Condor can scavenge cycles from unused computers in your department, enabling a pool of more than just a single machine.
The vanilla setup consists of workflows that are executed in a grid environment. This setup requires a number of configurations, and an understanding of remote sites.
Various advanced execution environment setups deal with minimally using Globus to obtain remote resources (Glide-ins), by-passing Globus completely (Condor-C), or supporting cloud computing (Nimbus, Eucalyptus, EC2). You should be familiar with the Grid Execution Environment, as some concepts are borrowed from it.
The rest of this chapter will focus on the vanilla grid execution environment.
The vanilla grid environment is shown in the figure above. We will work from the left to the right top, then the right bottom.
On the left side, you have a submit machine where Pegasus runs, Condor schedules jobs, and workflows are executed. We call it the submit host (SH), though its functionality can be assumed by a virtual machine image. In order to properly communicate over secured channels, it is important that the submit machine has a proper notion of time, i.e. runs an NTP daemon to keep accurate time. To be able to connect to remote clusters and receive connections from the remote clusters, the submit host has a public IP address to facilitate this communication.
In order to send a job request to the remote cluster, Condor wraps the job into Globus calls via Condor-G. Globus uses GRAM to manage jobs on remote sites. In terms of a software stack, Pegasus wraps the job into Condor. Condor wraps the job into Globus. Globus transports the job to the remote site, and unwraps the Globus component, sending it to the remote site's resource manager (RM).
To be able to communicate using the Globus security infrastructure (GSI), the submit machine needs to have the certificate authority (CA) certificates configured, requires a host certificate in certain circumstances, and the user a user certificate that is enabled on the remote site. On the remote end, the remote gatekeeper node requires a host certificate, all signing CA certificate chains and policy files, and a goot time source.
In a grid environment, there are one or more clusters accessible via grid middleware like the Globus Toolkit. In case of Globus, there is the Globus gatekeeper listening on TCP port 2119 of the remote cluster. The port is opened to a single machine called head node (HN).The head-node is typically located in a de-militarized zone (DMZ) of the firewall setup, as it requires limited outside connectivity and a public IP address so that it can be contacted. Additionally, once the gatekeeper accepted a job, it passes it on to a jobmanager. Often, these jobmanagers require a limited port range, in the example TCP ports 40000-41000, to call back to the submit machine.
For the user to be able to run jobs on the remote site, the user must have some form of an account on the remote site. The user's grid identity is passed from the submit host. An entity called the grid mapfile on the gatekeeper maps the user's grid identity to a remote account. While most sites do not permit account sharing, it is possible to map multiple user certificates to the same account.
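As a rough illustration of the identity mapping described above, the following Python sketch parses grid mapfile style entries. It assumes the commonly used layout of a quoted certificate subject followed by a local account name; the subjects and account names are made up, and real mapfiles may contain syntax that this sketch ignores.

```python
import shlex


def parse_grid_mapfile(text):
    """Map each quoted certificate subject (DN) to its local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the double quotes around the DN
        if len(parts) < 2:
            continue  # malformed entry: no account given
        mapping[parts[0]] = parts[1]
    return mapping


# Hypothetical entries: two grid identities mapped to the same account.
SAMPLE = '''
# subject                                        account
"/DC=org/DC=example/OU=People/CN=Jane Doe" jdoe
"/DC=org/DC=example/OU=People/CN=John Roe" jdoe
'''

if __name__ == "__main__":
    for subject, account in parse_grid_mapfile(SAMPLE).items():
        print(account, "<-", subject)
```

Both sample identities resolve to the same account, which is the account-sharing case mentioned above.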
The gatekeeper is the interface through which jobs are submitted to the remote cluster's resource manager. A resource manager is a scheduling system like PBS, Maui, LSF, FBSNG or Condor that queues tasks and allocates worker nodes. The worker nodes (WN) in the remote cluster might not have outside connectivity and often use all private IP addresses. The Globus toolkit requires a shared filesystem to properly stage files between the head node and worker nodes.
Note
The shared filesystem requirement is imposed by Globus. Pegasus is capable of supporting advanced site layouts that do not require a shared filesystem. Please contact us for details, should you require such a setup.
To stage data between external sites for the job, it is recommended to enable a GridFTP server. If a shared networked filesystem is involved, the GridFTP server should be located as close to the file server as possible. The GridFTP server requires TCP port 2811 for the control channel, and a limited port range for data channels, here as an example the TCP ports from 40000 to 41000. The GridFTP server requires a host certificate, the signing CA chain and policy files, a stable time source, and a grid mapfile that maps between a user's grid identity and the user's account on the remote site.
The GridFTP server is often installed on the head node, the same as the gatekeeper, so that they can share the grid mapfile, CA certificate chains and other setups. However, for performance purposes it is recommended that the GridFTP server has its own machine.
The previous figure shows a sample layout for sky computing (as in: multiple clouds) as supported by Pegasus. At this point, it is up to the user to provision the remote resources with a proper VM image that includes a Condor startd and a proper Condor configuration to report back to a Condor collector that the Condor schedd has access to.
In this discussion, the submit host (SH) is located logically external to the cloud provider(s). The SH is the point where a user submits Pegasus workflows for execution. This site typically runs a Condor collector to gather resource announcements, or is part of a larger Condor pool that collects these announcements. Condor makes the remote resources available to the submit host’s Condor installation.
The figure above shows the way Pegasus WMS is deployed in cloud computing resources, ignoring how these resources were provisioned. The provisioning request shows multiple resources per provisioning request.
The provisioning broker (Nimbus, Eucalyptus or EC2) at the remote site is responsible for allocating and setting up the resources. For a multi-node request, the worker nodes often require access to a form of shared data storage. Concretely, either a POSIX-compliant shared file system (e.g. NFS, PVFS) is available to the nodes, or one can be brought up for the lifetime of the application workflow. The task steps of the application workflow use the shared file system to exchange intermediary results between tasks on the same cloud site. Pegasus also supports an S3 data mode for the application workflow data staging.
The initial stage-in and final stage-out of application data into and out of the node set is part of any Pegasus-planned workflow. Several configuration options exist in Pegasus to deal with the dynamics of push and pull of data, and when to stage data. In many use-cases, some form of external access to or from the shared file system that is visible to the application workflow is required to facilitate successful data staging. However, Pegasus is prepared to deal with a set of boundary cases.
The data server in the figure is shown at the submit host. This is not a strict requirement. The data server for consumed data and data products may both be different and external to the submit host.
Once resources begin appearing in the pool managed by the submit machine’s Condor collector, the application workflow can be submitted to Condor. A Condor DAGMan will manage the application workflow execution. Pegasus run-time tools obtain timing, performance, and provenance information as the application workflow is executed. At this point, it is the user's responsibility to de-provision the allocated resources.
In the figure, the cloud resources on the right side are assumed to have uninhibited outside connectivity. This enables the Condor I/O to communicate with the resources. The right side includes a setup where the worker nodes use only private IP addresses, but have outgoing connectivity and a NAT router to talk to the internet. The Condor connection broker (CCB) facilitates this setup almost effortlessly.
The left side shows a more difficult setup where the connectivity is fully firewalled without any connectivity except to in-site nodes. In this case, a proxy server process, the generic connection broker (GCB), needs to be set up in the DMZ of the cloud site to facilitate Condor I/O between the submit host and worker nodes.
If the cloud supports data storage servers, Pegasus is starting to support workflows that require staging in two steps: consumed data is first staged to a data server in the remote site's DMZ, and then a second staging task moves the data from the data server to the worker node where the job runs. For staging out, data first needs to be staged from the job's worker node to the site's data server, and possibly from there to another data server external to the site. Pegasus is capable of planning both steps: normal staging to the site's data server, and the worker-node staging from and to the site's data server as part of the job. We are working on expanding the current code to support a more generic set by Pegasus 3.1.
Ideally, any allocation of multiple resources shares a file system among them. This can be either a globally shared file system like NFS, or a per-request file system dynamically allocated like PVFS or NFS. Often, for the sake of speed, it is advisable that each resource has its own fast local non-shared disk space that can be transiently used by application workflow jobs.
Pegasus recommends the use of a shared file system, because it simplifies workflows and may even improve performance. Without a shared file system, the same data may need to be staged multiple times to the same site, once for each resource that needs it. Without a shared file system, data products of a parent job also need to be staged out and staged back in to the child job, currently via an external data server, even if parent and child run on the same multi-resource allocation.
The Pegasus team explores the option of dynamically bringing up an NFS server on one resource of a multi-resource allocation for the duration of the allocation. This option is part of the configuration for an experiment’s workflow’s provisioning stage. We do not expect cloud sites to ban NFS and RPC traffic between resources of the same multi-resource allocation.
This section discusses the various resource configurations that can be used with Pegasus when Condor DAGMan is used as the workflow execution engine. It is assumed that there is a submit host where the workflow is submitted for execution. The following classification is based on the mechanism used for submitting jobs to the Grid resources and monitoring them. The classifications explored in this document are using Globus GRAM and using a Condor pool. Both configurations use Condor DAGMan to maintain the dependencies between the jobs, but differ in how the jobs are launched. A combination of the above-mentioned approaches is also possible, where some of the tasks in the workflow are executed in the Condor pool and the rest are executed on remote resources using Condor-G.
In this configuration, Pegasus schedules the jobs to run locally on the submit host. This is achieved by executing the workflow on site local (--sites local option to pegasus-plan). The site "local" is a reserved site in Pegasus and results in the jobs running on the submit host in the Condor local universe.
In this configuration, it is assumed that the target execution system consists of one or more Grid resources. These resources may be geographically distributed and under various administrative domains. Each resource might be a single desktop computer, a network of workstations (NOW), or a cluster of dedicated machines. However, each resource must provide a Globus GRAM interface which allows users to submit jobs remotely. If the Grid resource consists of multiple compute nodes, e.g. a cluster or a network of workstations, there is a central entity called the head node that acts as the single point of job submission to the resource. It is generally possible to specify whether the submitted job should run on the head node of the resource or on a worker node in the cluster or NOW. In the latter case, the head node is responsible for submitting the job to a local resource management system (PBS, LSF, Condor etc.) which controls all the machines in the resource. Since the head node is the central point of job submission to the resource, it should not be used for job execution, as that can overload the head node and delay further job submissions. Pegasus does not make any assumptions about the configuration of the remote resource; rather, it provides the mechanisms by which such distinctions can be made.
In this configuration, Condor-G is used for submitting jobs to these resources. Condor-G is an extension to Condor that allows jobs to be described in a Condor submit file; when the job is submitted to Condor for execution, Condor-G uses the Globus GRAM interface to submit the job to the remote resource and monitor its execution. The distinction is made in the Condor submit files by specifying the universe as grid, and the grid_resource or globusscheduler attribute is used to indicate the location of the head node for the remote resource. Thus, Condor DAGMan is used for maintaining the dependencies between the jobs, and Condor-G is used to launch the jobs on the remote resources using GRAM. The implicit assumption in this case is that all the worker nodes on a remote resource have access to a shared file system that can be used for data transfer between the tasks mapped to that resource. This data transfer is done using files.
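To make the submit file attributes mentioned above more concrete, here is a minimal Python sketch that renders a Condor-G style submit description. It is only an illustration and not what Pegasus itself generates; the helper name condor_g_submit, the output file names, and the executable are hypothetical, and the gt2 prefix assumes a GRAM2 head node.

```python
def condor_g_submit(executable, contact, arguments=""):
    """Render a minimal Condor-G submit description for a GRAM2 (gt2) head node."""
    lines = [
        "universe = grid",                    # selects Condor-G
        f"grid_resource = gt2 {contact}",     # location of the remote head node
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        "output = job.out",                   # hypothetical file names
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(condor_g_submit("/bin/hostname", "smarty.isi.edu/jobmanager-pbs"))
```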
Below is an example of a site configured to use Globus GRAM2 or GRAM5:
<site handle="isi" arch="x86" os="LINUX" osrelease="" osversion="" glibc="">
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/>
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/>
<head-fs>
<scratch>
<shared>
<file-server protocol="gsiftp" url="gsiftp://skynet-data.isi.edu" \
mount-point="/nfs/scratch01"/>
<internal-mount-point mount-point="/nfs/scratch01"/>
</shared>
</scratch>
<storage>
<shared>
<file-server protocol="gsiftp" url="gsiftp://skynet-data.isi.edu" \
mount-point="/exports/storage01"/>
<internal-mount-point mount-point="/exports/storage01"/>
</shared>
</storage>
</head-fs>
<replica-catalog type="LRC" url="rlsn://smarty.isi.edu" />
</site>
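When checking a site catalog entry like the one above, it can help to pull out the GRAM contacts and the scratch location programmatically. The sketch below does this with Python's standard XML parser. It assumes a trimmed, well-formed copy of the entry (the backslash line continuations used for display are not valid inside real XML), and the function name summarize_site is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the site entry, written without display line continuations.
SITE_XML = """
<site handle="isi" arch="x86" os="LINUX">
  <grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/>
  <grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/>
  <head-fs>
    <scratch>
      <shared>
        <file-server protocol="gsiftp" url="gsiftp://skynet-data.isi.edu" mount-point="/nfs/scratch01"/>
        <internal-mount-point mount-point="/nfs/scratch01"/>
      </shared>
    </scratch>
  </head-fs>
</site>
"""


def summarize_site(xml_text):
    """Return the site handle, the GRAM contact per job type, and the scratch URL."""
    site = ET.fromstring(xml_text.strip())
    grids = {g.get("jobtype"): g.get("contact") for g in site.findall("grid")}
    server = site.find("head-fs/scratch/shared/file-server")
    scratch_url = server.get("url") + server.get("mount-point")
    return {"handle": site.get("handle"), "grids": grids, "scratch_url": scratch_url}


if __name__ == "__main__":
    print(summarize_site(SITE_XML))
```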
A Condor pool is a set of machines that use Condor for resource management. A Condor pool can be a cluster of dedicated machines or a set of distributively owned machines. Pegasus can generate concrete workflows that can be executed on a Condor pool.
The workflow is submitted using DAGMan from one of the job submission machines in the Condor pool. It is the responsibility of the Central Manager of the pool to match the tasks in the workflow submitted by DAGMan to the execution machines in the pool. This matching process can be guided by including Condor-specific attributes in the submit files of the tasks. If the user wants to execute the workflow on the execution machines (worker nodes) in a Condor pool, there should be a resource defined in the site catalog which represents these execution machines. The universe attribute of the resource should be vanilla. There can be multiple resources associated with a single Condor pool, where each resource identifies a subset of machines (worker nodes) in the pool. Pegasus currently uses the FileSystemDomain classad attribute to restrict the set of machines that make up a single resource. To clarify this point, suppose there are certain execution machines in the Condor pool whose FileSystemDomain is set to “viz.isi.edu”. If the user wants to execute the workflow on these machines, then there should be a site, say “isi_viz”, defined in the site catalog, and the FileSystemDomain and universe attributes for this resource should be defined as “viz.isi.edu” and “vanilla” respectively. When invoking Pegasus, the user should select isi_viz as the compute resource.
<site handle="isi_viz" arch="x86" os="LINUX" osrelease="" osversion="" glibc="">
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/>
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/>
<head-fs>
<scratch>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" \
mount-point="/nfs/scratch01"/>
<internal-mount-point mount-point="/nfs/scratch01"/>
</shared>
</scratch>
<storage>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" \
mount-point="/exports/storage01"/>
<internal-mount-point mount-point="/exports/storage01"/>
</shared>
</storage>
</head-fs>
<replica-catalog type="LRC" url="rlsn://smarty.isi.edu"/>
<profile namespace="condor" key="universe">vanilla</profile>
<profile namespace="condor" key="FileSystemDomain">viz.isi.edu</profile>
</site>
Pegasus WMS can execute workflows over a Condor pool. This pool can contain machines managed by a single institution or department and belonging to a single administrative domain. This is the case for most Condor pools. In this section we describe how machines from different administrative domains and supercomputing centers can be dynamically added to a Condor pool for a certain timeframe. These machines join the Condor pool temporarily and can be used to execute jobs in a non-preemptive manner. This functionality is achieved using a Condor feature called Glide-in (http://cs.wisc.edu/condor/glidein) that uses the Globus GRAM interface for migrating machines from different domains to a Condor pool. The number of machines and the duration for which they are required can be specified.
In this case, we use the abstraction of a local Condor pool to execute the jobs in the workflow over remote resources that have joined the pool for a certain timeframe. Details about the use of this feature can be found in the Condor manual (http://cs.wisc.edu/condor/manual/).
A basic step to glide in nodes to a local Condor pool is shown below.
$ condor_glidein -count 10 gatekeeper.site.edu/jobmanager-pbs
GlideIn of Remote Globus Resources
The above step glides in 10 nodes to the user’s local Condor pool from the remote PBS scheduler running at gatekeeper.site.edu. By default, the glide-in binaries are installed in the user's home directory on the remote host.
It is possible for the Condor pool to contain resources from multiple Grid sites. Normally, the resources from a particular site share the same file system and thus use the same FileSystemDomain attribute when advertising their presence to the Central Manager of the pool. Users who want to run jobs on machines from a particular Grid site have to specify the FileSystemDomain attribute in the requirements classad expression in the submit files, with a value matching the FileSystemDomain of the machines from that site. For example, the user migrates nodes from the ISI VIZ cluster (with FileSystemDomain viz.isi.edu) into a Condor pool and specifies FileSystemDomain == “viz.isi.edu”. Condor would then schedule the jobs only on the nodes from the ISI VIZ cluster in the local Condor pool. The FileSystemDomain can be specified for an execution site in the site catalog with the condor profile namespace as follows:
<site handle="isi_viz" arch="x86" os="LINUX" osrelease="" osversion="" glibc="">
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/>
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/>
<head-fs>
<scratch>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" \
mount-point="/nfs/scratch01"/>
<internal-mount-point mount-point="/nfs/scratch01"/>
</shared>
</scratch>
<storage>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" \
mount-point="/exports/storage01"/>
<internal-mount-point mount-point="/exports/storage01"/>
</shared>
</storage>
</head-fs>
<replica-catalog type="LRC" url="rlsn://smarty.isi.edu"/>
<profile namespace="pegasus" key="style">glidein</profile>
<profile namespace="condor" key="universe">vanilla</profile>
<profile namespace="condor" key="FileSystemDomain">viz.isi.edu</profile>
</site>
Specifying the FileSystemDomain key in the condor namespace for a site triggers Pegasus into generating the requirements classad expression in the submit file for all the jobs scheduled on that particular site. For example, in the above case all jobs scheduled on site isi_viz would have the following expression in the submit file.
requirements = FileSystemDomain == "viz.isi.edu"
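The translation from a site profile to a submit file expression can be pictured with a small Python sketch. This is not the Pegasus planner code, only an illustration of the rule described above; the function name requirements_for and the dictionary layout keyed by (namespace, key) are assumptions made for the example.

```python
def requirements_for(profiles):
    """Build the requirements line from a site's condor FileSystemDomain profile.

    profiles maps (namespace, key) pairs to values, e.g.
    {("condor", "FileSystemDomain"): "viz.isi.edu"}.
    Returns None when the site does not set the profile.
    """
    domain = profiles.get(("condor", "FileSystemDomain"))
    if domain is None:
        return None
    return f'requirements = FileSystemDomain == "{domain}"'


if __name__ == "__main__":
    site_profiles = {
        ("pegasus", "style"): "glidein",
        ("condor", "universe"): "vanilla",
        ("condor", "FileSystemDomain"): "viz.isi.edu",
    }
    print(requirements_for(site_profiles))
```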
glideinWMS is a glidein system widely used on Open Science Grid. Pegasus has a convenience style named glideinWMS to make running workflows on top of glideinWMS easy. This style is similar to the Condor style, but provides some more defaults set up to work well with glideinWMS. All you have to do is specify the style profile in the site catalog:
<site handle="isi_viz" arch="x86" os="LINUX" osrelease="" osversion="" glibc="">
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/>
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/>
<head-fs>
<scratch>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" \
mount-point="/nfs/scratch01"/>
<internal-mount-point mount-point="/nfs/scratch01"/>
</shared>
</scratch>
<storage>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" \
mount-point="/exports/storage01"/>
<internal-mount-point mount-point="/exports/storage01"/>
</shared>
</storage>
</head-fs>
<replica-catalog type="LRC" url="rlsn://smarty.isi.edu"/>
<profile namespace="pegasus" key="style">glideinwms</profile>
</site>
Planned jobs will have universe, requirements and rank set:
universe = vanilla
rank = DaemonStartTime
requirements = ((GLIDEIN_Entry_Name == "isi_viz") || (TARGET.Pegasus_Site == "isi_viz")) \
&& (IS_MONITOR_VM == False) && (Arch != "") && (OpSys != "") \
&& (Disk != -42) && (Memory > 1) && (FileSystemDomain != "")
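The requirements expression shown above follows a fixed pattern around the site handle. The Python sketch below reproduces that pattern as a template, assuming that both GLIDEIN_Entry_Name and TARGET.Pegasus_Site are compared against the site handle; the function name is hypothetical and this is not the planner's own code.

```python
def glideinwms_requirements(site_handle):
    """Render the requirements line for a site using the glideinwms style pattern."""
    return (
        f'requirements = ((GLIDEIN_Entry_Name == "{site_handle}")'
        f' || (TARGET.Pegasus_Site == "{site_handle}"))'
        ' && (IS_MONITOR_VM == False) && (Arch != "") && (OpSys != "")'
        ' && (Disk != -42) && (Memory > 1) && (FileSystemDomain != "")'
    )


if __name__ == "__main__":
    print("universe = vanilla")
    print("rank = DaemonStartTime")
    print(glideinwms_requirements("isi_viz"))
```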
This section describes the changes required in the site catalog for Pegasus to generate an executable workflow that uses gLite blahp to submit directly to PBS on the local machine. This mode of submission should only be used when Condor on the submit host can talk directly to the scheduler running on the cluster. It is recommended that the cluster that gLite talks to is designated as a separate compute site in the Pegasus site catalog. To tag a site as a gLite site, the following two profiles need to be specified for the site in the site catalog:
pegasus profile style with value set to glite.
condor profile grid_resource with value set to pbs|lsf
An example site catalog entry for a gLite site looks as follows:
<site handle="isi_viz_glite" arch="x86" os="LINUX" osrelease="" osversion="" glibc="">
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/>
<grid type="gt2" contact="smarty.isi.edu/jobmanager-pbs" scheduler="PBS" jobtype="compute"/>
<head-fs>
<scratch>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" \
mount-point="/nfs/scratch01"/>
<internal-mount-point mount-point="/nfs/scratch01"/>
</shared>
</scratch>
<storage>
<shared>
<file-server protocol="gsiftp" url="gsiftp://viz-login.isi.edu" \
mount-point="/exports/storage01"/>
<internal-mount-point mount-point="/exports/storage01"/>
</shared>
</storage>
</head-fs>
<replica-catalog type="LRC" url="rlsn://smarty.isi.edu" />
<!-- following profiles reqd for glite grid style-->
<profile namespace="pegasus" key="style">glite</profile>
<profile namespace="condor" key="grid_resource">pbs</profile>
</site>
As part of applying the style to the job, Pegasus adds the following classad expressions to the job description.
+remote_queue - value picked up from globus profile queue
+remote_cerequirements - See below
The remote CE requirements are constructed from the following profiles associated with the job. The profiles for a job are derived from various sources:
transformation catalog
site catalog
DAX
user properties
The following globus profiles, if associated with the job, are picked up and translated to the corresponding gLite keys:
hostcount -> PROCS
count -> NODES
maxwalltime -> WALLTIME
The following condor profiles, if associated with the job, are picked up and translated to the corresponding gLite keys:
priority -> PRIORITY
All the env profiles are translated to MYENV
The remote_cerequirements expression is constructed on the basis of the profiles associated with the job. An example +remote_cerequirements classad expression in the submit file is listed below.
+remote_cerequirements = "PROCS==18 && NODES==1 && PRIORITY==10 && WALLTIME==3600 \
&& PASSENV==1 && JOBNAME==\"TEST JOB\" && MYENV ==\"JAVA_HOME=/bin/java,APP_HOME=/bin/app\""
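The profile translation described above can be sketched in Python as follows. This is a simplified illustration, not the Pegasus implementation: it covers only the globus, condor and env mappings listed in the text, passes values through unchanged (any unit conversion is not modelled), omits the PASSENV and JOBNAME terms seen in the example, and may order terms differently. The function name and argument layout are assumptions.

```python
# Key translations taken from the lists above.
GLOBUS_TO_GLITE = {"hostcount": "PROCS", "count": "NODES", "maxwalltime": "WALLTIME"}
CONDOR_TO_GLITE = {"priority": "PRIORITY"}


def remote_cerequirements(globus=None, condor=None, env=None):
    """Build a +remote_cerequirements line from per-namespace profile dicts."""
    terms = []
    for key, glite_key in GLOBUS_TO_GLITE.items():
        if globus and key in globus:
            terms.append(f"{glite_key}=={globus[key]}")
    for key, glite_key in CONDOR_TO_GLITE.items():
        if condor and key in condor:
            terms.append(f"{glite_key}=={condor[key]}")
    if env:
        # All env profiles are folded into one comma-separated MYENV value.
        joined = ",".join(f"{name}={value}" for name, value in env.items())
        terms.append(f'MYENV==\\"{joined}\\"')
    return '+remote_cerequirements = "' + " && ".join(terms) + '"'


if __name__ == "__main__":
    print(remote_cerequirements(
        globus={"hostcount": 18, "count": 1, "maxwalltime": 3600},
        condor={"priority": 10},
        env={"JAVA_HOME": "/bin/java", "APP_HOME": "/bin/app"},
    ))
```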
gLite blahp does not follow the remote_initialdir or initialdir classad directives. Hence, all the jobs that have this style applied don't have a remote directory specified in the submit directory. Instead, Pegasus relies on kickstart to change to the working directory when the job is launched on the remote node.
It is possible to run on a Condor pool without the worker nodes sharing a filesystem, relying on Condor to do the file transfers. However, for Condor file transfer to deliver the input data for a job, all the input data
Must be on the submit host
The input files should be catalogued as file URLs in the Replica Catalog, with the pool attribute set to local for all the locations. To make sure that local URLs are always picked up, use the Local Replica Selector.
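A quick pre-flight check of these two conditions might look like the Python sketch below. It assumes the replica entries have already been read into (logical name, physical URL, pool) tuples; that tuple layout and the function name are hypothetical and do not reflect a Pegasus API.

```python
from urllib.parse import urlparse


def condor_io_problems(replicas):
    """Return a list of problems for replica entries unusable with Condor file transfer.

    replicas is an iterable of (lfn, pfn, pool) tuples.
    """
    problems = []
    for lfn, pfn, pool in replicas:
        if urlparse(pfn).scheme != "file":
            problems.append(f"{lfn}: {pfn} is not a file URL")
        if pool != "local":
            problems.append(f"{lfn}: pool is {pool!r}, expected 'local'")
    return problems


if __name__ == "__main__":
    entries = [
        ("f.a", "file:///data/inputs/f.a", "local"),
        ("f.b", "gsiftp://example.com/data/f.b", "isi"),
    ]
    for problem in condor_io_problems(entries):
        print(problem)
```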
To set up Pegasus to plan workflows for this configuration, you need to create a 'local-condor' entry in your site catalog. The site configuration is similar to what you would create for running on a local Condor pool (see previous section).
<site handle="local-condor" arch="x86" os="LINUX">
<grid type="gt2" contact="localhost/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/>
<grid type="gt2" contact="localhost/jobmanager-condor" scheduler="unknown" jobtype="compute"/>
<head-fs>
<!-- the paths for scratch filesystem are the paths on local site as we execute create dir job
on local site. Improvements planned for 3.1 release.-->
<scratch>
<shared>
<file-server protocol="file" url="file:///" mount-point="/submit-host/scratch"/>
<internal-mount-point mount-point="/submit-host/scratch"/>
</shared>
</scratch>
<storage>
<shared>
<file-server protocol="file" url="file:///" mount-point="/glusterfs/scratch"/>
<internal-mount-point mount-point="/glusterfs/scratch"/>
</shared>
</storage>
</head-fs>
<replica-catalog type="LRC" url="rlsn://dummyValue.url.edu" />
<profile namespace="env" key="PEGASUS_HOME" >/cluster-software/pegasus/2.4.1</profile>
<profile namespace="env" key="GLOBUS_LOCATION" >/cluster-software/globus/5.0.1</profile>
<!-- profiles for site to be treated as condor pool -->
<profile namespace="pegasus" key="style" >condor</profile>
<profile namespace="condor" key="universe" >vanilla</profile>
<!-- required for Condor File IO and transferring stdout and stderr back -->
<profile namespace="condor" key="should_transfer_files">YES</profile>
<profile namespace="condor" key="transfer_output">true</profile>
<profile namespace="condor" key="transfer_error">true</profile>
<profile namespace="condor" key="WhenToTransferOutput">ON_EXIT</profile>
<!-- to enable kickstart staging from local site-->
<profile namespace="condor" key="transfer_executable">true</profile>
</site>
Next, you need to update pegasus.properties to tell Pegasus to turn on Condor File IO and execute jobs on the worker node's filesystem.
pegasus.data.configuration = Condor
By default, Pegasus creates a workflow-specific execution directory that is associated with the compute site (in this case local-condor). However, in this scenario, the workflow-specific execution directory needs to be created on the submit host, as native Condor File I/O only works against local directories. To ensure that the create dir job for the workflow runs on the submit host for the local-condor site, you need to do the following:
The paths for the scratch filesystem in the site catalog entry for the local-condor site should refer to paths on the submit host, as illustrated in the example site catalog entry above.
Have an entry in the transformation catalog for pegasus::dirmanager for site local-condor with a condor profile that makes the create dir job run in the local universe instead of vanilla.
# This entry for pegasus-dirmanager is to force the create dir jobs to run in the local universe instead of
# the vanilla universe. The CONDOR transfer settings are set so that stdout and stderr are
# not redirected to initialdir instead of the submit dir
local-condor pegasus::dirmanager /path/on/submithost/pegasus/bin/dirmanager INSTALLED INTEL32::LINUX CONDOR::universe="local";CONDOR::requirements="";CONDOR::transfer_output="";CONDOR::transfer_error="";CONDOR::whentotransferoutput=""
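The last field of the transformation catalog entry packs several profiles into one semicolon-separated string. The Python sketch below shows one way to split such a string into (namespace, key) pairs for inspection. It assumes the NAMESPACE::key="value" layout shown in the entry and that values contain no semicolons; the function name is hypothetical.

```python
def parse_tc_profiles(field):
    """Split a NAMESPACE::key="value";... profile string into a dictionary."""
    profiles = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        qualified_key, _, value = item.partition("=")
        namespace, _, key = qualified_key.partition("::")
        profiles[(namespace.lower(), key)] = value.strip('"')
    return profiles


if __name__ == "__main__":
    field = ('CONDOR::universe="local";CONDOR::requirements="";'
             'CONDOR::transfer_output="";CONDOR::transfer_error="";'
             'CONDOR::whentotransferoutput=""')
    for (namespace, key), value in parse_tc_profiles(field).items():
        print(namespace, key, repr(value))
```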
Note
In Pegasus 3.1, running workflows in this setup will be made much simpler.
In order to use Amazon to execute workflows you need to a) set up an execution environment in EC2, and b) configure Pegasus to plan workflows for that environment.
There are many different ways to set up the execution environment in Amazon. The easiest way is to use a submit machine outside the cloud, and to provision several worker nodes and a file server node in the cloud as shown here:
The submit machine runs Pegasus and a Condor master (collector, schedd, negotiator), the workers run a Condor startd, and the file server node exports an NFS file system. The workers' startd is configured to connect to the master running outside the cloud. Each worker also mounts the NFS file system. More information on setting up Condor for this environment can be found at http://www.isi.edu/~gideon/condor-ec2/.
To set up Pegasus to plan workflows for Amazon, you need to create an 'ec2' entry in your site catalog. The site configuration is similar to what you would create for running on a local Condor pool (see previous section).
<site handle="ec2" arch="x86" os="LINUX" osrelease="" osversion="" glibc="">
<!-- These entries are required, but not used -->
<grid type="gt2" contact="example.com/jobmanager-pbs" scheduler="PBS" jobtype="auxillary"/>
<grid type="gt2" contact="example.com/jobmanager-fork" scheduler="PBS" jobtype="compute"/>
<head-fs>
<scratch>
<shared>
<file-server protocol="gsiftp" url="gridftp://example.com" \
mount-point="/nfs"/>
<internal-mount-point mount-point="/nfs"/>
</shared>
</scratch>
<storage>
<shared>
<file-server protocol="gsiftp" url="gridftp://example.com" \
mount-point=""/>
<internal-mount-point mount-point=""/>
</shared>
</storage>
</head-fs>
<replica-catalog type="LRC" url="rlsn://smarty.isi.edu" />
<!-- following profiles reqd for ec2 -->
<profile namespace="env" key="PEGASUS_HOME">/usr/local/pegasus/default</profile>
<!-- The directory where a user wants to run the jobs on the
nodes retrieved from ec2 -->
<profile namespace="env" key="wntmp">/mnt</profile>
<!-- Use condor style vanilla universe jobs -->
<profile namespace="pegasus" key="style">condor</profile>
<profile namespace="condor" key="universe">vanilla</profile>
<!-- This ensures that Condor sends stdout and stderr back to the submit host -->
<profile namespace="condor" key="should_transfer_files">YES</profile>
<profile namespace="condor" key="transfer_output">true</profile>
<profile namespace="condor" key="transfer_error">true</profile>
<profile namespace="condor" key="WhenToTransferOutput">ON_EXIT</profile>
<!-- This ensures that the jobs generated by Pegasus will run on the remote workers -->
<profile namespace="condor" key="requirements">(Arch==Arch)&amp;&amp;(Disk!=0)&amp;&amp;(Memory!=0)&amp;&amp;(OpSys==OpSys)&amp;&amp;(FileSystemDomain!="")</profile>
</site>
This section shows you how to use S3 to store intermediate files. In this mode, Pegasus transfers workflow inputs from the input site to S3 if they are not already in S3. When a job runs, the inputs for that job are fetched from S3 to the worker node, the job is executed, and then the output files are transferred from the worker node back to S3. This is similar to how second-level staging (SLS) works, and, in fact, the current S3 implementation is an extension of SLS.
In order to use S3 for your workflows, you need to first create a config file for the S3 transfer client, s3cmd. This config file needs to be installed on both the submit host, and the worker nodes. Save the file as 's3cmd.cfg'. This file will contain your Amazon security tokens, so you will want to set the permissions so that other users cannot read it (e.g. chmod 600 s3cmd.cfg). In the listing below, replace 'YOUR_AMAZON_ACCESS_KEY_HERE' and 'YOUR_AMAZON_SECRET_KEY_HERE' with your Amazon access key and secret key.
[default]
access_key = YOUR_AMAZON_ACCESS_KEY_HERE
secret_key = YOUR_AMAZON_SECRET_KEY_HERE
acl_public = False
bucket_location = US
debug_syncmatch = False
default_mime_type = binary/octet-stream
delete_removed = False
dry_run = False
encrypt = False
force = False
gpg_command = /usr/bin/gpg
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase =
guess_mime_type = False
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
human_readable_sizes = False
preserve_attrs = True
proxy_host =
proxy_port = 0
recv_chunk = 4096
send_chunk = 4096
simpledb_host = sdb.amazonaws.com
use_https = False
verbosity = WARNING
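Before running a workflow, it can be worth verifying that the config file is readable only by its owner and that the placeholder keys have been replaced. The Python sketch below is one possible check, assuming a POSIX system and a file with one key per line under a [default] section; the function name s3cfg_problems is hypothetical and is not part of s3cmd or Pegasus.

```python
import configparser
import os
import stat


def s3cfg_problems(path):
    """Return a list of problems found in an s3cmd style config file."""
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"permissions {oct(mode)} let other users access the file")
    # Interpolation is disabled because values such as gpg_decrypt contain
    # %(name)s placeholders that are meant for s3cmd, not for this parser.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if not parser.has_section("default"):
        return problems + ["missing [default] section"]
    for key in ("access_key", "secret_key"):
        value = parser.get("default", key, fallback="")
        if not value or value.startswith("YOUR_AMAZON_"):
            problems.append(f"{key} is not set")
    return problems


if __name__ == "__main__":
    import sys
    for problem in s3cfg_problems(sys.argv[1]):
        print(problem)
```

An empty result means the file passed these basic checks; it does not confirm that the keys are valid.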
Next, you need to modify your site catalog to tell s3cmd the path to your config file. You can do this by specifying an environment variable called S3CFG using an 'env' profile. This needs to be added to both the 'local' site, and any execution sites where you want to use S3. This environment variable will be used by s3cmd in the S3 transfer jobs to locate the config file.
<profile namespace="env" key="S3CFG">/local/path/to/s3cmd.cfg</profile>
Next, you need to update your pegasus.properties file to tell Pegasus to turn on S3 mode. This tells Pegasus to use S3 instead of the regular shared filesystem.
pegasus.data.configuration = S3
Note
The current S3 support does not allow mixing storage solutions. Once S3 mode is turned on, the Pegasus shared file system mode is turned off. For example, you currently cannot use S3 and iRODS storage at the same time.
Finally, if you want to use an existing bucket to store your data in S3, add this to your site catalog entry (replacing the existing work directory):
<scratch>
<shared>
<file-server protocol="s3" url="s3://user@amazon" mount-point="/existing-bucket"/>
<internal-mount-point mount-point="/existing-bucket"/>
</shared>
</scratch>
Otherwise, you can have Pegasus create a new bucket for each workflow by adding this to your site catalog entry (again, replacing any existing work directory):
<scratch>
<shared>
<file-server protocol="s3" url="s3://user@amazon" mount-point="/"/>
<internal-mount-point mount-point="/"/>
</shared>
</scratch>
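The two scratch variants differ only in the mount point: a bucket name selects an existing bucket, while "/" lets Pegasus create a new bucket per workflow. The Python sketch below generates either form, assuming the element and attribute names shown in the entries above; the function name and its default URL are placeholders copied from the examples, not a Pegasus API.

```python
import xml.etree.ElementTree as ET


def s3_scratch_element(bucket=None, url="s3://user@amazon"):
    """Return the <scratch> XML for an existing bucket, or for a new bucket per workflow."""
    mount_point = f"/{bucket}" if bucket else "/"
    scratch = ET.Element("scratch")
    shared = ET.SubElement(scratch, "shared")
    ET.SubElement(shared, "file-server",
                  {"protocol": "s3", "url": url, "mount-point": mount_point})
    ET.SubElement(shared, "internal-mount-point", {"mount-point": mount_point})
    return ET.tostring(scratch, encoding="unicode")


if __name__ == "__main__":
    print(s3_scratch_element("existing-bucket"))  # reuse an existing bucket
    print(s3_scratch_element())                   # new bucket for each workflow
```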