2. Configuration

Pegasus has configuration options to configure

the behavior of an individual job via profiles
the behavior of the whole system via properties

For job level configuration ( such as what environment a job is set with ), the Pegasus Workflow Mapper uses the concept of profiles. Profiles encapsulate configurations for various aspects of dealing with the Grid infrastructure. They provide an abstract yet uniform interface to specify configuration options for various layers from planner/mapper behavior to remote environment settings. At various stages during the mapping process, profiles may be added associated with the job. The system supports five diffferent namespaces, with each namespace refers to a different aspect of a job’s runtime settings. A profile’s representation in the executable workflow (e.g. the Condor submit files) depends on its namespace. Pegasus supports the following Namespaces for profiles:

env permits remote environment variables to be set.
globus sets Globus RSL parameters.
condor sets Condor configuration parameters for the submit file.
dagman introduces Condor DAGMan configuration parameters.
pegasus configures the behaviour of various planner/mapper components.
selector allows to override site selection behavior of the planner. Can be specified only in the DAX.

Note

The hints namespace is deprecated starting Pegasus 5.0. Use the selector namespace instead.

Properties are primarily used to configure the behavior of the Pegasus WMS system at a global level. The properties file is actually a java properties file and follows the same conventions as that to specify the properties.

This chapter describes various types of profiles and properties, levels of priorities for intersecting profiles, and how to specify profiles in different contexts.

2.1. Differences between Profiles and Properties

The main difference between properties and profiles is that profiles eventually get associated at a per job level in the workflow. On the other hand, properties are a way of configuring and controlling the behavior of the whole system. While all profiles can be specified in the properties file, not all properties can be used as profiles. This section lists out the properties supported by Pegasus and if any can be used as a profile, it is clearly indicated.

2.2. Profiles

2.2.1. Profile Structure Heading

All profiles are triples comprised of a namespace, a name or key, and a value. The namespace is a simple identifier. The key has only meaning within its namespace, and it’s yet another identifier. There are no constraints on the contents of a value

Profiles may be represented with different syntaxes in different context. However, each syntax will describe the underlying triple.

2.2.2. Sources for Profiles

Profiles may enter the job-processing stream at various stages. Depending on the requirements and scope a profile is to apply, profiles can be associated at

as user property settings.
workflow level
in the site catalog
in the transformation catalog

Unfortunately, a different syntax applies to each level and context. This section shows the different profile sources and syntaxes. However, at the foundation of each profile lies the triple of namespace, key and value.

2.2.2.1. User Profiles in Properties

Users can specify all profiles in the properties files where the property name is [namespace].key and value of the property is the value of the profile.

Namespace can be env|condor|globus|dagman|pegasus

Any profile specified as a property applies to the whole workflow i.e (all jobs in the workflow) unless overridden at the job level , Site Catalog , Transformation Catalog Level.

Some profiles that they can be set in the properties file are listed below

env.JAVA_HOME "/software/bin/java"

condor.periodic_release 5
condor.periodic_remove  my_own_expression
condor.stream_error true
condor.stream_output fa

globus.maxwalltime  1000
globus.maxtime      900
globus.maxcputime   10
globus.project      test_project
globus.queue        main_queue

dagman.post.arguments --test arguments
dagman.retry  4
dagman.post simple_exitcode
dagman.post.path.simple_exitcode  /bin/exitcode/exitcode.sh
dagman.post.scope all
dagman.maxpre  12
dagman.priority 13

dagman.bigjobs.maxjobs 1


pegasus.clusters.size 5

pegasus.stagein.clusters 3

2.2.2.2. Profiles in the Workflow

Examples of profiles used here include the maximum exected runtime of the job and its maximum required amount of disk space in MB. For example, these can be added to a job as follows:

# Python API
job = Job("abc")
job.add_profiles(Namespace.PEGASUS, key="walltime", value=2)
job.add_profiles(Namespace.PEGASUS, key="diskspace", value=1)

2.2.2.3. Profiles in Site Catalog

If it becomes necessary to limit the scope of a profile to a single site, these profiles should go into the site catalog. A profile in the site catalog applies to all jobs and all application that run at the site. Commonly, site catalog profiles set environment settings like the LD_LIBRARY_PATH, or globus rsl parameters like queue and project names. The following example illustrates the creation of a site called CCG, which has two profiles added to it at the end.

# Python API
ccg_site = Site(name="CCG", arch=Arch.X86_64, os_type=OS.LINUX)\
            .add_grids(
                Grid(
                    grid_type=Grid.GT5,
                    contact="obelix.isi.edu/jobmanager-fork",
                    scheduler_type=Scheduler.FORK,
                    job_type=SupportedJobs.AUXILLARY
                )
            )\
            .add_directories(
                Directory(directory_type=Directory.SHARED_SCRATCH, path="/shared-scratch")
                    .add_file_servers(FileServer(url="gsiftp://headnode.isi.edu/shared-scratch", operation_type=Operation.ALL)),
                Directory(directory_type=Directory.LOCAL_STORAGE, path="/local-storage")
                    .add_file_servers(FileServer(url="gsiftp://headnode.isi.edu/local-storage", operation_type=Operation.ALL))
            )\
            .add_profiles(Namespace.PEGASUS, key="clusters.num", value=1)\
            .add_profiles(Namespace.ENV, key="PEGASUS_HOME", value="/usr")

2.2.2.4. Profiles in Transformation Catalog

Some profiles require a narrower scope than that of what the site catalog offers as they may only apply to certain applications on certain sites. Examples of this would be transformation specific profiles such as CPU related variables or job clustering options. Such profiles are best specified in the transformation catalog.

Profiles can be associated to a specific transformation site or the transformation itself (which would then be applied to all of its transformation sites).

In the following example, the transformation keg resides on two sites (represented as TransformationSite objects). Profiles have been added to each site specificly as well as to keg itself. The environment variable, APP_HOME would be applied to keg for both the isi and wind sites.

# Python API
keg = Transformation(name="keg")
keg.add_profiles(Namespace.ENV, key="APP_HOME", value="/tmp/scratch")
keg.add_sites(
        TransformationSite(
            name="isi",
            pfn="/path/to/keg",
            is_stageable=False,
            arch=Arch.X86_64,
            os_type=OS.LINUX
        )
        .add_profiles(Namespace.ENV, key="HELLO", value="WORLD")
    )

keg.add_sites(
        TransformationSite(
            name="wind",
            pfn="/path/to/keg",
            is_stageable=False,
            arch=Arch.X86_64,
            os_type=OS.LINUX,
            is_stageable=True
        )
        .add_profiles(Namespace.ENV, key="CPATH", value="/usr/cpath")
    )

2.2.3. Profiles Conflict Resolution

Irrespective of where the profiles are specified, eventually the profiles are associated with jobs. Multiple sources may specify the same profile for the same job. For instance, a job may specify an environment variable X. The site catalog may also specify an environment variable X for the chosen site. The transformation catalog may specify an environment variable X for the chosen site and application. When the job is concretized, these three conflicts need to be resolved.

Pegasus defines a priority ordering of profiles. The higher priority takes precedence (overwrites) a profile of a lower priority.

Transformation Catalog Profiles
Site Catalog Profiles
Job Profiles (defined in the Workflow)
Profiles in Properties

2.2.4. Details of Profile Handling

The previous sections omitted some of the finer details for the sake of clarity. To understand some of the constraints that Pegasus imposes, it is required to look at the way profiles affect jobs.

2.2.4.1. Details of env Profiles

Profiles in the env namespace are translated to a semicolon-separated list of key-value pairs. The list becomes the argument for the Condor environment command in the job’s submit file.

######################################################################
# Pegasus WMS  SUBMIT FILE GENERATOR
# DAG : black-diamond, Index = 0, Count = 1
# SUBMIT FILE NAME : findrange_ID000002.sub
######################################################################
globusrsl = (jobtype=single)
environment=GLOBUS_LOCATION=/shared/globus;LD_LIBRARY_PATH=/shared/globus/lib;
executable = /shared/software/linux/pegasus/default/bin/kickstart
globusscheduler = columbus.isi.edu/jobmanager-condor
remote_initialdir = /shared/CONDOR/workdir/isi_hourglass
universe = globus
&mldr;
queue
######################################################################
# END OF SUBMIT FILE

Condor-G, in turn, will translate the environment command for any remote job into Globus RSL environment settings, and append them to any existing RSL syntax it generates. To permit proper mixing, all environment setting should solely use the env profiles, and none of the Condor nor Globus environment settings.

If kickstart starts a job, it may make use of environment variables in its executable and arguments setting.

2.2.4.2. Details of globus Profiles

Profiles in the globus Namespaces are translated into a list of paranthesis-enclosed equal-separated key-value pairs. The list becomes the value for the Condor globusrsl setting in the job’s submit file:

######################################################################
# Pegasus WMS SUBMIT FILE GENERATOR
# DAG : black-diamond, Index = 0, Count = 1
# SUBMIT FILE NAME : findrange_ID000002.sub
######################################################################
globusrsl = (jobtype=single)(queue=fast)(project=nvo)
executable = /shared/software/linux/pegasus/default/bin/kickstart
globusscheduler = columbus.isi.edu/jobmanager-condor
remote_initialdir = /shared/CONDOR/workdir/isi_hourglass
universe = globus
&mldr;
queue
######################################################################
# END OF SUBMIT FILE

For this reason, Pegasus prohibits the use of the globusrsl key in the condor profile namespace.

2.2.5. The Env Profile Namespace

The env namespace allows users to specify environment variables of remote jobs. Globus transports the environment variables, and ensure that they are set before the job starts.

The key used in conjunction with an env profile denotes the name of the environment variable. The value of the profile becomes the value of the remote environment variable.

Grid jobs usually only set a minimum of environment variables by virtue of Globus. You cannot compare the environment variables visible from an interactive login with those visible to a grid job. Thus, it often becomes necessary to set environment variables like LD_LIBRARY_PATH for remote jobs.

If you use any of the Pegasus worker package tools like transfer or the rc-client, it becomes necessary to set PEGASUS_HOME and GLOBUS_LOCATION even for jobs that run locally

Useful Environment Settings
Key Attributes	Description
Property Key: env.PEGASUS_HOME Profile Key: PEGASUS_HOME Scope : TC, SC, DAX, Properties Since : 2.0 Type :String	Used by auxillary jobs created by Pegasus both on remote site and local site. Should be set usually set in the Site Catalog for the sites
Property Key: env.GLOBUS_LOCATION Profile Key: GLOBUS_LOCATION Scope : TC, SC, DAX, Properties Since : 2.0 Type :String	Used by auxillary jobs created by Pegasus both on remote site and local site. Should be set usually set in the Site Catalog for the sites
Property Key: env.LD_LIBRARY_PATH Profile Key: LD_LIBRARY_PATH Scope : TC, SC, DAX, Properties Since : 2.0 Type :String	Point this to $GLOBUS_LOCATION/lib, except you cannot use the dollar variable. You must use the full path. Applies to both, local and remote jobs that use Globus components and should be usually set in the site catalog for the sites

Even though Condor and Globus both permit environment variable settings through their profiles, all remote environment variables must be set through the means of env profiles.

2.2.6. The Globus Profile Namespace

The globus profile namespace encapsulates Globus resource specification language (RSL) instructions. The RSL configures settings and behavior of the remote scheduling system. Some systems require queue name to schedule jobs, a project name for accounting purposes, or a run-time estimate to schedule jobs. The Globus RSL addresses all these issues.

A key in the globus namespace denotes the command name of an RSL instruction. The profile value becomes the RSL value. Even though Globus RSL is typically shown using parentheses around the instruction, the out pair of parentheses is not necessary in globus profile specifications

The table below shows some commonly used RSL instructions. For an authoritative list of all possible RSL instructions refer to the Globus RSL specification.

Useful Globus RSL Instructions
Property Key	Description
Property Key: globus.count Profile Key:count Scope : TC, SC, DAX, Properties Since : 2.0 Type :Integer	the number of times an executable is started.
Property Key: globus.jobtype Profile Key: jobtype Scope : TC, SC, DAX, Properties Since : 2.0 Type :String	specifies how the job manager should start the remote job. While Pegasus defaults to single, use mpi when running MPI jobs.
Property Key: globus.maxcputime Profile Key: maxcputime Scope : TC, SC, DAX, Properties Since : 2.0 Type :Integer	the max CPU time in minutes for a single execution of a job.
Property Key: globus.maxmemory Profile Key: maxmemory Scope : TC, SC, DAX, Properties Since : 2.0 Type :Integer	the maximum memory in MB required for the job
Property Key: globus.maxtime Profile Key:maxtime Scope : TC, SC, DAX, Properties Since : 2.0 Type : Integer	the maximum time or walltime in minutes for a single execution of a job.
Property Key: globus.maxwalltime Profile Key: maxwalltime Scope : TC, SC, DAX, Properties Since : 2.0 Type :Integer	the maximum walltime in minutes for a single execution of a job.
Property Key: globus.minmemory Profile Key: minmemory Scope : TC, SC, DAX, Properties Since : 2.0 Type :Integer	the minumum amount of memory required for this job
Property Key: globus.totalmemory Profile Key: totalmemory Scope : TC, SC, DAX, Properties Since : 2.0 Type :Integer	the total amount of memory required for this job
Property Key: globus.project Profile Key: project Scope : TC, SC, DAX, Properties Since : 2.0 Type :String	associates an account with a job at the remote end.
Property Key: globus.queue Profile Key: queue Scope : TC, SC, DAX, Properties Since : 2.0 Type :String	the remote queue in which the job should be run. Used when remote scheduler is a scheduler such PBS that supports queues.

Pegasus prevents the user from specifying certain RSL instructions as globus profiles, because they are either automatically generated or can be overridden through some different means. For instance, if you need to specify remote environment settings, do not use the environment key in the globus profiles. Use one or more env profiles instead.

RSL Instructions that are not permissible
Key	Reason for Prohibition
arguments	you specify arguments in the arguments section for a job in the DAX
directory	the site catalog and properties determine which directory a job will run in.
environment	use multiple env profiles instead
executable	the physical executable to be used is specified in the transformation catalog and is also dependant on the gridstart module being used. If you are launching jobs via kickstart then the executable created is the path to kickstart and the application executable path appears in the arguments for kickstart
stdin	you specify in the abstract workflow for the job
stdout	you specify in the abstract workflow for the job
stderr	you specify in the abstract workflow for the job

2.2.7. The Condor Profile Namespace

The Condor submit file controls every detail how and where a job is run. The condor profiles permit to add or overwrite instructions in the Condor submit file.

The condor namespace directly sets commands in the Condor submit file for a job the profile applies to. Keys in the condor profile namespace denote the name of the Condor command. The profile value becomes the command’s argument. All condor profiles are translated into key=value lines in the Condor submit file

Some of the common condor commands that a user may need to specify are listed below. For an authoritative list refer to the online condor documentation. Note: Pegasus Workflow Planner/Mapper by default specify a lot of condor commands in the submit files depending upon the job, and where it is being run.

Useful Condor Commands
Property Key	Description
Property Key: condor.universe Profile Key: universe Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type : String	Pegasus defaults to either globus or scheduler universes. Set to grid for compute jobs that require grid universe. Set to vanilla to run natively in a condor pool, or to run on resources grabbed via condor glidein.
Property Key: condor.periodic_release Profile Key: periodic_release Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type : String	is the number of times job is released back to the queue if it goes to HOLD, e.g. due to Globus errors. Pegasus defaults to 3.
Property Key:condor.periodic_remove Profile Key:periodic_remove Scope : TC, SC,Abstract WF, Properties Since : 2.0 Type : String	is the number of times a job is allowed to get into HOLD state before being removed from the queue. Pegasus defaults to 3.
Property Key: condor.filesystemdomain Profile Key: filesystemdomain Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	Useful for Condor glide-ins to pin a job to a remote site.
Property Key: condor.stream_error Profile Key: stream_error Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :Boolean	boolean to turn on the streaming of the stderr of the remote job back to submit host.
Property Key: condor.stream_output Profile Key: stream_output Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :Boolean	boolean to turn on the streaming of the stdout of the remote job back to submit host.
Property Key: condor.priority Profile Key:priority Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	integer value to assign the priority of a job. Higher value means higher priority. The priorities are only applied for vanilla / standard/ local universe jobs. Determines the order in which a users own jobs are executed.
Property Key: condor.request_cpus Profile Key: request_cpus Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	New in Condor 7.8.0 . Number of CPU’s a job requires.
Property Key: condor.request_gpus Profile Key:request_gpus Scope : TC, SC, Abstract WF, Properties Since : 4.6 Type :String	Number of GPU’s a job requires.
Property Key: condor.request_memory Profile Key: request_memory Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	New in Condor 7.8.0 . Amount of memory a job requires.
Property Key: condor.request_disk Profile Key: request_disk Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	New in Condor 7.8.0 . Amount of disk a job requires.

Other useful condor keys, that advanced users may find useful and can be set by profiles are

should_transfer_files
transfer_output
transfer_error
whentotransferoutput
requirements
rank

Pegasus prevents the user from specifying certain Condor commands in condor profiles, because they are automatically generated or can be overridden through some different means. The table below shows prohibited Condor commands.

Condor commands prohibited in condor profiles
Key	Reason for Prohibition
arguments	you specify arguments in the arguments section for a job in the Abstract WF.
environment	use multiple env profiles instead
executable	the physical executable to be used is specified in the transformation catalog and is also dependant on the gridstart module being used. If you are launching jobs via kickstart then the executable created is the path to kickstart and the application executable path appears in the arguments for kickstart

2.2.8. The Dagman Profile Namespace

DAGMan is Condor’s workflow manager. While planners generate most of DAGMan’s configuration, it is possible to tweak certain job-related characteristics using dagman profiles. A dagman profile can be used to specify a DAGMan pre- or post-script.

Pre- and post-scripts execute on the submit machine. Both inherit the environment settings from the submit host when pegasus-submit-dag or pegasus-run is invoked.

By default, kickstart launches all jobs except standard universe and MPI jobs. Kickstart tracks the execution of the job, and returns usage statistics for the job. A DAGMan post-script starts the Pegasus application exitcode to determine, if the job succeeded. DAGMan receives the success indication as exit status from exitcode.

If you need to run your own post-script, you have to take over the job success parsing. The planner is set up to pass the file name of the remote job’s stdout, usually the output from kickstart, as sole argument to the post-script.

The table below shows the keys in the dagman profile domain that are understood by Pegasus and can be associated at a per job basis.

Useful dagman Commands that can be associated at a per job basis
Property Key	Description
Property Key: dagman.pre Profile Key:PRE Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	is the path to the pre-script. DAGMan executes the pre-script before it runs the job.
Property Key: dagman.pre.arguments Profile Key: PRE.ARGUMENTS Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	are command-line arguments for the pre-script, if any.
Property Key: dagman.post Profile Key: POST Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	is the postscript type/mode that a user wants to associate with a job. - pegasus-exitcode - pegasus will by default associate this postscript with all jobs launched via kickstart, as long the POST.SCOPE value is not set to NONE. - none -means that no postscript is generated for the jobs. This is useful for MPI jobs that are not launched via kickstart currently. - any legal identifier - Any other identifier of the form ([_A-Za-z][_A-Za-z0-9]*), than one of the 2 reserved keywords above, signifies a user postscript. This allows the user to specify their own postscript for the jobs in the workflow. The path to the postscript can be specified by the dagman profile POST.PATH.[value] where [value] is this legal identifier specified. The user postscript is passed the name of the .out file of the job as the last argument on the command line. For e.g. if the following dagman profiles were associated with a job X POST with value user_script /bin/user_postscript POST.PATH.user_script with value /path/to/user/script POST.ARGUMENTS with value -verbose then the following postscript will be associated with the job X in the .dag file is /path/to/user/script -verbose X.out where X.out contains the stdout of the job X
Property Key: dagman.post.path.[value of dagman.post] Profile Key:post.path.[value of dagman.post] Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	the path to the post script on the submit host.
Property Key: dagman.post.arguments Profile Key:POST.ARGUMENTS Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	are the command line arguments for the post script, if any.
Property Key: dagman.retry Profile Key:RETRY Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :Integer Default : 1	is the number of times DAGMan retries the full job cycle from pre-script through post-script, if failure was detected.
Property Key: dagman.category Profile Key:CATEGORY Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	the DAGMan category the job belongs to.
Property Key: dagman.priority Profile Key: PRIORITY Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :Integer	the priority to apply to a job. DAGMan uses this to select what jobs to release when MAXJOBS is enforced for the DAG.
Property Key: dagman.abort-dag-on Profile Key:ABORT-DAG-ON Scope : TC, Abstract WF, Since : 4.5 Type :String	The ABORT-DAG-ON key word provides a way to abort the entire DAG if a given node returns a specific exit code (AbortExitValue). The syntax for the value of the key is AbortExitValue [RETURN DAGReturnValue] . When a DAG aborts, by default it exits with the node return value that caused the abort. This can be changed by using the optional RETURN key word along with specifying the desired DAGReturnValue

The table below shows the keys in the dagman profile domain that are understood by Pegasus and can be used to apply to the whole workflow. These are used to control DAGMan’s behavior at the workflow level, and are recommended to be specified in the properties file.

Useful dagman Commands that can be specified in the properties file.
Property Key	Description
Property Key: dagman.maxpre Profile Key: MAXPRE Scope : Properties Since : 2.0 Type :String	sets the maximum number of PRE scripts within the DAG that may be running at one time
Property Key: dagman.maxpost Profile Key: MAXPOST Scope : Properties Since : 2.0 Type :String	sets the maximum number of POST scripts within the DAG that may be running at one time
Property Key: dagman.maxjobs Profile Key: MAXJOBS Scope : Properties Since : 2.0 Type :String	sets the maximum number of jobs within the DAG that will be submitted to Condor at one time.
Property Key: dagman.maxidle Profile Key:MAXIDLE Scope : Properties Since : 2.0 Type :String	Sets the maximum number of idle jobs allowed before HTCondor DAGMan stops submitting more jobs. Once idle jobs start to run, HTCondor DAGMan will resume submitting jobs. If the option is omitted, the number of idle jobs is unlimited.
Property Key: dagman.[CATEGORY-NAME].maxjobs Profile Key:[CATEGORY-NAME].MAXJOBS Scope : Properties Since : 2.0 Type :String	is the value of maxjobs for a particular category. Users can associate different categories to the jobs at a per job basis. However, the value of a dagman knob for a category can only be specified at a per workflow basis in the properties.
Property Key: dagman.post.scope Profile Key:POST.SCOPE Scope : Properties Since : 2.0 Type :String	scope for the postscripts. - If set to all , means each job in the workflow will have a postscript associated with it. - If set to none , means no job has postscript associated with it. None mode should be used if you are running vanilla / standard/ local universe jobs, as in those cases Condor traps the remote exitcode correctly. None scope is not recommended for grid universe jobs. - If set to essential, means only essential jobs have post scripts associated with them. At present the only non essential job is the replica registration job.

2.2.9. The Pegasus Profile Namespace

The pegasus profiles allow users to configure extra options to the Pegasus Workflow Planner that can be applied selectively to a job or a group of jobs. Site selectors may use a sub-set of pegasus profiles for their decision-making.

The table below shows some of the useful configuration option Pegasus understands.

Useful Pegasus Profiles
Property Key	Description
Property Key: pegasus.clusters.num Profile Key: clusters.num Scope : TC, SC, Abstract WF, Properties Since : 3.0 Type :Integer	Please refer to the Pegasus Clustering Guide for detailed description. This option determines the total number of clusters per level. Jobs are evenly spread across clusters.
Property Key: pegasus.clusters.size Profile Key:clusters.size Scope : TC, SC, Abstract WF, Properties Since : 3.0 Type : Integer	Please refer to the Pegasus Clustering Guide for detailed description. This profile determines the number of jobs in each cluster. The number of clusters depends on the total number of jobs on the level.
Property Key: pegasus.clusterer.job.aggregator Profile Key:job.aggregator Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :Integer	Indicates the clustering executable that is used to run the clustered job on the remote site.
Property Key: pegasus.clusterer.job.aggregator. arguments Profile Key: job.aggregator.arguments Scope : TC, SC, Abstract WF, Properties Since : 5.0.2 Default : None See Also : pegasus.clusterer.job.aggregator	The additional arguments with which a clustering executable should be invoked.
Property Key: pegasus.gridstart Profile Key: gridstart Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	Determines the executable for launching a job. This covers both tasks ( jobs specified by the user in the Abstract Workflow) and additional jobs added by Pegasus during the planning operation. Possible values are Kickstart \| NoGridStart \| PegasusLite \| Distribute at the moment. Note: This profile should only be set by users if you know what you are doing. Otherwise, let Pegasus do the right thing based on your configuration. - Kickstart By default, all jobs executed are launched using a lightweight C executable called pegasus-kickstart. This generates valuable runtime provenance information for the job as it is executed on a remote node. This information serves as the basis for the monitoring and debugging capabilities provided by Pegasus. - NoGridStart This explicity disables the wrapping of the jobs with pegasus-kickstart. This is internally used by the planner to launch jobs directly. If this is set, then the information populated in the monitoring database is on the basis of what is recorded in the DAGMan out file. - PegasusLite This value is automatically associated by the Planner whenever the job runs in either nonsharedfs or condorio mode. The property pegasus.data.configuration decides whether a job is launched via PegasusLite or not. PegasusLite is a lightweight Pegasus wrapper generated for each job that allows a job to run in a nonshared file system environment and is responsible for staging in the input data and staging out the output data back to a remote staging site for the job. - PegasusLite.None This value if set, forces PegasusLite to not launch the job via Kickstart. - Distribute This wrapper is a HubZero specfiic wrapper that allows compute jobs that are scheduled for a local PBS cluster to be run locally on the submit host. The jobs are wrapped with a distribute wrapper that is responsible for doing the qsub and tracking of the status of the jobs in the PBS cluster.
Property Key: pegasus.gridstart.path Profile Key: gridstart.path Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :file path	Sets the path to the gridstart . This profile is best set in the Site Catalog.
Property Key: pegasus.gridstart.arguments Profile Key:gridstart.arguments Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	Sets the arguments with which GridStart is used to launch a job on the remote site.
Property Key: pegasus.gridstart.launcher Profile Key: gridstart.launcher Scope : TC, SC, Abstract WF, Properties Since : 4.9.4 Type : file path	Specifies the launcher executable to use to launch the GridStart. Is useful, when we want the kickstart invocation for the compute jobs to be submitted using jsrun, as in on sites where originally the job lands on DTN node like OLCF sites.
Property Key: pegasus.gridstart.launcher.arguments Profile Key:gridstart.launcher.arguments Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	The arguments to the launcher executable for GridStart
Property Key: pegasus.stagein.clusters Profile Key: stagein.clusters Scope : TC, SC, Abstract WF, Properties Since : 4.0 Type :Integer	This key determines the maximum number of stage-in jobs that are can executed locally or remotely per compute site per workflow. This is used to configure the BalancedCluster Transfer Refiner, which is the Default Refiner used in Pegasus. This profile is best set in the Site Catalog or in the Properties file
Property Key: pegasus.stagein.local.clusters Profile Key: stagein.local.clusters Scope : TC, SC, Abstract WF, Properties Since : 4.0 Type :Integer	This key provides finer grained control in determining the number of stage-in jobs that are executed locally and are responsible for staging data to a particular remote site. This profile is best set in the Site Catalog or in the Properties file
Property Key: pegasus.stagein.remote.clusters Profile Key:stagein.remote.clusters Scope : TC, SC, Abstract WF, Properties Since : 4.0 Type :Integer	This key provides finer grained control in determining the number of stage-in jobs that are executed remotely on the remote site and are responsible for staging data to it. This profile is best set in the Site Catalog or in the Properties file
Property Key: pegasus.stageout.clusters Profile Key: stageout.clusters Scope : TC, SC, Abstract WF, Properties Since : 4.0 Type : Integer	This key determines the maximum number of stage-out jobs that are can executed locally or remotely per compute site per workflow. This is used to configure the BalancedCluster Transfer Refiner, which is the Default Refiner used in Pegasus.
Property Key: pegasus.stageout.local.clusters Profile Key: stageout.local.clusters Scope : TC, SC, Abstract WF, Properties Since : 4.0 Type :Integer	This key provides finer grained control in determining the number of stage-out jobs that are executed locally and are responsible for staging data from a particular remote site. This profile is best set in the Site Catalog or in the Properties file
Property Key: pegasus.stageout.remote.clusters Profile Key: stageout.remote.clusters Scope : TC, SC, Abstract WF, Properties Since : 4.0 Type :Integer	This key provides finer grained control in determining the number of stage-out jobs that are executed remotely on the remote site and are responsible for staging data from it. This profile is best set in the Site Catalog or in the Properties file
Property Key: pegasus.group Profile Key:group Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	Tags a job with an arbitrary group identifier. The group site selector makes use of the tag.
Property Key: pegasus.change.dir Profile Key: change.dir Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :Boolean	If true, tells kickstart to change into the remote working directory. Kickstart itself is executed in whichever directory the remote scheduling system chose for the job.
Property Key: pegasus.create.dir Profile Key: create.dir Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type : Boolean	If true, tells kickstart to create the the remote working directory before changing into the remote working directory. Kickstart itself is executed in whichever directory the remote scheduling system chose for the job.
Property Key: pegasus.transfer.proxy Profile Key:transfer.proxy Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :Boolean	If true, tells Pegasus to explicitly transfer the proxy for transfer jobs to the remote site. This is useful, when you want to use a full proxy at the remote end, instead of the limited proxy that is transferred by CondorG.
Property Key: pegasus.style Profile Key:style Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type :String	Sets the condor submit file style. If set to globus, submit file generated refers to CondorG job submissions. If set to condor, submit file generated refers to direct Condor submission to the local Condor pool. It applies for glidein, where nodes from remote grid sites are glided into the local condor pool. The default style that is applied is globus.
Property Key: pegasus.pmc_request_memory Profile Key:pmc_request_memory Scope : TC, SC, Abstract WF, Properties Since : 4.2 Type :Integer	This key is used to set the -m option for pegasus-mpi-cluster. It specifies the amount of memory in MB that a job requires. This profile is usually set in the Abstract WF for each job.
Property Key: pegasus.pmc_request_cpus Profile Key: pmc_request_cpus Scope : TC, SC, Abstract WF, Properties Since : 4.2 Type :Integer	This key is used to set the -c option for pegasus-mpi-cluster. It specifies the number of cpu’s that a job requires. This profile is usually set in the Abstract WF for each job.
Property Key: pegasus.pmc_priority Profile Key: pmc_priority Scope : TC, SC, Abstract WF, Properties Since : 4.2 Type :Integer	This key is used to set the -p option for pegasus-mpi-cluster. It specifies the priority for a job . This profile is usually set in the Abstract WF for each job. Negative values are allowed for priorities.
Property Key: pegasus.pmc_task_arguments Profile Key: pmc_task_arguments Scope : TC, SC, Abstract WF, Properties Since : 4.2 Type :String	The key is used to pass any extra arguments to the PMC task during the planning time. They are added to the very end of the argument string constructed for the task in the PMC file. Hence, allows for overriding of any argument constructed by the planner for any particular task in the PMC job.
Property Key: pegasus.exitcode.failuremsg Profile Key: exitcode.failuremsg Scope : TC, SC, Abstract WF, Properties Since : 4.4 Type :String	The message string that pegasus-exitcode searches for in the stdout and stderr of the job to flag failures.
Property Key: pegasus.exitcode.successmsg Profile Key: exitcode.successmsg Scope : TC, SC, Abstract WF, Properties Since : 4.4 Type :String	The message string that pegasus-exitcode searches for in the stdout and stderr of the job to determine whether a job logged it’s success message or not. Note this value is used to check for whether a job failed or not i.e if this profile is specified, and pegasus-exitcode DOES NOT find the string in the job stdout or stderr, the job is flagged as failed. The complete rules for determining failure are described in the man page for pegasus-exitcode.
Property Key: pegasus.checkpoint.time Profile Key: checkpoint.time Scope : TC, SC, Abstract WF, Properties Since : 4.5 Type : Integer	the expected time in minutes for a job after which it should be sent a TERM signal to generate a job checkpoint file
Property Key: pegasus.maxwalltime Profile Key: maxwalltime Scope : TC, SC, Abstract WF, Properties Since : 4.5 Type :Integer	the maximum walltime in minutes for a single execution of a job.
Property Key: pegasus.glite.arguments Profile Key: glite.arguments Scope : TC, SC, Abstract WF, Properties Since : 4.5 Type :String	specifies the extra arguments that must appear in the local PBS or LRMS generated script for a job, when running workflows on a local cluster with submissions through Glite. This is useful when you want to pass through special options to underlying LRMS such as PBS e.g. you can set value -l walltime=01:23:45 -l nodes=2 to specify your job’s resource requirements.
Profile Key: auxillary.local Scope : SC Since : 4.6 Type :Boolean	indicates whether auxillary jobs associated with a compute site X, can be run on local site. This CAN ONLY be specified as a profile in the site catalog and should be set when the compute site filesystem is accessible locally on the submit host.
Property Key: pegasus.condor.arguments.quote Profile Key: condor.arguments.quote Scope : SC, Properties Since : 4.6 Type :Boolean	indicates whether condor quoting rules should be applied for writing out the arguments key in the condor submit file. By default it is true unless the job is schedule to a glite style site. The value is automatically set to false for glite style sites, as condor quoting is broken in batch_gahp.
Profile Key: container.arguments Scope : TC, Workflow Since : 5.0 Type :String	indicates additional arguments that will be appended to the docker container run/singularity exec commands when the container associated with this profile is executed
Profile Key: container.launcher Scope : TC, Workflow, Properties Since : 5.1 Type :String	A wrapper around docker/singularity execution. Applies only to containerized jobs. Useful, when the the user executable has to be launched via srun for example.
Profile Key: container.launcher.arguments Scope : TC, Workflow, Properties Since : 5.1 Type :String	indicates additional arguments that will be appended to container wrapper. Comes into play only if a job has container.launcher profile associated with it.
Profile Key: pegasus_lite_env_source Scope : Site Catalog Since : 5.0 Type :String	indicates a path to a setup script residing on the submit host that needs to be sourced in PegasusLite when running the job. This profile should be associated with local site in the Site Catalog. You can also specify an environment profile named PEGASUS_LITE_ENV_SOURCE with the compute site in the Site Catalog to indicate a setup script that already exists on the compute nodes, but just needs to be sourced when executing the job.
Profile Key: relative.submit.dir Scope : TC, Workflow Since : 5.0.1 Type :String See Also : pegasus.dir.submit.mapper	indicates the relative submit directory for the job, if the Named Submit Mapper is enabled.
Profile Key: nosymlink Scope : TC, SC, Abstract WF Since : 5.0.3 Type : Boolean	This profile allows you to turn off symlinking for a job. It only comes into play when symlinking for the workflow is turned on by setting the property pegasus.transfer.links to true

2.2.9.1. Task Resource Requirements Profiles

Startng Pegasus 4.6.0 Release, users can specify pegasus profiles to describe resources requirements for their job. The planner will automatically translate them to appropriate execution environment specific directives. For example, the profiles are automatically translated to Globus RSL keys if submitting job via CondorG to remote GRAM instances, Condor Classad keys when running in a vanilla condor pool and to appropriate shell variables for Glite that can be picked up by the local attributes.sh. The profiles are described below.

Task Resource Requirement Profiles.
Property Key	Description
Property Key: pegasus.runtime Profile Key: runtime Scope : TC, SC, Abstract WF, Properties Since : 2.0 Type : Long	This profile specifies the expected runtime of a job in seconds. Refer to the Pegasus Clustering Guide for description on using it for runtime clustering.
Property Key: pegasus.clusters.maxruntime Profile Key: clusters.maxruntime Scope : TC, SC, Abstract WF, Properties Since : 4.0 Type : Integer	Please refer to the Pegasus Clustering Guide for detailed description. This profile specifies the maximum runtime of a job in seconds.
Property Key: pegasus.cores Profile Key:cores Scope : TC, SC, Abstract WF, Properties Since : 4.0 Type : Integer	The total number of cores, required for a job. This is also used for accounting purposes in the database while generating statistics. It corresponds to the multiplier_factor in the job_instance table described here .
Property Key: pegasus.gpus Profile Key:gpus Scope : TC, SC, Abstract WF, Properties Since : 5.0 Type : Integer	The total number of gpus, required for a job.
Property Key: pegasus.nodes Profile Key: nodes Scope : TC, SC, Abstract WF, Properties Since : 4.6 Type : Integer	Indicates the the number of nodes a job requires.
Property Key: pegasus.ppn Profile Key: ppn Scope : TC, SC, Abstract WF, Properties Since : 4.6 Type : Integer	Indicates the number of processors per node . This profile is best set in the Site Catalog and usually set when running workflows with MPI jobs.
Property Key: pegasus.memory Profile Key: memory Scope : TC, SC, Abstract WF, Properties Since : 4.6 Type : Long	Indicates the maximum memory a job requires in MB.
Property Key: pegasus.diskspace Profile Key: diskspace Scope : TC, SC, Abstract WF, Properties Since : 4.6 Type : Long	Indicates the maximum diskspace a job requires in MB.

The automatic translation to various execution environment specific directives is explained below. It is important, to note that execution environment specific keys take precedence over the Pegasus profile keys. For example, Globus profile key maxruntime will be preferred over Pegasus profile key runtime when running jobs via HTCondorG.

Table mapping translation of Pegasus Task Requirements to corresponding execution environment keys.
Pegasus Task Resource Requirement Profile Key	Corresponding Globus RSL Key	Corresponding Condor Classad Key	KEY in +remote_cerequirements classad for GLITE
runtime	maxruntime		WALLTIME
cores	count	request_cpus	CORES
nodes	hostcount		NODES
ppn	xcount		PROCS
memory	maxmemory	request_memory	PER_PROCESS_MEMORY
diskspace		request_diskspace

2.2.10. The Hints Profile Namespace

The hints namespace is now deprecated and has been replaced by the selector namespace. If you have any hints profiles in your configuration, please change their namespace value to selector instead. The support for hints namespace will be dropped in a future release.

2.2.11. The Selector Profile Namespace

The selector namespace allows users to override the behavior of the Workflow Mapper during site selection. This gives you finer grained control over where a job executes and what executable it refers to. The hints namespace keys ( execution.site and pfn ) can only be specified in the input abstract workflow. It is important to note that these particular keys once specified in the workflow, cannot be overridden like other profiles.

Useful Selector Profile Keys
Key Attributes	Description
Property Key: N/A Profile Key: execution.site Scope : Abstract WF Since : 4.5 Type : String	the execution site where a job should be executed.
Property Key: N/A Profile Key: pfn Scope : TC, SC, Abstract WF, Properties Since : 4.5 Type : String	the physical file name to the main executable that a job refers to. Overrides any entries specified in the transformation catalog.
Property Key: hints.grid.jobtype Profile Key:grid.jobtype Scope : TC, SC, Abstract WF, Properties Since : 4.5 Type : String	applicable when submitting to remote sites via GRAM. The site catalog allows you to associate multiple jobmanagers with a GRAM site, for different type of jobs [compute, auxillary, transfer, register, cleanup ] that Pegasus generates in the executable workflow. This profile is usually used to ensure that a compute job executes on another job manager. For example, if in site catalog you have headnode.example.com/jobmanager-condor for compute jobs, and headnode.example.com/jobmanager-fork for auxillary jobs. Associating this profile and setting value to auxillary for a compute job, will cause the compute job to run on the fork jobmanager instead of the condor jobmanager.

2.3. Properties

Properties are primarily used to configure the behavior of the Pegasus Workflow Planner at a global level. The properties file is actually a java properties file and follows the same conventions as that to specify the properties.

Please note that the values rely on proper capitalization, unless explicitly noted otherwise.

Some properties rely with their default on the value of other properties. As a notation, the curly braces refer to the value of the named property. For instance, ${pegasus.home} means that the value depends on the value of the pegasus.home property plus any noted additions. You can use this notation to refer to other properties, though the extent of the subsitutions are limited. Usually, you want to refer to a set of the standard system properties. Nesting is not allowed. Substitutions will only be done once.

There is a priority to the order of reading and evaluating properties. Usually one does not need to worry about the priorities. However, it is good to know the details of when which property applies, and how one property is able to overwrite another. The following is a mutually exclusive list ( highest priority first ) of property file locations.

–conf option to the tools. Almost all of the clients that use properties have a –conf option to specify the property file to pick up.
submit-dir/pegasus.xxxxxxx.properties file. All tools that work on the submit directory ( i.e after pegasus has planned a workflow) pick up the pegasus.xxxxx.properties file from the submit directory. The location for the pegasus.xxxxxxx.propertiesis picked up from the braindump file.
The properties defined in the user property file ${user.home}/.pegasusrc have lowest priority.

Starting Pegasus 5.0 release, pegasus properties can also be specified as environment variables. The properties specified by an environment variable have higher precedence than those specified in a properties file.

To specify a pegasus property as an environment variable you need to do the following:

Convert your property name to upper case
Replace . with __ .
Add a leading _ to the property name.

For example, to specify pegasus.catalog.replica in your environment you will specify

_PEGASUS__CATALOG__REPLICA__FILE = /path/to/replicas.yml

Commandline properties have the highest priority. These override any property loaded from a property file. Each commandline property is introduced by a -D argument. Note that these arguments are parsed by the shell wrapper, and thus the -D arguments must be the first arguments to any command. Commandline properties are useful for debugging purposes.

From Pegasus 3.1 release onwards, support has been dropped for the following properties that were used to signify the location of the properties file

pegasus.properties
pegasus.user.properties

The basic properties that you may need to be set if using non default

types and locations are for various catalogs are listed below:

Basic Properties that you may need to set
pegasus.catalog.replica	type of replica catalog backend
pegasus.catalog.replica.file	path to replica catalog file
pegasus.catalog.transformation	type of transformation catalog
pegasus.catalog.transformation.file	path to transformation file
pegasus.catalog.site.file	path to site catalog file
pegasus.data.configuration	the data configuration mode for data staging.

If you are in doubt which properties are actually visible, pegasus during the planning of the workflow dumps all properties after reading and prioritizing in the submit directory in a file with the suffix properties.

2.3.1. Local Directories Properties

This section describes the GNU directory structure conventions. GNU distinguishes between architecture independent and thus sharable directories, and directories with data specific to a platform, and thus often local. It also distinguishes between frequently modified data and rarely changing data. These two axis form a space of four distinct directories.

Local Directories Related Properties
Key Attributes	Description
Property Key: pegasus.home.datadir Profile Key: N/A Scope : Properties Since : 2.0 Type : file path Default : ${pegasus.home}/share	The datadir directory contains broadly visible and possibly exported configuration files that rarely change. This directory is currently unused.
Property Key: pegasus.home.sysconfdir Profile Key: N/A Scope : Properties Since : 2.0 Type :file path Default : ${pegasus.home}/etc	The system configuration directory contains configuration files that are specific to the machine or installation, and that rarely change. This is the directory where the YAML/XML schema definition copies are stored, and where the base site configuration file is stored.
Property Key: pegasus.home.sharedstatedir Profile Key: N/A Scope : Properties Since : 2.0 Type :file path Default : ${pegasus.home}/com	Frequently changing files that are broadly visible are stored in the shared state directory. This is currently unused.
Property Key: pegasus.home.localstatedir Profile Key: N/A Scope : Properties Since : 2.0 Type :file path Default : ${pegasus.home}/var	Frequently changing files that are specific to a machine and/or installation are stored in the local state directory. This is currently unused
Property Key: pegasus.dir.submit.logs Profile Key: N/A Scope : Properties Since : 2.0 Type :file path Default : (no default)	This property can be used to specify the directory where the condor logs for the workflow should go to. By default, starting 4.2.1 release, Pegasus will setup the log to be in the workflow submit directory. This can create problems, in case users submit directories are on NSF. This is done to ensure that the logs are created in a local directory even though the submit directory maybe on NFS.

2.3.2. Site Directories Properties

The site directory properties modify the behavior of remotely run jobs. In rare occasions, it may also pertain to locally run compute jobs.

Site Directories Related Properties
Key Attributes	Description
Property Key: pegasus.dir.useTimestamp Profile Key: N/A Scope : Properties Since : 2.1 Type :Boolean Default : false	While creating the submit directory, Pegasus employs a run numbering scheme. Users can use this Boolean property to use a timestamp based numbering scheme instead of the runxxxx scheme.
Property Key: pegasus.dir.exec Profile Key: N/A Scope : Properties Since : 2.0 Type : file path Default : (no default)	This property modifies the remote location work directory in which all your jobs will run. If the path is relative then it is appended to the work directory (associated with the site), as specified in the site catalog. If the path is absolute then it overrides the work directory specified in the site catalog.
Property Key:pegasus.dir.submit.mapper Profile Key:N/A Scope : Properties Since : 4.7 Type : Enumeration Values : Flat\|Hashed Default : Hashed	This property modifies determines how the directory for job submit files are mapped on the submit host. - Flat: This mapper results in Pegasus placing all the job submit files in the submit directory as determined from the planner options. This can result in too many files in one directory for large workflows, and was the only option before Pegasus 4.7.0 release. - Named: This mapper results in the creation of a deep directory structure rooted at the submit directory. The relative directory for a compute job is determined based on a pegasus profile named relative.submit.dir associated with the job If the profile is not associated then the relative directory is derived from the logical transformation name associated with the job. Auxillary files are placed in the base submit directory for the worklfow. To control behavior of this mapper, users can specify the following pegasus profiles relative.submit.dir - relative submit dir to be used fo the job. - Hashed: This mapper results in the creation of a deep directory structure rooted at the submit directory. The base directory is the submit directory as determined from the planner options. By default, the directory structure created is two levels deep. To control behavior of this mapper, users can specify the following properties pegasus.dir.submit.mapper.hashed.levels - the number of directory levels used to accomodate the files. Defaults to 2. pegasus.dir.submit.mapper.hashed.multiplier - the number of files associated with a job in the submit directory. Defaults to 5.
Property Key: pegasus.dir.staging.mapper Profile Key: N/A Scope : Properties Since : 4.7 Type :Enumeration Values : Flat\|Hashed Default : Hashed	This property modifies determines how the job input and output files are mapped on the staging site. This only applies when the pegasus data configuration is set to nonsharedfs. - Flat: This mapper results in Pegasus placing all the job submit files in the staging site directory as determined from the Site Catalog and planner options. This can result in too many files in one directory for large workflows, and was the only option before Pegasus 4.7.0 release. - Hashed: This mapper results in the creation of a deep directory structure rooted at the staging site directory created by the create dir jobs. The binning is at the job level,and not at the file level i.e each job will push out it’s outputs to the same directory on the staging site, independent of the number of output files. To control behavior of this mapper, users can specify the following properties pegasus.dir.staging.mapper.hashed.levels - the number of directory levels used to accomodate the files. Defaults to 2. pegasus.dir.staging.mapper.hashed.multiplier- the number of files associated with a job in the submit directory. Defaults to 5.
Property Key: pegasus.dir.storage.mapper Profile Key : N/A Scope : Properties Since : 4.3 Type :Enumeration Values : Flat\|Fixed\|Hashed\|Replica Default : Flat	This property modifies determines how the output files are mapped on the output site storage location. In order to preserve backward compatibility, setting the boolean property pegasus.dir.storage.deep results in the Hashed output mapper to be loaded, if no output mapper property is specified. - Flat: By default, Pegasus will place the output files in the storage directory specified in the site catalog for the output site. - Fixed: Using this mapper, users can specify an externally accesible url to the storage directory in their properties file. The following property needs to be set. pegasus.dir.storage.mapper.fixed.url - an externally accessible URL to the storage directory on the output site e.g. gsiftp://outputs.isi.edu/shared/outputs Note: For hierarchal workflows, the above property needs to be set separately for each pegasusWorkflow/dax job, if you want the sub workflow outputs to goto a different directory. - Hashed: This mapper results in the creation of a deep directory structure on the output site, while populating the results. The base directory on the remote end is determined from the site catalog. Depending on the number of files being staged to the remote site a Hashed File Structure is created that ensures that only 256 files reside in one directory. To create this directory structure on the storage site, Pegasus relies on the directory creation feature of the Grid FTP server, which appeared in globus 4.0.x - Replica: This mapper determines the path for an output file on the output site by querying an output replica catalog. The output site is one that is passed on the command line. The output replica catalog can be configured by specifiing the properties with the prefix pegasus.dir.storage.replica. By default, a Regex File based backend is assumed unless overridden. For example pegasus.dir.storage.mapper.replica Regex\|File pegasus.dir.storage.mapper.replica.file the RC file at the backend to use if using a file based RC
Property Key: pegasus.dir.storage.deep Profile Key: N/A Scope : Properties Since : 2.1 Type : Boolean Default : false	This Boolean property results in the creation of a deep directory structure on the output site, while populating the results. The base directory on the remote end is determined from the site catalog. To this base directory, the relative submit directory structure ( $user/$vogroup/$label/runxxxx ) is appended. $storage = $base + $relative_submit_directory This is the base directory that is passed to the storage mapper. Note: To preserve backward compatibilty, setting this property results in the Hashed mapper to be loaded unless pegasus.dir.storage.mapper is explicitly specified. Before 4.3, this property resulted in HashedDirectory structure.
Property Key: pegasus.dir.create.strategy Profile Key: N/A Scope : Properties Since : 2.2 Type :Enumeration Values : HourGlass\|Tentacles\|Minimal Default : Minimal	If the –randomdir option is given to the Planner at runtime, the Pegasus planner adds nodes that create the random directories at the remote sites, before any jobs are actually run. The two modes determine the placement of these nodes and their dependencies to the rest of the graph. - HourGlass: It adds a make directory node at the top level of the graph, and all these concat to a single dummy job before branching out to the root nodes of the original/ concrete dag so far. So we introduce a classic X shape at the top of the graph. Hence the name HourGlass. - Tentacles: This option places the jobs creating directories at the top of the graph. However instead of constricting it to an hour glass shape, this mode links the top node to all the relevant nodes for which the create dir job is necessary. It looks as if the node spreads its tentacleas all around. This puts more load on the DAGMan because of the added dependencies but removes the restriction of the plan progressing only when all the create directory jobs have progressed on the remote sites, as is the case in the HourGlass model. - Minimal: The strategy involves in walking the graph in a BFS order, and updating a bit set associated with each job based on the BitSet of the parent jobs. The BitSet indicates whether an edge exists from the create dir job to an ancestor of the node. For a node, the bit set is the union of all the parents BitSets. The BFS traversal ensures that the bitsets are of a node are only updated once the parents have been processed.

2.3.3. Schema File Location Properties

This section defines the location of XML schema files that are used to parse the various XML document instances in the PEGASUS. The schema backups in the installed file-system permit PEGASUS operations without being online.

Schema File Location Properties
Key Attributes	Description
Property Key: pegasus.schema.dax Profile Key : N/A Scope : Properties Since : 2.0 Type :file path Default : ${pegasus.home.sysconfdir}/dax-3.4.xsd	This file is a copy of the XML schema that describes abstract DAG files that are the result of the abstract planning process, and input into any concrete planning. Providing a copy of the schema enables the parser to use the local copy instead of reaching out to the Internet, and obtaining the latest version from the Pegasus website dynamically.
Property Key:pegasus.schema.sc Profile Key:N/A Scope : Properties Since : 2.0 Type :file path Default : ${pegasus.home.sysconfdir}/sc-4.0.xsd	This file is a copy of the XML schema that describes the xml description of the site catalog. Providing a copy of the schema enables the parser to use the local copy instead of reaching out to the internet, and obtaining the latest version from the GriPhyN website dynamically.

2.3.4. Database Drivers For All Relational Catalogs

Database Driver Properties
Property Key	Description
Property Key: pegasus.catalog.*.db.driver Profile Key: N/A Scope : Properties Since : 2.0 Type : Enumeration Values :MySQL\|PostGres\|SQLiteDefault : (no default)	The database driver class is dynamically loaded, as required by the schema. Currently, only MySQL 5.x, PostGreSQL >= 8.1 and SQlite are supported. Their respective JDBC3 driver is provided as part and parcel of the PEGASUS. The * in the property name can be replaced by a catalog name to apply the property only for that catalog. Valid catalog names are replica.
Property Key:pegasus.catalog.*.db.url Profile Key:N/A Scope : Properties Since : 2.0 Type :Database URL Default : (no default)	Each database has its own string to contact the database on a given host, port, and database. Although most driver URLs allow to pass arbitrary arguments, please use the pegasus.catalog.[catalog-name].db.* keys or pegasus.catalog..db. to preload these arguments. THE URL IS A MANDATORY PROPERTY FOR ANY DBMS BACKEND.
Property Key:pegasus.catalog.*.db.user Profile Key:N/A Scope : Properties Since : 2.0 Type :String Default : (no default)	In order to access a database, you must provide the name of your account on the DBMS. This property is database-independent. THIS IS A MANDATORY PROPERTY FOR MANY DBMS BACKENDS. The * in the property name can be replaced by a catalog name to apply the property only for that catalog. Valid catalog names are replica
Property Key: pegasus.catalog.*.db.password Profile Key: N/A Scope : Properties Since : 2.0 Type :String Default : (no default)	In order to access a database, you must provide an optional password of your account on the DBMS. This property is database-independent. THIS IS A MANDATORY PROPERTY, IF YOUR DBMS BACKEND ACCOUNT REQUIRES A PASSWORD. The * in the property name can be replaced by a catalog name to apply the property only for that catalog. Valid catalog names are replica.
Property Key: pegasus.catalog..db. Profile Key: N/A Scope : Properties Since : 2.0 Type : String Default : (no default)	Each database has a multitude of options to control in fine detail the further behaviour. You may want to check the JDBC3 documentation of the JDBC driver for your database for details. The keys will be passed as part of the connect properties by stripping the “pegasus.catalog.[catalog-name].db.” prefix from them. The catalog-name can be replaced by the following values replica for Replica Catalog (RC) - Postgres >= 8.1 parses the following properties: pegasus.catalog..db.user pegasus.catalog..db.password pegasus.catalog..db.PGHOST pegasus.catalog..db.PGPORT pegasus.catalog..db.charSet pegasus.catalog..db.compatible - MySQL 5.0 parses the following properties: pegasus.catalog..db.user pegasus.catalog..db.password pegasus.catalog..db.databaseName pegasus.catalog..db.serverName pegasus.catalog..db.portNumber pegasus.catalog..db.socketFactory pegasus.catalog..db.strictUpdates pegasus.catalog..db.ignoreNonTxTables pegasus.catalog..db.secondsBeforeRetryMaster pegasus.catalog..db.queriesBeforeRetryMaster pegasus.catalog..db.allowLoadLocalInfile pegasus.catalog..db.continueBatchOnError pegasus.catalog..db.pedantic pegasus.catalog..db.useStreamLengthsInPrepStmts pegasus.catalog..db.useTimezone pegasus.catalog..db.relaxAutoCommit pegasus.catalog..db.paranoid pegasus.catalog..db.autoReconnect pegasus.catalog..db.capitalizeTypeNames pegasus.catalog..db.ultraDevHack pegasus.catalog..db.strictFloatingPoint pegasus.catalog..db.useSSL pegasus.catalog..db.useCompression pegasus.catalog..db.socketTimeout pegasus.catalog..db.maxReconnects pegasus.catalog..db.initialTimeout pegasus.catalog..db.maxRows pegasus.catalog..db.useHostsInPrivileges pegasus.catalog..db.interactiveClient pegasus.catalog..db.useUnicode pegasus.catalog..db.characterEncoding The in the property name can be replaced by a catalog name to apply the property only for that catalog. Valid catalog names are replica.
Property Key: pegasus.catalog.*.timeout Profile Key: N/A Scope : Properties Since : 4.5.1 Type :Integer Default : (no default)	This property sets a busy handler that sleeps for a specified amount of time (in seconds) when a table is locked. This property has effect only in a sqlite database. The * in the property name can be replaced by a catalog name to apply the property only for that catalog. Valid catalog names are - master - workflow

2.3.5. Catalog Related Properties

Replica Catalog Properties
Key Attributes	Description
Property Key: pegasus.catalog.replica Profile Key: N/A Scope : Properties Since : 2.0 Default : File	Pegasus queries a Replica Catalog to discover the physical filenames (PFN) for input files specified in the Abstract Workflow. Pegasus can interface with various types of Replica Catalogs. This property specifies which type of Replica Catalog to use during the planning process. JDBCRC In this mode, Pegasus queries a SQL based replica catalog that is accessed via JDBC. To use JDBCRC, the user additionally needs to set the following properties pegasus.catalog.replica.db.driver = mysql \| postgres\|sqlite pegasus.catalog.replica.db.url = <jdbc url to the database> e.g jdbc:mysql://database-host.isi.edu/database-name \| jdbc:sqlite:/shared/jdbcrc.db pegasus.catalog.replica.db.user = database-user pegasus.catalog.replica.db.password = database-password File In this mode, Pegasus queries a file based replica catalog. It is neither transactionally safe, nor advised to use for production purposes in any way. Multiple concurrent instances will clobber each other!. The site attribute should be specified whenever possible. The attribute key for the site attribute is “site”. The LFN may or may not be quoted. If it contains linear whitespace, quotes, backslash or an equality sign, it must be quoted and escaped. Ditto for the PFN. The attribute key-value pairs are separated by an equality sign without any whitespaces. The value may be in quoted. The LFN sentiments about quoting apply. LFN PFN LFN PFN a=b [..] LFN PFN a="b" [..] "LFN w/LWS" "PFN w/LWS" [..] To use File, the user additionally needs to specify pegasus.catalog.replica.file property to specify the path to the file based RC. IF not specified , defaults to $PWD/rc.txt file. YAML This is the new YAML based file format introduced in Pegasus 5.0. The format does support regular expressions similar to Regex catalog type. To specify regular expressions you need to associate an attribute named regex and set to true. To use YAML, the user additionally needs to specify pegasus.catalog.replica.file property to specify the path to the file based RC. IF not specified , defaults to $PWD/replicas.yml file. Regex In this mode, Pegasus queries a file based replica catalog. It is neither transactionally safe, nor advised to use for production purposes in any way. Multiple concurrent access to the File will end up clobbering the contents of the file. The site attribute should be specified whenever possible. The attribute key for the site attribute is “site”. The LFN may or may not be quoted. If it contains linear whitespace, quotes, backslash or an equality sign, it must be quoted and escaped. Ditto for the PFN. The attribute key-value pairs are separated by an equality sign without any whitespaces. The value may be in quoted. The LFN sentiments about quoting apply. In addition users can specifiy regular expression based LFN’s. A regular expression based entry should be qualified with an attribute named ‘regex’. The attribute regex when set to true identifies the catalog entry as a regular expression based entry. Regular expressions should follow Java regular expression syntax. For example, consider a replica catalog as shown below. Entry 1 refers to an entry which does not use a regular expressions. This entry would only match a file named ‘f.a’, and nothing else. Entry 2 referes to an entry which uses a regular expression. In this entry f.a refers to files having name as f[any-character]a i.e. faa, f.a, f0a, etc. f.a file:///Vol/input/f.a site="local" f.a file:///Vol/input/f.a site="local" regex="true" Regular expression based entries also support substitutions. For example, consider the regular expression based entry shown below. Entry 3 will match files with name alpha.csv, alpha.txt, alpha.xml. In addition, values matched in the expression can be used to generate a PFN. For the entry below if the file being looked up is alpha.csv, the PFN for the file would be generated as file:///Volumes/data/input/csv/alpha.csv. Similary if the file being lookedup was alpha.csv, the PFN for the file would be generated as file:///Volumes/data/input/xml/alpha.xml i.e. The section [0], [1] will be replaced. Section [0] refers to the entire string i.e. alpha.csv. Section [1] refers to a partial match in the input i.e. csv, or txt, or xml. Users can utilize as many sections as they wish. alpha\.(csv\|txt\|xml) file:///Vol/input/[1]/[0] site="local" regex="true" To use File, the user additionally needs to specify pegasus.catalog.replica.file property to specify the path to the file based RC. Directory In this mode, Pegasus does a directory listing on an input directory to create the LFN to PFN mappings. The directory listing is performed recursively, resulting in deep LFN mappings. For example, if an input directory $input is specified with the following structure $input $input/f.1 $input/f.2 $input/D1 $input/D1/f.3 Pegasus will create the mappings the following LFN PFN mappings internally f.1 file://$input/f.1 site="local" f.2 file://$input/f.2 site="local" D1/f.3 file://$input/D2/f.3 site="local" If you don’t want the deep lfn’s to be created then, you can set pegasus.catalog.replica.directory.flat.lfn to true In that case, for the previous example, Pegasus will create the following LFN PFN mappings internally. f.1 file://$input/f.1 site="local" f.2 file://$input/f.2 site="local" D1/f.3 file://$input/D2/f.3 site="local" pegasus-plan has –input-dir option that can be used to specify an input directory. Users can optionally specify additional properties to configure the behvavior of this implementation. - pegasus.catalog.replica.directory to specify the path to the directory containing the files - pegasus.catalog.replica.directory.site to specify a site attribute other than local to associate with the mappings. - pegasus.catalog.replica.directory.url.prefix to associate a URL prefix for the PFN’s constructed. If not specified, the URL defaults to file:// MRC In this mode, Pegasus queries multiple replica catalogs to discover the file locations on the grid. To use it set pegasus.catalog.replica MRC Each associated replica catalog can be configured via properties as follows. The user associates a variable name referred to as [value] for each of the catalogs, where [value] is any legal identifier (concretely [A-Za-z][_A-Za-z0-9]*) . For each associated replica catalogs the user specifies the following properties. pegasus.catalog.replica.mrc.[value] specifies the type of replica catalog. pegasus.catalog.replica.mrc.[value].key specifies a property name key for a particular catalog pegasus.catalog.replica.mrc.directory1 Directory pegasus.catalog.replica.mrc.directory1.directory /input/dir1 pegasus.catalog.replica.mrc.directory1.directory.site siteX pegasus.catalog.replica.mrc.directory2 Directory pegasus.catalog.replica.mrc.directory2.directory /input/dir2 pegasus.catalog.replica.mrc.directory1.directory.site siteY\| In the above example, directory1, directory2 are any valid identifier names and url is the property key that needed to be specified.
Property Key: pegasus.catalog.replica.file. Profile Key: N/A Scope : Properties Since : 2.0 Default : 1000	The path to a file based replica catalog backend
Property Key: pegasus.catalog.replica.chunk.size Profile Key: N/A Scope : Properties Since : 2.0 Default : 1000	The pegasus-rc-client takes in an input file containing the mappings upon which to work. This property determines, the number of lines that are read in at a time, and worked upon at together. This allows the various operations like insert, delete happen in bulk if the underlying replica implementation supports it.
Property Key: pegasus.catalog.replica.cache.asrc Profile Key : N/A Scope : Properties Since : 2.0 Default : false	This Boolean property determines whether to treat the cachefile specified as a supplemental replica catalog or not. User can specify on the command line to pegasus-plan a comma separated list of cache files using the –cache option. By default, the LFN->PFN mappings contained in the cache file are treated as cache, i.e if an entry is found in a cache file the replica catalog is not queried. This results in only the entry specified in the cache file to be available for replica selection. Setting this property to true, results in the cache files to be treated as supplemental replica catalogs. This results in the mappings found in the replica catalog (as specified by pegasus.catalog.replica) to be merged with the ones found in the cache files. Thus, mappings for a particular LFN found in both the cache and the replica catalog are available for replica selection.
Property Key: pegasus.catalog.replica.dax.asrc Profile Key :N/A Scope : Properties Since : 4.5.2 Default : false	This Boolean property determines whether to treat the locations of files recorded in the DAX as a supplemental replica catalog or not. By default, the LFN->PFN mappings contained in the DAX file overrides any specified in a replica catalog. This results in only the entry specified in the DAX file to be available for replica selection. Setting this property to true, results in the locations of files recorded in the DAX files to be treated as a supplemental replica catalog. This results in the mappings found in the replica catalog (as specified by pegasus.catalog.replica) to be merged with the ones found in the cache files. Thus, mappings for a particular LFN found in both the Abstract Workflow/DAX and the replica catalog are available for replica selection.
Property Key: pegasus.catalog.replica.output.* Profile Key : N/A Scope : Properties Since : 4.5.3 Default : None	Normally, the registration jobs in the executable workflow register to the replica catalog specified by the user in the properties file . This property prefix allows the user to specify a separate output replica catalog that is different from the one used for discovery of input files. This is normally the case, when a Directory or MRC based replica catalog backend that don’t support insertion of entries are used for discovery of input files. For example to specify a separate file based output replica catalog, specify pegasus.catalog.replica.output File pegasus.catalog.replica.output.file /workflow/output.rc

Site Catalog Properties
Key Attributes	Description
Property Key: pegasus.catalog.site Profile Key: N/A Scope : Properties Type : Enumeration Values : YAML\|XML Since : 2.0 Default : YAML	Pegasus supports two different types of site catalogs in YAML or XML formats Pegasus is able to auto-detect what schema a user site catalog refers to. Hence, this property may no longer be set.
Property Key: pegasus.catalog.site.file Profile Key : N/A Scope : Properties Since : 2.0 Default : $PWD/sites.yml \| $PWD/sites.xml	The path to the site catalog file, that describes the various sites and their layouts to Pegasus.

Transformation Catalog Properties
Key Attributes	Description
Property Key: pegasus.catalog.transformation Profile Key: N/A Scope : Properties Since : 2.0 Type : Enumeration Values : YAML\|Directory\|Text Default : YAML	Pegasus supports two different types of site catalogs in YAML or Text formats Pegasus is able to auto-detect what schema a user site catalog refers to. Hence, this property may no longer be set. For Directory type, refer to the catalogs chapter in the ref:reference guide <tc-directory>.
Property Key: pegasus.catalog.transformation.file Profile Key : N/A Scope : Properties Since : 2.0 Default : $PWD/transformations.yml\|$PWD/tc.txt	The path to the transformation catalog file, that describes the locations of the executables.

2.3.6. Replica Selection Properties

Replica Selection Properties
Key Attributes	Description
Property Key: pegasus.selector.replica Profile Key: N/A Scope : Properties Since : 2.0 Type :String Default : Default See Also : pegasus.selector.replica..ignore.stagein.sites See Also : pegasus.selector.replica..prefer.stagein.sites	Each job in the DAX maybe associated with input LFN’s denoting the files that are required for the job to run. To determine the physical replica (PFN) for a LFN, Pegasus queries the replica catalog to get all the PFN’s (replicas) associated with a LFN. Pegasus then calls out to a replica selector to select a replica amongst the various replicas returned. This property determines the replica selector to use for selecting the replicas. Default The selector orders the various candidate replica’s according to the following rules 1. valid file URL’s . That is URL’s that have the site attribute matching the site where the executable pegasus-transfer is executed. 2. all URL’s from preferred site (usually the compute site) 3. all other remotely accessible ( non file) URL’s Regex This replica selector allows the user allows the user to specific regular expressions that can be used to rank various PFN’s returned from the Replica Catalog for a particular LFN. This replica selector orders the replicas based on the rank. Lower the rank higher the preference. The regular expressions are assigned different rank, that determine the order in which the expressions are employed. The rank values for the regex can expressed in user properties using the property. pegasus.selector.replica.regex.rank.[value] regex-expression The value is an integer value that denotes the rank of an expression with a rank value of 1 being the highest rank. Please note that before applying any regular expressions on the PFN’s, the file URL’s that dont match the preferred site are explicitly filtered out. Restricted This replica selector, allows the user to specify good sites and bad sites for staging in data to a particular compute site. A good site for a compute site X, is a preferred site from which replicas should be staged to site X. If there are more than one good sites having a particular replica, then a random site is selected amongst these preferred sites. A bad site for a compute site X, is a site from which replica’s should not be staged. The reason of not accessing replica from a bad site can vary from the link being down, to the user not having permissions on that site’s data. The good \| bad sites are specified by the properties pegasus.replica..prefer.stagein.sites pegasus.replica..ignore.stagein.sites where the * in the property name denotes the name of the compute site. A * in the property key is taken to mean all sites. The *pegasus.replica..prefer.stagein.sites property takes precedence over pegasus.replica..ignore.stagein.sites* property i.e. if for a site X, a site Y is specified both in the ignored and the preferred set, then site Y is taken to mean as only a preferred site for a site X. Local This replica selector prefers replicas from the local host and that start with a file: URL scheme. It is useful, when users want to stagin files to a remote site from your submit host using the Condor file transfer mechanism.
Property Key: pegasus.selector.replica..ignore.stagein.sites Profile Key: N/A Scope : Properties Since : 2.0 Default : (no default) See Also : pegasus.selector.replica See Also : pegasus.selector.replica..prefer.stagein.sites	A comma separated list of storage sites from which to never stage in data to a compute site. The property can apply to all or a single compute site, depending on how the * in the property name is expanded. The * in the property name means all compute sites unless replaced by a site name. For e.g setting pegasus.selector.replica.*.ignore.stagein.sites to usc means that ignore all replicas from site usc for staging in to any compute site. Setting pegasus.replica.isi.ignore.stagein.sites to usc means that ignore all replicas from site usc for staging in data to site isi.
Property Key: pegasus.selector.replica..prefer.stagein.sites Profile Key: N/A Scope : Properties Since : 2.0 Default : (no default) See Also : pegasus.selector.replica See Also : pegasus.selector.replica..ignore.stagein.sites	A comma separated list of preferred storage sites from which to stage in data to a compute site. The property can apply to all or a single compute site, depending on how the * in the property name is expanded. The * in the property name means all compute sites unless replaced by a site name. For e.g setting pegasus.selector.replica.*.prefer.stagein.sites to usc means that prefer all replicas from site usc for staging in to any compute site. Setting pegasus.replica.isi.prefer.stagein.sites to usc means that prefer all replicas from site usc for staging in data to site isi.
Property Key: pegasus.selector.replica.regex.rank.[value] Profile Key:N/A Scope : Properties Since : 2.3.0 Default : (no default) See Also : pegasus.selector.replica	Specifies the regex expressions to be applied on the PFNs returned for a particular LFN. Refer to on information of how to construct a regex expression. The [value] in the property key is to be replaced by an int value that designates the rank value for the regex expression to be applied in the Regex replica selector. The example below indicates preference for file URL’s over URL’s referring to gridftp server at example.isi.edu

2.3.7. Site Selection Properties

Site Selection Properties

Key Attributes

Description

Property Key: pegasus.selector.site
Profile Key: N/A
Scope : Properties
Since : 2.0

Type : String

Default : Random

See Also :

pegasus.selector.site.path

See Also :

pegasus.selector.site.timeout

See Also :

pegasus.selector.site.keep.tmp

See Also :

pegasus.selector.site.env.*

The site selection in Pegasus can be on basis of any of the

following strategies.

Random

In this mode, the jobs will be randomly distributed among

the sites that can execute them.

RoundRobin

In this mode. the jobs will be assigned in a round robin
manner amongst the sites that can execute them. Since
each site cannot execute everytype of job, the round
robin scheduling is done per level on a sorted list.
The sorting is on the basis of the number of jobs a
particular site has been assigned in that level so far.
If a job cannot be run on the first site in the queue
(due to no matching entry in the transformation catalog
for the transformation referred to by the job), it goes
to the next one and so on. This implementation defaults
to classic round robin in the case where all the jobs
in the workflow can run on all the sites.

NonJavaCallout

In this mode, Pegasus will callout to an external site
selector.In this mode a temporary file is prepared
containing the job information that is passed to the
site selector as an argument while invoking it. The
path to the site selector is specified by setting
the property pegasus.site.selector.path.
The environment variables that need to be set to run
the site selector can be specified using the properties
with a pegasus.site.selector.env. prefix. The temporary
file contains information about the job that needs to be
scheduled. It contains key value pairs with each key value
pair being on a new line and separated by a =.
The following pairs are currently generated for the site
selector temporary file that is generated in the NonJavaCallout.

Key	Description
version	is the version of the site selector api, currently 2.0.
transformation	is the fully-qualified definition identifier for the transformation (TR) namespace::name:version.
derivation	is the fully qualified definition identifier for the derivation (DV), namespace::name:version.
job.level	is the job’s depth in the tree of the workflow DAG.
job.id	is the job’s ID, as used in the DAX file.
resource.id	is a site handle, followed by whitespace, followed by a gridftp server. Typically, each gridftp server is enumerated once, so you may have multiple occurances of the same site. There can be multiple occurances of this key.
input.lfn	is an input LFN, optionally followed by a whitespace and file size. There can be multiple occurances of this key,one for each input LFN required by the job.
wf.name	label of the dax, as found in the DAX’s root element. wf.index is the DAX index, that is incremented for each partition in case of deferred planning.
wf.time	is the mtime of the workflow.
wf.manager	is the name of the workflow manager being used .e.g condor
vo.name	is the name of the virtual organization that is running this workflow. It is currently set to NONE
vo.group	unused at present and is set to NONE.

Group

In this mode, a group of jobs will be assigned to the same site
that can execute them. The use of the PEGASUS profile key
group in the abstract workflow, associates a job with a
particular group. The jobs that do not have the profile
key associated with them, will be put in the default group.
The jobs in the default group are handed over to the “Random”
Site Selector for scheduling.

Heft

In this mode, a version of the HEFT processor scheduling
algorithm is used to schedule jobs in the workflow to multiple
grid sites. The implementation assumes default data communication
costs when jobs are not scheduled on to the same site. Later on
this may be made more configurable.
The runtime for the jobs is specified in the transformation
catalog by associating the pegasus profile key runtime with the
entries.
The number of processors in a site is picked up from the
attribute idle-nodes associated with the vanilla jobmanager
of the site in the site catalog.

Property Key: pegasus.selector.site.path
Profile Key: N/A
Scope : Properties
Since : 2.0
Default : (no default)

If one calls out to an external site selector using the
NonJavaCallout mode, this refers to the path where the
site selector is installed. In case other strategies are
used it does not need to be set.

Property Key: pegasus.selector.site.env.*
Profile Key: N/A
Scope : Properties
Since : 2.0
Default : (no default)

The environment variables that need to be set while callout
to the site selector. These are the variables that the user
would set if running the site selector on the command line.
The name of the environment variable is got by stripping the
keys of the prefix “pegasus.site.selector.env.” prefix from
them. The value of the environment variable is the value of
the property.
e.g

pegasus.site.selector.path.LD_LIBRARY_PATH /globus/lib

would lead to the site selector being called with the

LD_LIBRARY_PATH set to /globus/lib.

Property Key: pegasus.selector.site.timeout
Profile Key:N/A
Scope : Properties
Since : 2.3.0
Default : 60
See Also : pegasus.selector.site

It sets the number of seconds Pegasus waits to hear back
from an external site selector using the NonJavaCallout
interface before timing out.

Property Key: pegasus.selector.site.keep.tmp
Profile Key:N/A
Scope : Properties
Since : 2.3.0
Values : onerror|always|never
Default : onerror
See Also : pegasus.selector.site

It determines whether Pegasus deletes the temporary input
files that are generated in the temp directory or not.
These temporary input files are passed as input to the
external site selectors.

2.3.8. Data Staging Configuration Properties

Data Configuration Properties
Key Attributes	Description
Property Key: pegasus.data.configuration Profile Key:data.configuration Scope : Properties, Site Catalog Since : 4.0.0 Values : sharedfs\|nonsharedfs\|condorio Default : condorio See Also : pegasus.transfer.bypass.input.staging	This property sets up Pegasus to run in different environments. For Pegasus 4.5.0 and above, users can set the pegasus profile data.configuration with the sites in their site catalog, to run multisite workflows with each site having a different data configuration. sharedfs If this is set, Pegasus will be setup to execute jobs on the shared filesystem on the execution site. This assumes, that the head node of a cluster and the worker nodes share a filesystem. The staging site in this case is the same as the execution site. Pegasus adds a create dir job to the executable workflow that creates a workflow specific directory on the shared filesystem . The data transfer jobs in the executable workflow ( stage_in_ , stage_inter_ , stage_out_ ) transfer the data to this directory.The compute jobs in the executable workflow are launched in the directory on the shared filesystem. condorio If this is set, Pegasus will be setup to run jobs in a pure condor pool, with the nodes not sharing a filesystem. Data is staged to the compute nodes from the submit host using Condor File IO. The planner is automatically setup to use the submit host ( site local ) as the staging site. All the auxillary jobs added by the planner to the executable workflow ( create dir, data stagein and stage-out, cleanup ) jobs refer to the workflow specific directory on the local site. The data transfer jobs in the executable workflow ( stage_in_ , stage_inter_ , stage_out_ ) transfer the data to this directory. When the compute jobs start, the input data for each job is shipped from the workflow specific directory on the submit host to compute/worker node using Condor file IO. The output data for each job is similarly shipped back to the submit host from the compute/worker node. This setup is particularly helpful when running workflows in the cloud environment where setting up a shared filesystem across the VM’s may be tricky. pegasus.gridstart PegasusLite pegasus.transfer.worker.package true nonsharedfs If this is set, Pegasus will be setup to execute jobs on an execution site without relying on a shared filesystem between the head node and the worker nodes. You can specify staging site ( using –staging-site option to pegasus-plan) to indicate the site to use as a central storage location for a workflow. The staging site is independant of the execution sites on which a workflow executes. All the auxillary jobs added by the planner to the executable workflow ( create dir, data stagein and stage-out, cleanup ) jobs refer to the workflow specific directory on the staging site. The data transfer jobs in the executable workflow ( stage_in_ , stage_inter_ , stage_out_ transfer the data to this directory. When the compute jobs start, the input data for each job is shipped from the workflow specific directory on the submit host to compute/worker node using pegasus-transfer. The output data for each job is similarly shipped back to the submit host from the compute/worker node. The protocols supported are at this time SRM, GridFTP, iRods, S3. This setup is particularly helpful when running workflows on OSG where most of the execution sites don’t have enough data storage. Only a few sites have large amounts of data storage exposed that can be used to place data during a workflow run. This setup is also helpful when running workflows in the cloud environment where setting up a shared filesystem across the VM’s may be tricky. On loading this property, internally the following properies are set pegasus.gridstart PegasusLite pegasus.transfer.worker.package true
Property Key: pegasus.transfer.bypass.input.staging Profile Key:N/A Scope : Properties Since : 4.3.0 Type :Boolean Default : false See Also : pegasus.data.configuration	When executing in a non shared filesystem setup i.e data configuration set to nonsharedfs or condorio, Pegasus always stages the input files through the staging site i.e the stage-in job stages in data from the input site to the staging site. The PegasusLite jobs that start up on the worker nodes, then pull the input data from the staging site for each job. This property can be used to setup the PegasusLite jobs to pull input data directly from the input site without going through the staging server. This is based on the assumption that the worker nodes can access the input site. If users set this to true, they should be aware that the access to the input site is no longer throttled ( as in case of stage in jobs). If large number of compute jobs start at the same time in a workflow, the input server will see a connection from each job.
Property Key: pegasus.transfer.container.onhost Profile Key:N/A Scope : Properties Since : 5.1.0 Type :Boolean Default : true See Also : pegasus.data.configuration	When a job is specified to run in an application container such as docker or singularity, Pegasus has two options in PegasusLite on how data transfers for the job occur. The transfers can happen either on the HOST OS before the container in which the job has to execute OR inside the application container, before the user code is invoked. This property can be used to control this behavior. Prior to Pegasus 5.1.0 release, the default value for this was false i.e. the data transfers happened inside the application container.

2.3.9. Transfer Configuration Properties

Transfer Configuration Properties

Key Attributes

Description

Property Key: pegasus.transfer.*.impl
Profile Key: N/A
Scope : Properties
Since : 2.0.0
Values : Transfer|GUC
Default : Transfer
See Also : pegasus.transfer.refiner

Each compute job usually has data products that are
required to be staged in to the execution site,
materialized data products staged out to a final resting
place, or staged to another job running at a different
site. This property determines the underlying grid
transfer tool that is used to manage the transfers.

The * in the property name can be replaced to achieve
finer grained control to dictate what type of transfer
jobs need to be managed with which grid transfer tool.
Usually,the arguments with which the client is invoked
can be specified by

the property pegasus.transfer.arguments
associating the PEGASUS profile key transfer.arguments

The table below illustrates all the possible variations

of the property.

Property Name	Applies to
pegasus.transfer.stagein.impl	the stage in transfer jobs
pegasus.transfer.stageout.impl	the stage out transfer jobs
pegasus.transfer.inter.impl	the inter site transfer jobs
pegasus.transfer.setup.impl	the setup transfer job
pegasus.transfer.*.impl	apply to types of transfer jobs

Note: Since version 2.2.0 the worker package is staged
automatically during staging of executables to the remote
site. This is achieved by adding a setup transfer job to
the workflow. The setup transfer job by default uses
pegasus-transfer to stage the data. The implementation
to use can be configured by setting the property

pegasus.transfer.setup.impl

The various grid transfer tools that can be used to
manage data transfers are explained below

Transfer: This results in pegasus-transfer to be used
for transferring of files. It is a python based wrapper
around various transfer clients like globus-url-copy,
lcg-copy, wget, cp, ln . pegasus-transfer looks at source
and destination url and figures out automatically which
underlying client to use. pegasus-transfer is distributed
with the PEGASUS and can be found at
$PEGASUS_HOME/bin/pegasus-transfer.
For remote sites, Pegasus constructs the default path to
pegasus-transfer on the basis of PEGASUS_HOME env profile
specified in the site catalog. To specify a different
path to the pegasus-transfer client , users can add an
entry into the transformation catalog with fully qualified
logical name as pegasus::pegasus-transfer

GUC: This refers to the new guc client that does
multiple file transfers per invocation. The
globus-url-copy client distributed with Globus 4.x
is compatible with this mode.

Property Key: pegasus.transfer.arguments
Profile Key:transfer.arguments
Scope : Properties
Since : 2.0.0
Type :String
Default : (no default)
See Also : pegasus.transfer.lite.arguments

This determines the extra arguments with which the transfer
implementation is invoked. The transfer executable that
is invoked is dependant upon the transfer mode that has
been selected. The property can be overloaded by
associated the pegasus profile key transfer.arguments
either with the site in the site catalog or the
corresponding transfer executable in the transformation
catalog.

Property Key: pegasus.transfer.threads
Profile Key: transfer.threads
Scope : Properties
Since : 4.4.0
Type :Integer
Default : 2

This property set the number of threads pegasus-transfer
uses to transfer the files. This property to applies to
the separate data transfer nodes that are added by Pegasus
to the executable workflow. The property can be overloaded
by associated the pegasus profile key transfer.threads
either with the site in the site catalog or the
corresponding transfer executable in the transformation
catalog.

Property Key: pegasus.transfer.lite.arguments
Profile Key: transfer.lite.arguments
Scope : Properties
Since : 4.4.0
Type :String
Default : (no default)
See Also : pegasus.transfer.arguments

This determines the extra arguments with which the
PegasusLite transfer implementation is invoked. The
transfer executable that is invoked is dependant upon the
PegasusLite transfer implementation that has been
selected.

Property Key: pegasus.transfer.worker.package
Profile Key: N/A
Scope : Properties
Since : 2.0.0
Type :Boolean
Default : false
See Also : pegasus.data.configuration

By default, Pegasus relies on the worker package to be
installed in a directory accessible to the worker nodes
on the remote sites . Pegasus uses the value of
PEGASUS_HOME environment profile in the site catalog
for the remote sites, to then construct paths to pegasus
auxillary executables like kickstart, pegasus-transfer,
seqexec etc.
If the Pegasus worker package is not installed on the
remote sites users can set this property to true to
get Pegasus to deploy worker package on the nodes.
In the case of sharedfs setup, the worker package is
deployed on the shared scratch directory for the workflow,
that is accessible to all the compute nodes of the
remote sites.
When running in nonsharefs environments, the worker
package is first brought to the submit directory and then
transferred to the worker node filesystem using Condor
file IO.

Property Key:
pegasus.transfer.worker.package.autodownload
Profile Key:N/A
Scope : Properties
Since : 4.6.1
Type :Boolean
Default : true
See Also : pegasus.transfer.worker.package

If PegasusLite does not find a worker package install
matching the pegasus lite job on the worker node, it
automatically downloads the correct worker package
from the Pegasus website. However, this can mask user
errors in configuration. This property can be set to
false to disable auto downloads.

Property Key:
pegasus.transfer.worker.package.strict
Profile Key: N/A
Scope : Properties
Since : 4.6.1
Type :Boolean
Default : true
See Also : pegasus.transfer.worker.package

In PegasusLite mode, the pegasus worker package for
the jobs is shipped along with the jobs. This property
controls whether PegasusLite will do a strict match
against the architecture and os on the local worker
node, along with pegasus version. If the strict match
fails, then PegasusLite will revert to the pegasus
website to download the correct worker package.

Property Key:pegasus.transfer.links
Profile Key: N/A
Scope : Properties
Since : 2.0.0
Type :Boolean
Default : false

If this is set, and the transfer implementation is
set to Transfer i.e. using the transfer executable
distributed with the PEGASUS. On setting this property,
if Pegasus while fetching data from the Replica Catalog
sees a “site” attribute associated with the PFN that
matches the execution site on which the data has to
be transferred to, Pegasus instead of the URL
returned by the Replica Catalog replaces it with a
file based URL. This is based on the assumption that
the if the “site” attributes match, the filesystems
are visible to the remote execution directory where
input data resides. On seeing both the source and
destination urls as file based URLs the transfer
executable spawns a job that creates a symbolic
link by calling ln -s on the remote site.

Property Key:
pegasus.transfer.*.remote.sites
Profile Key:N/A
Scope : Properties
Since : 2.0.0
Type :comma separated list of sites
Default : (no default)

By default Pegasus looks at the source and destination
URL’s for to determine whether the associated transfer
job runs on the submit host or the head node of a
remote site, with preference set to run a transfer job
to run on submit host.
Pegasus will run transfer jobs on the remote sites

if the file server for the compute site is a file server i.e url prefix file://
symlink jobs need to be added that require the symlink transfer jobs to be run remotely.

This property can be used to change the default behaviour
of Pegasus and force pegasus to run different types of
transfer jobs for the sites specified on the remote site.
The table below illustrates all the possible variations
of the property.

Property Name	Applies to
pegasus.transfer.stagein.remote.sites	the stage in transfer jobs
pegasus.transfer.stageout.remote.sites	the stage out transfer jobs
pegasus.transfer.inter.remote.sites	the inter site transfer jobs
pegasus.transfer.*.remote.sites	apply to types of transfer jobs

In addition * can be specified as a property value, to

designate that it applies to all sites.

Property Key:

pegasus.transfer.staging.delimiter

Profile Key: N/A

Scope : Properties

Since : 2.0.0

Type :String

Default : :

Pegasus supports executable staging as part of the
workflow. Currently staging of statically linked
executables is supported only. An executable is
normally staged to the work directory for the
workflow/partition on the remote site. The basename
of the staged executable is derived from the
namespace,name and version of the transformation
in the transformation catalog. This property sets
the delimiter that is used for the construction
of the name of the staged executable.

Property Key:
pegasus.transfer.disable.chmod.sites
Profile Key: N/A
Scope : Properties
Since : 2.0.0
Type :comma separated list of sites
Default : (no default)

During staging of executables to remote sites, chmod
jobs are added to the workflow. These jobs run on
the remote sites and do a chmod on the staged
executable. For some sites, this maynot be required.
The permissions might be preserved, or there maybe
an automatic mechanism that does it.
This property allows you to specify the list of
sites, where you do not want the chmod jobs to
be executed. For those sites, the chmod jobs are
replaced by NoOP jobs. The NoOP jobs are executed
by Condor, and instead will immediately have a
terminate event written to the job log file and
removed from the queue.

Property Key:
pegasus.transfer.setup.source.base.url
Profile Key: N/A
Scope : Properties
Since : 2.0.0
Type :URL
Default : (no default)

This property specifies the base URL to the
directory containing the Pegasus worker package
builds. During Staging of Executable, the Pegasus
Worker Package is also staged to the remote site.
The worker packages are by default pulled from
the http server at pegasus.isi.edu. This property
can be used to override the location from where
the worker package are staged. This maybe
required if the remote computes sites don’t allow
files transfers from a http server.

2.3.10. Monitoring Properties

Monitoring Properties

Key Attributes

Description

Property Key: pegasus.monitord.events
Profile Key: N/A
Scope : Properties
Since : 3.0.2
Type : String
Default : true
See Also :pegasus.catalog.workflow.url

This property determines whether pegasus-monitord
generates log events. If log events are disabled using
this property, no bp file, or database will be created,
even if the pegasus.monitord.output property is
specified.

Property Key: pegasus.catalog.workflow.url
Profile Key: N/A
Scope : Properties
Since : 4.5
Type : String
Default : SQlite database in submit directory.
See Also : pegasus.monitord.events

This property specifies the destination for generated
log events in pegasus-monitord. By default, events are
stored in a sqlite database in the workflow directory,
which will be created with the workflow’s name, and a
“.stampede.db” extension. Users can specify an
alternative database by using a SQLAlchemy
connection string. Details are available at:

http://www.sqlalchemy.org/docs/05/reference/dialects/index.html

It is important to note that users will need to have
the appropriate db interface library installed. Which is
to say, SQLAlchemy is a wrapper around the mysql interface
library (for instance), it does not provide a MySQL
driver itself. The Pegasus distribution includes
both SQLAlchemy and the SQLite Python driver. As a
final note, it is important to mention that unlike
when using SQLite databases, using SQLAlchemy with
other database servers, e.g. MySQL or Postgres ,
the target database needs to exist. Users can also
specify a file name using this property in order
to create a file with the log events.
Example values for the SQLAlchemy connection string for
various end points are listed below

SQL Alchemy End Point	Example Value
Netlogger BP File	file:///submit/dir/myworkflow.bp
SQL Lite Database	sqlite:///submit/dir/myworkflow.db
MySQL Database	mysql://user:password@host:port/databasename

Property Key: pegasus.catalog.master.url
Profile Key: N/A
Scope : Properties
Since : 4.2
Type: String
Default :
sqlite database in $HOME/.pegasus/workflow.db
See Also : pegasus.catalog.workflow.url

This property specifies the destination for the workflow
dashboard database. By default, the workflow dashboard
datbase defaults to a sqlite database named workflow.db
in the $HOME/.pegasus directory. This is database is
shared for all workflows run as a particular user. Users
can specify an alternative database by using a SQLAlchemy
connection string. Details are available at

http://www.sqlalchemy.org/docs/05/reference/dialects/index.html

It is important to note that users will need to have the
appropriate db interface library installed. Which is to
say, SQLAlchemy is a wrapper around the mysql interface
library (for instance), it does not provide a MySQL
driver itself. The Pegasus distribution includes both
SQLAlchemy and the SQLite Python driver. As a final
note, it is important to mention that unlike when using
SQLite databases, using SQLAlchemy with other database
servers, e.g. MySQL or Postgres , the target database
needs to exist. Users can also specify a file name
using this property in order to create a file with
the log events.

Example values for the SQLAlchemy connection string

for various end points are listed below

SQL Alchemy End Point	Example Value
SQL Lite Database	sqlite:///shared/myworkflow.db
MySQL Database	mysql://user:password@host:port/databasename

Property Key: pegasus.monitord.output
Profile Key: N/A
Scope : Properties
Since : 3.0.2
Type : String
Default : SQlite database in submit directory.
See Also : pegasus.monitord.events

This property has been deprecated in favor of
pegasus.catalog.workflow.url that introduced in
4.5 release. Support for this property will be
dropped in future releases.

Property Key: pegasus.dashboard.output
Profile Key: N/A
Scope : Properties
Since : 4.2
Type : String
Default :
sqlite database in $HOME/.pegasus/workflow.db
See Also : pegasus.monitord.output

This property has been deprecated in favore of
pegasus.catalog.master.url that was introduced
in 4.5 release. Support for this property will
be dropped in future releases.

Property Key: pegasus.monitord.notifications
Profile Key: N/A
Scope : Properties
Since : 3.1.0
Type :Boolean
Default : true
See Also : pegasus.monitord.notifications.max
See Also : pegasus.monitord.notifications.timeout

This property determines how many notification
scripts pegasus-monitord will call concurrently.
Upon reaching this limit, pegasus-monitord will
wait for one notification script to finish before
issuing another one. This is a way to keep the
number of processes under control at the submit
host. Setting this property to 0 will disable
notifications completely.

Property Key: pegasus.monitord.notifications.max
Profile Key: N/A
Scope : Properties
Since : 3.1.0
Type :Integer
Default : 10
See Also : pegasus.monitord.notifications
See Also : pegasus.monitord.notifications.timeout

This property determines whether pegasus-monitord
processes notifications. When notifications are
enabled, pegasus-monitord will parse the .notify
file generated by pegasus-plan and will invoke
notification scripts whenever conditions matches
one of the notifications.

Property Key: pegasus.monitord.notifications.timeout
Profile Key: N/A
Scope : Properties
Since : 3.1.0
Type :Integer
Default : true
See Also : pegasus.monitord.notifications
See Also : pegasus.monitord.notifications.max

This property determines how long will
pegasus-monitord let notification scripts run
before terminating them. When this property is set
to 0 (default), pegasus-monitord will not terminate
any notification scripts, letting them run
indefinitely. If some notification scripts
missbehave, this has the potential problem of
starving pegasus-monitord’s notification slots
(see the pegasus.monitord.notifications.max
property), and block further notifications. In
addition, users should be aware that
pegasus-monitord will not exit until all
notification scripts are finished.

Property Key: pegasus.monitord.stdout.disable.parsing
Profile Key:N/A
Scope : Properties
Since : 3.1.1
Type :Boolean
Default : false

By default, pegasus-monitord parses the
stdout/stderr section of the kickstart to populate
the applications captured stdout and stderr in the
job instance table for the stampede schema. For
large workflows, this may slow down monitord
especially if the application is generating a
lot of output to it’s stdout and stderr. This
property, can be used to turn of the database
population.

Property Key: pegasus.monitord.arguments
Profile Key: N/A
Scope : Properties
Since : 4.6
Type :String
Default : N/A

This property specifies additional command-line
arguments that should be passed to pegasus-monitord
at startup. These additional arguments are appended
to the arguments given to pegasus-monitord.

2.3.11. Job Clustering Properties

Job Clustering Properties
Key Attributes	Description
Property Key: pegasus.clusterer.job.aggregator Profile Key: N/A Scope : Properties Since : 2.0 Type : String Values : seqexec\|mpiexec\|AWSBatch Default : seqexec	A large number of workflows executed through Pegasus, are composed of several jobs that run for only a few seconds or so. The overhead of running any job on the grid is usually 60 seconds or more. Hence, it makes sense to cluster small independent jobs into a larger job. This property determines, the executable that will be used for running the larger job on the remote site. seqexec: In this mode, the executable used to run the merged job is “pegasus-cluster” that runs each of the smaller jobs sequentially on the same node. The executable “pegasus-cluster” is a PEGASUS tool distributed in the PEGASUS worker package, and can be usually found at {pegasus.home}/bin/pegasus-cluster. mpiexec: In this mode, the executable used to run the clustered job is “pegasus-mpi-cluster” (PMC) that runs the smaller jobs via mpi on n nodes where n is the nodecount associated with the merged job. The executable “pegasus-mpi-cluster” is a PEGASUS tool distributed in the PEGASUS distribution and is built only if mpi compiler is available. AWSBatch: In this mode, the executable used to run the merged job is “pegasus-aws-batch” that runs in local universe on the submit and runs the jobs making up the cluster on AWS Batch.
Property Key: pegasus.clusterer.job.aggregator.arguments Profile Key: job.aggregator.arguments Default : None Scope : TC, SC, Abstract WF, Properties Since : 5.0.2 See Also : pegasus.clusterer.job.aggregator	The additional arguments with which a clustering executable should be invoked.
Property Key: pegasus.clusterer.job.aggregator.seqexec.log Profile Key: N/A Scope : Properties Since : 2.3 Type :Boolean Default : false See Also : pegasus.clusterer.job.aggregator See Also : pegasus.clusterer.job.aggregator.seqexec.log.global	The tool pegasus-cluster logs the progress of the jobs that are being run by it in a progress file on the remote cluster where it is executed. This property sets the Boolean flag, that indicates whether to turn on the logging or not.
Property Key: pegasus.clusterer.job.aggregator.seqexec.log Profile Key: N/A Scope : Properties Since : 2.3 Type :Boolean Default : false See Also : pegasus.clusterer.job.aggregator See Also : pegasus.clusterer.job.aggregator.seqexec.log.global	The tool pegasus-cluster logs the progress of the jobs that are being run by it in a progress file on the remote cluster where it is executed. The progress log is useful for you to track the progress of your computations and remote grid debugging. The progress log file can be shared by multiple pegasus-cluster jobs that are running on a particular cluster as part of the same workflow. Or it can be per job. This property sets the Boolean flag, that indicates whether to have a single global log for all the pegasus-cluster jobs on a particular cluster or progress log per job.
Property Key: pegasus.clusterer.job.aggregator.seqexec.firstjobfail Profile Key: N/A Scope : Properties Since : 2.2 Type :Boolean Default : true See Also : pegasus.clusterer.job.aggregator	By default “pegasus-cluster” does not stop execution even if one of the clustered jobs it is executing fails. This is because “pegasus-cluster” tries to get as much work done as possible. This property sets the Boolean flag, that indicates whether to make “pegasus-cluster” stop on the first job failure it detects.
Property Key :pegasus.clusterer.allow.single Profile Key: N/A Scope : Properties Since : 4.9 Type : Boolean Default : False	By default, Pegasus does not launch clusters that contain a single job using the clustering/job aggregator executable. This property allows you to override this behaviour and have single job clusters to be created. Applies to both horizontal and label based clustering.
Property Key: pegasus.clusterer.label.key Profile Key: N/A Scope : Properties Since : 2.0 Type : String Default : label	While clustering jobs in the workflow into larger jobs, you can optionally label your graph to control which jobs are clustered and to which clustered job they belong. This done using a label based clustering scheme and is done by associating a profile/label key in the PEGASUS namespace with the jobs in the DAX. Each job that has the same value/label value for this profile key, is put in the same clustered job. This property allows you to specify the PEGASUS profile key that you want to use for label based clustering.

2.3.12. Logging Properties

Logging Properties
Key Attributes	Description
Property Key: pegasus.log.manager Profile Key: N/A Scope : Properties Since : 2.2.0 Type : String Values : Default\|Log4J Default : Default See Also :pegasus.log.manager.formatter	This property sets the logging implementation to use for logging. Default: This implementation refers to the legacy Pegasus logger, that logs directly to stdout and stderr. It however, does have the concept of levels similar to log4j or syslog. Log4j: This implementation, uses Log4j 2.x (2.17 at time of implementation to log messages. The log4j properties can be specified in a properties file, the location of which is specified by the property pegasus.log.manager.log4j.conf .
Property Key: pegasus.log.manager.formatter Profile Key: N/A Scope : Properties Since : 2.2.0 Type : String Values : Simple\|Netlogger Default : Simple See Also :pegasus.log.manager	This property sets the formatter to use for formatting the log messages while logging. Simple This formats the messages in a simple format. The messages are logged as is with minimal formatting. Below are sample log messages in this format while ranking an abstract workflow according to performance. event.pegasus.ranking dax.id se18-gda.dax - STARTED event.pegasus.parsing.dax dax.id se18-gda-nested.dax - STARTED event.pegasus.parsing.dax dax.id se18-gda-nested.dax - FINISHED job.id jobGDA job.id jobGDA query.name getpredicted performace time 10.00 event.pegasus.ranking dax.id se18-gda.dax - FINISHED Netlogger This formats the messages in the Netlogger format , that is based on key value pairs. The netlogger format is useful for loading the logs into a database to do some meaningful analysis. Below are sample log messages in this format while ranking an abstract workflow ts=2008-09-06T12:26:20.100502Z event=event.pegasus.ranking.start msgid=6bc49c1f-112e-4cdb-af54-3e0afb5d593c eventId=event.pegasus.ranking_8d7c0a3c-9271-4c9c-a0f2-1fb57c6394d5 dax.id=se18-gda.dax prog=Pegasus ts=2008-09-06T12:26:20.100750Z event=event.pegasus.parsing.dax.start msgid=fed3ebdf-68e6-4711-8224-a16bb1ad2969 eventId=event.pegasus.parsing.dax_887134a8-39cb-40f1-b11c-b49def0c5232\ dax.id=se18-gda-nested.dax prog=Pegasus
Property Key: pegasus.log.* Profile Key: N/A Scope : Properties Since : 2.0 Type :file path Default : no default	This property sets the path to the file where all the logging for Pegasus can be redirected to. Both stdout and stderr are logged to the file specified.
Property Key: pegasus.log.memory.usage Profile Key: N/A Scope : Properties Since : 4.3.4 Type :Boolean Default : false	This property if set to true, will result in the planner writing out JVM heap memory statistics at the end of the planning process at the INFO level. This is useful, if users want to fine tune their java memory settings by setting JAVA_HEAPMAX and JAVA_HEAPMIN for large workflows.
Property Key: pegasus.metrics.app Profile Key:N/A Scope : Properties Since : 4.3.0 Type :String Default : (no default)	This property namespace allows users to pass application level metrics to the metrics server. The value of this property is the name of the application. Additional application specific attributes can be passed by using the prefix pegasus.metrics.app pegasus.metrics.app.[arribute-name] attribute-value Note: the attribute cannot be named name. This attribute is automatically assigned the value from pegasus.metrics.app

2.3.13. Cleanup Properties

Cleanup Properties
Key Attributes	Description
Property Key: pegasus.file.cleanup.strategy Profile Key: N/A Scope :Properties Since :2.2 Type :String Default :InPlace	This property is used to select the strategy of how the cleanup nodes are added to the executable workflow. InPlace The default cleanup strategy. Adds cleanup nodes per level of the workflow. Constraint Adds cleanup nodes to constraint the amount of storage space used by a workflow. Note: This property is overridden by the –cleanup option used in pegasus-plan.
Property Key:pegasus.file.cleanup.impl Profile Key: N/A Scope : Properties Since : 2.2 Type :String Default : Cleanup	This property is used to select the executable that is used to create the working directory on the compute sites. Cleanup The default executable that is used to delete files is the “pegasus-transfer” executable shipped with Pegasus. It is found at $PEGASUS_HOME/bin/pegasus-transfer in the Pegasus distribution. An entry for transformation pegasus::dirmanager needs to exist in the Transformation Catalog or the PEGASUS_HOME environment variable should be specified in the site catalog for the sites for this mode to work. RM This mode results in the rm executable to be used to delete files from remote directories. The rm executable is standard on *nix systems and is usually found at /bin/rm location.
Property Key: pegasus.file.cleanup.clusters.num Profile Key: N/A Scope : Properties Since : 4.2.0 Type :Integer	In case of the InPlace strategy for adding the cleanup nodes to the workflow, this property specifies the maximum number of cleanup jobs that are added to the executable workflow on each level.
Property Key: pegasus.file.cleanup.clusters.size Profile Key: N/A Scope : Properties Since : 4.2.0 Type :Integer Default : 2	In case of the InPlace strategy this property sets the number of cleanup jobs that get clustered into a bigger cleanup job. This parameter is only used if pegasus.file.cleanup.clusters.num is not set.
Property Key: pegasus.file.cleanup.scope Profile Key: N/A Scope : Properties Since : 2.3.0 Type :Enumeration Value :fullahead\|deferred Default : fullahead	By default in case of deferred planning InPlace file cleanup is turned OFF. This is because the cleanup algorithm does not work across partitions. This property can be used to turn on the cleanup in case of deferred planning. fullahead: This is the default scope. The pegasus cleanup algorithm does not work across partitions in deferred planning. Hence the cleanup is always turned OFF , when deferred planning occurs and cleanup scope is set to full ahead. deferred: If the scope is set to deferred, then Pegasus will not disable file cleanup in case of deferred planning. This is useful for scenarios where the partitions themselves are independant ( i.e. dont share files ). Even if the scope is set to deferred, users can turn off cleanup by specifying –nocleanup option to pegasus-plan.
Property Key: pegasus.file.cleanup.constraint.*.maxspace Profile Key: N/A Scope :Properties Since :4.6.0 Type :String Default :10737418240	This property is used to set the maximum available space (i.e., constraint) per site in Bytes. The * in the property name denotes the name of the compute site. A * in the property key is taken to mean all sites.
Property Key: pegasus.file.cleanup.constraint.deferstageins Profile Key: N/A Scope :Properties Since :4.6.0 Type :Boolean Default :False	This property is used to determine whether stage in jobs may be deferred. If this property is set to False (default), all stage in jobs will be marked as executing on the current compute site and will be executed before any task. This property has no effect when running in a multi site case.
Property Key: pegasus.file.cleanup.constraint.csv Profile Key: N/A Scope : Properties Since : 4.6.1 Type : String Default: (no default)	This property is used to specify a CSV file with a list of LFNs and their respective sizes in Bytes. The CSV file must be composed of two columns: filename and length.

2.3.14. AWS Batch Properties

AWS Batch Properties
Key Attributes	Description
Property Key: pegasus.aws.account Profile Key: N/A Scope : Properties Since : 4.9.0 Type :String Default : (no default)	This property is used to specify the amazon account under which you are running jobs.
Property Key: pegasus.aws.region Profile Key: N/A Scope : Properties Since : 4.9.0 Type :String Default : (no default)	This property is used to specify the amazon region in which you are running jobs.
Property Key: pegasus.aws.batch.job_definition Profile Key: N/A Scope : Properties Since : 4.9.0 Type :String Default : (no default)	This property is used to specify - the JSON file containing job definition to register for executing jobs OR - the ARN of existing job definition OR - basename of an existing job definition
Property Key: pegasus.aws.batch.compute_environment Profile Key:N/A Scope : Properties Since : 4.9.0 Type :String Default : (no default)	This property is used to specify - the JSON file containing compute environment to register for executing jobs OR - the ARN of existing compute environment OR - basename of an existing compute environment
Property Key: pegasus.aws.batch.job_queue Profile Key: N/A Scope : Properties Since : 4.9.0 Type :String Default : (no default)	This property is used to specify - the JSON file containing Job Queue to use for executing jobs OR - the ARN of existing job queue OR - basename of an existing job queue
Property Key: pegasus.aws.batch.s3_bucket Profile Key: N/A Scope : Properties Since : 4.9.0 Type :URL Default : (no default)	This property is used to specify the S3 Bucket URL to use for data transfers while executing jobs on AWS Batch.

2.3.15. Miscellaneous Properties

Miscellaneous Properties
Key Attributes	Description
Property Key: pegasus.code.generator Profile Key: N/A Scope : Properties Since : 3.0 Type : String Values : Condor\|Shell\|PMC Default : Condor See Also : pegasus.log.manager.formatter	This property is used to load the appropriate Code Generator to use for writing out the executable workflow. Condor This is the default code generator for Pegasus . This generator generates the executable workflow as a Condor DAG file and associated job submit files. The Condor DAG file is passed as input to Condor DAGMan for job execution. Shell This Code Generator generates the executable workflow as a shell script that can be executed on the submit host. While using this code generator, all the jobs should be mapped to site local i.e specify –sites local to pegasus-plan. PMC This Code Generator generates the executable workflow as a PMC task workflow. This is useful to run on platforms where it not feasible to run Condor such as the new XSEDE machines such as Blue Waters. In this mode, Pegasus will generate the executable workflow as a PMC task workflow and a sample PBS submit script that submits this workflow.
Property Key: pegasus.integrity.checking Profile Key: N/A Scope : Properties Since : 4.9.0 Type :none\|full\|nosymlink Default : full	This property determines the dial for pegasus integrity checking. Currently the following dials are supported none no integrity checking occurs. full In this mode, integrity checking happens at 3 levels 1. after the input data has been staged to staging server pegasus-transfer verifies integrity of the staged files. 2. before a compute task starts on a remote compute node This ensures that checksums of the data staged in match the checksums specified in the input replica catalog or the ones computed when that piece of data was generated as part of previous task in the workflow. 3. After the workflow output data has been transferred to user servers - This ensures that output data staged to the final location was not corrupted in transit. nosymlink No integrity checking is performed on input files that are symlinked. You should consider turning this on, if you think that your input files at rest are at a low risk of data corruption, and want to save on the checksum computation overheads against the shared filesystem.noneNo integrity checking is performed.
Property Key: pegasus.mode Profile Key: N/A Scope : Properties Since : 5.0 Type : production\|development\|tutorial\|debug Default : production	This property determines the mode under which pegasus operates. Currently the following modes are supported production default mode. jobs are retried 3 times, pegasus transfer executable makes 3 attempts for each transfer, and jobs remain in held state for 30 minutes, before being removed. development In this mode, integrity checking is disabled; jobs are are not retried in case of failure; pegasus-transfer executable makes only one attempt for each transfer,and jobs remain in held state for 30 seconds, before being removed. debug In this mode, integrity checking is disabled; jobs are are not retried in case of failure; pegasus-transfer executable makes only one attempt for each transfer,and jobs remain in held state for 30 seconds, before being removed. It also increases logging for the various subcomponents of Pegasus such as the planner, monitoring daemon, transfers in jobs, PegasusLite jobs and the registration jobs tutorial In this mode, jobs are not retried in case of failure; pegasus-transfer executable makes only one attempt for each transfer,and jobs remain in held state for 30 seconds before being removed.
Property Key: pegasus.condor.concurrency.limits Profile Key: N/A Scope : Properties Since : 4.5.3 Type : Boolean Default :False	This Boolean property is used to determine whether Pegasus associates default HTCondor concurrency limits with jobs or not. Setting this property to true, allows you to throttle jobs across workflows, if the workflow are set to run in pure condor environment.
Property Key: pegasus.register Profile Key: N/A Scope : Properties Since : 4.1.- Type : Boolean Default : true	Pegasus creates registration jobs to register the output files in the replica catalog. An output file is registered only - if a user has configured a replica catalog in the properties - the register flags for the output files in the abstract workflow are set to true This property can be used to turn off the creation of the registration jobs even though the files maybe marked to be registered in the replica catalog.
Property Key: pegasus.register.deep Profile Key: N/A Scope : Properties Since : 4.5.3 Type : Boolean Default : true	By default, Pegasus always registers the complete LFN that is associated with the output files in the DAX i.e if the LFN has / in it, then lfn registered in the replica catalog has the whole part. For example, if in your Abstract Workflow you have rupture/0001.rx as the name attribute for the uses tag, then in the Replica Catalog the LFN is registered as rupture/0001.rx On setting this property to false, only the basename is considered while registering in the replica catalog. In the above case, 0001.rx will be registered instead of rupture/0001.rx
Property Key: pegasus.data.reuse.scope Profile Key: N/A Scope : Properties Since : 4.5.0 Type :Enumeration Value :none\|partial\|full Default : full	This property is used to control the behavior of the data reuse algorithm in Pegasus none This is same as disabling data reuse. It is equivalent to passing the –force option to pegasus-plan on the command line. partial In this case, only certain jobs ( those that have pegasus profile key enable_for_data_reuse set to true ) are checked for presence of output files in the replica catalog. This gives users control over what jobs are deleted as part of the data reuse algorithm. full This is the default behavior, where all the jobs output files are looked up in the replica catalog.
Property Key: pegasus.catalog.transformation.mapper Profile Key:N/A Scope : Properties Since : 2.0 Type :Enumeration Value :All\|Installed\|Staged\|Submit Default : All	Pegasus supports transfer of statically linked executables as part of the executable workflow. At present, there is only support for staging of executables referred to by the compute jobs specified in the DAX file. Pegasus determines the source locations of the binaries from the transformation catalog, where it searches for entries of type STATIC_BINARY for a particular architecture type. The PFN for these entries should refer to a globus-url-copy valid and accessible remote URL. For transfer of executables, Pegasus constructs a soft state map that resides on top of the transformation catalog, that helps in determining the locations from where an executable can be staged to the remote site. This property determines, how that map is created. All In this mode, all sources with entries of type STATIC_BINARY for a particular transformation are considered valid sources for the transfer of executables. This the most general mode, and results in the constructing the map as a result of the cartesian product of the matches. Installed In this mode, only entries that are of type INSTALLED are used while constructing the soft state map. This results in Pegasus never doing any transfer of executables as part of the workflow. It always prefers the installed executables at the remote sites. Staged In this mode, only entries that are of type STATIC_BINARY are used while constructing the soft state map. This results in the concrete workflow referring only to the staged executables, irrespective of the fact that the executables are already installed at the remote end. Submit In this mode, only entries that are of type STATIC_BINARY and reside at the submit host (“site” local), are used while constructing the soft state map. This is especially helpful, when the user wants to use the latest compute code for his computations on the grid and that relies on his submit host.
Property Key: pegasus.selector.transformation Profile Key: N/A Scope : Properties Since : 2.0 Type : Enumeration Value :Random\|Installed\|Staged\|Submit Default : Random	In case of transfer of executables, Pegasus could have various transformations to select from when it schedules to run a particular compute job at a remote site. For e.g it can have the choice of staging an executable from a particular remote site, from the local (submit host) only, use the one that is installed on the remote site only. This property determines, how a transformation amongst the various candidate transformations is selected, and is applied after the property for transformation mapper has been applied. For e.g specifying pegasus.catalog.transformation.mapper as Staged and then pegasus.transformation.selector as INSTALLED does not work, as by the time this property is applied, the soft state map only has entries of type STAGEABLE. Random In this mode, a random matching candidate transformation is selected to be staged to the remote execution site. Installed In this mode, only entries that are of type INSTALLED are selected. This means that the executable workflow only refers to the transformations already pre installed on the remote sites. Staged In this mode, only entries that are of type STATIC_BINARY are selected, ignoring the ones that are installed at the remote site. Submit In this mode, only entries that are of type STATIC_BINARY and reside at the submit host (“site” local), are selected as sources for staging the executables to the remote execution sites.
Property Key: pegasus.parser.dax.preserver.linebreaks Profile Key:N/A Scope : Properties Since : 2.2.0 Type :Boolean Default : false	The DAX Parser normally does not preserve line breaks while parsing the CDATA section that appears in the arguments section of the job element in the DAX. On setting this to true, the DAX Parser preserves any line line breaks that appear in the CDATA section.
Property Key: pegasus.parser.dax.data.dependencies Profile Key: N/A Scope : Properties Since : 4.4.0 Type :Boolean Default : true	If this property is set to true, then the planner will automatically add edges between jobs in the DAX on the basis of exisitng data dependencies between jobs. For example, if a JobA generates an output file that is listed as input for JobB, then the planner will automatically add an edge between JobA and JobB.
Property Key: pegasus.parser.document.size Profile Key: N/A Scope : Properties Since : 5.1.1 Type : Integer in MB Default : 500	For parsing YAML documents, this property controls the behavior of the snakeyaml library in terms of the maximum size in MB of a document that can be parsed.
Property Key: pegasus.credential.encrypt Profile Key: N/A Scope : Properties Since : 5.1.2 Type :Boolean Default : true	If this property is set to true, then the planner will turn ON the encryption of credential when transferring the credential using HTCondor while transfers. Defaults to true.
Property Key: pegasus.condor.dag.checker.jobs Profile Key: N/A Scope : Properties Since : 5.1.2 Type : int Default : 50000	The limit on the number of jobs in the executable workflow for the condor_dag_checker tool to be invoked generated condor dag file for the workflow. Defaults to 50000.