Chapter 16. API Reference

16.1. DAX XML Schema

The DAX format is described by the XML schema instance document dax-3.6.xsd. A local copy of the schema definition is provided in the etc directory. The documentation of the XML schema and its elements can be found in dax-3.6.html as well as locally in doc/schemas/dax-3.6/dax-3.6.html in your Pegasus distribution.

16.1.1. DAX XML Schema In Detail

The DAX file format has four major sections, with the second section divided into more sub-sections. The DAX format works on the abstract or logical level, letting you focus on the shape of the workflows, what to do and what to work upon.

  1. Workflow level Metadata

    Metadata that is associated with the whole workflow. These are defined in the Metadata section.

  2. Workflow-level Notifications

    Very simple workflow-level notifications. These are defined in the Notification section.

  3. Catalogs

    The first section deals with included catalogs. While we do recommend to use external replica- and transformation catalogs, it is possible to include some replicas and transformations into the DAX file itself. Any DAX-included entry takes precedence over regular replica catalog (RC) and transformation catalog (TC) entries.

    The first section (and any of its sub-sections) is completely optional.

    1. The first sub-section deals with included replica descriptions.

    2. The second sub-section deals with included transformation descriptions.

    3. The third sub-section declares multi-item executables.

  4. Job List

    The jobs section defines the job- or task descriptions. For each task to conduct, a three-part logical name declares the task and aides identifying it in the transformation catalog or one of the executable section above. During planning, the logical name is translated into the physical executable location on the chosen target site. By declaring jobs abstractly, physical layout consideration of the target sites do not matter. The job's id uniquley identifies the job within this workflow.

    The arguments declare what command-line arguments to pass to the job. If you are passing filenames, you should refer to the logical filename using the file element in the argument list.

    Important for properly planning the task is the list of files consumed by the task, its input files, and the files produced by the task, its output files. Each file is described with a uses element inside the task.

    Elements exist to link a logical file to any of the stdio file descriptors. The profile element is Pegasus's way to abstract site-specific data.

    Jobs are nodes in the workflow graph. Other nodes include unplanned workflows (DAX), which are planned and then run when the node runs, and planned workflows (DAG), which are simply executed.

  5. Control-flow Dependencies

    The third section lists the dependencies between the tasks. The relationships are defined as child parent relationships, and thus impacts the order in which tasks are run. No cyclic dependencies are permitted.

    Dependencies are directed edges in the workflow graph.

16.1.1.1. XML Intro

If you have seen the DAX schema before, not a lot of new items in the root element. However, we did retire the (old) attributes ending in Count.

<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2011-07-28T18:29:57Z -->
<adag xmlns="http://pegasus.isi.edu/schema/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.6.xsd"
      version="3.6"
      name="diamond"
      index="0"
      count="1">

The following attributes are supported for the root element adag.

Table 16.1. Root element attributes

attribute optional? type meaning
version required VersionPattern Version number of DAX instance document. Must be 3.6.
name required string name of this DAX (or set of DAXes).
count optional positiveInteger size of list of DAXes with this name. Defaults to 1.
index optional nonNegativeInteger current index of DAX with same name. Defaults to 0.
fileCount removed nonNegativeInteger Old 2.1 attribute, removed, do not use.
jobCount removed positiveInteger Old 2.1 attribute, removed, do not use.
childCount removed nonNegativeInteger Old 2.1 attribute, removed, do not use.

The version attribute is restricted to the regular expression \d+(\.\d+(\.\d+)?)?.This expression represents the VersionPattern type that is used in other places, too. It is a more restrictive expression than before, but allows us to compute comparable version number using the following formula:

version1: a.b.c version2: d.e.f
n = a * 1,000,000 + b * 1,000 + c m = d * 1,000,000 + e * 1,000 + f
version1 > version2 if n > m

16.1.1.2. Workflow-level Metadata

Metadata associated with the whole workflow.

   <metadata key="name">diamond</metadata>
   <metadata key="createdBy">Karan Vahi</metadata>

The workflow level metadata maybe used to control the Pegasus Mapper behaviour at planning time or maybe propogated to external services while querying for job characteristics.

16.1.1.3. Workflow-level Notifications

Notifications that are generated when workflow level events happened.

  <!-- part 1.1: invocations -->
  <invoke when="at_end">/bin/date -Ins &gt;&gt; my.log</invoke>

The above snippet will append the current time to a log file in the current directory. This is with regards to the pegasus-monitord instance acting on the notification.

16.1.1.4. The Catalogs Section

The initial section features three sub-sections:

  1. a catalog of files used,

  2. a catalog of transformations used, and

  3. compound transformation declarations.

16.1.1.4.1. The Replica Catalog Section

The file section acts as in in-file replica catalog (RC). Any files declared in this section take precedence over files in external replica catalogs during planning.

  <!-- part 1.2: included replica catalog -->
  <file name="example.a" >
    <!-- profiles are optional -->
    <!-- The "stat" namespace is ONLY AN EXAMPLE -->
    <profile namespace="stat" key="size">/* integer to be defined */</profile>
    <profile namespace="stat" key="md5sum">/* 32 char hex string */</profile>
    <profile namespace="stat" key="mtime">/* ISO-8601 timestamp */</profile>

    <!-- Metadata will be supported 4.6 onwards-->
    <metadata key="timestamp" >/* ISO-8601 *or* 20100417134523:int */</metadata>
    <metadata key="origin" >ocean</metadata>

    <!-- PFN to by-pass replica catalog -->
    <!-- The "site attribute is optional -->
    <pfn url="file:///tmp/example.a" site="local">
      <profile namespace="stat" key="owner">voeckler</profile>
    </pfn>
    <pfn url="file:///storage/funky.a" site="local"/>
  </file>

  <!-- a more typical example from the black diamond -->
  <file name="f.a">
    <pfn url="file:///Users/voeckler/f.a" site="local"/>
  </file>

The first file entry above is an example of a data file with two replicas. The file element requires a logical file name. Each logical filename may have additional information associated with it, enumerated by profile elements. Each file entry may have 0 or more metadata associated with it. Each piece of metadata has a key string and type attribute describing the element's value.

Warning

The metadata element is not support as of this writing! Details may change in the future.

The file element can provide 0 or more pfn locations, taking precedence over the replica catalog. A file element that does not name any pfn children-elements will still require look-ups in external replica catalogs. Each pfn element names a concrete location of a file. Multiple locations constitute replicas of the same file, and are assumed to be usable interchangably. The url attribute is mandatory, and typically would use a file schema URL. The site attribute is optional, and defaults to value local if missing. A pfn element may have profile children-elements, which refer to attributes of the physical file. The file-level profiles refer to attributes of the logical file.

Note

The stat profile namespace is ony an example, and details about stat are not yet implemented. The proper namespaces pegasus, condor, dagman, env, hints, globus and selector enjoy full support.

The second file entry above shows a usage example from the black-diamond example workflow that you are more likely to encouter or write.

The presence of an in-file replica catalog lets you declare a couple of interesting advanced features. The DAG and DAX file declarations are just files for all practical purposes. For deferred planning, the location of the site catalog (SC) can be captured in a file, too, that is passed to the job dealing with the deferred planning as logical filename.

  <file name="black.dax" >
    <!-- specify the location of the DAX file -->
    <pfn url="file:///Users/vahi/Pegasus/work/dax-3.0/blackdiamond_dax.xml" site="local"/>
  </file>

  <file name="black.dag" >
    <!-- specify the location of the DAG file -->
    <pfn url="file:///Users/vahi/Pegasus/work/dax-3.0/blackdiamond.dag" site="local"/>
  </file>

  <file name="sites.xml" >
    <!-- specify the location of a site catalog to use for deferred planning -->
    <pfn url="file:///Users/vahi/Pegasus/work/dax-3.0/conf/sites.xml" site="local"/>
  </file>
16.1.1.4.. The Transformation Catalog Section

The executable section acts as an in-file transformation catalog (TC). Any transformations declared in this section take precedence over the external transformation catalog during planning.

  <!-- part 1.3: included transformation catalog -->
  <executable namespace="example" name="mDiffFit" version="1.0"
              arch="x86_64" os="linux" installed="true" >
    <!-- profiles are optional -->
    <!-- The "stat" namespace is ONLY AN EXAMPLE! -->
    <profile namespace="stat" key="size">5000</profile>
    <profile namespace="stat" key="md5sum">AB454DSSDA4646DS</profile>
    <profile namespace="stat" key="mtime">2010-11-22T10:05:55.470606000-0800</profile>

    <!-- metadata will be supported in 4.6 -->
    <metadata key="timestamp" >/* see above */</metadata>
    <metadata key="origin">ocean</metadata>

    <!-- PFN to by-pass transformation catalog -->
    <!-- The "site" attribute is optional -->
    <pfn url="file:///tmp/mDiffFit"          site="local"/>
    <pfn url="file:///tmp/storage/mDiffFit"  site="local"/>
  </executable>

  <!-- to be used in compound transformation later -->
  <executable namespace="example" name="mDiff" version="1.0"
              arch="x86_64" os="linux" installed="true" >
    <pfn url="file:///tmp/mDiff" site="local"/>
  </executable>

  <!-- to be used in compound transformation later -->
  <executable namespace="example" name="mFitplane" version="1.0"
              arch="x86_64" os="linux" installed="true" >
    <pfn url="file:///tmp/mDiffFitplane"  site="local">
      <profile namespace="stat" key="md5sum">0a9c38b919c7809cb645fc09011588a6</profile>
    </pfn>
    <invoke when="at_end">/path/to/my_send_email some args</invoke>
  </executable>

  <!-- a more likely example from the black diamond -->
  <executable namespace="diamond" name="preprocess" version="2.0"
              arch="x86_64"
              os="linux"
              osversion="2.6.18">
    <pfn url="file:///opt/pegasus/default/bin/keg" site="local" />
  </executable>

Logical filenames pertaining to a single executables in the transformation catalog use the executable element. Any executable element features the optional namespace attribute, a mandatory name attribute, and an optional version attribute. The version attribute defaults to "1.0" when absent. An executable typically needs additional attributes to describe it properly, like the architecture, OS release and other flags typically seen with transformations, or found in the transformation catalog.

Table 16.2. executable element attributes

attribute optional? type meaning
name required string logical transformation name
namespace optional string namespace of logical transformation, default to null value.
version optional VersionPattern version of logical transformation, defaults to "1.0".
installed optional boolean whether to stage the file (false), or not (true, default).
arch optional Architecture restricted set of tokens, see schema definition file.
os optional OSType restricted set of tokens, see schema definition file.
osversion optional VersionPattern kernel version as beginning of `uname -r`.
glibc optional VersionPattern version of libc.

The rationale for giving these flags in the executable element header is that PFNs are just identical replicas or instances of a given LFN. If you need a different 32/64 bit-ed-ness or OS release, the underlying PFN would be different, and thus the LFN for it should be different, too.

Note

We are still discussing some details and implications of this decision.

The initial examples come with the same caveats as for the included replica catalog.

Warning

The metadata element is not support as of this writing! Details may change in the future.

Similar to the replica catalog, each executable element may have 0 or more profile elements abstracting away site-specific details, zero or more metadata elements, and zero or more pfn elements. If there are no pfn elements, the transformation must still be searched for in the external transformation catalog. As before, the pfn element may have profile children-elements, referring to attributes of the physical filename itself.

Each executable element may also feature invoke elements. These enable notifications at the appropriate point when every job that uses this executable reaches the point of notification. Please refer to the notification section for details and caveats.

The last example above comes from the black diamond example workflow, and presents the kind and extend of attributes you are most likely to see and use in your own workflows.

16.1.1.4.3. The Compound Transformation Section

The compound transformation section declares a transformation that comprises multiple plain transformation. You can think of a compound transformation like a script interpreter and the script itself. In order to properly run the application, you must start both, the script interpreter and the script passed to it. The compound transformation helps Pegasus to properly deal with this case, especially when it needs to stage executables.

  <transformation namespace="example" version="1.0" name="mDiffFit" >
    <uses name="mDiffFit" />
    <uses name="mDiff" namespace="example" version="2.0" />
    <uses name="mFitPlane" />
    <uses name="mDiffFit.config" executable="false" />
  </transformation>

A transformation element declares a set of purely logical entities, executables and config (data) files, that are all required together for the same job. Being purely logical entities, the lookup happens only when the transformation element is referenced (or instantiated) by a job element later on.

The namespace and version attributes of the transformation element are optional, and provide the defaults for the inner uses elements. They are also essential for matching the transformation with a job.

The transformation is made up of 1 or more uses element. Each uses has a boolean attribute executable, true by default, or false to indicate a data file. The name is a mandatory attribute, refering to an LFN declared previously in the File Catalog (executable is false), Executable Catalog (executable is true), or to be looked up as necessary at instantiation time. The lookup catalog is determined by the executable attribute.

After uses elements, any number of invoke elements may occur to add a notification each whenever this transformation is instantiated.

The namespace and version attributes' default values inside uses elements are inherited from the transformation attributes of the same name. There is no such inheritance for uses elements with executable attribute of false.

16.1.1.5. Graph Nodes

The nodes in the DAX comprise regular job nodes, already instantiated sub-workflows as dag nodes, and still to be instantiated dax nodes. Each of the graph nodes can has a mandatory id attribute. The id attribute is currently a restriction of type NodeIdentifierPattern type, which is a restriction of the xs:NMTOKEN type to letters, digits, hyphen and underscore.

The level attribute is deprecated, as the planner will trust its own re-computation more than user input. Please do not use nor produce any level attribute.

The node-label attribute is optional. It applies to the use-case when every transformation has the same name, but its arguments determine what it really does. In the presence of a node-label value, a workflow grapher could use the label value to show graph nodes to the user. It may also come in handy while debugging.

Any job-like graph node has the following set of children elements, as defined in the AbstractJobType declaration in the schema definition:

  • 0 or 1 argument element to declare the command-line of the job's invocation.

  • 0 or more profile elements to abstract away site-specific or job-specific details.

  • 0 or 1 stdin element to link a logical file the the job's standard input.

  • 0 or 1 stdout element to link a logical file to the job's standard output.

  • 0 or 1 stderr element to link a logical file to the job's standard error.

  • 0 or more uses elements to declare consumed data files and produced data files.

  • 0 or more invoke elements to solicit notifications whence a job reaches a certain state in its life-cycle.

16.1.1.5.1. Job Nodes

A job element has a number of attributes. In addition to the id and node-label described in (Graph Nodes)above, the optional namespace, mandatory name and optional version identify the transformation, and provide the look-up handle: first in the DAX's transformation elements, then in the executable elements, and finally in an external transformation catalog.

  <!-- part 2: definition of all jobs (at least one) -->
  <job id="ID000001" namespace="example" name="mDiffFit" version="1.0"
       node-label="preprocess" >
    <argument>-a top -T 6  -i <file name="f.a"/>  -o <file name="f.b1"/></argument>

    <!-- profiles are optional -->
    <profile namespace="execution" key="site">isi_viz</profile>
    <profile namespace="condor" key="getenv">true</profile>

     <uses name="f.a" link="input" transfer="true" register="true">
         <metadata key="size">1024</metadata>
      </uses>
    <uses name="f.b" link="output" register="false" transfer="true" type="data" />

    <!-- 'WHEN' enumeration: never, start, on_error, on_success, at_end, all -->
    <!-- PEGASUS_* env-vars: event, status, submit dir, wf/job id, stdout, stderr -->
    <invoke when="start">/path/to arg arg</invoke>
    <invoke when="on_success"><![CDATA[/path/to arg arg]]></invoke>
    <invoke when="at_end"><![CDATA[/path/to arg arg]]></invoke>
  </job>

The argument element contains the complete command-line that is needed to invoke the executable. The only variable components are logical filenames, as included file elements.

The profile argument lets you encapsulate site-specific knowledge .

The stdin, stdout and stderr element permits you to connect a stdio file descriptor to a logical filename. Note that you will still have to declare these files in the uses section below.

The uses element enumerates all the files that the task consumes or produces. While it is not necessary nor required to have all files appear on the command-line, it is imperative that you declare even hidden files that your task requires in this section, so that the proper ancilliary staging- and clean-up tasks can be generated during planning.

The invoke element may be specified multiple times, as needed. It has a mandatory when attribute with the following value set:

Table 16.3. invoke element attributes

keyword job life-cycle state meaning
never never (default). Never notify of anything. This is useful to temporarily disable an existing notifications.
start submit create a notification when the job is submitted.
on_error end after a job finishes with failure (exitcode != 0).
on_success end after a job finishes with success (exitcode == 0).
at_end end after a job finishes, regardless of exitcode.
all always like start and at_end combined.

Warning

In clustered jobs, a notification can only be sent at the start or end of the clustered job, not for each member.

Each invoke is a simple local invocation of an executable or script with the specified arguments. The executable inside the invoke body will see the following environment variables:

Table 16.4. invoke/executable environment variables

variable job life-cycle state meaning
PEGASUS_EVENT always The value of the when attribute
PEGASUS_STATUS end The exit status of the graph node. Only available for end notifications.
PEGASUS_SUBMIT_DIR always In which directory to find the job (or workflow).
PEGASUS_JOBID always The job (or workflow) identifier. This is potentially more than merely the value of the id attribute.
PEGASUS_STDOUT always The filename where stdout goes. Empty and possibly non-existent at submit time (though we still have the filename). The kickstart record for job nodes.
PEGASUS_STDERR always The filename where stderr goes. Empty and possibly non-existent at submit time (though we still have the filename).

Generators should use CDATA encapsulated values to the invoke element to minimize interference. Unfortunately, CDATA cannot be nested, so if the user invocation contains a CDATA section, we suggest that they use careful XML-entity escaped strings. The notifications section describes these in further detail.

16.1.1.5.2. DAG Nodes

A workflow that has already been concretized, either by an earlier run of Pegasus, or otherwise constructed for DAGMan execution, can be included into the current workflow using the dag element.

  <dag id="ID000003" name="black.dag" node-label="foo" >
    <profile namespace="dagman" key="DIR">/dag-dir/test</profile>
    <invoke> <!-- optional, should be possible --> </invoke>
    <uses file="sites.xml" link="input" register="false" transfer="true" type="data"/>
  </dag>

The id and node-label attributes were described previously. The name attribute refers to a file from the File Catalog that provides the actual DAGMan DAG as data content. The dag element features optional profile elements. These would most likely pertain to the dagman and env profile namespaces. It should be possible to have the optional notify element in the same manner as for jobs.

A graph node that is a dag instead of a job would just use a different submit file generator to create a DAGMan invocation. There can be an argument element to modify the command-line passed to DAGMan.

16.1.1.5.3. DAX Nodes

A still to be planned workflow incurs an invocation of the Pegasus planner as part of the workflow. This still abstract sub-workflow uses the dax element.

  <dax id="ID000002" name="black.dax" node-label="bar" >
    <profile namespace="env" key="foo">bar</profile>
    <argument>-Xmx1024 -Xms512 -Dpegasus.dir.storage=storagedir  -Dpegasus.dir.exec=execdir -o local --dir ./datafind -vvvvv --force -s dax_site </argument>
    <invoke> <!-- optional, may not be possible here --> </invoke>
    <uses file="sites.xml" link="input" register="false" transfer="true" type="data" />
  </dax>

In addition to the id and node-label attributes, See Graph Nodes. The name attribute refers to a file from the File Catalog that provides the to be planned DAX as external file data content. The dax element features optional profile elements. These would most likely pertain to the pegasus, dagman and env profile namespaces. It may be possible to have the optional notify element in the same manner as for jobs.

A graph node that is a dax instead of a job would just use yet another submit file and pre-script generator to create a DAGMan invocation. The argument string pertains to the command line of the to-be-generated DAGMan invocation.

16.1.1.5.4. Inner ADAG Nodes

While completeness would argue to have a recursive nesting of adag elements, such recursive nestings are currently not supported, not even in the schema. If you need to nest workflows, please use the dax or dag element to achieve the same goal.

16.1.1.6. The Dependency Section

This section describes the dependencies between the jobs.

  <!-- part 3: list of control-flow dependencies -->
  <child ref="ID000002">
    <parent ref="ID000001" edge-label="edge1" />
  </child>
  <child ref="ID000003">
    <parent ref="ID000001" edge-label="edge2" />
  </child>
  <child ref="ID000004">
    <parent ref="ID000002" edge-label="edge3" />
    <parent ref="ID000003" edge-label="edge4" />
  </child>

Each child element contains one or more parent element. Either element refers to a job, dag or dax element id attribute using the ref attribute. In this version, we relaxed the xs:IDREF constraint in favor of a restriction on the xs:NMTOKEN type to permit a larger set of identifiers.

The parent element has an optional edge-label attribute.

Warning

The edge-label attribute is currently unused.

Its goal is to annotate edges when drawing workflow graphs.

16.1.1.7. Closing

As any XML element, the root element needs to be closed.

</adag>

16.1.2. DAX XML Schema Example

The following code example shows the XML instance document representing the diamond workflow.

<?xml version="1.0" encoding="UTF-8"?>
<adag xmlns="http://pegasus.isi.edu/schema/DAX"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX http://pegasus.isi.edu/schema/dax-3.6.xsd"
 version="3.6" name="diamond" index="0" count="1">
  <!-- part 1.1: invocations -->
  <invoke when="on_error">/bin/mailx -s &apos;diamond failed&apos; use@some.domain</invoke>

  <!-- part 1.2: included replica catalog -->
  <file name="f.a">
    <pfn url="file:///lfs/voeckler/src/svn/pegasus/trunk/examples/grid-blackdiamond-perl/f.a" site="local" />
  </file>

  <!-- part 1.3: included transformation catalog -->
  <executable namespace="diamond" name="preprocess" version="2.0" arch="x86_64" os="linux" installed="false">
    <profile namespace="globus" key="maxtime">2</profile>
    <profile namespace="dagman" key="RETRY">3</profile>
    <pfn url="file:///opt/pegasus/latest/bin/keg" site="local" />
  </executable>
  <executable namespace="diamond" name="analyze" version="2.0" arch="x86_64" os="linux" installed="false">
    <profile namespace="globus" key="maxtime">2</profile>
    <profile namespace="dagman" key="RETRY">3</profile>
    <pfn url="file:///opt/pegasus/latest/bin/keg" site="local" />
  </executable>
  <executable namespace="diamond" name="findrange" version="2.0" arch="x86_64" os="linux" installed="false">
    <profile namespace="globus" key="maxtime">2</profile>
    <profile namespace="dagman" key="RETRY">3</profile>
    <pfn url="file:///opt/pegasus/latest/bin/keg" site="local" />
  </executable>

  <!-- part 2: definition of all jobs (at least one) -->
  <job namespace="diamond" name="preprocess" version="2.0" id="ID000001">
    <argument>-a preprocess -T60 -i <file name="f.a" /> -o <file name="f.b1" /> <file name="f.b2" /></argument>
    <uses name="f.b2" link="output" register="false" transfer="true" />
    <uses name="f.b1" link="output" register="false" transfer="true" />
    <uses name="f.a" link="input" />
  </job>
  <job namespace="diamond" name="findrange" version="2.0" id="ID000002">
    <argument>-a findrange -T60 -i <file name="f.b1" /> -o <file name="f.c1" /></argument>
    <uses name="f.b1" link="input" register="false" transfer="true" />
    <uses name="f.c1" link="output" register="false" transfer="true" />
  </job>
  <job namespace="diamond" name="findrange" version="2.0" id="ID000003">
    <argument>-a findrange -T60 -i <file name="f.b2" /> -o <file name="f.c2" /></argument>
    <uses name="f.b2" link="input" register="false" transfer="true" />
    <uses name="f.c2" link="output" register="false" transfer="true" />
  </job>
  <job namespace="diamond" name="analyze" version="2.0" id="ID000004">
    <argument>-a analyze -T60 -i <file name="f.c1" /> <file name="f.c2" /> -o <file name="f.d" /></argument>
    <uses name="f.c2" link="input" register="false" transfer="true" />
    <uses name="f.d" link="output" register="false" transfer="true" />
    <uses name="f.c1" link="input" register="false" transfer="true" />
  </job>

  <!-- part 3: list of control-flow dependencies -->
  <child ref="ID000002">
    <parent ref="ID000001" />
  </child>
  <child ref="ID000003">
    <parent ref="ID000001" />
  </child>
  <child ref="ID000004">
    <parent ref="ID000002" />
    <parent ref="ID000003" />
  </child>
</adag>

The above workflow defines the black diamond from the abstract workflow section of the Introduction chapter. It will require minimal configuration, because the catalog sections include all necessary declarations.

The file element defines the location of the required input file in terms of the local machine. Please note that

  • The file element declares the required input file "f.a" in terms of the local machine. Please note that if you plan the workflow for a remote site, the has to be some way for the file to be staged from the local site to the remote site. While Pegasus will augment the workflow with such ancillary jobs, the site catalog as well as local and remote site have to be set up properlyl. For a locally run workflow you don't need to do anything.

  • The executable elements declare the same executable keg that is to be run for each the logical transformation in terms of the remote site futuregrid. To declare it for a local site, you would have to adjust the site attribute's value to local. This section also shows that the same executable may come in different guises as transformation.

  • The job elements define the workflow's logical constituents, the way to invoke the keg command, where to put filenames on the commandline, and what files are consumed or produced. In addition to the direction of files, further attributes determine whether to register the file with a replica catalog and whether to transfer it to the output site in case of a product. We are only interested in the final data product "f.d" in this workflow, and not any intermediary files. Typically, you would also want to register the data products in the replica catalog, especially in larger scenarios.

  • The child elements define the control flow between the jobs.