Pegasus 4.2.0 Released

Pegasus 4.2 is a major release of Pegasus which contains

– several improvements to data management capabilities
– a new web based monitoring dashboard
– a new supported job submission interface: CREAM CE
– new replica catalog backends
– support for PMC only workflows and I/O forwarding for PMC clustered jobs
– anonymous usage metrics reporting

The data management improvements include a new, simpler site catalog schema for describing site layouts, and enable data to be transferred to and from staging sites using different protocols. A driving force behind this change was Open Science Grid, in which it is common for the compute sites to have Squid caches available to the jobs. For example, Pegasus workflows can now be configured to stage data into a staging site using SRM or GridFTP, and stage data out over HTTP. This allows the compute jobs to automatically use the Squid caching mechanism provided by the sites when pulling in data to the worker nodes over HTTP.

Also, with this release we include a beta version of a web based monitoring dashboard (built on Flask) that users can use to monitor and debug their running workflows. The dashboard provides a workflow overview, graphs, and job status and outputs.

Job submission to the CREAM job management system has been implemented and tested.

New simpler replica catalog backends are included that allow the user to specify the input directory where the input files reside instead of specifying a replica catalog file that contains the mappings.

There is prototypical support for setting up Pegasus to generate the executable workflow as a PMC task workflow instead of a Condor DAGMan workflow. This is useful for environments where Condor cannot be deployed, such as Blue Waters. I/O forwarding in PMC enables each task in a PMC job to write data to an arbitrary number of shared files in a safe way. This is useful for clustered jobs that contain lots of tasks where each task only writes out a few KB of output data.

Starting with this release, Pegasus will send anonymous usage statistics to the Pegasus development team. Collecting this anonymous information is mandated by the main Pegasus funding agency, NSF. Please refer to

http://pegasus.isi.edu/wms/docs/latest/funding_citing_usage.php#usage_statistics

for more details on our privacy policy and configuration.

Please send any questions about the release to pegasus-support@isi.edu

New Features

New Site Catalog Schema

The Pegasus 4.2 release introduces version 4 of the site catalog schema. The schema provides a simpler way to describe a site, and organizes the site information by the various directories accessible on the site for the workflow to use.

The schema is described in our user guide here

http://pegasus.isi.edu/wms/docs/latest/creating_workflows.php#sc-XML4

and examples in our distribution have been updated to use the new schema. Sample site catalog files in the newer format can also be found in the etc directory.

With 4.2, Pegasus will autoload the appropriate site catalog schema backend by inspecting the version number in the site catalog file at runtime.
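
The version-based autoloading described above can be pictured with a minimal Python sketch. It assumes the site catalog is an XML document whose root element carries a version attribute. The names XML3 and XML4 are taken from these notes; the function pick_backend and the mapping table are hypothetical illustrations, not the planner's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the major schema version to a backend name.
BACKENDS = {"3": "XML3", "4": "XML4"}


def pick_backend(catalog_xml: str) -> str:
    """Return the backend name for a site catalog given as an XML string."""
    root = ET.fromstring(catalog_xml)
    version = root.get("version")  # assumption: version lives on the root element
    if version is None:
        raise ValueError("site catalog has no version attribute")
    major = version.split(".")[0]
    if major not in BACKENDS:
        raise ValueError(f"unsupported site catalog version: {version}")
    return BACKENDS[major]


if __name__ == "__main__":
    sample = '<sitecatalog version="4.0"></sitecatalog>'
    print(pick_backend(sample))
```

Keying the choice on the file's own version number is what lets users leave pegasus.catalog.site undefined.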

Users can use the client pegasus-sc-converter to convert their existing site catalogs in XML3 format to the newer versions. Users can choose to specify pegasus.catalog.site as XML or leave it undefined.

The 4.2 release no longer supports the following old Site Catalog Implementations

– VORS

– MYOSG

– XML2

– Text

Improved Data Management Capabilities

Users can now specify different protocols to push data to a directory on the staging site and retrieve data from the directory using another protocol.

For example, users can use an HTTP file server to retrieve (pull) data from a staging site to a worker node, and use another protocol, say scp, to push data back to the staging site after a job completes. This is particularly useful when you want to leverage a high throughput HTTP deployment backed by Squid proxies when serving input data to compute nodes.

Users can specify different file servers for a particular directory by setting a different operation attribute on each file server. The operation attribute can take one of the following values

– put (use the server only for put operations to the directory)

– get (use the server only for get operations to the directory)

– all (use it for both get and put operations)
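
The way the operation attribute narrows the choice of file server can be sketched in a few lines of Python. This is only an illustration of the put/get/all semantics listed above; the FileServer class, the servers_for function and the example URLs are hypothetical and do not mirror the planner's internal data structures.

```python
from dataclasses import dataclass


@dataclass
class FileServer:
    url: str
    operation: str  # one of "put", "get", "all"


def servers_for(servers, operation):
    """Return the file servers that may be used for a "put" or a "get"."""
    if operation not in ("put", "get"):
        raise ValueError("operation must be 'put' or 'get'")
    # A server qualifies if it is dedicated to this operation or serves both.
    return [s for s in servers if s.operation in (operation, "all")]


if __name__ == "__main__":
    # Hypothetical staging directory: pull over HTTP, push over scp.
    staging = [
        FileServer("http://staging.example.org/data", "get"),
        FileServer("scp://staging.example.org/data", "put"),
    ]
    print([s.url for s in servers_for(staging, "get")])
    print([s.url for s in servers_for(staging, "put")])
```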

Online Workflow Dashboard

This release includes a beta version of a web based monitoring dashboard (built on Flask) that users can use to monitor and debug their running workflows.

The dashboard is meant to be run per user, and lists all the workflows run by that user. The dashboard gets the list of running workflows by looking up a SQLite database in the user's home directory, $HOME/.pegasus/workflow.db. This database is populated by pegasus-monitord every time a new root workflow is executed.

Detailed information for each workflow is retrieved from the stampede database for that workflow.

The workflow dashboard lists all the user's workflows on the home page, color coded by status. Green indicates a successful workflow, red indicates a failed workflow, and blue indicates a running workflow.

Users can click on a workflow to drill down to a workflow page with more information about it. The workflow page has identifying metadata about the workflow, and a tabbed interface that can be used to traverse the lists of sub workflows and of failed, running and successful jobs.

Each job or sub workflow can be clicked to get further details about that entity. Clicking on a failed or successful job leads to an invocation details page that displays the contents of the associated kickstart record.

The charts button can be clicked to generate relevant charts about the workflow execution, such as the

– Workflow Gantt Chart

– Job Distribution by Count/Time

– Time Chart by Job/Invocation

The statistics button can be clicked to display a page that lists the statistics for a particular workflow. The statistics page displays statistics similar to what the command line tool pegasus-statistics displays.

The workflow dashboard can be started with a command line tool called pegasus-dashboard.

Usage Statistics Collection

Pegasus WMS is primarily an NSF funded project as part of the NSF SI2 track. The SI2 program focuses on robust, reliable, usable and sustainable software infrastructure that is critical to the CIF21 vision. As a requirement of being funded under this program, Pegasus WMS must gather usage statistics and report them back to NSF in annual reports. The metrics will also enable us to improve our software, as they will include errors encountered during the use of our software.

More details about our policy and metrics collected can be found online at http://pegasus.isi.edu/wms/docs/latest/funding_citing_usage.php#usage_st…

Support for CREAMCE submissions

CREAM is a web services based job submission front end for remote compute clusters. It can be viewed as a replacement for Globus GRAM and is mainly popular in Europe. It is widely used in the Italian Grid.

In order to submit a workflow to a compute site using the CREAMCE front end, the user needs to specify the following for the site in their site catalog

– pegasus profile style with value set to cream

– grid gateway defined for the site with contact attribute set to CREAMCE frontend and scheduler attribute to remote scheduler.

– a remote queue can be optionally specified using globus profile queue with value set to queue-name

More details can be found here

http://pegasus.isi.edu/wms/docs/latest/execution_environments.php#creamc…

Initial Support for PMC only workflows

Pegasus can now be configured to generate a workflow in terms of a PMC input workflow. This is useful on platforms where it is not feasible to run Condor, such as new XSEDE machines like Blue Waters. In this mode, Pegasus will generate the executable workflow as a PMC task workflow and a sample PBS submit script that submits this workflow.

Users can modify the generated PBS script to tailor it to their particular cluster.

To use Pegasus in this mode, set

pegasus.code.generator PMC

In this mode, the workflow should be configured to submit to a single execution site.

New options --input-dir and --output-dir for pegasus-plan

The planner now has --input-dir and --output-dir options. These allow the planner to read mappings for input files from an input directory and stage the results to an output directory.

If the --output-dir option is set, then the planner updates the storage directory for the output site specified by the user. If no output site is specified, then the local site entry is updated.

Directory based Replica Catalog

Users can now set up Pegasus to read the input file mappings from an input directory. Details on how to use and configure Pegasus in this mode can be found here

http://pegasus.isi.edu/wms/docs/latest/creating_workflows.php#idp11375504
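
Conceptually, a directory based replica catalog derives the mappings from the files it finds rather than from a hand-written catalog file. The Python sketch below shows one way such mappings could be built. It assumes a flat directory, uses each file's name as its logical name, and tags every entry with the site "local"; the function directory_mappings and all of these choices are illustrative assumptions, not Pegasus code.

```python
import os


def directory_mappings(input_dir, site="local"):
    """Map each file name in input_dir to a (file:// URL, site) pair."""
    mappings = {}
    for entry in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, entry)
        if os.path.isfile(path):  # sub directories are ignored in this sketch
            mappings[entry] = ("file://" + os.path.abspath(path), site)
    return mappings


if __name__ == "__main__":
    for lfn, (pfn, site) in directory_mappings(".").items():
        print(lfn, pfn, site)
```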

Regular Expressions support in File based Replica Catalog

Users can now specify a regular expression in a file based replica catalog to specify paths for multiple files or data sets.

To use it you need to set

pegasus.catalog.replica to Regex

More details can be found here

http://pegasus.isi.edu/wms/docs/latest/creating_workflows.php#idp11375504
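
The idea behind regular expression entries, one rule standing in for many files, can be sketched with Python's re module. The rule format below (a pattern paired with a substitution template) and the function resolve are hypothetical; the actual catalog file syntax is described in the user guide linked above and is not reproduced here.

```python
import re


def resolve(lfn, rules):
    """Return the physical location for lfn from the first fully matching rule."""
    for pattern, template in rules:
        match = re.fullmatch(pattern, lfn)
        if match:
            # Fill group references such as \1 in the template.
            return match.expand(template)
    return None  # no rule covers this logical file name


if __name__ == "__main__":
    # Hypothetical rule: every file named f.<suffix> lives under /data/inputs.
    rules = [(r"f\.(\w+)", r"file:///data/inputs/f.\1")]
    print(resolve("f.a", rules))
    print(resolve("g.a", rules))
```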

Support for I/O Forwarding in PMC (pegasus-mpi-cluster)

In workflows that have lots of small tasks it is common for the I/O written by those tasks to be very small. For example, a workflow may have 10,000 tasks that each write a few KB of data. Typically each task writes to its own file, resulting in 10,000 files. This I/O pattern is very inefficient on many parallel file systems because it requires the file system to handle a large number of metadata operations, which are a bottleneck in many parallel file systems.

In order to address this use case, PMC implements a feature that we call “I/O Forwarding”. I/O forwarding enables each task in a PMC job to write data to an arbitrary number of shared files in a safe way. It does this by having PMC worker processes collect data written by the task and send it over the high-speed network using MPI messaging to the PMC master process, where it is written to the output file. By having one process (the PMC master process) write to the file, all of the I/O from many parallel tasks can be synchronized and written out to the files safely.
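
The single-writer pattern behind I/O forwarding can be illustrated without MPI. The Python sketch below is an analogy only: threads stand in for PMC tasks, a queue stands in for the MPI messages, and one writer thread plays the role of the PMC master. The function run_forwarding and the record format are hypothetical.

```python
import queue
import threading


def run_forwarding(num_tasks, out_path):
    """Let many tasks send records to one writer that owns the output file."""
    channel = queue.Queue()
    done = object()  # sentinel telling the writer to stop

    def writer():
        # Only this thread touches the file, so records never interleave.
        with open(out_path, "w") as out:
            while True:
                record = channel.get()
                if record is done:
                    break
                out.write(record)

    def task(task_id):
        # A task forwards its small output instead of opening its own file.
        channel.put(f"task {task_id}: result\n")

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    workers = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    channel.put(done)
    writer_thread.join()


if __name__ == "__main__":
    run_forwarding(100, "shared_output.txt")
```

The result is one shared file with one complete record per task, instead of one small file per task.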

More details on how IO Forwarding works can be found in the manpage for PMC under the section I/O Forwarding

http://pegasus.isi.edu/wms/docs/trunk/cli-pegasus-mpi-cluster.php

Clustering of cleanup jobs

The InPlace cleanup algorithm, which adds cleanup jobs to the executable workflow, now clusters the cleanup jobs by default for each level of the workflow. This keeps the number of cleanup jobs created for large workflows in check.

The number of cleanup jobs added per level can be set by the following property

pegasus.file.cleanup.clusters.num

It defaults to 2.
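
The effect of this property can be pictured with a small Python sketch that groups the cleanup tasks of each workflow level into at most a fixed number of clustered jobs. The round-robin assignment, the function cluster_cleanup_jobs and the task names are assumptions made for illustration; these notes do not describe how the planner actually assigns tasks to clusters.

```python
def cluster_cleanup_jobs(cleanup_by_level, clusters_per_level=2):
    """Group each level's cleanup tasks into at most clusters_per_level jobs."""
    clustered = {}
    for level, tasks in cleanup_by_level.items():
        if not tasks:
            clustered[level] = []
            continue
        count = min(clusters_per_level, len(tasks))
        buckets = [[] for _ in range(count)]
        for index, task in enumerate(tasks):
            buckets[index % count].append(task)  # round-robin (an assumption)
        clustered[level] = buckets
    return clustered


if __name__ == "__main__":
    # Hypothetical cleanup tasks keyed by workflow level.
    levels = {1: ["clean_a", "clean_b", "clean_c", "clean_d", "clean_e"], 2: ["clean_x"]}
    for level, jobs in cluster_cleanup_jobs(levels).items():
        print(level, jobs)
```

With the default of 2, a level that would otherwise get five cleanup jobs ends up with two clustered ones.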

Planner has support for SHIWA Bundles

The planner can take in SHIWA bundles to execute workflows. For this to happen, the bundle needs to be created in the SHIWA GUI with the appropriate Pegasus plugins.

More details at

https://jira.isi.edu/browse/PM-638

Improvements to pegasus-statistics

A single API call is now used to get the succeeded and failed counts for jobs and sub workflows.

Improvements to planner performance

The performance of the planner has been improved for large workflows.

Renamed --output option to --output-site

The --output option has been deprecated and replaced by a new option, --output-site.

Removed support for pegasus.dir.storage

Pegasus no longer supports the pegasus.dir.storage property. The storage directory can only be specified in the site catalog for a site, or by using the --output-dir option to the planner.

Bugs Fixed

Failure in Data Reuse if a deleted job had an output file that had to be transferred

There was a bug where the planner failed during data reuse if any of the deleted jobs had output files that needed to be transferred. More details at

https://jira.isi.edu/browse/PM-675