Pegasus 4.3.0 Released

We are happy to announce the release of Pegasus 4.3.

Pegasus 4.3 is a major release of Pegasus that includes all the enhancements and bug fixes from the 4.2.2 release.

New features and improvements in 4.3 include

improvements to pegasus lite and optimizations for input file staging in non shared filesystem deployments
support for output mappers, allowing users finer grained control over where to place the outputs on an output site
support for SSH based submissions on top of Condor BOSCO
substantial improvements to pegasus-kickstart, including the ability to track peak memory usage for jobs
improvements to pegasus-s3, pegasus-analyzer and pegasus-transfer
a new tool called pegasus-archive

New Features

Support for bypassing transfer of input files via the staging site

In non shared filesystem deployments (pegasus.data.configuration = nonsharedfs|condorio), users can now set up Pegasus to transfer the input files directly to the worker nodes without going through the staging site. This can be done by setting the following property to true

pegasus.transfer.bypass.input.staging

In the nonsharedfs case, if the input files are already present on a shared disk accessible from the worker nodes, pegasus lite can symlink to them instead of copying them over to the local directory on the worker node. The cleanup algorithm was updated to ignore files that are pulled directly to the worker nodes from the input site locations.
Support for DAX generation jobs in hierarchical workflows

Pegasus now supports DAX generation jobs in the workflow. This allows users to add long running DAX generation jobs as compute jobs in the workflow that can be run remotely. Each DAX generation job needs to be a parent of the associated DAX job. Pegasus will ensure that the DAX generated on a remote site is brought to the local site, so that the sub workflow corresponding to the DAX job can be planned.

Earlier, hierarchical workflows required the DAXes for the sub workflows to be pre-generated, with the paths to the DAX files specified in the DAX jobs. Pegasus did not handle separate DAX generation jobs out of the box. More details can be found at https://jira.isi.edu/browse/PM-667. An example is checked in under the share/pegasus/examples/dynamic-hierarchy directory.
Support for output mappers

Pegasus now supports output mappers that give users fine grained control over how the output files are laid out on the output site. The mappers can be configured by setting the following property

pegasus.dir.storage.mapper

The following mappers are supported

Flat: By default, Pegasus will place the output files in the storage directory specified in the site catalog for the output site.

Hashed: This mapper creates a deep directory structure on the output site while populating the results. The base directory on the remote end is determined from the site catalog. Depending on the number of files being staged to the remote site, a hashed file structure is created that ensures that no more than 256 files reside in one directory. To create this directory structure on the storage site, Pegasus relies on the directory creation feature of the GridFTP server, which appeared in Globus 4.0.x.

Replica: This mapper determines the path for an output file on the output site by querying an output replica catalog. The output site is the one that is passed on the command line. The output replica catalog can be configured by specifying the properties

pegasus.dir.storage.mapper.replica Regex|File

pegasus.dir.storage.mapper.replica.file the RC file at the backend to use if using a file based RC
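
As a rough illustration of the Hashed mapper idea, the Python sketch below assigns each output file to a nested directory so that no directory receives more than 256 files. It is not the algorithm Pegasus implements; the function name hashed_layout, the two-level depth and the hexadecimal directory names are assumptions made for this example.

```python
import posixpath

def hashed_layout(lfns, files_per_dir=256, depth=2):
    """Map each LFN to a relative path with at most files_per_dir files per directory.

    Hypothetical helper for illustration only, not part of Pegasus.
    """
    capacity = files_per_dir ** (depth + 1)
    if len(lfns) > capacity:
        raise ValueError("too many files for the chosen depth")
    layout = {}
    for index, lfn in enumerate(lfns):
        # Files are filled into leaf directories in order, files_per_dir at a time.
        bucket = index // files_per_dir
        parts = []
        for _ in range(depth):
            parts.append("%02x" % (bucket % files_per_dir))
            bucket //= files_per_dir
        parts.reverse()
        layout[lfn] = posixpath.join(*parts, lfn)
    return layout
```

With the defaults, the first 256 files land in 00/00, the next 256 in 00/01, and so on.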
Support for SSH based submissions

Pegasus now exposes an ssh style to enable submission to remote sites using SSH. This builds upon the Condor BOSCO functionality that allows for submission over SSH.

Check out the bosco-shared-fs example in the share/pegasus/examples directory for a sample site catalog and configuration.
Support for JDBC based Replica Catalog

Resurrected support for JDBC backed Replica Catalog in Pegasus. Users can use pegasus-rc-client to interact with the JDBC backend.
Reduced Dependencies for create dir jobs

Pegasus earlier added edges between create dir jobs and all the compute jobs scheduled for that particular site. Pegasus now adds edges from the create dir job to a compute job only if a create dir job is not reachable from one of the parents of the job. This strategy is now the default for 4.3.
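
The reduced-dependency strategy can be sketched in Python as follows. This is a simplified model under stated assumptions, not the planner's code: the function add_create_dir_edges, the create_dir_<site> job names and the dictionary-based workflow representation are all hypothetical.

```python
from graphlib import TopologicalSorter

def add_create_dir_edges(parents, site_of):
    """Return the (create_dir_job, compute_job) edges that are actually needed.

    parents maps every job to a list of its parent jobs; site_of maps every
    job to its execution site. Both structures are illustrative assumptions.
    """
    covered = {}  # job -> sites whose create dir job is already an ancestor
    edges = []
    # static_order() yields parents before their children.
    for job in TopologicalSorter(parents).static_order():
        reached = set()
        for parent in parents.get(job, ()):
            reached |= covered[parent]
        site = site_of[job]
        if site not in reached:
            # No parent leads back to this site's create dir job: add the edge.
            edges.append(("create_dir_" + site, job))
            reached.add(site)
        covered[job] = reached
    return edges
```

For a diamond shaped workflow on a single site, only the root job receives an edge from the create dir job; the remaining jobs inherit the dependency through their parents.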
New tool called pegasus-archive

Pegasus 4.3 has a new tool called pegasus-archive that compresses a workflow submit directory in a way that allows pegasus-dashboard, pegasus-statistics, pegasus-plots, and pegasus-analyzer to keep working. More information can be found in the manpage for the tool.
pegasus-transfer enhancements
The internal pegasus-transfer tool was improved to do multi-hop staging when two incompatible protocols are used for the source and destination of a transfer. For example, if a workflow requires the transfer of a file from GridFTP to S3, pegasus-transfer will split the transfer up into two transfers: GridFTP->Local and Local->S3. This is transparent to the end-user and the Pegasus planner.
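
The multi-hop idea can be sketched in Python as below. This is only an illustration: the DIRECT table of protocol pairs, the plan_transfer function and the scratch location are assumptions for the example and do not reflect how pegasus-transfer is implemented.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical set of (source, destination) schemes that a single tool can handle.
DIRECT = {
    ("file", "file"),
    ("gsiftp", "gsiftp"), ("gsiftp", "file"), ("file", "gsiftp"),
    ("s3", "s3"), ("s3", "file"), ("file", "s3"),
}

def plan_transfer(src, dst, scratch="file:///tmp/staging"):
    """Return the list of (source, destination) hops needed for one transfer."""
    pair = (urlparse(src).scheme, urlparse(dst).scheme)
    if pair in DIRECT:
        return [(src, dst)]
    # Incompatible protocols: stage through a local file first.
    name = posixpath.basename(urlparse(src).path)
    local = scratch.rstrip("/") + "/" + name
    return [(src, local), (local, dst)]
```

A GridFTP to S3 request yields two hops through the local scratch location, while a compatible pair is returned unchanged as a single hop.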
pegasus-mpi-cluster enhancements

Added a --maxfds argument to control the size of the FDCache. This argument to PMC enables the user to set the maximum number of file descriptors that will be cached by PMC in I/O forwarding. This is to help SCEC accomplish coscheduling on BlueWaters.
pegasus-kickstart can track peak memory usage for the jobs launched by it

pegasus-kickstart now adds per-pid I/O, memory and CPU usage. These changes add one or more <proc> elements inside all of the <*job> elements. The new <proc> elements are only available on Linux systems with kernels >2.5.60 that support ptrace with exit events. The new <proc> element contains information about
1. the peak memory usage of each child process,
2. the start and end times of the processes,
3. the number of characters and bytes read and written,
4. the utime and stime, and the pid and parent pid.
This information can be used to compute the resource usage of a job.
pegasus-kickstart enhancements

Added a -q option to reduce output. This new option omits the <data> part of the <statcall> records for stdout and stderr if the job succeeds. If the job fails, then the <data> is included.

When kickstart is executed on a Linux kernel >= 3.0, logic in the machine extensions code of kickstart prevented the proc statistics gathering, because it was a reasonable assumption that the API might have changed (it did between 2.4 and 2.6). This restriction is now removed.

The behavior of the -B option was changed so that it grabs the last N bytes of stdio instead of the first N bytes of stdio if the size of stdio is larger than the value of the -B option.
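
The new -B behavior amounts to keeping the tail of the captured stream rather than its head. A minimal Python sketch of that rule, using a hypothetical capture_stdio helper (kickstart itself is not written in Python):

```python
def capture_stdio(data: bytes, limit: int) -> bytes:
    """Keep at most `limit` bytes of captured output, preferring the end."""
    if limit <= 0:
        return b""
    if len(data) > limit:
        # Output is larger than the limit: keep the last `limit` bytes.
        return data[-limit:]
    return data
```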

The invocation record that kickstart writes out is now consistent with the new invocation schema version 2.2.

This version adds the <proc> element under <*job>, and renames the <linux>/<proc> element to <linux>/<procs> to eliminate the name collision.

pegasus-kickstart now sets the maximum size of a single argument to 128k, which appears to be the per-argument limit in Linux, instead of the earlier 2048 characters. If the total size of all the arguments is over the total limit, then execve will fail, so we don’t try to catch that in the argument parser.
pegasus-s3 enhancements

The put -b/--create-bucket option was made more efficient. There is no need to check if the bucket exists before calling create_bucket because it is a no-op if the bucket already exists.

pegasus-s3 no longer relies on mmap for upload and download. This should reduce the memory usage of pegasus-s3 for large files.

Updated the boto version to 2.5.2 to better support multipart uploads.

Added upload rate info to put command output.

pegasus-s3 now supports transfers from one s3 bucket to another.
pegasus-analyzer enhancements

Earlier, pegasus-analyzer did not detect prescript failures. If a job’s prescript failed (for example, the planner instance on a subworkflow for a hierarchical workflow), then that failure was not recorded in the monitoring database, so pegasus-analyzer did not report it. Changes were made in the monitoring daemon to ensure those errors are detected and associated correctly in the database. More details can be found at https://jira.isi.edu/browse/PM-704

pegasus-analyzer can now be used to debug pegasus lite workflows using the --debug-job option. It facilitates the debugging of a failed pegasus lite job by creating a shell script in a tmp directory that can be run locally to pull in the input data and execute the job. It also now has a --local-executable option that can be used to pass the local executable for the job that is being debugged.
pegasus-statistics can generate statistics across multiple root workflows

pegasus-statistics now has a -m option to generate statistics across multiple root workflows. Users can pass either multiple workflow submit directories or workflow uuids separated by whitespace.

This feature is also useful if the runs for multiple root workflows are populated in the same MySQL database.

For example

pegasus-statistics -Dpegasus.monitord.output=mysql://user:password@host:port/databasename -s all -m
pegasus-lite stages out output files in case of failure

In the nonsharedfs case, PegasusLite now always attempts to transfer the output files even if the main command of the script fails.
Details at https://jira.isi.edu/browse/PM-701
Directory backed Replica Catalog now supports flat lfn’s

By default the directory based replica catalog backend constructs deep lfn’s while traversing an input directory.

For example, if the input directory points to a directory named input, then

the file input/deep/f.a results in an LFN called deep/f.a

If a user sets pegasus.catalog.replica.directory.flat.lfn to true

then only the leaf (the file name) is used to construct the lfn.

For example

input/deep/f.a will result in lfn f.a
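
The difference between deep and flat lfn’s can be sketched in Python as below. The make_lfn function is a hypothetical helper written for this example and is not the catalog backend's code; the flat argument stands in for the pegasus.catalog.replica.directory.flat.lfn property.

```python
import posixpath

def make_lfn(file_path, input_dir, flat=False):
    """Derive an LFN for a file found while traversing input_dir."""
    if flat:
        # Flat mode: only the leaf name is kept.
        return posixpath.basename(file_path)
    # Default mode: keep the path relative to the input directory.
    return posixpath.relpath(file_path, input_dir)
```

In this sketch, two files with the same leaf name in different subdirectories would map to the same flat lfn.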
Updated jglobus jars and globus rls client jar
Pegasus now ships with updated jglobus and globus rls client jars that allow proxies generated using newer certificates to be used to authenticate against an RLS deployment. The RLS client jar shipped with Pegasus works with JGlobus 2.0.5.
Updated proxy detection logic in the planner.

The pegasus.local.env property is no longer supported. Instead, users need to specify env.VARIABLE_NAME in their properties. The planner now uses the GSSManager class from jglobus to determine the DN of the proxy for writing out in the braindump file.
Support for SQLite 3.7.13 for the stampede statistics layer

SQLite 3.7.12 introduced a bug in how nested aggregate queries are handled. This is fixed in version 3.7.14, but version 3.7.13 is what Debian installs through apt. The query that generates the jobs.txt file was updated so that it does not fail. The updated query works across all the recent SQLite versions.
Changes to tutorial VM image

The tutorial image was updated so that the udev persistent rules for eth0 are disabled. Added a GNOME X desktop to the VM. The VM image can now grow to 8GB
dax2dot now implements a transitive reduction algorithm to remove extra edges from the workflow

dax2dot now implements a transitive reduction algorithm to remove extra edges from the workflow. It also has improved handling of the -f option. This fixes PM-721 by treating files and jobs as equivalent nodes so that transitive reduction works in the case where the DAG contains a mix of file nodes and job nodes. Non-redundant job-job edges will still be rendered if the user specifies -f, but redundant edges will be removed. If the user specifies both -f and -s, then there will be many redundant edges in a typical workflow. Sometimes the -f option will cause cycles in the graph (e.g. files with “inout” linkage, or jobs with a file that is both an input and output). In those cases the -s option must also be specified.
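
For readers unfamiliar with transitive reduction, here is a minimal Python sketch of the technique on a DAG. It is a generic textbook-style version, not the dax2dot implementation; the transitive_reduction function and the dictionary representation are assumptions, and nodes can stand for either jobs or files. It assumes the graph is acyclic.

```python
def transitive_reduction(children):
    """Remove edges u->v from a DAG when v is also reachable via a longer path.

    children maps each node to the set of its direct successors.
    """
    def descendants(node, seen):
        # Collect every node reachable from `node` by one or more edges.
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                descendants(child, seen)
        return seen

    reduced = {}
    for node, kids in children.items():
        keep = set(kids)
        for kid in kids:
            # Whatever a child can reach is already implied, so a direct
            # edge from `node` to it is redundant.
            keep -= descendants(kid, set())
        reduced[node] = keep
    return reduced
```

For a chain a->b->c->d with extra edges a->c and a->d, only the three chain edges remain.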
Better handling in monitord for submit host crashes

monitord now detects consecutive workflow started events. In this case, it inserts an intervening workflow end event with status set to 2 to indicate unknown failure. This can happen when Condor dies on the submit host, say because of a power failure. The intervening workflow end event is inserted to ensure that queries to the database don’t fail because of mismatched start and end events.
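
The repair rule can be sketched in Python as below. The event representation, a list of (kind, status) tuples with kind being "start" or "end", and the repair_event_stream function are simplifications invented for this example; monitord's real events carry more fields.

```python
UNKNOWN_FAILURE = 2  # status used for the inserted end event

def repair_event_stream(events):
    """Insert an end event between two consecutive workflow start events."""
    repaired = []
    running = False
    for kind, status in events:
        if kind == "start" and running:
            # The previous run never reported an end: close it as unknown failure.
            repaired.append(("end", UNKNOWN_FAILURE))
        repaired.append((kind, status))
        if kind == "start":
            running = True
        elif kind == "end":
            running = False
    return repaired
```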
Application Metrics Reporting

Applications can now enable the planner to pass application defined metrics to the metrics server.

This allows the metrics on the server to be grouped by application name.

In order to do that, please set the property

pegasus.metrics.app application-name

Users can also associate arbitrary key value pairs that can be passed to the server.

pegasus.metrics.app.attribute1 value1
Change of maxpre settings for hierarchical workflows

Changed the default for maxpre from 2 to 1. More sensible in context of ensemble manager.

Bugs Fixed

memory explosion for monitord when parsing large PMC workflows

For large SCEC workflows using PMC, it was noticed that monitord memory usage exploded when parsing large hierarchical workflows with PMC enabled (tens of thousands of jobs in one PMC cluster). This is now fixed. More details can be found at https://jira.isi.edu/browse/PM-712
kickstart segfault in tracing

If the job forks lots of children then the buffer for the final invocation record fills up with <proc> tags and causes a segfault.
kickstart segfault on missing argument

kickstart segfaulted on missing arguments. This is now fixed.
pegasus-dagman used pegasus bindir in the search path for determining condor location

Fixed a bug where the location of Pegasus was brought in when determining which HTCondor to use.
JAVA DAX API stdout| stderr handling

Changed the handling for stdout | stderr | stdin files in the JAVA DAX API. Corresponding uses files are now only added when we are printing out the ADAG contents in the toXML method if not already specified by the user. This also removes the warning messages, where a user adds a uses section for a stdout file with different transfer and register flags explicitly in their DAX generators.
Fix for heft site selector

The Heft site selector was not correctly initialized if a user did not specify any execution sites on the command line. This is now fixed.