## Pegasus 4.2.x Series

### Pegasus 4.2.2

**Release Date:** May 1, 2013

Pegasus 4.2.2 is a minor release that contains minor enhancements and bug fixes to the Pegasus 4.2.0 release. Improvements include

- support for server side pagination in pegasus-dashboard
- support for lcg-utils command line clients to retrieve and push data to SRM servers
- installation of Pegasus python libraries in standard system locations

#### IMPROVEMENTS

1) Rotation of monitord logs

monitord is automatically launched by pegasus-dagman. When launching monitord, pegasus-dagman points monitord to a log file that it initializes. However, monitord also took a backup of the log when it started up, as it detected that the log file already existed. This led to two monitord log files in the submit directory, which was confusing. Now only pegasus-dagman sets up the monitord log.

More details can be found at PM-688 [\#806](https://github.com/pegasus-isi/pegasus/issues/806)

2) monitord recovery in the case of a SQLite database

If monitord gets killed on a currently running workflow, it restarts from the start. The information in the recovery file it writes out is insufficient to recover gracefully. In the case of a SQLite database, monitord does not attempt to expunge the information from the database. Instead, it takes a backup of the SQLite database in the submit directory.

More details can be found at PM-689 [\#807](https://github.com/pegasus-isi/pegasus/issues/807)

3) Support for lcg-utils for SRM transfers

The pegasus-create-dir, pegasus-cleanup and pegasus-transfer clients were updated to include support for lcg-utils to perform operations against an SRM server. Note that lcg-utils takes precedence if both lcg-cp and srm-copy are available.

4) Improvements to the dashboard

- Use Content Delivery Networks as the source for jQuery, jQueryUI, and the DataTables plugin.
- Most tables in the dashboard now have server side pagination, to support large workflows.
- Replaced radio buttons with jQuery buttons for a better look and feel.
- Made Statistics/Charts links more prominent.
- Added a drop down to filter the list of workflows run in the last hour, day, month, or year.

5) Newer examples added in the examples directory

The release has new examples checked in that highlight

- how to use the nonsharedfs data configuration against a remote staging site that has a scp server.
- glite submission to a local PBS cluster using the sharedfs data configuration.
- the nonsharedfs case with SRM as a staging site, using CREAM CE submission.

6) Pegasus python libraries are installed in standard system locations

The RPM and DEB packages now install the Python modules in the standard system locations. Users should no longer have to set PYTHONPATH or add to the include paths in their DAX generators.

7) Condor job logs are no longer in the /tmp directory

pegasus.condor.logs.symlink now defaults to false. This ensures compatibility with Condor 7.9.4 onwards and ticket https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1419. DAGMan will now fail by default if it detects that the common log is in /tmp.

#### BUGS FIXED

1) Externally accessible URLs for staged executables broken for SRM

In certain cases, for SRM file servers in the site catalog, the URL constructed for a staged executable was incorrect. This is now fixed.

More details can be found at PM-686 [\#804](https://github.com/pegasus-isi/pegasus/issues/804)

2) pegasus-exitcode cluster-summary with submitted=0

If the output file has a cluster-summary record, and the number of submitted tasks is 0, then the job succeeded. This fixes an error SCEC had that was introduced when the "tasks" and "submitted" values in cluster-summary were separated for PMC.
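As a minimal sketch of the corrected logic (hypothetical Python, not the actual pegasus-exitcode source), assuming the cluster-summary record has been parsed into a dict:

```python
def exitcode_from_cluster_summary(record):
    """record: a dict parsed from a [cluster-summary ...] line,
    e.g. {"tasks": 5, "submitted": 0, "succeeded": 0, "failed": 0}."""
    if record.get("submitted", 0) == 0:
        return 0  # no tasks were submitted, so the job succeeded
    return 1 if record.get("failed", 0) > 0 else 0

# A PMC job whose cluster-summary reports zero submitted tasks succeeds:
assert exitcode_from_cluster_summary({"tasks": 5, "submitted": 0}) == 0
```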
3) Pegasus Lite did not support jobs with a stdin file tracked in the DAX

In the Pegasus Lite case, support for jobs with their stdin tracked in the DAX was broken. This is now fixed.

More details can be found at PM-694 [\#812](https://github.com/pegasus-isi/pegasus/issues/812)

4) pegasus-cleanup did not support symlink deletion

In the case where symlinks to the input files are created in the scratch directory on the staging site, the pegasus-cleanup job was created with symlink URLs to be deleted. This led to the jobs failing, as pegasus-cleanup did not support deletion of symlinks. This is now fixed. Additionally, the planner sets up the cleanup jobs to run on the remote site if the URL to be deleted is a file URL or a symlink URL.

More details can be found at PM-696 [\#814](https://github.com/pegasus-isi/pegasus/issues/814)

5) pegasus-create-dir and pegasus-transfer with S3 buckets

pegasus-create-dir and pegasus-transfer did not translate the S3 bucket name correctly if it contained a -. This is now fixed. Also, the clients no longer fail if the bucket already exists.

6) Bug fixes to the cleanup algorithm

The planner exited with an index out of bounds exception when data reuse was triggered and an output file that needed to be staged was also marked for deletion. This is fixed.

Also, the clustering of the cleanup jobs resulted in not all files being deleted by the cleanup jobs. Improvements were made to how excess edges are removed from the graph: the edge removal is now done per file instead of per cleanup job. This fix drastically reduces the runtime for workflows with lots of files that need to be cleaned up.

More details can be found at PM-699 [\#817](https://github.com/pegasus-isi/pegasus/issues/817)

7) pegasus-analyzer detects prescript failures in the database mode

pegasus-analyzer in the database mode was not detecting prescript failures for DAX jobs, as the associated job instance was not updated with the exitcode. The way monitord handles failures for sub workflows was changed: in the case of prescript failures, the prescript failure exitcode is now recorded in addition to the stdout of the planner log.

More details at PM-704 [\#822](https://github.com/pegasus-isi/pegasus/issues/822)

8) monitord tracks the rotated stdout and stderr files of non kickstarted jobs

monitord did not track the rotated stdout and stderr of jobs that were not launched by kickstart. Because of this, the stdout and stderr were not populated. This is now fixed.

More details at PM-685 [\#803](https://github.com/pegasus-isi/pegasus/issues/803)

9) Planner fails on determining the DN from a proxy file

The planner uses the Java COG jar to determine the DN from a proxy file. It was discovered that a proxy generated from an X.509 end entity credential by a GSI-enabled OpenSSH server results in an NPE in the COG jar. The planner now catches all exceptions while trying to determine the DN, and failure to determine the DN is never a FATAL error.

10) pegasus-exitcode checks for the existence of the .err file

The pegasuslite_failures function did not check for missing stderr files. As a result, if pegasus-exitcode was called in a scenario where there was no .err file, it failed trying to determine whether None is a valid path.
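A minimal sketch of the added guard (hypothetical Python; the real client's parsing of PegasusLite messages differs):

```python
import os

def pegasuslite_failures(errfile):
    """Scan a job's .err file for PegasusLite failure messages.
    The fix sketched here: return no failures when the stderr file
    is unset or missing, instead of crashing on an invalid path."""
    if not errfile or not os.path.isfile(errfile):
        return []
    with open(errfile) as f:
        return [line.strip() for line in f if "PegasusLite" in line]
```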
### Pegasus 4.2.1

**Release Date:** April 27, 2013

Pegasus 4.2.1 was tagged but never officially released. Users are advised to use 4.2.2 instead. The difference between 4.2.2 and 4.2.1 is that 4.2.1 does not have the fixes for

- the planner failing if there is a NullPointerException in the underlying COG code when trying to determine the DN.
- pegasus-exitcode checking for the existence of the .err file before trying to base the exitcode on its contents.

### Pegasus 4.2.0

**Release Date:** January 29, 2013

This is a major release of Pegasus which contains

- several improvements to the data management capabilities
- a new web based monitoring dashboard
- new job submission interfaces: CREAM CE is now supported
- new replica catalog backends
- support for PMC-only workflows and I/O forwarding for PMC clustered jobs
- anonymous usage metrics reporting

The data management improvements include a new, simpler site catalog schema to describe site layouts, and enable data to be transferred to and from staging sites using different protocols. A driving force behind this change was Open Science Grid, where it is common for the compute sites to have Squid caches available to the jobs. For example, Pegasus workflows can now be configured to stage data into a staging site using SRM or GridFTP, and stage data out over HTTP. This allows the compute jobs to automatically use the Squid caching mechanism provided by the sites when pulling in data to the worker nodes over HTTP.

Also, with this release we include a beta version of a web based monitoring dashboard (built on Flask) that users can use to monitor and debug their running workflows. The dashboard provides a workflow overview, graphs, and job status/outputs.

Job submission to the CREAM job management system has been implemented and tested.

New, simpler replica catalog backends are included that allow the user to specify the input directory where the input files reside, instead of specifying a replica catalog file that contains the mappings.

There is prototypical support for setting up Pegasus to generate the executable workflow as a PMC task workflow instead of a Condor DAGMan workflow. This is useful for environments where Condor cannot be deployed, such as Blue Waters. I/O forwarding in PMC enables each task in a PMC job to write data to an arbitrary number of shared files in a safe way. This is useful for clustered jobs that contain lots of tasks where each task only writes out a few KB of output data.

Starting with this release, Pegasus will send anonymous usage statistics to the Pegasus development team. Collecting this anonymous information is mandated by the main Pegasus funding agency, NSF. Please refer to http://pegasus.isi.edu/wms/docs/latest/funding_citing_usage.php#usage_statistics for more details on our privacy policy and configuration.

#### NEW FEATURES

1) New Site Catalog Schema

The Pegasus 4.2 release introduces version 4 of the site catalog schema. The schema represents a simpler way of describing a site, and organizes the site information by the various directories accessible on the site for the workflow to use. The schema is described in our user guide at http://pegasus.isi.edu/wms/docs/latest/creating_workflows.php#sc-XML4 and the examples in our distribution have been updated to use the new schema. Sample site catalog files in the newer format can also be found in the etc directory.

With 4.2, Pegasus will autoload the appropriate site catalog backend by inspecting the version number in the site catalog file at runtime. Users can use the client pegasus-sc-converter to convert their existing site catalogs in the XML3 format to the newer version. Users can choose to specify pegasus.catalog.site as XML or leave it undefined.

The 4.2 release no longer supports the following old site catalog implementations:

- VORS
- MYOSG
- XML2
- Text

A minimal site catalog in the version 4 schema is shown below.
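To give a feel for the new format, here is a sketch of a version 4 site catalog (the site handle, paths and URLs are illustrative; consult the user guide linked above for the authoritative schema):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog
                                 http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0">
  <site handle="local" arch="x86_64" os="LINUX">
    <!-- site information is organized by the directories the workflow can use -->
    <directory type="shared-scratch" path="/scratch/wf">
      <file-server operation="all" url="file:///scratch/wf"/>
    </directory>
    <directory type="local-storage" path="/storage/wf/outputs">
      <file-server operation="all" url="file:///storage/wf/outputs"/>
    </directory>
  </site>
</sitecatalog>
```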
2) Improved Data Management Capabilities

Users can now specify different protocols to push data to a directory on the staging site and to retrieve data from that directory. For example, users can use an HTTP file server to retrieve (pull) data from a staging site to the worker node, and use another protocol, say scp, to push data back to the staging site after a job completes. This is particularly useful when you want to leverage a high throughput HTTP deployment backed by Squid proxies when serving input data to compute nodes.

Users can specify different file servers for a particular directory by specifying a different operation attribute on each file server. The operation attribute can take an enumerated set of values:

- put (use the server only for put operations to the directory)
- get (use the server only for get operations to the directory)
- all (use it for both get and put operations)

A sketch of such a directory entry is shown below.
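For example, a staging site directory along the following lines (hostnames and paths are illustrative) would serve gets over HTTP while puts go over scp:

```xml
<directory type="shared-scratch" path="/var/www/staging">
  <!-- compute jobs pull inputs over HTTP, e.g. via site Squid caches -->
  <file-server operation="get" url="http://staging.example.org/staging"/>
  <!-- jobs push outputs back to the staging site over scp -->
  <file-server operation="put" url="scp://staging.example.org/var/www/staging"/>
</directory>
```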
3) Online Workflow Dashboard

This release includes a beta version of a web based monitoring dashboard (built on Flask) that users can use to monitor and debug their running workflows. The dashboard is meant to be run per user, and lists all the workflows run by that user.

The dashboard gets the list of running workflows by looking up a SQLite database in the user's home directory, ~/.pegasus/workflow.db. This database is populated by pegasus-monitord every time a new root workflow is executed. Detailed information for each workflow is retrieved from the stampede database of that workflow.

The dashboard lists all the user's workflows on the home page, color coded: green indicates a successful workflow, red indicates a failed workflow, while blue indicates a running workflow. Users can click on a workflow to drill down to a workflow page with more information. The workflow page has identifying metadata about the workflow, and a tabbed interface that can be used to traverse the lists of sub workflows and failed, running and successful jobs. Each job or sub workflow can be clicked to get further details about that entity. Clicking on a failed/successful job leads to an invocation details page that displays the contents of the associated kickstart record.

The charts button can be clicked to generate relevant charts about the workflow execution, such as

- Workflow Gantt Chart
- Job Distribution by Count/Time
- Time Chart by Job/Invocation

The statistics button can be clicked to display a page that lists the statistics for a particular workflow. The statistics page displays statistics similar to what the command line tool pegasus-statistics displays.

The workflow dashboard can be started by a command line tool called pegasus-dashboard.

4) Usage Statistics Collection

Pegasus WMS is primarily an NSF funded project, as part of the NSF SI2 track. The SI2 program focuses on robust, reliable, usable and sustainable software infrastructure that is critical to the CIF21 vision. As part of the requirements of being funded under this program, Pegasus WMS is required to gather usage statistics of Pegasus WMS and report them back to NSF in annual reports. The metrics will also enable us to improve our software, as they will include errors encountered during the use of our software.

More details about our policy and the metrics collected can be found online at http://pegasus.isi.edu/wms/docs/latest/funding_citing_usage.php#usage_statistics

5) Support for CREAM CE submissions

CREAM is a web services based job submission front end for remote compute clusters. It can be viewed as a replacement for Globus GRAM and is mainly popular in Europe. It is widely used in the Italian Grid. In order to submit a workflow to a compute site using the CREAM CE front end, the user needs to specify the following for the site in their site catalog:

- the pegasus profile style, with value set to cream
- a grid gateway defined for the site, with the contact attribute set to the CREAM CE front end and the scheduler attribute set to the remote scheduler
- optionally, a remote queue, specified using the globus profile queue with value set to the queue name

More details can be found at http://pegasus.isi.edu/wms/docs/latest/execution_environments.php#creamce_submission

6) Initial Support for PMC-only workflows

Pegasus can now be configured to generate the executable workflow as a PMC input workflow. This is useful on platforms where it is not feasible to run Condor, such as the new XSEDE machines, for example Blue Waters. In this mode, Pegasus generates the executable workflow as a PMC task workflow and a sample PBS submit script that submits this workflow. Users can modify the generated PBS script to tailor it to their particular cluster.

To use Pegasus in this mode, set the property pegasus.code.generator to PMC. In this mode, the workflow should be configured to submit to a single execution site.

7) New options --input-dir and --output-dir for pegasus-plan

The planner now has --input-dir and --output-dir options. These allow the planner to read mappings for input files from an input directory and to stage the results to an output directory. If the --output-dir option is set, the planner updates the storage directory of the output site specified by the user; if no output site is specified, the local site entry is updated.

8) Directory based Replica Catalog

Users can now set up Pegasus to read the input file mappings from an input directory. Details on how to use and configure Pegasus in this mode can be found at http://pegasus.isi.edu/wms/docs/latest/creating_workflows.php#idp11375504

9) Regular expression support in the file based Replica Catalog

Users can now use regular expressions in a file based replica catalog to specify paths for multiple files/data sets. To use it, set pegasus.catalog.replica to Regex.

More details can be found at http://pegasus.isi.edu/wms/docs/latest/creating_workflows.php#idp11375504

A sketch of a Regex replica catalog file is shown below.
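As an illustration, a file based replica catalog with a regex entry might look like the following. This is a hypothetical sketch; in particular, the [1] capture group substitution in the PFN reflects our understanding of the Regex backend and should be checked against the documentation linked above.

```
# plain entry, matched literally
f.a                file:///shared/data/f.a                  pool="local"
# regex entry: maps every LFN like input.0001.txt to a PFN that
# reuses the captured number
input\.(\d+)\.txt  file:///shared/data/inputs/[1]/input.txt pool="local" regex="true"
```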
10) Support for I/O Forwarding in PMC (pegasus-mpi-cluster)

In workflows that have lots of small tasks, it is common for the I/O written by those tasks to be very small. For example, a workflow may have 10,000 tasks that each write a few KB of data. Typically each task writes to its own file, resulting in 10,000 files. This I/O pattern is very inefficient on many parallel file systems because it requires the file system to handle a large number of metadata operations, which are a bottleneck in many parallel file systems.

In order to address this use case, PMC implements a feature that we call "I/O forwarding". I/O forwarding enables each task in a PMC job to write data to an arbitrary number of shared files in a safe way. It does this by having the PMC worker processes collect the data written by the tasks and send it over the high-speed network, using MPI messaging, to the PMC master process, where it is written to the output file. By having one process (the PMC master process) write to the file, all of the I/O from many parallel tasks can be synchronized and written out to the files safely.

More details on how I/O forwarding works can be found in the PMC manpage, under the section I/O Forwarding: http://pegasus.isi.edu/wms/docs/trunk/cli-pegasus-mpi-cluster.php

11) Clustering of cleanup jobs

The InPlace cleanup algorithm that adds cleanup jobs to the executable workflow now clusters the cleanup jobs by default at each level of the workflow. This keeps in check the number of cleanup jobs created for large workflows. The number of cleanup jobs added per level can be set by the property pegasus.file.cleanup.clusters.num, which defaults to 2.

12) Planner support for SHIWA Bundles

The planner can now take in SHIWA bundles to execute workflows. For this to happen, the bundle needs to be created in the SHIWA GUI with the appropriate Pegasus plugins.

More details at PM-638 [\#756](https://github.com/pegasus-isi/pegasus/issues/756)

13) Improvements to pegasus-statistics

There is now a single API call executed to get the succeeded and failed counts for jobs and sub workflows.

14) Improvements to planner performance

The performance of the planner has been improved for large workflows.

15) Renamed --output option to --output-site

The --output option has been deprecated and replaced by a new option, --output-site.

16) Removed support for pegasus.dir.storage

Pegasus no longer supports the pegasus.dir.storage property. The storage directory can only be specified in the site catalog for a site.

#### BUGS FIXED

1) Failure in data reuse if a deleted job had an output file that had to be transferred

There was a bug where the planner failed, in the case of data reuse, if any of the deleted jobs had output files that needed to be transferred.

More details at PM-675 [\#793](https://github.com/pegasus-isi/pegasus/issues/793)