Pegasus 4.2.2 Released

Pegasus 4.2.2 is a minor release that adds minor enhancements and fixes bugs in the Pegasus 4.2.0 release. Improvements in 4.2.2 include

  • support for server-side pagination for pegasus-dashboard
  • support for lcg-utils command line clients to retrieve and push data to SRM servers
  • installation of Pegasus Python libraries in standard system locations
  • examples for using CREAMCE and glite submissions

Improvements

  1. Improvements to the dashboard
    • Use Content Delivery Networks as source for jQuery, jQueryUI, and DataTables plugin.
    • Most tables in the dashboard now have server-side pagination, to support large workflows.
    • Replaced radio buttons with jQuery buttons for a better look and feel.
    • Made Statistics/Charts links more prominent.
    • Added a drop-down to filter the list of workflows run in the last hour, day, month, or year.
  2. Support for lcg-utils for SRM transfers in pegasus-create-dir, pegasus-cleanup, and pegasus-transfer
    • The clients were updated to support lcg-utils for operations against an SRM server.
      Note that lcg-utils takes precedence if both lcg-cp and srm-copy are available.
  3. Newer examples added in the examples directory
    • The release has new examples checked in that highlight how to
      • use the nonsharedfs configuration against a remote staging site that has an scp server
      • use glite submission to a local PBS cluster using the sharedfs data configuration
      • use the nonsharedfs case, where we use SRM as a staging site using CREAMCE submission
  4. Pegasus Python libraries are installed in standard system locations
    • The RPM and DEB packages now install the Python modules in the standard system locations. Users should no longer have to set PYTHONPATH or add to the include paths in their DAX generators.
  5. Rotation of monitord logs
    • monitord is automatically launched by pegasus-dagman. When launching monitord, pegasus-dagman directs its output to a log file that pegasus-dagman initializes. However, monitord also made a backup of the log on startup because it detected that the log file already existed. This left two monitord log files in the submit directory, which was confusing. Now only pegasus-dagman sets up the monitord log.
      More details can be found at
  6. Monitord recovery in case of SQLite DB
    • If monitord is killed while a workflow is running, it restarts from the beginning. The information in the recovery file it writes out is insufficient to recover gracefully. With a SQLite DB, monitord does not attempt to expunge the information from the database. Instead, it takes a backup of the SQLite database in the submit directory.
      More details can be found at
  7. Condor job logs are no longer in the /tmp directory
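
The lcg-utils precedence rule from item 2 can be sketched as follows. This is only an illustration and not the code used in pegasus-transfer: the function name `pick_srm_client` and its injectable `which` parameter are hypothetical, and only the command names lcg-cp and srm-copy come from the notes above.

```python
import shutil

# Preference order taken from the release note: lcg-cp wins over srm-copy.
SRM_CLIENTS = ("lcg-cp", "srm-copy")


def pick_srm_client(which=shutil.which):
    """Return the first SRM copy client found on PATH, or None.

    `which` is injectable (hypothetical design choice) so the lookup
    can be exercised without the real tools being installed.
    """
    for tool in SRM_CLIENTS:
        if which(tool) is not None:
            return tool
    return None
```

Calling `pick_srm_client()` with no arguments searches the real PATH; passing a stub for `which` makes the precedence easy to check in isolation.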

Bugs Fixed:

  1. Externally accessible URLs for staged executables broken for SRM
    • In certain cases, for SRM file servers in the site catalog, the URL constructed to a staged executable was incorrect. This is now fixed.
      More details can be found at
  2. pegasus-exitcode cluster-summary w/submitted=0
    • If the output file has a cluster-summary record and the number of submitted tasks is 0, then the job is considered to have succeeded. This fixes an error SCEC encountered, which was introduced when the “tasks” and “submitted” values in cluster-summary were separated for PMC.
  3. Pegasus Lite did not support jobs with stdin file tracked in the DAX
    • In the Pegasus Lite case, support for jobs with their stdin tracked in the DAX was broken. This is now fixed.
      More details can be found at
  4. pegasus-cleanup did not support symlink deletion
    • When symlinks to the input files are created in the scratch directory on the staging site, the pegasus-cleanup job was created with symlink URLs to be deleted. This caused the jobs to fail, as pegasus-cleanup did not support deletion of symlinks. This is now fixed.
      Additionally, the planner sets up the cleanup jobs to run on the remote site if the URL to be deleted is a file URL or a symlink URL.

      More details can be found at

  5. pegasus-create-dir and pegasus-transfer with S3 buckets
    • pegasus-create-dir and pegasus-transfer did not translate the S3 bucket name correctly if it contained a -. This is now fixed. Also, the clients no longer fail if the bucket already exists.
  6. Bug fixes to the cleanup algorithm
    • The planner exited with an index out of bounds exception when data reuse was triggered and an output file that needed to be staged was required to be deleted. This is fixed.
    • Also, the clustering of the cleanup jobs resulted in some files not being deleted by the cleanup jobs.
    • The way excess edges are removed from the graph was improved. The edge removal was done per file instead of per cleanup job. This fix drastically reduces the runtime for workflows with lots of files that need to be cleaned up.

      More details can be found at

  7. pegasus-analyzer detects prescript failures in the DB mode
    • pegasus-analyzer in database mode did not detect prescript failures for DAX jobs, because the associated job instance was not updated with the exitcode. The way monitord handles failures for sub workflows was changed: for prescript failures, the prescript failure exitcode is now recorded in addition to the stdout of the planner log.

      More details at
      https://jira.isi.edu/browse/PM-704

  8. monitord tracks non-kickstarted jobs with rotated stdout and stderr files
    • monitord did not track the rotated stdout and stderr of jobs that were not launched by kickstart. Because of this, the stdout and stderr were not populated. This is now fixed.

      More details at
      https://jira.isi.edu/browse/PM-685

  9. Planner fails on determining the DN from a proxy file
    • The planner uses the Java COG jar to determine the DN from a proxy file. It was discovered that proxies generated from an X.509 end entity credential by a GSI-enabled OpenSSH server result in an NPE in the COG jar. The planner now catches all exceptions while trying to determine the DN, and failing to determine the DN is never a FATAL error.
  10. pegasus-exitcode checks for the existence of .err file
    • The pegasuslite_failures function did not check for missing stderr files. As a result, if pegasus-exitcode was called when there was no .err file, it failed while trying to determine whether None is a valid path.
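
The failure mode in item 10 comes down to checking a path that may be None. A minimal sketch of such a guard follows; the helper name `read_stderr` is hypothetical, and this is not the actual pegasuslite_failures implementation.

```python
import os


def read_stderr(errfile):
    """Return the contents of a job's .err file, or "" if there is none.

    `errfile` may be None when no .err file was recorded for the job,
    so it is checked before being handed to any path function.
    """
    if errfile is None or not os.path.isfile(errfile):
        return ""
    with open(errfile) as handle:
        return handle.read()
```

Treating a missing .err file as empty stderr lets the caller go on to inspect the job's other output instead of aborting.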