Pegasus 4.5.4 Released

We are happy to announce the release of Pegasus 4.5.4, a minor release that contains minor enhancements and bug fixes. This will most likely be the last release in the 4.5 series; unless you have specific reasons to stay with the 4.5.x series, we recommend upgrading to 4.6.0.

New Features

  • [PM-1003] – planner should report information about what options were used in the planner
    The planner now reports additional metrics to the metrics server, such as the command line options, whether PMC was used, and the number of deleted tasks.
  • [PM-1007] – “undelete” or attach/detach for pegasus-submitdir
    pegasus-submitdir has two new commands: attach, which adds the workflow to the dashboard (or corrects the path), and detach, which removes the workflow from the dashboard.
  • [PM-1030] – pegasus-monitord should parse the new dagman output that reports timestamps from condor user log
    Starting with version 8.5.2, HTCondor DAGMan records the condor job log timestamps at the end of the ULOG event messages in its log. monitord was updated to prefer these timestamps for job events when they are present in the DAGMan logs.

Improvements

  • [PM-896] – Document events that monitord publishes
    The netlogger messages generated by monitord, which are used to populate the workflow database and the master database, are now documented at https://pegasus.isi.edu/wms/docs/4.5.4cvs/stampede_wf_events.php
  • [PM-995] – changes to Pegasus tutorial
    The Pegasus tutorial was reorganized and simplified to focus more on pegasus-dashboard and debugging exercises.
  • [PM-1033] – update monitord to handle updated log messages in dagman.out file
    Starting with the HTCondor 8.5.x series, some of the DAGMan log messages in the dagman.out file were updated to say HTCondor instead of Condor. This broke the regular expressions monitord uses for parsing, so it could not extract information from the dagman.out file. This is now fixed.
  • [PM-1034] – Make it more difficult for users to break pegasus-submitdir archive
    A locking mechanism was added internally to make pegasus-submitdir more robust when a user accidentally kills an archive operation.
  • [PM-1040] – pegasus-analyzer should be able to handle cases where the workflow failed to start
    pegasus-analyzer now detects when a workflow failed to start because of the DAGMan fail-on-NFS-error setting, and also displays any errors found in *.dag.lib.err files.
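
The HTCondor/Condor wording change described under PM-1033 is a common way for log parsers to break. The sketch below shows one way a pattern can be written to accept both spellings. It is not monitord's actual code: the pattern, the function name parse_node_event, and the sample line shape are all made up for illustration and are not copied from a real dagman.out file.

```python
import re

# Hypothetical pattern: the optional "(?:HT)?" group lets the same
# expression match both the old "Condor" and the new "HTCondor" wording.
NODE_EVENT = re.compile(
    r"Event: (?P<event>\w+) for (?:HT)?Condor Node (?P<node>\S+)"
)


def parse_node_event(line):
    """Return (event, node) if the line matches, otherwise None."""
    match = NODE_EVENT.search(line)
    if match is None:
        return None
    return match.group("event"), match.group("node")


# Illustrative lines only; the real DAGMan message layout may differ.
old_style = "Event: ULOG_SUBMIT for Condor Node job_a (1.0.0)"
new_style = "Event: ULOG_SUBMIT for HTCondor Node job_a (1.0.0)"
print(parse_node_event(old_style))
print(parse_node_event(new_style))
```

Making the prefix optional, rather than keeping two separate patterns, means a single code path handles both older and newer HTCondor versions.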

Bugs Fixed

  • [PM-921] – Specified env is not provided to monitord
    The environment for pegasus-monitord is now set in the dagman.sub file. The following order is used: start from the system environment, override it with env profiles in the properties, and then with env profiles from the local site entry in the site catalog.
  • [PM-999] – pegasus-transfer taking too long to finish in case of retries
    pegasus-transfer has moved to an exponential back-off: min(5 ** (attempt_current + 1) + random.randint(1, 20), 300)
    This means that failures of short-running transfers will still take time, but this is necessary to ensure the scalability of real-world workflows.
  • [PM-1008] – Dashboard file browser file list breaks with sub-directories
    The dashboard file browser broke when there were sub-directories in the submit directory. This is now fixed.
  • [PM-1009] – File browser just says “Error” if submit_dir in workflow db is incorrect
    The file browser now gives a more informative message when the submit directory recorded in the database does not actually exist.
  • [PM-1011] – OSX installer no longer works on El Capitan
    El Capitan has a new “feature” that prevents root from modifying files in /usr, with some exceptions (e.g. /usr/local). Since the earlier installer installed Pegasus in /usr, it no longer worked. The installer was updated to install Pegasus in /usr/local instead.
  • [PM-1012] – pegasus-gridftp fails with “no key” error
    The SSL proxies jar was updated. The error was triggered by the following JGlobus issue: https://github.com/jglobus/JGlobus/issues/146
  • [PM-1017] – pegasus-s3 fails with [SSL: CERTIFICATE_VERIFY_FAILED]
    s3.amazonaws.com has a certificate that was issued by a CA that is not in the cacerts.txt file bundled with boto 2.5.2. The boto bundled with Pegasus was updated to 2.38.0.
  • [PM-1021] – kickstart stat for jobs in the workflow does not work for clustered jobs
    kickstart stat did not work for clustered jobs. This is now fixed.
  • [PM-1022] – dynamic hierarchy tests failed randomly
    The DAX jobs were not considered for cleanup. Because of this, if there was a compute job that generated the DAX the subdax job required, sometimes the cleanup of the dax file happened before the subdax job finished. This is now fixed.
  • [PM-1039] – pegasus-analyzer fails with: TypeError: unsupported operand type(s) for -: ‘int’ and ‘NoneType’
    pegasus-analyzer threw a stacktrace when a workflow did not start because of DAGMan NFS settings. This is now fixed.
  • [PM-1041] – pegasus-db-admin 4.5.4 gives a stack trace when run on pegasus 4.6 workflow submit dir
    A clean error is now displayed if pegasus-db-admin from 4.5.4 is run against a workflow submit directory from a higher Pegasus version.
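
To get a feel for the back-off formula quoted under PM-999, the sketch below evaluates it for the first few attempts. The formula itself is taken from the note above. The function names retry_delay and delay_bounds are made up for illustration, and the notes do not say whether attempt_current starts at 0 or 1, so counting from 0 here is an assumption.

```python
import random


def retry_delay(attempt_current):
    """Back-off delay from the release note, capped at 300.

    Assumption: attempt_current counts retries starting at 0.
    """
    return min(5 ** (attempt_current + 1) + random.randint(1, 20), 300)


def delay_bounds(attempt_current):
    """Smallest and largest delay the formula can yield for an attempt."""
    base = 5 ** (attempt_current + 1)
    # random.randint(1, 20) is inclusive on both ends.
    return min(base + 1, 300), min(base + 20, 300)


for attempt in range(4):
    low, high = delay_bounds(attempt)
    print(f"attempt {attempt}: between {low} and {high}")
```

Under that assumption the delay falls in the ranges 6 to 25, 26 to 45, and 126 to 145 for attempts 0, 1, and 2, and from attempt 3 onward it always hits the cap of 300. The random term spreads out retries from transfers that failed at the same moment, which is presumably why even short transfers pay a noticeable wait after a failure.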