Pegasus 4.5.4 Released

We are happy to announce the release of Pegasus 4.5.4. Pegasus 4.5.4 is a minor release that contains minor enhancements and bug fixes. This will most likely be the last release in the 4.5 series; unless you have specific reasons to stay with the 4.5.x series, we recommend upgrading to 4.6.0.

New Features

  1. [PM-1003] – planner should report information about what options were used in the planner #1120

    The planner now reports additional metrics to the metrics server, such as the command line options, whether PMC was used, and the number of deleted tasks.

  2. [PM-1007] – “undelete” or attach/detach for pegasus-submitdir #1124

    pegasus-submitdir has two new commands: attach, which adds the workflow to the dashboard (or corrects the path), and detach, which removes the workflow from the dashboard.

  3. [PM-1030] – pegasus-monitord should parse the new dagman output that reports timestamps from condor user log #1144

    Starting with version 8.5.2, HTCondor DAGMan records the condor job log timestamps at the end of the ULOG event messages. monitord was updated to prefer these timestamps for job events when they are present in the DAGMan logs.

Improvements

  1. [PM-896] – Document events that monitord publishes #1014

    The netlogger messages generated by monitord, which are used to populate the workflow database and master database, are now documented at https://pegasus.isi.edu/wms/docs/4.5.4cvs/stampede_wf_events.php

  2. [PM-995] – changes to Pegasus tutorial #1112

    The Pegasus tutorial was reorganized and simplified to focus more on pegasus-dashboard and debugging exercises.

  3. [PM-1033] – update monitord to handle updated log messages in dagman.out file #1147

    Starting with the HTCondor 8.5.x series, some of the DAGMan log messages in the dagman.out file were updated to say HTCondor instead of Condor. This broke the monitord parsing regexes, so monitord was unable to parse information from the dagman.out file. This is now fixed.

  4. [PM-1034] – Make it more difficult for users to break pegasus-submitdir archive #1148

    A locking mechanism was added internally to make pegasus-submitdir more robust when a user accidentally kills an archive operation.

  5. [PM-1040] – pegasus-analyzer should be able to handle cases where the workflow failed to start #1154

    pegasus-analyzer now detects if a workflow failed to start because of the DAGMan fail on NFS error setting, and also displays any errors in *.dag.lib.err files.
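
The Condor/HTCondor wording change described in PM-1033 can be illustrated with a small sketch. This is not monitord's actual code: the pattern, the function name parse_event, and the sample log lines are assumptions made for illustration only. The idea is simply to make the "HT" prefix optional so that both the old and the new wording match.

```python
import re

# Hypothetical pattern: "(?:HT)?" makes the "HT" prefix optional, so the same
# expression matches messages that say "Condor" and messages that say "HTCondor".
EVENT_RE = re.compile(
    r"Event: (?P<event>ULOG_\w+) for (?:HT)?Condor Node (?P<node>\S+)"
)


def parse_event(line):
    """Return (event, node) for a matching line, or None if it does not match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    return match.group("event"), match.group("node")


if __name__ == "__main__":
    # Illustrative lines only; they are not copied from a real dagman.out file.
    old_style = "Event: ULOG_SUBMIT for Condor Node jobA (1.0.0)"
    new_style = "Event: ULOG_SUBMIT for HTCondor Node jobA (1.0.0)"
    print(parse_event(old_style))
    print(parse_event(new_style))
```

A pattern that hard-codes "Condor Node" preceded by a fixed prefix would silently stop matching after the rename, which is the kind of failure the fix addresses.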

Bugs Fixed

  1. [PM-921] – Specified env is not provided to monitord #1039

    The environment for pegasus-monitord is now set in the dagman.sub file. The following order is used: start with the system environment, override it with env profiles in the properties, and then with env profiles from the local site entry in the site catalog.

  2. [PM-999] – pegasus-transfer taking too long to finish in case of retries #1116

    pegasus-transfer has moved to an exponential back-off: min(5 ** (attempt_current + 1) + random.randint(1, 20), 300). This means that failures for short-running transfers will still take time, but it is necessary to ensure scalability of real-world workflows.

  3. [PM-1008] – Dashboard file browser file list breaks with sub-directories #1125

    The dashboard file browser broke when there were sub-directories in the submit directory. This is now fixed.

  4. [PM-1009] – File browser just says “Error” if submit_dir in workflow db is incorrect #1126

    The file browser now gives a more informative message when the submit directory recorded in the database does not actually exist.

  5. [PM-1011] – OSX installer no longer works on El Capitan #1128

    El Capitan has a new “feature” that disables root from modifying files in /usr with some exceptions (e.g. /usr/local). Since the earlier installer installed Pegasus in /usr, it no longer worked. Installer was updated to install Pegasus in /usr/local instead.

  6. [PM-1012] – pegasus-gridftp fails with “no key” error #1129

    The SSL proxies jar was updated. The error was triggered by the following JGlobus issue: https://github.com/jglobus/JGlobus/issues/146

  7. [PM-1017] – pegasus-s3 fails with [SSL: CERTIFICATE_VERIFY_FAILED] #1132

    s3.amazonaws.com has a cert that was issued by a CA that is not in the cacerts.txt file bundled with boto 2.5.2. The boto bundled with Pegasus was updated to 2.38.0.

  8. [PM-1021] – kickstart stat for jobs in the workflow does not work for clustered jobs #1136

    kickstart stat did not work for clustered jobs. This is now fixed.

  9. [PM-1022] – dynamic hierarchy tests failed randomly #1137

    The DAX jobs were not considered for cleanup. Because of this, if there was a compute job that generated the DAX the subdax job required, sometimes the cleanup of the dax file happened before the subdax job finished. This is now fixed.

  10. [PM-1039] – pegasus-analyzer fails with: TypeError: unsupported operand type(s) for -: ‘int’ and ‘NoneType’ #1153

    pegasus-analyzer threw a stacktrace when a workflow did not start because of DAGMan NFS settings. This is now fixed.

  11. [PM-1041] – pegasus-db-admin 4.5.4 gives a stack trace when run on pegasus 4.6 workflow submit dir #1155

    A clean error is now displayed if pegasus-db-admin from 4.5.4 is run against a workflow submit directory from a later Pegasus version.
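
To make the pegasus-transfer back-off from PM-999 concrete, here is a minimal sketch that evaluates the formula quoted in that entry. Only the formula itself comes from the release notes; the function names retry_delay and delay_bounds, and the assumption that attempt_current starts at 0, are illustrative and may not match the pegasus-transfer source.

```python
import random

MAX_DELAY = 300  # upper bound taken from the formula in the release notes


def retry_delay(attempt_current):
    """Delay before the next retry, using the formula from PM-999."""
    return min(5 ** (attempt_current + 1) + random.randint(1, 20), MAX_DELAY)


def delay_bounds(attempt_current):
    """Smallest and largest delay the formula can produce for an attempt."""
    base = 5 ** (attempt_current + 1)
    # random.randint(1, 20) is inclusive on both ends, so the jitter is 1..20.
    return min(base + 1, MAX_DELAY), min(base + 20, MAX_DELAY)


if __name__ == "__main__":
    # Assumes attempt_current counts from 0; that numbering is a guess.
    for attempt in range(4):
        low, high = delay_bounds(attempt)
        print(f"attempt {attempt}: delay between {low} and {high}")
```

Under the zero-based assumption, the base grows as 5, 25, 125, 625, so the cap of 300 is reached by the fourth attempt. The random term spreads out retries from transfers that failed at the same moment, which fits the scalability motivation given in the entry.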