Pegasus 4.2.2 is a minor release that adds minor enhancements and fixes bugs in the Pegasus 4.2.0 release. Improvements in 4.2.2 include
- support for server-side pagination for pegasus-dashboard
- support for lcg-utils command line clients to retrieve and push data to SRM servers
- installation of Pegasus python libraries in standard system locations
- examples for using CREAMCE and glite submissions
Improvements
- Rotation of monitord logs
monitord is launched automatically by pegasus-dagman. When launching monitord, pegasus-dagman sets up monitord to write to a log file that pegasus-dagman initializes. However, monitord also took a backup of the log at startup because it detected that the log file already existed. This left two monitord log files in the submit directory, which was confusing. Now only pegasus-dagman sets up the monitord log.
More details can be found at PM-688 #806
- Monitord Recovery in case of SQLite DB
If monitord gets killed on a currently running workflow, it restarts from the start, because the information in the recovery file it writes out is insufficient to recover gracefully. In the case of a SQLite DB, monitord does not attempt to expunge the existing information from the database. Instead, it takes a backup of the SQLite database in the submit directory.
More details can be found at PM-689 #807
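The backup step can be pictured with a short Python sketch. It is an illustration only, not monitord's code: the function name `backup_sqlite_db` and the numbered `.NNN` suffix scheme are assumptions made for this example.

```python
import os
import shutil


def backup_sqlite_db(db_path):
    """Copy db_path to the first unused db_path.NNN and return the backup path.

    Returns None if there is no database file to back up.
    The numbered-suffix naming is an assumption made for this sketch.
    """
    if not os.path.isfile(db_path):
        return None
    n = 0
    # Find the first suffix that is not taken yet, so older backups survive.
    while os.path.exists("%s.%03d" % (db_path, n)):
        n += 1
    backup_path = "%s.%03d" % (db_path, n)
    shutil.copy2(db_path, backup_path)  # the original database is left in place
    return backup_path
```

Calling the function twice on the same file yields two separate backups, so an earlier backup is never overwritten.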
- Support for lcg-utils for srm transfers
The pegasus-create-dir, pegasus-cleanup and pegasus-transfer clients were updated to include support for lcg-utils for operations against an SRM server.
Note that lcg-utils takes precedence if both lcg-cp and srm-copy are available.
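The precedence rule can be sketched as a lookup on PATH. This is a minimal illustration, not the code used by the Pegasus clients; the function name `pick_srm_client` and its `which` parameter (there to make the lookup testable) are assumptions.

```python
import shutil


def pick_srm_client(candidates=("lcg-cp", "srm-copy"), which=shutil.which):
    """Return the first client from candidates found on PATH, or None.

    Listing lcg-cp first gives it precedence over srm-copy.
    """
    for name in candidates:
        if which(name) is not None:
            return name
    return None
```

On a host where both tools are installed the function returns `lcg-cp`; it falls back to `srm-copy` only when `lcg-cp` is missing.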
- Improvements to the dashboard
- Use Content Delivery Networks as source for jQuery, jQueryUI, and DataTables plugin.
- Most tables in the dashboard now have server-side pagination, to support large workflows.
- Replaced radio buttons with jQuery buttons for a better look and feel.
- Made Statistics/Charts links more prominent.
- Added a drop down to filter list of workflows run in last hour, day, month, or year.
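Server-side pagination means the server returns only the rows for the requested page, together with the total row count, instead of shipping the whole table to the browser. The sketch below shows the idea with an in-memory SQLite database. The `workflow` table, its columns, and the function names are hypothetical and do not reflect the dashboard's actual schema or code.

```python
import sqlite3


def fetch_page(conn, start, length):
    """Return (total_rows, page) for the hypothetical 'workflow' table.

    start is the zero-based offset of the first row; length is the page size.
    """
    total = conn.execute("SELECT COUNT(*) FROM workflow").fetchone()[0]
    page = conn.execute(
        "SELECT id, label FROM workflow ORDER BY id LIMIT ? OFFSET ?",
        (length, start),
    ).fetchall()
    return total, page


def demo_connection(n_rows):
    """Build an in-memory database with n_rows sample rows (sketch data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE workflow (id INTEGER PRIMARY KEY, label TEXT)")
    conn.executemany(
        "INSERT INTO workflow (id, label) VALUES (?, ?)",
        [(i, "wf-%d" % i) for i in range(1, n_rows + 1)],
    )
    return conn
```

With 25 sample rows, `fetch_page(demo_connection(25), 20, 10)` returns the total of 25 and only the last five rows, which is all the client needs to draw the final page.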
- Newer examples added in the examples directory
The release has new examples checked in that highlight:
- how to use the nonsharedfs data configuration against a remote staging site that has an scp server
- how to use glite submission to a local PBS cluster with the sharedfs data configuration
- how to use the nonsharedfs case with SRM as the staging site and CREAMCE submission
- Pegasus python libraries are installed in standard system locations
The RPM and DEB packages now install the Python modules in the standard system locations. Users should no longer have to set PYTHONPATH or add to the include paths in their DAX generators.
- Condor job logs are no longer in the /tmp directory
pegasus.condor.logs.symlink now defaults to false. This ensures compatibility with Condor 7.9.4 onwards; see ticket https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1419. DAGMan now fails by default if it detects that the common log is in /tmp.
10.1.2. BUGS FIXED
- Externally Accessible URLs for staged executables broken for SRM
In certain cases, for SRM file servers in the site catalog, the URL constructed to a staged executable was incorrect. This is now fixed.
More details can be found at PM-686 #804
- pegasus-exitcode cluster-summary w/submitted=0
If the output file has a cluster-summary record, and the number of submitted tasks is 0, then the job succeeded. This fixes an error SCEC had that was introduced when the “tasks” and “submitted” values in cluster-summary were separated for PMC.
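The rule can be written as a small decision function. This is a sketch under stated assumptions, not the pegasus-exitcode source: only the `submitted == 0` case comes from the description above, while the `succeeded` and `failed` counters and the check applied when tasks were submitted are assumptions made to complete the example.

```python
def cluster_summary_ok(submitted, succeeded, failed):
    """Decide success from the counts of a cluster-summary record.

    submitted == 0 is treated as success, as described above.
    The check used for submitted > 0 is an assumption of this sketch.
    """
    if submitted == 0:
        return True
    return failed == 0 and succeeded == submitted
```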
- Pegasus Lite did not support jobs with stdin file tracked in the DAX
In the pegasus lite case, support for jobs with their stdin tracked in the DAX was broken. This is now fixed.
More details can be found at PM-694 #812
- pegasus-cleanup did not support symlink deletion
When symlinks to the input files are created in the scratch directory on the staging site, the pegasus-cleanup job was created with symlink URLs to be deleted. This led to the jobs failing, as pegasus-cleanup did not support deletion of symlinks. This is now fixed.
Additionally, the planner sets up the cleanup jobs to run on the remote site if the URL to be deleted is a file URL or a symlink URL.
More details can be found at PM-696 #814
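Deleting a symlink rather than its target comes down to unlinking the path itself. The Python sketch below illustrates that for local URLs. It is not the pegasus-cleanup implementation: the function name `delete_local_url` is hypothetical, and the `file` and `symlink` scheme names are assumed from the wording above.

```python
import os
from urllib.parse import urlsplit


def delete_local_url(url):
    """Delete the file or symlink named by a file:// or symlink:// URL.

    Returns True if something was removed, False if the path did not exist.
    A symlink is removed itself; the file it points to is left in place.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("file", "symlink"):
        raise ValueError("unsupported URL scheme: %r" % parts.scheme)
    path = parts.path
    # lexists is true for broken symlinks too, unlike exists
    if not os.path.lexists(path):
        return False
    os.unlink(path)
    return True
```

Because `os.unlink` operates on the link and not on what it points to, the staged input file survives when only the symlink in the scratch directory is cleaned up.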
- pegasus-createdir and pegasus-transfer with S3 buckets
pegasus-createdir and pegasus-transfer did not translate the S3 bucket name correctly if it contained a -. This is now fixed. Also, the clients no longer fail if the bucket already exists.
- Bug fixes to the cleanup algorithm
The planner exited with an index out of bounds exception when data reuse was triggered and an output file that needed to be staged was required to be deleted. This is now fixed.
Also, the clustering of the cleanup jobs resulted in not all the files being deleted by the cleanup jobs.
Improvements were also made to how excess edges are removed from the graph. The edge removal was done per file instead of per cleanup job. This fix drastically reduces the runtime for workflows with lots of files that need to be cleaned up.
More details can be found at PM-699 #817
- pegasus-analyzer detects prescript failures in the DB mode
Pegasus analyzer in database mode was not detecting prescript failures for DAX jobs, because the associated job instance was not updated with the exitcode. The way monitord handles failures for sub workflows was changed. In case of prescript failures, the prescript failure exitcode is now recorded in addition to the stdout of the planner log.
More details can be found at PM-704 #822
- monitord tracks non kickstarted files with rotated stdout and stderr files
monitord did not track the rotated stdout and stderr of jobs that were not launched by kickstart. Because of this, the stdout and stderr were not populated. This is now fixed.
More details can be found at PM-685 #803
- Planner fails on determining the DN from a proxy file
The planner uses the Java COG jar to determine the DN from a proxy file. It was discovered that proxies generated from an X.509 end entity credential by a GSI-enabled OpenSSH server result in an NPE in the COG jar.
The planner now catches all exceptions while trying to determine the DN. Being unable to determine the DN is never a FATAL error.
- pegasus-exitcode checks for the existence of .err file
The pegasuslite_failures function did not check for missing stderr files. As a result, if pegasus-exitcode was called in a scenario where there was no .err file, it failed while trying to determine whether None is a valid path.
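The guard amounts to checking for a missing path before opening the file. The sketch below shows one way to write it. It is not the actual pegasuslite_failures code: the function name `stderr_mentions` and the `marker` argument are hypothetical stand-ins for whatever the real function looks for in the .err file.

```python
import os


def stderr_mentions(errfile, marker):
    """Return True if errfile exists and contains marker, else False.

    errfile may be None (no .err file known) or a path that does not exist;
    both cases are treated as "no failure found" instead of raising.
    """
    if errfile is None or not os.path.isfile(errfile):
        return False
    with open(errfile, "r", errors="replace") as handle:
        return any(marker in line for line in handle)
```

Testing `errfile is None` first matters, because `os.path.isfile(None)` raises a `TypeError` in Python 3 rather than returning False.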