We are happy to announce the release of Pegasus 4.3.2. Pegasus 4.3.2 is a minor release, containing minor enhancements and bug fixes over the Pegasus 4.3.1 release.
- Better error recording for PegasusLite failures in the monitoring database
If a PegasusLite job failed because of an error encountered while retrieving the input files from the staging site, then no kickstart record for the main user job was generated (as the job was never launched). As a result, no record indicating the failure of the job was populated in the database. This was fixed to ensure that monitord now populates an invocation record containing the error message from the .err file of the PegasusLite job.
- PMC Changes
By default, PMC now clears the CPU mask using sched_setaffinity. If libnuma is available, it also resets the NUMA memory policy using set_mempolicy. Users who want to keep the inherited affinity/policy can use the --keep-affinity argument.
In addition, the number of open files tracked in PMC's internal file descriptor cache was decreased from 4096 to 256. If an error occurs because the fd limit on a system is exceeded, PMC now logs the number of file descriptors it has open, helping the user identify how many FDs PMC is holding.
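The affinity-clearing behavior can be illustrated in Python, whose os module wraps the same sched_setaffinity(2) system call that PMC invokes from C. This is a sketch of the idea, not PMC's actual code:

```python
import os

def clear_cpu_affinity():
    """Reset this process's CPU mask to all online CPUs.

    Illustrative sketch of PMC's new default behavior (clearing any
    inherited CPU affinity via sched_setaffinity); PMC itself is
    written in C, not Python.
    """
    all_cpus = set(range(os.cpu_count()))
    os.sched_setaffinity(0, all_cpus)  # pid 0 means the calling process
    return os.sched_getaffinity(0)     # the effective mask after the reset

print(clear_cpu_affinity())
```

After the reset, the scheduler is free to place the process (and its tasks) on any CPU it is permitted to use, instead of the possibly narrow mask inherited from the parent.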
- pegasus-transfer changes
pegasus-transfer now checks in local cp mode that the source and destination are not the same file.
pegasus-transfer now sets the -fast option by default for GUC (globus-url-copy) for third-party gsiftp transfers.
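The same-file check for local copies can be sketched as follows; this is illustrative Python, and the function name is hypothetical, not pegasus-transfer's actual API:

```python
import os
import shutil

def safe_local_copy(src, dst):
    """Copy src to dst, refusing when they are the same file.

    Sketch of the check pegasus-transfer performs in local cp mode:
    copying a file onto itself (e.g. via a symlink or hard link to the
    same inode) would truncate it to zero bytes, so detect and refuse.
    """
    if os.path.exists(dst) and os.path.samefile(src, dst):
        raise ValueError("refusing to copy %s onto itself" % src)
    shutil.copy(src, dst)
```

os.path.samefile compares device and inode numbers, so it also catches the case where src and dst are different path strings that resolve to the same underlying file.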
- pegasus-status changes
Minor fix for the case where the parent DAG disappears before the job (this can happen for held jobs).
- changes to java memory settings
The pegasus-plan wrapper script now takes ulimit -v settings into account when determining the Java heap memory for the planner.
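The heap-sizing logic can be sketched as follows, assuming the wrapper caps the JVM heap at some fraction of the RLIMIT_AS (ulimit -v) soft limit. The default size and the fraction used here are illustrative, not the wrapper's actual values (the real wrapper is a shell script):

```python
import resource

def planner_heap_mb(default_mb=2048):
    """Pick a java -Xmx value (in MB) that respects ulimit -v.

    If no virtual-memory limit is set, fall back to the default heap
    size; otherwise cap the heap below the limit, leaving headroom for
    the JVM's own non-heap memory. Values here are illustrative.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    if soft == resource.RLIM_INFINITY:
        return default_mb
    # ulimit -v is reported in KB at the shell; RLIMIT_AS is in bytes
    limit_mb = soft // (1024 * 1024)
    return min(default_mb, int(limit_mb * 0.75))
```

Without such a check, a fixed -Xmx larger than the ulimit -v setting would make the JVM fail to start (or die with allocation errors) under restricted environments.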
- Symlinking in PegasusLite against SRM server
In the case where the data on the staging server is directly accessible to the worker nodes, it is possible to enable symlinking in Pegasus, which results in PegasusLite symlinking against the data on the staging site. When this was enabled, the source URL for the symlink transfer referred to an SRM URL, resulting in PegasusLite doing a duplicate transfer. The planner was changed to resolve the SRM URL to a file URL that is visible from the worker node. Also, the planner never symlinks executable files in PegasusLite, as that can create problems with setting the x bit on the staged executables. For executable staging to work, the executable needs to be copied to the worker node filesystem.
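The URL resolution described above might look roughly like this hypothetical sketch; srm_prefix and mount_point stand in for site catalog configuration and are not actual Pegasus option names:

```python
from urllib.parse import urlparse

def srm_to_file_url(srm_url, srm_prefix, mount_point):
    """Rewrite an SRM URL into a file:// URL visible from worker nodes.

    Hypothetical illustration of the planner change: map the storage
    path under the SRM service's prefix onto the filesystem mount point
    the worker nodes see. Parameter names are illustrative.
    """
    parsed = urlparse(srm_url)
    path = parsed.path
    # Some SRM URLs carry the storage path in a ?SFN= query component
    if "SFN=" in (parsed.query or ""):
        path = parsed.query.split("SFN=", 1)[1]
    if not path.startswith(srm_prefix):
        raise ValueError("URL outside the configured SRM prefix")
    return "file://" + mount_point + path[len(srm_prefix):]
```

With the file URL in hand, PegasusLite can create a symlink directly instead of invoking an SRM client for a second, redundant transfer.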
- Input file association for DAX jobs
The input file corresponding to the DAX for a DAX job was not associated correctly when the planner figured out the transfers required for the DAX job. This happened if the DAX job referred to the DAX file only as an input file, and that file was generated by a parent DAX-generation job in the workflow.
- File dependency between a compute job and a sub workflow job
The planner failed while planning a DAX job whose input file was generated by a parent job of the corresponding DAX job. This is now fixed: the cache file for the parent workflow is passed to the sub workflow planner invocations.
- Error when data reuse and cleanup is enabled
The planner failed in the case where cleanup was enabled in conjunction with data reuse, and the jobs removed by the data reuse algorithm were the ones whose output files the user required on the output site. In that case, the planner adds stage-out jobs to stage the data from its location in the replica catalog to the output site. The addition of this stage-out job resulted in an exception in the cleanup module. This is now fixed.
- pegasus-analyzer not reporting the planner prescript log for failed sub workflows
In the case where a workflow fails because the planner invoked in the prescript for a sub workflow failed, pegasus-analyzer did not point the user to the planner log file for the sub workflow. This is now fixed.