7. Pegasus 4.5.x Series
7.1. Pegasus 4.5.4
Release Date: January 27, 2016
We are happy to announce the release of Pegasus 4.5.4. Pegasus 4.5.4 is a minor release, which contains minor enhancements and fixes bugs. This will most likely be the last release in the 4.5 series, and unless you have specific reasons to stay with the 4.5.x series, we recommend upgrading to 4.6.0.
7.1.1. New Features
[PM-1003] - planner should report information about what options were used in the planner #1120
Planner now reports additional metrics such as command line options, whether PMC was used and number of deleted tasks to the metrics server.
[PM-1007] - “undelete” or attach/detach for pegasus-submitdir #1124
pegasus-submitdir has two new commands: attach, which adds the workflow to the dashboard (or corrects the path), and detach, which removes the workflow from the dashboard.
[PM-1030] - pegasus-monitord should parse the new dagman output that reports timestamps from condor user log #1144
Starting with HTCondor 8.5.2, DAGMan records the condor job log timestamps in the ULOG event messages at the end of the log message. monitord was updated to prefer these timestamps for job events when they are present in the DAGMan logs.
7.1.2. Improvements
[PM-896] - Document events that monitord publishes #1014
The netlogger messages generated by monitord, which are used to populate the workflow database and the master database, are now documented at https://pegasus.isi.edu/wms/docs/4.5.4cvs/stampede_wf_events.php
[PM-995] - changes to Pegasus tutorial #1112
The Pegasus tutorial was reorganized and simplified to focus more on pegasus-dashboard and debugging exercises.
[PM-1033] - update monitord to handle updated log messages in dagman.out file #1147
Starting with the 8.5.x series, some of the DAGMan log messages in the dagman.out file were updated to say HTCondor instead of Condor. This broke monitord's parsing regexes, so it was no longer able to parse information from the dagman.out file. This is now fixed.
[PM-1034] - Make it more difficult for users to break pegasus-submitdir archive #1148
Added an internal locking mechanism to make pegasus-submitdir more robust when a user accidentally kills an archive operation.
[PM-1040] - pegasus-analyzer should be able to handle cases where the workflow failed to start #1154
pegasus-analyzer now detects if a workflow failed to start because of DAGMan fail on NFS error setting, and also displays any errors in *.dag.lib.err files.
7.1.3. Bugs Fixed
[PM-921] - Specified env is not provided to monitord #1039
The environment for pegasus-monitord is now set in the dagman.sub file. The following order is used: pick the system environment, override it with env profiles from the properties, and then with env profiles from the local site entry in the site catalog.
[PM-999] - pegasus-transfer taking too long to finish in case of retries #1116
pegasus-transfer has moved to an exponential back-off: min(5 ** (attempt_current + 1) + random.randint(1, 20), 300). That means that failures for short running transfers will still take time, but this is necessary to ensure scalability of real world workflows.
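For illustration, the back-off formula above can be sketched in Python; the function name and the loop are illustrative, not the actual pegasus-transfer internals:

import random

def backoff_delay(attempt_current):
    # exponential back-off with jitter, capped at 300 seconds
    return min(5 ** (attempt_current + 1) + random.randint(1, 20), 300)

for attempt in range(4):
    # roughly 6-25s, 26-45s, 126-145s, then capped at 300s
    print(attempt, backoff_delay(attempt))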
[PM-1008] - Dashboard file browser file list breaks with sub-directories #1125
The dashboard file browser broke when there were sub-directories in the submit directory. This is now fixed.
[PM-1009] - File browser just says “Error” if submit_dir in workflow db is incorrect #1126
The file browser gives a more informative message when the submit directory recorded in the database does not actually exist.
[PM-1011] - OSX installer no longer works on El Capitan #1128
El Capitan has a new “feature” that disables root from modifying files in /usr with some exceptions (e.g. /usr/local). Since the earlier installer installed Pegasus in /usr, it no longer worked. Installer was updated to install Pegasus in /usr/local instead.
[PM-1012] - pegasus-gridftp fails with “no key” error #1129
The SSL proxies jar was updated. The error was triggered by the following JGlobus issue: https://github.com/jglobus/JGlobus/issues/146
[PM-1017] - pegasus-s3 fails with [SSL: CERTIFICATE_VERIFY_FAILED] #1132
s3.amazonaws.com has a cert that was issued by a CA that is not in the cacerts.txt file bundled with boto 2.5.2. The boto bundled with Pegasus was updated to 2.38.0.
[PM-1021] - kickstart stat for jobs in the workflow does not work for clustered jobs #1136
kickstart stat did not work for clustered jobs. This is now fixed.
[PM-1022] - dynamic hierarchy tests failed randomly #1137
The DAX jobs were not considered for cleanup. Because of this, if there was a compute job that generated the DAX the subdax job required, sometimes the cleanup of the dax file happened before the subdax job finished. This is now fixed.
[PM-1039] - pegasus-analyzer fails with: TypeError: unsupported operand type(s) for -: ‘int’ and ‘NoneType’ #1153
pegasus-analyzer threw a stacktrace when a workflow did not start because of DAGMan NFS settings. This is now fixed.
[PM-1041] - pegasus-db-admin 4.5.4 gives a stack trace when run on pegasus 4.6 workflow submit dir #1155
A clean error is displayed, if pegasus-db-admin from 4.5.4 is run against a workflow submit directory from a higher Pegasus version.
7.2. Pegasus 4.5.3
Release Date: November 4, 2015
We are happy to announce the release of Pegasus 4.5.3. Pegasus 4.5.3 is a minor release, which contains minor enhancements and fixes bugs in the Pegasus 4.5.2 release.
The following issues were addressed and more information can be found in the Pegasus Jira (https://jira.isi.edu/)
7.2.1. Bug Fixes
[PM-980] - pegasus-plots fails with “-p all” #1097
[PM-982] - MRC replica catalog backend does not work #1099
[PM-987] - noop jobs created by Pegasus don’t use DAGMan NOOP keyword #1104
[PM-996] - Pegasus Statistics transformation stats columns getting larger and larger with more sub workflows #1113
[PM-997] - pyOpenSSL v0.13 does not work with new version of openssl (1.0.2d) and El Capitan #1114
7.2.2. Improvements
[PM-976] - ignore register and transfer flags for input files #1093
[PM-981] - register only based names for output files with deep LFN’s #1098
[PM-983] - data reuse algorithm should consider file locations while cascading deletion upwards #1100
[PM-984] - condor_rm on a pegasus-kickstart wrapped job does not return stdout back #1101
[PM-988] - pegasus-transfer should handle file://localhost/ URL’s #1105
[PM-989] - pegasus-analyzer debug job option should have a hard check for output files #1106
[PM-993] - Show dax/dag planning jobs in failed/successful/running/failing tabs in dashboard #1110
[PM-1000] - turn off concurrency limits by default #1117
7.2.3. New Features
7.3. Pegasus 4.5.2
Release Date: October 15, 2015
We are happy to announce the release of Pegasus 4.5.2. Pegasus 4.5.2 is a minor release, which contains minor enhancements and fixes bugs in the Pegasus 4.5.1 release. The release addresses a critical fix for systems running HTCondor 8.2.9, whereby all dagman jobs for Pegasus workflows fail on startup.
7.3.1. Enhancements
File locations in the DAX treated as a Replica Catalog
By default, file locations listed in the DAX override entries listed in the Replica Catalog. Users can now set the boolean property pegasus.catalog.replica.dax.asrc to have the DAX locations considered along with the entries listed in the Replica Catalog during replica selection.
Associated JIRA item PM-973 #1090
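For example, this can be enabled in the properties file:

pegasus.catalog.replica.dax.asrc = true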
Pegasus auxiliary tools now have support for iRods 4.x
7.3.2. Bugs Fixed
pegasus-dagman setpgid fails under HTCondor 8.2.9
Starting with version 8.2.9, HTCondor already sets up the process group to match the pid, and hence the setpgid call fails in the pegasus-dagman wrapper around condor-dagman. Because of this, all Pegasus workflows fail to start on submit nodes with HTCondor 8.2.9.
If you cannot upgrade to Pegasus version 4.5.2 and are running HTCondor 8.2.9, you can turn off HTCondor's process group setup by setting the following in your condor configuration
USE_PROCESS_GROUPS = false
The pegasus-dagman wrapper now does not fail fatally if setpgid fails. More details at
PM-972 #1089
nonsharedfs execution does not work for Glite if auxiliary jobs are planned to run remotely
For nonsharedfs execution to a local PBS|SGE cluster using the GLite interface, the Pegasus generated auxiliary jobs had incorrect paths to pegasus-kickstart in the submit files if a job was mapped to run on the remote (non local) site.
This is now fixed. PM-971 #1088
7.4. Pegasus 4.5.1
Release Date: August 10, 2015
We are happy to announce the release of Pegasus 4.5.1. Pegasus 4.5.1 is a minor release, which contains minor enhancements and fixes bugs in the Pegasus 4.5.0 release.
7.4.1. Enhancements
pegasus-statistics reports workflow badput
pegasus-statistics now reports the workflow badput time, which is the sum of the runtimes of all failed kickstart jobs. More details at https://pegasus.isi.edu/wms/docs/4.5.1/plotting_statistics.php
Associated JIRA item PM-941 #1058
fast start option for pegasus-monitord
By default, when monitord starts tracking a live dagman.out file, it sleeps intermittently, waiting for new lines to be logged in the dagman.out file.
This behavior, however, causes monitord to lag considerably
when starting for large workflows
when monitord gets restarted due to some failure by pegasus-dagman, or when a rescue dag is submitted.
Users can now set the property pegasus.monitord.fast_start to enable it. In a future release, this will be the default behavior.
Associated JIRA item PM-947 #1064
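For example, in the properties file:

pegasus.monitord.fast_start = true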
Support for throttling jobs across workflows using HTCondor concurrency limits
Users can now throttle jobs across workflows using HTCondor concurrency limits. However, this only applies to vanilla universe jobs.
Documentation at https://pegasus.isi.edu/wms/docs/4.5.1/job_throttling.php#job_throttling_across_workflows
Associated JIRA item PM-933 #1050
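As a hedged sketch of how a concurrency limit could be attached to jobs: the condor profile namespace passes submit file keys through to the HTCondor submit file, and the pool enforces the limit via its configuration. The limit name and count below are only illustrative; consult the linked documentation for the exact Pegasus-level mechanism.

<profile namespace="condor" key="concurrency_limits">bigmem_jobs</profile>

and in the HTCondor configuration of the central manager:

BIGMEM_JOBS_LIMIT = 20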
Support for submissions to local SGE cluster using the GLite interfaces
Preliminary support for SGE clusters has been added in Pegasus. To use this you need to copy sge_local_submit_attributes.sh from the Pegasus share directory and place it in your condor installation.
The list of supported keys can be found here https://pegasus.isi.edu/wms/docs/4.5.1/glite.php
Associated JIRA item PM-955 #1072
PEGASUS_SCRATCH_DIR set in the job environment for sharedfs deployment
Pegasus now sets an environment variable for the job that indicates the Pegasus scratch directory the job is executed in, in the case of sharedfs deployments. This is the directory that is created by the create dir job on the execution site for the workflow.
Associated JIRA item PM-961 #1078
New properties to control read timeout while setting up connections to the database
Users can now set pegasus.catalog.*.timeout to set the timeout value in seconds. This should be set only if you encounter database locked errors for your installation.
Associated JIRA item PM-943 #1060
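For example (the catalog name substituted for the wildcard and the value are illustrative; use whichever catalog is producing the locked errors):

pegasus.catalog.replica.timeout = 30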
Ability to prepend to the system PATH before launching an application executable
Users can now associate an env profile named KICKSTART_PREPEND_PATH with their jobs to specify the PATH where application specific modules are installed. kickstart will take this value and prepend it to the system PATH before launching the executable.
Associated JIRA item PM-957 #1074
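For example, as an env profile associated with a job or site (the path is illustrative):

<profile namespace="env" key="KICKSTART_PREPEND_PATH">/opt/myapp/bin</profile>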
environment variables in condor submit files are specified using the newer condor syntax
For GLite jobs the environment is specified using the key +remote_environment. For all other jobs, the environment is specified using the environment key, but the value is in the newer format (i.e. key=value pairs separated by whitespace).
Associated JIRA item PM-934 #1051
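For illustration, the newer syntax places all variables in one quoted, whitespace-separated string; the variable names and values below are made up:

environment = "FOO=bar BAZ=qux"

and for GLite style jobs:

+remote_environment = "FOO=bar BAZ=qux"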
pass options to pegasus-monitord via properties
Users can now specify pegasus.monitord.arguments to pass extra options with which pegasus-monitord is launched for the workflow at runtime.
Associated JIRA item PM-948 #1065
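For example (assuming -v is a valid pegasus-monitord option to raise verbosity; any extra option string can go here):

pegasus.monitord.arguments = -v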
pegasus-transfer supports OSG stashcp
pegasus-transfer has support for the latest version of stashcp
Associated JIRA item PM-948 #1065
pegasus-dashboard improvements
pegasus-dashboard now loads the directory listing via AJAX calls. This makes the loading of the workflow details page much faster for large workflows.
Shows the working directory for a job instance and invocation on the job details and invocation details pages.
Displays an appropriate error message if a pegasus-db-admin update of a database fails.
Added an HTML error page for DB migration errors.
Configured logging so Flask log messages show up in Apache logs.
Associated JIRA item PM-940 #1057
PEGASUS_SITE environment variable is set in job’s environment
PM-907 #1025
7.4.2. Bugs Fixed
InPlace cleanup failed if an intermediate file used as input had its transfer flag set to false
If an intermediate file (an output file generated by a parent job) was used as an input file to a child job with the transfer flag set to false, then the associated cleanup job did not have a dependency on the child job. As a result, the cleanup job could run before the child job (that required the file as input) ran.
This is now fixed. PM-969 #1086
Incorrect (malformed) rescue dag submitted in case the planner dies because of memory related issues
For hierarchical workflows, if a sub workflow fails then a rescue dag for the sub workflow gets submitted on the job retry. The .dag file for the sub workflow is generated by the planner. If the planner fails during code generation an incomplete .dag file can be submitted.
This is now fixed. The planner now writes the dag to a tmp file and renames it to the .dag extension only when code generation is complete.
PM-966 #1083
Mismatched memory units in kickstart records
kickstart now reports all memory values in KB. Earlier the procs element in the machine entry was reporting the value in bytes, while the maxrss etc. values in the usage elements were in KB.
This is now fixed. PM-959 #1076
pegasus-analyzer did not work for sub workflows
There was a bug in the 4.5.0 release where pegasus-analyzer did not pick up the stampede database for the sub workflows correctly. This is now fixed.
PM-956 #1073
Rescue DAGS not submitted correctly for dag jobs
There was a bug in the 4.5.0 release in the way the .dag.condor.sub file was generated. As a result, the force option was propagated for the dag jobs in the DAX (dag jobs are sub workflows that are not planned by Pegasus).
PM-949 #1066
nonsharedfs configuration did not work with Glite style submissions
In the case of nonsharedfs, transfer_executable is set to true to transfer the PegasusLite script. However, in the Glite case, that was explicitly disabled, which was preventing the workflows from running successfully.
PM-950 #1067
pegasus-analyzer catches error for wrong directory instead of listing the traceback
PM-946 #1063
pegasus-gridftp fails with: Invalid keyword “POSTALCODE”
pegasus-gridftp was failing against the XSEDE site stampede because of a change in certificates at TACC. This was fixed by updating to the latest jglobus jars.
PM-945 #1062
pegasus-statistics deletes directories even if -o option is specified
By default pegasus-statistics deletes the statistics directory in which the statistics files are generated. However, this had the side effect of deleting user specified directories set by the -o option. That is no longer the case.
PM-932 #1049
pegasus-exitcode ignores errors when it gets “-r 0”
pegasus-exitcode now only ignores the invocation record exitcodes, but still does all the other checks, when the -r option is specified.
PM-927 #1044
pegasus-statistics displays a workflow not found error in case of throwing SqlAlchemy error
This also happens if pegasus-db-admin creates an empty workflow database for a new workflow and nothing is populated (because events population is turned off).
Associated JIRA item PM-942 #1059
InPlace cleanup did not work correctly with multi-site runs
InPlace cleanup did not work correctly with inter site transfer jobs. This is now fixed.
PM-936 #1053
7.5. Pegasus 4.5.0
Release Date: May 5, 2015
We are happy to announce the release of Pegasus 4.5.0. Pegasus 4.5.0 is a major release of Pegasus and includes all the bug fixes and improvements in the minor releases 4.4.1 and 4.4.2.
New features and Improvements in 4.5.0 are
ensemble manager for managing collections of workflows
support for job checkpoint files
support for Google storage
improvements to pegasus-dashboard
data management improvements
new tools pegasus-db-admin, pegasus-submitdir, pegasus-halt and pegasus-graphviz
A migration guide is available at http://pegasus.isi.edu/wms/docs/4.5.0/useful_tips.php#migrating_from_leq44
7.5.1. NEW FEATURES
Ensemble manager for managing collections of workflows
The ensemble manager is a service that manages collections of workflows called ensembles. The ensemble manager is useful when you have a set of workflows you need to run over a long period of time. It can throttle the number of concurrent planning and running workflows, and plan and run workflows in priority order. A typical use-case is a user with 100 workflows to run, who needs no more than one to be planned at a time, and needs no more than two to be running concurrently.
The ensemble manager also allows workflows to be submitted and monitored programmatically through its RESTful interface. Details about ensemble manager can be found at https://pegasus.isi.edu/wms/docs/4.5.0/service.php
Support for Google Storage
Pegasus now supports running workflows in the Google cloud. When running workflows in the Google cloud, users can specify Google storage to act as the staging site. More details on how to configure Pegasus to use Google storage can be found at pegasus.isi.edu/wms/docs/4.5.0/cloud.php#google_cloud. All the Pegasus auxiliary clients (pegasus-transfer, pegasus-create-dir and pegasus-cleanup) were updated to handle Google storage URLs (starting with gs://). The tools call out to the Google command line tool gsutil.
Support for job checkpoint files
Pegasus now supports checkpoint files created by jobs. This allows users to run long running jobs (where the runtime of a job exceeds the maxwalltime supported on a compute site) to completion, provided the jobs generate a checkpoint file periodically. To use this, checkpoint files with link set to checkpoint need to be specified for the jobs in the DAX. Additionally, the jobs need to specify the pegasus profile checkpoint.time, which indicates the number of minutes after which pegasus-kickstart sends a TERM signal to the job, signalling it to start the generation of the checkpoint file.
Details on this can be found in the userguide https://pegasus.isi.edu/wms/docs/4.5.0/transfer.php#staging_job_checkpoint_files
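A hedged DAX sketch of the two pieces just described; the job id, executable name, checkpoint file name and the 30 minute value are all illustrative:

<job id="ID0000001" name="myapp">
  <profile namespace="pegasus" key="checkpoint.time">30</profile>
  <uses name="myapp.chkpoint" link="checkpoint"/>
</job>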
Pegasus Dashboard Improvements
Pegasus dashboard can now be deployed in multiuser mode. It is now started by the pegasus-service command. Instructions for starting the pegasus service can be found at https://pegasus.isi.edu/wms/docs/4.5.0/service.php#idp2043968
Changed the look and feel of the dashboard. Users can now track all job instances (retries) of a job through the dashboard. Earlier only the latest job retry was shown.
There is a new tab called failing jobs on the workflows page. The tab lists jobs that have failed at least once and are currently being retried.
The submit host is displayed on the workflow’s main page.
The job details page now shows information about the Host where the job ran, and all the states that the job has gone through.
The dashboard also has a file browser which allows users to view files in the workflow submit directory directly from the dashboard.
Data configuration is now supported per site
Starting with the 4.5.0 release, users can now associate a pegasus profile key data.configuration per site in the site catalog to specify the data configuration mode (sharedfs, nonsharedfs or condorio) to use for jobs executed on that site. Earlier this was a global configuration that applied to the whole workflow and had to be specified in the properties file.
More details at PM-810 #928
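A hedged site catalog sketch; the site handle, arch and os values are illustrative:

<site handle="condorpool" arch="x86_64" os="LINUX">
  <profile namespace="pegasus" key="data.configuration">condorio</profile>
</site>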
Support for sqlite JDBCRC
Users can now specify a sqlite backend for their JDBCRC replica catalog. To create the database for the sqlite based replica catalog, use the command pegasus-db-admin
pegasus-db-admin create jdbc:sqlite:/shared/jdbcrc.db
To setup Pegasus to use sqlite JDBCRC set the following properties
pegasus.catalog.replica JDBCRC
pegasus.catalog.replica.db.driver sqlite
pegasus.catalog.replica.db.url jdbc:sqlite:/shared/jdbcrc.db
Users can use the tool pegasus-rc-client to insert, query and delete entries from the catalog.
New database management tool called pegasus-db-admin
Depending on configuration, Pegasus can refer to three different types of databases during the various stages of workflow planning and execution.
master - Usually a sqlite database located at $HOME/.pegasus/workflow.db. This is always populated by pegasus-monitord and is used by pegasus-dashboard to track users' top level workflows.
workflow - Usually a sqlite database created by pegasus-monitord in the workflow submit directory. This contains detailed information about the workflow execution.
jdbcrc - if a user has configured a JDBCRC replica catalog.
The tool is automatically invoked by the planner to check for compatibility and updates the master database if required. The jdbcrc is checked if a user has it configured at planning time or when using the pegasus-rc-client command line tool.
This tool should be used by users when setting up new database catalogs, or to check for compatibility. For more details refer to the migration guide at https://pegasus.isi.edu/wms/docs/4.5.0cvs/useful_tips.php#migrating_from_leq44
pegasus-kickstart allows for system calls interposition
pegasus-kickstart has new options -z and -Z that are enabled for Linux platforms. When enabled, pegasus-kickstart captures information about the files opened and I/O for user applications and includes it in the proc section of its output. The -z flag causes kickstart to use ptrace() to intercept system calls and report a list of files accessed and I/O performed. The -Z flag causes kickstart to use LD_PRELOAD to intercept library calls and report a list of files accessed and I/O performed.
pegasus-kickstart now captures condor job id and LRMS job ids
pegasus-kickstart now captures both the condor job id and the local LRMS job id (the LRMS being the system through which the job is executed) in the invocation record for the job.
pegasus-transfer has support for SSHFTP
pegasus-transfer now has support for GridFTP over SSH. More details at https://pegasus.isi.edu/wms/docs/4.5.0/transfer.php#idp17066608
pegasus-s3 has support for bulk deletes
pegasus-s3 now supports batched deletion of keys from an S3 bucket. This improves the performance of deleting keys from a large bucket.
PM-791 #909
DAGMan metrics reporting enabled
Pegasus workflows now have DAGMan metric reporting capability turned on. Details on Pegasus usage tracking policy can be found at https://pegasus.isi.edu/wms/docs/4.5.0/usage_statistics.php
As part of this effort the planner now invokes condor_submit_dag at planning time to generate the DAGMan submit file, that is then modified to enable metrics reporting.
More details at PM-797 #915
Planner reports file distribution counts in metrics report
The planner now reports file distribution counts (number of input, intermediate and output files) in its metrics report.
Notion of scope for data reuse
Users can now enable partial data reuse, where only the output files of certain jobs are checked for existence in the replica catalog to trigger data reuse. Three scopes are supported:
full - full data reuse as is implemented in 4.4
none - no data reuse, i.e. the same as the --force option to the planner
partial - in this case, only certain jobs (those that have the pegasus profile key enable_for_data_reuse set to true) are checked for presence of output files in the replica catalog (see the profile example below)
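For the partial scope, a hedged DAX profile sketch marking a job whose output files may be reused:

<profile namespace="pegasus" key="enable_for_data_reuse">true</profile>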
New tool called pegasus-submitdir
There is a new tool called pegasus-submitdir that allows users to archive, extract, move and delete a workflow submit directory. The tool ensures that the master database (usually in $HOME/.pegasus/workflow.db) is updated accordingly.
New tool called pegasus-halt
There is a new tool called pegasus-halt that allows users to gracefully halt running workflows. The tool places DAGMan .halt files (http://research.cs.wisc.edu/htcondor/manual/v8.2/2_10DAGMan_Applications…) for all dags in a workflow.
More details at PM-702 #820
New tool called pegasus-graphviz
Pegasus now has a tool called pegasus-graphviz that allows you to visualize DAX and DAG files. It creates a dot file as output.
New canonical executable pegasus-mpi-keg
New executable called pegasus-mpi-keg that can be compiled from source. Useful for creating synthetic workflows containing MPI jobs. It is similar to pegasus-keg and accepts the same command line arguments. The only difference is that it is MPI code.
Change in default values
By default, pegasus-transfer now launches a maximum of 8 threads to manage the transfers of multiple files. The default number of job retries in case of failure is now 1 instead of 3. The time for removing a job after it has entered the HELD state has been reduced from 1 hour to 30 minutes.
Support for DAGMan ABORT-DAG-ON feature
Pegasus now supports a dagman profile key named ABORT-DAG-ON that can be associated with a job. Such a job can then cause the whole workflow to be aborted if it fails or exits with a specific value.
More details at PM-819 #937
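A hedged example of associating this key with a job in the DAX; the value below follows DAGMan's ABORT-DAG-ON syntax (abort if the job exits with 1, and return 1 from the DAG), but the exact value format expected by the profile should be confirmed in the linked documentation:

<profile namespace="dagman" key="ABORT-DAG-ON">1 RETURN 1</profile>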
Deprecated pool attribute in replica catalog
Users can now associate a site attribute in their file based replica catalogs to indicate the site where a file resides. The old attribute pool has been deprecated.
More details at PM-813 #931
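For example, in a file based replica catalog entry (the LFN and PFN are illustrative):

f.input file:///shared/data/f.input site="local"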
Support for pegasus profile glite.arguments
Users can now specify a pegasus profile key glite.arguments that gets added to the corresponding PBS qsub file that is generated by the Glite layer in HTCondor. For example, you can set the value to "-N testjob -l walltime=01:23:45 -l nodes=2". This will get translated to the following in the PBS file:
#PBS -N testjob -l walltime=01:23:45 -l nodes=2
The values specified for this profile override any other conflicting directives that are created on the basis of the globus profiles associated with the jobs.
More details at PM-880 #998
Reorganized documentation
The userguide has been reorganized to make it easier for users to identify the right chapter they want to navigate to. The configuration documentation has been streamlined and put into a single chapter, rather than having a separate chapter for profiles and properties.
Support for hints namespace
Users can now specify the following hints profile keys to control the behavior of the planner.
execution.site - the execution site where a job should execute
pfn - the path to the remote executable picked up
grid.jobtype - the job type to be used while selecting the gridgateway / jobmanager for the job
More details at PM-828 #946
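For example, a hedged hints profile pinning a job to a particular execution site (the site handle is illustrative):

<profile namespace="hints" key="execution.site">condorpool</profile>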
Added support for HubZero Distribute job wrapper
Added support for the HubZero specific job launcher Distribute, which submits jobs to a remote PBS cluster. The compute jobs are set up by Pegasus to run in the local universe, and are wrapped with the Distribute job wrapper, which takes care of the submission and monitoring of the job. More details at PM-796 #914
New classad populated for dagman jobs
Pegasus now populates a +pegasus_execution_sites classad in the dagman submit file. The value is the list of execution sites for which the workflow was planned.
More details at PM-846 #964
Python DAX API now bins the file by link type when rendering the workflow
Python DAX API now groups the jobs by their link type before rendering them to XML. This improves the readability of the generated DAX.
More details at PM-874 #992
Better demarcation of various stages in PegasusLite logs
The job's .err file in PegasusLite modes captures the logs from the PegasusLite wrapper that launches user jobs on remote nodes. This log is now clearly demarcated to identify the various stages of a job's execution by PegasusLite.
Dropped support for Globus RLS replica catalog backends
pegasus-plots is deprecated and will be removed in 4.6
7.5.2. Bugs Fixed
Fixed kickstart handling of environment variables with quotes
If an environment variable had quotes, then invalid XML output was produced by pegasus-kickstart. This is now fixed. More details at
PM-807 #925
Leaking file descriptors for two stage transfers
pegasus-transfer opens a temp file for each two stage transfer it has to execute. It was not closing them explicitly, which leaked file descriptors. This is now fixed.
Disabling of chmod jobs triggered an exception
Disabling the chmod jobs results in the creation of noop jobs instead of the chmod jobs. However, that resulted in planner exceptions when adding create dir and leaf cleanup nodes. This is now fixed.
More details at PM-845 #963
Incorrect binning of file transfers amongst transfer jobs
By default, the planner only considered the destination URL of a transfer pair to determine whether the associated transfer job has to run locally on the submit host or on the remote staging site. However, this logic broke when a user had input files catalogued in the replica catalog with file URLs for files on the submit site and remote execution sites. The logic has now been updated to also take the source URLs into account.
More details at PM-829 #947
pegasus auxiliary jobs are never launched with the pegasus-kickstart invoke capability
For compute jobs with long command line arguments, the planner triggers the pegasus invoke capability in addition to the -w option. However, this cannot be applied to pegasus auxiliary jobs as that interferes with the credential handling.
More details at PM-851 #969
Everything in the remote job directory got staged back in condorio mode if a job had no output files
If a job had no output files associated with it in the DAX, then in the condorio data configuration mode the planner added an empty value for the classad key transfer_output_files in the job submit file. This resulted in Condor staging all the inputs (all the contents of the remote job directory) back to the submit host. This is now fixed: the planner now adds a special key +TransferOutput="", which prevents Condor from staging everything back.
More details at PM-820 #938
Setting multiple strings for exitcode.successmsg and exitcode.failuremsg
Users can now specify multiple pegasus profiles with the key exitcode.successmsg or exitcode.failuremsg. Each value gets translated to a corresponding -s or -f argument to the pegasus-exitcode invocation for the job.
More details at PM-826 #944
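For example, a hedged DAX sketch associating two success messages with a job (the strings are illustrative); each becomes a separate -s argument to pegasus-exitcode:

<profile namespace="pegasus" key="exitcode.successmsg">Processing complete</profile>
<profile namespace="pegasus" key="exitcode.successmsg">Output written</profile>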
pegasus-monitord failed when submission of job fails
The events SUBMIT_FAILED, GRID_SUBMIT_FAILED, GLOBUS_SUBMIT_FAILED were not handled correctly by pegasus-monitord. As a result, subsequent event insertions for the job resulted in integrity errors. This is now fixed.
More details at PM-877 #995