$Id: RELEASE_NOTES 1795 2009-04-22 21:06:21Z vahi $

===============================
Release Notes for PEGASUS 2.3.0
===============================

NEW FEATURES
--------------

1) Regex Based Replica Selection

   Pegasus now allows users to use regular expression based replica
   selection. To use this replica selector, users need to set the
   following property

       pegasus.selector.replica  Regex

   The Regex replica selector allows the user to specify the regular
   expressions to use for ranking the various PFNs returned from the
   Replica Catalog for a particular LFN. This replica selector selects
   the highest ranked PFN, i.e. the replica with the lowest rank value.

   The regular expressions are assigned different ranks that determine
   the order in which the expressions are employed. The rank values for
   the expressions can be expressed in user properties using the
   property

       pegasus.selector.replica.regex.rank.[value]

   The [value] is an integer that denotes the rank of an expression,
   with a rank value of 1 being the highest rank.

   For example, a user can specify the following regular expressions
   that ask Pegasus to prefer file URLs over gsiftp URLs from
   example.isi.edu

       pegasus.selector.replica.regex.rank.1  file://.*
       pegasus.selector.replica.regex.rank.2  gsiftp://example\.isi\.edu.*

   Users can specify as many regular expressions as they want.

   Since Pegasus is written in Java, the regular expression support is
   what Java supports. It is pretty close to what is supported by Perl.
   More details can be found at
   http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html

   There is documentation about the new replica selector in the
   properties document. It can also be found at
   $PEGASUS_HOME/etc/sample.properties

   To use this set

       pegasus.selector.replica  Regex

2) Automatic Determination of pool attributes in RLS Replica Catalog

   Pegasus can now associate a pool attribute with the replica catalog
   entries returned from querying an LRC if the pool attribute is not
   already specified.
   This is achieved by associating the site handles with the
   corresponding LRC URLs in the properties file. This mapping tells
   Pegasus what default pool attribute should be assigned while
   querying a particular LRC.

   For example

       pegasus.catalog.replica.lrc.site.llo  rls://ldas.ligo-la.caltech.edu:39281
       pegasus.catalog.replica.lrc.site.lho  rls://ldas.ligo-wa.caltech.edu:39281

   tells Pegasus that all results from LRC
   rls://ldas.ligo-la.caltech.edu:39281 are associated with site llo.

   Using this feature only makes sense when an LRC *ONLY* contains
   mappings for data on one site, as in the case of the LIGO LDR
   deployment.

3) Pegasus auxiliary jobs on submit host now execute in local universe

   All the scheduler universe jobs are now executed in the local
   universe. Also, any job planned for site local will by default run
   in the local universe instead of the scheduler universe.

   Additionally, extra checks were put in to handle the Condor File
   Transfer Mechanism issues in the case of the local/scheduler
   universe.

   This was tracked in bugzilla at
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=40

   A user can override the local universe generation by specifying the
   condor profile key universe and setting it to the value desired.

4) Python API for generating DAX and PDAX

   Pegasus now includes a Python API for generating DAXes and PDAXes.
   An example can be found online at
   http://vtcpc.isi.edu/pegasus/index.php/ChangeLog#Added_Python_API_for_DAX_and_PDAX

   For more information on the DAX API type:   pydoc Pegasus.DAX2
   For more information on the PDAX API type:  pydoc Pegasus.PDAX2

5) Interface to Engage VO for OSG

   There is a new Site Catalog Implementation called Engage that
   interfaces with the Engage VO to discover resource information about
   OSG from the information published in RENCI glue classads.
   To use it set

       pegasus.catalog.site  Engage

   To generate a site catalog using pegasus-get-sites, set the source
   option to Engage

       pegasus-get-sites --source Engage --sc engage.sc.xml

6) Gensim now reports Seqexec Times and Seqexec Delays

   The gensim script ($PEGASUS_HOME/contrib/showlog/gensim) now reports
   the seqexec time and the seqexec delay for the clustered jobs.

   There are two new columns in the jobs file created by seqexec
     - seqexec
     - seqexec delay

   The seqexec time is determined from the last line of the .out file
   of the clustered jobs. Example format:

       [struct stat="OK", lines=4, count=4, failed=0, duration=21.836,start="2009-02-20T16:14:56-08:00"]

   The seqexec delay is the seqexec time minus the kickstart time.

   This is useful for analyzing large scale workflow runs.

7) Properties to turn on or off the seqexec progress logging

   The property pegasus.clusterer.job.aggregator.seqexec.hasgloballog
   is now deprecated.

   It has been replaced by two boolean properties
     - pegasus.clusterer.job.aggregator.seqexec.log
         whether to log progress or not
     - pegasus.clusterer.job.aggregator.seqexec.log.global
         whether to log progress to a global file or not

   The pegasus.clusterer.job.aggregator.seqexec.log.global property
   only comes into effect when
   pegasus.clusterer.job.aggregator.seqexec.log is set to true.

8) Passing of the DAX label to kickstart invocation

   The kickstart invocation for the jobs is now always passed the DAX
   label using the -L option. To disable the passing of the DAX label,
   the user needs to set pegasus.gridstart.label to false.

   Additionally, the basename option to pegasus-plan overrides the
   label value retrieved from the DAX.

9) show-job works on Mac OS X platform

   $PEGASUS_HOME/contrib/showlog/show-job no longer fails when the
   convert program is unavailable. It only logs a warning and creates
   the EPS file, but not the PNG files.

   This allows us to run show-job on Mac OS X systems.
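
   As an illustration of the seqexec time and seqexec delay described
   in item 6 above, the summary line can be split into its key/value
   pairs and the delay computed by subtraction. This is only a minimal
   Python sketch and not the gensim implementation. The function names
   are hypothetical, the sketch assumes that the duration field is the
   seqexec time, and the kickstart time of 20.0 is a made-up value.

```python
import re

# Matches key=value pairs such as stat="OK" or duration=21.836 inside the
# "[struct ...]" summary line found on the last line of the .out file.
_PAIR = re.compile(r'(\w+)=("[^"]*"|[^,\]\s]+)')


def parse_seqexec_summary(line):
    """Return the key/value pairs of a seqexec summary line as strings."""
    if not line.lstrip().startswith("[struct"):
        raise ValueError("not a seqexec summary line")
    return {key: value.strip('"') for key, value in _PAIR.findall(line)}


def seqexec_delay(seqexec_time, kickstart_time):
    """Seqexec delay as defined in item 6: seqexec time minus kickstart time."""
    return seqexec_time - kickstart_time


sample = ('[struct stat="OK", lines=4, count=4, failed=0, '
          'duration=21.836,start="2009-02-20T16:14:56-08:00"]')
summary = parse_seqexec_summary(sample)

# Assumption: the duration field carries the seqexec time in seconds.
seqexec_time = float(summary["duration"])

# 20.0 is an invented kickstart time used only for illustration.
delay = seqexec_delay(seqexec_time, 20.0)
print(summary["stat"], seqexec_time, round(delay, 3))
```

   All values are kept as strings by the parser so that the caller
   decides which fields to convert to numbers.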

10) Enabling InPlace cleanup in deferred planning

   By default, in the case of deferred planning, cleanup is turned off
   as the cleanup algorithm does not work across partitions. However,
   in scenarios where the partitions themselves are independent (i.e.
   do not share files), the user can safely turn on cleanup.

   This can now be done by setting

       pegasus.file.cleanup.scope  deferred

   If the property is set to deferred and the user wants to disable
   cleanup, they can still specify the --nocleanup option on the
   command line and that is honored.

   However, in the case of scope fullahead for deferred planning, the
   command line options are ignored and nocleanup is always set.

11) New Pegasus Job Classad

   Pegasus now publishes a job runtime classad with the jobs. The
   classad key name is pegasus_job_runtime. The value passed to it is
   picked up from the Pegasus profile runtime. If the Pegasus profile
   is not associated, then the globus maxwalltime profile key is used.
   If neither is set, then a value of zero is published.

   This job classad can be used by users in the case of glideins, to
   ensure that the jobs complete before the nodes expire. For the
   Corral glidein service, the sub expression to the job requirements
   would look something like this

       (CorralTimeLeft > MY.pegasus_job_runtime)

12) [workflow].job.map file

   Pegasus now creates a [workflow].job.map file that links jobs in the
   DAG with the jobs in the DAX. The contents of the file are in
   netlogger format. The [workflow] is replaced by the name of the
   workflow, i.e. the same prefix as the .dag file.

   In the file there are two types of events.
     a) pegasus.job
     b) pegasus.job.map

   pegasus.job - This event is for all the jobs in the DAG. The
   following information is associated with this event.
     - job.id      the id of the job in the DAG
     - job.class   an integer designating the type of the job
     - job.xform   the logical transformation which the job refers to.
     - task.count  the number of tasks associated with the job.
                   This is equal to the number of pegasus.job.task
                   events created for that job.

   pegasus.job.map - This event allows us to associate a job in the DAG
   with the jobs in the DAX. The following information is associated
   with this event.
     - task.id     the id of the job in the DAG
     - task.class  an integer designating the type of the job
     - task.xform  the logical transformation which the job refers to.

13) Source Directory for Worker Package Staging

   Users can now specify the property

       pegasus.transfer.setup.source.base.url

   to give the URL of the source directory containing the Pegasus
   worker packages. If it is not specified, then the worker packages
   are pulled from the http server at pegasus.isi.edu during staging of
   executables.

BUGS FIXED
----------

1) Critical Bug Fix to rc-client

   SCEC reported a bug with the rc-client while doing bulk inserts into
   RLS. The bug was related to how logging is initialized internally in
   the client.

   Details of the bug fix can be found at
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=38

2) Bug Fix to tailstatd for parsing jobnames with . in them

   There was a bug where tailstatd incorrectly generated events in the
   jobstate.log while parsing condor logs. This was due to an erroneous
   regular expression for determining the event POST|PRE SCRIPT STARTED.
   The earlier expression did not allow for . in jobnames. This is
   especially prevalent in LIGO workflows where the DAX labels have .
   in them.

   An example of the problem line in the DAGMan log

       1/24 10:11:21 Running POST script of Node inspiral_hipe_eobinj_cat2_veto.EOBINJ_CAT_2_VETO.daxlalapps_sire_ID000731...

   Earlier the job id was parsed as
       inspiral_hipe_eobinj_cat2_veto
   instead of
       inspiral_hipe_eobinj_cat2_veto.EOBINJ_CAT_2_VETO.daxlalapps_sire_ID000731

3) Pegasus Builds on FC10

   Earlier the Pegasus builds failed on FC10 as the invoke C tool did
   not build correctly. This is now fixed.
   Details at http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=41

4) tailstatd killing jobs by detecting starvation

   tailstatd removes a job after four hours when the job has been
   waiting in the queue WITHOUT being marked as EXECUTE in the condor
   log. To override this, tailstatd has an option of setting the
   starvation time to 0 via the command line or via the
   pegasus.max.idletime property.

   The if condition in the perl script was not accepting 0 as a value
   when trying to override the default 4 hour starvation time. This fix
   allows the value to be set to 0 (turn off starvation checks) or any
   other value via the property pegasus.max.idletime.

   This was tracked in pegasus jira as bug 40
   http://pegasus.isi.edu/jira/browse/PM-40

Documentation
--------------

1) User Guides

   The release has new user guides about the following
     - Pegasus Job Clustering
     - Pegasus Profiles
     - Pegasus Replica Selection

   The guides are checked in at $PEGASUS_HOME/doc/guides

   They can be found online at http://pegasus.isi.edu/mapper/doc.php

2) Property Document was updated with the new properties introduced.


===============================
Release Notes for PEGASUS 2.2.0
===============================

NEW FEATURES
--------------

1) Naming scheme changed for auxiliary jobs

   During the refinement of the abstract workflow to the executable
   workflow, Pegasus adds auxiliary jobs to do data stagein/stageout,
   create work directories for the workflow, etc. The prefixes/suffixes
   added for these jobs have been changed.

   Type of Job                      | Old Prefix        | New Prefix
   -------------------------------------------------------------------
   Data Stage In Job                | rc_tx_            | stage_in_
   Data Stage Out Job               | new_rc_tx_        | stage_out_
   Data Stage In Job between sites  | inter_tx_         | stage_inter_
   Data Registration Job            | new_rc_register_  | register_
   Cleanup Job                      | cln_              | clean_up_
   Transfer job to transfer the     | setup_tx_         | stage_worker_
   worker package                   |                   |

   Additionally, the suffixes for the create dir jobs are now replaced
   by prefixes

   Type of Job                      | Old Suffix        | New Prefix
   -------------------------------------------------------------------
   Directory creation job           | _cdir             | create_dir_
   Synch Job in HourGlass mode      | pegasus_concat    | pegasus_concat_

2) Staging of worker package to remote sites

   Pegasus now supports staging of the worker package as part of the
   workflow. This feature is tracked through pegasus bugzilla.
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=35

   The worker package is staged automatically to the remote site by
   adding a setup transfer job to the workflow.

   The setup transfer job by default uses GUC to stage the data.
   However, this can be configured by setting the property
   pegasus.transfer.setup.impl. If you also have pegasus.transfer.*.impl
   set in your properties file, then you need to explicitly set
   pegasus.transfer.setup.impl to GUC.

   The code discovers the worker package by looking up pegasus::worker
   in the transformation catalog.

   Note that the basename of the URLs should not be changed. Pegasus
   parses the basename to determine the version of the worker package.

   Pegasus automatically determines the location of the worker package
   to deploy on the remote site. Currently the default mappings are as
   follows

       INTEL32  => x86
       AMD64    => x86_64 or x86 if not available
       INTEL64  => x86

       OS LINUX = rhel3

   There is an untar job added to the workflow after the setup job that
   untars the worker package on the remote site. It defaults to
   /bin/tar.
   However, it can be overridden by specifying the entry tar in the
   transformation catalog for a particular site.

3) New Site Catalog Schema

   This release of Pegasus has support for site catalog schema
   version 3.

   HTML visualization of schema:
       http://pegasus.isi.edu/mapper/docs/schemas/sc-3.0/sc-3.0.html
   Schema itself:
       http://pegasus.isi.edu/schema/sc-3.0.xsd
   A sample xml file:
       http://pegasus.isi.edu/schema/sc-3.0-sample.xml

   To use a site catalog in the new format set

       pegasus.catalog.site  XML3

   Changes to sc-client

   The sc-client command line tool was updated to convert an existing
   site catalog from the old format to the new format.

   Sample usage

       sc-client -i ldg-old-sites.xml -I XML -o ldg-new-sites.xml -O XML3

   sc-client --help gives detailed help

4) pegasus-get-sites

   pegasus-get-sites was recoded in Java and now generates a site
   catalog conformant to schema sc-3.0.

   Sample usage to query VORS to generate a site catalog for OSG.

       pegasus-get-sites --source VORS --grid osg -s ./sites-new.xml

   The value passed to the source option is case sensitive.

   Additionally, the VORS module of pegasus-get-sites determines the
   value of the GLOBUS_LOCATION variable depending on whether the
   auxiliary jobmanager is of type fork or not.

   If it is of type fork, then it picks up the value of the
   GLOBUS_LOCATION variable published in VORS for that site.
   Otherwise it picks up the value from the OSG_GRID variable published
   in VORS for that site, i.e. GLOBUS_LOCATION is set to
   $OSG_GRID/globus

5) Overhaul of logging

   The Pegasus logging interfaces have been reworked. Users can now
   specify the logger they want to use by setting the property
   pegasus.log.manager.

   Currently, two logging implementations are supported.

     Default - Pegasus homegrown logger that logs to stdout and stderr
               directly.
     Log4j   - Uses log4j to log the messages.

   The Log4j properties can be specified at runtime by specifying the
   property pegasus.log.manager.log4j.conf

   The format of the log messages themselves can be specified at
   runtime by specifying the property pegasus.log.manager.formatter

   Right now two formatting modes are supported

   a) Simple - This formats the messages in a simple format. The
      messages are logged as is with minimal formatting. Below are
      sample log messages in this format while ranking a dax according
      to performance.

      event.pegasus.ranking dax.id se18-gda.dax - STARTED
      event.pegasus.parsing.dax dax.id se18-gda-nested.dax - STARTED
      event.pegasus.parsing.dax dax.id se18-gda-nested.dax - FINISHED
      job.id jobGDA
      job.id jobGDA query.name getpredicted performace time 10.00
      event.pegasus.ranking dax.id se18-gda.dax - FINISHED

   b) Netlogger - This formats the messages in the Netlogger format,
      which is based on key value pairs. The netlogger format is useful
      for loading the logs into a database to do some meaningful
      analysis. Below are sample log messages in this format while
      ranking a dax according to performance.

      ts=2008-09-06T12:26:20.100502Z event=event.pegasus.ranking.start \
      msgid=6bc49c1f-112e-4cdb-af54-3e0afb5d593c \
      eventId=event.pegasus.ranking_8d7c0a3c-9271-4c9c-a0f2-1fb57c6394d5 \
      dax.id=se18-gda.dax prog=Pegasus

      ts=2008-09-06T12:26:20.100750Z event=event.pegasus.parsing.dax.start \
      msgid=fed3ebdf-68e6-4711-8224-a16bb1ad2969 \
      eventId=event.pegasus.parsing.dax_887134a8-39cb-40f1-b11c-b49def0c5232 \
      dax.id=se18-gda-nested.dax prog=Pegasus

      ts=2008-09-06T12:26:20.100894Z event=event.pegasus.parsing.dax.end \
      msgid=a81e92ba-27df-451f-bb2b-b60d232ed1ad \
      eventId=event.pegasus.parsing.dax_887134a8-39cb-40f1-b11c-b49def0c5232

      ts=2008-09-06T12:26:20.100395Z event=event.pegasus.ranking \
      msgid=4dcecb68-74fe-4fd5-aa9e-ea1cee88727d \
      eventId=event.pegasus.ranking_8d7c0a3c-9271-4c9c-a0f2-1fb57c6394d5 \
      job.id="jobGDA"

      ts=2008-09-06T12:26:20.100395Z event=event.pegasus.ranking \
      msgid=4dcecb68-74fe-4fd5-aa9e-ea1cee88727d \
      eventId=event.pegasus.ranking_8d7c0a3c-9271-4c9c-a0f2-1fb57c6394d5 \
      job.id="jobGDA" query.name="getpredicted performace" time="10.00"

      ts=2008-09-06T12:26:20.102003Z event=event.pegasus.ranking.end \
      msgid=31f50f39-efe2-47fc-9f4c-07121280cd64 \
      eventId=event.pegasus.ranking_8d7c0a3c-9271-4c9c-a0f2-1fb57c6394d5

6) New Transfer Refiner

   Pegasus has a new transfer refiner named Cluster.

   In this refinement strategy, clusters of stage-in and stage-out jobs
   are created per level of the workflow. It builds upon the Bundle
   refiner.

   The differences between the Bundle and Cluster refiners are as
   follows.
     - stage-in is also clustered/bundled per level. In Bundle it was
       for the whole workflow.
     - the keys that control the clustering (old name: bundling) are
       cluster.stagein and cluster.stageout instead of bundle.stagein
       and bundle.stageout

   This refinement strategy also adds dependencies between the stage-in
   transfer jobs on different levels of the workflow to ensure that
   stage-in for the top level happens first and so on.
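
   The per-level clustering of stage-in jobs described for the Cluster
   refiner can be pictured with a small Python sketch. This is not the
   Pegasus implementation: the function name, the job names of the form
   stage_in_level<N>_<M>, the round-robin assignment of compute jobs to
   stage-in jobs, and the rule that every stage-in job of one level
   precedes every stage-in job of the next level are all assumptions
   made for illustration.

```python
from collections import defaultdict


def cluster_stagein(job_levels, cluster_size):
    """Group stage-in work per workflow level and chain the levels.

    job_levels   maps a compute job name to its level (1 = top level).
    cluster_size is the number of stage-in jobs created per level, the
                 role played by the cluster.stagein key.
    Returns (clusters, edges): clusters maps a stage-in job name to the
    compute jobs it serves, edges lists (parent, child) pairs between
    stage-in jobs on consecutive levels.
    """
    by_level = defaultdict(list)
    for job, level in sorted(job_levels.items()):
        by_level[level].append(job)

    clusters = {}
    names_per_level = {}
    for level in sorted(by_level):
        names = []
        for index, job in enumerate(by_level[level]):
            # Round-robin assignment is an assumption of this sketch.
            name = "stage_in_level%d_%d" % (level, index % cluster_size)
            if name not in clusters:
                clusters[name] = []
                names.append(name)
            clusters[name].append(job)
        names_per_level[level] = names

    # Stage-in for an upper level must finish before the next level starts.
    edges = []
    levels = sorted(names_per_level)
    for upper, lower in zip(levels, levels[1:]):
        for parent in names_per_level[upper]:
            for child in names_per_level[lower]:
                edges.append((parent, child))
    return clusters, edges


clusters, edges = cluster_stagein(
    {"preprocess": 1, "findrange_a": 2, "findrange_b": 2, "analyze": 3}, 2)
for parent, child in edges:
    print(parent, "->", child)
```

   With a cluster size of 2, a level with a single compute job still
   gets only one stage-in job, since empty clusters are never created.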

   An image of the workflow with this refinement strategy can be found
   at
   http://vtcpc.isi.edu/pegasus/index.php/ChangeLog#Added_a_Cluster_Transfer_Refiner

7) New Transfer Implementation for GUC from globus 4.x

   Pegasus has a new transfer implementation that allows it to use GUC
   from the globus 4.x series to transfer multiple files in one job.

   In order to use this transfer implementation
     - the property pegasus.transfer.*.impl must be set to value GUC.

   There should be an entry in the transformation catalog with the
   fully qualified name globus::guc for all the sites where the
   workflow is run, or on the local site in the case of third party
   transfers.

   Pegasus can automatically construct the path to the globus-url-copy
   client if the environment variable GLOBUS_LOCATION is specified in
   the site catalog for the site.

   The arguments with which the client is invoked can be specified
     - by specifying the property pegasus.transfer.arguments
     - by associating the Pegasus profile key transfer.arguments

8) Recursive DAXes

   There is prototypical support for recursive DAXes. Recursive DAXes
   give you the ability to specify a job in the DAX that points to
   another DAX that has to be executed.

   There is a sample recursive dax at
   $PEGASUS_HOME/examples/recursive.dax

   The dax refers to pegasus jobs that in turn plan and execute a dax.

   To get this dax planned by Pegasus you will need to have additional
   entries for dagman and pegasus in your transformation catalog.

   For example

       local condor::dagman /opt/condor/7.1.0/bin/condor_dagman INSTALLED INTEL32::LINUX NULL
       local pegasus::pegasus-plan:2.0 /lfs1/software/install/pegasus/default INSTALLED INTEL32::LINUX NULL

   The recursive dax needs to be planned for site local, since Pegasus
   itself runs on the local site. The jobs in the dax specify the -s
   option to say where you want each of your workflows to run.

   Recursive DAXes do not need to contain only pegasus jobs. They can
   contain the application/normal jobs that one usually specifies in a
   DAX.
   Pegasus determines that a particular job is a planning and execute
   job by looking for a pegasus profile key named type with value
   recursive, e.g.

       recursive -Dpegasus.user.properties=/lfs1/work/conf/properties --dax /lfs1/work/dax3 -s tacc -o local --nocleanup --force --rescue 1 --cluster horizontal -vvvvv --dir ./dag_3

9) Rescue option to pegasus-plan for deferred planning

   A rescue option to pegasus-plan has been added. The rescue option
   takes an integer value that determines the number of times rescue
   dags are submitted before re-planning is triggered in case of
   failures in deferred planning. For this to work, Condor 7.1.0 or
   higher is required as it relies on the recently implemented auto
   rescue feature in Condor DAGMan.

   Even though re-planning is triggered, Condor DAGMan still ends up
   submitting the rescue dag as it auto detects it. The fix is to
   remove the rescue dag files in case of re-planning. This is still to
   be implemented.

10) -j|--job-prefix option to pegasus-plan

   pegasus-plan can now be passed the -j|--job-prefix option to
   designate the prefix that needs to be used for constructing the job
   submit file.

11) Executing workflows on Amazon EC2

   Pegasus now has support for running workflows on EC2 with the
   storage of files on S3. This feature is still in the testing phase
   and has not been tested fully.

   To execute workflows on EC2/S3, Pegasus needs to be configured to
   use S3 specific implementations of its internal APIs

   a) First Level Staging API - The S3 implementation stages in from
      the local site (submit node) to a bucket on S3. Similarly the
      data is staged back from the bucket to the local site (submit
      node). All the first level transfers happen between the submit
      node and the cloud. This means that input data can *only* be
      present on the submit node when running on the cloud, and the
      output data can be staged back only to the submit node.

   b) Second Level Staging API - The S3 implementation retrieves input
      data from the bucket to the worker node tmp directory and puts
      created data back in the bucket.

   c) Directory creation API - The S3 implementation creates a bucket
      on S3 for the workflow instead of a directory.

   d) Cleanup API - To clean up files from the workflow specific bucket
      on S3 during workflow execution.

   The above implementations rely on the s3cmd command line client to
   interface with the S3 filesystem. There should be an entry in the
   transformation catalog with the fully qualified name amazon::s3cmd
   for the site corresponding to the cloud and the local site.

   To configure Pegasus to use these implementations set the following
   properties

       pegasus.transfer.*.impl             S3
       pegasus.transfer.sls.*.impl         S3
       pegasus.dir.create.impl             S3
       pegasus.file.cleanup.impl           S3
       pegasus.execute.*.filesystem.local  true

12) Support for OSU Datacutter jobs

   Pegasus has a new gridstart mode called DCLauncher. This allows us
   to launch the Data Cutter jobs using the wrapper that the OSU group
   wrote. Pegasus now supports the condor parallel universe.

   To launch a job using DCLauncher, the following pegasus profile keys
   need to be associated with the job

       gridstart       set to DCLauncher
       gridstart.path  the path to the DCLauncher script

13) New Pegasus Profile Keys

   a) create.dir - this profile key triggers kickstart to create and
      change directories before launching a job.

   b) gridstart.path - this profile key specifies the path to the
      gridstart used to launch a particular job.

   c) runtime - this profile key is useful when using Heft based site
      selection. It allows users to associate expected runtimes of jobs
      with the job description in the DAX.

14) Kickstart captures machine information

   Kickstart now logs machine information in the invocation record that
   it creates for each job invocation. The Kickstart Java parser can
   parse records in both the old and the new format.

   A snippet of the machine information captured is shown below

       2008-09-23T13:58:05.211-07:00
       #2 SMP Thu Apr 28 18:41:14 PDT 2005
       2008-09-12T12:03:49.772-07:00
       Intel(R) Xeon(TM) CPU 2.40GHz

15) Kickstart works in cygwin environment

   Kickstart now compiles on cygwin. Kickstart could not find the
   SYS_NMLN variable in Cygwin to determine the size of the uname data
   structure. A fix was added in the Makefile to add CFLAGS
   -DSYS_NMLN=20 when the OS is Cygwin/Windows.

   The kickstart records generated on cygwin are slightly different
   from the ones generated on unix platforms. The kickstart parser was
   modified to handle that. The differences are as follows

   a) On cygwin the inode value is double. The inode value is parsed as
      double, but cast to long to prevent errors.

   b) On cygwin the uid and gid values are long. They are parsed as
      long, but cast to int to prevent errors.

16) Changes to dirmanager

   The dirmanager executable can now remove and create multiple
   directories. This is achieved by specifying a whitespace separated
   list of directories to the --dir option.

17) Added color-file option to show-job

   There is now a --color-file option to show-job in
   $PEGASUS_HOME/contrib/showlog to pass a file that has the mappings
   from transformation name to colors.

   The format of each line is as follows

       transformation-name color

   This can be used to assign different colors to compute jobs in a
   workflow. The default color assigned is gray if none is specified.

18) jobstate-summary tool

   There is a new tool at $PEGASUS_HOME/bin/jobstate-summary.

   It attempts to give a summary for the workflow and should help in
   debugging failed jobs. It shows all the information associated with
   a failed job. It gets the list of failed jobs from the jobstate.log
   file. After that it parses the latest kickstart file for each failed
   job and shows the exit code and all the other information.

   Usage: jobstate-summary --i [--v(erbose)] [--V(ersion)] [--h(elp)]

   The input directory is the place where all the log files, including
   the jobstate.log file, reside.
     v option is for verbose debugging.
     V option gives the pegasus version.
     h option prints the help message.

   A sample run is like

       jobstate-summary -i /dags/pegasus/diamond/run0013 -v

19) Support for DAGMan node categories

   Pegasus now supports DAGMan node categories. DAGMan now allows users
   to specify CATEGORIES for jobs, and then specify tuning parameters
   (like maxjobs) per category. This functionality is exposed in
   Pegasus as follows.

   The user can associate a dagman profile key category with the jobs.
   The key attribute for the profile is category and the value is the
   category to which the job belongs. For example, you can set the
   dagman category in the DAX for a job as follows

       short-running -a top -T 6 -i -o

   The property pegasus.dagman.[category].maxjobs can be used to
   control the value.

   For the above example, the user can set the property as follows

       pegasus.dagman.short-running.maxjobs  2

   In the DAG file generated you will see the category associated with
   the jobs. For the above example, it will look as follows

       MAXJOBS short-running 2
       CATEGORY preprocess_ID000001 short-running
       JOB preprocess_ID000001 preprocess_ID000001.sub
       RETRY preprocess_ID000001 2

20) Handling of pass through LFN

   If a job in a DAX specifies the same LFN as an input and an output,
   it is a pass through LFN. Internally, the LFN is tagged only as an
   input for the job. The reason for this is that we need to make sure
   that the replica catalog is queried for the location of the LFN. If
   this is not handled specially, then the LFN is tagged internally as
   inout (meaning it is generated during workflow execution). LFNs with
   type inout are not queried for in the Replica Catalog in the force
   mode of operation.

21) Tripping seqexec on first job failures

   By default seqexec does not stop execution even if one of the
   clustered jobs it is executing fails.
   This is because seqexec tries to get as much work done as possible.
   If for some reason you want to make seqexec stop on the first job
   failure, set the following property in the properties file

       pegasus.clusterer.job.aggregator.seqexec.firstjobfail  true

22) New properties to choose the cleanup implementation

   Two new properties were added to select the strategy and
   implementation for file cleanup.

       pegasus.file.cleanup.strategy
       pegasus.file.cleanup.implementation

   Currently there is only one cleanup strategy (InPlace) that can be
   used and it is loaded by default.

   The cleanup implementations that can be used are
     - Cleanup (default)
     - RM
     - S3

   Detailed documentation can be found at
   $PEGASUS_HOME/etc/sample.properties.

23) New properties to choose the create dir implementation

   The property pegasus.dir.create was deprecated.

   It has been replaced by

       pegasus.dir.create.strategy

   Additionally, a user can specify a property to choose the
   implementation used to create the directory on the remote sites.

       pegasus.dir.create.impl

   The create directory implementations that can be used are
     - DefaultImplementation
         uses the $PEGASUS_HOME/bin/dirmanager executable to create a
         directory on the remote site.
     - S3
         uses s3cmd to create a bucket on Amazon S3.

BUGS FIXED
----------

1) Makefile for kickstart to build on Cygwin

   Kickstart could not find the SYS_NMLN variable in Cygwin to
   determine the size of the uname data structure. A fix was added in
   the Makefile to add CFLAGS -DSYS_NMLN=20 when the OS is
   Cygwin/Windows.

2) Bug fix to getsystem release tools

   Some systems have started using / in their system version name,
   which causes failures in the Pegasus build process. The getsystem
   release script was fixed to convert / into _

3) Bug fix in file cleanup module when stageout is enabled.

   There was a bug in how the dependencies are added between the
   stageout jobs and the file cleanup jobs. In certain cases, cleanup
   could occur before the output was staged out. This is fixed now.
   This bug was tracked through bugzilla
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=37

4) Bug fix to deferred planning

   Deferred planning used to fail if pegasus-plan was not given the -o
   option.

   This is fixed now and was tracked through bugzilla
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=34

5) Bug fix to caching of entries from Transformation Catalog

   In certain cases, caching of entries did not work for the INSTALLED
   case.

   This is fixed now and was tracked through bugzilla
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=33


===============================
Release Notes for PEGASUS 2.1.0
===============================

NEW FEATURES
--------------

1) Support for Second Level Staging

   Normally, Pegasus transfers the data to and from a directory on the
   shared filesystem on the head node of a compute site. The directory
   needs to be visible to both the head node and the worker nodes for
   the compute jobs to execute correctly.

   In the case where the worker nodes cannot see the filesystem of the
   head node, there needs to be a Second Level Staging (SLS) process
   that transfers the data from the head node to a directory on the
   worker node tmp. To achieve this, Pegasus uses the pre-job and
   post-job feature of kickstart to pull the input data from the head
   node and push back the output data of a job to the head node.

   Even though we do SLS, Pegasus still relies on the existence of a
   shared file system for the following two reasons

   a) for the transfer executable to pick up the proxy that we transfer
      from the submit host to the head node.

   b) to access the sls input and output files that contain the file
      transfer URLs used to manage the transfer of data to the worker
      node and back to the head node.

   Additionally, if you are running your workflows on a Condor pool,
   you can bypass the use of kickstart to do the SLS. Please contact
   pegasus@isi.edu for more details of this scenario. In this case, the
   workflows generated by Pegasus have been shown to run in a totally
   non shared filesystem environment.

   To use this feature, the user needs to set

       pegasus.execute.*.filesystem.local  true

   The above change was tracked via bugzilla
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=21

2) New DAX schema

   The new release has moved to the new DAX schema version 2.1. The
   schema is available online at
   http://pegasus.isi.edu/schema/dax-2.1.xsd

   The main change in it is that the dontTransfer and dontRegister
   flags have been replaced by transfer and register flags.

   Changes were made both to the Java DAX Generator and Pegasus to
   conform to the new schema. Additionally, the DAX parser in Pegasus
   looks at the schema version to determine whether to pick up the
   dontTransfer and dontRegister flags (to support backward
   compatibility with the older DAXes).

   Also, a type attribute was added to the filename type. It defaults
   to data. Additionally, the user can use the values
   executable|pattern. Users can use type=executable to specify any
   dependent executables that their jobs require. All executable files
   are tracked in the transformation catalog.

   The above change was tracked via bugzilla
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=6

3) Workflow and Planner Metrics Logging

   Workflow and Planning metrics are now logged for each workflow that
   is planned by Pegasus.
By default, they are logged to $PEGASUS_HOME/var/pegasus.log

To turn metrics logging off, set pegasus.log.metrics to false

To change the file to which the metrics are logged, set

    pegasus.log.metrics.file path/to/log/file

Here is a snippet from the log file that shows what is logged

{
user = vahi
vogroup = pegasus-ligo
submitdir.base = /nfs/asd2/vahi/jbproject/Pegasus/dags
submitdir.relative = /vahi/pegasus-ligo/blackdiamond/run0064
planning.start = 2007-09-24T18:14:23-07:00
planning.end = 2007-09-24T18:14:29-07:00
properties =/nfs/asd2/vahi/jbproject/Pegasus/dags/vahi/pegasus-ligo/blackdiamond/run0064/pegasus.6766.properties
dax = /nfs/asd2/vahi/jbproject/Pegasus/blackdiamond_dax.xml
dax-label = blackdiamond
compute-jobs.count = 3
si-jobs.count = 1
so-jobs.count = 3
inter-jobs.count = 0
reg-jobs.count = 3
cleanup-jobs.count = 2
total-jobs.count = 14
}

4) Support for querying multiple replica catalogs

Pegasus now allows users to query multiple replica catalogs at the
same time to discover the locations of input data sets. For this, a
new Replica Catalog implementation was developed.

Users need to do the following to use it.

Set the replica catalog to MRC in the properties file.

    pegasus.catalog.replica MRC

Each associated replica catalog can be configured via properties as
follows.

The user associates a variable name, referred to as [value], with each
of the catalogs, where [value] is any legal identifier (concretely
[A-Za-z][_A-Za-z0-9]*).

For each associated replica catalog, the user needs to specify the
following properties.
    pegasus.catalog.replica.mrc.[value]      to specify the type of
                                             replica catalog
    pegasus.catalog.replica.mrc.[value].key  to specify a property name
                                             key for a particular catalog

For example, if a user wants to query two LRCs at the same time, they
can specify the following

    pegasus.catalog.replica.mrc.lrc1 LRC
    pegasus.catalog.replica.mrc.lrc1.url rls://sukhna

    pegasus.catalog.replica.mrc.lrc2 LRC
    pegasus.catalog.replica.mrc.lrc2.url rls://smarty

In the above example, lrc1 and lrc2 are arbitrary valid identifier
names and url is the property key that needs to be specified.

5) Local Replica Selector

Pegasus has a new local replica selector that only prefers replicas
from the local host and that start with a file: URL scheme. It is
useful when users want to stage in files to a remote site from their
submit host using the Condor file transfer mechanism.

In order to use this, set the replica selector to Local in the
properties.

    pegasus.selector.replica Local

6) Heft Based Site Selector

Added a new site selector that is based on the HEFT processor
scheduling algorithm. The implementation assumes default data
communication costs when jobs are not scheduled onto the same site.
Later on this may be made more configurable.

The runtime for the jobs is specified in the transformation catalog by
associating the pegasus profile key runtime with the entries.

The number of processors in a site is picked up from the attribute
idle-nodes associated with the vanilla jobmanager of the site in the
site catalog.

To use this site selector, users need to set the following property

    pegasus.selector.site Heft

7) Using multiple grid ftp servers for stageout

If a user specifies multiple grid ftp servers for the output site in
the site catalog, the stageout jobs will be distributed over all of
them.
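
The idea of spreading stageout jobs over several servers can be
illustrated with a small Python sketch. It is not the Pegasus
implementation: these notes do not say which distribution policy is
used, so the sketch assumes a simple round-robin assignment, and the
function name, job names and server URLs are all hypothetical.

```python
from itertools import cycle


def assign_servers(stageout_jobs, gridftp_servers):
    """Pair each stageout job with a server, cycling through the servers in order.

    Round-robin is an assumption made for illustration only; the actual
    policy used by Pegasus is not described in these notes.
    """
    if not gridftp_servers:
        raise ValueError("at least one grid ftp server is required")
    return dict(zip(stageout_jobs, cycle(gridftp_servers)))


# Hypothetical job names and server URLs, for illustration only.
jobs = ["stage_out_0", "stage_out_1", "stage_out_2"]
servers = ["gsiftp://ftp1.example.org", "gsiftp://ftp2.example.org"]
assignment = assign_servers(jobs, servers)
```

With two servers and three jobs, the first and third job share the
first server and the second job uses the second one, so no single
server handles all of the output transfers.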
More info can be found at
http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=3

8) Scalable Directory structure on the stageout site

Users can now distribute their output files in a directory structure
on the output site. On setting the Boolean property
pegasus.dir.storage.deep to true, the relative submit directory
structure is replicated on the output site. Additionally, within this
directory the files are distributed into subdirectories, with each
subdirectory having 256 files. The subdirectories are named in decimal
format.

9) Specifying the jobmanager universe for the compute jobs in the DAX

Users can now specify the jobmanager type for the compute jobs in the
DAX. This is achieved by specifying the jobmanager.universe profile
key in the hints namespace. Valid values for this are
transfer|vanilla.

This is useful for users who are running on a grid site with the
worker nodes behind a firewall and want a subset of their jobs to run
on the head node.

More info can be found at
http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=5

10) Stork Support for doing transfers

The internal transfer interfaces of Pegasus were updated to use the
latest version of Stork for managing data transfers.

To use the Stork implementations, set the following

    pegasus.transfer.refiner = SDefault
    pegasus.transfer.*.impl=Stork

11) nogrid option for pegasus-run

pegasus-run now has a --nogrid option. This bypasses the checks for
proxy existence that are done before submitting the workflow for
execution. It disables all globus checks, such as the checks for the
environment variables GLOBUS_LOCATION and LD_LIBRARY_PATH.

This is useful for running workflows in native Condor environments.

12) Submitting workflows directly using pegasus-plan

A new --submit|-S option was added to pegasus-plan. This allows users
to submit workflows directly after they have been planned.

13) Specifying relative submit directory

Since Pegasus 2.0, pegasus-plan creates a directory structure in the
base submit directory.
The base submit directory is specified by the --dir option to
pegasus-plan. Users who want to specify a relative submit directory
can use the --relative-dir option to pegasus-plan.

The above change was tracked via bugzilla
http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=14

BUGS FIXED
----------

1) Specifying Relative Path to the DAX

An incorrect path to the DAX was generated internally when a user
specified a relative path to the DAX to pegasus-plan.

This is fixed now, and was tracked via bugzilla
http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=13

2) RLS java api bug fix 4114 (globus bugzilla number)

globus_rls_client.jar was updated with the bug fix 4114. Also added
the jar for java 1.4 in lib/java1.4

3) Passing of DAGMan parameters via properties in case of deferred
   planning

In case of deferred planning, the properties that control DAGMan
execution were not being passed as options to DAGMan. This is now
fixed. The following properties are being handled correctly now.

    pegasus.dagman.maxjobs
    pegasus.dagman.maxpre
    pegasus.dagman.maxidle
    pegasus.dagman.maxpost

===============================
Release Notes for PEGASUS 2.0.1
===============================

There is new documentation in the form of a quick start guide and
glossary in the docs directory. More documentation will be coming soon
and will be available in the release as well as on the pegasus website
under documentation.

NEW FEATURES
------------

1) Pegasus can now store provenance data in PASOA. The actions taken
   by the various refiners are logged into the store. It is still an
   experimental feature. To turn it on, set the property

    pegasus.catalog.provenance.refinement pasoa

   The PASOA store needs to run on localhost on port 8080
   https://localhost:8080/preserv-1.0

2) You can also use Pegasus to store execution provenance in PASOA.
   To use it, set the properties

    pegasus.exitcode.impl=pasoa
    pegasus.exitcode.path.pasoa=${pegasus.home}/bin/pasoa-client
    pegasus.exitcode.arguments=

BUGS FIXED
----------

1) sitecatalog-converter
   Patch to fix pegasus profile conversion.

2) pegasus-submit-dag
   Added the --maxidle option to allow setting the number of idle jobs
   on the remote site.

3) VORS.pm
   Fixed a small typo that led to perl compilation errors.

4) pegasus-get-sites
   Removed local tc entries and added environments for PEGASUS_HOME,
   GLOBUS_LOCATION and LD_LIBRARY_PATH to the local site.

5) mpiexec
   The execution of clustered jobs via mpiexec was broken in the 2.0
   release. That is now fixed.

6) exitcode/exitpost
   Fixed a bug in exitcode that caused a call to the DB PTC even
   though the property pegasus.catalog.provenance was not set.

KNOWN BUGS
-----------

===============================
Release Notes for PEGASUS 2.0.0
===============================

NEW FEATURES
--------------

pegasus-plan
   This is the main client for invoking pegasus. The earlier gencdag
   command is now called pegasus-plan.

pegasus-run
   This is the client that submits the planned workflow to Condor and
   starts a monitoring tailstatd daemon.

pegasus-status
   This client lets you monitor a particular workflow. It is a wrapper
   around condor_q.

pegasus-remove
   This client lets you remove a running workflow from the condor
   queue. A rescue dag will be generated, which can be submitted by
   just running pegasus-run on the dag directory.