Pegasus 2.3.0 Released

===============================
Release Notes for PEGASUS 2.3.0
===============================
 
NEW FEATURES
--------------
1) Regex Based Replica Selection
   Pegasus now allows users to use regular expression based replica
   selection. To use this replica selector, users need to set the
   following property 
   
   pegasus.selector.replica  Regex 

   The Regex replica selector allows the user to specify regular
   expressions for ranking the various PFNs returned from the Replica
   Catalog for a particular LFN. The selector picks the highest ranked
   PFN, i.e. the replica whose matching expression has the lowest rank
   value.

   The regular expressions are assigned different ranks, which
   determine the order in which the expressions are employed. The rank
   values for the regexes are expressed in user properties using the
   following property:

   pegasus.selector.replica.regex.rank.[value]

   The value is an integer that denotes the rank of an expression,
   with a rank value of 1 being the highest rank.

   For example, a user can specify the following regular expressions
   to ask Pegasus to prefer file URLs over gsiftp URLs from
   example.isi.edu
   
   pegasus.selector.replica.regex.rank.1 file://.*
   pegasus.selector.replica.regex.rank.2 gsiftp://example\.isi\.edu.*

   Users can specify as many regular expressions as they want. Since
   Pegasus is written in Java, the regular expression support is what
   Java supports, which is pretty close to what Perl supports. More
   details can be found at
   http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html

   The new replica selector is documented in the properties document.
   Documentation can also be found at
   $PEGASUS_HOME/etc/sample.properties

   To use this selector, set pegasus.selector.replica to Regex.
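
   As an illustration, suppose the Replica Catalog returns the
   following two PFNs for an LFN (hypothetical paths):

   gsiftp://example.isi.edu/data/f.a
   file:///scratch/data/f.a

   The file URL matches the rank 1 expression above, and is hence the
   replica selected.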


2) Automatic Determination of pool attributes in RLS Replica Catalog

   Pegasus can now associate a pool attribute with the replica catalog
   entries returned from querying an LRC, if the pool attribute is not
   already specified.

   This is achieved by associating site handles with the corresponding
   LRC URLs in the properties file. This mapping tells Pegasus which
   default pool attribute should be assigned to the results of
   querying a particular LRC. For example

   pegasus.catalog.replica.lrc.site.llo rls://ldas.ligo-la.caltech.edu:39281
   pegasus.catalog.replica.lrc.site.lho rls://ldas.ligo-wa.caltech.edu:39281

   tells Pegasus that all results from the LRC
   rls://ldas.ligo-la.caltech.edu:39281 are associated with site llo.

   Using this feature only makes sense when an LRC *ONLY* contains
   mappings for data on one site, as in the case of the LIGO LDR
   deployment.
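
   For example, with the properties above, a PFN returned from
   rls://ldas.ligo-la.caltech.edu:39281 that carries no pool attribute
   would be treated as if it had the attribute pool = llo (an
   illustrative reading of the mapping, not actual rc-client output).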

3) Pegasus auxiliary jobs on submit host now execute in local universe

   All scheduler universe jobs are now executed in the local universe.
   Also, any job planned for the site local will by default run in the
   local universe instead of the scheduler universe.

   Additionally, extra checks were put in to handle Condor File
   Transfer Mechanism issues in the case of local/scheduler universe
   jobs. This was tracked in bugzilla at
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=40

   A user can override the local universe generation by specifying the
   condor profile key universe and setting it to the desired value.
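
   For example, to force a job planned for site local back into the
   scheduler universe, a condor profile can be associated with the job
   in the DAX (a sketch; the profile can equally be set in the site
   catalog or properties):

   <profile namespace="condor" key="universe">scheduler</profile>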

4) Python API for generating DAX and PDAX
   
   Pegasus now includes a Python API for generating DAXes and PDAXes.

   An example can be found online at
   http://vtcpc.isi.edu/pegasus/index.php/ChangeLog#Added_Python_API_for_DAX_and_PDAX 

   For more information on the DAX API type: pydoc Pegasus.DAX2
   For more information on the PDAX API type: pydoc Pegasus.PDAX2
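
   A minimal sketch of building a DAX with two dependent jobs follows.
   The class and method names used here (ADAG, Job, addJob,
   addDependency, writeXML) are assumptions about the API shape;
   consult pydoc Pegasus.DAX2 for the authoritative interface.

   # hypothetical sketch -- verify names against pydoc Pegasus.DAX2
   from Pegasus.DAX2 import ADAG, Job

   dax = ADAG("diamond")                  # container for the workflow

   preprocess = Job(name="preprocess")    # two tasks ...
   analyze = Job(name="analyze")
   dax.addJob(preprocess)
   dax.addJob(analyze)

   # ... with analyze depending on preprocess
   dax.addDependency(parent=preprocess, child=analyze)

   # write the DAX XML out to a file
   out = open("diamond.dax", "w")
   dax.writeXML(out)
   out.close()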


5) Interface to Engage VO for OSG

   There is a new Site Catalog implementation called Engage that
   interfaces with the Engage VO to discover resource information
   about OSG published in the RENCI glue classads.

   To use it set
   pegasus.catalog.site Engage

   To generate a site catalog using pegasus-get-sites set the source
   option to Engage 

   pegasus-get-sites --source Engage --sc engage.sc.xml

6) Gensim now reports Seqexec Times and Seqexec Delays

   Gensim script ($PEGASUS_HOME/contrib/showlog/gensim) now reports
   the seqexec time and the seqexec delay for the clustered jobs.

   There are two new columns in the jobs file created by seqexec:
   - seqexec
   - seqexec delay

   The seqexec time is determined from the last line of the .out file
   of the clustered job, which is of the format
   [struct stat="OK", lines=4, count=4, failed=0, duration=21.836,
   start="2009-02-20T16:14:56-08:00"]

   The seqexec delay is the seqexec time minus the kickstart time.

   This is useful for analyzing large scale workflow runs.
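
   A minimal sketch of how the delay can be computed from such a line
   (the parsing below is illustrative, not gensim's actual code, and
   the kickstart time is a made-up value):

   import re

   # last line of a clustered job's .out file, from the example above
   line = ('[struct stat="OK", lines=4, count=4, failed=0, '
           'duration=21.836,start="2009-02-20T16:14:56-08:00"]')

   seqexec_time = float(re.search(r'duration=([\d.]+)', line).group(1))

   kickstart_time = 20.5                 # hypothetical kickstart total
   seqexec_delay = seqexec_time - kickstart_time
   print("seqexec delay: %.3f seconds" % seqexec_delay)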

7) Properties to turn on or off the seqexec progress logging

   The property pegasus.clusterer.job.aggregator.seqexec.hasgloballog
   is now deprecated.

   It has been replaced by two boolean properties
   - pegasus.clusterer.job.aggregator.seqexec.log
     whether to log progress or not
   - pegasus.clusterer.job.aggregator.seqexec.log.global
     whether to log progress to a global file or not

   The pegasus.clusterer.job.aggregator.seqexec.log.global property
   only comes into effect when
   pegasus.clusterer.job.aggregator.seqexec.log is set to true.
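
   For example, one possible configuration logs progress, but not to
   the global file:

   pegasus.clusterer.job.aggregator.seqexec.log         true
   pegasus.clusterer.job.aggregator.seqexec.log.global  false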


8) Passing of the DAX label to kickstart invocation

   The kickstart invocation for the jobs is now always passed the DAX
   label using the -L option. To disable the passing of the DAX label,
   users need to set pegasus.gridstart.label to false.

   Additionally, the basename option to pegasus-plan overrides the
   label value retrieved from the DAX. 
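
   For example, for a DAX labeled blackdiamond, the generated
   kickstart invocation would include something like the following
   (illustrative; other arguments omitted):

   kickstart -L blackdiamond <executable> <arguments>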

9) show-job works on the Mac OS X platform

   $PEGASUS_HOME/contrib/showlog/show-job no longer fails when the
   convert program is unavailable. It only logs a warning and creates
   the EPS file, but not the PNG files. This allows show-job to be run
   on Mac OS X systems.

10) Enabling InPlace cleanup in deferred planning

    By default, cleanup is turned off in the case of deferred
    planning, as the cleanup algorithm does not work across
    partitions. However, in scenarios where the partitions themselves
    are independent (i.e. don't share files), users can safely turn on
    cleanup.

    This can now be done by setting
    pegasus.file.cleanup.scope  deferred

    If the property is set to deferred and users want to disable
    cleanup, they can still specify the --nocleanup option on the
    command line, and it is honored.

    However, in the case of scope fullahead for deferred planning, the
    command line options are ignored and nocleanup is always set.
    
11) New Pegasus Job Classad
    
    Pegasus now publishes a job runtime classad with the jobs. The
    classad key name is pegasus_job_runtime. The value passed to it is
    picked up from the Pegasus profile key runtime. If the Pegasus
    profile is not associated, then the globus profile key maxwalltime
    is used. If neither is set, then a value of zero is published.

    This job classad can be used by users in the case of glideins, to
    ensure that jobs complete before the nodes expire.

    For the Corral glidein service, the sub expression added to the
    job requirements would look something like this

    (CorralTimeLeft > MY.pegasus_job_runtime)
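
    For the classad to carry a meaningful value, the runtime can be
    associated with a job via the Pegasus profile, e.g. in the DAX
    (7200 is a hypothetical runtime value):

    <profile namespace="pegasus" key="runtime">7200</profile>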

12) [workflow].job.map file

    Pegasus now creates a [workflow].job.map file that links jobs in
    the DAG with the jobs in the DAX. The contents of the file are in
    netlogger format. 

    The [workflow] is replaced by the name of the workflow, i.e. the
    same prefix as the .dag file.
    
    In the file there are two types of events:
    a) pegasus.job
    b) pegasus.job.map

    pegasus.job - This event is logged for all the jobs in the DAG.
    The following information is associated with this event.

    - job.id     the id of the job in the DAG
    - job.class  an integer designating the type of the job
    - job.xform  the logical transformation to which the job refers
    - task.count the number of tasks associated with the job. This is
      equal to the number of pegasus.job.map events created for that
      job.

    pegasus.job.map - This event allows us to associate a job in the
    DAG with the jobs in the DAX. The following information is
    associated with this event.

    - task.id    the id of the corresponding job in the DAX
    - task.class an integer designating the type of the task
    - task.xform the logical transformation to which the task refers
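
    An illustrative sketch of what entries in the file might look like
    (values and exact netlogger formatting are hypothetical):

    ts=2009-03-01T12:00:00Z event=pegasus.job job.id=merge_ID000001 \
      job.class=1 job.xform=example::merge task.count=1
    ts=2009-03-01T12:00:00Z event=pegasus.job.map job.id=merge_ID000001 \
      task.id=ID000001 task.class=1 task.xform=example::merge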


13) Source Directory for Worker Package Staging

    Users can now set the property
    pegasus.transfer.setup.source.base.url to specify the URL of the
    source directory containing the Pegasus worker packages. If it is
    not specified, then the worker packages are pulled from the HTTP
    server at pegasus.isi.edu during the staging of executables.
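
    For example (hypothetical URL):

    pegasus.transfer.setup.source.base.url  gsiftp://server.example.edu/worker-packages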


BUGS FIXED
----------
1)  Critical Bug Fix to rc-client

    SCEC reported a bug with the rc-client while doing bulk inserts
    into RLS. The bug was related to how logging is initialized
    internally in the client. 

    Details of the bug fix can be found at
    http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=38

2)  Bug Fix to tailstatd for parsing jobnames with . in them

    There was a bug where tailstatd incorrectly generated events in
    the jobstate.log while parsing condor logs. This was due to an
    erroneous regex expression for determining the
    POST|PRE SCRIPT STARTED events.

    The earlier expression did not allow for . in jobnames. This is
    especially prevalent in LIGO workflows, where the DAX labels have
    . in them.

    An example of the problem line in the DAGMan log:
    1/24 10:11:21 Running POST script of Node
    inspiral_hipe_eobinj_cat2_veto.EOBINJ_CAT_2_VETO.daxlalapps_sire_ID000731...

    Earlier the job id was parsed as inspiral_hipe_eobinj_cat2_veto
    instead of
    inspiral_hipe_eobinj_cat2_veto.EOBINJ_CAT_2_VETO.daxlalapps_sire_ID000731 
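
    A sketch of the underlying regex issue (tailstatd itself is Perl;
    the patterns below are illustrative, not the actual tailstatd
    code):

    import re

    line = ("1/24 10:11:21 Running POST script of Node "
            "inspiral_hipe_eobinj_cat2_veto.EOBINJ_CAT_2_VETO."
            "daxlalapps_sire_ID000731...")

    # a \w-only pattern stops at the first '.' in the job name
    old = re.search(r'Running (POST|PRE) script of Node (\w+)', line)
    print(old.group(2))   # inspiral_hipe_eobinj_cat2_veto

    # allowing '.' inside the name (and stripping DAGMan's trailing
    # '...') recovers the full job name
    new = re.search(r'Running (POST|PRE) script of Node ([\w.]+?)\.\.\.',
                    line)
    print(new.group(2))   # full job name including the dots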

3) Pegasus Builds on FC10

   Earlier, the Pegasus builds failed on FC10 as the invoke C tool did
   not build correctly. This is now fixed.

   Details at
   http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=41

4) tailstatd killing jobs by detecting starvation

   tailstatd removes a job after four hours when the job has been
   waiting in the queue WITHOUT being marked as EXECUTE in the condor
   log. To override this, tailstatd has an option of setting the
   starvation time to 0 via the command line or via the
   pegasus.max.idletime property. The if condition in the perl script
   was not accepting 0 as a value when trying to override the default
   4 hour starvation time. This fix allows the value to be set to 0
   (turning off starvation checks) or any other value via the property
   pegasus.max.idletime.

   This was tracked in pegasus jira as bug 40
   http://pegasus.isi.edu/jira/browse/PM-40

Documentation
--------------

1) User Guides
   The release has new user guides on the following topics
   - Pegasus Job Clustering
   - Pegasus Profiles
   - Pegasus Replica Selection

   The guides are checked in at $PEGASUS_HOME/doc/guides

   They can be found online at
   http://pegasus.isi.edu/mapper/doc.php
   
2) The properties document was updated with the new properties
   introduced in this release.