With Pegasus 4.0, an effort has been made to make the Pegasus installation FHS compliant, and to make workflows run better in cloud environments and distributed grid environments. This chapter is for existing users who run their workflows with Pegasus 3.1, and walks through the steps to move to Pegasus 4.0.
Pegasus 4.0 is the first release of Pegasus that is compliant with the Filesystem Hierarchy Standard (FHS). The native packages no longer install under /opt. Instead, the pegasus-* binaries are in /usr/bin/, and example workflows can be found under /usr/share/pegasus/examples/.
To find Pegasus system components, a pegasus-config tool is provided. pegasus-config supports setting up the environment for the various language APIs and tools that ship with Pegasus.
For example, to find the PYTHONPATH for the DAX API, run:
export PYTHONPATH=`pegasus-config --python`
For a complete description of pegasus-config, see the man page.
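For instance, a submit host setup script can use pegasus-config to populate the environment for the Python and Perl DAX APIs. The --python and --perl options used below are a sketch based on the PYTHONPATH example above; verify the exact option names against the man page of your installed version:

```shell
# populate API search paths using pegasus-config
export PYTHONPATH=`pegasus-config --python`
export PERL5LIB=`pegasus-config --perl`
```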
Starting with Pegasus 4.x, the monitoring and statistics database schema has changed. If you want to run pegasus-statistics, pegasus-analyzer, or pegasus-plots against a 3.x database, you will need to upgrade the schema first using the schema upgrade tool /usr/share/pegasus/sql/schema_tool.py (or /path/to/pegasus-4.x/share/pegasus/sql/schema_tool.py).
Upgrading the schema is required for users who store their monitoring information in a MySQL database that was set up with the 3.x monitoring tools.
If your setup uses the default SQLite database, then new databases created by Pegasus 4.x runs automatically have the correct schema. In that case you only need to upgrade the SQLite databases from older runs if you wish to query them with the newer clients.
To upgrade the database:

For a SQLite database, change to the workflow directory containing the 3.x monitord database:

    cd /to/the/workflow/directory/with/3.x.monitord.db

Check the database version:

    /usr/share/pegasus/sql/schema_tool.py -c connString=sqlite:////to/the/workflow/directory/with/workflow.stampede.db
    2012-02-29T01:29:43.330476Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.init |
    2012-02-29T01:29:43.330708Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.check_schema.start |
    2012-02-29T01:29:43.348995Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Current version set to: 3.1.
    2012-02-29T01:29:43.349133Z ERROR netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Schema version 3.1 found - expecting 4.0 - database admin will need to run upgrade tool.

Convert the database to be version 4.x compliant:

    /usr/share/pegasus/sql/schema_tool.py -u connString=sqlite:////to/the/workflow/directory/with/workflow.stampede.db
    2012-02-29T01:35:35.046317Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.init |
    2012-02-29T01:35:35.046554Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.check_schema.start |
    2012-02-29T01:35:35.064762Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Current version set to: 3.1.
    2012-02-29T01:35:35.064902Z ERROR netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Schema version 3.1 found - expecting 4.0 - database admin will need to run upgrade tool.
    2012-02-29T01:35:35.065001Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.upgrade_to_4_0 | Upgrading to schema version 4.0.

Verify that the database has been converted to version 4.x:

    /usr/share/pegasus/sql/schema_tool.py -c connString=sqlite:////to/the/workflow/directory/with/workflow.stampede.db
    2012-02-29T01:39:17.218902Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.init |
    2012-02-29T01:39:17.219141Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.check_schema.start |
    2012-02-29T01:39:17.237492Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Current version set to: 4.0.
    2012-02-29T01:39:17.237624Z INFO netlogger.analysis.schema.schema_check.SchemaCheck.check_schema | Schema up to date.

For upgrading a MySQL database, the steps remain the same. The only thing that changes is the connection string to the database, e.g.:

    /usr/share/pegasus/sql/schema_tool.py -u connString=mysql://username:password@server:port/dbname
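Conceptually, the schema tool compares a version number recorded inside the database against the version the new clients expect, and rewrites it (along with the tables) during an upgrade. The sketch below illustrates that check/upgrade cycle with Python's sqlite3 module; the schema_info table and version column are hypothetical placeholders, not the actual stampede schema:

```python
import sqlite3

EXPECTED_VERSION = "4.0"

def check_schema(conn):
    """Return the schema version recorded in the database."""
    row = conn.execute("SELECT version FROM schema_info").fetchone()
    return row[0]

def upgrade_schema(conn):
    """Record the new version (a real tool would also alter tables)."""
    conn.execute("UPDATE schema_info SET version = ?", (EXPECTED_VERSION,))
    conn.commit()

# Simulate a 3.1-era monitoring database in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_info (version TEXT)")
conn.execute("INSERT INTO schema_info VALUES ('3.1')")

assert check_schema(conn) == "3.1"             # admin must run the upgrade tool
upgrade_schema(conn)
assert check_schema(conn) == EXPECTED_VERSION  # schema up to date
```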
After the database has been upgraded, you can use either the 3.x or 4.x clients to query it with pegasus-statistics, pegasus-plots, and pegasus-analyzer.
Existing users running workflows in a cloud environment without a shared filesystem have had to resort to some trickery in the site catalog, inserting placeholders with local/submit host paths into the execution site entries when using Condor IO. In Pegasus 4.0, this has been rectified.
For example, in 3.1, to run on a local Condor pool without a shared filesystem and use Condor file IO for file transfers, the site entry looked something like this:
<site handle="local-condor" arch="x86" os="LINUX">
    <grid type="gt2" contact="localhost/jobmanager-fork" scheduler="Fork" jobtype="auxillary"/>
    <grid type="gt2" contact="localhost/jobmanager-condor" scheduler="unknown" jobtype="compute"/>
    <head-fs>
        <!-- the paths for the scratch filesystem are paths on the local site,
             as we execute the create dir job on the local site.
             Improvements planned for the 4.0 release. -->
        <scratch>
            <shared>
                <file-server protocol="file" url="file:///" mount-point="/submit-host/scratch"/>
                <internal-mount-point mount-point="/submit-host/scratch"/>
            </shared>
        </scratch>
        <storage>
            <shared>
                <file-server protocol="file" url="file:///" mount-point="/glusterfs/scratch"/>
                <internal-mount-point mount-point="/glusterfs/scratch"/>
            </shared>
        </storage>
    </head-fs>
    <replica-catalog type="LRC" url="rlsn://dummyValue.url.edu"/>
    <profile namespace="env" key="PEGASUS_HOME">/cluster-software/pegasus/2.4.1</profile>
    <profile namespace="env" key="GLOBUS_LOCATION">/cluster-software/globus/5.0.1</profile>
    <!-- profiles for the site to be treated as a condor pool -->
    <profile namespace="pegasus" key="style">condor</profile>
    <profile namespace="condor" key="universe">vanilla</profile>
    <!-- to enable kickstart staging from the local site -->
    <profile namespace="condor" key="transfer_executable">true</profile>
</site>
With Pegasus 4.0, the site entry for a local Condor pool can be as concise as the following:
<site handle="condorpool" arch="x86" os="LINUX">
    <head-fs>
        <scratch/>
        <storage/>
    </head-fs>
    <profile namespace="pegasus" key="style">condor</profile>
    <profile namespace="condor" key="universe">vanilla</profile>
</site>
The planner in 4.0 correctly picks up the paths from the local site entry to determine the staging location for Condor IO on the submit host.
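In 4.0 the choice of data staging approach is driven by configuration rather than site catalog placeholders. To the best of our understanding, for a Condor IO setup like the one above the relevant setting in the properties file is:

```properties
# stage data to/from the condorpool site using Condor file transfers
pegasus.data.configuration = condorio
```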
Users should read the Pegasus data staging configuration chapter and also look at the examples directory (share/pegasus/examples).