4.4. Executable Discovery (Transformation Catalog)

The Transformation Catalog maps logical transformations to physical executables on the system. It also provides additional information about the transformation as to what system they are compiled for, what profiles or environment variables need to be set when the transformation is invoked etc.

Pegasus currently supports a Text formatted Transformation Catalog

  1. Text: A multi line text based Transformation Catalog (DEFAULT)

In this guide we will look at the format of the Multiline Text based TC.

4.4.1. MultiLine Text based TC (Text)

The multile line text based TC is the new default TC in Pegasus. This format allows you to define the transformations

The file is read and cached in memory. Any modifications, as adding or deleting, causes an update of the memory and hence to the file underneath. All queries are done against the memory representation. The file sample.tc.text in the etc directory contains an example

tr example::keg:1.0 { 

#specify profiles that apply for all the sites for the transformation 
#in each site entry the profile can be overriden 

  profile env "APP_HOME" "/tmp/myscratch"
  profile env "JAVA_HOME" "/opt/java/1.6"

  site isi {
    profile env "HELLo" "WORLD"
    profile condor "FOO" "bar"
    profile env "JAVA_HOME" "/bin/java.1.6"
    pfn "/path/to/keg"
    arch "x86"
    os "linux"
    osrelease "fc"
    osversion "4"
    type "INSTALLED"
  }

  site wind {
    profile env "CPATH" "/usr/cpath"
    profile condor "universe" "condor"
    pfn "file:///path/to/keg"
    arch "x86"
    os "linux"
    osrelease "fc"
    osversion "4"
    type "STAGEABLE"
  }
}

The entries in this catalog have the following meaning

  1. tr tr - A transformation identifier. (Normally a Namespace::Name:Version.. The Namespace and Version are optional.)

  2. pfn - URL or file path for the location of the executable. The pfn is a file path if the transformation is of type INSTALLED and generally a url (file:/// or http:// or gridftp://) if of type STAGEABLE

  3. site - The site identifier for the site where the transformation is available

  4. type - The type of transformation. Whether it is installed ("INSTALLED") on the remote site or is availabe to stage ("STAGEABLE").

  5. arch, os, osrelease, osversion - The arch/os/osrelease/osversion of the transformation. osrelease and osversion are optional.

    ARCH can have one of the following values x86, x86_64, sparcv7, sparcv9, ppc, aix. The default value for arch is x86

    OS can have one of the following values linux,sunos,macosx. The default value for OS if none specified is linux

  6. Profiles - One or many profiles can be attached to a transformation for all sites or to a transformation on a particular site.

To use this format of the Transformation Catalog you need to set the following properties

  1. pegasus.catalog.transformation=Text

  2. pegasus.catalog.transformation.file=<path to the transformation catalog file>

4.4.1.1. Containerized Applications in the Transformation Catalog

Users can specify what container they want to use for running their application in the Transformation Catalog using the multi line text based format described in this section. Users can specify an optional attribute named container that refers to the container to be used for the application.

tr example::keg:1.0 { 

  #specify profiles that apply for all the sites for the transformation 
  #in each site entry the profile can be overriden 

  profile env "APP_HOME" "/tmp/myscratch"
  profile env "JAVA_HOME" "/opt/java/1.6"

  site isi {
    # environment to be set when the job is run in the container
    # overrides env profiles specified in the container
    profile env "HELLo" "WORLD"
    profile env "JAVA_HOME" "/bin/java.1.6"
    
    profile condor "FOO" "bar"
    
    pfn "/path/to/keg
    arch "x86"
    os "linux"
    osrelease "fc"
    osversion "4"
      
    # INSTALLED means pfn refers to path in the container.
    # STAGEABLE means the executable can be staged into the container
    type "INSTALLED" 

    #optional attribute to specify the container to use
    container "centos-pegasus"
  }
}

cont centos-pegasus{
     # can be either docker or singularity
     type "docker"

     # URL to image in a docker|singularity hub OR
     # URL to an existing docker image exported as a tar file or singularity image
     image "docker:///rynge/montage:latest" 

     # optional site attribute to tell pegasus which site tar file
     # exists. useful for handling file URL's correctly
     image_site "optional site"

     # mount information to mount host directories into container 
     # format src-dir:dest-dir[:options]
     mount "/Volumes/Work/lfs1:/shared-data/:ro"

     # environment to be set when the job is run in the container
     # only env profiles are supported
     profile env "JAVA_HOME" "/opt/java/1.6"	    
}

The container itself is defined using the cont entry. Multiple transformations can refer to the same container.

  1. cont cont - A container identifier.

  2. image - URL to image in a docker|singularity hub or URL to an existing docker image exported as a tar file or singularity image. Example of a docker hub URL is docker:///rynge/montage:latest, while for singularity shub://pegasus-isi/fedora-montage

  3. image_site - The site identifier for the site where the container is available

  4. mount - mount information to mount host directories into container of format src-dir:dest-dir[:options] . Consult Docker and Singularity documentation for options supported for -v and -B options respectively.

  5. Profiles - One or many profiles can be attached to a transformation for all sites or to a transformation on a particular site. For containers, only env profiles are supported.

Note

Containerized Applications can only be specified in the transformation catalog, not via the DAX API.

4.4.2. TC Client pegasus-tc-client

We need to map our declared transformations (preprocess, findrange, and analyze) from the example DAX above to a simple "mock application" name "keg" ("canonical example for the grid") which reads input files designated by arguments, writes them back onto output files, and produces on STDOUT a summary of where and when it was run. Keg ships with Pegasus in the bin directory. Run keg on the command line to see how it works.

$ keg -o /dev/fd/1

Timestamp Today: 20040624T054607-05:00 (1088073967.418;0.022)
Applicationname: keg @ 10.10.0.11 (VPN)
Current Workdir: /home/unique-name
Systemenvironm.: i686-Linux 2.4.18-3
Processor Info.: 1 x Pentium III (Coppermine) @ 797.425
Output Filename: /dev/fd/1

Now we need to map all 3 transformations onto the "keg" executable. We place these mappings in our File transformation catalog for site clus1.

Note

In earlier version of Pegasus users had to define entries for Pegasus executables such as transfer, replica client, dirmanager, etc on each site as well as site "local". This is no longer required. Pegasus versions 2.0 and later automatically pick up the paths for these binaries from the environment profile PEGASUS_HOME set in the site catalog for each site.

A single entry needs to be on one line. The above example is just formatted for convenience.

Alternatively you can also use the pegasus-tc-client to add entries to any implementation of the transformation catalog. The following example shows the addiition the last entry in the File based transformation catalog.

$ pegasus-tc-client -Dpegasus.catalog.transformation=Text \
-Dpegasus.catalog.transformation.file=$HOME/tc -a -r clus1 -l black::analyze:1.0 \
-p gsiftp://clus1.com/opt/nfs/vdt/pegasus/bin/keg  -t STAGEABLE -s INTEL32::LINUX \
-e ENV::KEY3="VALUE3"

2007.07.11 16:12:03.712 PDT: [INFO] Added tc entry sucessfully

To verify if the entry was correctly added to the transformation catalog you can use the pegasus-tc-client to query.

$ pegasus-tc-client -Dpegasus.catalog.transformation=File \
-Dpegasus.catalog.transformation.file=$HOME/tc -q -P -l black::analyze:1.0

#RESID     LTX          PFN                  TYPE              SYSINFO

clus1    black::analyze:1.0    gsiftp://clus1.com/opt/nfs/vdt/pegasus/bin/keg
                STAGEABLE    INTEL32::LINUX

Note

pegasus-tc-client is no longer actively developed and is deprecated.

4.4.3. TC Converter Client pegasus-tc-converter

Pegasus 3.0 by default now parses a file based multi line textual format of a Transformation Catalog. The new Text format is explained in detail in the chapter on Catalogs.

Pegasus 3.0 comes with a pegasus-tc-converter that will convert users old transformation catalog ( File ) to the Text format. Sample usage is given below.

$ pegasus-tc-converter -i sample.tc.data -I File -o sample.tc.text -O Text

2010.11.22 12:53:16.661 PST:   Successfully converted Transformation Catalog from File to Text 
2010.11.22 12:53:16.666 PST:   The output transfomation catalog is in file  sample.tc.text 

To use the converted transformation catalog, in the properties do the following:

  1. unset pegasus.catalog.transformation or set pegasus.catalog.transformation to Text

  2. point pegasus.catalog.transformation.file to the converted transformation catalog