By default, Pegasus stages data from the submit host (localhost), but some data configurations require remote jobs to perform data staging themselves. An example of such a case is running in nonsharedfs mode. Depending on the transfer protocols used, the job may have to carry credentials to enable these data transfers. To specify which credential to use and where Pegasus can find it, use profiles in your site catalog. The supported credential types are X.509 grid proxies, Amazon AWS S3 keys, Google Cloud Platform OAuth tokens (.boto file), iRODS passwords, and SSH keys.
Credentials are usually associated with a site in the site catalog. Users can associate a credential with a site either as a Pegasus profile or as an environment profile.
A Pegasus profile has a value pointing to the path of the credential on the local site, i.e. the submit host. If a Pegasus credential profile is associated with the site, Pegasus automatically transfers the credential along with the remote jobs.
An env profile has a value pointing to the path of the credential on the remote site. If an env profile is specified, no credential is transferred along with the job. Instead, the job's environment is set to ensure that the job picks up the path to the credential on the remote site.
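As a sketch, a site catalog entry could associate an S3 credential in either style. The site handle and paths below are hypothetical, and in practice you would pick one of the two profiles, not both:

```xml
<!-- Hypothetical site entry; handle and paths are examples only. -->
<site handle="example-site" arch="x86_64" os="LINUX">
    <!-- Pegasus profile: the credential lives on the submit host and is
         transferred along with the remote jobs. -->
    <profile namespace="pegasus" key="S3CFG">/home/user/.s3cfg</profile>
    <!-- Env profile: the credential already exists on the remote site at
         this path; nothing is transferred. -->
    <profile namespace="env" key="S3CFG">/remote/home/user/.s3cfg</profile>
</site>
```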
Tip
Specifying credentials as Pegasus profiles was introduced in the 4.4.0 release.
For data transfer jobs, it is possible to associate a different credential with each end of a single file transfer (one for the source server and another for the destination server). An example is GridFTP transfers between two sites that accept different grid credentials, such as the XSEDE Stampede site and NCSA Blue Waters. In that case, Pegasus picks up the credentials from the site catalog entries for the source and destination sites of the transfer.
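For instance, each site's entry could carry its own proxy profile; the site handles and proxy paths here are hypothetical:

```xml
<!-- Hypothetical handles and paths: each site names the proxy valid for it. -->
<site handle="stampede">
    <profile namespace="pegasus" key="X509_USER_PROXY">/home/user/xsede.proxy</profile>
</site>
<site handle="bluewaters">
    <profile namespace="pegasus" key="X509_USER_PROXY">/home/user/ncsa.proxy</profile>
</site>
```

A transfer from the first site to the second would then carry the first proxy for the source server and the second proxy for the destination server.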
If transfer jobs require a grid proxy and the proxy is in the standard location, Pegasus picks it up automatically. For non-standard proxy locations, use the X509_USER_PROXY profile. Site catalog example:
<profile namespace="pegasus" key="X509_USER_PROXY" >/some/location/x509up</profile>
If a workflow is using s3 URLs, Pegasus has to be told where to find the .s3cfg file. The format of this file is described in the pegasus-s3 command line client's man page. For the file to be picked up by the workflow, set the S3CFG profile to the location of the file. Site catalog example:
<profile namespace="pegasus" key="S3CFG" >/home/user/.s3cfg</profile>
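A minimal .s3cfg could look like the following sketch; the section names, endpoint, and key values are illustrative only, and the pegasus-s3 man page is the authoritative reference for the format:

```ini
; Hypothetical pegasus-s3 configuration sketch
[amazon]
endpoint = https://s3.amazonaws.com/

; identity section: <user>@<site>
[user@amazon]
access_key = AKIAEXAMPLEKEY
secret_key = exampleSecretKey123
```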
If a workflow is using gs:// URLs, Pegasus needs access to a Google Storage service account. First generate the credential by following the instructions at:
https://cloud.google.com/storage/docs/authentication#service_accounts
Download the credential in PKCS12 format, and then use "gsutil config -e" to generate a .boto file. For example:
$ gsutil config -e
This command will create a boto config file at /home/username/.boto containing
your credentials, based on your responses to the following questions.
What is your service account email address? some-identifier@developer.gserviceaccount.com
What is the full path to your private key file? /home/username/my-cred.p12
What is the password for your service key file [if you haven't set one
explicitly, leave this line blank]?

Please navigate your browser to https://cloud.google.com/console#/project,
then find the project you will use, and copy the Project ID string from the
second column. Older projects do not have Project ID strings. For such projects,
click the project and then copy the Project Number listed under that project.
What is your project-id? your-project-id

Boto config file "/home/username/.boto" created.
If you need to use a proxy to access the Internet please see the instructions
in that file.
Pegasus has to be told where to find both the .boto file and the PKCS12 file. For the files to be picked up by the workflow, set the BOTO_CONFIG and GOOGLE_PKCS12 profiles for the storage site. Site catalog example:
<profile namespace="pegasus" key="BOTO_CONFIG" >/home/user/.boto</profile>
<profile namespace="pegasus" key="GOOGLE_PKCS12" >/home/user/.google-service-account.p12</profile>
If a workflow is using iRODS URLs, Pegasus has to be given an irodsEnv file. It is a standard file, with the addition of a password attribute. Example when using iRODS 3.x:
# iRODS personal configuration file.
#
# iRODS server host name:
irodsHost 'some.host.edu'
# iRODS server port number:
irodsPort 1259
# Account name:
irodsUserName 'someuser'
# Zone:
irodsZone 'somezone'
# this is used with Pegasus
irodsPassword 'somesecretpassword'
iRODS 4.0 switched to a JSON-based configuration file. Pegasus can handle either format. JSON example:
{
    "irods_host": "some.host.edu",
    "irods_port": 1247,
    "irods_user_name": "someuser",
    "irods_zone_name": "somezone",
    "irodsPassword" : "somesecretpassword"
}
The location of the file can be given to the workflow using the irodsEnvFile profile. Site catalog example:
<profile namespace="pegasus" key="irodsEnvFile" >/home/user/.irods/.irodsEnv</profile>
New in Pegasus 4.0 is support for data staging with scp using SSH public/private key authentication. In this mode, Pegasus transports a private key with the jobs. The storage machines have to have the public part of the key listed in ~/.ssh/authorized_keys.
Warning
SSH keys should be handled in a secure manner. To keep your personal SSH keys secure, it is recommended that a special set of keys be created for use with the workflow only. Note that Pegasus will not pick up SSH keys automatically; the user has to specify which key to use with SSH_PRIVATE_KEY.
The location of the SSH private key can be specified with the SSH_PRIVATE_KEY profile. Site catalog example:
<profile namespace="pegasus" key="SSH_PRIVATE_KEY" >/home/user/wf/wfsshkey</profile>
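A dedicated, workflow-only keypair can be created with ssh-keygen; the path below matches the example profile and is, of course, arbitrary:

```shell
# Generate a passphrase-less keypair used only for workflow transfers
# (the workflow cannot answer a passphrase prompt on the remote side).
mkdir -p "$HOME/wf"
ssh-keygen -t rsa -b 4096 -N "" -f "$HOME/wf/wfsshkey"
# Then append the public half to ~/.ssh/authorized_keys on each storage
# host, for example with:
#   ssh-copy-id -i "$HOME/wf/wfsshkey.pub" user@storagehost
```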