8.4. Container Execution Model

Users' containerized applications are launched as part of PegasusLite jobs. When a PegasusLite job starts on a remote worker node, it does the following:

  1. Sets up a directory to run a user job in.

  2. Pulls the container image to that directory.

  3. Optionally, loads the container from the container image file and sets up the user that the job runs as inside the container (only applicable for Docker containers).

  4. Mounts the job directory into the container as /scratch for Docker containers and as /srv for Singularity containers.

  5. Runs, inside the container, a job-specific script created by PegasusLite that does the following:

    1. Determines the appropriate Pegasus worker package to use in the container, if one is not already installed.

    2. Sets up the job environment, including the transfer and setup of any credentials transferred as part of PegasusLite.

    3. Pulls in all the relevant input data and executables required by the job.

    4. Launches the user application using pegasus-kickstart.

  6. Optionally, shuts down the container (only applicable for Docker containers).

  7. Ships out the output data to the staging site.

  8. Cleans up the directory on the worker node.
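
The mount step in the list above can be sketched as a pair of small shell helpers. This is an illustration only, not the script that PegasusLite generates: the function names container_mount_point and container_run_command are hypothetical, the way the job script is invoked is an assumption, and the commands are printed rather than executed. The sketch relies on the standard "docker run -v/-w" and "singularity exec --bind/--pwd" options for mounting a host directory and setting the working directory.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; PegasusLite's generated script differs.

# Print the in-container mount point for a given container type.
container_mount_point() {
    case "$1" in
        docker)      echo "/scratch" ;;
        singularity) echo "/srv" ;;
        *)           echo "unsupported container type: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) a command that mounts the job directory into the
# container and starts the job script from that directory.
# Arguments: container type, job directory, image, script name.
container_run_command() {
    ctype=$1
    jobdir=$2
    image=$3
    script=$4
    mnt=$(container_mount_point "$ctype") || return 1
    case "$ctype" in
        docker)
            echo "docker run -v $jobdir:$mnt -w $mnt $image ./$script" ;;
        singularity)
            echo "singularity exec --bind $jobdir:$mnt --pwd $mnt $image ./$script" ;;
    esac
}
```

For example, calling container_run_command with the type singularity, the current directory, an image file, and a script name prints a "singularity exec" line that binds the current directory to /srv.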

Note

Starting with Pegasus 4.9.1, the container data transfer model has changed. Data transfers for a job no longer occur outside the container in the PegasusLite wrapper; they now happen inside the container when the user job starts.

In Pegasus 4.9.1 and later, transfers are handled from within the container, so container recipes require some extra attention. A Dockerfile example that prepares a container for GridFTP transfers is provided below.

In this example there are three sections.

  • Essential Packages

  • Install Globus Toolkit

  • Install CA Certs

From the "Essential Packages" section, python and either curl or wget must be present. The "Install Globus Toolkit" section sets up the environment for GridFTP transfers, and the "Install CA Certs" section copies the grid certificates into the container.

Note

Globus Toolkit introduced some breaking changes to its authentication module in August 2018, and some sites (e.g. NERSC) have not upgraded their installations. To authenticate successfully against such sites, GridFTP requires the libglobus-gssapi-gsi4 package to be pinned to version 13.8-1. The code snippet below contains installation directives that handle this, but they are commented out.

##########################################
#### This Container Supports GridFTP  ####
##########################################

FROM ubuntu:18.04

#### Essential Packages ####
RUN apt-get update &&\
apt-get install -y software-properties-common curl wget python unzip &&\
rm -rf /var/lib/apt/lists/*

#### Install Globus Toolkit ####
RUN wget -nv http://www.globus.org/ftppub/gt6/installers/repo/globus-toolkit-repo_latest_all.deb &&\
dpkg -i globus-toolkit-repo_latest_all.deb &&\
apt-get update &&\
# apt-get install -y libglobus-gssapi-gsi4=13.8-1+gt6.bionic &&\
# apt-mark hold libglobus-gssapi-gsi4 &&\
apt-get install -y globus-data-management-client &&\
rm -f globus-toolkit-repo_latest_all.deb &&\
rm -rf /var/lib/apt/lists/*

#### Install CA Certs ####
RUN mkdir -p /etc/grid-security &&\
cd /etc/grid-security &&\
wget -nv https://download.pegasus.isi.edu/containers/certificates.tar.gz &&\
tar xzf certificates.tar.gz &&\
rm -f certificates.tar.gz

##########################################
#### Your Container Specific Commands ####
##########################################
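
One way to confirm that an image built from a recipe like the one above carries the tools required from the "Essential Packages" section is to run a small check inside the container. The sketch below is an illustration under stated assumptions, not part of Pegasus: the function name check_transfer_tools is hypothetical, and it tests only for the commands named in this section (python, and either curl or wget).

```shell
#!/bin/sh
# Hypothetical check, not part of Pegasus: verifies that python and
# at least one of curl or wget can be found by the shell.
check_transfer_tools() {
    if ! command -v python >/dev/null 2>&1; then
        echo "missing: python" >&2
        return 1
    fi
    if ! command -v curl >/dev/null 2>&1 && ! command -v wget >/dev/null 2>&1; then
        echo "missing: curl or wget" >&2
        return 1
    fi
    echo "ok"
}
```

When called inside a running container, check_transfer_tools prints "ok" if the requirements are met; otherwise it reports the first missing tool on standard error and returns a non-zero status.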