7.8. Local Campus Cluster Using Glite

This section describes the configuration required for Pegasus to generate an executable workflow that uses glite to submit to a Slurm, PBS, or SGE batch system on a local cluster. This environment is referred to as the local campus cluster, because the workflow submit node (Pegasus + HTCondor) needs to be installed on a login node of the cluster (or another node where the local batch scheduler commands can be executed).

Note

Glite is the old name for BLAH (or BLAHP). BLAH binaries are distributed with HTCondor as the "batch_gahp". For historical reasons, we often use the term "glite", and you will see "glite" and "batch_gahp" references in HTCondor, but all of them refer to the same thing, which has been renamed BLAH.

This guide covers Slurm, PBS, Moab, and SGE, but glite also works with other PBS-like batch systems, including LSF, Cobalt and others. If you need help configuring Pegasus and HTCondor to work with one of these systems, please contact pegasus-support@isi.edu. For the sake of brevity, the text below will say "PBS", but you should read that as "PBS or PBS-like system such as SGE, Moab, LSF, and others".

This setup is required because the glite layer communicates with the batch system running on the cluster using squeue, qsub, or equivalent commands. If you can submit jobs to the local scheduler from the workflow submit host, then the local HTCondor can be used to submit jobs via glite (with some modifications described below). If you need to SSH to a different cluster head node in order to submit jobs to the scheduler, then you can use BOSCO, which is documented in another section.

Tip

There is also a way to do remote job submission via glite even if you cannot SSH to the head node. This might be the case, for example, if the head node requires 2-factor authentication (e.g. RSA tokens). This approach is called the "Reverse GAHP" and you can find out more information on the GitHub page. All it requires is SSH from the cluster head node back to the workflow submit host.

In either case, you need to modify the HTCondor glite installation that will be used to submit jobs to the local scheduler. To do this, run the pegasus-configure-glite command. This command installs all the scripts required to map Pegasus profiles to batch-system-specific job attributes, and adds support for Moab. You may need to run it as root, depending on how you installed HTCondor.
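
For example, a minimal sketch of the two steps (querying GLITE_LOCATION is an assumption about your HTCondor configuration; it may be undefined in some versions, and whether root is needed depends on how HTCondor was installed):

# locate the glite/blahp (batch_gahp) directory that HTCondor uses
condor_config_val GLITE_LOCATION

# install the *_local_submit_attributes.sh mapping scripts
pegasus-configure-glite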

Tip

HTCondor has an issue with the Slurm configuration when running on Ubuntu systems. Because /bin/sh does not link to bash on Ubuntu, the Slurm script fails when it tries to run the source command. A quick fix is to force the script to use bash: in the bls_set_up_local_and_extra_args function of the blah_common_submit_functions.sh script (located in the same folder as the installation above), add bash before the $bls_opt_tmp_req_file >> $bls_tmp_file 2> /dev/null command line.
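
The change, sketched below, is a one-word edit (the surrounding code may differ slightly between HTCondor versions):

# in bls_set_up_local_and_extra_args() of blah_common_submit_functions.sh
# before:
#   $bls_opt_tmp_req_file >> $bls_tmp_file 2> /dev/null
# after, forcing bash:
bash $bls_opt_tmp_req_file >> $bls_tmp_file 2> /dev/null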

To configure a workflow to use glite, create an entry in your site catalog for the cluster and set the following profiles:

  1. pegasus profile style with value set to glite.

  2. condor profile grid_resource with value set to batch slurm, batch pbs, batch sge, or batch moab.

An example site catalog entry for a local glite PBS site looks like this:

<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0">

    <site  handle="local" arch="x86" os="LINUX">
        <directory type="shared-scratch" path="/lfs/local-scratch/glite-sharedfs-example/work">
            <file-server operation="all" url="file:///lfs/local-scratch/glite-sharedfs-example/work"/>
        </directory>
        <directory type="local-storage" path="/lfs/local-scratch/glite-sharedfs-example/outputs">
            <file-server operation="all" url="file:///lfs/local-scratch/glite-sharedfs-example/outputs"/>
        </directory>
    </site>

    <site  handle="local-slurm" arch="x86" os="LINUX">

        <!-- the following is a shared directory shared amongst all the nodes in the cluster -->
        <directory type="shared-scratch" path="/lfs/glite-sharedfs-example/local-slurm/shared-scratch">
            <file-server operation="all" url="file:///lfs/glite-sharedfs-example/local-slurm/shared-scratch"/>
        </directory>

        <profile namespace="env" key="PEGASUS_HOME">/lfs/software/pegasus</profile>

        <profile namespace="pegasus" key="style" >glite</profile>

        <profile namespace="condor" key="grid_resource">batch slurm</profile>
        <profile namespace="pegasus" key="queue">normal</profile>
        <profile namespace="pegasus" key="runtime">30000</profile>
    </site>

</sitecatalog>
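
Once the site catalog entry is in place, you select the glite site at planning time with the --sites option. The sketch below assumes a 4.x-style pegasus-plan invocation with placeholder file names; check pegasus-plan --help for the exact options available in your version:

pegasus-plan --conf pegasus.properties \
             --dax workflow.dax \
             --dir submit \
             --sites local-slurm \
             --submit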

Tip

Starting with Pegasus 4.2.1, the examples directory contains a glite shared filesystem example that you can use to test out this configuration.

You probably don't need to know this, but Pegasus generates a +remote_cerequirements expression for an HTCondor glite job based on the Pegasus profiles associated with the job. This expression is passed to glite and used by the *_local_submit_attributes.sh scripts installed by pegasus-configure-glite to generate the correct batch submit script. An example +remote_cerequirements classad expression in the HTCondor submit file looks like this:

+remote_cerequirements = JOBNAME=="preprocessj1" && PASSENV==1 && WALLTIME=="01:00:00" && \
 EXTRA_ARGUMENTS=="-N testjob -l walltime=01:23:45 -l nodes=2" && \
 MYENV=="CONDOR_JOBID=$(cluster).$(process),PEGASUS_DAG_JOB_ID=preprocess_j1,PEGASUS_HOME=/usr,PEGASUS_WF_UUID=aae14bc4-b2d1-4189-89ca-ccd99e30464f"

The job name and environment variables are automatically passed through to the remote job.

The following sections document the mapping of Pegasus profiles to batch system job requirements as implemented by Pegasus, HTCondor, and glite.

7.8.1. Setting job requirements

The job requirements are constructed based on the following profiles:

Table 7.1. Mapping of Pegasus Profiles to Job Requirements

For each Pegasus profile key, the entry below lists the corresponding key in the +remote_cerequirements expression, the batch system parameter it maps to for SLURM, PBS, SGE, Moab, and Cobalt ("n/a" where there is no mapping), and a description.

pegasus.cores
    Key in +remote_cerequirements: CORES
    SLURM: --ntasks cores; PBS: n/a; SGE: -pe ompi; Moab: n/a; Cobalt: --proccount cores
    Pegasus uses cores to calculate either nodes or ppn. If cores and ppn are specified, then nodes is computed. If cores and nodes are specified, then ppn is computed. If both nodes and ppn are specified, then cores is ignored. The resulting values for nodes and ppn are used to set the job requirements for PBS and Moab. If neither nodes nor ppn is specified, then no requirements are set in the PBS or Moab submit script. For SGE, how the processes are distributed over nodes depends on how the parallel environment has been configured; it is set to 'ompi' by default.

pegasus.nodes
    Key in +remote_cerequirements: NODES
    SLURM: --nodes nodes; PBS: -l nodes; SGE: n/a; Moab: -l nodes; Cobalt: -n nodes
    The number of nodes that the job should use. This is not used for SGE.

pegasus.ppn
    Key in +remote_cerequirements: PROCS
    SLURM: n/a; PBS: -l ppn; SGE: n/a; Moab: -l ppn; Cobalt: --mode c[ppn]
    The number of processors per node that the job should use. This is not used for SGE.

pegasus.runtime
    Key in +remote_cerequirements: WALLTIME
    SLURM: --time walltime; PBS: -l walltime; SGE: -l h_rt; Moab: -l walltime; Cobalt: -t walltime
    The maximum runtime for the job, in seconds. It should be an integer value. Pegasus converts it to the "hh:mm:ss" format required by the batch system, rounding up to the next whole minute.

pegasus.memory
    Key in +remote_cerequirements: PER_PROCESS_MEMORY
    SLURM: --mem memory; PBS: -l pmem; SGE: -l h_vmem; Moab: --mem-per-cpu pmem; Cobalt: n/a
    The maximum amount of physical memory used by any process in the job. For example, if the job runs four processes and each requires up to 2 GB (gigabytes) of memory, then this value should be set to "2gb" for PBS and Moab, and "2G" for SGE. The corresponding PBS directive would be "#PBS -l pmem=2gb".

pegasus.project
    Key in +remote_cerequirements: PROJECT
    SLURM: --account project_name; PBS: -A project_name; SGE: n/a; Moab: -A project_name; Cobalt: -A project_name
    Causes the job time to be charged to, or associated with, a particular project/account. This is not used for SGE.

pegasus.queue
    Key in +remote_cerequirements: QUEUE
    SLURM: --partition; PBS: -q; SGE: -q; Moab: -q; Cobalt: n/a
    The queue for the job. This profile does not have a corresponding value in +remote_cerequirements. Instead, Pegasus sets the batch_queue key in the Condor submit file, which gLite/blahp translates into the appropriate batch system requirement.

globus.totalmemory
    Key in +remote_cerequirements: TOTAL_MEMORY
    SLURM: --mem memory; PBS: -l mem; SGE: n/a; Moab: -l mem; Cobalt: n/a
    The total memory that your job requires. It is usually better to just specify the pegasus.memory profile. This is not mapped for SGE.

pegasus.glite.arguments
    Key in +remote_cerequirements: EXTRA_ARGUMENTS
    SLURM: prefixed by "#SBATCH"; PBS: prefixed by "#PBS"; SGE: prefixed by "#$"; Moab: prefixed by "#MSUB"; Cobalt: n/a
    Extra arguments that must appear in the generated submit script for a job. The value of this profile is added to the submit script prefixed by the batch-system-specific value. These requirements override any requirements specified using other profiles. This is useful when you want to pass through special options to the underlying batch system. For example, on the USC cluster we use resource properties to specify the network type. If you want to use the Myrinet network, you must specify something like "-l nodes=8:ppn=2:myri"; for Infiniband, something like "-l nodes=8:ppn=2:IB". In that case, both the nodes and ppn profiles would be effectively ignored.
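
For illustration only: with hypothetical profiles pegasus.cores set to 16, pegasus.runtime set to 3600, and pegasus.project set to myproject on a Slurm site, the mapping above implies that the preamble of the generated batch script would contain directives along the lines of:

#SBATCH --ntasks=16
#SBATCH --time=01:00:00
#SBATCH --account=myproject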

7.8.2. Specifying a remote directory for the job

gLite/blahp does not honor the remote_initialdir or initialdir classad directives, so jobs with the glite style applied do not have a remote directory specified in the submit script. Instead, Pegasus uses Kickstart to change to the working directory when the job is launched on the remote system.