7.12. XSEDE

The Extreme Science and Engineering Discovery Environment (XSEDE) provides a set of High Performance Computing (HPC) and High Throughput Computing (HTC) resources.

For the HPC resources, it is recommended to run using Globus GRAM or glideins. Most of these resources have fast parallel file systems, so running with sharedfs data staging is recommended. Below are an example site catalog and pegasusrc for running on SDSC Trestles:

<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd"
             version="4.0">
      
    <site  handle="local" arch="x86_64" os="LINUX">
        <directory type="shared-scratch" path="/tmp/wf/work">
            <file-server operation="all" url="file:///tmp/wf/work"/>
        </directory>
        <directory type="local-storage" path="/tmp/wf/storage">
            <file-server operation="all" url="file:///tmp/wf/storage"/>
        </directory>
    </site>

    <site handle="Trestles" arch="x86_64" os="LINUX">
       <grid type="gt5" contact="trestles.sdsc.edu:2119/jobmanager-fork" scheduler="PBS" jobtype="auxillary"/>
       <grid type="gt5" contact="trestles.sdsc.edu:2119/jobmanager-pbs" scheduler="PBS" jobtype="compute"/>
       <directory type="shared-scratch" path="/phase1/USERNAME">
           <file-server operation="all" url="gsiftp://trestles-dm1.sdsc.edu/phase1/USERNAME"/>
       </directory>
    </site>

</sitecatalog>

pegasusrc:

pegasus.catalog.replica=SimpleFile
pegasus.catalog.replica.file=rc

pegasus.catalog.site.file=sites.xml

pegasus.catalog.transformation=Text
pegasus.catalog.transformation.file=tc

pegasus.data.configuration = sharedfs

# Pegasus might not be installed, or be of a different version
# so stage the worker package
pegasus.transfer.worker.package = true

The HTC resources available on XSEDE are all HTCondor based, so a standard HTCondor pool setup will work fine.
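As a rough illustration of an HTCondor pool entry in the site catalog, the fragment below is a minimal sketch, not a configuration taken from any XSEDE resource. The site handle `condorpool` is a hypothetical name, and the entry assumes jobs are submitted from a host that is already part of the pool and run in the vanilla universe. It would sit alongside the other site elements inside the sitecatalog element.

```xml
<!-- Hypothetical site handle; rename to match your own pool. -->
<site handle="condorpool" arch="x86_64" os="LINUX">
    <!-- Submit jobs as plain HTCondor jobs rather than through a grid gateway. -->
    <profile namespace="pegasus" key="style">condor</profile>
    <!-- Assumption: the pool accepts vanilla universe jobs. -->
    <profile namespace="condor" key="universe">vanilla</profile>
</site>
```

Whether this entry also needs a shared-scratch directory depends on the data staging configuration chosen for the pool, so treat the fragment as a starting point only.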

If you need to run high throughput workloads on the HPC machines (for example, post-processing after a large parallel job), glideins can be useful, as they are a more efficient way to run small jobs on these systems.