Chapter 14. Submit Directory Details

This chapter describes the contents of the submit directory after Pegasus has planned a workflow. Pegasus takes in an abstract workflow (DAX) and generates an executable workflow (DAG) in the submit directory.

This document also describes the various Replica Selection Strategies in Pegasus.

14.1. Layout

Each executable workflow is associated with a submit directory, which includes the following:

  1. <daxlabel-daxindex>.dag

    This is the Condor DAGMan dag file corresponding to the executable workflow generated by Pegasus. The dag file describes the edges in the DAG and information about the jobs in the DAG. A Pegasus-generated .dag file usually contains the following information for each job:

    1. The job submit file for each job in the DAG.

    2. The post script that is to be invoked when a job completes. This is usually located at $PEGASUS_HOME/bin/exitpost; it parses the kickstart record in the job's .out file and determines the exitcode.

    3. JOB RETRY - the number of times the job is to be retried in case of failure. In Pegasus, the job postscript exits with a non-zero exitcode if it determines that a failure occurred.

  2. <daxlabel-daxindex>.dag.dagman.out

    When a DAG (.dag file) is executed by Condor DAGMan, DAGMan writes its output to the <daxlabel-daxindex>.dag.dagman.out file. This file records the progress of the workflow and can be used to determine the workflow's status. Most Pegasus tools mine the dagman.out or jobstate.log file to determine the progress of workflows.

  3. <daxlabel-daxindex>.static.bp

    This file contains netlogger events that link jobs in the DAG with the jobs in the DAX. It is parsed by pegasus-monitord when a workflow starts, and its contents are populated to the stampede backend.

  4. <daxlabel-daxindex>.notify

    This file contains all the notifications that need to be set for the workflow and the jobs in the executable workflow. The format of the notify file is described here.

  5. <daxlabel-daxindex>.replica.store

    This is a file-based replica catalog that lists only the file locations mentioned in the DAX.

  6. <daxlabel-daxindex>.dot

    Pegasus creates a dot file for the executable workflow in addition to the .dag file. This can be used to visualize the executable workflow using the dot program.

  7. <job>.sub

    Each job in the executable workflow is associated with its own submit file. The submit file tells Condor how to execute the job.

  8. <job>.out.00n

    The stdout of the executable referred to in the job submit file. In Pegasus, most jobs are launched via kickstart. Hence, this file contains the kickstart XML provenance record that captures runtime provenance on the remote node where the job was executed. n varies from 1 to N, where N is the JOB RETRY value in the .dag file. The exitpost executable is invoked on the <job>.out file, and it moves <job>.out to <job>.out.00n so that the job's .out files are preserved across retries.

  9. <job>.err.00n

    The stderr of the executable referred to in the job submit file. In Pegasus, most jobs are launched via kickstart. Hence, this file contains the stderr of kickstart. It is usually empty unless there is an error in kickstart, e.g. kickstart segfaults or the kickstart location specified in the submit file is incorrect. The exitpost executable is invoked on the <job>.out file, and it moves <job>.err to <job>.err.00n so that the job's .err files are preserved across retries.

  10. jobstate.log

    The jobstate.log file is written out by the pegasus-monitord daemon that is launched when a workflow is submitted for execution by pegasus-run. The pegasus-monitord daemon parses the dagman.out file and writes out the jobstate.log that is easier to parse. The jobstate.log captures the various states through which a job goes during the workflow. There are other monitoring related files that are explained in the monitoring chapter.

  11. braindump.txt

    Contains information about the Pegasus version, DAX file, DAG file, and DAX label.
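
The per-job information that item 1 lists for the .dag file (submit file, post script, retry count) and the DAG edges can be illustrated with a small sketch. The Python code below reads a hand-written fragment that uses the standard DAGMan keywords JOB, SCRIPT POST, RETRY and PARENT ... CHILD. The job names, the exitpost path and the helper name parse_dag are hypothetical and chosen only for illustration; a .dag file actually generated by Pegasus may contain additional keywords that this sketch simply ignores.

```python
# Hypothetical .dag fragment: job names and paths are illustrative only,
# not copied from a real Pegasus submit directory.
SAMPLE_DAG = """\
JOB preprocess_ID000001 preprocess_ID000001.sub
SCRIPT POST preprocess_ID000001 /path/to/pegasus/bin/exitpost preprocess_ID000001.out
RETRY preprocess_ID000001 3
JOB analyze_ID000002 analyze_ID000002.sub
SCRIPT POST analyze_ID000002 /path/to/pegasus/bin/exitpost analyze_ID000002.out
RETRY analyze_ID000002 3
PARENT preprocess_ID000001 CHILD analyze_ID000002
"""


def parse_dag(text):
    """Collect per-job settings and parent/child edges from DAG file text.

    Returns (jobs, edges): jobs maps a job name to a dict with the keys
    'submit_file', 'post_script' and 'retries' (when present); edges is a
    list of (parent, child) tuples. Unrecognized lines are skipped.
    """
    jobs = {}
    edges = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = line.split()
        keyword = tokens[0].upper()
        if keyword == "JOB" and len(tokens) >= 3:
            # JOB <name> <submit file>
            jobs.setdefault(tokens[1], {})["submit_file"] = tokens[2]
        elif keyword == "SCRIPT" and len(tokens) >= 4 and tokens[1].upper() == "POST":
            # SCRIPT POST <name> <executable> [arguments...]
            jobs.setdefault(tokens[2], {})["post_script"] = tokens[3:]
        elif keyword == "RETRY" and len(tokens) >= 3:
            # RETRY <name> <number of retries>
            jobs.setdefault(tokens[1], {})["retries"] = int(tokens[2])
        elif keyword == "PARENT":
            # PARENT <p1> [<p2> ...] CHILD <c1> [<c2> ...]
            upper = [t.upper() for t in tokens]
            if "CHILD" in upper:
                split = upper.index("CHILD")
                parents = tokens[1:split]
                children = tokens[split + 1:]
                edges.extend((p, c) for p in parents for c in children)
    return jobs, edges


if __name__ == "__main__":
    jobs, edges = parse_dag(SAMPLE_DAG)
    for name, info in jobs.items():
        print(name, info)
    print("edges:", edges)
```

To try the sketch against a real submit directory, the contents of the <daxlabel-daxindex>.dag file can be read into a string and passed to parse_dag in place of SAMPLE_DAG.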