14.4. Jobstate.Log File

The jobstate.log file logs the various states that a job goes through during workflow execution. It is created by the pegasus-monitord daemon that is launched when a workflow is submitted to Condor DAGMan by pegasus-run. pegasus-monitord parses the dagman.out file and writes out the jobstate.log file, the format of which is more amenable to parsing.

Note

The jobstate.log file is not created if a user uses condor_submit_dag to submit a workflow to Condor DAGMan.

The jobstate.log file can be created after a workflow has finished executing by running pegasus-monitord on the .dagman.out file in the workflow submit directory.

Below is a snippet from jobstate.log for a single job executed via Condor-G:

1239666049 create_dir_blackdiamond_0_isi_viz SUBMIT 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz EXECUTE 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz GLOBUS_SUBMIT 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz GRID_SUBMIT 3758.0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz JOB_TERMINATED 3758.0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz JOB_SUCCESS 0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_STARTED - isi_viz - 1
1239666069 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_TERMINATED 3758.0 isi_viz - 1
1239666069 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_SUCCESS - isi_viz - 1

Each entry in jobstate.log has the following:

  1. The timestamp of the event, in seconds since the Unix epoch.

  2. The name of the job.

  3. The event recorded by DAGMan for the job.

  4. The Condor ID of the job in the queue on the submit node. For JOB_SUCCESS and JOB_FAILURE events, this field holds the job's exit code instead.

  5. The Pegasus site to which the job is mapped.

  6. The job time requirements from the submit file.

  7. The job submit sequence for this workflow.
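As a minimal sketch, an entry can be split into these seven fields in Python. It assumes every line has exactly the seven whitespace-separated fields shown in the snippet above; a real log may contain other kinds of lines, which this sketch rejects with a ValueError. The `JobStateEntry` class and `parse_jobstate_line` function are hypothetical names, not part of Pegasus.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class JobStateEntry:
    """One jobstate.log entry, split into the seven documented fields."""
    timestamp: datetime      # field 1: epoch seconds, converted to a UTC datetime
    job_name: str            # field 2
    event: str               # field 3: DAGMan event, e.g. SUBMIT or JOB_SUCCESS
    condor_id: str           # field 4: Condor ID, an exit code, or "-"
    site: str                # field 5: Pegasus site the job is mapped to
    time_requirements: str   # field 6: "-" in the snippet above
    submit_sequence: int     # field 7


def parse_jobstate_line(line: str) -> JobStateEntry:
    """Parse one whitespace-separated jobstate.log line (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    ts, name, event, condor_id, site, time_req, seq = fields
    return JobStateEntry(
        timestamp=datetime.fromtimestamp(int(ts), tz=timezone.utc),
        job_name=name,
        event=event,
        condor_id=condor_id,
        site=site,
        time_requirements=time_req,
        submit_sequence=int(seq),
    )
```

Keeping field 4 as a string avoids guessing its type, since it may hold a Condor ID such as 3758.0, an exit code, or "-".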

Table 14.1. The job lifecycle when executed as part of the workflow

STATE/EVENT              DESCRIPTION
SUBMIT                   The job is submitted by the Condor schedd for execution.
EXECUTE                  The Condor schedd detects that the job has started execution.
GLOBUS_SUBMIT            The job has been submitted to the remote resource. Written only for GRAM jobs (i.e., gt2 and gt4).
GRID_SUBMIT              Same as the GLOBUS_SUBMIT event. The ULOG_GRID_SUBMIT event is written for all grid universe jobs.
JOB_TERMINATED           The job terminated on the remote node.
JOB_SUCCESS              The job succeeded on the remote host; the Condor ID field holds 0 (the successful exit code).
JOB_FAILURE              The job failed on the remote host; the Condor ID field holds the job's exit code.
POST_SCRIPT_STARTED      DAGMan started the post script on the submit host, usually to parse the kickstart output.
POST_SCRIPT_TERMINATED   The post script finished on the submit node.
POST_SCRIPT_SUCCESS      The post script succeeded.
POST_SCRIPT_FAILURE      The post script failed.
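Following the table, a job's outcome can be summarized from its JOB_SUCCESS/JOB_FAILURE and POST_SCRIPT_SUCCESS/POST_SCRIPT_FAILURE entries. The sketch below is a hypothetical helper, not a Pegasus tool. It reads field 4 as the exit code for the two job-level events and records whether the post script succeeded. If a job appears more than once (for example, after a retry), the most recent values seen are kept.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical outcome type: (remote exit code, post script succeeded?).
# None means the corresponding event has not been seen for that job.
Outcome = Tuple[Optional[int], Optional[bool]]


def job_outcomes(lines: Iterable[str]) -> Dict[str, Outcome]:
    """Summarize each job's exit code and post-script result from jobstate.log lines."""
    results: Dict[str, Outcome] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _, job, event, condor_id = fields[:4]
        exit_code, post_ok = results.get(job, (None, None))
        if event in ("JOB_SUCCESS", "JOB_FAILURE"):
            # For these two events, field 4 carries the job's exit code.
            try:
                exit_code = int(condor_id)
            except ValueError:
                exit_code = None  # unexpected value; leave the exit code unknown
        elif event == "POST_SCRIPT_SUCCESS":
            post_ok = True
        elif event == "POST_SCRIPT_FAILURE":
            post_ok = False
        # Keep the most recent values seen for this job.
        results[job] = (exit_code, post_ok)
    return results
```

For the snippet above, this yields (0, True) for create_dir_blackdiamond_0_isi_viz: exit code 0 from JOB_SUCCESS and a successful post script.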

Other monitoring-related files are explained in the monitoring chapter.

14.4.1. Pegasus Workflow Job States and Delays

The various states that a job goes through during its lifecycle (as captured in the dagman.out and jobstate.log files) are illustrated in the figure below, which also highlights the local and remote delays incurred during the job lifecycle.
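The gaps between consecutive jobstate.log entries give a rough view of these delays, at the one-second resolution of the timestamps. The hypothetical Python sketch below computes those gaps per job. It does not attempt to classify a gap as local or remote, and it assumes the entries for each job appear in chronological order.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def state_delays(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str, int]]]:
    """Per job, return (event, next_event, seconds_between) in log order."""
    events: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        timestamp, job, event = int(fields[0]), fields[1], fields[2]
        events[job].append((timestamp, event))

    delays: Dict[str, List[Tuple[str, str, int]]] = {}
    for job, seq in events.items():
        # Pair each entry with the one that follows it for the same job.
        delays[job] = [
            (prev_event, next_event, next_ts - prev_ts)
            for (prev_ts, prev_event), (next_ts, next_event) in zip(seq, seq[1:])
        ]
    return delays
```

Applied to the jobstate.log snippet earlier in this section, the first gap is SUBMIT to EXECUTE (10 seconds), and POST_SCRIPT_STARTED to POST_SCRIPT_TERMINATED takes 5 seconds. Transitions logged within the same second show up as 0.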