The jobstate.log file logs the various states that a job goes through during workflow execution. It is created by the pegasus-monitord daemon that is launched when a workflow is submitted to Condor DAGMan by pegasus-run. pegasus-monitord parses the dagman.out file and writes out the jobstate.log file, the format of which is more amenable to parsing.
The jobstate.log file is not created if a user uses condor_submit_dag to submit a workflow to Condor DAGMan.
The jobstate.log file can be created after a workflow has finished executing by running pegasus-monitord on the .dagman.out file in the workflow submit directory.
Below is a snippet from the jobstate.log for a single job executed via Condor-G:
1239666049 create_dir_blackdiamond_0_isi_viz SUBMIT 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz EXECUTE 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz GLOBUS_SUBMIT 3758.0 isi_viz - 1
1239666059 create_dir_blackdiamond_0_isi_viz GRID_SUBMIT 3758.0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz JOB_TERMINATED 3758.0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz JOB_SUCCESS 0 isi_viz - 1
1239666064 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_STARTED - isi_viz - 1
1239666069 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_TERMINATED 3758.0 isi_viz - 1
1239666069 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_SUCCESS - isi_viz - 1
Each entry in jobstate.log has the following:
The timestamp, in seconds since the Unix epoch, at which the particular event happened.
The name of the job.
The event recorded by DAGMan for the job.
The condor id of the job in the queue on the submit node.
The pegasus site to which the job is mapped.
The job time requirements from the submit file.
The job submit sequence for this workflow.
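The field list above can be turned into a small parser. The following is a minimal sketch, not part of Pegasus: the names `JobStateEntry` and `parse_entry` are hypothetical, and it assumes that every line carries exactly the seven whitespace-separated fields listed above, with `-` marking a field that has no value. Lines with any other layout are rejected.

```python
from typing import NamedTuple, Optional


class JobStateEntry(NamedTuple):
    """One jobstate.log entry (hypothetical helper type, not a Pegasus API)."""
    timestamp: int                     # integer timestamp as it appears in the log
    job_name: str
    event: str
    condor_id: Optional[str]           # kept as text; None when the log shows "-"
    site: str
    time_requirements: Optional[str]   # None when the log shows "-"
    submit_sequence: int


def parse_entry(line: str) -> JobStateEntry:
    """Split one line into the seven fields described above."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    timestamp, job_name, event, condor_id, site, time_req, sequence = fields
    return JobStateEntry(
        timestamp=int(timestamp),
        job_name=job_name,
        event=event,
        condor_id=None if condor_id == "-" else condor_id,
        site=site,
        time_requirements=None if time_req == "-" else time_req,
        submit_sequence=int(sequence),
    )


entry = parse_entry(
    "1239666049 create_dir_blackdiamond_0_isi_viz SUBMIT 3758.0 isi_viz - 1"
)
print(entry.event, entry.condor_id, entry.site)
```

The condor id is kept as a string because, for JOB_SUCCESS and JOB_FAILURE entries, the same column carries an exit code rather than a queue id.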
Table 14.1. The job lifecycle when executed as part of the workflow
|Event|Description|
|---|---|
|SUBMIT|The job is submitted by the condor schedd for execution.|
|EXECUTE|The condor schedd detects that the job has started execution.|
|GLOBUS_SUBMIT|The job has been submitted to the remote resource. This event is only written for GRAM jobs (i.e. gt2 and gt4).|
|GRID_SUBMIT|Same as the GLOBUS_SUBMIT event. The ULOG_GRID_SUBMIT event is written for all grid universe jobs.|
|JOB_TERMINATED|The job terminated on the remote node.|
|JOB_SUCCESS|The job succeeded on the remote host. The condor id field will be zero (the successful exit code).|
|JOB_FAILURE|The job failed on the remote host. The condor id field will hold the job's exit code.|
|POST_SCRIPT_STARTED|The post script was started by DAGMan on the submit host, usually to parse the kickstart output.|
|POST_SCRIPT_TERMINATED|The post script finished on the submit node.|
|POST_SCRIPT_SUCCESS or POST_SCRIPT_FAILURE|The post script succeeded or failed.|
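As an illustration of how the events in the table can be used, the sketch below derives a coarse outcome and an elapsed time for one job. It is a simplification under stated assumptions, not Pegasus behaviour: the function name `summarize_job` and the status labels are hypothetical, a job counts as succeeded only once POST_SCRIPT_SUCCESS appears, any failure event marks the job as failed, and retries of the same job are not distinguished.

```python
FAILURE_EVENTS = {"JOB_FAILURE", "POST_SCRIPT_FAILURE"}


def summarize_job(lines, job_name):
    """Return (status, elapsed_seconds) for job_name, or None if it never appears.

    Assumption: each relevant line has seven whitespace-separated fields,
    with the timestamp first, the job name second and the event third.
    """
    first_seen = {}  # event name -> timestamp of its first occurrence
    for line in lines:
        fields = line.split()
        if len(fields) != 7 or fields[1] != job_name:
            continue  # skip lines for other jobs or with another layout
        first_seen.setdefault(fields[2], int(fields[0]))

    if not first_seen:
        return None

    if FAILURE_EVENTS.intersection(first_seen):
        status = "failed"
    elif "POST_SCRIPT_SUCCESS" in first_seen:
        status = "succeeded"
    else:
        status = "incomplete"

    # Elapsed time runs from SUBMIT to the latest recorded event.
    submitted = first_seen.get("SUBMIT")
    elapsed = None if submitted is None else max(first_seen.values()) - submitted
    return status, elapsed


sample = [
    "1239666049 create_dir_blackdiamond_0_isi_viz SUBMIT 3758.0 isi_viz - 1",
    "1239666059 create_dir_blackdiamond_0_isi_viz EXECUTE 3758.0 isi_viz - 1",
    "1239666064 create_dir_blackdiamond_0_isi_viz JOB_SUCCESS 0 isi_viz - 1",
    "1239666069 create_dir_blackdiamond_0_isi_viz POST_SCRIPT_SUCCESS - isi_viz - 1",
]
print(summarize_job(sample, "create_dir_blackdiamond_0_isi_viz"))
# -> ('succeeded', 20)
```

Recording only the first occurrence of each event keeps the sketch short; a workflow that retries jobs would need to group entries per attempt before drawing the same conclusions.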
There are other monitoring related files that are explained in the monitoring chapter.