2.5. Workflow Dashboard for Monitoring and Debugging

The Pegasus Dashboard is a web interface for monitoring and debugging workflows. We will use the web dashboard to monitor the status of the split workflow.

If you are following the tutorial on the tutorial VM, the dashboard starts when the VM boots. If you are using your own machine, start the dashboard by running:

$ pegasus-service
    

By default, the dashboard server can only monitor workflows run by the current user, i.e., the user who is running pegasus-service.

Access the dashboard by pointing your browser at https://localhost:5000. If you are using the EC2 VM, replace 'localhost' with the IP address of your EC2 instance.

When the page loads, it asks for a username and password. If you are using the tutorial VM, log in as user "tutorial" with password "pegasus". If you are running the dashboard on your own machine, log in with your UNIX username and password.

The Dashboard's home page lists all workflows that have been run by the current user and shows the status of each one: Running, Successful, Failed, or Failing. Only top-level workflows are listed (Pegasus supports hierarchical workflows, i.e., workflows within a workflow). The rows in the table are color coded:

  • Green: the workflow finished successfully.

  • Red: the workflow finished with a failure.

  • Blue: the workflow is currently running.

  • Gray: the workflow was archived.
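
The color coding above can be read as a simple lookup from workflow state to row color. The following is a minimal sketch of that lookup, not the dashboard's actual implementation; the names ROW_COLORS and row_color are hypothetical. The list above gives no color for the Failing status, so the sketch raises an error for it rather than guessing.

```python
# Hypothetical lookup table: workflow state -> row color, taken from the list above.
ROW_COLORS = {
    "successful": "green",
    "failed": "red",
    "running": "blue",
    "archived": "gray",
}


def row_color(state: str) -> str:
    """Return the row color for a workflow state (case-insensitive)."""
    key = state.strip().lower()
    if key not in ROW_COLORS:
        # "Failing" lands here: the text lists no color for it.
        raise ValueError(f"no color documented for state {state!r}")
    return ROW_COLORS[key]
```

For example, row_color("Running") returns "blue", while row_color("Failing") raises ValueError.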

Figure 2.8. Dashboard Home Page


To view details specific to a workflow, click on the corresponding workflow label. The workflow details page lists workflow-specific information such as the workflow label, workflow status, and location of the submit directory. The details page also displays pie charts showing the distribution of jobs by status.

In addition, the details page displays a tab listing all sub-workflows and their statuses. Additional tabs list all running, failed, successful, and failing jobs.

The information displayed for a job depends on its status. For example, the failed jobs tab displays the job name, the exit code, and links to the available standard output and standard error contents.

Figure 2.9. Dashboard Workflow Page


To view details specific to a job, click on the corresponding job label. The job details page lists information relevant to that job, such as the job name, exit code, and run time.

The job instance section of the job details page lists every attempt made to run the job. For example, if a job failed on its first attempt because of a transient error but ran successfully when retried, the job instance section shows two entries, one for each attempt.
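
The relationship between a job and its job instances can be sketched as a small data model. This is an illustration only, assuming that each attempt is recorded with an exit code and that an exit code of 0 means success; the class names, fields, and the job name "split_ID0000001" are hypothetical and do not reflect the dashboard's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobInstance:
    """One attempt to run a job (hypothetical model)."""
    attempt: int
    exit_code: int

    @property
    def succeeded(self) -> bool:
        # Assumption: exit code 0 means the attempt succeeded.
        return self.exit_code == 0


@dataclass
class Job:
    """A job together with every attempt made to run it."""
    name: str
    instances: List[JobInstance] = field(default_factory=list)

    def record_attempt(self, exit_code: int) -> JobInstance:
        # Attempts are numbered from 1 in the order they were made.
        instance = JobInstance(attempt=len(self.instances) + 1, exit_code=exit_code)
        self.instances.append(instance)
        return instance

    @property
    def succeeded(self) -> bool:
        # The job's final outcome is the outcome of its last attempt.
        return bool(self.instances) and self.instances[-1].succeeded


# A job that fails once on a transient error and succeeds on retry.
job = Job("split_ID0000001")  # hypothetical job name
job.record_attempt(exit_code=1)
job.record_attempt(exit_code=0)
```

Here the job ends up with two instances, matching the two entries the job instance section would show, and its final outcome is successful.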

The job details page also shows tabs for failed and successful task invocations. (Pegasus allows users to group multiple smaller tasks into a single job, so a job may consist of one or more tasks.)
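
To illustrate how one job can contain several task invocations that are then sorted into failed and successful tabs, here is a minimal sketch. It assumes an invocation succeeds when its exit code is 0; the TaskInvocation type, the split_by_outcome helper, and the sample task names and durations are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TaskInvocation:
    """One task run inside a (possibly clustered) job (hypothetical model)."""
    task_name: str
    exit_code: int
    duration_s: float


def split_by_outcome(invocations: Iterable[TaskInvocation]) -> Dict[str, List[TaskInvocation]]:
    """Group invocations the way the two tabs do: successful vs. failed."""
    tabs: Dict[str, List[TaskInvocation]] = {"successful": [], "failed": []}
    for inv in invocations:
        # Assumption: exit code 0 means the invocation succeeded.
        tabs["successful" if inv.exit_code == 0 else "failed"].append(inv)
    return tabs


# A single clustered job made of three small tasks (sample data, not real output).
clustered_job = [
    TaskInvocation("wc", 0, 1.2),
    TaskInvocation("wc", 1, 0.4),
    TaskInvocation("wc", 0, 1.1),
]
tabs = split_by_outcome(clustered_job)
```

With this sample data, the successful tab would hold two invocations and the failed tab one, even though all three belong to the same job.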

Figure 2.10. Dashboard Job Description Page


The task invocation details page provides task-specific information such as task name, exit code, and duration. Task details are more granular than job details.

Figure 2.11. Dashboard Invocation Page


The dashboard also has pages for workflow statistics and workflow charts, which graphically render the information provided by the pegasus-statistics and pegasus-plots commands, respectively.
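
The same statistics can be produced outside the dashboard by running pegasus-statistics on a workflow's submit directory. The sketch below wraps that call from Python. It assumes the command is on PATH and takes the submit directory as its positional argument; the helper names are hypothetical, and any additional options should be checked against the command's help output on your installation.

```python
import shutil
import subprocess
from typing import List


def statistics_command(submit_dir: str) -> List[str]:
    """Build the argument list for pegasus-statistics (no options added)."""
    return ["pegasus-statistics", submit_dir]


def run_statistics(submit_dir: str) -> int:
    """Run pegasus-statistics on a submit directory and return its exit code."""
    if shutil.which("pegasus-statistics") is None:
        raise FileNotFoundError("pegasus-statistics is not on PATH")
    # check=False: return the exit code instead of raising on failure.
    return subprocess.run(statistics_command(submit_dir), check=False).returncode
```

The submit directory to pass in is the one shown on the workflow details page.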

The Statistics page shows the following statistics:

  1. Workflow level statistics

  2. Job breakdown statistics

  3. Job specific statistics

Figure 2.12. Dashboard Statistics Page
