6.3. Dashboard

As the number of jobs and tasks in a workflow increases, the ability to track progress and quickly debug the workflow becomes increasingly important. The dashboard lets users monitor and debug workflows through a browser, both in real time and after execution has completed.

6.3.1. Workflow Dashboard

The Pegasus Workflow Dashboard is bundled with Pegasus. The pegasus-service server is developed in Python and uses the Flask framework to implement the web interface. Users can connect to this server using a browser to monitor and debug workflows.

Note

The workflow dashboard can only monitor workflows that have been executed using Pegasus 4.2.0 or later.

To start the Pegasus Dashboard, execute the following command:

$ pegasus-service --host 127.0.0.1 --port 5000

SSL is not configured: Using self-signed certificate
2015-04-13 16:14:23,074:Pegasus.service.server:79: WARNING: SSL is not configured: Using self-signed certificate
Service not running as root: Will not be able to switch users
2015-04-13 16:14:23,074:Pegasus.service.server:86: WARNING: Service not running as root: Will not be able to switch users

By default, the server is configured to listen only on localhost/127.0.0.1 on port 5000. A user can view the dashboard on https://localhost:5000/

To make the Pegasus Dashboard listen on all network interfaces or on a different port, users can pass different values to the --host and/or --port options.
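As a rough illustration of how the two options fit together, the following Python sketch builds the command line and the matching browser URL. The helper names service_command and dashboard_url are hypothetical and not part of Pegasus; only the --host and --port options and the defaults come from the text above.

```python
def service_command(host="127.0.0.1", port=5000):
    """Return the pegasus-service argument list for the given host and port."""
    return ["pegasus-service", "--host", host, "--port", str(port)]


def dashboard_url(host="127.0.0.1", port=5000):
    """Return the URL to open in a browser on the machine running the service."""
    # 0.0.0.0 means "all interfaces"; on the same machine, localhost reaches it.
    browser_host = "localhost" if host in ("127.0.0.1", "0.0.0.0") else host
    return f"https://{browser_host}:{port}/"
```

For example, service_command("0.0.0.0", 8080) yields the arguments for listening on all interfaces on port 8080, and dashboard_url("0.0.0.0", 8080) gives the address to open locally.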

By default, the dashboard server can only monitor workflows run by the current user, i.e. the user who is running pegasus-service.

The Dashboard's home page lists all workflows that have been run by the current user. The home page shows the status of each workflow, i.e. Running/Successful/Failed/Failing. Only top-level workflows are listed (Pegasus supports hierarchical workflows, i.e. workflows within a workflow). The rows in the table are color coded:

  • Green: indicates workflow finished successfully.

  • Red: indicates workflow finished with a failure.

  • Blue: indicates a workflow is currently running.

  • Gray: indicates a workflow that was archived.

Figure 6.8. Dashboard Home Page


To view details specific to a workflow, the user can click on the corresponding workflow label. The workflow details page lists workflow-specific information such as the workflow label, workflow status, location of the submit directory, and the files and metadata associated with the workflow. The details page also displays pie charts showing the distribution of jobs by status.

In addition, the details page displays a tab listing all sub-workflows and their statuses. Additional tabs list information for all running, failed, successful, and failing jobs.

Note

Failing jobs are jobs that are currently running (visible in the Running tab) but have failed in previous attempts to execute them.
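The distinction between running, failing, failed, and successful jobs can be made concrete with a small sketch. The Attempt type, the job_state function, and the "unknown" fallback are hypothetical names used only for illustration; the dashboard's actual classification logic is not described in this guide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    """One attempt to run a job; exit_code is None while it is still running."""
    exit_code: Optional[int]


def job_state(attempts: List[Attempt]) -> str:
    """Classify a job from its attempts, listed oldest first."""
    if not attempts:
        return "unknown"  # assumption: no attempts recorded yet
    latest = attempts[-1]
    earlier_failure = any(a.exit_code not in (None, 0) for a in attempts[:-1])
    if latest.exit_code is None:
        # Still running: "failing" if an earlier attempt already failed.
        return "failing" if earlier_failure else "running"
    return "successful" if latest.exit_code == 0 else "failed"
```

Under these assumptions, a job with one failed attempt followed by an attempt that is still running is reported as failing, while the same job becomes successful once the retry exits with code 0.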

The information displayed for a job depends on its status. For example, the failed jobs tab displays the job name, the exit code, and links to the available standard output and standard error contents.

Figure 6.9. Dashboard Workflow Page


Figure 6.10. Dashboard Workflow Metadata



Figure 6.11. Dashboard Workflow Files



To view details specific to a job, the user can click on the corresponding job label. The job details page lists information relevant to that job, such as the job name, exit code, and run time.

The job instance section of the job details page lists all attempts made to run the job. For example, if a job failed in its first attempt due to transient errors but ran successfully when retried, the job instance section shows two entries, one for each attempt to run the job.

The job details page also shows tabs for failed and successful task invocations. (Pegasus allows users to group multiple smaller tasks into a single job, i.e. a job may consist of one or more tasks.)
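The relationship between jobs, job instances, and task invocations described above can be pictured as a small data model. All class and field names in this sketch are illustrative assumptions and do not reflect the actual Pegasus database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Invocation:
    """A single task run inside one job instance."""
    task_name: str
    exit_code: int


@dataclass
class JobInstance:
    """One attempt to run a job; a grouped job carries several invocations."""
    invocations: List[Invocation] = field(default_factory=list)

    def failed_invocations(self) -> List[Invocation]:
        return [inv for inv in self.invocations if inv.exit_code != 0]

    def successful_invocations(self) -> List[Invocation]:
        return [inv for inv in self.invocations if inv.exit_code == 0]


@dataclass
class Job:
    name: str
    instances: List[JobInstance] = field(default_factory=list)  # oldest first


# Hypothetical job of two tasks: the first attempt fails on one task,
# and the retry succeeds on both.
first = JobInstance([Invocation("task_a", 0), Invocation("task_b", 1)])
retry = JobInstance([Invocation("task_a", 0), Invocation("task_b", 0)])
job = Job("merge_job", [first, retry])
```

In this picture, the job instance section corresponds to job.instances, and the failed and successful invocation tabs correspond to the two filter methods on a single instance.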

Figure 6.12. Dashboard Job Description Page


The task invocation details page provides task-specific information such as the task name, exit code, duration, and metadata associated with the task. Task details are more granular than job details.

Figure 6.13. Dashboard Invocation Page


The dashboard also has web pages for workflow statistics and workflow charts, which graphically render the information provided by the pegasus-statistics and pegasus-plots commands respectively.

The Statistics page shows the following statistics:

  1. Workflow level statistics

  2. Job breakdown statistics

  3. Job specific statistics

  4. Integrity statistics

Figure 6.14. Dashboard Statistics Page


The Charts page shows the following charts:

  1. Job Distribution by Count/Time

  2. Time Chart by Job/Invocation

  3. Workflow Execution Gantt Chart

The chart below shows the invocation distribution by count or time.

Figure 6.15. Dashboard Plots - Job Distribution
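A minimal sketch of what a distribution by count or by time means, assuming the records have been reduced to (type, runtime in seconds) pairs. The record layout, the sample values, and the distribution function are hypothetical and do not reflect how the dashboard computes its charts.

```python
from collections import defaultdict


def distribution(records, by="count"):
    """Return each type's share (0..1) of the total count or total runtime."""
    if by not in ("count", "time"):
        raise ValueError("by must be 'count' or 'time'")
    totals = defaultdict(float)
    for kind, runtime in records:
        # Each record adds 1 when counting, or its runtime when timing.
        totals[kind] += 1 if by == "count" else runtime
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: value / grand_total for kind, value in totals.items()}


# Made-up sample data: one short "preprocess" run and two longer "analyze" runs.
records = [("preprocess", 10.0), ("analyze", 60.0), ("analyze", 30.0)]
```

With this sample, "analyze" accounts for two thirds of the records by count but nine tenths of the total runtime, which is the kind of difference the two chart modes make visible.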


The time chart below shows the number of jobs/invocations in the workflow and their total runtime.

Figure 6.16. Dashboard Plots - Time Chart


The workflow Gantt chart lays out the execution of the jobs in the workflow over time.

Figure 6.17. Dashboard Plots - Workflow Gantt Chart
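To illustrate what a Gantt chart conveys, the following sketch renders a text version from (name, start, duration) tuples, with times in seconds relative to the start of the workflow. The tuple layout and the gantt_rows function are hypothetical; the dashboard renders its own graphical chart and does not use this code.

```python
def gantt_rows(jobs, width=40):
    """Return one text row per job, with a bar placed by start time and sized by duration."""
    if not jobs:
        return []
    end = max(start + duration for _, start, duration in jobs)
    if end <= 0:
        return []
    scale = width / end  # characters per second
    rows = []
    for name, start, duration in sorted(jobs, key=lambda job: job[1]):
        offset = int(start * scale)
        length = max(1, int(duration * scale))  # always draw at least one cell
        rows.append(f"{name:<12}|{' ' * offset}{'#' * length}")
    return rows
```

Jobs that overlap in time produce bars that overlap horizontally, so parallel execution and idle gaps become visible at a glance.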
