As the number of jobs and tasks in a workflow increases, the ability to track progress and quickly debug the workflow becomes increasingly important. The dashboard lets users monitor and debug workflows through a browser, both in real time and after execution has completed.
The Pegasus Workflow Dashboard is bundled with Pegasus. The pegasus-service server is developed in Python and uses the Flask framework to implement the web interface. Users can connect to this server using a browser to monitor and debug workflows.
The workflow dashboard can only monitor workflows that have been executed using Pegasus 4.2.0 or later.
To start the Pegasus Dashboard, execute the following command:
$ pegasus-service --host 127.0.0.1 --port 5000
SSL is not configured: Using self-signed certificate
2015-04-13 16:14:23,074:Pegasus.service.server:79: WARNING: SSL is not configured: Using self-signed certificate
Service not running as root: Will not be able to switch users
2015-04-13 16:14:23,074:Pegasus.service.server:86: WARNING: Service not running as root: Will not be able to switch users
By default, the server listens only on localhost (127.0.0.1) on port 5000. A user can view the dashboard at https://localhost:5000/
To make the Pegasus Dashboard listen on all network interfaces or on a different port, users can pass different values to the --host and/or --port options.
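As an illustration of the --host and --port options described above, the following Python sketch assembles a pegasus-service command line and the matching dashboard URL. The helper names dashboard_command and dashboard_url, the port 8080, and the host name submit.example.org are hypothetical. The value 0.0.0.0 is used on the assumption that the server treats it as the conventional IPv4 wildcard for "all interfaces".

```python
import shlex


def dashboard_command(host="127.0.0.1", port=5000):
    """Build the pegasus-service argument list (hypothetical helper)."""
    return ["pegasus-service", "--host", host, "--port", str(port)]


def dashboard_url(hostname="localhost", port=5000):
    """Build the HTTPS URL a browser would use to reach the dashboard."""
    return "https://{}:{}/".format(hostname, port)


# Assumption: 0.0.0.0 is the conventional IPv4 wildcard address, meaning
# "listen on all interfaces"; 8080 is an arbitrary example port.
command = dashboard_command(host="0.0.0.0", port=8080)
print(shlex.join(command))

# The host name below is a made-up example of a machine reachable by name.
print(dashboard_url("submit.example.org", 8080))
```

When the server listens on all interfaces, the browser URL uses the machine's own host name or address rather than localhost, with the same port that was passed to --port.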
By default, the dashboard server can only monitor workflows run by the current user, i.e. the user who is running pegasus-service.
The Dashboard's home page lists all workflows that have been run by the current user, and shows the status of each workflow, i.e. Running, Successful, Failed, or Failing. The home page lists only the top-level workflows (Pegasus supports hierarchical workflows, i.e. workflows within a workflow). The rows in the table are color coded:
Green: indicates workflow finished successfully.
Red: indicates workflow finished with a failure.
Blue: indicates a workflow is currently running.
Gray: indicates a workflow that was archived.
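The color coding above amounts to a simple lookup from workflow status to row color. The sketch below is only an illustration: the status strings and the row_color helper are hypothetical labels, not identifiers taken from the Pegasus code base.

```python
# Row colors as listed above; the status names are illustrative labels,
# not identifiers taken from the Pegasus code base.
ROW_COLORS = {
    "successful": "green",
    "failed": "red",
    "running": "blue",
    "archived": "gray",
}


def row_color(status):
    """Return the row color for a workflow status, or None if not listed."""
    return ROW_COLORS.get(status.lower())


for status in ("Successful", "Failed", "Running", "Archived", "Failing"):
    print(status, "->", row_color(status))
```

The list above does not assign a color to the Failing status, so the sketch returns None for it rather than guessing.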
To view details specific to a workflow, the user can click on the corresponding workflow label. The workflow details page lists workflow-specific information such as the workflow label, workflow status, location of the submit directory, files, and metadata associated with the workflow. The details page also displays pie charts showing the distribution of jobs by status.
In addition, the details page displays a tab listing all sub-workflows and their statuses. Additional tabs list information for all running, failed, successful, and failing jobs.
Failing jobs are currently running jobs (visible in the Running tab) that have failed in previous attempts to execute them.
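The distinction between running, failing, failed, and successful jobs can be sketched as follows. The Job record, its field names, and the job_tabs helper are hypothetical, and the sketch assumes that a non-zero exit code marks a failed attempt; it is not the dashboard's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative only."""
    name: str
    running: bool  # True while an attempt is in progress
    # Exit codes of finished attempts, oldest first.
    exit_codes: List[int] = field(default_factory=list)


def job_tabs(job):
    """Return the dashboard tabs in which a job would appear."""
    if job.running:
        tabs = ["Running"]
        # Failing: currently running, but at least one earlier attempt failed.
        # Assumption: a non-zero exit code means the attempt failed.
        if any(code != 0 for code in job.exit_codes):
            tabs.append("Failing")
        return tabs
    # Not running: classify by the outcome of the most recent attempt.
    if job.exit_codes and job.exit_codes[-1] == 0:
        return ["Successful"]
    if job.exit_codes:
        return ["Failed"]
    return []  # no attempt has been made yet


print(job_tabs(Job("merge", running=True, exit_codes=[1])))
print(job_tabs(Job("split", running=False, exit_codes=[1, 0])))
```

In this sketch a failing job appears under both Running and Failing, matching the description above that failing jobs remain visible in the Running tab.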
The information displayed for a job depends on its status. For example, the failed jobs tab displays the job name, the exit code, and links to the available standard output and standard error contents.
To view details specific to a job, the user can click on the corresponding job label. The job details page lists information relevant to that job, such as the job name, exit code, and run time.
The job instance section of the job details page lists all attempts made to run the job. For example, if a job failed in its first attempt due to transient errors but ran successfully when retried, the job instance section shows two entries, one for each attempt to run the job.
The job details page also shows tabs for failed and successful task invocations (Pegasus allows users to group multiple smaller tasks into a single job, i.e. a job may consist of one or more tasks).
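The relationship between a job and its task invocations can be pictured with a small sketch. The Invocation record, the split_invocations helper, and all sample values are hypothetical, and a non-zero exit code is assumed to mean failure.

```python
from collections import namedtuple

# Hypothetical record for one task invocation inside a job.
Invocation = namedtuple("Invocation", ["task_name", "exit_code", "duration"])


def split_invocations(invocations):
    """Separate a job's task invocations into failed and successful lists."""
    # Assumption: a non-zero exit code marks a failed invocation.
    failed = [inv for inv in invocations if inv.exit_code != 0]
    successful = [inv for inv in invocations if inv.exit_code == 0]
    return failed, successful


# A single job that groups three smaller tasks (made-up values).
job_invocations = [
    Invocation("preprocess", 0, 12.5),
    Invocation("analyze", 1, 3.0),
    Invocation("summarize", 0, 7.25),
]

failed, successful = split_invocations(job_invocations)
print([inv.task_name for inv in failed])
print([inv.task_name for inv in successful])
```

Each list corresponds to one of the two tabs described above: one for failed and one for successful task invocations of the same job.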
The task invocation details page provides task-specific information such as the task name, exit code, duration, and metadata associated with the task. Task details are more granular than job details.
The dashboard also has web pages for workflow statistics and workflow charts, which graphically render the information provided by the pegasus-statistics and pegasus-plots commands respectively.
The Statistics page shows the following statistics:
Workflow level statistics
Job breakdown statistics
Job specific statistics
The Charts page shows the following charts:
Job Distribution by Count/Time
Time Chart by Job/Invocation
Workflow Execution Gantt Chart
The chart below shows the invocation distribution by count or time.
The time chart below shows the number of jobs/invocations in the workflow and their total runtime.
The workflow Gantt chart lays out the execution of the jobs in the workflow over time.
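A distribution "by count or time", as in the invocation distribution chart mentioned above, can be thought of as two aggregations over the same records. The sketch below is a rough illustration: grouping invocations by name is an assumption, and the distribution helper and sample values are hypothetical.

```python
from collections import defaultdict


def distribution(invocations):
    """Aggregate (name, runtime_seconds) pairs into per-name count and total time."""
    counts = defaultdict(int)
    total_time = defaultdict(float)
    for name, runtime in invocations:
        counts[name] += 1            # distribution by count
        total_time[name] += runtime  # distribution by time
    return dict(counts), dict(total_time)


# Made-up sample: two "preprocess" invocations and one "analyze" invocation.
sample = [("preprocess", 10.0), ("analyze", 30.0), ("preprocess", 20.0)]
counts, total_time = distribution(sample)
print(counts)
print(total_time)
```

In this sample the two views differ: preprocess accounts for two of the three invocations by count, but the two names contribute equally by time.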