12.4. Ensemble Manager

The ensemble manager is a service that manages collections of workflows called ensembles. The ensemble manager is useful when you have a set of workflows you need to run over a long period of time. It can throttle the number of concurrent planning and running workflows, and plan and run workflows in priority order. A typical use-case is a user with 100 workflows to run, who needs no more than one to be planned at a time, and needs no more than two to be running concurrently.

The ensemble manager also allows workflows to be submitted and monitored programmatically through its RESTful interface, which makes it an ideal platform for integrating workflows into larger applications such as science gateways and portals.

To start the ensemble manager server, run:

$ pegasus-em server

Once the ensemble manager is running, you can create an ensemble with:

$ pegasus-em create myruns

where "myruns" is the name of the ensemble.

Then you can submit a workflow to the ensemble by running:

$ pegasus-em submit myruns.run1 ./plan.sh run1.dax

Where the name of the ensemble is "myruns", the name of the workflow is "run1", and "./plan.sh run1.dax" is the command for planning the workflow from the current working directory. The planning command should either be a direct invocation of pegasus-plan, or a shell script that calls pegasus-plan. If a shell script is used, then it should not redirect the output of pegasus-plan, because the ensemble manager reads the output to determine whether pegasus-plan succeeded and what is the submit directory of the workflow.

To check the status of your ensembles run:

$ pegasus-em ensembles

To check the status of your workflows run:

$ pegasus-em workflows myruns

To check the status of a specific workflow, run:

$ pegasus-em status myruns.run1

To help with debugging, the ensemble manager has an analyze command that emits diagnostic information about a workflow, including the output of pegasus-analyzer, if possible. To analyze a workflow, run:

$ pegasus-em analyze myruns.run1

Ensembles can be paused to prevent workflows from being planned and executed. Workflows in a paused ensemble will continue to run, but no new workflows will be planned or executed. To pause an ensemble, run:

$ pegasus-em pause myruns

Paused ensembles can be reactivated by running:

$ pegasus-em activate myruns

A workflow might fail during planning. In that case, run the analyze command to examine the planner output, make the necessary corrections to the workflow configuration, and replan the workflow by running:

$ pegasus-em replan myruns.run1

A workflow might also fail during execution. In that case, run the analyze command to identify the issue, correct the problem, and rerun the workflow by running:

$ pegasus-em rerun myruns.run1

Workflows in an ensemble can have different priorities. These priorities are used to determine the order in which workflows in the ensemble will be planned and executed. Priorities are specified using the '-p' option of the submit command. They can also be modified after a workflow has been submitted by running:

$ pegasus-em priority myruns.run1 -p 10

where 10 is the desired priority. Higher values have higher priority, the default is 0, and negative values are allowed.

Each ensemble has a pair of throttles that limit the number of workflows that are concurrently planning and executing. These throttles are called max_planning and max_running. Max planning limits the number of workflows in the ensemble that can be planned concurrently. Max running limits the number of workflows in the ensemble that can be running concurrently. These throttles are useful to limit the impact of planning on the memory usage of the submit host, and the load on the submit host and remote site caused by concurrently running workflows. The throttles can be specified with the '-R' and '-P' options of the create command. They can also be updated using the config command:

$ pegasus-em config myruns.run1 -P 1 -R 5