Chapter 11. Optimizing Workflows for Efficiency and Scalability

By default, Pegasus generates workflows that target the most common use cases and execution environments. For more specialized environments or workflows, the following sections provide hints on how to optimize your workflow so that it scales better and runs more efficiently. Below are some common issues and solutions.

11.1. Optimizing Short Jobs / Scheduling Delays

Issue: Even though HTCondor is a high throughput system, there are overheads when scheduling short jobs. Common overheads include scheduling, data transfers, state notifications, and task bookkeeping. These overheads can be very noticeable for short jobs, but are hardly noticeable for longer jobs, because the ratio of computation to overhead is higher.
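
To make the ratio concrete, here is a minimal sketch of the arithmetic. The 60-second per-job overhead and the two runtimes are hypothetical numbers chosen for illustration, not measurements from HTCondor or Pegasus, and the function name is an assumption of this example.

```python
def overhead_fraction(runtime_s: float, overhead_s: float) -> float:
    """Return the share of a job's total wall time that is overhead."""
    return overhead_s / (runtime_s + overhead_s)


# Hypothetical fixed cost per job (scheduling, transfers, notifications).
OVERHEAD_S = 60.0

short_job = overhead_fraction(10.0, OVERHEAD_S)    # a 10-second task
long_job = overhead_fraction(3600.0, OVERHEAD_S)   # a 1-hour task

print(f"short job overhead share: {short_job:.1%}")
print(f"long job overhead share:  {long_job:.1%}")
```

With these assumed numbers, overhead dominates the short job's wall time and is a small fraction of the long job's.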

Solution: If you have many short tasks to run, use task clustering to minimize the overheads. Clustering instructs Pegasus to take a set of tasks, selected horizontally, by labels, or by runtime, and create jobs that each contain that whole set of tasks. The result is more efficient jobs, for which the overheads are less noticeable.
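
The effect of clustering on overhead can be sketched in plain Python. This is not the Pegasus API; it only illustrates the idea of grouping independent tasks into fixed-size clustered jobs. The helper names, the cluster size of 20, and the 60-second per-job overhead are all assumptions made for this example.

```python
from typing import List


def cluster_tasks(tasks: List[str], cluster_size: int) -> List[List[str]]:
    """Group tasks into consecutive chunks of at most cluster_size tasks."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return [tasks[i:i + cluster_size] for i in range(0, len(tasks), cluster_size)]


def total_overhead(num_jobs: int, per_job_overhead_s: float) -> float:
    """Overhead is paid once per submitted job, not once per task."""
    return num_jobs * per_job_overhead_s


# Hypothetical workload: 100 short, independent tasks.
tasks = [f"task_{i}" for i in range(100)]
PER_JOB_OVERHEAD_S = 60.0  # assumed fixed cost per job

clustered_jobs = cluster_tasks(tasks, cluster_size=20)

unclustered = total_overhead(len(tasks), PER_JOB_OVERHEAD_S)
clustered = total_overhead(len(clustered_jobs), PER_JOB_OVERHEAD_S)

print(f"jobs without clustering: {len(tasks)}, overhead: {unclustered:.0f} s")
print(f"jobs with clustering:    {len(clustered_jobs)}, overhead: {clustered:.0f} s")
```

The sketch treats the overhead as a constant per job, which is a simplification. It also ignores the trade-off that tasks inside a clustered job may run one after another, so larger clusters reduce overhead at the cost of parallelism.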