The Pegasus team is pleased to announce that it has received a new grant from the National Science Foundation to support continued development and maintenance of the Pegasus Workflow Management System. The grant will support Pegasus for the next five years and help address the needs of our diverse user community.
Since 2001, the Pegasus Workflow Management System has been designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions, Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. Automation enables the delivery of services that consider criteria such as time-to-solution and that take into account the efficient use of resources, task throughput, and data transfer requests. Pegasus also provides a suite of command-line tools and a web-based workflow dashboard that let scientists easily monitor and debug their workflows. Together, these capabilities allow scientists to do production-grade science at scale. The power of these abstractions was demonstrated in 2015, when an international collaboration used Pegasus to harness a diverse set of resources and to manage compute- and data-intensive workflows that confirmed the existence of gravitational waves, as predicted by Einstein’s general theory of relativity.
Scientists are using Pegasus to model new materials, model the effects of seismic activity, infer human demographic history, and develop a better soybean, among other applications.
Experience from working with these diverse scientific domains has helped us uncover opportunities for further automation of scientific workflows. The new effort will address these opportunities through innovation in the following areas: automation methods that include resource provisioning ahead of and during workflow execution, data-aware job scheduling algorithms, and data sharing mechanisms in high-throughput environments. Near-term capabilities to be released in the 4.8 software release include:
- Integration with Jupyter Notebook;
- Support for application container technologies: both Docker and Singularity.
To support a broader group of “long-tail” scientists, the new grant provides funding towards usability improvements as well as outreach, education, and training activities.
The proposed enhancements will be integrated into Pegasus, and distributed to the user community as part of regular Pegasus software releases. This will facilitate adoption and evaluation of these capabilities in the context of real-life applications and computing environments. The data-aware focus targets new classes of applications executing in high-throughput and high-performance environments.
The Pegasus team looks forward to continued collaboration with domain and computer scientists, and hopes to work with new users and communities. Please contact us at email@example.com if you would like to discuss your workflow needs and ideas.