The four-year project, Scientific Workflow Integrity with Pegasus, is funded by a $1 million grant from the National Science Foundation (NSF) as part of its Cybersecurity Innovation for Cyberinfrastructure (CICI) program. Von Welch, director of IU’s Center for Applied Cybersecurity Research (CACR), is the project’s principal investigator.
The Pegasus Workflow Management System is popular in the research community because it makes large-scale data analyses easy to structure and execute. It supports a wide range of scientific projects, including LIGO (the Laser Interferometer Gravitational-Wave Observatory), which earlier this year announced the first direct detection of gravitational waves, confirming a prediction of Einstein’s general theory of relativity.
IU will receive nearly half of the grant, $479,855, to strengthen the security of computational science run through Pegasus and give scientists added peace of mind by providing the means to validate their data. The remainder has been awarded to the project’s collaborators: the Renaissance Computing Institute (RENCI) at the University of North Carolina ($230k) and the Information Sciences Institute (ISI) at the University of Southern California ($290k).
The improvements will digitally sign the data that is run through Pegasus, strengthening consistency in results across multiple workflows. They’ll also allow users to see whether their data has changed since the last time a workflow was completed.
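To illustrate the idea of detecting changed data, here is a minimal Python sketch. It is not the project's implementation: it uses a keyed hash (HMAC-SHA256) from the Python standard library as a stand-in for a true digital signature, and the key, sample data and function names are hypothetical.

```python
import hashlib
import hmac


def sign_data(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the data (stand-in for a signature)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_unchanged(data: bytes, key: bytes, recorded_tag: str) -> bool:
    """Recompute the tag and compare it, in constant time, to the recorded one."""
    return hmac.compare_digest(sign_data(data, key), recorded_tag)


# Hypothetical key and dataset, for illustration only.
key = b"hypothetical-shared-secret"
original = b"time,strain\n0,0.01\n1,0.02\n"

# Tag recorded when the workflow last completed.
tag = sign_data(original, key)

# Later: check whether the data still matches the recorded tag.
modified = original + b"2,0.03\n"
print(is_unchanged(original, key, tag))
print(is_unchanged(modified, key, tag))
```

A real deployment would more likely use public-key signatures, so that anyone can verify the data without holding the secret used to sign it.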
“Scientific data is a key part of scientific workflows and, ultimately, the science project,” said Welch. “By integrating support for data integrity into the popular workflow management tool Pegasus, we increase our trust in computational science in a manner that will be easy for scientists to use.”
Welch and Steven Myers, associate professor at IU’s School of Informatics and Computing, will lead the project team, which includes experts in cybersecurity and virtualization, alongside the Pegasus development team.
One challenge of the new project will be ensuring that the cryptography used for data integrity, such as digital signatures, scales to handle increasingly large scientific datasets. Myers, an expert in cryptography, will guide the selection, implementation and deployment of the cryptographic systems, making sure they are efficient and likely to remain secure over the lengthy periods during which scientific data is referenced and used.
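One common way to keep integrity checks manageable for very large files is to hash the data in fixed-size chunks and then sign only the short digest. The Python sketch below shows the chunked hashing step under that assumption. It does not describe the design the project team will select, and the function name and chunk size are illustrative.

```python
import hashlib
import io
from typing import BinaryIO


def stream_digest(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a stream in chunks so memory use stays constant for any file size."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        hasher.update(chunk)
    return hasher.hexdigest()


# An in-memory stand-in for a large dataset; a real file opened with
# open(path, "rb") would work the same way.
payload = b"x" * 5_000_000
digest = stream_digest(io.BytesIO(payload))
print(digest)
```

The digest is always 32 bytes regardless of input size, so the cost of the signature itself does not grow with the dataset.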
“Cryptography can provide strong assurances of data integrity and records of its origin and modifications over the long periods of time that much scientific data is used and must be maintained,” said Myers. “Given the experimental costs of some of this data, having strong assurances is critical, as some groups have definite motive to modify the data, and the experiments are incredibly costly to reproduce if the data’s integrity is questioned.”
Scientists from a variety of disciplines, including astronomy, bioinformatics, earthquake science, gravitational wave physics, ocean science and neuroscience, have used Pegasus to run more than 700,000 workflows in the last three years. Welch’s team nonetheless aims for solutions generic enough to apply to other workflow systems and applications, helping an even broader range of researchers.
“I am very excited to work with the IU and RENCI teams to include new and critical data integrity solutions into Pegasus,” said Ewa Deelman, research professor and director at ISI. “The results of this work will benefit a number of science disciplines and will help scientists to have a higher degree of trust in their results and the results shared by their colleagues.”