The Structural Protein-Ligand Interactome (SPLINTER) project predicts the interactions of thousands of small molecules with thousands of proteins. Each prediction is based on the three-dimensional structure of the bound complex for a protein-compound pair, which is itself predicted by molecular docking. The docking runs consist of millions of short individual jobs, each lasting only minutes, and are managed by Pegasus.

As an example of the scale of these computations, an initial run was submitted to the Open Science Grid during January and February of 2013. It consisted of approximately 3,900 proteins and 5,000 ligands, constituting over 19 million docking simulations. The run accounted for 1.42 million core hours and completed in 27 days. On average, 52,593 core hours of wall clock time were delivered per day, with peak values exceeding 100,000 core hours per day.
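The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; because the protein and ligand counts are approximate, the derived values are estimates, and the variable names are illustrative.

```python
# Figures quoted in the text (protein and ligand counts are approximate).
proteins = 3900
ligands = 5000
total_core_hours = 1.42e6
run_days = 27

# Every protein is docked against every ligand.
total_dockings = proteins * ligands

# Average core hours delivered per day over the whole run.
daily_avg_core_hours = total_core_hours / run_days

# Average core time spent on a single docking simulation, in minutes.
minutes_per_docking = total_core_hours * 60 / total_dockings

print(f"docking simulations: {total_dockings:,}")
print(f"daily average core hours: {daily_avg_core_hours:,.0f}")
print(f"core minutes per docking: {minutes_per_docking:.1f}")
```

Under these assumptions, each docking simulation uses a little over four minutes of core time on average, which is consistent with the description of the jobs as lasting only minutes.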

To handle the large number of small files, the workflow is broken up into a set of smaller workflows using Pegasus’ hierarchical workflow feature. When the workflow is submitted, Pegasus generates the main workflow configuration file with information such as the location of the input and output files, the configuration of the HTCondor job script, and environment variables. The top-level workflow then generates the sub-workflow configuration files. Each sub-workflow manages up to 20,000 individual ligand-protein interaction simulations. Each individual job contains a small cluster of AutoDock Vina calls. Task clustering in this case makes the jobs long enough for grid overheads to be negligible.
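The two-level partitioning described above can be illustrated with a minimal sketch in plain Python. It does not use the Pegasus API and is not the project's actual code: the function names `chunked` and `plan_hierarchy` are hypothetical, and the value of `calls_per_job` is an assumption, since the text does not state how many AutoDock Vina calls are clustered into one job. Only the limit of 20,000 simulations per sub-workflow comes from the text.

```python
from itertools import islice, product
from math import ceil


def chunked(items, size):
    """Yield successive lists of at most `size` elements from `items`."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def plan_hierarchy(proteins, ligands, max_per_subworkflow=20_000, calls_per_job=50):
    """Group protein-ligand pairs into sub-workflows, then into clustered jobs.

    Returns a list of sub-workflows; each sub-workflow is a list of jobs, and
    each job is a list of (protein, ligand) pairs to dock.
    calls_per_job=50 is an assumed value chosen for illustration only.
    """
    pairs = product(proteins, ligands)  # lazy: pairs are generated on demand
    plan = []
    for sub_pairs in chunked(pairs, max_per_subworkflow):
        jobs = list(chunked(sub_pairs, calls_per_job))
        plan.append(jobs)
    return plan


# Small demonstration with scaled-down limits: 7 proteins x 10 ligands = 70 pairs.
demo_proteins = [f"P{i}" for i in range(7)]
demo_ligands = [f"L{j}" for j in range(10)]
plan = plan_hierarchy(demo_proteins, demo_ligands,
                      max_per_subworkflow=20, calls_per_job=6)

print(len(plan))                     # number of sub-workflows: 4
print([len(jobs) for jobs in plan])  # jobs per sub-workflow: [4, 4, 4, 2]

# At the scale quoted in the text (about 3,900 x 5,000 pairs), the 20,000 limit
# implies roughly this many sub-workflows:
subworkflows_full = ceil(3900 * 5000 / 20_000)
print(subworkflows_full)             # 975
```

Larger values of `calls_per_job` make each job run longer, which amortizes per-job scheduling and data-staging overhead at the cost of coarser-grained retries when a job fails.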

Scientists: David Xu, Samy Meroueh (Indiana University School of Medicine)

Science Node article:


Rob Quick, Soichi Hayashi, Samy Meroueh, Mats Rynge, Scott Teige, Bo Wang and David Xu, Building a Chemical-Protein Interactome on the Open Science Grid, Proceedings of Science, International Symposium on Grids and Clouds (ISGC) 2015, 2015.

The workflow is available on GitHub: