Pegasus Research Impact

Today’s science is built on the shoulders of research software. The Pegasus workflow management system (est. 2001) has contributed significantly to scientific progress in several domains, including astronomy, Earth science, climate science, bioinformatics, and neuroinformatics. To quantify how Pegasus has impacted the scientific community, we regularly analyze publication and citation data provided by online services such as Google Scholar.

The goal of this page is to report the current research impact of the Pegasus software. To this end, we use the open-source CitationXpert software to apply data science methods that extract metrics quantifying this impact.

  The analysis shown on this page uses data acquired in June 2016.



Data Acquisition

This analysis is based on citations of the two main Pegasus publications (listed below) and on references to the Pegasus website, all extracted from Google Scholar.

  • [PDF] [DOI] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. Ferreira da Silva, M. Livny, and K. Wenger, “Pegasus: a Workflow Management System for Science Automation,” Future Generation Computer Systems, vol. 46, pp. 17-35, 2015.
    Funding acknowledgements: NSF ACI SDCI 0722019, NSF ACI SI2-SSI 1148515, and NSF OCI-1053575.
  • [PDF] E. Deelman, G. Singh, M. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, B. G. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz, “Pegasus: a Framework for Mapping Complex Scientific Workflows onto Distributed Systems,” Scientific Programming Journal, vol. 13, iss. 3, pp. 219-237, 2005.


Number of Citations

Pegasus has been cited in 1121 research articles. Of these citations, 175 come from articles written by at least one author of the two Pegasus papers (self-references), and 946 are external references made by other researchers. Although several research studies and much application support have been developed within the Pegasus team, self-references account for only about 15.6% of the total (175 of 1121). This result indicates the broad impact of the Pegasus software within the research community.
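The split between self-references and external references can be checked with a few lines of Python. This is a minimal sketch that uses only the two counts reported above; the function name citation_split is an illustrative choice, not part of CitationXpert.

```python
# Counts reported in the text above.
TOTAL_CITATIONS = 1121
SELF_REFERENCES = 175


def citation_split(total: int, self_refs: int) -> dict:
    """Return the external citation count and the self-reference share."""
    if total <= 0 or not 0 <= self_refs <= total:
        raise ValueError("expected 0 <= self_refs <= total and total > 0")
    return {
        "external": total - self_refs,
        "self_share": self_refs / total,
    }


split = citation_split(TOTAL_CITATIONS, SELF_REFERENCES)
print(f"external references: {split['external']}")        # external references: 946
print(f"self-reference share: {split['self_share']:.1%}")  # self-reference share: 15.6%
```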

We break down the total number of citations per year to observe how citations have evolved since the first Pegasus paper was published in 2005. The number of self-references per year is nearly constant, with an average of 14.6 citations (standard deviation 4.4), while the number of external references per year increases. The significant increase in citations in 2015, when the second Pegasus software paper was published, suggests the importance (and impact) of up-to-date research articles to the research community. The 2016 count is still low because the data was collected in June and covers only about six months of citations.
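A per-year breakdown of this kind can be sketched as follows. The record layout (year, is_self_reference) and the sample records are placeholders chosen for illustration and are not the Pegasus data. The use of the sample standard deviation is also an assumption, since the page does not say which estimator was used.

```python
from collections import Counter
from statistics import mean, stdev


def per_year_counts(citations):
    """Count self-references and external references per year.

    citations: iterable of (year, is_self_reference) pairs.
    """
    self_counts, external_counts = Counter(), Counter()
    for year, is_self in citations:
        target = self_counts if is_self else external_counts
        target[year] += 1
    return self_counts, external_counts


def summarize(counts, years):
    """Mean and sample standard deviation of yearly counts (0 for empty years)."""
    values = [counts.get(year, 0) for year in years]
    return mean(values), stdev(values)


# Placeholder records for illustration only.
sample = [
    (2005, True), (2005, False),
    (2006, True), (2006, True), (2006, False),
    (2007, False),
]
self_counts, external_counts = per_year_counts(sample)
avg, sd = summarize(self_counts, range(2005, 2008))
print(f"self-references per year: mean={avg}, stdev={sd}")
```

Years without any citation are counted as zero, so a gap in the data lowers the average instead of being skipped.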


Distribution of Citation Types

The distribution of citation types is based on an analysis of BibTeX entry types. We measure the distribution of entry types per year for self-references and external references. Within the SciTech research group, which develops Pegasus, citations are balanced between conference publications (In Proceedings or In Collection) and journal articles (46.6% on average, standard deviation 12.2%), with journal articles tending to prevail in the past few years. The list of publications is available on the SciTech website.

External references show a clear trend toward journal articles over the years (average increase rate 1.17, standard deviation 0.23). This result may indicate that research first published at conferences in its early stages continues to evolve and mature, leading to journal publications. Since the number of references increases each year, the result also suggests that new research continues to be developed using the Pegasus software. The notable share of PhD theses referencing Pegasus (average 6.9%, standard deviation 1.8%) further highlights the importance of the software to the research community.
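Counting entry types in a BibTeX export can be sketched as below. This is an assumption about how such a distribution might be computed, not the CitationXpert implementation. The regular expression is a simplification that only looks for "@type{" at the start of a line, and the sample entries are hypothetical.

```python
import re
from collections import Counter

# Matches the entry type in lines such as "@article{key, ...".
ENTRY_TYPE = re.compile(r"^\s*@(\w+)\s*\{", re.MULTILINE)

# BibTeX directives that are not bibliography entries.
NON_ENTRIES = {"comment", "string", "preamble"}


def entry_type_shares(bibtex_text: str) -> dict:
    """Return the share of each BibTeX entry type found in the text."""
    types = (name.lower() for name in ENTRY_TYPE.findall(bibtex_text))
    counts = Counter(name for name in types if name not in NON_ENTRIES)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


# Hypothetical entries for illustration only.
sample_bibtex = """
@article{a1, title={Hypothetical journal paper A}}
@inproceedings{p1, title={Hypothetical conference paper}}
@Article{a2, title={Hypothetical journal paper B}}
@phdthesis{t1, title={Hypothetical thesis}}
"""
shares = entry_type_shares(sample_bibtex)
for name, share in sorted(shares.items()):
    print(f"{name}: {share:.0%}")
```

Applying the function to the entries of each year separately would give the per-year distribution described above.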


Pegasus software-h-index

The h-index quantifies the research output of an individual. By analogy, a software package has a software-h-index of h if h of the research articles citing it have themselves received at least h citations each. This metric measures the second-tier impact of the software on other research. The current software-h-index for Pegasus is 66.
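The definition can be made concrete with a short sketch. It assumes the input is a list holding, for each article that cites the software, the number of citations that article has received. The sample numbers are illustrative and are not Pegasus data.

```python
def software_h_index(citation_counts):
    """Largest h such that h citing articles have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so h cannot grow further
    return h


# Illustrative citation counts of six citing articles.
example = [10, 8, 5, 4, 3, 0]
print(software_h_index(example))  # 4
```

In the example, four articles have at least 4 citations each, but there are not five articles with at least 5 citations, so the index is 4.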


Distribution of Authors Citing Pegasus

Pegasus has been cited by 2592 different authors. We gathered author information from Google Scholar to determine the worldwide impact of the Pegasus software. Although Google Scholar is a very popular tool for tracking individual citations, only a fraction of the authors have a profile (837 of 2592, about 32%). Using the registered author email, we can determine the location of each author’s institution. The map below shows the distribution of authors citing the Pegasus software since 2005. Most of the authors are from the USA (353 authors), followed by the United Kingdom (70 authors), China (43 authors), Australia (26 authors), and France (24 authors). Please see the enlarged map for detailed information.
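One way to derive a country from a registered email domain is to look at the domain suffix. The sketch below is an assumption about how such a mapping could work, not a description of the method used for this page. The lookup table is hypothetical and deliberately tiny, and treating ".edu" as a USA domain is a simplification.

```python
from collections import Counter

# Hypothetical, deliberately small lookup table (illustration only).
SUFFIX_TO_COUNTRY = {
    "edu": "USA",  # simplification: assumes .edu institutions are in the USA
    "uk": "United Kingdom",
    "cn": "China",
    "au": "Australia",
    "fr": "France",
}


def country_from_domain(domain: str):
    """Return the country for an email domain, or None if the suffix is unknown."""
    suffix = domain.strip().lower().rstrip(".").rsplit(".", 1)[-1]
    return SUFFIX_TO_COUNTRY.get(suffix)


def authors_per_country(domains):
    """Count authors per country, skipping domains that cannot be mapped."""
    counts = Counter()
    for domain in domains:
        country = country_from_domain(domain)
        if country is not None:
            counts[country] += 1
    return counts


# Example domains for illustration only.
sample_domains = ["usc.edu", "isi.edu", "manchester.ac.uk", "example.org"]
counts = authors_per_country(sample_domains)
print(counts.most_common())
```

Generic suffixes such as ".org" or ".com" carry no location information, so those authors are left out of the count here; a real analysis would need an institution-level lookup for them.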

View Larger Map