Pegasus Research Impact


Today’s science is built on the shoulders of research software. The Pegasus workload management system (est. 2001) has contributed significantly to scientific progress in several domains, including astronomy, Earth science, climate science, bioinformatics, and neuroinformatics. To quantify how Pegasus has impacted the scientific community, we regularly analyze publication and citation data provided by online services such as Google Scholar.

The goal of this page is to report the current research impact of the Pegasus software.

Outside References to Pegasus

| Year | Title | Used Pegasus Component |
|------|-------|------------------------|
| 2020 | GeoEDF: An Extensible Geospatial Data Framework for FAIR Science | WMS |
| 2019 | Improving the sensitivity of Advanced LIGO using noise subtraction | WMS |
| 2018 | Distributed Data and Job Management for the XENON1T Experiment | WMS |
| 2018 | Evaluating Workflow Management Systems: A Bioinformatics Use Case | PMC |
| 2018 | Workflow Scheduling Using Hybrid GA-PSO Algorithm in Cloud Computing | WorkflowSim / WorkflowGenerator |
| 2018 | SciTokens: Capability-Based Secure Access to Remote Scientific Data | WMS |
| 2018 | SimPrily: A Python framework to simplify high-throughput genomic simulations | WMS |
| 2018 | Task Runtime Prediction in Scientific Workflows Using an Online Incremental Learning Approach | WMS |
| 2018 | Hybrid scheduling algorithm in early warning systems | WorkflowGenerator |
| 2018 | Budget-Aware Scheduling Algorithms for Scientific Workflows with Stochastic Task Weights on Heterogeneous IaaS Cloud Platforms | WorkflowGenerator |
| 2018 | Checkpointing Workflows for Fail-Stop Errors | WorkflowGenerator |
| 2017 | BOSS-LDG: A Novel Computational Framework That Brings Together Blue Waters, Open Science Grid, Shifter and the LIGO Data Grid to Accelerate Gravitational Wave Discovery | WMS |
| 2017 | Data Access for LIGO on the OSG | WMS |
| 2017 | Fault-tolerant elastic scheduling algorithm for workflow in Cloud systems | WorkflowSim / WorkflowGenerator |
| 2017 | Dynamic Voltage Frequency Scaling Simulator for Real Workflows Energy-Aware Management in Green Cloud Computing | WorkflowSim / WorkflowGenerator |
| 2017 | Using imbalance characteristic for fault-tolerant workflow scheduling in cloud systems | WorkflowGenerator |
| 2017 | Discovering Condition-Specific Gene Co-Expression Patterns Using Gaussian Mixture Models: A Cancer Case Study | WMS |
| 2017 | The application of nondominated sorting genetic algorithm (NSGA-III) for scientific-workflow scheduling on cloud | WorkflowGenerator |
| 2016 | Global adjoint tomography: first-generation model | WMS |
| 2016 | Cost effective, reliable and secure workflow deployment over federated clouds | WorkflowGenerator |
| 2016 | Endocrine-based coevolutionary multi-swarm for multi-objective workflow scheduling in a cloud system | WorkflowSim / WorkflowGenerator |
| 2016 | An immune system-inspired rescheduling algorithm for workflow in Cloud systems | WorkflowSim / WorkflowGenerator |
| 2016 | Adaptive Multi-level Workflow Scheduling with Uncertain Task Estimates | WorkflowGenerator |
| 2015 | Scheduling Framework for Regular Scientific Workflows in Cloud | WMS |
| 2007 | A Case Study on the Use of Workflow Technologies for Scientific Analysis: Gravitational Wave Data Analysis | WMS |

Automated Pegasus Research Impact Analysis

  The analysis shown below uses data acquired in June 2016.

Data Acquisition

This analysis is based on citations of the two main Pegasus publications (see below) and on references to the Pegasus website, all extracted from Google Scholar.

Number of Citations

Pegasus has been cited in 1121 research articles. Of these citations, 175 come from articles written by at least one author of the two Pegasus articles (self-references), and 946 are external references, cited by other scientists and researchers. Although several research studies and much application support have been developed within the Pegasus team, self-references account for only about 15% of the total number of citations. This result demonstrates the overall impact of the Pegasus software within the research community.
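The arithmetic behind these figures can be checked with a few lines of Python. This is only a minimal sketch: the function name `self_reference_ratio` is hypothetical and is not part of any Pegasus tooling; the three counts are the ones reported above.

```python
def self_reference_ratio(total: int, self_refs: int) -> float:
    """Return the fraction of citations that are self-references."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= self_refs <= total:
        raise ValueError("self_refs must be between 0 and total")
    return self_refs / total


# Counts reported in the text above.
total_citations = 1121
self_references = 175

external_references = total_citations - self_references
ratio = self_reference_ratio(total_citations, self_references)

print(f"external references: {external_references}")  # 946
print(f"self-reference ratio: {ratio:.1%}")           # 15.6%
```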

We break down the total number of citations per year to observe how citations have evolved since the first publication of the Pegasus software (in 2005). The number of self-references per year is nearly constant, with an average of 14.6 citations (standard deviation 4.4), while the number of external references per year increases. The significant increase in the number of citations in 2015 (when the second Pegasus software paper was published) demonstrates the importance (and impact) of up-to-date research articles to the research community. In 2016, the number of references is still low because the data was collected in June (only 6 months of citations).
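A per-year breakdown of this kind could be computed as sketched below. The publication years in `sample_years` are made up for illustration and are not the Pegasus data, and the helper `yearly_counts` is a hypothetical name. The sketch uses the sample standard deviation; the original analysis does not say whether it used the sample or the population form.

```python
from collections import Counter
from statistics import mean, stdev


def yearly_counts(years, first_year, last_year):
    """Count citations per year, including years with zero citations."""
    counts = Counter(years)
    return [counts.get(year, 0) for year in range(first_year, last_year + 1)]


# Hypothetical publication years of citing articles (not real data).
sample_years = [2005, 2005, 2006, 2007, 2007, 2007, 2008]

counts = yearly_counts(sample_years, 2005, 2008)
average = mean(counts)
spread = stdev(counts)  # sample standard deviation (an assumption)

print(counts)   # [2, 1, 3, 1]
print(average)  # 1.75
print(round(spread, 2))
```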

Distribution of Citation Types

The distribution of citation types is based on an analysis of BibTeX entry types. We measure the distribution of citation types (entry types) per year for self-references and external references. Within the SciTech research group (which develops Pegasus), citations are balanced between publications in conferences (In Proceedings or In Collection) and journal articles (46.6% on average, standard deviation 12.2%), with journal articles tending to prevail in the past few years. The list of publications is available on the SciTech website.
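As an illustration of how such a distribution might be derived, the sketch below assumes the BibTeX entries have already been parsed into `(year, entry_type)` pairs. The function name `entry_type_shares` and the sample records are hypothetical and do not reproduce the Pegasus data set.

```python
from collections import Counter, defaultdict


def entry_type_shares(records):
    """Map each year to {entry_type: fraction of that year's citations}."""
    per_year = defaultdict(Counter)
    for year, entry_type in records:
        # BibTeX entry types are case-insensitive, so normalize them.
        per_year[year][entry_type.lower()] += 1

    shares = {}
    for year, counts in per_year.items():
        total = sum(counts.values())
        shares[year] = {etype: n / total for etype, n in counts.items()}
    return shares


# Hypothetical (year, BibTeX entry type) pairs, for illustration only.
sample_records = [
    (2014, "inproceedings"), (2014, "article"),
    (2014, "inproceedings"), (2014, "phdthesis"),
    (2015, "article"), (2015, "Article"),
    (2015, "article"), (2015, "inproceedings"),
]

shares = entry_type_shares(sample_records)
for year in sorted(shares):
    print(year, shares[year])
```

In this made-up sample, journal articles account for a quarter of the 2014 entries and three quarters of the 2015 entries.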

External references show a clear trend toward journal articles over the years (average increase rate 1.17, standard deviation 0.23). This result may indicate that research in its early stages (which is often first published in conferences) continues to evolve and mature, leading to journal publications. Since the number of references increases each year, this result also suggests that new research continues to be developed using the Pegasus software. The considerable number of PhD theses referencing Pegasus (average 6.9%, standard deviation 1.8%) highlights the importance of the software to the research community.

Pegasus software-h-index

The h-index quantifies the research output of an individual. By analogy, the software-h-index metric states that a software has index h if h is the largest number such that h of the research articles citing it have at least h citations each. This metric measures the second-tier impact of the software on other research. The current software-h-index for Pegasus is 66.
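The definition can be made concrete with a short sketch that applies the standard h-index computation to the citation counts of the articles citing a piece of software. The function name `software_h_index` and the example counts are hypothetical; they are not the data behind the value reported for Pegasus.

```python
def software_h_index(citation_counts):
    """Largest h such that at least h citing articles have >= h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts of seven articles that cite the software.
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(software_h_index(example_counts))  # 3
```

Here three citing articles have at least 3 citations each, but there are not four articles with at least 4, so the index is 3.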

Distribution of Authors Citing Pegasus

Pegasus has been cited by 2592 different authors. We gathered author information from Google Scholar to determine the impact of the Pegasus software worldwide. Although Google Scholar is a very popular tool for tracking individual citations, only a fraction of the authors have a profile (837 authors out of 2592). Using the registered author email, we can determine the location of the authors’ institutions. The map below shows the distribution of authors citing the Pegasus software since 2005. Most of the authors are from the USA (353 authors), followed by the United Kingdom (70 authors), China (43 authors), Australia (26 authors), and France (24 authors). Please see the enlarged map for detailed information.
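One simple way to approximate an institution's location from an email domain is to look at its top-level domain. The sketch below is an assumption about how this step could be done, not a description of the method actually used for this analysis. The lookup table `TLD_TO_COUNTRY` is deliberately tiny and hypothetical, the `.edu` and `.gov` mappings are rough approximations, and the sample domains are placeholders.

```python
from collections import Counter

# Illustrative, incomplete lookup table (an assumption, not real analysis data).
# Generic domains such as .com or .org carry no country information.
TLD_TO_COUNTRY = {
    "edu": "USA",  # assumption: .edu is used mostly by US institutions
    "gov": "USA",
    "uk": "United Kingdom",
    "cn": "China",
    "au": "Australia",
    "fr": "France",
}


def country_from_domain(domain):
    """Guess a country from the last label of an email domain."""
    tld = domain.strip().lower().rstrip(".").rsplit(".", 1)[-1]
    return TLD_TO_COUNTRY.get(tld, "unknown")


def authors_per_country(domains):
    """Count authors per guessed country, one domain per author."""
    return Counter(country_from_domain(d) for d in domains)


# Placeholder domains, one per hypothetical author profile.
sample_domains = [
    "example.edu", "example.edu", "example.ac.uk",
    "example.edu.cn", "example.edu.au", "example.fr", "example.org",
]

for country, n in authors_per_country(sample_domains).most_common():
    print(country, n)
```

A production version would need a complete country-code table and a fallback (for example, an institution lookup) for generic domains.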

View Larger Map