Applications

Pegasus is used in a broad range of applications. This page shows some examples.
To see details of the workflows for the applications below, visit our workflow gallery at http://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator.

We are looking for new applications willing to leverage our workflow technologies. If you are interested, please contact us at pegasus at isi dot edu.

TOC:

Biology: Bioinformatics, Neuroscience, Botany
Earth Sciences: Climate Modeling, Earthquake Science, Limnology, Ocean Science
Physical Sciences: Astronomy, Chemistry, Energy, Helioseismology, Physics
Education: Online, Classroom

Bioinformatics


DNA sequencing

The USC Epigenome Center is currently using the Illumina Genome Analyzer (GA) system to generate high-throughput DNA sequence data (up to 8 billion nucleotides per week) to map the epigenetic state of human cells on a genome-wide scale.

Epigenomic Workflow (computational jobs are shown as circles, data transfer jobs as rhomboids).

The Center has implemented an automated analysis pipeline using Pegasus-WMS to support these sequencing efforts. The workflow shown above consists of seven basic steps:
    1. transfer sequence data to the cluster storage system,
    2. split sequence files into multiple parts to be processed in parallel,
    3. convert sequence files to the appropriate file format,
    4. filter out noisy and contaminating sequences,
    5. map sequences to their genomic locations,
    6. merge output from individual mapping steps into a single global map, and
    7. use sequence maps to calculate the sequence density at each position in the genome.
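The dependency structure of the seven steps above can be sketched as a directed acyclic graph. The following is a minimal illustration in plain Python, not the Center's actual pipeline and not the Pegasus API: the job names, the `num_chunks` parameter, and both helper functions are hypothetical, and the sketch assumes that steps 3 to 5 run once per split chunk.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_epigenomics_dag(num_chunks):
    """Return {job: set of jobs it depends on} for the seven-step pipeline.

    All job names are illustrative placeholders, not real executables.
    """
    deps = {
        "transfer": set(),        # step 1: stage sequence data to the cluster
        "split": {"transfer"},    # step 2: split files into chunks
    }
    map_jobs = []
    for i in range(num_chunks):
        convert, filt, mapper = f"convert_{i}", f"filter_{i}", f"map_{i}"
        deps[convert] = {"split"}   # step 3: convert file format
        deps[filt] = {convert}      # step 4: filter noisy/contaminating reads
        deps[mapper] = {filt}       # step 5: map reads to genomic locations
        map_jobs.append(mapper)
    deps["merge"] = set(map_jobs)   # step 6: merge per-chunk maps
    deps["density"] = {"merge"}     # step 7: compute sequence density
    return deps


def execution_levels(deps):
    """Group jobs into levels; jobs within one level have no mutual
    dependencies and could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        levels.append(ready)
        sorter.done(*ready)
    return levels


if __name__ == "__main__":
    for depth, jobs in enumerate(execution_levels(build_epigenomics_dag(4)), 1):
        print(depth, jobs)
```

In this sketch the per-chunk convert, filter, and map jobs fan out after the split and fan back in at the merge, which is where the parallelism in the workflow comes from.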

The Epigenome Center uses this workflow to process its production DNA methylation and histone modification data. The workflow currently implements the minimum requirements for analyzing the data effectively; we are working to add quality control and checkpoint steps to make the pipeline more robust.

Scientists: Ben Berman and Peter Laird, USC Epigenome Center

Publications: Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Bruce Berriman, Benjamin P. Berman and Phil Maechling. Scientific Workflow Applications on Amazon EC2. Workshop on Cloud-based Services and Applications in conjunction with 5th IEEE International Conference on e-Science (e-Science 2009), Oxford UK, December 9-11, 2009.


SeqWare (http://sourceforge.net/projects/seqware/) is an open source software package developed at UCLA. It supports massively parallel sequencing technologies and provides several components, including a LIMS, computational pipelines (supported by Pegasus), and a metadata component. The software was recently used to sequence the U87MG cancer cell line.

Scientists: Brian O'Connor and Jordan Mendler, UCLA

Publications:

M. J. Clark, N. Homer, B. D. O'Connor, Z. Chen, A. Eskin, H. Lee, B. Merriman, and S. F. Nelson, "U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line," PLoS Genetics, vol. 6, 2010.

V. Marx. (2010, February 5). UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production, Newsletter: BioInform. Available: http://www.genomeweb.com/print/933003?page=show


Proteomics

Scientists at OSU use Pegasus for mass-spectrometry-based proteomics. Proteomics workflows have been executed on local clusters and cloud resources.

Example proteomic workflow: a) Pegasus workflow template. Square boxes with double lines represent file collections, and ellipses with double boundaries represent parallel jobs. b) Implementation of the workflow for clustering five shotgun proteomic data sets. c) Hierarchical cluster analysis of the shotgun proteomic data.

Scientists: Michael Freitas, OSU


Bacterial RNA studies

SIPHT is a bacterial genomics application that predicts genes encoding small non-coding RNAs (sRNAs) in bacteria. The project currently provides a web-based interface but needs better notification of task and workflow completion.

Service available at: http://newbio.cs.wisc.edu/sRNA/

Scientists: Jonathan Livny, Broad Institute

Publications: Jonathan Livny, Hidayat Teonadi, Miron Livny, and Matthew K. Waldor. High-throughput, kingdom-wide prediction and annotation of bacterial non-coding RNAs. PLoS ONE, 3(9), 2008.


Population Studies

As part of the Population Architecture using Genomics and Epidemiology (PAGE) study, we are developing several workflows, some of which perform quality control analysis on genomic data.

The workflow below flags discrepancies in data coming from different groups and checks the concordance of genotype calls against HapMap genotypes. Initially, it was one large R script that computed all the steps sequentially. We split the code into separate steps (tasks) so that it could be represented as a workflow. As a result, the whole workflow executes within 30 minutes on a small cluster, compared to 4 hours for the initial sequential version (roughly an eightfold reduction).

Scientists: Steve Buyske (Rutgers University)


Genomic Studies of Mental Disorders

The computational portal developed for the Center for Genomic Studies of Mental Disorders uses Pegasus to manage workflows for genetic population studies. This portal uses the Wings (url) workflow composition system and Pegasus to enable scientists to launch an analysis based on the available workflow template. Below is a screenshot of the portal Workflow Gallery.

Scientists: Chris Mason, Cornell Medical School; Yolanda Gil, ISI

Neuroscience

Pegasus is also used in the Telescience project and portal to support 3D reconstruction of electron tomography images. The UCSD scientists plan to continue to rely on our workflow technologies to expand the set of Grid applications they support within their portal environment and to develop new techniques that can provide real-time feedback from the 3D reconstruction to the scientists manipulating the instrument.

Scientists: Mark Ellisman, Steven Peltier, Abel Lin (UCSD)

Publications: A. Lathers, M.-H. Su, A. Kulungowski, A. W. Lin, G. Mehta, S. T. Peltier, E. Deelman, and M. H. Ellisman, Enabling Parallel Scientific Applications with Workflow Tools, Proceedings of Challenges of Large Applications in Distributed Environments (CLADE), Paris, 2006.

Botany

Plant scientists at the University of Wisconsin-Madison are using Pegasus to generate movies of plant root growth and to analyze images collected via time-lapse photography. Another project samples forest locations to characterize the understory vegetation and determine how different plant species are distributed in the woods.