Applications

Pegasus is being used in a broad range of applications. This page shows some examples.
To see the details of the workflows for the applications below, visit our workflow gallery at http://vtcpc.isi.edu/pegasus/index.php/WorkflowGenerator.

We are looking for new applications willing to leverage our workflow technologies. If you are interested, please contact us at pegasus at isi dot edu.


Earthquake Science

The Southern California Earthquake Center (SCEC) uses our workflow technologies to produce more accurate seismic hazard maps. These maps, generated as part of the SCEC CyberShake project, indicate the maximum amount of shaking expected at a particular geographic location over a certain period of time. Civil engineers use the hazard maps to determine building design tolerances. Pegasus maps the CyberShake workflows onto SCEC and NSF CyberInfrastructure resources. The figure on the left shows the results of running CyberShake on the TeraGrid in the fall of 2005. The workflows ran over a period of 23 days and processed 20 TB of data using 1.8 CPU years. The workflows contained a total of 261,823 tasks. CyberShake delivers new insights into how rupture directivity and sedimentary basin effects contribute to the shaking experienced at different geographic locations. As a result, more accurate hazard maps can be created.
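As a rough back-of-the-envelope reading of the fall 2005 figures above, the sketch below converts the reported totals into averages. It assumes a 365-day year and load spread evenly over the run, neither of which is stated here; the constant and function names are illustrative only.

```python
# Back-of-the-envelope averages for the fall 2005 CyberShake run.
# Assumptions (not from the source): a 365-day year and evenly spread load.

CPU_YEARS = 1.8         # total compute reported
RUN_DAYS = 23           # wall-clock duration reported
TOTAL_TASKS = 261_823   # tasks across all workflows
DATA_TB = 20            # data processed, in terabytes


def average_busy_cpus(cpu_years: float, run_days: float) -> float:
    """Average number of CPUs kept busy over the whole run."""
    return cpu_years * 365 / run_days


def per_day(total: float, run_days: float) -> float:
    """Average amount of `total` handled per day of the run."""
    return total / run_days


avg_cpus = average_busy_cpus(CPU_YEARS, RUN_DAYS)
tasks_per_day = per_day(TOTAL_TASKS, RUN_DAYS)
tb_per_day = per_day(DATA_TB, RUN_DAYS)

print(f"average busy CPUs: {avg_cpus:.1f}")
print(f"tasks per day:     {tasks_per_day:.0f}")
print(f"TB per day:        {tb_per_day:.2f}")
```

Under these assumptions the run kept roughly 29 CPUs busy on average and completed about 11,400 tasks per day. Actual utilization was almost certainly uneven, so these are averages, not peaks.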

Recently, CyberShake calculated a major hazard map of the Southern California region by generating hazard curves for 200 sites. Approximately 200 million tasks were run on the TeraGrid over a two-month period. Results and more metrics will be published soon.

SCEC plans to use Pegasus in a new application, Broadband. The image below shows a Broadband workflow for a single source, stations, and velocity file. The plan is to run workflows that involve tens to hundreds of sources and hundreds of stations.

SCEC is also using Pegasus and DAGMan in the Earthworks Portal, a TeraGrid Science Gateway hosted at Washington University. The portal allows users to configure and execute earthquake wave propagation simulations, structured as workflows, through a simple interface.

David Okaya of SCEC has an interesting slide on the Benefits of Scientific Workflows from an application scientist's perspective.

SCEC Scientists:  Thomas H. Jordan, Scott Callaghan, Edward Field, Hunter Francoeur, Robert Graves, Nitin Gupta, Vipin Gupta, Philip Maechling, John Mehringer, David Okaya, Li Zhao


Gravitational-Wave Physics

Pegasus is used in the Laser Interferometer Gravitational-Wave Observatory (LIGO) project to map binary inspiral analysis workflows onto the Open Science Grid (OSG). A month of LIGO data requires many thousands of jobs, running for days on hundreds of CPUs. The figure on the right illustrates the use of OSG for the LIGO workflows from November 2006 to early January 2007. The figure was created using the MonALISA monitoring software used on OSG. The workflows ran across several OSG sites and used a total of 2.5 CPU years of computing over that two-month period.

LIGO Scientists: Kent Blackburn, David Meyers, Michael Samidi (Caltech)


Astronomy

Pegasus is used in astronomy, in particular in the Montage application, which delivers science-grade mosaics of the sky. Our technologies were used to transform a single-processor Montage code into a complex workflow and to parallelize computations so that larger-scale images can be processed. Montage workflows mapped by Pegasus to the NSF CyberInfrastructure are characterized by tens of thousands of executable tasks and the processing of thousands of images. The image on the right (Beaton et al. Ap J Lett in press) was recently created to verify a bar in the spiral galaxy M31. Eleven major projects and surveys worldwide, such as the Spitzer Space Telescope Legacy teams, have integrated Montage, and therefore Pegasus, into their pipelines and processing environments to generate science and browse products for dissemination to the astronomy community.

For other Montage success stories, see the recent talk by Bruce Berriman at GRITS (May 14, 2009) or visit:

http://montage.ipac.caltech.edu/applications.html

Montage Scientists: Bruce Berriman, John Good (IPAC), Dan Katz (LSU), and Joe Jacobs (Caltech)


Epigenomics

Epigenomic workflow (computational jobs are shown as circles, data transfer jobs as rhomboids).

The USC Epigenome Center is currently using the Illumina Genome Analyzer (GA) system to generate high-throughput DNA sequence data (up to 8 billion nucleotides per week) to map the epigenetic state of human cells on a genome-wide scale.

We have implemented an automated analysis pipeline using Pegasus-WMS to support these epigenomic sequencing efforts. The workflow shown below consists of seven basic steps, which (1) transfer sequence data to the cluster storage system, (2) split sequence files into multiple parts to be processed in parallel, (3) convert sequence files to the appropriate file format, (4) filter out noisy and contaminating sequences, (5) map sequences to their genomic locations, (6) merge output from individual mapping steps into a single global map, and (7) use sequence maps to calculate the sequence density at each position in the genome. The Epigenome Center uses this workflow to process its production DNA methylation and histone modification data. The workflow currently implements the minimum requirements to analyze the data effectively; we are working to add quality control and checkpoint steps to make the pipeline more robust.
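The seven steps above follow a split, process-in-parallel, then merge pattern. The sketch below models only that dependency structure in plain Python with the standard library's graphlib module (Python 3.9 or later). It is not the Epigenome Center's pipeline and does not use the Pegasus API; the job names, the chunk count, and both helper functions are hypothetical.

```python
from graphlib import TopologicalSorter


def build_epigenomics_dag(num_chunks: int) -> dict[str, set[str]]:
    """Map each (hypothetical) job name to the set of jobs it depends on."""
    deps: dict[str, set[str]] = {
        "transfer": set(),      # (1) stage sequence data to cluster storage
        "split": {"transfer"},  # (2) split sequence files into chunks
    }
    map_jobs: set[str] = set()
    for i in range(num_chunks):
        deps[f"convert_{i}"] = {"split"}        # (3) convert file format
        deps[f"filter_{i}"] = {f"convert_{i}"}  # (4) drop noisy sequences
        deps[f"map_{i}"] = {f"filter_{i}"}      # (5) map to genomic locations
        map_jobs.add(f"map_{i}")
    deps["merge"] = map_jobs     # (6) merge per-chunk maps into a global map
    deps["density"] = {"merge"}  # (7) compute sequence density per position
    return deps


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; jobs within a wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(build_epigenomics_dag(num_chunks=3))
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Each wave lists jobs that could run concurrently. With three chunks, the conversion, filtering, and mapping waves each hold three jobs, which is where the split in step (2) pays off.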

Epigenomics Scientists: Ben Berman, Jonathan Buckley, James Knowles and Peter Laird (USC)


Ocean JPL

This is a project with NASA JPL to run workflows on IRIX clusters. The workflows perform ocean temperature modeling analysis. The goal is to run the ocean workflow in under 6 hours. The workflow consumes about 1.8 GB of data and produces about 8.7 MB of output.

Ocean Scientists: Peggy Li (JPL, NASA)


Helioseismology

The Solar Dynamics Observatory (SDO) is NASA's most important solar physics mission of this coming decade. SDO is to be launched near the end of 2008. Its three primary instruments are the Helioseismic and Magnetic Imager (HMI), the Atmospheric Imaging Assembly (AIA) and the Extreme ultraviolet Variability Experiment (EVE). The data will be used predominantly to learn about solar magnetic activity and to probe the internal structure and dynamics of the Sun with helioseismology.

The solar scientific community will be inundated with a flood of large-volume SDO data (about 1 TB/day). The German Data Center (GDC) for the Solar Dynamics Observatory (SDO), hosted by the Max Planck Institute for Solar System Research in Germany, will provide access to SDO data for Europeans. The GDC-SDO will make available all the relevant Helioseismic and Magnetic Imager (HMI) data for helioseismology and smaller selected Atmospheric Imaging Assembly (AIA) data sets. This project commenced in August 2007 and is funded by the German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt, or DLR) until December 2012. Additional information about the GDC-SDO can be found at http://www.mps.mpg.de/projects/seismo/GDC1/.

While the Data Record Management System (DRMS) has been developed and is distributed by the Stanford/Lockheed Joint Science Operations Center (JSOC), there is also the need for an advanced Workflow Management System to process the data at the German site.  Pegasus is currently being implemented at the GDC-SDO to run the helioseismic tomography pipeline. This particular workflow will read sets of HMI/SDO images from the local database, remap and track images, apply filters in 3D Fourier space, compute helioseismic travel times, and invert these travel times to form tomographic images of the solar interior. Because the data flow will be continuous, significant compute power will be required. Pegasus/Condor will distribute the jobs on a local compute cluster (~150 cores) and, later, on remote grids.
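To put the continuous data flow in perspective, the sketch below converts the quoted SDO volume of about 1 TB/day into a sustained transfer rate and a per-core daily share on a 150-core cluster. It assumes decimal units (1 TB = 10^12 bytes), an even spread across cores, and that the full stream is processed at one site. None of these is stated above, so the results are upper-bound estimates. The names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes
BYTES_PER_GB = 10**9
BYTES_PER_MB = 10**6


def sustained_rate_mb_per_s(tb_per_day: float) -> float:
    """Average transfer rate needed to keep up with the daily volume."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB


def daily_share_gb_per_core(tb_per_day: float, cores: int) -> float:
    """Data each core handles per day if the load is spread evenly."""
    return tb_per_day * BYTES_PER_TB / cores / BYTES_PER_GB


rate = sustained_rate_mb_per_s(1.0)
share = daily_share_gb_per_core(1.0, cores=150)

print(f"sustained rate: {rate:.1f} MB/s")
print(f"per-core share: {share:.2f} GB/day")
```

Under these assumptions, keeping pace with the stream requires a sustained rate on the order of 10 MB/s before any processing cost is counted.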

Helioseismology Scientists: Laurent Gizon, Raymond Burston, Yacine Saidi


Genome Analysis

GADU, the Genome Analysis and Database Update system, has used Pegasus for the past 2 years to perform high-throughput analysis and annotation of the genomics information that it regularly fuses from multiple public information sources, providing an integrated facility that supports research programs within DOE as well as public visitors to its web portal. GADU workflows run across the Open Science Grid and TeraGrid, applying tools such as BLAST, BLOCKS and PFAM to enrich the warehouse.

The graph on the left shows the number of GADU workflow jobs run on OSG over the past year, from April 2006 to January 2007 (source: OSG's MonALISA).

GADU scientists:  Natalia Maltsev, Alex Rodriguez, Dinanath Sulakhe, Elizabeth Marland, Veronika Nefedova (ANL)


Climate Modeling

A climate modeling application has used our tools to reduce the time its computations take. Simulations that used to take 2.5 months to run manually took only 2.5 days using our tools. Details are in "Automating Climate Science: Large Ensemble Simulations on the TeraGrid with the GriPhyN Virtual Data System," Veronika Nefedova, Robert Jacob, Ian Foster, Zhengyu Liu, Yun Liu, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, Karan Vahi, e-Science 2006, Amsterdam, December 4-6, 2006.


Neuroscience

Pegasus is also used in the Telescience project and portal to support 3D reconstruction of electron tomography images. The UCSD scientists plan to continue to rely on our workflow technologies to expand the set of Grid applications they support within their portal environment and to develop new techniques that can provide real-time feedback from the 3D reconstruction to the scientists manipulating the instrument.

Telescience Scientists: Mark Ellisman, Steven Peltier, Abel Lin (UCSD)


Data mining

Data mining and natural language processing applications at USC/ISI are new user communities exploring the use of our workflow technologies to manage large-scale computations on today’s cyberinfrastructure.