Applications
Pegasus is used in a broad range of applications. This page shows some examples.
For details of the workflows used by the applications below, visit our workflow gallery at http://vtcpc.isi.edu/pegasus/index.php/WorkflowGenerator.
We are looking for new applications willing to leverage our workflow technologies. If you are interested, please contact us at pegasus at isi dot edu.
Earthquake Science
The Southern California Earthquake Center (SCEC) uses our workflow technologies to produce more accurate seismic hazard maps. These maps, generated as part of the SCEC CyberShake project, indicate the maximum amount of shaking expected at a particular geographic location over a certain period of time. The hazard maps are used by civil engineers to determine building design tolerances. Pegasus maps the CyberShake workflows onto SCEC and NSF CyberInfrastructure resources. The figure on the left shows the results of running CyberShake on the TeraGrid in the fall of 2005. The workflows ran over a period of 23 days and processed 20 TB of data using 1.8 CPU years. The total number of tasks in the workflows was 261,823. CyberShake delivers new insights into how rupture directivity and sedimentary basin effects contribute to the shaking experienced at different geographic locations. As a result, more accurate hazard maps can be created.
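As a rough back-of-the-envelope reading of the figures above, the sketch below derives the average number of CPUs kept busy and the mean CPU time per task from the reported totals. It assumes a 365-day year and that the CPU time was spread evenly over the 23 days, neither of which is stated here; the variable names are illustrative only.

```python
# Figures reported for the fall 2005 CyberShake runs.
CPU_YEARS = 1.8
WALL_DAYS = 23
TOTAL_TASKS = 261_823

# Assumption: one CPU year is 365 days of a single CPU.
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60

cpu_days = CPU_YEARS * DAYS_PER_YEAR

# Average number of CPUs busy at any moment over the 23-day period.
avg_busy_cpus = cpu_days / WALL_DAYS

# Mean CPU time consumed by a single task, in seconds.
mean_task_seconds = cpu_days * SECONDS_PER_DAY / TOTAL_TASKS

print(f"average busy CPUs: {avg_busy_cpus:.1f}")
print(f"mean CPU time per task: {mean_task_seconds:.0f} s")
```

Under these assumptions the runs kept roughly 29 CPUs busy on average, with each task consuming a few minutes of CPU time.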
SCEC plans to use Pegasus in a new application, Broadband. The image below shows a Broadband workflow for a single source, stations and velocity file. The plan is to run workflows that involve tens to hundreds of sources and hundreds of stations.
SCEC is also using Pegasus and DAGMan in the Earthworks Portal, a TeraGrid Science Gateway hosted at Washington University, which allows users to configure and execute earthquake wave propagation simulations, structured as workflows, through a simple portal interface.
David Okaya of SCEC has an interesting slide on the Benefits of Scientific Workflows from an application scientist's perspective.
SCEC Scientists: Thomas H. Jordan, Scott Callaghan, Edward Field, Hunter Francoeur, Robert Graves, Nitin Gupta, Vipin Gupta, Philip Maechling, John Mehringer, David Okaya, Li Zhao
Gravitational-Wave Physics
Pegasus is used in the Laser Interferometer Gravitational-Wave Observatory (LIGO) project to map binary inspiral analysis workflows onto the Open Science Grid (OSG). A month of LIGO data requires many thousands of jobs, running for days on hundreds of CPUs. The figure on the right illustrates the use of OSG for the LIGO workflows over the period of November 2006 to early January 2007. The figure was created using the MonALISA monitoring software used on OSG. The workflows were run across several OSG sites and used a total of 2.5 CPU years of computing over a period of 2 months.
LIGO Scientists: Kent Blackburn, David Meyers, Michael Samidi (Caltech)
Astronomy
Pegasus is used in astronomy, in particular in the Montage application, which delivers science-grade mosaics of the sky. Our technologies were used to transform a single-processor Montage code into a complex workflow and to parallelize computations in order to process larger-scale images. Montage workflows mapped by Pegasus to the NSF CyberInfrastructure are characterized by tens of thousands of executable tasks and the processing of thousands of images. The image on the right (Beaton et al. Ap J Lett in press) was recently created to verify a bar in the spiral galaxy M31. Eleven major projects and surveys worldwide, such as the Spitzer Space Telescope Legacy teams, have integrated Montage, and therefore Pegasus, into their pipelines and processing environments to generate science and browse products for dissemination to the astronomy community.
For other Montage success stories, please see the recent talk by Bruce Berriman at GRITS (May 14, 2009) and
http://montage.ipac.caltech.edu/applications.html
Montage Scientists: Bruce Berriman, John Good (IPAC), Dan Katz (LSU), and Joe Jacobs (Caltech)
Epigenomics 
The USC Epigenome Center is currently using the Illumina Genome Analyzer (GA) system to generate high-throughput DNA sequence data (up to 8 billion nucleotides per week) to map the epigenetic state of human cells on a genome-wide scale.
We have implemented an automated analysis pipeline using Pegasus-WMS to support these epigenomic sequencing efforts. The workflow shown below consists of seven basic steps, which
(1) transfer sequence data to the cluster storage system,
(2) split sequence files into multiple parts to be processed in parallel,
(3) convert sequence files to the appropriate file format,
(4) filter out noisy and contaminating sequences,
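The four steps listed above can be pictured as a small dependency graph. The following is a minimal sketch in plain Python, not Pegasus code: it uses the standard library's graphlib module (Python 3.9 or later) to order the jobs. The job names and the chunk count of 3 are hypothetical, and only the four steps listed here are modeled.

```python
from graphlib import TopologicalSorter


def build_pipeline(num_chunks):
    """Return a mapping of job name -> set of jobs it depends on."""
    # Step 1 has no prerequisites; step 2 needs the transferred data.
    deps = {"transfer": set(), "split": {"transfer"}}
    for i in range(num_chunks):
        # Steps 3 and 4 run once per chunk. Each chunk depends only on
        # the split job, so chunks can be processed in parallel.
        deps[f"convert_{i}"] = {"split"}
        deps[f"filter_{i}"] = {f"convert_{i}"}
    return deps


# Hypothetical example: the sequence file is split into 3 parts.
deps = build_pipeline(num_chunks=3)

# One valid execution order that respects every dependency.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

A workflow system performs this kind of ordering on the user's behalf; the sketch only illustrates why splitting the input lets the per-chunk convert and filter jobs run independently of one another.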
Epigenomics Scientists: Ben Berman, Jonathan Buckley, James Knowles and Peter Laird (USC)
Ocean JPL
This is a project with NASA JPL to run workflows on IRIX clusters. The workflows perform ocean temperature modeling analysis. The goal is to run the ocean workflow in under 6 hours. This workflow consumes about 1.8 GB of data and produces about 8.7 MB of output.
Ocean Scientists: Peggy Li (JPL, NASA)
Helioseismology
The Solar Dynamics Observatory (SDO) is NASA's most important solar physics mission of the coming decade. SDO is to be launched near the end of 2008, and its three primary instruments are the Helioseismic and Magnetic Imager (HMI), the Atmospheric Imaging Assembly (AIA), and the Extreme ultraviolet Variability Experiment (EVE). The data will be used predominantly to learn about solar magnetic activity and to probe the internal structure and dynamics of the Sun with helioseismology.
The solar scientific community will be inundated with a flood of large-volume SDO data (about 1 TB/day). The German Data Center for SDO (GDC-SDO), hosted by the Max Planck Institute for Solar System Research in Germany, will provide access to SDO data for Europeans. The GDC-SDO will make available all the relevant HMI data for helioseismology and smaller selected AIA data sets. This project commenced in August 2007 and is funded by the German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt, or DLR) until December 2012. Additional information about the GDC-SDO can be found at http://www.mps.mpg.de/projects/seismo/GDC1/.
While the Data Record Management System (DRMS) has been developed and is distributed by the Stanford/Lockheed Joint Science Operations Center (JSOC), there is also the need for an advanced Workflow Management System to process the data at the German site. Pegasus is currently being implemented at the GDC-SDO to run the helioseismic tomography pipeline. This particular workflow will read sets of HMI/SDO images from the local database, remap and track images, apply filters in 3D Fourier space, compute helioseismic travel times, and invert these travel times to form tomographic images of the solar interior. Because the data flow will be continuous, significant compute power will be required. Pegasus/Condor will distribute the jobs on a local compute cluster (~150 cores) and, later, on remote grids.
Helioseismology Scientists: Laurent Gizon, Raymond Burston, Yacine Saidi
Genome Analysis
GADU, the Genome Analysis and Database Update system, has been using Pegasus for the past 2 years to perform high-throughput analysis and annotation of the genomics information that it regularly fuses from multiple public information sources, providing an integrated facility that supports research programs within DOE as well as public visitors to its web portal. GADU workflows are being run across the Open Science Grid and TeraGrid, applying tools such as BLAST, BLOCKS and PFAM to enrich the warehouse.
The graph on the left shows the number of GADU workflow jobs run on OSG over the past year, April 2006 to January 2007 (source: OSG's MonALISA).
GADU scientists: Natalia Maltsev, Alex Rodriguez, Dinanath Sulakhe, Elizabeth Marland, Veronika Nefedova (ANL)
Climate Modeling
A climate modeling application has used our tools to reduce the time its computations take. Simulations that used to take 2.5 months to run manually took only 2.5 days to run using our tools. See "Automating Climate Science: Large Ensemble Simulations on the TeraGrid with the GriPhyN Virtual Data System," Veronika Nefedova, Robert Jacob, Ian Foster, Zhengyu Liu, Yun Liu, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, Karan Vahi, e-Science 2006, Amsterdam, December 4-6, 2006.
Neuroscience
Pegasus is also used in the Telescience project and portal to support 3D reconstruction of electron tomography images. The UCSD scientists plan to continue to rely on our workflow technologies to expand the set of Grid applications they support within their portal environment and to develop new techniques that can provide real-time feedback from the 3D reconstruction to the scientists manipulating the instrument.
Telescience Scientists: Mark Ellisman, Steven Peltier, Abel Lin (UCSD)
Data mining
Data mining and natural language processing applications at USC/ISI are new user communities that are exploring the use of our workflow technologies to manage the large-scale computations on today’s cyberinfrastructure.
