The periodogram application processes time-series data collected by NASA’s Kepler mission. The Kepler satellite uses high-precision photometry to search for exoplanets transiting their host stars. In 2009 the Kepler mission began a multi-year transit survey of 170,000 stars near the constellation Cygnus. In 2010 the project released a data set containing 210,664 light curves, which record how the brightness of a star changes over time. Analyzing light curves to identify the periodic dips in brightness that arise from transiting exoplanets requires the calculation of periodograms, which reveal periodic signals in time-series data and estimate their significance. Generating periodograms is a computationally intensive process that requires high-performance, distributed computing.
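To make the idea of a periodogram concrete, here is a minimal Python sketch of the classical Lomb-Scargle periodogram. It handles the uneven sampling that is typical of light curves with gaps. The choice of algorithm, the function name `lomb_scargle`, and the synthetic light curve are illustrative assumptions, not the application's actual code. Transit searches often fit box-shaped dips (for example, with Box Least Squares) rather than the sinusoids assumed here.

```python
import math


def lomb_scargle(times, values, frequencies):
    """Classical Lomb-Scargle power for each trial frequency (cycles per time unit).

    Illustrative sketch only; not the periodogram application's implementation.
    """
    n = len(times)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # the periodogram assumes zero-mean data
    powers = []
    for f in frequencies:
        if f <= 0:
            raise ValueError("trial frequencies must be positive")
        w = 2.0 * math.pi * f
        # Offset tau makes the sine and cosine terms orthogonal on uneven samples.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        cos_t = [math.cos(w * (t - tau)) for t in times]
        sin_t = [math.sin(w * (t - tau)) for t in times]
        yc = sum(y * c for y, c in zip(centered, cos_t))
        ys = sum(y * s for y, s in zip(centered, sin_t))
        cc = sum(c * c for c in cos_t)
        ss = sum(s * s for s in sin_t)
        power = 0.0
        if cc > 0:
            power += yc * yc / cc
        if ss > 0:
            power += ys * ys / ss
        powers.append(0.5 * power)
    return powers


# Hypothetical, unevenly sampled "light curve" with a 3.5-day sinusoidal signal.
times = [0.37 * i + 0.1 * ((7 * i) % 5) for i in range(200)]
flux = [1.0 + 0.01 * math.sin(2 * math.pi * t / 3.5) for t in times]
freqs = [0.05 + 0.005 * k for k in range(191)]  # 0.05 to 1.0 cycles per day
powers = lomb_scargle(times, flux, freqs)
best_period = 1.0 / freqs[powers.index(max(powers))]
print(f"strongest period: {best_period:.2f} days")
```

The power at each trial frequency is computed independently of every other frequency. This is one reason periodogram calculations over many light curves and many trial periods split naturally into parallel tasks.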
An atlas compiled from more than 18 million images reveals in unprecedented detail the structure of the Milky Way's galactic plane, the midplane of the galaxy's disk, which contains most of its stars. Researchers used the NSF-funded Pegasus Workflow Management System (Pegasus WMS) to produce the atlas. It maps how the interstellar medium and dense molecular clouds absorb light in the near-infrared and re-emit it at longer wavelengths.
Over the past several years, the US National Science Foundation has been funding the development of collaborative web sites or ‘collaboratories’. Many communities have adopted the HUBzero platform to create collaboratories called ‘hubs’ where they can share ideas, models, experiences, publications, and data in pursuit of research and education.
Hubs in different domains leverage the same HUBzero infrastructure to support different toolsets for their own communities. In 2009, the NSF George E. Brown Network for Earthquake Engineering Simulation (NEES) moved operations to Purdue University and created a hub for the civil engineering community. Today, NEES.org offers more than 65 simulation and data analysis tools used to understand the damage caused by earthquakes and to improve building design. One of these tools leverages an open-source code, the Open System for Earthquake Engineering Simulation (OpenSees), to provide a collection of utilities for structural and geotechnical engineers.
"The Center for Biomedical Informatics (CBMi) at the Children’s Hospital of Philadelphia is home to the development of innovative solutions to healthcare's immediate and long-term informatics needs. CBMi provides informatics-focused services, applications, and educational programs to Children's Hospital clinicians and researchers, and seeks to transform their craft with high-impact, low-cost solutions. One of CBMi’s main areas of focus is genomics.
Pegasus Workflow Management System is our platform of choice for processing next-generation sequencing (NGS) data, including hundreds of whole-genome and whole-exome data sets. To implement clinical sequence analysis workflows, we needed a system that provides a reproducible, self-documenting, and well-logged solution, and Pegasus addresses all of our concerns. Our Pegasus-based NGS sequence analysis workflows turn hundreds of gigabytes of raw sequencing data into a manageable list of variants that scientists and geneticists can then interpret. We take ‘big data’ that even cutting-edge compute systems struggle with and turn it into tangible data formats, enabling physicians and investigators to answer questions that would otherwise be too complicated to address with other methods." --- Mahdi Sarmady
The vervet is the second Old World monkey (OWM) to be sequenced, after the rhesus macaque. Unlike the great apes, most of which are nearly extinct, vervets are widely available for biomedical research. (Rhesus macaques are plentiful in India, but the Indian government's export restrictions make them less suitable for biomedical research.) Because vervets are genetically closer to humans than other model organisms such as the mouse, they are an excellent model for studying high-level cognitive traits, such as novelty seeking and attention deficit hyperactivity disorder (ADHD), as well as some primate-specific diseases, such as HIV. A large pedigree of vervet monkeys housed in the Vervet Research Colony (VRC) in North Carolina offers a rich genetic resource for studying these traits in a controlled environment. Our current focus is to sequence a large number of VRC monkeys with well-characterized, highly heritable phenotypes in order to find the genetic loci underlying these phenotypes.
Caltech astronomers are using Pegasus with Montage (http://montage.ipac.caltech.edu/) to generate science-grade mosaics of the sky. Our technologies were used to transform the single-processor Montage code into a workflow whose computations run in parallel, making it possible to process larger-scale images. Montage workflows that Pegasus maps onto NSF cyberinfrastructure comprise tens of thousands of executable tasks and process thousands of images. One such mosaic (Beaton et al., ApJ Letters, in press) was recently created to verify the presence of a bar in the spiral galaxy M31. Eleven major projects and surveys worldwide, such as the Spitzer Space Telescope Legacy teams, have integrated Montage, and therefore Pegasus, into their pipelines and processing environments to generate science and browse products for dissemination to the astronomy community.
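Turning a single-processor program into a parallel workflow amounts to expressing it as a dependency graph (DAG) whose independent tasks can run at the same time. The Python sketch below builds a simplified, hypothetical mosaic DAG. Its task names borrow Montage module names (mProject, mDiffFit, mConcatFit, mBgModel, mBackground, mImgtbl, mAdd), and the sketch groups tasks into levels that could execute concurrently. It assumes each image overlaps only its neighbor and is not the workflow that Pegasus actually plans and executes for Montage.

```python
def montage_like_dag(num_images):
    """Map each task to the set of tasks it depends on.

    Simplified, hypothetical mosaic workflow; assumes image i overlaps only image i + 1.
    """
    deps = {f"mProject_{i}": set() for i in range(num_images)}  # reproject each input image
    for i in range(num_images - 1):
        # Fit the difference between each pair of overlapping reprojected images.
        deps[f"mDiffFit_{i}_{i + 1}"] = {f"mProject_{i}", f"mProject_{i + 1}"}
    deps["mConcatFit"] = {t for t in deps if t.startswith("mDiffFit")}  # gather all fits
    deps["mBgModel"] = {"mConcatFit"}  # solve for background corrections
    for i in range(num_images):
        # Apply the background correction to each reprojected image.
        deps[f"mBackground_{i}"] = {f"mProject_{i}", "mBgModel"}
    deps["mImgtbl"] = {f"mBackground_{i}" for i in range(num_images)}
    deps["mAdd"] = {"mImgtbl"}  # co-add the corrected images into the mosaic
    return deps


def parallel_levels(deps):
    """Group tasks into levels; tasks within one level do not depend on each other."""
    remaining = {task: set(pre) for task, pre in deps.items()}
    levels = []
    while remaining:
        ready = sorted(t for t, pre in remaining.items() if not pre)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)
        for task in ready:
            del remaining[task]
        for pre in remaining.values():
            pre.difference_update(ready)
    return levels


for level, tasks in enumerate(parallel_levels(montage_like_dag(4))):
    print(level, tasks)
```

The per-image levels grow with the number of input images, and the pairwise levels grow with the number of overlaps. This is how a mosaic built from thousands of images can expand into tens of thousands of tasks, many of which can run concurrently.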
The Montage team needed the ability to deliver provenance records along with the mosaics so that the scientific value of the images could be ascertained. To support these capabilities, we interfaced Pegasus with the PASOA provenance store (url). We also developed a pipeline-centric provenance model (url of paper).
For other Montage success stories, see the talk Bruce Berriman gave at GRITS on May 14, 2009.
This work is a collaboration with the NASA/IPAC Infrared Science Archive (http://irsa.ipac.caltech.edu). The imaging capabilities of the Spitzer Space Telescope have, for the first time, enabled surveys of the plane of our Galaxy across the infrared spectrum. Combined with images from existing all-sky surveys, these new surveys contain over 18 million images and reveal in unprecedented detail the structure of the Galactic plane in the infrared, at wavelengths from 1 µm to 70 µm. The images provide the first global view of how the interstellar medium and dense molecular clouds absorb light in the near-infrared and subsequently re-emit it at longer wavelengths. Investigating the details of this absorption and re-emission across the infrared spectrum and on a global scale is key to making progress on important questions, such as measuring the total star formation rate of the Galaxy, assessing its supernova rate, and determining whether coagulation or fragmentation governs the formation of massive stars.
Pegasus is also used in the Telescience project and portal to support 3D reconstruction of electron tomography images. The UCSD scientists behind Telescience plan to continue relying on our workflow technologies to expand the set of Grid applications they support within their portal environment, and to develop new techniques that provide real-time feedback from the 3D reconstruction to the scientists operating the instrument.
Publications: A. Lathers, M.-H. Su, A. Kulungowski, A. W. Lin, G. Mehta, S. T. Peltier, E. Deelman, and M. H. Ellisman, Enabling Parallel Scientific Applications with Workflow Tools, Proceedings of Challenges of Large Applications in Distributed Environments (CLADE), Paris, 2006.