Workflows, especially data-driven workflows and workflow ensembles, are becoming a centerpiece of modern computational science. However, scientists lack tools that integrate the operation of workflow-driven science applications on top of dynamic infrastructures that link campus, institutional and national resources into connected arrangements targeted at solving a specific problem. These tools must (a) orchestrate the infrastructure in response to application demands, (b) manage application lifetime on top of the infrastructure by monitoring the various workflow steps and modifying infrastructure slices in response to application demands, and (c) integrate data movement with the workflows to optimize performance.
Scientific computing and scientific collaboration are both activities that cannot be defined in absolute terms, as they continuously evolve over time. What is an extreme-scale computation today may not be considered as such in six months, and what is considered a close collaboration today may be viewed differently in the future. Moreover, the scale of computing and the characteristics of collaborations have many dimensions, most of which are difficult to quantify. Mapping all these dimensions to a single objective, quantifiable metric has been an ongoing challenge. It is therefore critical that when dealing with computing and collaboration we focus our attention on the rate of change (dV/dt) rather than on an absolute metric.
Precip is a flexible experiment management API for running experiments on clouds. Precip was developed for use on FutureGrid infrastructures such as OpenStack, Eucalyptus (>=3.2), and Nimbus, as well as on commercial clouds such as Amazon EC2. The API allows you to easily provision resources, run commands on them, and copy files to/from subsets of instances identified by tags. The goal of the API is to be flexible and simple to use in Python scripts that control your experiments. For more information, please see the Precip software page.
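The tag-based model described above can be illustrated with a small, self-contained sketch. This is not the Precip API itself; the `Instance` class and `run_on` helper are hypothetical stand-ins that show only how tags select a subset of provisioned instances:

```python
# Illustrative sketch of tag-based instance selection. The Instance
# class and run_on helper are hypothetical, not the actual Precip API.

class Instance:
    def __init__(self, name, tags):
        self.name = name
        self.tags = set(tags)

    def run(self, command):
        # A real experiment API would execute this over SSH; here we
        # simply record what would run where.
        return f"{self.name}: {command}"

def run_on(instances, tag, command):
    """Run a command on the subset of instances carrying a given tag."""
    return [i.run(command) for i in instances if tag in i.tags]

instances = [
    Instance("vm-1", ["master"]),
    Instance("vm-2", ["worker"]),
    Instance("vm-3", ["worker"]),
]

# Only the two worker instances receive the command.
print(run_on(instances, "worker", "uname -a"))
```

The same selection mechanism would apply to copying files to or from a tagged subset of instances.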
Delivers robust and scalable workflow management tools to the scientific community!
The Pegasus project encompasses a set of technologies that help workflow-based applications execute in a number of different environments, including desktops, campus clusters, grids, and now clouds. Scientific workflows allow users to easily express multi-step computations, for example: retrieve data from a database, reformat the data, and run an analysis. Once an application is formalized as a workflow, the Pegasus Workflow Management System can map it onto available compute resources and execute its steps in the appropriate order. Pegasus can easily handle workflows with several million computational tasks.
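The core idea of mapping a workflow to an execution order can be sketched in a few lines. This is only an illustration of the dependency-ordering step using the standard library, not the Pegasus implementation; the task names come from the three-step example above:

```python
# A workflow as a DAG of tasks with dependencies: a minimal sketch of
# executing steps in dependency order (not the Pegasus implementation).
from graphlib import TopologicalSorter

# Three-step workflow from the text: retrieve -> reformat -> analyze.
deps = {
    "reformat": {"retrieve"},
    "analyze": {"reformat"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # ['retrieve', 'reformat', 'analyze']
```

A real workflow manager additionally decides *where* each task runs and stages its data, but the dependency ordering shown here is the starting point.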
FutureGrid is a distributed, high-performance test-bed that allows scientists to collaboratively develop and test innovative approaches to parallel, grid, and cloud computing.
The test-bed is composed of a set of distributed high-performance computing resources connected by a high-speed network (with adjustable performance via a network impairment device). Users can access the HPC resources as traditional batch clusters, a computational grid, or as highly configurable cloud resources where users can deploy their own virtual machines.
The BrainSpan project seeks to find when and where in the brain a gene is expressed. This information holds clues to potential causes of disease. A recent study found that forms of a gene associated with schizophrenia are over-expressed in the fetal brain. To make such discoveries about what is abnormal, scientists first need to know what the normal patterns of gene expression are during development. To this end, the National Institute of Mental Health (NIMH), part of the National Institutes of Health (NIH), has funded the creation of TADHB. To map human brain "transcriptomes", researchers identify the composition of intermediate products, called transcripts or messenger RNAs, which carry a gene's instructions for making proteins, throughout development.
As part of this project we have enabled geneticists to analyze over 225 human brain RNA sequences using two different mapping algorithms, CASAVA ELAND and PerM.
Although much work has been done in developing the national cyberinfrastructure in support of science, there is still a gap between the needs of scientific applications and the capabilities provided by the resources. Leadership-class systems are optimized for highly parallel, tightly coupled applications. Many scientific applications, however, are composed of a large number of loosely coupled individual components, many with data and control dependencies. Running these complex, many-step workflows robustly and easily still poses difficulties on today's cyberinfrastructure. One effective solution that allows applications to efficiently use the current cyberinfrastructure is resource provisioning using Condor glideins.
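The glidein (pilot-job) idea can be sketched with a simple stand-in: a provisioned worker slot pulls many short tasks from a queue, so each task avoids a separate trip through the batch scheduler. This is an illustrative model of the pattern, not Condor itself:

```python
# Minimal sketch of the pilot-job ("glidein") pattern: one provisioned
# worker slot drains many short tasks from a queue, amortizing the
# scheduling overhead across all of them. Not Condor itself.
from queue import Queue

def pilot(task_queue, results):
    """A pilot job: occupy one slot, then drain tasks from the queue."""
    while not task_queue.empty():
        task = task_queue.get()
        results.append(task())  # run the task inside the reserved slot

tasks = Queue()
for i in range(5):
    tasks.put(lambda i=i: i * i)

results = []
pilot(tasks, results)
print(results)  # [0, 1, 4, 9, 16]
```

With real glideins the "slot" is a batch job that starts a Condor daemon on the allocated node, and the "queue" is the user's Condor pool; the amortization benefit is the same.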
Large-scale applications today make use of distributed resources to support computations and, as part of their execution, generate large amounts of log information. Up to now, we have been using the NetLogger analysis tools to perform offline log analysis. Stampede extends this offline workflow log analysis capability into a comprehensive middleware solution that allows users of complex scientific applications to track the status of their jobs in real time, to detect execution anomalies automatically, and to perform online troubleshooting without logging in to remote nodes or searching through thousands of log files.
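One simple form of the automatic anomaly detection described above is flagging jobs whose runtime deviates sharply from that of their peers. The sketch below illustrates that idea with a standard-deviation threshold; the threshold and data are illustrative, not Stampede's actual detection logic:

```python
# Sketch of runtime-based anomaly detection: flag jobs whose runtime is
# far from the mean of their peers. The 3-sigma default threshold is an
# illustrative choice, not Stampede's actual logic.
from statistics import mean, stdev

def flag_anomalies(runtimes, k=3.0):
    """Return indices of runtimes more than k standard deviations from the mean."""
    mu, sigma = mean(runtimes), stdev(runtimes)
    return [i for i, t in enumerate(runtimes) if abs(t - mu) > k * sigma]

# Five similar jobs and one straggler (index 4).
runtimes = [10.2, 9.8, 10.5, 10.1, 58.0, 10.3]
print(flag_anomalies(runtimes, k=2.0))  # [4]
```

In a real monitoring pipeline the runtimes would stream in from workflow logs, and flagged jobs would trigger alerts rather than a printed list.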
We build on an important class of applications, scientific workflows, that are being used today in a number of scientific disciplines including astronomy, biology, ecology, earthquake science, gravitational-wave physics, and many others, and that are running on today's large-scale infrastructures such as the OSG or the TeraGrid. This solution will be modular, distributed, and reusable across a broad class of applications and workflow systems.
The Center for Collaborative Genetic Studies on Mental Disorders is a collaboration of Rutgers University RUCDR, Washington University in St. Louis and the University of Southern California's Information Sciences Institute. It is funded by a grant from the National Institute of Mental Health.
The Center produces, stores, and distributes clinical data and biomaterials (DNA samples and cell lines) available in the NIMH Human Genetics Initiative. The Center creates and distributes computational tools that support investigation and analysis of the clinical data. In addition, the Center creates tools that enable researchers to determine which samples or data might be of use to them, so that they may request access from the NIMH.
Over recent years, genome-wide association studies (GWAS) have allowed researchers to uncover hundreds of genetic variants associated with common diseases. However, the discovery of genetic variants through GWAS research represents just the first step in the challenging process of piecing together the complex biological picture of common diseases. To help speed the process, the National Human Genome Research Institute is supporting new research in existing large epidemiology studies, all with a rich range of measures of health and potential disease, and many with long-term follow-up.
The focus of the new research is on how genetic variants initially identified through GWAS research are related to a person's biological and physical characteristics, such as weight, cholesterol levels, blood sugar levels or bone density. Scientists will also examine how non-genetic factors, such as diet, medications and smoking, may interact with genetic factors or each other to influence health outcomes.