Pegasus and DyNamo Aiding Weather Scientists


Computational science today depends on many complex, compute and data-intensive applications operating on distributed datasets that originate from a variety of scientific instruments and data repositories. Two major challenges for these applications are (1) the provisioning of compute resources and (2) the integration of data into the scientists’ workflow.

Background

ExoGENI [2] is an NSF-supported, networked Infrastructure-as-a-Service (IaaS) testbed that spans multiple campuses in the US through regional and national transit networks (Internet2, ESnet, etc.). ExoGENI allows users to dynamically provision mutually isolated “slices” of interconnected infrastructure from multiple independent compute, network, and storage providers. It does so by taking advantage of existing virtualization mechanisms and integrating various resources together: layer-2 global dynamic-circuit networks such as Internet2 and ESnet, and private clouds such as OpenStack [5].

NSF Chameleon Cloud [1] is a large, deeply programmable testbed designed for systems and networking experiments. Like ExoGENI, it leverages OpenStack to deploy isolated slices of cloud resources for user experiments. However, where ExoGENI scales in geographic distribution, Chameleon scales by providing large amounts of compute, storage, and networking resources spread across just two sites: the University of Chicago and the Texas Advanced Computing Center (TACC).

Additionally, advances in network hardware and software-defined networking have enabled high-performance, dynamically provisioned networks. Such networks not only exist within the ExoGENI testbed; they can also “stitch” external resources to its slices using a network artifact called a stitchport. What lies beyond a stitchport is assumed to be an IP-based subnet containing infrastructure such as data transfer nodes, data repository endpoints, nodes connected to storage arrays holding instrument data, or any other sources and sinks of scientific datasets.

DyNamo and CASA

DyNamo [8] aims to enable high-performance, adaptive, performance-isolated data flows across a federation of distributed cloud resources (e.g., ExoGENI, Chameleon, OSG, XSEDE Jetstream) [1, 2, 3, 4] and community data repositories. Using the Mobius client [9], DyNamo facilitates the provisioning of appropriate compute and storage resources for observational science workflows from diverse, national-scale cyberinfrastructure (CI). This approach significantly simplifies the effort needed to use, configure, and reconfigure resources. Additionally, through integration with the Pegasus Workflow Management System [7], DyNamo offers automation and orchestration of data-driven workflows on the provisioned infrastructure.
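To give a flavor of what provisioning through Mobius looks like, here is a minimal sketch of a client talking to a running Mobius controller over its REST interface. The endpoint paths, site names, and JSON fields below are illustrative assumptions, not the definitive schema; consult the Mobius repository [9] for the actual API.

```python
# Hypothetical sketch: request compute nodes on two testbeds via Mobius.
# Endpoint paths and request fields are assumptions for illustration.
import requests

MOBIUS = "http://localhost:8080/mobius"  # assumed controller URL
WORKFLOW_ID = "casa-nowcast-demo"        # hypothetical workflow identifier

# Register a workflow context with the controller (assumed endpoint).
requests.post(f"{MOBIUS}/workflow", params={"workflowID": WORKFLOW_ID})

# Ask for nodes on an ExoGENI rack and a Chameleon site (assumed site names).
for site, nodes in [("Exogeni:RENCI (Chapel Hill, NC USA)", 2),
                    ("Chameleon:CHI@TACC", 4)]:
    compute_request = {
        "site": site,          # target cloud/testbed
        "cpus": 4,             # cores per node
        "ramPerCpus": 4096,    # MB of RAM per core
        "diskPerCpus": 50,     # GB of disk per core
        "numberOfNodes": nodes,
    }
    r = requests.post(f"{MOBIUS}/compute",
                      params={"workflowID": WORKFLOW_ID},
                      json=compute_request)
    r.raise_for_status()
```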

The Center for Collaborative Adaptive Sensing of the Atmosphere (CASA) [6] has the goal of improving our ability to observe, understand, predict, and respond to hazardous weather events. CASA is one of many applications that can benefit from the DyNamo project, as it presents data movement challenges and the need to elastically scale resources on demand. Moment data from a network of seven weather radars located in the Dallas/Fort Worth (DFW) area flow continuously into CASA’s data repository, and new processing tasks are triggered as new data arrive (a simplified sketch of this trigger follows below). Additionally, the more severe a weather event, the longer the processing takes. In these cases, in order to satisfy CASA’s QoS requirements, more resources have to be provisioned dynamically, since data processing cannot be postponed until resources become available again.
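The sketch below shows the shape of such a data-triggered loop: watch a repository directory for newly arrived radar files and hand each one off for processing. This is not CASA’s operational ingest code; the watch path and the submission script are placeholders.

```python
# Minimal sketch of a data-triggered processing loop (stdlib only).
# WATCH_DIR and submit_pipeline.sh are hypothetical placeholders.
import subprocess
import time
from pathlib import Path

WATCH_DIR = Path("/data/casa/incoming")  # assumed repository directory
seen = set()

while True:
    for f in sorted(WATCH_DIR.glob("*.netcdf")):
        if f not in seen:
            seen.add(f)
            # In production this would trigger a Pegasus workflow; here we
            # invoke a placeholder submission script for each new file.
            subprocess.run(["./submit_pipeline.sh", str(f)], check=True)
    time.sleep(5)  # poll every few seconds
```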

During the first year of the project, DyNamo helped CASA scientists improve their operational procedures. Two of CASA’s pipelines have been ported to Pegasus workflows and take advantage of the improved Mobius capabilities to instantiate compute resources across different cloud infrastructures and to set up dedicated network links, using software-defined networking (SDN), between the cloud resources as well as between CASA’s data repository and all cloud resources. The first pipeline is called Wind Speed; the second is called Nowcast.

CASA Workflows

The Wind Speed pipeline (depicted in Figure 2) ingests radar moment data, primarily radial velocity, but also other variables for noise reduction purposes. The first step in the process merges an arbitrary number of individual radar data files. The wind speed algorithm then creates a grid of the maximum observed velocity, corrected for the height of each observation. The output is a single grid representing the estimated wind speed near the ground (depicted in Figure 1); a simplified sketch of this step follows the figures below.

Figure 1: Wind Workflow Visualization
Figure 2: Wind Workflow DAG
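The sketch below illustrates the core idea of the wind speed step: correct each radar’s velocity grid down toward ground level and keep the per-cell maximum. The power-law height correction and its exponent are assumptions chosen for illustration; CASA’s actual correction differs in detail.

```python
# Illustrative sketch (not CASA's production code) of the wind speed step.
import numpy as np

def max_wind_grid(velocity_grids, beam_heights, alpha=0.14, ref_height=10.0):
    """velocity_grids: list of 2D arrays of |radial velocity| (m/s);
    beam_heights: matching 2D arrays of observation height (m).
    A power-law wind profile with exponent `alpha` (an assumed value)
    maps each observation down to the reference height near the ground."""
    corrected = [
        v * (ref_height / np.maximum(h, ref_height)) ** alpha
        for v, h in zip(velocity_grids, beam_heights)
    ]
    # Cell-wise maximum across all contributing radars.
    return np.maximum.reduce(corrected)
```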

The Nowcast pipeline (depicted in Figure 3) ingests asynchronous individual radar data that have previously been combined into a merged grid. These gridded data are accumulated over a certain initialization period and then used as input to the nowcast algorithm. In its current version, the algorithm processes data every minute and produces a series of gridded reflectivity data, where each grid corresponds to the projected conditions at one-minute intervals over the next 30 minutes. An example of the calculated reflectivity is presented in Figure 4, and a toy version of the accumulate-then-project loop follows the figures below.

Figure 3: Nowcast Workflow DAG
Figure 4: Nowcast Workflow Visualization
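To make the data flow concrete, here is a toy version of that loop: accumulate merged grids over an initialization window, then emit one projected grid per minute for the next 30 minutes. Simple linear extrapolation stands in for CASA’s actual nowcasting algorithm, and the window length is an assumed value.

```python
# Toy accumulate-then-project loop; linear extrapolation is a stand-in
# for the real nowcast algorithm. INIT_MINUTES is an assumed window.
import numpy as np
from collections import deque

INIT_MINUTES = 15  # assumed initialization period (minutes)
HORIZON = 30       # minutes projected into the future

history = deque(maxlen=INIT_MINUTES)

def ingest(grid):
    """Call once per minute with the newest merged reflectivity grid;
    returns a list of HORIZON forecast grids once enough data exist."""
    history.append(grid)
    if len(history) < 2:
        return None
    # Per-cell tendency estimated over the accumulated window.
    trend = (history[-1] - history[0]) / (len(history) - 1)
    # One forecast grid per future minute, clipped to non-negative values.
    return [np.clip(history[-1] + trend * m, 0.0, None)
            for m in range(1, HORIZON + 1)]
```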

Pegasus plays a key role in the orchestration and execution of CASA’s workflows. While designing these workflows we wanted to achieve two main goals: (1) portability and (2) scalability. By taking advantage of Pegasus’ container support, which provides a unified execution environment, we achieve portability across multiple clouds. Additionally, the abstract workflow definitions describe the computational tasks in their simplest form (one task triggers one executable). Although this approach results in more tasks in the generated DAG, it increases the versatility of the workflow. Taking into consideration the characteristics of the tasks (e.g., runtime or resource usage) and the available resources, we can instruct Pegasus to automatically cluster multiple tasks together, creating larger jobs and optimizing for data movement and resource utilization. Versions of CASA’s Nowcast and Wind Pegasus workflows can be found on our GitHub page, accompanied by test case data [10, 11].
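The snippet below sketches how these two ideas, containers for portability and clustering for scalability, map onto the Pegasus 5.x Python API. The container image, executable path, and cluster size are placeholders; the real workflow definitions live in the repositories [10, 11].

```python
# Sketch of container support and horizontal task clustering in Pegasus 5.x.
# Image name, paths, and cluster size are placeholders, not CASA's values.
from Pegasus.api import *

# One container image gives every task the same execution environment,
# whichever cloud the job lands on.
casa_env = Container(
    "casa-env",
    Container.SINGULARITY,
    image="docker://example/casa-processing:latest",  # placeholder image
)

tc = TransformationCatalog()
tc.add_containers(casa_env)

merge = Transformation(
    "radar_merge",
    site="condorpool",
    pfn="/opt/casa/bin/radar_merge",  # placeholder path inside the container
    is_stageable=False,
    container=casa_env,
)
# Horizontal clustering: let Pegasus pack several short tasks into one
# clustered job to cut scheduling and data-movement overhead.
merge.add_profiles(Namespace.PEGASUS, key="clusters.size", value=4)
tc.add_transformations(merge)

wf = Workflow("casa-wind")
wf.add_transformation_catalog(tc)

# One simple task per input file; clustering groups them at planning time.
# (Input files are assumed to be registered in a replica catalog.)
for i in range(8):
    radar_in = File(f"radar_{i}.nc")
    wf.add_jobs(
        Job(merge)
        .add_args(radar_in)
        .add_inputs(radar_in)
        .add_outputs(File(f"merged_{i}.nc"))
    )

wf.write("workflow.yml")
# Clustering is then enabled when planning, e.g.:
#   pegasus-plan --cluster horizontal ... workflow.yml
```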

Acknowledgements: DyNamo is supported by the National Science Foundation under award #1826997.

References

[1] NSF Chameleon Cloud. https://chameleoncloud.org
[2] I. Baldin, J. Chase, Y. Xin, A. Mandal, P. Ruth, C. Castillo, V. Orlikowski, C. Heermann, J. Mills. “ExoGENI: A Multi-Domain Infrastructure-as-a-Service Testbed.” The GENI Book, pp. 279–315, 2016.
[3] R. Pordes, D. Petravick, B. Kramer, D. Olson, M. Livny, A. Roy, P. Avery, K. Blackburn, T. Wenaus, F. Wurthwein, I. Foster, R. Gardner, M. Wilde, A. Blatecky, J. McGee, and R. Quick. “The Open Science Grid.” Journal of Physics: Conference Series, vol. 78, p. 012057, 2007. doi:10.1088/1742-6596/78/1/012057
[4] J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. D. Peterson, R. Roskies, J. Scott, and N. Wilkins-Diehr. “XSEDE: Accelerating Scientific Discovery.” Computing in Science & Engineering, vol. 16, no. 5, pp. 62–74, 2014. doi:10.1109/MCSE.2014.80
[5] OpenStack. https://www.openstack.org
[6] B. Philips, D. Pepyne, D. Westbrook, E. Bass, J. Brotzge, W. Diaz, K. Kloesel, J. Kurose, D. McLaughlin, H. Rodriguez, and M. Zink. “Integrating End User Needs into System Design and Operation: The Center for Collaborative Adaptive Sensing of the Atmosphere (CASA).” In Proceedings of Applied Climatol., American Meteorological Society Annual Meeting, San Antonio, TX, USA, 2007.
[7] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. Ferreira da Silva, M. Livny, and K. Wenger, “Pegasus: a workflow management system for science automation.” Future Generation Computer Systems, vol. 46, pp. 17–35, 2015.
[8] E. Lyons, G. Papadimitriou, C. Wang, K. Thareja, P. Ruth, J. J. Villalobos, I. Rodero, E. Deelman, M. Zink, and A. Mandal, “Toward a Dynamic Network-centric Distributed Cloud Platform for Scientific Workflows: A Case Study for Adaptive Weather Sensing,” in 15th eScience Conference, 2019.
[9] Mobius. https://github.com/RENCI-NRIG/Mobius
[10] CASA Nowcast – Pegasus Workflow. https://github.com/pegasus-isi/casa-nowcast-workflow
[11] CASA Wind – Pegasus Workflow. https://github.com/pegasus-isi/casa-wind-workflow
