SCEC CyberShake: Using Pegasus to run across OLCF Titan, Blue Waters and HPCC

While gathering data for the Pegasus Annual Report, we reached out to our collaborators at SCEC to summarize how Pegasus-WMS impacted their science in 2015. Some interesting facts about CyberShake Study 15.4, performed by SCEC in 2015 using Pegasus:

  • Calculations were distributed between the NSF Track 1 system NCSA Blue Waters, the DOE Leadership-class system OLCF Titan, and USC’s Center for High-Performance Computing
  • 150 TB of intermediate data was staged from Titan to Blue Waters, and in all about 1 PB of intermediate data was produced
  • 1.1 million node-hours were used over 5 weeks of wall-clock time
  • At peak, CyberShake calculations were running on 20% of Blue Waters and 80% of Titan
  • 8 TB of output data was automatically staged back to SCEC storage as part of the workflows

Below is an excerpt from Scott Callaghan, the lead engineer on the SCEC CyberShake effort.

The Southern California Earthquake Center (SCEC) uses Pegasus-WMS extensively to manage the execution of CyberShake, a physics-based probabilistic seismic hazard analysis (PSHA) application. PSHA quantifies the peak ground motions from all possible earthquakes that might affect geographic locations in Southern California, and establishes the probability that each location will experience a given level of ground motion over a given time interval. Through the CyberShake project, SCEC has developed simulation-based PSHA that captures the physics of earthquakes more effectively than alternative approaches not explicitly based on physical models. These PSHA results more closely model the real world and have the potential to lead to more resilient design of structures and infrastructure networks, reducing seismic risk and improving community resilience.

To benefit the broad range of CyberShake data users, such as seismologists, utility companies, and building code engineers, SCEC performed CyberShake Study 15.4 in 2015 to calculate more accurate PSHA estimates. Using Pegasus-WMS, the calculations were distributed between the NSF Track 1 system NCSA Blue Waters, the DOE Leadership-class system OLCF Titan, and USC’s Center for High-Performance Computing.

Figure: 2015 CyberShake schematic

The work was split so that 150 TB of intermediate data was staged from Titan to Blue Waters for post-processing, a transfer handled automatically by the Pegasus workflows. This study was 16 times as large as previous CyberShake studies and used 1.1 million node-hours over 5 weeks of wall-clock time. On average, 54 Pegasus workflows ran concurrently on 1,962 nodes across the three systems. At peak, CyberShake calculations were running on 20% of Blue Waters and 80% of Titan, executing both GPU and CPU jobs. Pegasus managed over a petabyte of data, of which 8 TB was automatically staged back to SCEC storage as part of the workflows.

Study 15.4 produced an urban seismic hazard map for Los Angeles at a seismic frequency of 1 Hz, twice the frequency previously possible and a goal the SCEC computational team had been working toward for several years. Building on this success, SCEC began Study 15.12, which combines the deterministic physics-based results from Study 15.4 with stochastic high-frequency seismograms produced using software from the SCEC Broadband Platform. When complete, this study will run almost 10 million tasks using the pegasus-mpi-cluster framework and produce broadband PSHA results from 0 to 10 Hz, which are of great interest to the SCEC community.

SCEC is now looking ahead to using a new earthquake rupture forecast in CyberShake simulations, which will require simulating 25 times as many earthquakes. The scalability required to achieve this science goal underscores the need for Pegasus to automate the execution and data management of SCEC’s large-scale workflows.

SCEC Scientists: Scott Callaghan, Phil Maechling, Kevin Milner, Rob Graves, Kim Olsen, Tom Jordan (PI)