The Research Distributed Hydrologic Model (RDHM) (Koren et al. 2004, Reed et al. 2007), a collaborative effort of several National Oceanic and Atmospheric Administration (NOAA) and National Weather Service (NWS) hydrology research laboratories, was developed to improve streamflow predictions in streams and rivers and to improve flash flood forecasting by incorporating evolving estimates of relevant parameters, such as soil moisture, soil temperature, permeability, and vegetation, in addition to the primary forcing mechanism, rainfall. Since its inception in 2008, local domain knowledge has improved. For example, in the Dallas-Fort Worth (DFW) metroplex in North Texas, high-resolution lidar-based surveys have enabled the creation of highly accurate topographic maps that expose riverbeds and man-made drainage systems. Urban surface changes have also been well documented thanks to such maps. This knowledge has contributed to an improvement in the accuracy of gridded rainfall estimates, which are now provided at higher temporal and spatial resolutions. RDHM has traditionally been run on a single, powerful dedicated server. To improve scalability as resolutions and domain sizes increase, and to tailor the output for stakeholder decision support, we have containerized the model for use in the cloud and adopted Pegasus to manage the workflow.
Figure 1: RDHM Pegasus Workflow
The RDHM workflow (depicted in Figure 1) was developed using the Pegasus 5.0 Python API and is executed continually by Pegasus in an ExoGENI deployment, using the latest precipitation data from two sources: the Next Generation Weather Radar (NEXRAD) system and a network of Doppler radars operated by the Collaborative Adaptive Sensing of the Atmosphere (CASA) partnership. These data are combined and fed into the RDHM flood model, along with a multitude of input parameters derived from high-resolution topography, land cover, land use, and soil data of the DFW region, as well as other hydrologic and climatological data. To ensure that the RDHM flood model has access to the best estimates of current hydrologic conditions, its output from the previous run of the workflow is also passed in as an input, forming a feedback loop across consecutive runs. Once complete, the model outputs streamflow, runoff, and return period estimates valuable for flash flood forecasting, along with the updated parameterizations to be used for the next instantiation of the model. This output is written in XMRG, a legacy binary format used by the NWS, in the Hydrological Rainfall Analysis Projection (HRAP) coordinate system. The next steps in the workflow convert the XMRG output files into GeoJSON, a modern GIS format, and NetCDF, a packed binary grid format, both using standard WGS84 map projections. The GeoJSON files can be ingested by mapping services such as Google Maps for web display (Figures 2, 3, and 4), while the NetCDF files are contoured into geofenced polygons representing areas where the hydrologic values exceed standard alerting thresholds. These contours are sent to a remote database in near real time, triggering targeted flash flood alerts to people in the at-risk area, as well as to city emergency managers and stormwater management personnel.
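The conversion from HRAP grid cells to latitude/longitude polygons can be illustrated with a minimal sketch. It is not the workflow's actual converter: it assumes the commonly documented HRAP constants (a polar stereographic grid on a spherical Earth, true at 60°N, with a standard longitude of 105°W), treats each grid value as belonging to the cell whose lower-left corner is the given HRAP coordinate, and replaces true contouring with a simple per-cell threshold test. The function names `hrap_to_lonlat` and `exceedance_features` are hypothetical.

```python
import json
import math

# Commonly documented HRAP grid constants (spherical Earth); assumptions here.
EARTH_RADIUS_KM = 6371.2
MESH_KM = 4.7625             # nominal grid spacing at 60 degrees N
STD_LAT_DEG = 60.0           # latitude at which the projection is true
STD_LON_W_DEG = 105.0        # standard longitude, in degrees west
POLE_X, POLE_Y = 401.0, 1601.0  # HRAP coordinates of the North Pole

_SCALE = EARTH_RADIUS_KM * (1 + math.sin(math.radians(STD_LAT_DEG))) / MESH_KM


def hrap_to_lonlat(hx, hy):
    """Convert HRAP coordinates to (lon, lat) in degrees, east positive."""
    x = hx - POLE_X
    y = hy - POLE_Y
    rr = x * x + y * y
    gi = _SCALE * _SCALE
    lat = math.degrees(math.asin((gi - rr) / (gi + rr)))
    ang = math.degrees(math.atan2(y, x))
    if ang < 0:
        ang += 360.0
    lon_w = (270.0 + STD_LON_W_DEG - ang) % 360.0  # degrees west
    lon_e = -lon_w if lon_w <= 180.0 else 360.0 - lon_w
    return lon_e, lat


def exceedance_features(grid, x0, y0, threshold):
    """Build a GeoJSON FeatureCollection of cells whose value exceeds threshold.

    grid[j][i] is the value of the HRAP cell with lower-left corner
    (x0 + i, y0 + j); this indexing convention is an assumption.
    """
    features = []
    for j, row in enumerate(grid):
        for i, value in enumerate(row):
            if value <= threshold:
                continue
            hx, hy = x0 + i, y0 + j
            # Closed ring, counterclockwise over CONUS (SW, SE, NE, NW, SW).
            corners = [(hx, hy), (hx + 1, hy), (hx + 1, hy + 1),
                       (hx, hy + 1), (hx, hy)]
            ring = [list(hrap_to_lonlat(cx, cy)) for cx, cy in corners]
            features.append({
                "type": "Feature",
                "geometry": {"type": "Polygon", "coordinates": [ring]},
                "properties": {"value": value, "hrap_x": hx, "hrap_y": hy},
            })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Toy 2x2 grid placed near the DFW area; values and threshold are made up.
    demo = [[0.2, 1.5], [3.0, 0.0]]
    collection = exceedance_features(demo, x0=590, y0=253, threshold=1.0)
    print(json.dumps(collection, indent=2))
```

The resulting dictionary serializes directly with `json.dumps` and can be loaded by any GeoJSON-aware mapping client. A production converter would additionally merge adjacent cells into contours and account for the difference between the spherical HRAP datum and WGS84.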
To ensure portability across various clouds, this workflow uses Pegasus’ Singularity container support. The RDHM model and all downstream processes are containerized, and the latest model parameters are stored locally, both for continuity across runs and so that they persist offline should the cloud compute infrastructure go down.
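The state handoff between consecutive runs can be sketched as follows. This is an illustration only, not the deployed implementation: the directory layout, the `state_<timestamp>.bin` naming scheme, and the `latest_state` and `run_cycle` helpers are hypothetical, and the model is represented by an arbitrary callable rather than RDHM itself.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional


def latest_state(state_dir: Path) -> Optional[Path]:
    """Return the newest saved state file, or None on a cold start."""
    # Timestamped names sort lexicographically in chronological order.
    candidates = sorted(state_dir.glob("state_*.bin"))
    return candidates[-1] if candidates else None


def run_cycle(
    state_dir: Path,
    precip: bytes,
    model: Callable[[bytes, Optional[bytes]], bytes],
    now: Optional[datetime] = None,
) -> Path:
    """Run one cycle: feed the previous state in and persist the new state."""
    state_dir.mkdir(parents=True, exist_ok=True)
    previous = latest_state(state_dir)
    previous_bytes = previous.read_bytes() if previous else None

    # The callable stands in for the containerized model run.
    new_state = model(precip, previous_bytes)

    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    out = state_dir / f"state_{stamp}.bin"
    out.write_bytes(new_state)
    return out
```

Because each run reads only the most recent file in a local directory, a new cycle can resume from the last saved state after an outage without contacting any remote service.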