Leveraging Pegasus for Integrated Assessment Modeling


In 2016, Vermont EPSCoR received a $20M award from the National Science Foundation to study and promote resiliency in the Lake Champlain Basin. The study aims to provide much-needed information to decision-makers as they govern the basin and develop policies that reach far into the future. A novel computer model has been developed that integrates data collected from sensors in streams, soils, and the lake with information on adjacent land use. The model has been used to test management scenarios and identify strategies for maintaining infrastructure, environmental health, and drinking water quality in the event of intense storms. The study has used the Pegasus Workflow Management System to implement the model as a distributed workflow, which has been executed on a heterogeneous computing environment consisting of local compute resources and NCAR’s Cheyenne HPC cluster.

Integrated Assessment Models

Integrated assessment models (IAMs) are commonly used to explore the interactions between different modeled components of socio-environmental systems (SES). They are particularly popular in climate change impacts studies in which climate models are linked to terrestrial process models such as hydrological or lake models to determine impacts of changes in climate on water resources and water quality. The research team for the current Vermont National Science Foundation (NSF) Established Program to Stimulate Competitive Research (EPSCoR) Research Infrastructure Improvement (RII) Track-1 grant, titled “Basin Resilience to Extreme Events” (BREE), has built an IAM to evaluate the effects of future climate and policy scenarios on water quality in the Lake Champlain Basin.

Most IAMs are built in a tightly-coupled framework so that the complex interactions between the models can be implemented efficiently and in a straightforward manner. The high level of integration, however, makes it more difficult to change individual models within the IAM. In a loosely-coupled architecture, by contrast, each model runs independently, with no access in system memory to the current state of the SES or to the other models, as shown in the figure below. The initial conditions of the SES required by each component model are provided by input files. The model then runs to completion for a given time frame. After execution, the model updates the SES state by writing updated output files. Within each time frame, the models execute in a cascade, with the models that are most sensitive to the current SES state executing at the end of the cascade.
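The file-based hand-off described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the single `ses_state.json` file, the three stand-in models, and their toy formulas are hypothetical and do not reflect the actual BREE models or their file formats.

```python
import json
import tempfile
from pathlib import Path

STATE_FILE = "ses_state.json"  # hypothetical file holding the shared SES state


def run_model(label, state_dir, update):
    """Run one stand-in model: read state from disk, compute, write state back."""
    path = Path(state_dir) / STATE_FILE
    # Initial conditions come from an input file, never from shared memory.
    state = json.loads(path.read_text())
    # The model runs to completion for the time frame (toy computation here).
    state.update(update(state))
    state.setdefault("history", []).append(label)
    # The updated SES state is written out for the next model in the cascade.
    path.write_text(json.dumps(state))


def run_cascade(state_dir, models, time_frames):
    """Execute the models in a fixed cascade order within each time frame."""
    for frame in range(time_frames):
        for name, update in models:
            run_model(f"{name}@{frame}", state_dir, update)
    return json.loads((Path(state_dir) / STATE_FILE).read_text())


# Hypothetical stand-ins, ordered so the most state-sensitive model runs last.
MODELS = [
    ("land_use", lambda s: {"developed_fraction": s["developed_fraction"] + 0.01}),
    ("hydrology", lambda s: {"discharge": 100.0 * (1 + s["developed_fraction"])}),
    ("lake", lambda s: {"nutrient_load": 0.5 * s["discharge"]}),
]


def demo():
    """Run two time frames in a temporary directory and return the final state."""
    with tempfile.TemporaryDirectory() as tmp:
        initial = {"developed_fraction": 0.10}
        (Path(tmp) / STATE_FILE).write_text(json.dumps(initial))
        return run_cascade(tmp, MODELS, time_frames=2)
```

Because each stand-in model touches only files, any one of them could be replaced by a different executable without changing the others, which is the flexibility the loosely-coupled design is after.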

Loosely-Coupled Design

IAM Workflow

The independent nature of each scenario and of each annual modeling task, for certain models, is exploited to increase the parallelism of the IAM workflow. The workflow (depicted in the figure below) is defined as follows:

  1. The weather estimator runs once for each climate scenario and projects a number of daily weather variables for the entire simulation period as required by the lake hydrodynamic model, EFDC, from the daily projected temperature and precipitation of each climate scenario.
  2. After the weather estimator is queued, the workflow begins looping through the tasks in decadal time steps, which correspond to the decadal feedback between the models. A decade was initially chosen as the time step as a tradeoff between the temporal precision of the available feedback data and the additional computational resources required as the time step decreases. The first task in the decadal loop is the Land Use Land Cover Change Agent-Based Model (LULCC ABM), which relies on downscaled climate data and outputs land-use maps for the region every 5 years.
  3. GRASS, a Geographic Information System (GIS) software suite, is used to process the LULCC ABM land-use maps into the worldview format required by the hydrology model, RHESSys, for each modeling run.
  4. EFDC uses daily discharge values from RHESSys and projected weather variables from the weather estimator. Because the lake freezes each winter, the hydrodynamics are essentially reset annually. Each year-long run of EFDC is therefore independent of the others and starts from the same frozen initial conditions.
  5. The final modeling step, the lake water quality model (RCA), requires the output files from EFDC along with estimates of nutrient levels contained in the discharge generated by RHESSys as input.
Pegasus Workflow for Cascading IAM (source: [1] supplemental material).
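To make the dependency structure of the steps above concrete, the following sketch builds the task graph for a single scenario using only Python's standard library (`graphlib`, Python 3.9+). It does not use the Pegasus API, and several details are assumptions rather than facts from the study: the task names, one RCA task per decade that waits for all ten EFDC years, and the next decade's LULCC ABM run waiting on the previous decade's RCA output as the decadal feedback.

```python
from graphlib import TopologicalSorter


def build_scenario_dag(start_year=2001, decades=2):
    """Return {task: set of prerequisite tasks} for one hypothetical scenario."""
    deps = {"weather": set()}  # the weather estimator runs once per scenario
    previous_rca = None
    for d in range(decades):
        first = start_year + 10 * d
        tag = f"{first}-{first + 9}"
        lulcc, grass, rhessys, rca = (
            f"{model}[{tag}]" for model in ("lulcc", "grass", "rhessys", "rca")
        )
        # Assumption: decadal feedback means LULCC waits on the previous RCA.
        deps[lulcc] = {previous_rca} if previous_rca else set()
        deps[grass] = {lulcc}      # GRASS converts the land-use maps
        deps[rhessys] = {grass}    # RHESSys needs the converted maps
        # Each year-long EFDC run is independent, so ten runs can go in parallel.
        efdc_runs = [f"efdc[{year}]" for year in range(first, first + 10)]
        for run in efdc_runs:
            deps[run] = {"weather", rhessys}
        # Assumption: a single RCA task per decade consumes all EFDC outputs.
        deps[rca] = {rhessys, *efdc_runs}
        previous_rca = rca
    return deps


def parallel_waves(deps):
    """Group tasks into successive waves whose members can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(parallel_waves(build_scenario_dag())):
        print(index, wave)
```

In this sketch the only wide waves are the EFDC years, which is where the annual independence noted in step 4 pays off. Independent scenarios add a further level of parallelism, since each scenario would get its own copy of this graph.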

Research Impact

In [1], the proposed BREE IAM workflow has been used to test the impacts of global climate change, together with anthropogenic land use and land cover changes, on the hydrological regime, water temperature, water quality, and bloom duration and severity through 2040 in the transnational Missisquoi Bay of Lake Champlain. In the study, the authors demonstrated their ability to predict the biogeochemical conditions of the lake in response to changing climatic, land-use, and hydrological conditions in a dynamic and spatially explicit framework, advancing the current state of SES computational modeling. Such computational approaches enable propagation of uncertainty across climate and land use change scenarios, as well as across models, which will prove critical as management communities develop plans to promote or preserve water quality as the global climate continues to warm. More importantly, such computational models enable disaggregation of multi-scale drivers of change occurring at different speeds and accelerations.

Maps of Missisquoi Bay showing ChlA concentration (μg l−1) averaged for the month of August, comparing first-decade (2001–2010) with last-decade (2031–2040) projections for four global climate models under the business-as-usual land-use scenario (source: [1]).

Experiment Statistics

  • 132 scenarios (12 climate scenarios x 11 phosphorus reduction scenarios)
  • 60 years of modeling (2001–2060)
  • 210,000+ tasks (model data preparation and independent model runs)
  • ~40 days of wall time
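
As a rough sanity check on these figures, the short calculation below derives the scenario count and the average number of tasks per scenario and per simulated scenario-year. It treats the reported "210,000+" as a lower bound of 210,000, so the averages are approximate floors rather than exact values.

```python
climate_scenarios = 12
phosphorus_scenarios = 11
years = 2060 - 2001 + 1      # inclusive span 2001-2060
total_tasks = 210_000        # reported as "210,000+", used here as a lower bound

scenarios = climate_scenarios * phosphorus_scenarios
tasks_per_scenario = total_tasks / scenarios
tasks_per_scenario_year = tasks_per_scenario / years

print(f"scenarios: {scenarios}")
print(f"tasks per scenario (at least): {tasks_per_scenario:.0f}")
print(f"tasks per scenario-year (at least): {tasks_per_scenario_year:.1f}")
```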


[1] Zia, Asim, et al. “Coupled impacts of climate and land use change across a river–lake continuum: insights from an integrated assessment model of Lake Champlain’s Missisquoi Basin, 2000–2040.” Environmental Research Letters 11.11 (2016): 114026.

This material is based upon work supported by the National Science Foundation under Grant No. OIA-1556770.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.