SOMOSPIE is an earth science engine that uses machine learning (ML) models to generate high-resolution soil moisture predictions from satellite data with a 27 km resolution. It downscales these coarse observations by combining data from the ESA-CCI soil moisture database with hydrologically relevant terrain parameters for the targeted regions. This approach is an alternative to traditional methods, which often rely on basic extrapolation and interpolation of data from monitoring networks.
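To make the downscaling idea concrete, here is a minimal sketch rather than SOMOSPIE's actual implementation. It assumes a k-nearest-neighbors regressor as the ML model, which the text above does not specify, and the function name `knn_downscale`, the feature layout (x, y, normalized elevation), and all sample values are hypothetical. Coarse satellite pixels serve as training samples, and soil moisture at each fine-grid point is estimated from the coarse samples closest to it in feature space.

```python
import math


def knn_downscale(train, targets, k=3):
    """Estimate soil moisture at fine-grid points from coarse samples.

    train:   list of (features, soil_moisture) pairs, one per coarse pixel.
    targets: list of feature tuples, one per fine-grid point.
    Features are assumed to be scaled to comparable ranges beforehand,
    since Euclidean distance is sensitive to units.
    """
    predictions = []
    for point in targets:
        # Rank coarse samples by Euclidean distance in feature space.
        nearest = sorted(train, key=lambda s: math.dist(s[0], point))[:k]
        # Predict the mean soil moisture of the k nearest samples.
        predictions.append(sum(sm for _, sm in nearest) / len(nearest))
    return predictions


# Hypothetical coarse samples: (x, y, normalized elevation) -> soil moisture.
coarse = [
    ((0.0, 0.0, 0.1), 0.30),
    ((0.0, 1.0, 0.4), 0.22),
    ((1.0, 0.0, 0.2), 0.28),
    ((1.0, 1.0, 0.9), 0.15),
]

# Hypothetical fine-grid points carrying terrain information only.
fine = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.8)]

print(knn_downscale(coarse, fine, k=1))
```

Because the terrain features are available at fine resolution, the model can assign different soil moisture values to fine-grid points that fall inside the same coarse pixel, which plain spatial interpolation cannot do.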

SOMOSPIE’s flexible framework supports the study of soil moisture dynamics and ecological processes, and can help improve climate change forecasts at both regional and global scales. Its finer resolution is particularly suited to applications where coarse-resolution data fall short, such as precision forestry and agriculture, landscape hydrology, and ecological regeneration studies.

SOMOSPIE’s architecture consists of three main components: terrain parameter generation, data transformation, and ML-based prediction. Building on our previous work to enhance reproducibility and scalability through high-performance computing (HPC) and cloud-integrated technology, we have integrated SOMOSPIE with ACCESS Pegasus, structuring each component as a dedicated ACCESS Pegasus workflow. This integration makes SOMOSPIE portable across ACCESS resources: its workflows are submitted from a single Jupyter Notebook to two different compute sites, Jetstream2 and Anvil.

Upon submission, ACCESS Pegasus manages resources automatically, leveraging HTCondor’s provisioning capabilities. On Jetstream2, provisioning activates preconfigured virtual machine instances; on Anvil, the HTCondor Annex tool dispatches pilot jobs.

Scientists: Camila Roa, Paula Olaya, Michela Taufer (University of Tennessee, Knoxville)

Resources:

Pegasus workflow implementing the terrain parameter generation component.

Pegasus workflow implementing the data transformation component.

Pegasus workflow implementing the ML-based prediction component.