Saemian et al. (2026) A Machine Learning approach for Total Water storage anomaly eXtension back to 1980 (ML-TWiX)
Identification
- Journal: Scientific Data
- Year: 2026
- Date: 2026-01-29
- Authors: Peyman Saemian, Mohammad J. Tourian, Karim Douch, James Foster, Junyang Gou, David Wiese, Amir AghaKouchak, Nico Sneeuw
- DOI: 10.1038/s41597-026-06604-w
Research Groups
- Institute of Geodesy, University of Stuttgart, Stuttgart, Germany
- European Space Agency, ESRIN, Frascati, Italy
- Institute of Geodesy and Photogrammetry, ETH Zurich, Zurich, Switzerland
- Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA
- Department of Civil and Environmental Engineering, University of California, Irvine, CA, USA
- Department of Earth System Science, University of California, Irvine, CA, USA
- Institute for Water, Environment and Health, United Nations University, Ontario, Canada
Short Summary
ML-TWiX is a global dataset of monthly total water storage anomalies (TWSA) reconstructed for 1980 to 2012 using an ensemble of machine learning models trained on GRACE observations and global hydrological model simulations. It extends the GRACE record into the pre-GRACE era and outperforms or matches existing long-term reconstructions in validation against GRACE, SLR-derived TWSA, terrestrial water balance closure, and the global sea level budget.
Objective
- To develop and validate ML-TWiX, a global, monthly total water storage anomaly (TWSA) dataset reconstructed from 1980 to 2012, extending the GRACE record into the pre-GRACE era using an ensemble of machine learning models and providing spatially explicit uncertainty estimates.
Study Configuration
- Spatial Scale: Global land areas (excluding Greenland and Antarctica) on a 0.5° × 0.5° regular latitude-longitude grid.
- Temporal Scale: Monthly data from January 1980 to December 2012.
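As a quick check on the configuration above, the following sketch works out the array dimensions that a full global 0.5° grid and the 1980 to 2012 monthly record imply. The cell-centre layout and the variable names are assumptions made for illustration; the dataset's actual file layout is not described here.

```python
# Dimensions implied by a 0.5-degree global grid and monthly data for 1980-2012.
RESOLUTION_DEG = 0.5
START_YEAR, END_YEAR = 1980, 2012  # inclusive, January to December

n_lat = int(180 / RESOLUTION_DEG)            # rows from pole to pole
n_lon = int(360 / RESOLUTION_DEG)            # columns around the globe
n_months = (END_YEAR - START_YEAR + 1) * 12  # 33 years of monthly fields

# Cell-centre coordinates (assumed ordering: south to north, west to east).
lats = [-90 + RESOLUTION_DEG * (i + 0.5) for i in range(n_lat)]
lons = [-180 + RESOLUTION_DEG * (j + 0.5) for j in range(n_lon)]

print(n_lat, n_lon, n_months)
```

Only land cells outside Greenland and Antarctica carry values, so the number of valid cells is smaller than the full grid size.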
Methodology and Data
- Models used:
  - Machine Learning Models: Random Forest (RF), Extreme Gradient Boosting (XGB), Gaussian Process Regression (GPR).
  - Input Hydrological/Land Surface/Reanalysis Models: PCR-GLOBWB, SURFEX-TRIP, HBV-SIMREG, HTESSEL-CaMa, JULES, LISFLOOD, ORCHIDEE, SWBM, W3RA, Community Land Model Version 5 (CLM5) with GSWP3 forcing, CLM5 with CRUNCEP forcing, WaterGAP v2.2e, ERA5.
- Data sources:
  - GRACE TWSA: JPL, CSR, and GSFC mascon solutions (ensemble mean used for training/validation).
  - Satellite Laser Ranging (SLR) data: Monthly SLR-based gravity fields (November 1992 to June 2017) for pre-GRACE validation.
  - Water Balance Fluxes: Ensemble mean of selected precipitation (P), evapotranspiration (ET), and runoff (R) datasets (e.g., GPCC for P, PML-V2 for ET, G-RUN ENSEMBLE for R) for water balance closure assessment.
  - Global Mean Sea Level (GMSL): Frederikse et al. (2020) reconstruction (1900-2018) and NASA GSFC Version 5.2 altimetry dataset (post-1992) for sea level budget assessment.
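The water balance closure assessment compares the time derivative of TWSA with the flux residual P - ET - R. The sketch below illustrates that comparison on a short series. The centred-difference scheme, the function names, and the numbers are assumptions for illustration only; the differencing method actually used in the paper is not stated in this summary.

```python
def twsa_derivative(twsa):
    """Centred difference of a monthly TWSA series (mm), giving mm/month.

    Returns values for the interior months only, so the result has
    len(twsa) - 2 entries. The scheme is an assumption, not the paper's.
    """
    return [(twsa[i + 1] - twsa[i - 1]) / 2.0 for i in range(1, len(twsa) - 1)]


def flux_residual(p, et, r):
    """Monthly water balance residual P - ET - R (mm/month)."""
    return [pi - eti - ri for pi, eti, ri in zip(p, et, r)]


# Made-up values, chosen so that the budget closes exactly.
twsa = [0.0, 10.0, 30.0, 40.0, 30.0]
p = [60.0, 80.0, 90.0, 50.0, 20.0]
et = [30.0, 40.0, 50.0, 40.0, 30.0]
r = [20.0, 25.0, 25.0, 10.0, 5.0]

ds_dt = twsa_derivative(twsa)
residual = flux_residual(p, et, r)[1:-1]  # drop end months to align with ds_dt

print(ds_dt)
print(residual)
```

With real data the two series never match exactly; the agreement between them is what the correlation and NRMSE figures quantify.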
Main Results
- ML-TWiX provides a global dataset of monthly TWSA from 1980 to 2012 on a 0.5° grid, including spatially explicit uncertainty estimates.
- During the GRACE era (2002-2012), ML-TWiX consistently ranked among the top performers compared to seven other reconstruction datasets. At the basin scale, it achieved a mean Nash-Sutcliffe Efficiency (NSE) of 0.96, a mean Normalized Root Mean Square Error (NRMSE) of 13.6%, and a mean correlation of 0.99 with GRACE TWSA.
- In the pre-GRACE era (1992-2002), ML-TWiX showed strong agreement with SLR-derived TWSA, with a median basin-scale correlation of 0.82, outperforming several other reconstructions.
- ML-TWiX demonstrated superior performance in closing the terrestrial water balance in the pre-GRACE era (1980-2001), achieving the highest correlation (0.94) between its TWSA derivative and residual fluxes (P-ET-R), and a low NRMSE (45%).
- In closing the global sea level budget, ML-TWiX achieved the lowest misfit during the GRACE era (2002-2012) and performed comparably to the best existing datasets in the 1980-2001 period.
- The mean uncertainty in ML-TWiX was approximately 21 mm during the pre-GRACE period (1980-2002), decreasing to about 11 mm during the GRACE era (2002-2012), demonstrating the effectiveness of the machine learning ensemble in reducing model disagreement.
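The skill scores quoted above (NSE, NRMSE, correlation) can be computed from a pair of time series as in the following minimal sketch. The NSE and Pearson correlation follow their standard definitions. The NRMSE normalisation is an assumption: here the RMSE is divided by the range of the reference series, whereas the paper may normalise differently. The function names are hypothetical.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def nrmse_percent(obs, sim):
    """RMSE as a percentage of the observed range (normalisation is an assumption)."""
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
    return 100.0 * rmse / (max(obs) - min(obs))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up reference and reconstructed TWSA values (mm) for illustration.
grace = [-20.0, -5.0, 15.0, 30.0, 10.0, -10.0]
recon = [-18.0, -7.0, 14.0, 27.0, 12.0, -9.0]
print(nse(grace, recon), nrmse_percent(grace, recon), pearson_r(grace, recon))
```

A basin-scale mean such as the reported NSE of 0.96 would then be the average of these per-basin scores.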
Contributions
- Introduces ML-TWiX, a novel global, monthly TWSA dataset extending the GRACE record back to 1980 with spatially explicit uncertainty estimates.
- Systematically evaluates and combines an ensemble of three top-performing machine learning models (Random Forest, XGBoost, Gaussian Process Regression) to leverage diverse modeling strengths.
- Provides comprehensive validation against multiple independent datasets (SLR, water balance closure, global mean sea level budget), demonstrating robust and consistent performance, particularly in the pre-GRACE era where observational data are sparse.
- Addresses limitations of previous reconstructions by integrating a broad range of hydrological and land surface model simulations, reducing sensitivity to structural model biases and improving robustness across climate regimes.
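One simple way to combine the three model outputs and attach an uncertainty to each value is sketched below. This is an illustration under stated assumptions, not the authors' procedure: the summary does not say how the members are weighted or how the uncertainty is derived, so the unweighted mean, the use of the sample standard deviation as spread, and the function name `combine_members` are all hypothetical.

```python
import statistics


def combine_members(member_predictions):
    """Combine per-model TWSA series into an ensemble mean and spread.

    member_predictions maps a model name to a list of TWSA values (mm),
    one per month, all of equal length. Returns (mean, spread), where
    spread is the sample standard deviation across members (an assumed
    stand-in for the dataset's uncertainty estimate).
    """
    series = list(member_predictions.values())
    mean = [statistics.fmean(values) for values in zip(*series)]
    spread = [statistics.stdev(values) for values in zip(*series)]
    return mean, spread


# Made-up predictions for a single grid cell over three months.
predictions = {
    "RF": [10.0, -5.0, 3.0],
    "XGB": [12.0, -7.0, 3.0],
    "GPR": [14.0, -3.0, 3.0],
}
ensemble_mean, ensemble_spread = combine_members(predictions)
print(ensemble_mean, ensemble_spread)
```

The spread shrinks to zero where the members agree, which is the sense in which a smaller uncertainty reflects lower model disagreement.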
Funding
- Projekt DEAL (Open Access funding)
- National Aeronautics and Space Administration (NASA) contract 80NM0018D0004 (for research carried out at Jet Propulsion Laboratory)
- Earth2Observe project (for providing access to global hydrological and land surface model outputs)
Citation
@article{Saemian2026Machine,
author = {Saemian, Peyman and Tourian, Mohammad J. and Douch, Karim and Foster, James and Gou, Junyang and Wiese, David and AghaKouchak, Amir and Sneeuw, Nico},
title = {A Machine Learning approach for Total Water storage anomaly eXtension back to 1980 (ML-TWiX)},
journal = {Scientific Data},
year = {2026},
doi = {10.1038/s41597-026-06604-w},
url = {https://doi.org/10.1038/s41597-026-06604-w}
}
Original Source: https://doi.org/10.1038/s41597-026-06604-w