Shen et al. (2026) An integrated SWAT–XGBoost–SHAP framework identifies key drivers of critical source areas during critical periods in a small watershed
Identification
- Journal: Environmental Monitoring and Assessment
- Year: 2026
- Date: 2026-04-10
- Authors: Zhengjie Shen, Huiping Zhou, Yiqun Wang, Jie Pan, Yingang Xue
- DOI: 10.1007/s10661-026-15274-5
Research Groups
- School of Environmental Science and Engineering, Changzhou University, Changzhou, China
- Changzhou Branch of Jiangsu Province Hydrology and Water Resources Investigation Bureau, Changzhou, China
Short Summary
This study developed an integrated SWAT–XGBoost–SHAP framework to identify critical periods (CPs) and critical source areas (CSAs) of non-point source (NPS) pollution in a small watershed, quantitatively analyzing the dominant driving factors and their nonlinear threshold effects. The framework identified February, June, and July as CPs contributing 59% of total nitrogen and 65% of total phosphorus loads, with fertilizer application amount and cultivated land proportion being the predominant drivers exhibiting critical thresholds.
Objective
- To quantitatively analyze the driving factors responsible for the formation of CSAs of NPS pollution in small watersheds during CPs.
- To characterize the spatiotemporal patterns of CSAs during hydrologically sensitive periods.
- To elucidate the dominant drivers and their nonlinear mechanisms governing CSA formation.
- To identify critical thresholds in these drivers to inform targeted, evidence-based watershed management strategies.
Study Configuration
- Spatial Scale: Zhongtianshe watershed, in Liyang City (Jiangsu Province) and Guangde County (Anhui Province), eastern China. Total area: 40.7 square kilometres (km²). Discretized into 27 sub-watersheds (average 1.5 km²) and 225 Hydrologic Response Units (HRUs).
- Temporal Scale:
  - Meteorological data: 2007–2023 (daily).
  - Water quality monitoring data (Total Nitrogen (TN), Total Phosphorus (TP)): 2014–2017 (regularly sampled).
  - SWAT warm-up period: 2007 (1 year).
  - SWAT streamflow calibration: 2008–2015 (monthly).
  - SWAT streamflow validation: 2016–2023 (monthly).
  - SWAT TN and TP calibration: January 2014 – August 2015 (20 months).
  - SWAT TN and TP validation: September 2015 – April 2017 (20 months).
  - Focal period for analysis: 2016 (annual yield of TN and TP from each sub-watershed).
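Calibration and validation fits of this kind are commonly scored with the Nash–Sutcliffe efficiency (NSE), the metric reported for SWAT under Main Results. The following is a minimal sketch of the standard NSE formula only; the function name and the monthly values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = fmean(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series has zero variance; NSE is undefined")
    return 1.0 - residual / variance


# Hypothetical monthly streamflow values (illustrative only, not study data).
obs = [12.0, 30.0, 55.0, 41.0, 18.0, 9.0]
sim = [14.0, 27.0, 50.0, 44.0, 20.0, 8.0]
nse = nash_sutcliffe(obs, sim)
print(f"NSE = {nse:.3f}")
```

An NSE of 1 indicates a perfect match, and an NSE of 0 means the simulation is no better than predicting the observed mean for every month.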
Methodology and Data
- Models used:
  - Soil and Water Assessment Tool (SWAT)
  - eXtreme Gradient Boosting (XGBoost)
  - SHapley Additive exPlanations (SHAP)
- Data sources:
  - Digital Elevation Model (DEM): 12.5 metre (m) resolution, ALOS satellite data (NASA’s EARTHDATA platform).
  - Soil type classification: Digitized 1:50,000 soil map (Liyang Soil and Fertilizer Technology Guidance Station), Jiangsu Liyang Soil Records, Harmonized World Soil Database (HWSD).
  - Land Use and Land Cover (LULC): 10 m resolution WorldCover dataset (v2.0, European Space Agency), Sentinel-2 satellite imagery (10 m resolution, 2016, United States Geological Survey).
  - Meteorological data: Daily (2007–2023) precipitation, wind speed, relative humidity, solar radiation, and temperature (Liyang Station, China Meteorological Data Service Centre).
  - Hydrological observation data: Daily time series of surface runoff (Lianyuqiao monitoring station, Changzhou Branch of Jiangsu Provincial Hydrology and Water Resources Investigation Bureau).
  - Water quality monitoring data: Total Nitrogen (TN) and Total Phosphorus (TP) concentrations (2014–2017, Nanjing Normal University Taihu watershed (Liyang) Water Environment Experimental Station).
  - Fertilizer application rates: Field and literature surveys.
  - Explanatory variables for XGBoost: Cropland cover percentage, runoff volume, sediment yield, elevation, fertilizer amount, and the USLE-K value of the dominant soil type (derived from SWAT and other spatial data).
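To illustrate what SHAP attributes to each explanatory variable, the sketch below computes exact Shapley values by brute force for a toy scoring function on the log-odds scale. Everything here is an assumption made for illustration: the function `toy_log_odds`, its coefficients, the feature names, and the instance and baseline values are hypothetical, and the brute-force enumeration with baseline replacement stands in for, and is not, the tree-based algorithm used with a fitted XGBoost model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_log_odds(x):
    # Hypothetical score, NOT the fitted model from the study.
    score = -3.0 + 0.02 * x["fertilizer_kg_ha"] + 2.0 * x["cropland_fraction"]
    if x["runoff_mm"] > 30.0:  # threshold-style term with an interaction
        score += 0.5 + 1.0 * x["cropland_fraction"]
    return score


# Hypothetical sub-watershed and reference point (illustrative only).
instance = {"fertilizer_kg_ha": 120.0, "cropland_fraction": 0.6, "runoff_mm": 45.0}
baseline = {"fertilizer_kg_ha": 40.0, "cropland_fraction": 0.2, "runoff_mm": 10.0}

phi = shapley_values(toy_log_odds, instance, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for the instance and the score for the baseline, which is the property that makes mean |SHAP| values comparable across features.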
Main Results
- Critical Periods (CPs): February, June, and July were identified as CPs, collectively contributing 59% of total nitrogen (TN) and 65% of total phosphorus (TP) loads.
- Critical Source Areas (CSAs): During CPs, CSAs covering just 35.2–36.8% of the watershed area contributed 56.2–66.2% of TN/TP loads.
  - In February, 56.2% of TN and 61.6% of TP loads originated from 35.3% and 35.2% of the area, respectively.
  - In June, 62.2% of TN and 60.4% of TP loads originated from 36.8% and 36.2% of the area, respectively.
  - In July, 60.1% of TN and 57.5% of TP loads originated from 35.7% and 35.2% of the area, respectively.
- Dominant Drivers: Fertilizer application amount (mean |SHAP| = 1.91 on the log-odds scale) and the proportion of cultivated land (mean |SHAP| = 0.52) were identified as the predominant drivers governing CSA formation. Sediment yield (mean |SHAP| = 0.65) was also a significant factor.
- Threshold Effects:
  - Runoff: A critical hydrological threshold was identified at approximately 30 millimetres (mm), beyond which CSA risk increases substantially.
  - Sediment yield: A transition from negative to positive influence was observed within the 10–20 tonnes per hectare (t ha⁻¹) range, indicating a shift from acceptable to significant erosion risk.
  - Fertilization rate: A significant positive inflection point was found around 50–60 kilograms per hectare (kg ha⁻¹). Beyond 100 kg ha⁻¹, SHAP values increased substantially (reaching the 2.0 level), indicating excessive fertilization as a dominant catalyst for high-risk CSAs.
  - USLE-K factor: The factor transitioned from negative to positive effects between 0.22 and 0.24, indicating amplified erosion susceptibility.
- Model Performance: The XGBoost model demonstrated strong predictive performance (Training: Accuracy = 0.9856, Precision = 0.9245, Recall = 0.875, F1-score = 0.907; Independent Test: Accuracy = 0.9524, Precision = 0.875, Recall = 0.800, F1-score = 0.8889). The SWAT model exhibited satisfactory performance for runoff (Nash–Sutcliffe efficiency (NSE) > 0.63) and for nutrient loads (NSE of 0.70–0.86 for TN and TP).
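The CSA figures above pair a share of the load with a share of the area. One common way to obtain such pairs is to rank spatial units by load per unit area and accumulate them until a target share of the total load is reached. The sketch below assumes that rule; the summary does not state the delineation criterion used in the study, and the function name, the 60% target, and the sub-watershed values are hypothetical.

```python
def identify_csas(units, load_share_target=0.6):
    """Rank units by load intensity and accumulate until the target load share is met.

    Returns (selected ids, share of total load, share of total area).
    """
    total_load = sum(u["load_kg"] for u in units)
    total_area = sum(u["area_km2"] for u in units)
    ranked = sorted(units, key=lambda u: u["load_kg"] / u["area_km2"], reverse=True)

    selected, load, area = [], 0.0, 0.0
    for u in ranked:
        if load / total_load >= load_share_target:
            break
        selected.append(u["id"])
        load += u["load_kg"]
        area += u["area_km2"]
    return selected, load / total_load, area / total_area


# Hypothetical sub-watershed loads for one critical month (not study data).
units = [
    {"id": 1, "area_km2": 1.0, "load_kg": 500.0},
    {"id": 2, "area_km2": 2.0, "load_kg": 400.0},
    {"id": 3, "area_km2": 1.5, "load_kg": 900.0},
    {"id": 4, "area_km2": 2.5, "load_kg": 150.0},
    {"id": 5, "area_km2": 1.0, "load_kg": 50.0},
]

csa_ids, load_share, area_share = identify_csas(units)
print(csa_ids, f"{load_share:.1%} of load from {area_share:.2%} of area")
```

With these illustrative inputs, sub-watersheds 3 and 1 are selected and account for 70% of the load from 31.25% of the area.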
Contributions
- Proposes an integrated SWAT–XGBoost–SHAP framework for the joint identification of critical periods and critical source areas, coupled with a quantitative analysis of their driving factors and nonlinear threshold effects.
- Provides a robust scientific basis and novel methodology for precision watershed management by identifying quantifiable critical conditions (tipping points) that trigger the implementation of targeted best management practices.
- Systematically deconstructs the complex, nonlinear interactions embedded within process-based SWAT simulations using an interpretable machine learning framework (XGBoost-SHAP).
- Quantifies the relative contributions and directional influence of key environmental drivers on CSA formation, including the identification of critical thresholds for factors like runoff and fertilization rate.
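A threshold of the kind described here can be read off a SHAP dependence relationship as the point where a feature's contribution changes sign. The sketch below bins feature values, averages the SHAP values in each bin, and reports the first bin whose mean turns positive after a non-positive bin. This is one simple heuristic offered as an assumption, not the procedure documented for the study, and the runoff and SHAP values are hypothetical.

```python
from statistics import fmean


def sign_change_threshold(feature_values, shap_values, bin_width):
    """Return (lower, upper) of the first bin with positive mean SHAP
    that follows a bin with non-positive mean SHAP, or None if there is none."""
    bins = {}
    for x, s in zip(feature_values, shap_values):
        bins.setdefault(int(x // bin_width), []).append(s)

    previous_mean = None
    for index in sorted(bins):
        mean_shap = fmean(bins[index])
        if previous_mean is not None and previous_mean <= 0 < mean_shap:
            return index * bin_width, (index + 1) * bin_width
        previous_mean = mean_shap
    return None


# Hypothetical runoff (mm) and SHAP values (illustrative only, not study data).
runoff = [5.0, 12.0, 18.0, 24.0, 28.0, 33.0, 37.0, 45.0, 52.0]
shap_runoff = [-0.40, -0.35, -0.30, -0.20, -0.05, 0.15, 0.30, 0.60, 0.80]

threshold = sign_change_threshold(runoff, shap_runoff, bin_width=10.0)
print(threshold)
```

For these made-up values the sign change falls in the 30–40 mm bin. The result depends on the bin width, so in practice the choice of bins would need to be checked against the scatter of the underlying points.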
Funding
No specific financial support was received for this study from any funding agencies in the public, commercial, or not-for-profit sectors. Data support was acknowledged from the National Earth System Science Data Center and the Changzhou Branch of Jiangsu Province Hydrology and Water Resources Investigation Bureau.
Citation
@article{Shen2026integrated,
author = {Shen, Zhengjie and Zhou, Huiping and Wang, Yiqun and Pan, Jie and Xue, Yingang},
title = {An integrated SWAT–XGBoost–SHAP framework identifies key drivers of critical source areas during critical periods in a small watershed},
journal = {Environmental Monitoring and Assessment},
year = {2026},
doi = {10.1007/s10661-026-15274-5},
url = {https://doi.org/10.1007/s10661-026-15274-5}
}
Original Source: https://doi.org/10.1007/s10661-026-15274-5