Yu et al. (2026) Integrating XGBoost and SHAP to uncover feature contributions for river network selection across different patterns

Identification

Journal: International Journal of Applied Earth Observation and Geoinformation
Year: 2026
Date: 2026-01-24
Authors: Huafei Yu, Min Yang, Xiang Lv, Tinghua Ai, Bin Li
DOI: 10.1016/j.jag.2026.105120

Research Groups

School of Resource and Environmental Sciences, Wuhan University, Wuhan, China
Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou, China

Short Summary

This study introduces an explainable artificial intelligence framework, integrating XGBoost with SHAP, to uncover how geometric, topological, and hydrological features contribute to river network selection across different drainage patterns. The framework categorizes features into universal, pattern-sensitive, and low-contribution types, providing insights for automated, pattern-preserving generation of multiscale river networks.

Objective

To uncover how individual river features influence selection decisions for multiscale river networks, especially across different drainage patterns, using an explainable artificial intelligence framework that combines XGBoost with SHAP.

Study Configuration

Spatial Scale: Analysis of river network selection from a 1:24,000 scale to a 1:50,000 scale. The study involved three parallel river networks and three rectangular river networks located in the Mojave National Preserve (California, USA) and the Appalachian Mountains/Hiwassee Island (USA).
Temporal Scale: Not applicable. The study does not model temporal dynamics of river networks; it develops a selection methodology for static river network datasets at different spatial scales.

Methodology and Data

Models used:
- eXtreme Gradient Boosting (XGBoost) for river network selection.
- SHapley Additive exPlanations (SHAP) for feature contribution analysis.
- Benchmarked against Support Vector Machine (SVM), Random Forest (RF), and Multilayer Perceptron (MLP).
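
To make the idea behind SHAP concrete, the following minimal sketch computes exact Shapley values for a toy scoring function by averaging marginal contributions over all feature orderings. It is not the SHAP library and not the authors' implementation: the scoring function `toy_score`, the feature names, the input values, and the all-zero baseline are hypothetical, chosen only for illustration. The SHAP library computes the same kind of attribution efficiently for tree models such as XGBoost.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features are switched from their baseline value to their actual value
    one at a time, and the marginal change in the prediction is averaged
    over every possible ordering. Feasible only for a handful of features.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orders) for name, total in totals.items()}


def toy_score(f):
    # Hypothetical selection score with one interaction term (assumption).
    return 2.0 * f["up_area"] + 1.0 * f["horton"] + 0.5 * f["up_area"] * f["length"]


x = {"up_area": 1.0, "horton": 2.0, "length": 4.0}
baseline = {"up_area": 0.0, "horton": 0.0, "length": 0.0}
phi = shapley_values(toy_score, x, baseline)
```

The attributions in `phi` sum to `toy_score(x) - toy_score(baseline)`, and the interaction term is split equally between the two features involved.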
Data sources:
- River network data with parallel and rectangular drainage patterns.
- 1:24,000 scale data sourced from the United States Geological Survey (USGS) HydroDataSet.
- 1:50,000 scale selection labels determined by integrating HydroDataSet labels with rigorous evaluations from cartographic generalization experts.
- Features extracted for each river stroke: Horton code, river length (meters), river density (meters per square meter), sinuosity, confluence angle (degrees), number of tributaries, distance between proximal rivers (meters), upstream drainage area (square meters), degree centrality, indegree centrality, outdegree centrality, betweenness centrality, and closeness centrality.
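
As an illustration of how some of the geometric features listed above could be derived from a river stroke's vertices, here is a small sketch. It assumes the common definitions: sinuosity as path length divided by the straight-line distance between the endpoints, and confluence angle as the angle at the junction between the upstream directions of the two joining rivers. The paper's exact definitions may differ, and the function names and sample coordinates are hypothetical.

```python
import math


def polyline_length(points):
    """Sum of segment lengths along a stroke given as (x, y) vertices."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def sinuosity(points):
    """Path length divided by the endpoint-to-endpoint distance (assumed definition)."""
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("stroke endpoints coincide; sinuosity is undefined")
    return polyline_length(points) / chord


def confluence_angle(junction, main_upstream, trib_upstream):
    """Angle in degrees at the junction between two upstream directions (assumed definition)."""
    v1 = (main_upstream[0] - junction[0], main_upstream[1] - junction[1])
    v2 = (trib_upstream[0] - junction[0], trib_upstream[1] - junction[1])
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


# Hypothetical stroke in projected coordinates (meters).
stroke = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
length_m = polyline_length(stroke)
sinuosity_value = sinuosity(stroke)
right_angle = confluence_angle((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))
acute_angle = confluence_angle((0.0, 0.0), (0.0, 1.0), (1.0, 1.0))
```

Coordinates are assumed to be in a projected system with meter units, so lengths come out in meters and sinuosity is dimensionless.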

Main Results

The XGBoost-based selection model demonstrated superior performance compared to SVM, RF, and MLP across both parallel and rectangular river network patterns, achieving higher Precision, F1-score, and Coefficient of Line Correspondence (CLC).
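
The evaluation metrics named above can be sketched as follows. Precision and F1-score follow their standard definitions for the "selected" class. For CLC, this sketch assumes the commonly used length-weighted form, matched length divided by the sum of matched, omitted, and committed lengths; the paper's exact formulation is not given in this summary. The function name, labels, and stroke lengths are hypothetical.

```python
def selection_metrics(y_true, y_pred, lengths):
    """Precision, recall, F1, and an assumed length-weighted CLC.

    y_true and y_pred hold 1 for a selected stroke and 0 for a removed one;
    lengths holds the stroke lengths in meters.
    """
    rows = list(zip(y_true, y_pred, lengths))
    tp = sum(1 for t, p, _ in rows if t == 1 and p == 1)
    fp = sum(1 for t, p, _ in rows if t == 0 and p == 1)
    fn = sum(1 for t, p, _ in rows if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    matched = sum(length for t, p, length in rows if t == 1 and p == 1)
    omitted = sum(length for t, p, length in rows if t == 1 and p == 0)
    committed = sum(length for t, p, length in rows if t == 0 and p == 1)
    total = matched + omitted + committed
    clc = matched / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "clc": clc}


# Hypothetical reference labels, model predictions, and stroke lengths.
metrics = selection_metrics(
    y_true=[1, 1, 0, 1, 0],
    y_pred=[1, 0, 0, 1, 1],
    lengths=[100.0, 50.0, 30.0, 200.0, 20.0],
)
```

Unlike the count-based Precision and F1-score, this form of CLC weights each stroke by its length, so misclassifying a long river costs more than misclassifying a short one.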
Features influencing river network selection were categorized into three groups through macro-, meso-, and micro-level SHAP analysis:
- Universal features: Upstream cumulative area (UpArea), Horton code (Horton), Length, Number of tributaries (NumTri), and Sinuosity. These features consistently dominate selection decisions regardless of the drainage pattern, with UpArea and Horton being particularly crucial.
- Pattern-sensitive features: Confluence angle (CAngle), Distance between proximal rivers (DisProxi), and Density. Their influence varies significantly with the drainage pattern. For instance, parallel networks favor acute confluence angles, while rectangular networks favor angles near 90 degrees. DisProxi showed inverse trends across the two patterns.
- Low-contribution features: Degree centrality (DeCen), Outdegree centrality (OutCen), Indegree centrality (InCen), Betweenness centrality (BetCen), and Closeness centrality (CloCen). These topological features exhibited negligible impact due to functional overlap with universal features.
The findings provide explainable insights that support the automated, pattern-preserving generation of multiscale river networks from fine-scale Earth Observation (EO) data.

Contributions

Provides the first systematic evidence, using an XGBoost-SHAP framework, to reveal feature discrepancies in river network selection across different drainage patterns, identifying both universal and crucial pattern-sensitive features.
Verifies the advantages of the XGBoost model over the benchmark models and conducts a comprehensive, multi-perspective (macro, meso, and micro) SHAP analysis to characterize the pattern-dependent contribution mechanisms of selection features.
Contributes to automated map generalization and the broader goal of producing explainable, pattern-preserving geoinformation layers from EO-derived hydrographic data, essential for various downstream applications.

Funding

National Natural Science Foundation of China [grant numbers 42401545, 42394065]
Fundamental Research Funds for the Central Universities, China [grant number 2042022dx0001]

Citation

@article{Yu2026Integrating,
  author = {Yu, Huafei and Yang, Min and Lv, Xiang and Ai, Tinghua and Li, Bin},
  title = {Integrating XGBoost and SHAP to uncover feature contributions for river network selection across different patterns},
  journal = {International Journal of Applied Earth Observation and Geoinformation},
  year = {2026},
  doi = {10.1016/j.jag.2026.105120},
  url = {https://doi.org/10.1016/j.jag.2026.105120}
}

Original Source: https://doi.org/10.1016/j.jag.2026.105120