Radebe et al. (2025) A near-surface groundwater prospectivity model for the Main Karoo Basin of South Africa derived from multivariate machine learning
Identification
- Journal: Applied Water Science
- Year: 2025
- Date: 2025-12-09
- Authors: Samkelo Radebe, Martin D. Clark
- DOI: 10.1007/s13201-025-02669-x
Research Groups
- University of the Free State, South Africa
Short Summary
This study developed a near-surface groundwater prospectivity model for the Main Karoo Basin, South Africa, using multivariate machine learning, and demonstrated that it identifies high-potential zones under drought conditions. The model, based on the Fast Tree Decision Learning algorithm, achieved 81.4% classification accuracy and aligned significantly with independent groundwater indicators (springs, borehole yields, and groundwater-dependent vegetation).
Objective
- To evaluate and map the availability of near-surface groundwater in the Main Karoo Basin, South Africa, using multivariate machine learning models that integrate 21 conditioning factors derived from observable surface phenomena.
Study Configuration
- Spatial Scale: Main Karoo Basin, South Africa, approximately 630,000 square kilometers.
- Temporal Scale: Focus on drought conditions, specifically the 2017–2018 drought period. Standardized Precipitation Index (SPI) calculated from 1981 to 2024 to identify drought years.
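The SPI step can be illustrated with a minimal sketch. It assumes the conventional procedure (fit a gamma distribution to precipitation totals, then map cumulative probabilities to standard normal quantiles); the summary does not state the accumulation period or fitting method the authors used. The function names `gamma_cdf` and `spi`, the Thom approximation for the gamma fit, and the sample totals are illustrative assumptions, not values from the paper.

```python
import math
from statistics import NormalDist, fmean


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-12 * abs(total) and n < 10000:
        term *= z / (shape + n)
        total += term
        n += 1
    log_prefix = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefix))


def spi(totals):
    """Return one SPI value per precipitation total (assumes totals are not all equal)."""
    positive = [p for p in totals if p > 0]
    zero_fraction = 1.0 - len(positive) / len(totals)

    # Thom's approximation to the maximum-likelihood gamma parameters.
    mean = fmean(positive)
    a_stat = math.log(mean) - fmean([math.log(p) for p in positive])
    shape = (1.0 + math.sqrt(1.0 + 4.0 * a_stat / 3.0)) / (4.0 * a_stat)
    scale = mean / shape

    normal = NormalDist()
    values = []
    for p in totals:
        # Mixed distribution: zero totals carry their own probability mass.
        prob = zero_fraction + (1.0 - zero_fraction) * gamma_cdf(p, shape, scale)
        prob = min(max(prob, 1e-6), 1.0 - 1e-6)  # keep inv_cdf finite
        values.append(normal.inv_cdf(prob))
    return values


# Hypothetical annual totals in mm, for illustration only.
annual_mm = [420.0, 510.0, 380.0, 610.0, 295.0, 450.0, 530.0, 340.0, 480.0, 560.0]
for total, value in zip(annual_mm, spi(annual_mm)):
    print(f"{total:6.1f} mm -> SPI {value:+.2f}")
```

Years with strongly negative SPI would be the drought candidates; the threshold used to flag 2017–2018 is not given in this summary.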
Methodology and Data
- Models used: Five machine learning algorithms were tested: simple logistic, logistic regression with ridge estimator, stochastic gradient descent, random forest, and Fast Tree Decision Learning. The Fast Tree Decision Learning model was selected as the best performer.
- Data sources:
  - Training Data: 173 maps derived from dolerite dike-induced aquifer models, based on Modified Soil Adjusted Vegetation Index (MSAVI) from PlanetScope multispectral satellite imagery (346 training points).
  - Conditioning Factors (21): Spectral indices (MSAVI), topographical features (elevation, slope, slope roughness), geological formations (dolerite dike density, distance to dikes, fault density, fault distance, lithology), hydrological parameters (distance to dams, hydraulic conductivity, gravity, total magnetic intensity), climatic factors (average precipitation, evapotranspiration), and soil properties (average clay content, soil bulk density, soil depth, broad soil properties, soil thermal properties).
  - Drought Identification: Standardized Precipitation Index (SPI) calculated from Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) dataset.
  - Validation Data: Spatial distribution of geological springs, borehole yield data, and distribution of Vachellia karroo (groundwater-dependent vegetation).
- Software: Waikato Environment for Knowledge Analysis (WEKA 3.8.6) for PCA and ML training; ESRI ArcGIS Pro 3.0.3 for data processing and mapping.
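As a rough illustration of the spectral index behind the training data, the sketch below computes MSAVI from near-infrared and red surface reflectance. It assumes the widely used closed-form variant (often called MSAVI2); the summary does not say which formulation or band preprocessing the authors applied. The function name `msavi2` and the reflectance values are hypothetical.

```python
import math


def msavi2(nir, red):
    """Closed-form MSAVI from NIR and red reflectance (both expected in 0..1)."""
    # The term under the root equals (2*nir - 1)**2 + 8*red, so it is never
    # negative for non-negative reflectance.
    root = math.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))
    return (2.0 * nir + 1.0 - root) / 2.0


# Hypothetical pixels: dense vegetation reflects strongly in NIR, bare soil does not.
vegetated = msavi2(nir=0.50, red=0.10)
bare_soil = msavi2(nir=0.30, red=0.25)
print(f"vegetated: {vegetated:.3f}, bare soil: {bare_soil:.3f}")
```

Higher values indicate denser green vegetation, which is why persistent greenness during drought can act as a proxy for shallow groundwater along dolerite dikes.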
Main Results
- The Fast Tree Decision Learning model achieved the highest classification accuracy (81.4%), with a Kappa statistic of 0.62 and an area under the Receiver Operating Characteristic (ROC) curve of 0.87.
- Areas with high potential for near-surface groundwater were identified along the Drakensberg Escarpment, the Cape Fold Belt, and the eastern Main Karoo Basin adjacent to the Indian Ocean.
- In the arid western Main Karoo Basin, localized high-potential zones coincide with intersections of drainage networks and major geological structures, where borehole yields exceed 0.009 cubic meters per second (9 liters per second).
- Validation against independent datasets showed statistically significant alignment (p < 0.00001): 34% of Vachellia karroo occurrences, 54% of springs, and 40% of boreholes with yield data were located within very high potential zones.
- Principal Component Analysis (PCA) indicated that dolerite dike density and proximity to dolerite dikes were the strongest conditioning factors influencing groundwater availability (eigenvalues > 0.5).
- Transmissivity values (inferred to be reported in m²/day and converted here to m²/s by dividing by 86,400) and borehole yields (m³/s) correlated strongly with the model's prospectivity classes. For example, Lady Grey and Standerton, with low transmissivities (3.0 × 10⁻⁶ to 1.7 × 10⁻⁵ m²/s), were classified as very low potential, while Aberdeen and Steynsburg, with high transmissivities (2.0 × 10⁻⁴ to 2.9 × 10⁻⁴ m²/s), were classified as very high potential.
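For readers unfamiliar with the performance metrics reported above, the following sketch shows how classification accuracy, Cohen's Kappa, and ROC area are defined for a binary classifier. The confusion-matrix counts and scores are hypothetical and are not the paper's data; the function names are illustrative, and WEKA computes these quantities internally.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Observed accuracy and Cohen's Kappa from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total**2
    return observed, (observed - expected) / (1.0 - expected)


def roc_auc(pos_scores, neg_scores):
    """ROC area as the probability that a positive outranks a negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts: 80 of 100 points classified correctly.
acc, kappa = accuracy_and_kappa(tp=40, fn=10, fp=10, tn=40)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")

# Hypothetical model scores for known positive and negative sites.
print(f"AUC = {roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]):.3f}")
```

Kappa discounts agreement that would occur by chance, which is why it sits well below the raw accuracy even for a reasonably good classifier.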
Contributions
- Provides a cost-effective, adaptable, and scalable method for generating regional near-surface groundwater potential maps and expected regional potential groundwater yield maps, particularly valuable in data-scarce, arid environments.
- Circumvents limitations of traditional groundwater management approaches that rely solely on regularly monitored boreholes by integrating remote sensing of groundwater-dependent vegetation.
- Identifies localized high-potential groundwater zones in arid regions of the Main Karoo Basin, offering critical insights for water resource management and community access.
- Demonstrates the effectiveness of machine learning models in characterizing prospective areas for near-surface groundwater without intrusive and time-intensive fieldwork.
Funding
- Hans Merensky Legacy Foundation (UFS-AGR20-000248)
Citation
@article{Radebe2025nearsurface,
author = {Radebe, Samkelo and Clark, Martin D.},
title = {A near-surface groundwater prospectivity model for the Main Karoo Basin of South Africa derived from multivariate machine learning},
journal = {Applied Water Science},
year = {2025},
doi = {10.1007/s13201-025-02669-x},
url = {https://doi.org/10.1007/s13201-025-02669-x}
}
Original Source: https://doi.org/10.1007/s13201-025-02669-x