Oliveira et al. (2025) High-Throughput Identification and Prediction of Early Stress Markers in Soybean Under Progressive Water Regimes via Hyperspectral Spectroscopy and Machine Learning
Identification
- Journal: Remote Sensing
- Year: 2025
- Date: 2025-10-11
- Authors: Caio Almeida de Oliveira, Nicole Ghinzelli Vedana, Weslei Augusto Mendonça, João Vitor Ferreira Gonçalves, Dihogo Gama de Matos, Renato Herrig Furlanetto, Luís Guilherme Teixeira Crusiol, Amanda Silveira Reis, Werner Camargos Antunes, Roney Berti de Oliveira, Marcelo Luiz Chicati, José Alexandre Melo Demattê, Marcos Rafael Nanni, Renan Falcioni
- DOI: 10.3390/rs17203409
Research Groups
- Graduate Program in Agronomy, State University of Maringá, Paraná, Brazil
- Gulf Coast Research and Education Center, University of Florida, Wimauma, FL, USA
- Embrapa Soja (National Soybean Research Center—Brazilian Agricultural Research Corporation), Paraná, Brazil
- Department of Biology, State University of Maringá, Paraná, Brazil
- Department of Soil Science, Luiz de Queiroz College of Agriculture, University of São Paulo, São Paulo, Brazil
Short Summary
This study developed a high-throughput, nondestructive method using hyperspectral spectroscopy and machine learning to identify and predict early stress markers in soybean under progressive water regimes. It demonstrated that a minimal set of 12 spectral bands can accurately classify drought severity and predict biochemical changes, offering a rapid solution for precision irrigation.
Objective
- To evaluate how well hyperspectral spectroscopy combined with machine learning can nondestructively predict key physiological parameters (leaf water status and cellular biochemical and metabolic parameters) and oxidative stress biomarkers (flavonoids, proline, and phenolics) in soybean (Glycine max (L.) Merrill) plants across a gradient of water regimes.
- To test the hypothesis that a minimal suite of approximately 12 strategically selected wavelengths, coupled with ensemble learning models, can predict both drought severity and biochemical composition with accuracy comparable to that of traditional laboratory assays.
Study Configuration
- Spatial Scale: Individual soybean plants (Glycine max (L.) Merrill) grown in 1 × 10⁻³ m³ plastic pots in a controlled-environment chamber. Measurements were obtained from the adaxial surface of fully expanded leaves.
- Temporal Scale: Plants were subjected to eleven distinct water regimes (100% to 0% field capacity) over a period of 14 days.
Methodology and Data
- Models used:
  - Machine Learning Classifiers: Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (KNN), Naive Bayes (NB), Decision Tree (DT), Logistic Regression (LR), Gradient Boosting (GBoost), Multilayer Perceptron (MLP).
  - Regression Models: Partial Least Squares Regression (PLSR).
  - Dimensionality Reduction and Feature Selection: Principal Component Analysis (PCA), Variable Importance in Projection (VIP), Interval Partial Least Squares (iPLS-VIP), Genetic Algorithm (GA), Random Forest (RF), Competitive Adaptive Reweighted Sampling (CARS), Boruta, Lasso, Mutual Information, Recursive Feature Elimination (RFE), Linear Discriminant Analysis (LDA).
  - Statistical Analysis: Analysis of Variance (ANOVA), Pearson’s correlation test, Duncan’s test, Hierarchical Clustering.
- Data sources:
  - Hyperspectral Reflectance Data: FieldSpec 3 spectroradiometer (ASD Inc., Boulder, CO, USA) covering the 350–2500 nm spectral range (UV–VIS–NIR–SWIR).
  - Physiological and Biochemical Data (laboratory assays):
    - Quantification of chlorophyll a (Chl a), chlorophyll b (Chl b), total chlorophyll (Chl a + b), and carotenoids (Car) from 0.5 × 10⁻⁴ m² leaf segments homogenized in 1.5 × 10⁻⁶ m³ chloroform–methanol solution.
    - Quantification of flavonoids (Flv) in the polar methanol extract, measured at 358 nm (ε358 = 2500 m² mol⁻¹).
    - Determination of proline (Pro) content from 100 × 10⁻⁶ kg fresh leaf segments, homogenized in 2 × 10⁻⁶ m³ of 3% (w/v) sulfosalicylic acid, with absorbance measured at 520 nm.
    - Quantification of soluble phenolic compounds (Phe) via an adapted Folin–Ciocalteu method, using 150 × 10⁻⁹ m³ methanolic extracts, with absorbance measured at 725 nm.
    - Determination of lignin (Lig) and cellulose (Cel) content from 150 × 10⁻⁶ kg dried, powdered leaf tissue (Protein-Free Cell Wall Fraction, PFCW). Lignin was quantified via the acetyl bromide method (ε = 2210 m² kg⁻¹ at 280 nm), and cellulose was quantified as glucose equivalents.
    - Determination of antioxidant activity (RSA%) via the DPPH (2,2-diphenyl-1-picrylhydrazyl) free radical assay using 50 × 10⁻⁹ m³ methanolic leaf extracts and 200 × 10⁻⁹ m³ of 1 mol m⁻³ DPPH solution, with absorbance measured at 515 nm.
    - Assessment of electrolyte leakage (ELK%) from 5 × 10⁻³ m diameter leaf discs incubated in 10 × 10⁻⁶ m³ deionized water.
    - Determination of relative water content (RWC%) from fresh leaf discs.
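The assay quantities listed above reduce to a few short formulas. The sketch below is illustrative only and is not the authors' code: it applies the Beer-Lambert law with the flavonoid coefficient quoted in this summary, and it uses the commonly cited conventions for DPPH radical scavenging activity and relative water content, which this summary does not spell out. The 0.01 m optical path length, the function names, and all input readings are hypothetical.

```python
# Standard assay arithmetic in SI units (illustrative sketch, hypothetical inputs).

EPSILON_FLV_358 = 2500.0  # m^2 mol^-1, flavonoid coefficient quoted in the summary


def beer_lambert_concentration(absorbance, epsilon, path_length_m=0.01):
    """Concentration in mol m^-3 from A = epsilon * c * l.

    ASSUMPTION: a 0.01 m (1 cm) optical path; the summary does not state it.
    """
    return absorbance / (epsilon * path_length_m)


def radical_scavenging_activity(abs_control, abs_sample):
    """RSA% = (A_control - A_sample) / A_control * 100 (common DPPH convention)."""
    return (abs_control - abs_sample) / abs_control * 100.0


def relative_water_content(fresh, turgid, dry):
    """RWC% = (FW - DW) / (TW - DW) * 100, with all masses in the same unit."""
    return (fresh - dry) / (turgid - dry) * 100.0


# Hypothetical readings, chosen only to exercise the formulas.
flv = beer_lambert_concentration(0.50, EPSILON_FLV_358)  # mol m^-3
rsa = radical_scavenging_activity(0.80, 0.28)            # percent
rwc = relative_water_content(0.70, 0.90, 0.20)           # percent

print(f"flavonoids: {flv:.3f} mol m^-3, RSA: {rsa:.1f}%, RWC: {rwc:.1f}%")
```

Keeping every quantity in SI units, as the summary does, means the absorptivity can be used directly without the usual L mol⁻¹ cm⁻¹ conversion factors.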
Main Results
- Hyperspectral spectroscopy combined with machine learning enabled high-accuracy, nondestructive prediction of early stress markers (pigments, osmolytes, antioxidants, cell wall compounds, and water status) in soybean under progressive drought.
- Tree-based ensemble and neural network models (Random Forest, Gradient Boosting, and Multilayer Perceptron) achieved over 95% accuracy in classifying eleven distinct water regimes, outperforming distance- and probability-based classifiers.
- Principal Component Analysis (PCA) of leaf reflectance data revealed clear separation among water treatments, with the first two principal components explaining 88% of the total variance, effectively distinguishing well-watered, intermediate, and severe deficit groups.
- A minimal set of 12 spectral bands, primarily in the red-edge (550–750 nm) and shortwave infrared (SWIR) regions (specifically around 1450 nm, 1940 nm, and 2200 nm), was identified as highly informative for predicting stress levels and biochemical changes.
- Partial Least Squares Regression (PLSR) models demonstrated robust predictive performance for foliar pigment concentrations, with R² values of 0.74–0.88 for area-based pigments (e.g., chlorophyll a: 391.51 × 10⁻⁶ kg m⁻²) and 0.44–0.67 for mass-based pigments (e.g., chlorophyll a: 21.18 × 10⁻³ kg kg⁻¹).
- PLSR models also showed strong predictive capacity for other biochemical and physiological parameters, with cellulose (103.92 × 10⁻⁶ mol kg⁻¹) achieving an R² of 0.96 and radical scavenging activity (64.75%) achieving an R² of 0.81.
- Carotenoid and anthocyanin indices (CRI2, ARI2, ARI1) exhibited the highest variable importance for predicting physiological and biochemical responses, while classic indices (e.g., NDVI, GNDVI, EVI) showed limited relevance (<2%).
- Mean values for key parameters included: total chlorophyll (a + b) of 578.52 × 10⁻⁶ kg m⁻², flavonoids of 67.91 × 10⁻⁵ mol m⁻², proline of 23.38 × 10⁻³ mol kg⁻¹, lignin of 27.53 × 10⁻³ kg kg⁻¹, and relative water content of 70.46%. Phenolic compounds had a reported mean of 135.88 mL cm⁻² (1.3588 m when reduced to SI base units).
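The indices named in the results are simple band arithmetic on leaf reflectance. The sketch below uses the widely cited formulations (ARI1, ARI2, and CRI2 after Gitelson and co-workers, plus NDVI and GNDVI). The exact band centers used in the study are not given in this summary, so the 510, 550, 670, 700, and 800 nm choices, the function names, and the example spectrum are all assumptions.

```python
# Vegetation indices from a reflectance lookup {wavelength_nm: reflectance}.
# Band centers follow common conventions and are ASSUMPTIONS, not the paper's settings.


def ari1(r):
    """Anthocyanin Reflectance Index 1: 1/R550 - 1/R700."""
    return 1.0 / r[550] - 1.0 / r[700]


def ari2(r):
    """Anthocyanin Reflectance Index 2: R800 * (1/R550 - 1/R700)."""
    return r[800] * ari1(r)


def cri2(r):
    """Carotenoid Reflectance Index 2: 1/R510 - 1/R700."""
    return 1.0 / r[510] - 1.0 / r[700]


def ndvi(r, nir=800, red=670):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    return (r[nir] - r[red]) / (r[nir] + r[red])


def gndvi(r, nir=800, green=550):
    """Green NDVI: (NIR - green) / (NIR + green)."""
    return (r[nir] - r[green]) / (r[nir] + r[green])


# Hypothetical leaf spectrum (reflectance as a fraction between 0 and 1).
leaf = {510: 0.05, 550: 0.10, 670: 0.05, 700: 0.20, 800: 0.50}

indices = {f.__name__: f(leaf) for f in (ari1, ari2, cri2, ndvi, gndvi)}
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

The pigment indices rely on reciprocal reflectance in the green and red-edge bands, whereas NDVI and GNDVI are normalized differences against the NIR band.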
Contributions
- Developed a novel hybrid feature selection and modeling framework for high-throughput, nondestructive prediction of early drought stress markers in soybean using hyperspectral spectroscopy and machine learning.
- Demonstrated that a minimal set of 12 strategically selected wavelengths (red-edge, 1450 nm, 1940 nm, 2200 nm) can achieve high predictive accuracy (over 95% classification accuracy; R² up to 0.96 for PLSR regression) while reducing data dimensionality by more than 80%. This paves the way for simplified, cost-effective proximal or UAV-mounted sensors for large-scale drought phenotyping.
- Provided a rapid, field-deployable solution for early drought detection and precision irrigation management in soybean, potentially reducing the reliance on time-consuming laboratory assays.
- Identified specific spectral "hot spots" and trait-tuned hyperspectral vegetation indices that enhance predictive performance for agronomic endpoints, advancing the understanding of optical signatures underlying plant physiological status.
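The dimensionality-reduction figure can be sanity-checked with simple band counting. The sketch below assumes the spectra are resampled to 1 nm steps across the quoted 350–2500 nm range; this summary does not say which band set the authors started from, so the resulting percentage is an illustration, not a reported result.

```python
# Band-count arithmetic for the "12 selected wavelengths" claim (illustrative only).

FIRST_NM, LAST_NM = 350, 2500  # spectral range quoted for the FieldSpec 3
STEP_NM = 1                    # ASSUMPTION: spectra resampled to 1 nm steps
SELECTED_BANDS = 12            # number of bands retained, per the summary
TARGET_REDUCTION_PCT = 80      # reduction threshold quoted in the summary

# Number of bands in the full spectrum, counting both endpoints.
full_bands = (LAST_NM - FIRST_NM) // STEP_NM + 1

# Fraction of bands discarded when only the selected ones are kept.
reduction = 1 - SELECTED_BANDS / full_bands

# Smallest starting band count for which keeping 12 bands exceeds the 80% threshold
# (integer arithmetic avoids floating-point rounding at the boundary).
min_full_bands = SELECTED_BANDS * 100 // (100 - TARGET_REDUCTION_PCT) + 1

print(f"full spectrum: {full_bands} bands")
print(f"reduction with {SELECTED_BANDS} bands: {reduction:.2%}")
print(f"threshold met for any starting set of at least {min_full_bands} bands")
```

Under these assumptions, "over 80%" is a conservative bound: it holds for any starting set larger than 60 bands, and a full 1 nm spectrum gives a far larger reduction.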
Funding
- Programa de Apoio à Fixação de Jovens Doutores no Brasil (CNPq 168180/2022–7)
- Fundação Araucária (CP 19/2022—Jovens Doutores)
- Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001
Citation
@article{Oliveira2025HighThroughput,
author = {Oliveira, Caio Almeida de and Vedana, Nicole Ghinzelli and Mendonça, Weslei Augusto and Gonçalves, João Vitor Ferreira and Matos, Dihogo Gama de and Furlanetto, Renato Herrig and Crusiol, Luís Guilherme Teixeira and Reis, Amanda Silveira and Antunes, Werner Camargos and Oliveira, Roney Berti de and Chicati, Marcelo Luiz and Demattê, José Alexandre Melo and Nanni, Marcos Rafael and Falcioni, Renan},
title = {High-Throughput Identification and Prediction of Early Stress Markers in Soybean Under Progressive Water Regimes via Hyperspectral Spectroscopy and Machine Learning},
journal = {Remote Sensing},
year = {2025},
doi = {10.3390/rs17203409},
url = {https://doi.org/10.3390/rs17203409}
}
Original Source: https://doi.org/10.3390/rs17203409