Nie and Liu (2025) Large language models for environmental modeling: Framework, capabilities, constraints

Identification

Journal: Journal of Environmental Management
Year: 2025
Date: 2025-12-30
Authors: Qiyang Nie, Tong Liu
DOI: 10.1016/j.jenvman.2025.128417

Research Groups

Graduate School of Environmental Science, Hokkaido University, Japan
Faculty of Environmental Earth Science, Hokkaido University, Japan

Short Summary

This study introduces and evaluates two frameworks for integrating Large Language Models (LLMs) into environmental modeling workflows: "Copilot" (human-AI collaboration) and "Autopilot" (LLM-driven automation). Both are tested on parameter calibration and real-time correction of the Rainfall–Runoff–Inundation (RRI) model in the Kuzuryu River basin. Copilot performs well on human-supervised tasks, while Autopilot struggles with data-intensive, long-sequence tasks, which the study attributes to attention decay.

Objective

To introduce and evaluate two Large Language Model (LLM) integration frameworks (Copilot and Autopilot) for environmental modeling, specifically for parameter calibration and real-time correction, using the Rainfall–Runoff–Inundation (RRI) model.

Study Configuration

Spatial Scale: Kuzuryu River basin, Japan
Temporal Scale: Not stated explicitly. The tasks studied (parameter calibration and real-time correction) imply historical periods for calibration and continuous or near-continuous processing for real-time applications.

Methodology and Data

Models used:
- Rainfall–Runoff–Inundation (RRI) model
- Large Language Models (LLMs)
- Copilot framework (human-AI collaborative LLM integration)
- Autopilot framework (LLM-driven automation)
Data sources: Not listed explicitly. Calibration and real-time correction imply observational rainfall, runoff, and inundation data.

Main Results

The Copilot framework demonstrated robust performance, effectively using prompt engineering for algorithm comprehension and code generation.
Copilot achieved strong performance in parameter calibration (Nash-Sutcliffe Efficiency of 0.91 for calibration and 0.81 for validation) and delivered stable real-time correction.
The Autopilot framework showed competence in physics-constrained parameter calibration but failed in long-sequence, data-intensive real-time correction tasks due to "attention decay."
LLMs are currently most effective as knowledge engines and coding assistants within human-supervised workflows (Copilot), while full automation (Autopilot) is limited by context window size and weaknesses in processing long numerical sequences.
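
The Nash-Sutcliffe Efficiency (NSE) quoted above is a standard goodness-of-fit score: one minus the ratio of the squared simulation error to the variance of the observations, so 1.0 is a perfect fit and 0.0 is no better than predicting the observed mean. The following is a minimal sketch of that formula only; the function name and the sample discharge values are illustrative assumptions, not data or code from the study.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Return NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")

    mean_obs = sum(observed) / len(observed)
    # Squared error between the simulation and the observations.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Spread of the observations around their own mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - residual / variance


# Illustrative values only (hypothetical discharge series).
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 5.0]
score = nash_sutcliffe_efficiency(obs, sim)
```

Under this definition, the reported 0.91 (calibration) and 0.81 (validation) mean the squared simulation error was roughly 9% and 19% of the observed variance, respectively.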

Contributions

Introduction of two novel, reproducible frameworks (Copilot and Autopilot) for integrating Large Language Models into environmental modeling workflows.
Empirical evaluation of LLM capabilities and constraints in practical environmental modeling tasks (parameter calibration and real-time correction).
Identification of key design principles (task decomposition, physics constraints, oversight checkpoints) for generalizable LLM deployment in environmental modeling.
Highlighting the strengths of LLMs in human-supervised roles and their current limitations in fully automated, data-intensive environmental modeling tasks.
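
Two of the design principles named above, physics constraints and oversight checkpoints, can be illustrated with a small sketch. Everything here is a hypothetical assumption made for illustration: the parameter names, the bounds, the NSE threshold, and the function names do not come from the paper or from the RRI model, and the study's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Bound:
    """Physically plausible range for one model parameter (hypothetical)."""
    low: float
    high: float


def check_physics_constraints(proposal: Dict[str, float],
                              bounds: Dict[str, Bound]) -> List[str]:
    """Return a list of violations for an LLM-proposed parameter set."""
    violations = []
    for name, bound in bounds.items():
        if name not in proposal:
            violations.append(f"{name}: missing")
        elif not (bound.low <= proposal[name] <= bound.high):
            violations.append(
                f"{name}: {proposal[name]} outside [{bound.low}, {bound.high}]"
            )
    return violations


def needs_human_review(violations: List[str], nse: float,
                       nse_threshold: float = 0.5) -> bool:
    """Oversight checkpoint: escalate on any violation or a poor fit."""
    return bool(violations) or nse < nse_threshold


# Hypothetical parameter names and ranges, for illustration only.
bounds = {"roughness": Bound(0.01, 1.0), "soil_depth_m": Bound(0.1, 5.0)}
proposal = {"roughness": 0.03, "soil_depth_m": 7.5}
violations = check_physics_constraints(proposal, bounds)
escalate = needs_human_review(violations, nse=0.9)
```

In this sketch the out-of-range soil depth triggers escalation even though the fit score is high, which is the point of pairing a physics check with a review checkpoint rather than trusting the fit metric alone.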

Funding

Not specified.

Citation

@article{Nie2025Large,
  author = {Nie, Qiyang and Liu, Tong},
  title = {Large language models for environmental modeling: Framework, capabilities, constraints},
  journal = {Journal of Environmental Management},
  year = {2025},
  doi = {10.1016/j.jenvman.2025.128417},
  url = {https://doi.org/10.1016/j.jenvman.2025.128417}
}

Original Source: https://doi.org/10.1016/j.jenvman.2025.128417