Barahona et al. (2026) Deep learning representation of the aerosol size distribution

Identification

Journal: Geoscientific Model Development
Year: 2026
Date: 2026-03-26
Authors: Donifan Barahona, Katherine H. Breen, Karoline Block, Anton Darmenov
DOI: 10.5194/gmd-19-2437-2026

Research Groups

NASA, Goddard Space Flight Center, Greenbelt, MD, USA
Morgan State University, Baltimore, MD, USA
Leipzig Institute for Meteorology, Faculty of Physics and Earth Sciences, University of Leipzig, Leipzig, Germany

Short Summary

This study develops MAMnet, a deep learning model, to predict the aerosol size distribution (ASD) and mixing state for seven lognormal modes based on bulk aerosol mass and meteorological conditions. MAMnet accurately reproduces the output of a two-moment modal aerosol scheme and shows good agreement with field measurements when driven by reanalysis data, offering an efficient way to improve aerosol representation in atmospheric models.

Objective

To develop a neural network model (MAMnet) that predicts the aerosol size distribution (ASD) and mixing state for seven lognormal modes using bulk aerosol mass and meteorological state as inputs.
To enable computationally efficient, single-moment, mass-based aerosol schemes to represent detailed aerosol microphysics, thereby improving aerosol representation in atmospheric models for large-scale applications like weather forecasting, remote sensing, and data assimilation.

Study Configuration

Spatial Scale: Global (GEOS+MAM7 simulations at 1° horizontal resolution, 72 vertical levels from the surface to 0.01 hPa); regional (24 European ground sites for observations); global (CCN datasets). The parameterization is designed to be resolution-independent.
Temporal Scale: GEOS+MAM7 simulations for training/testing covered 5 years (2001–2006) with instantaneous outputs at 12-hour intervals. Observational data for evaluation spanned 2 years (2008–2009) with hourly measurements. Global CCN datasets were averaged over 2006–2021.

Methodology and Data

Models used:
- MAMnet: A Multilayer Perceptron (MLP) neural network developed in this study to predict aerosol number concentration and composition for 7 lognormal modes.
- MAM7 (Modal Aerosol Module): A two-moment modal aerosol scheme implemented in GEOS, used to generate the training data and for comparison.
- GOCART (Goddard Chemistry, Aerosol, Radiation, and Transport model): A mass-based aerosol model, representing the type of scheme MAMnet is designed to enhance.
- GEOS (NASA Goddard Earth Observing System): The Earth system model framework hosting MAM7 and GOCART.
Data sources:
- Training/Validation/Test Data: Simulated data from GEOS+MAM7 (5-year simulation, 1° horizontal resolution, 72 vertical levels, 12-hour output). Inputs to MAMnet were the total mass mixing ratios of dust, sulfate, organics, black carbon, and sea salt, together with air temperature (K) and air density (kg m⁻³).
- Reanalysis Data (for evaluation):
  - MERRA-2 (Modern Era Retrospective analysis for Research and Applications, version 2) for aerosol mass, temperature, and air density.
  - CAMS (Copernicus Atmosphere Monitoring Service) reanalysis for CCN comparison.
  - GiOcean (coupled atmosphere–ocean–aerosol reanalysis) for CCN comparison.
- Observational Data (for evaluation):
  - Near-surface cumulative aerosol number concentrations (30 to 500 nm) from 24 sites across Western Europe (EUSAAR and GUAN networks, 2008–2009).
  - In-situ cloud condensation nuclei (CCN) measurements from the Global Aerosol Synthesis and Science Project (GASSP) (37 field campaigns, >1000 aircraft flights).
  - CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization) measurements for CCN comparison.
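
The input-to-output mapping listed above (five bulk mass mixing ratios plus temperature and air density in, one set of values per lognormal mode out) can be illustrated with a toy MLP forward pass. This is a minimal sketch, not the MAMnet architecture: the layer width, depth, tanh activation, input scaling, random untrained weights, and the choice of a single output per mode are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random

N_INPUTS = 7   # 5 bulk mass mixing ratios + temperature + air density (from the summary)
N_MODES = 7    # lognormal modes in MAM7 (from the summary)
HIDDEN = 32    # hypothetical layer width, not taken from the paper


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained placeholder)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Apply one dense layer: activation(W x + b)."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def build_model(seed=0):
    """Two hidden layers and a linear output layer (assumed layout)."""
    rng = random.Random(seed)
    return [
        init_layer(N_INPUTS, HIDDEN, rng),
        init_layer(HIDDEN, HIDDEN, rng),
        init_layer(HIDDEN, N_MODES, rng),
    ]


def predict_per_mode(model, mass_mixing_ratios, temperature_k, air_density):
    """Return one illustrative output value per mode."""
    # Assumed preprocessing: log-scale the masses, crude scaling of temperature.
    x = [math.log10(max(m, 1e-20)) for m in mass_mixing_ratios]
    x += [temperature_k / 300.0, air_density]
    h = dense(x, model[0], math.tanh)
    h = dense(h, model[1], math.tanh)
    return dense(h, model[2])


model = build_model()
# Assumed input order: dust, sulfate, organics, black carbon, sea salt (kg/kg).
out = predict_per_mode(model, [1e-9, 2e-9, 1e-9, 1e-10, 5e-9], 288.0, 1.2)
```

Because the weights are random, the numbers in `out` carry no physical meaning; the sketch only shows the shape of the mapping (7 inputs to 7 per-mode outputs).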

Main Results

MAMnet accurately reproduces MERRA-2 aerosol concentrations when driven by MERRA-2 inputs, demonstrating mass conservation and successful generalization beyond the GEOS+MAM7 training data.
Comparison with GEOS+MAM7 simulations shows high spatial correlations (Pearson's R > 0.9) and low mean log-bias (MLB) typically within ±0.1 for most modal number concentrations and mass variables across all pressure levels.
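
The two skill scores quoted above can be computed as in the following sketch. Pearson's R is standard; the mean log-bias is assumed here to be the mean of log10(predicted / reference), which is one common definition and may differ from the exact formula used in the paper. The sample values are made up.

```python
import math


def mean_log_bias(predicted, reference):
    """Mean of log10(predicted / reference); 0 means no multiplicative bias."""
    logs = [math.log10(p / r) for p, r in zip(predicted, reference)]
    return sum(logs) / len(logs)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up number concentrations; the prediction is uniformly 10 % too high.
reference = [100.0, 250.0, 800.0, 1500.0]
predicted = [1.1 * v for v in reference]
mlb = mean_log_bias(predicted, reference)   # equals log10(1.1)
r = pearson_r(predicted, reference)         # perfectly correlated
```

Under this assumed base-10 definition, an MLB of ±0.1 corresponds to a multiplicative bias of about a factor of 1.26 in either direction.
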
Performance degrades slightly at very high altitudes (pressure < 100 hPa) and near the surface (pressure > 900 hPa), particularly for the Aitken and coarse dust modes, which show localized under- or overprediction.
MAMnet successfully learns the physically consistent relationship between aerosol mass and number, accurately reproducing the geometric mean diameter (Dpg) with MLB typically below 0.01 and R > 0.9, despite Dpg not being an explicit target.
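
The link between mass, number, and Dpg follows from the third moment of a lognormal distribution: the mode volume is (π/6) N Dpg³ exp(4.5 ln² σg). The sketch below inverts that relation. It assumes a single mode with a prescribed geometric standard deviation and a single particle density; the numerical values are hypothetical and not taken from the paper.

```python
import math


def geometric_mean_diameter(mass_conc, number_conc, particle_density, sigma_g):
    """Geometric mean diameter (m) of one lognormal mode.

    mass_conc: aerosol mass per volume of air (kg m-3)
    number_conc: particle number per volume of air (m-3)
    particle_density: density of the particle material (kg m-3)
    sigma_g: geometric standard deviation of the mode (dimensionless)
    """
    volume_per_particle = mass_conc / (particle_density * number_conc)
    third_moment_factor = math.exp(4.5 * math.log(sigma_g) ** 2)
    return (6.0 * volume_per_particle / (math.pi * third_moment_factor)) ** (1.0 / 3.0)


# Round-trip check with hypothetical values.
sigma_g = 1.8        # hypothetical mode width
density = 1770.0     # hypothetical sulfate-like density, kg m-3
true_dpg = 1.0e-7    # 100 nm
number = 1.0e9       # m-3
mass = (density * math.pi / 6.0 * number * true_dpg ** 3
        * math.exp(4.5 * math.log(sigma_g) ** 2))
recovered = geometric_mean_diameter(mass, number, density, sigma_g)
```

Since Dpg depends only on the ratio of mass to number for fixed σg and density, a model that predicts both quantities consistently will also get Dpg right, which is why it can serve as an indirect check.
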
Explainable machine learning analysis (Shapley values) identifies sulfate, sea salt, dust, temperature, and air density as dominant input features, highlighting their non-linear influence on aerosol number and mass predictions.
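
To make the attribution idea concrete, the sketch below computes exact Shapley values by enumerating all feature coalitions, with features outside a coalition replaced by a baseline value. This is an illustration of the concept only: the summary does not say which estimator or library the authors used, the baseline-replacement convention is an assumption, and the toy model is invented. Exact enumeration is feasible for seven inputs (2⁷ coalitions) but scales poorly beyond that.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features not in a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Invented non-linear model: one additive term plus one interaction."""
    return 2.0 * features[0] + features[1] * features[2]


phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

In the toy case the additive term is attributed entirely to the first feature, the interaction term is split equally between the other two, and the values sum to the difference between the prediction and the baseline prediction.
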
When driven by MERRA-2, MAMnet reasonably reproduces observed cumulative aerosol size distributions at various European ground sites, although it tends to predict slightly lower median values for larger particle sizes and underestimates observed variability.
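
Comparing modal output with size-resolved observations requires integrating each lognormal mode over the measured diameter range. The sketch below does this with the lognormal cumulative distribution function. The mode parameters are hypothetical placeholders, not MAMnet output, and the function name is invented for illustration.

```python
import math


def number_in_range(modes, d_low, d_high):
    """Number concentration of particles with diameter between d_low and d_high.

    modes: iterable of (number_conc, geometric_mean_diameter, sigma_g) tuples;
    diameters must share one unit, and the result has the unit of number_conc.
    """
    def cdf(diameter, dpg, sigma_g):
        z = math.log(diameter / dpg) / (math.sqrt(2.0) * math.log(sigma_g))
        return 0.5 * (1.0 + math.erf(z))

    return sum(n * (cdf(d_high, dpg, sg) - cdf(d_low, dpg, sg)) for n, dpg, sg in modes)


# Hypothetical three-mode distribution: (cm-3, m, dimensionless).
example_modes = [
    (1000.0, 30e-9, 1.6),   # Aitken-like mode
    (500.0, 150e-9, 1.8),   # accumulation-like mode
    (1.0, 2e-6, 1.8),       # coarse-like mode
]
n_30_500 = number_in_range(example_modes, 30e-9, 500e-9)
```

A mode whose geometric mean diameter sits exactly at the lower bound contributes at most half of its number, which illustrates how sensitive such comparisons are to the predicted Dpg of the smallest modes.
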
MAMnet-derived CCN number concentrations (NCCN) from MAMnet-MERRA2 fall within the range of other global datasets but are consistently the lowest globally, particularly over oceanic regions. They also decrease monotonically with altitude, unlike the peaked vertical profiles seen in GiOcean, CALIOP, and in-situ data.

Contributions

Introduces MAMnet, a novel deep learning model that provides an efficient and accurate parameterization for predicting aerosol size distribution and mixing state from bulk aerosol mass and meteorological data.
Bridges the gap between computationally inexpensive bulk aerosol schemes and more physically comprehensive modal schemes, enabling improved aerosol representation in large-scale atmospheric models without significant computational overhead.
Demonstrates the ability of a neural network to learn complex, physically consistent relationships (e.g., mass-to-number conversion and mass conservation) implicitly from simulated data.
Offers a versatile and resolution-independent tool to enhance aerosol microphysics in applications such as weather forecasting, remote sensing, and data assimilation, where detailed aerosol information is crucial but often limited.

Funding

National Aeronautics and Space Administration (NASA) Modeling, Analysis and Prediction program, Grant NNH20ZDA001N-MAP.

Citation

@article{Barahona2026Deep,
  author = {Barahona, Donifan and Breen, Katherine H. and Block, Karoline and Darmenov, Anton},
  title = {Deep learning representation of the aerosol size distribution},
  journal = {Geoscientific Model Development},
  year = {2026},
  doi = {10.5194/gmd-19-2437-2026},
  url = {https://doi.org/10.5194/gmd-19-2437-2026}
}

Original Source: https://doi.org/10.5194/gmd-19-2437-2026