Lucia et al. (2025) Harnessing social sensing for real-time flood event reconstruction: A digital autopsy of the 2024 Valencia DANA
Identification
- Journal: International Journal of Disaster Risk Reduction
- Year: 2025
- Date: 2025-12-24
- Authors: José Giner Pérez de Lucia, Adrián López-Ballesteros, Julio Fernández-Pedauyé, Javier Senent‐Aparicio, José María Cecilia
- DOI: 10.1016/j.ijdrr.2025.105966
Research Groups
- Sensing Tools, S.L.
- Computer Engineering Department, Universitat Politècnica de València
- Centro de Investigaciones sobre Desertificación (CIDE), CSIC-UV-GVA
Short Summary
This study reconstructs the 2024 Valencia floods by integrating over 156,000 geolocated social media messages with hydrological and hydraulic models, demonstrating the feasibility of combining human-sensed information with physically based models to enhance real-time situational awareness and disaster risk reduction.
Objective
- How can citizen-generated data be systematically integrated with physically based flood models to improve reconstruction and monitoring?
- What are the temporal and spatial relationships between social media activity and modeled flood dynamics?
- How can Retrieval-Augmented Generation enhance the transformation of unstructured crisis information into actionable insights?
Study Configuration
- Spatial Scale: Province of Valencia, Spain, focusing on the Rambla del Poyo watershed and highly impacted urban areas including Paiporta, Catarroja, Massanassa, Sedaví, Alfafar, Picanya, Aldaia, Bonaire, Utiel, and Chiva. The hydraulic model covered a 126 km² 2D flow area, encompassing the last 30 km of the Rambla del Poyo.
- Temporal Scale: The catastrophic flood event occurred on October 29, 2024. Social media data were collected from 10:00 on October 29 to 14:00 on November 19, 2024. Hydrological and hydraulic simulations covered the period from October 29, 2024, 00:00 to October 31, 2024, 00:00.
Methodology and Data
- Models used:
  - Hydrological Model: Hydrologic Engineering Center’s Hydrologic Modeling System (HEC-HMS)
  - Hydraulic Model: Hydrologic Engineering Center’s River Analysis System (HEC-RAS), 2D approach with the Diffusion Wave Equations (DWE)
  - Social Media Processing: Sensing Tools analytics framework
  - Natural Language Processing (NLP):
    - Sentiment Classification: multilingual Twitter-XLM-RoBERTa-base sentiment model
    - Emotion Classification: RoBERTuito model
    - Named Entity Recognition (NER): xlm-roberta-large-ner-spanish model
    - Topic Modeling: BERTopic (using all-MiniLM-L12-v2 for embeddings, UMAP for dimensionality reduction, HDBSCAN for clustering, c-TF-IDF for keyword extraction)
  - Retrieval-Augmented Generation (RAG) System:
    - Embedding: OpenAI’s text-embedding-ada-002 model
    - Vector Database: FAISS-based
    - Retrieval: Maximum Marginal Relevance (MMR) with a multilingual cross-encoder re-ranking layer
    - Generation: Instruction-tuned GPT-5-mini model
- Data sources:
  - Social Media: Over 156,000 unique geolocated messages from X (formerly Twitter) via its Academic Research API.
  - Meteorological: Subdaily precipitation data from October 29, 2024, collected from three meteorological stations (AEMET and Júcar River Basin Agency - CHJ), including station 8337X (Túris) which reported 771.6 mm in 24 hours.
  - Topographic: 25-meter Digital Elevation Model (DEM) from Centro Nacional de Información Geográfica (CNIG) for HEC-HMS; high-resolution 2-meter DEM from CNIG for HEC-RAS.
  - Land Use/Land Cover (LULC): European CORINE Land Cover (CLC) 2018 map.
  - Hydrometric: Hourly flow data from a CHJ gauging station within the Rambla del Poyo watershed (for HEC-HMS calibration).
  - Satellite Imagery: Sentinel-2 satellite imagery (for HEC-RAS validation).
  - Gazetteer: GeoNames gazetteer (for spatial localization).
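The retrieval step listed above relies on Maximum Marginal Relevance (MMR), which trades off relevance to the query against redundancy among results already selected. The following is a minimal sketch of the generic MMR selection rule in plain Python, not the authors' implementation: the function names, the `lambda_mult` parameter, and the toy 2D vectors are assumptions made for illustration, and the summary does not state which weighting the study used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen greedily by MMR.

    score(d) = lambda_mult * sim(query, d)
               - (1 - lambda_mult) * max(sim(d, s) for s already selected)
    lambda_mult is a hypothetical tuning knob: 1.0 means pure relevance ranking.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy embeddings (illustrative only): two near-duplicate messages and one distinct one.
toy_query = [1.0, 0.1]
toy_docs = [[1.0, 0.0], [1.0, 0.05], [0.5, 0.5]]
print(mmr_select(toy_query, toy_docs, k=2, lambda_mult=0.5))
```

With these toy vectors, a balanced weighting skips the near-duplicate and selects the more distinct message second, whereas `lambda_mult=1.0` falls back to plain relevance ranking. In a pipeline like the one described, a cross-encoder would then re-rank the selected candidates; that stage is not sketched here.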
Main Results
- Social media activity showed a clear temporal alignment with flood progression, with online reporting peaking between 22:00 and 00:00 on October 29, coinciding with the Rambla del Poyo overflow and intensified national media coverage.
- Sentiment analysis revealed that 62.3% of the discourse was negative, with anger being the predominant emotion (71.1%), reflecting widespread concern, frustration over damages, and perceived institutional shortcomings.
- Topic modeling identified 17 discussion clusters, with dominant themes including flooding and extreme weather, specific affected localities, political accountability, and solidarity efforts.
- Hydrological (HEC-HMS) and hydraulic (HEC-RAS) models demonstrated satisfactory performance, with HEC-HMS achieving a Nash–Sutcliffe Efficiency (NSE) of 0.92 and HEC-RAS flood extent showing 87% agreement with satellite-derived inundation.
- The HEC-HMS model reconstructed a peak flow of approximately 2900 m³/s at 20:30 for the Rambla del Poyo, exceeding the 1938 m³/s recorded by the damaged gauging station.
- The RAG system generated structured summaries from unstructured social media data, providing actionable insights into public concerns (e.g., emergency assistance, financial aid, infrastructure), dissatisfaction with authorities, and the perceived scale of the economic impact (estimated at 1.789 billion euros, affecting 64.5% of businesses, over 25,000 hectares of crops, and 32% of regional GDP).
- Cross-validation confirmed a strong spatial alignment between geolocated tweet clusters reporting flooding in municipalities (e.g., Paiporta, Catarroja) and areas of highest modeled inundation depth in HEC-RAS simulations.
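For readers unfamiliar with the Nash–Sutcliffe Efficiency reported above for HEC-HMS, the sketch below computes it from its standard definition, NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))². The function name and the input lists are illustrative assumptions; the study's actual calibration series is not reproduced here.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Nash-Sutcliffe Efficiency for paired observed and simulated flows.

    Returns 1.0 for a perfect match and 0.0 when the simulation is no better
    than always predicting the observed mean; negative values are worse than that.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("observed and simulated must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Variance of the observations around their mean (unnormalised).
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when the observed series is constant")
    return 1.0 - residual / variance
```

An NSE of 0.92, as reported for the hydrological model, means the squared simulation error is 8% of the variance of the observed flows over the evaluation period.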
Contributions
- Systematically integrates citizen-generated social media data with physically based hydrological and hydraulic models to enhance flood event reconstruction and monitoring.
- Empirically evaluates the temporal and spatial correlations between online social media activity and modeled flood dynamics, demonstrating their strong alignment.
- Introduces and validates a Retrieval-Augmented Generation (RAG) system for transforming unstructured crisis information from social media into actionable, evidence-grounded insights for emergency management.
- Provides a comprehensive, hybrid data-driven and process-based framework for flood impact assessment and post-event situational awareness.
- Highlights methodological considerations for reliable social data use in emergency management, including noise reduction, sentiment/emotion analysis, named entity recognition, and topic modeling in multilingual contexts.
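One way to examine the temporal relationship between online activity and modeled flood dynamics is to bin message timestamps by hour and correlate the counts with the simulated hydrograph at several lags. The sketch below assumes this lagged Pearson approach; the summary does not specify the statistic the authors used, and all function names and input series here are hypothetical.

```python
import math
from collections import Counter


def hourly_counts(timestamps, hours):
    """Count messages per hour; `hours` is the ordered list of hour-start datetimes."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return [counts.get(h, 0) for h in hours]


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_lag(activity, flow, max_lag):
    """Correlate activity[t + lag] with flow[t] for lag = 0..max_lag.

    A positive best lag means message activity trails the modeled flow by that
    many time steps. Returns (best_lag, {lag: correlation}).
    """
    if len(activity) != len(flow):
        raise ValueError("series must have equal length")
    if max_lag > len(flow) - 2:
        raise ValueError("max_lag leaves fewer than two overlapping points")
    correlations = {}
    for lag in range(max_lag + 1):
        shifted_activity = activity[lag:]
        aligned_flow = flow[: len(flow) - lag]
        correlations[lag] = pearson(shifted_activity, aligned_flow)
    return max(correlations, key=correlations.get), correlations
```

Applied to hourly tweet counts and an HEC-HMS hydrograph on the same hourly grid, the lag with the highest correlation gives a rough estimate of how long reporting trailed the modeled flow.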
Funding
- Project TED2021-130890B, funded by MCIN/AEI/10.13039/501100011033
- AM-DS (INREED/2024/1), funded by the European Union - NextGenerationEU
- Spanish State Research Agency (Agencia Estatal de Investigación, AEI) through the Industrial PhD Grant (DIN2024-014116-2) for José Giner Pérez de Lucia
- Juan de la Cierva Postdoc Spanish Program (JDC2023-050965-I) for Adrián López-Ballesteros
Citation
@article{Lucia2025Harnessing,
author = {Lucia, José Giner Pérez de and López-Ballesteros, Adrián and Fernández-Pedauyé, Julio and Senent‐Aparicio, Javier and Cecilia, José María},
title = {Harnessing social sensing for real-time flood event reconstruction: A digital autopsy of the 2024 Valencia DANA},
journal = {International Journal of Disaster Risk Reduction},
year = {2025},
doi = {10.1016/j.ijdrr.2025.105966},
url = {https://doi.org/10.1016/j.ijdrr.2025.105966}
}
Original Source: https://doi.org/10.1016/j.ijdrr.2025.105966