Building a Drought Impact Database for the Netherlands and Cross-Border Catchments
Overview:
Help us build a structured, high-quality historical database of drought impacts in the Netherlands by applying NLP to newspaper archives. You'll extract detailed information from Dutch news sources and from selected Belgian and German sources covering areas that influence the Rhine, Meuse, and Vecht river systems. The goal is to support long-term drought planning and to fill a critical gap in local-scale impact records that goes beyond what existing manual reporting systems capture.
Key challenges:
- Curate and pre-process archived articles from Dutch and regional BE/DE newspapers
- Use topic modelling and classification to filter drought-relevant content
- Extract structured information:
  - Date, location, sector, impact type, and response
  - Duration, spatial extent, severity, and monetized impacts (if available)
- Normalize results and build a searchable, geotagged database
- Optionally, support spatial/temporal analysis of historical drought patterns
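To make the extraction target concrete, the attributes listed above could be collected in a single record type. The following is only a minimal sketch: the class name, field names, units, and the example values are assumptions made for illustration, not a schema prescribed by the project.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class DroughtImpactRecord:
    """One extracted impact. All field names are hypothetical."""

    # Core attributes named in the project description
    article_date: date
    location: str
    sector: str
    impact_type: str
    response: Optional[str] = None
    # Attributes that may be absent from an article
    duration_days: Optional[int] = None
    spatial_extent: Optional[str] = None
    severity: Optional[str] = None
    monetized_impact_eur: Optional[float] = None
    # Coordinates for geotagging, filled in by a later geocoding step
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def to_row(record: DroughtImpactRecord) -> dict:
    """Flatten a record into a plain dict with an ISO 8601 date string."""
    row = asdict(record)
    row["article_date"] = record.article_date.isoformat()
    return row


# Invented example values, for illustration only
example = DroughtImpactRecord(
    article_date=date(2018, 8, 3),
    location="Enschede",
    sector="agriculture",
    impact_type="irrigation ban",
)
print(to_row(example))
```

Keeping the optional attributes as `None` rather than guessing a default makes it explicit which values an article did not report.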
You'll learn about:
- NLP pipelines for processing, filtering, and structuring large volumes of news text
- Topic modelling (e.g., BERTopic) to explore evolving themes and filter relevant articles
- Information extraction techniques (e.g., named entity recognition, pattern matching, date/number normalization) to identify key impact attributes
- Drought impact typologies and socio-hydrological indicators
- How to build a usable, extensible dataset from unstructured archives
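As a small illustration of the pattern matching and date/number normalization mentioned above, the sketch below pulls a Dutch-language date and a euro amount out of a sentence using regular expressions. It assumes Dutch number conventions (dot as thousands separator, comma as decimal separator) and covers only a few common phrasings; the function names and the example sentences are invented for illustration, and a real pipeline would need broader coverage.

```python
import re
from datetime import date
from typing import Optional

DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "augustus": 8,
    "september": 9, "oktober": 10, "november": 11, "december": 12,
}
MULTIPLIERS = {"duizend": 1e3, "miljoen": 1e6, "miljard": 1e9}

# Matches e.g. "3 augustus 2018"
DATE_RE = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(DUTCH_MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)

NUM = r"(\d[\d.]*(?:,\d+)?)"
MULT = r"(duizend|miljoen|miljard)"
# Matches "€ 2.500", "€ 1,5 miljoen", or "1,5 miljoen euro"
AMOUNT_RE = re.compile(
    rf"(?:€\s*{NUM}(?:\s*{MULT})?|{NUM}\s*(?:{MULT}\s+)?euro)",
    re.IGNORECASE,
)


def extract_date(text: str) -> Optional[date]:
    """Return the first Dutch-style date in the text, or None."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    day, month_name, year = m.groups()
    try:
        return date(int(year), DUTCH_MONTHS[month_name.lower()], int(day))
    except ValueError:  # e.g. "31 februari 2018"
        return None


def extract_amount_eur(text: str) -> Optional[float]:
    """Return the first euro amount in the text as a float, or None."""
    m = AMOUNT_RE.search(text)
    if m is None:
        return None
    number = m.group(1) or m.group(3)
    multiplier = m.group(2) or m.group(4)
    # Assumes Dutch formatting: drop thousands dots, turn decimal comma into a dot
    value = float(number.replace(".", "").replace(",", "."))
    if multiplier:
        value *= MULTIPLIERS[multiplier.lower()]
    return value


# Invented example sentences
print(extract_date("Op 3 augustus 2018 gold een beregeningsverbod."))
print(extract_amount_eur("De schade bedroeg 1,5 miljoen euro."))
```

Rule-based patterns like these are easy to inspect and test, which makes them a reasonable baseline before moving to learned extractors such as named entity recognition models.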
Ideal for: A student with an interest in natural language processing, climate or environmental data, and real-world applications of AI in water management. You’re excited by the challenge of turning messy historical data sources into a structured resource that can support national and regional drought policy and planning.
Contact: Hans Korving and Tom Heskes