AI to the (climatological data) Rescue – data rescue to uncover the past
Knowledge about past climate and extreme weather events is essential for understanding future climate change and variability. Digitized information about past weather is much more prevalent in developed nations than in less-developed parts of the world, where many observations survive only in paper archives. These paper archives are the only record of historical climate variability, and many are stored in vulnerable environments. Today, archives are digitized either by experienced typists at high cost or through citizen science projects, which can take considerable time. Recent advances in AI/ML have led the global data rescue community to ask whether this process can be automated.
Historical meteorological data are typically recorded as handwritten digits. Weather parameters (e.g. temperature, precipitation) are diligently noted a few times per day in dense, structured formats (Figure 1). There are two ways in which AI/ML can expedite the digitization of these records: 1. tabular structure recognition (TSR) methods can help extract the locations of cells and headings, and 2. handwritten text recognition methods can digitize the text.
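To make the first step concrete, the sketch below shows a simple classical baseline for locating table ruling lines, not one of the AI/ML methods this project targets and not part of Dawsonia. It assumes the photo has already been binarized into a nested list of 0/1 pixels (1 = ink); the function name find_ruling_lines and the min_fill threshold are hypothetical choices made for illustration only.

```python
from typing import List


def find_ruling_lines(binary: List[List[int]], axis: str = "rows",
                      min_fill: float = 0.8) -> List[int]:
    """Return the indices of rows (or columns) that look like ruling lines.

    `binary` is a hypothetical pre-binarized page: 1 = ink, 0 = paper.
    A row/column counts as a ruling line when at least `min_fill` of its
    pixels are ink. Adjacent hits are merged into one line (thick rulings).
    """
    if not binary or not binary[0]:
        return []
    # Transpose the image when scanning columns instead of rows.
    lines = binary if axis == "rows" else [list(col) for col in zip(*binary)]
    hits = [i for i, line in enumerate(lines)
            if sum(line) / len(line) >= min_fill]

    merged: List[int] = []
    run: List[int] = []
    for i in hits:
        if run and i != run[-1] + 1:
            merged.append(run[len(run) // 2])  # middle index of the finished run
            run = []
        run.append(i)
    if run:
        merged.append(run[len(run) // 2])
    return merged


# Toy 4x6 "page" with rulings on rows 0 and 3 and columns 0, 3 and 5.
page = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
print(find_ruling_lines(page, axis="rows"))     # [0, 3]
print(find_ruling_lines(page, axis="columns"))  # [0, 3, 5]
```

A fixed threshold like this tends to break down on skewed photos, faded rulings, or digits that overlap lines, which is part of the motivation for learned TSR methods.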
Drawing upon the experience of the Swedish Meteorological and Hydrological Institute, which developed the Python code Dawsonia, and a two-week hackathon at KNMI, this project will focus on improving the Tabular Structure Recognition (TSR) in photos of historical records using AI/ML methods. Challenges arise from the state of the historical paper, image quality, digits overlapping lines or one another, and the need for high-precision results.
Singh and Middleton (under review) suggest that semi-supervised techniques could facilitate rapid digitization of tabular data from historical sources. Manual steps, such as selecting a table outline or placing a grid overlay, might be included. The student will assess different standard methods, such as the CascadeTabNet model, for use on this particular data format. Additionally, advanced automated image corrections will be considered.
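As an illustration of the manual grid-overlay idea, the sketch below divides a user-selected table outline into equally sized cells. It is a minimal example under strong assumptions: the outline is an axis-aligned rectangle in pixel coordinates, rows and columns are evenly spaced, and the name grid_cells and the Box type are hypothetical. Real forms would likely need per-column widths and skew correction.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixel coordinates; a hypothetical convention.
Box = Tuple[float, float, float, float]


def grid_cells(outline: Box, n_rows: int, n_cols: int) -> List[List[Box]]:
    """Split a table outline into an n_rows x n_cols grid of cell boxes.

    Assumes evenly spaced rows and columns, which is a simplification.
    """
    if n_rows < 1 or n_cols < 1:
        raise ValueError("n_rows and n_cols must be at least 1")
    left, top, right, bottom = outline
    cell_w = (right - left) / n_cols
    cell_h = (bottom - top) / n_rows
    return [
        [
            (left + c * cell_w, top + r * cell_h,
             left + (c + 1) * cell_w, top + (r + 1) * cell_h)
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# A 100 x 50 pixel outline split into 2 rows and 4 columns.
cells = grid_cells((0, 0, 100, 50), n_rows=2, n_cols=4)
print(cells[0][0])  # (0.0, 0.0, 25.0, 25.0)
print(cells[1][3])  # (75.0, 25.0, 100.0, 50.0)
```

Each cell box could then be cropped from the photo and passed to a handwritten text recognition step.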
About KNMI
The Koninklijk Nederlands Meteorologisch Instituut (KNMI), in English the Royal Netherlands Meteorological Institute, is the Dutch national weather forecasting service. Its headquarters are in De Bilt, in the province of Utrecht, central Netherlands. The primary tasks of KNMI are weather forecasting, monitoring climate change, and monitoring seismic activity. This project will be carried out with the Climate Services team in the department ‘R&D Observations and Data Technology’. KNMI will provide the student with meteorological data forms: forms that have already been digitized, for use as labelled or training data, and new forms for use as test sets.
Requirements: regular in-person visits to KNMI, De Bilt (at least once a week).
Contact: Yuliya Shapovalova (RU), Kirien Whan and Marlies van der Schee (KNMI).
References
Singh, L. G., & Middleton, S. E. (Under review). Data Rescue for Historical Document Tables Using Semi-Supervised Learning. International Journal on Document Analysis and Recognition (IJDAR). doi.org/10.21203/rs.3.rs-4391424/v1
Figure 1. Meteorological observations from St Eustatius, Caribbean Netherlands, from January 1911
