Are you an MSc student with a solid grounding in AI, machine learning, neural networks, and/or deep learning? Are you interested in applying your knowledge and expertise to solve real-life problems in science? Are you willing to step outside your comfort zone, learn about molecular absorption spectroscopy and its applications, and take on the data analysis challenges in this field? Then we have a topic for your MSc thesis/internship. Bear with us and read what it is all about:
The recent development of ultra-broadband spectroscopy systems opens up great potential for sensitive, simultaneous trace detection of a very long list of molecular species. This is particularly interesting for applications dealing with complex matrices, such as breath analysis and plasma diagnostics. Traditionally, the classical least squares (CLS) fitting method is used in spectral analysis to disentangle the overlapping spectral patterns of different species and extract their concentrations accurately. In this method, model spectra of all species in the sample matrix are calculated from existing databases and then fitted together to the measured spectrum to retrieve their concentrations (a linear multiline fitting scheme). However, CLS is vulnerable to effects that the model does not capture, such as power drift of the laser, strong absorption features, spectral artifacts due to optical components, and absorption features that are left unfitted. These can seriously affect the accuracy of CLS.
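To make the CLS idea concrete, here is a minimal sketch of a linear multi-species fit. It is an illustration only: the Gaussian "lines", the axis `nu`, and the concentrations are made-up stand-ins for database-derived model spectra, not our actual analysis code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical spectral axis (arbitrary units).
nu = np.linspace(0.0, 10.0, 500)

def gaussian(center, width):
    """Toy absorption line; a real fit would use database line shapes."""
    return np.exp(-0.5 * ((nu - center) / width) ** 2)

# Model spectra at unit concentration, one column per species.
model_a = gaussian(4.0, 0.5)   # species A
model_b = gaussian(5.0, 0.8)   # species B, overlapping with A
K = np.column_stack([model_a, model_b])

# Synthetic "measurement": linear mixture plus white noise.
true_conc = np.array([2.0, 0.5])
measured = K @ true_conc + rng.normal(scale=0.01, size=nu.size)

# CLS: fit all species at once by minimising ||K c - measured||_2.
conc, *_ = np.linalg.lstsq(K, measured, rcond=None)
print(conc)
```

In this idealised setting the fit recovers the concentrations well. Adding a slowly varying baseline or an unmodelled line to `measured` is a quick way to see how such effects bias the result.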
As an alternative, we have recently used a partial least squares (PLS) regression method with a novel hybrid dataset approach. PLS is a purely statistical model and relies on calibration measurements as training data. However, constructing a training dataset from real measurements is far too time-consuming and costly, as it would require many different calibrated gas mixtures measured with high precision and accuracy. Our approach is to create a simulated dataset tailored to a specific instrument by combining simulated absorbance spectra with measured blank (featureless background) intensity spectra. The simulations provide the absorption spectra, while the blank measurements provide the realistic, instrument-specific features of the spectrometer, such as noise patterns and drifts. Combining the two yields an affordable and scalable process. We have achieved encouraging results using this approach.
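The hybrid dataset idea described above could be sketched roughly as follows. This is one plausible way to combine the two ingredients, assuming the Beer-Lambert relation I = I0 * exp(-A) and a second blank as reference; the exact recipe used in our work may differ. The blanks and unit-concentration absorbances are synthetic stand-ins, and `make_hybrid_dataset` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points, n_blanks, n_samples = 400, 20, 200
nu = np.linspace(0.0, 10.0, n_points)

# Stand-in for MEASURED blank intensity spectra (in practice loaded from the
# instrument): smooth envelope, random power drift, and detector noise.
envelope = 1.0 + 0.2 * np.sin(nu / 3.0)
drift = 1.0 + 0.05 * rng.standard_normal((n_blanks, 1))
blanks = envelope * drift + 0.002 * rng.standard_normal((n_blanks, n_points))

# Stand-in for SIMULATED absorbance at unit concentration (two toy species).
unit_abs = np.stack([
    0.3 * np.exp(-0.5 * ((nu - 4.0) / 0.3) ** 2),
    0.2 * np.exp(-0.5 * ((nu - 6.0) / 0.5) ** 2),
])

def make_hybrid_dataset(blanks, unit_abs, n_samples, rng):
    """Return (spectra, concentrations) for training a regression model."""
    conc = rng.uniform(0.0, 1.0, size=(n_samples, unit_abs.shape[0]))
    absorbance = conc @ unit_abs  # ideal, noise-free absorbance
    # Two different blanks per sample: one is attenuated, one is the reference.
    idx = np.array([rng.choice(len(blanks), size=2, replace=False)
                    for _ in range(n_samples)])
    sample_intensity = blanks[idx[:, 0]] * np.exp(-absorbance)
    reference = blanks[idx[:, 1]]
    # Apparent absorbance now carries instrument noise and drift.
    spectra = -np.log(sample_intensity / reference)
    return spectra, conc

X, y = make_hybrid_dataset(blanks, unit_abs, n_samples, rng)
print(X.shape, y.shape)
```

The resulting `X` and `y` can be passed to any regression model, which is what makes the workflow independent of the choice between PLS and other learners.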
This workflow is not specific to PLS and can also be applied to other models, such as machine learning with neural networks (deep learning). We would therefore like to investigate this opportunity further. In particular, we know that PLS can be sensitive to outliers in real-world data, so investigating alternatives such as the LASSO (least absolute shrinkage and selection operator), which is reported to be less sensitive to outliers than least-squares methods, could be a promising approach. Alternatively, we could explicitly model the (noisy) spectrometer measurement features and include them as a separate penalty term in the overall metric. Your task will be to establish whether either of these approaches can alleviate some of the drawbacks of existing methods. Of course, you could also come up with an idea of your own!
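For readers unfamiliar with the LASSO, the sketch below shows it as a least-squares loss with an added L1 penalty, solved here with a simple proximal-gradient (ISTA) loop. The data are synthetic, the function names and the value of `alpha` are illustrative assumptions, and a thesis project would more likely rely on an established library implementation.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, alpha, n_iter=2000):
    """Minimise (1/(2n)) * ||y - X w||^2 + alpha * ||w||_1 with ISTA."""
    n, p = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's
    # Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * alpha)
    return w

# Synthetic regression problem: only 2 of 50 "spectral channels" matter.
rng = np.random.default_rng(2)
n, p = 200, 50
X = rng.standard_normal((n, p))
true_w = np.zeros(p)
true_w[[3, 17]] = [1.5, -2.0]
y = X @ true_w + 0.05 * rng.standard_normal(n)

w = lasso_ista(X, y, alpha=0.1)
selected = np.flatnonzero(w)   # indices of channels kept by the model
print(selected, w[selected])
```

The L1 penalty drives the coefficients of uninformative channels to exactly zero, at the cost of slightly shrinking the informative ones. Whether this sparsity also helps with outliers in real spectra is precisely the kind of question the project would investigate.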
Your research will be conducted in the Life Science Trace Detection Laboratory (TDLab), part of the Institute for Molecules and Materials (IMM) of the Faculty of Science at Radboud University, in close collaboration with the Data Science (DaS) group at the Institute for Computing and Information Sciences (iCIS). We have a vibrant and enthusiastic group of young researchers working at the crossroads of physics, chemistry, and biology. The successful candidate will receive basic training in molecular absorption spectroscopy, covering the minimum physics needed, and will become familiar with the available molecular absorption databases. They will work closely with a visiting PhD candidate in our group (who already has experience in applying deep learning to spectroscopy [1]) and receive full support from other members of TDLab. The final goal is to lay the foundations for using deep learning to tackle this problem.
Interested? Need more information? Please contact Simona Cristescu of the Trace Detection Laboratory (TDLab) and/or Tom Claassen of Data Science. We will be happy to talk to you!
[1] Huang C., et al., “Deep-learning-enabled high-fidelity absorbance spectra from distorted dual-comb absorption spectroscopy for gas quantification analysis,” Applied Spectroscopy 78(3), 310-320 (2024), doi:10.1177/00037028231226341