
Clinical Problem General practitioners (GPs) work with probabilities of diagnoses to guide their diagnostic and therapeutic decisions. To a large extent, this is an implicit process, driven by prior knowledge and so-called pattern recognition. Little is known about how concrete and precise these probabilities are. Uncertainty may lead to an overestimation of the probability of a rare disease, and thus to overuse of diagnostic facilities, unnecessary costs, and ultimately patient harm. Underestimating probabilities may lead to late diagnoses and negative consequences. The process of coming to a diagnosis starts when the patient states their first complaint. This first utterance of a complaint during the consultation is called the reason for encounter (RFE; for example cough, back pain, headache). The RFE itself is related to probabilities of diagnoses. Every diagnosis starts with an RFE.

For common reasons to visit a doctor, age, sex and comorbidities influence probabilities of diagnoses. A deeper understanding of how diversity, context, multimorbidity and symptoms influence the probabilities of final diagnoses will help doctors to work more confidently and in a more evidence-based way. It will lead to the development of a diagnostic support tool for use in everyday practice. We aim to analyze how GPs can be supported in early diagnosis through AI. For the period 2010-2022, we aim to determine the predictive value of the patient’s RFE, medical history and contextual factors for the final diagnosis. More specifically, we want to study which data influence predictive values. This knowledge is crucial for building ICT tools to support the GP. The AI challenge is to use machine learning to calculate probabilities of diagnoses based on the reason for encounter, adjusted for other personal and context variables, using the data in our database.
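As a rough illustration of what "probabilities of diagnoses based on the reason for encounter, adjusted for context" could mean in its simplest form, the sketch below estimates relative frequencies from counts. It is not the project's method; the records, diagnosis names, age bands and the function name `diagnosis_probabilities` are all invented for this example and are not FaMeNet data.

```python
from collections import Counter

# Hypothetical toy records: (reason for encounter, sex, age band, final diagnosis).
# All values are invented for illustration; they are not FaMeNet data.
RECORDS = [
    ("cough", "F", "18-44", "acute bronchitis"),
    ("cough", "F", "18-44", "upper respiratory infection"),
    ("cough", "F", "18-44", "upper respiratory infection"),
    ("cough", "M", "65+", "pneumonia"),
    ("cough", "M", "65+", "acute bronchitis"),
    ("headache", "F", "18-44", "tension headache"),
]


def diagnosis_probabilities(records, rfe, sex=None, age_band=None):
    """Return the relative frequency of each final diagnosis among records
    that match the RFE and, if given, the sex and age band."""
    counts = Counter(
        dx
        for r, s, a, dx in records
        if r == rfe
        and (sex is None or s == sex)
        and (age_band is None or a == age_band)
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no matching records, so no estimate is possible
    return {dx: n / total for dx, n in counts.items()}


# Probabilities given the RFE alone, then given RFE plus sex and age band.
by_rfe = diagnosis_probabilities(RECORDS, "cough")
by_context = diagnosis_probabilities(RECORDS, "cough", sex="F", age_band="18-44")

if __name__ == "__main__":
    print(by_rfe)
    print(by_context)
```

A count-based table like this becomes sparse as soon as more context variables are added, which is one reason a machine learning model is proposed instead.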

Solution Students will be supervised by a group of GP-researchers and data managers from the Radboud Technology Center Health Data. The student will become familiar with the structure of the data warehouse. Depending on discussions with the group, a machine learning method will be chosen to analyse the data. The goal is to create prediction models for diagnoses in general practice. The deliverable will be used in a consultation tool for GPs in everyday practice. To be able to show a physician the probability of a diagnosis, the student will create features out of all relevant patient data from the dataset. Features should capture relevant episodes of care and contextual information. It is easy to incorporate some patient information, such as sex or age, into a predictive model, but using other relevant information is much more difficult. In addition, part of the information in the electronic health records is not medically relevant. The central question is: how can all the medical patient data be used to create a feature vector that is usable for predicting diagnoses?
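One simple baseline for the feature-vector question is to combine age and sex with a one-hot encoding of the RFE and a multi-hot encoding of earlier diagnoses. The sketch below shows that idea only; the `Patient` class, its field names and the example values are hypothetical, and a real representation would need to handle the actual coding system and filter out information that is not medically relevant.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical, simplified patient record for illustration only.
    age: int
    sex: str  # "F" or "M" in this toy example
    rfe: str
    prior_diagnoses: list = field(default_factory=list)


def build_vocabulary(patients):
    """Collect the sorted sets of RFEs and prior diagnoses seen in the data."""
    rfes = sorted({p.rfe for p in patients})
    history = sorted({d for p in patients for d in p.prior_diagnoses})
    return rfes, history


def to_feature_vector(patient, rfes, history):
    """Concatenate scaled age, a sex flag, a one-hot RFE and a multi-hot history."""
    age_feature = [patient.age / 100.0]  # crude scaling, an arbitrary choice
    sex_feature = [1.0 if patient.sex == "F" else 0.0]
    rfe_onehot = [1.0 if patient.rfe == r else 0.0 for r in rfes]
    history_multihot = [1.0 if h in patient.prior_diagnoses else 0.0 for h in history]
    return age_feature + sex_feature + rfe_onehot + history_multihot


patients = [
    Patient(50, "F", "cough", ["asthma"]),
    Patient(30, "M", "headache", []),
    Patient(70, "M", "cough", ["asthma", "diabetes"]),
]
rfes, history = build_vocabulary(patients)
vectors = [to_feature_vector(p, rfes, history) for p in patients]

if __name__ == "__main__":
    for v in vectors:
        print(v)
```

A multi-hot history discards the order and timing of earlier episodes, which is exactly the kind of information the project asks how to capture.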

Previous work on this topic looked into different ways of using the medical history for predicting diagnoses. Three types of representation of the medical history were used for different purposes. The results of two projects were promising, but not yet usable for daily practice. Limitations of these two projects were that the data was not formatted correctly and the distinction between different types of codes was unclear. This resulted in predictions of irrelevant diagnoses. The projects also showed a large class imbalance with respect to rare diagnoses, which resulted in too many rare diagnoses being predicted. For the third project, a GPT model was trained on the medical history of patients. Unfortunately, it did not seem to work well. Predicting the diagnosis based on age, sex and RFE might be too difficult for our model. Therefore, we want to try less complex machine learning methods for this prediction task. However, we still expect this to be a challenge.

Data We will use data from the research network FaMeNet (www.famenet.nl) covering over 300,000 patient years and over 1 million patient contacts. In this dataset, GPs systematically code every encounter in an episode of care structure. An episode of care is defined as a health problem in an individual from the first encounter until the completion of the last encounter. For every encounter, GPs systematically code the type of contact, the RFE, diagnoses and their diagnostic and therapeutic interventions. The dataset also contains information about contextual factors of the patients, such as chronic comorbidity, sex, age, ethnicity and educational level. These data are available for more than 50% of the adult population. The data are stored in a data warehouse at the department of Primary and Community Care of the Radboud University Nijmegen Medical Centre (Radboud Technology Center Health Data).
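To make the episode of care structure concrete, the sketch below models an episode as a list of coded encounters for one patient. The class and field names are hypothetical and do not reflect the actual data warehouse schema; treating the diagnosis of the last encounter as the final diagnosis is an assumption made here for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Encounter:
    # Hypothetical fields mirroring what the text says is coded per encounter.
    day: date
    contact_type: str
    rfe: str
    diagnosis: str


@dataclass
class EpisodeOfCare:
    # One health problem in one individual, from first to last encounter.
    patient_id: int
    encounters: list = field(default_factory=list)

    def ordered(self):
        """Return the encounters sorted by date."""
        return sorted(self.encounters, key=lambda e: e.day)

    def first_rfe(self):
        """RFE stated at the first encounter of the episode."""
        return self.ordered()[0].rfe

    def final_diagnosis(self):
        """Assumption: the diagnosis coded at the last encounter is final."""
        return self.ordered()[-1].diagnosis


# Invented example: the diagnosis label changes during the episode.
episode = EpisodeOfCare(
    patient_id=1,
    encounters=[
        Encounter(date(2020, 3, 10), "consultation", "cough", "pneumonia"),
        Encounter(date(2020, 3, 1), "consultation", "cough", "cough"),
    ],
)

if __name__ == "__main__":
    print(episode.first_rfe(), "->", episode.final_diagnosis())
```

Pairing the first RFE with the final diagnosis of each episode gives one possible set of training examples for the prediction task.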

People Liesbeth Hunik, Twan van Laarhoven, Michael Ricking, Annemarie Uijen, Tim Olde Hartman, Henk Schers

Requirements

  • Students in data science, computer science, or artificial intelligence
  • Interest in machine learning and clinical data.

Information

  • Project duration: 3-6 months; this project is suitable as a 3-month internship or master thesis.
  • Location: Radboud University Medical Center
  • For more information, please contact Liesbeth Hunik or Twan van Laarhoven