Causal discovery algorithms aim to recover the causal graph that describes the causal relationships between the variables in a data set. Knowing the causal structure is important in real-world domains such as healthcare, where it gives a deeper understanding of how variables relate to each other, for example which patient characteristics are causally related to adverse outcomes after aortic surgery.
To maintain a realistic setting, we will only consider algorithms that work in the presence of latent confounders. For most of those algorithms, there is a special type of path that introduces additional edges in the output causal graph. In the recently published paper ‘Differentiable Causal Discovery Under Unmeasured Confounding’, Bhattacharya et al. (2021) design a causal discovery algorithm that can output three different types of graphs, one of which, the arid graph, does not introduce these additional edges. The algorithm learns the causal graph by optimizing a score function. This score function takes the full data set as input and requires the data to be continuous. Unfortunately, this is not a realistic setting, as a mixture of continuous and discrete data is much more common in real-world contexts. There is, however, another version of this score function that is based on a correlation matrix. Working with a correlation matrix makes it possible to handle mixed data, for example through the approach in ‘Learning causal structure from mixed data with missing values using Gaussian copula models’ by Cui et al. (2018).
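To illustrate why a correlation matrix can be enough as input, the sketch below assumes a score built on a zero-mean Gaussian likelihood; this is an assumption made for illustration and not the score function of Bhattacharya et al. For standardised data, the log-likelihood computed row by row from the full data set equals the one computed from the sample correlation matrix and the sample size alone. The function names and the candidate covariance `Sigma` are hypothetical.

```python
import numpy as np

def loglik_from_data(X, Sigma):
    """Zero-mean Gaussian log-likelihood, summed row by row over the full data."""
    n, d = X.shape
    _, logdet = np.linalg.slogdet(Sigma)
    # Quadratic form x^T Sigma^{-1} x for every row x of X
    quad = np.einsum("ij,jk,ik->i", X, np.linalg.inv(Sigma), X)
    return -0.5 * np.sum(d * np.log(2 * np.pi) + logdet + quad)

def loglik_from_corr(R, n, Sigma):
    """Same log-likelihood, using only the correlation matrix R and sample size n."""
    d = R.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    trace_term = np.trace(np.linalg.solve(Sigma, R))  # tr(Sigma^{-1} R)
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + trace_term)

# Toy continuous data (illustrative only), standardised to zero mean and unit variance
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(500, 3)) @ mixing
X = (X - X.mean(axis=0)) / X.std(axis=0)

R = X.T @ X / len(X)  # sample correlation matrix of the standardised data

# Hypothetical candidate model covariance (positive definite)
Sigma = np.array([[1.0, 0.4, 0.1],
                  [0.4, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])

score_full = loglik_from_data(X, Sigma)
score_corr = loglik_from_corr(R, len(X), Sigma)
print(np.isclose(score_full, score_corr))
```

Under this assumption, any estimate of the correlation matrix could be plugged in for `R`, including one obtained from mixed data with a Gaussian copula model.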
The objective of this project is to modify the algorithm of Bhattacharya et al. so that it accepts a correlation matrix as input, and to combine the approaches of the two articles mentioned above into one model. This way, we can apply the algorithm of Bhattacharya et al. to many more real-world data sets, for example in healthcare. If successful, this project can lead to a contribution to a scientific article. For this master's thesis, we are looking for a motivated master's student in Data Science with expertise in mathematics and programming.
For more information, contact Mirthe van Diepen or Tom Claassen.