
    Robust Conditional Independence maps of single-voxel Magnetic Resonance Spectra to elucidate associations between brain tumours and metabolites.

    The aim of the paper is two-fold. First, we show that structure finding with the PC algorithm can be inherently unstable and requires further operational constraints in order to consistently obtain models that are faithful to the data. We propose a methodology to stabilise the structure finding process, minimising both false positive and false negative error rates. This is demonstrated with synthetic data. Second, we apply the proposed structure finding methodology to a data set comprising single-voxel Magnetic Resonance Spectra of normal brain and three classes of brain tumours, to elucidate the associations between brain tumour types and a range of observed metabolites that are known to be relevant for their characterisation. The data set is bootstrapped in order to maximise the robustness of feature selection for nominated target variables. Specifically, Conditional Independence maps (CI-maps) built from the data and their derived Bayesian networks are used. A Directed Acyclic Graph (DAG) is built from the CI-maps, a major challenge being the minimisation of errors in the graph structure. This work presents empirical evidence on how to reduce false positive errors via the False Discovery Rate, and how to identify appropriate parameter settings to improve the False Negative Reduction. In addition, several node ordering policies that transform the graph into a DAG are investigated. The obtained results show that ordering nodes by strength of mutual information can recover a representative DAG in a reasonable time, although a more accurate graph can be recovered using a random order of samples at the expense of increased computation time.

    Causal Feature Selection in Neuroscience

    Causal inference, at times correct and at times false, is fundamentally intertwined with human nature. Humans tend to approach and explain the systems of the world and of everyday life via causal reasoning and causal statements, unconsciously trying to recover the causal graph that underlies their observations. Nevertheless, causal reasoning based on observations of the real world is seldom equitable and precise. Particularly when the method used is based on plain correlations, causal statements can be far from causal, first, because of the implicit assumption of linear relationships and, second, because of the major problem of hidden confounding. One of the most complex and difficult systems for an applied scientist to explain is the human brain. The reason for that is threefold. First and foremost, there is the daedal and sophisticated manner in which the human brain is constructed. Second, our means of observing its global functionality are limited, which ultimately means that no causal sufficiency can be assumed in such a system. In other words, hidden common causes (also termed hidden confounders) will be omnipresent in our limited observations. Finally, the significant heterogeneity that the human brain exhibits across subjects in some of its physiological functionalities complicates the problem even further. This, in turn, explains why machine learning methods that try to predict biomarkers through the traditional approach of a non-causal model fail to generalise across different brains. Hence, one should be particularly careful with the methods one selects and the causal statements one makes in order to understand and interpret brain functionality. In this thesis, we focus on constructing theorems and algorithms for causal inference on real data, trying to understand the relationship between the human brain and motor function.
More specifically, we target the problem of identifying the causes of a target variable without assuming causal sufficiency. We tackle both non-sequential and time series data, proving theorems for each case. The applications of our methods focus on the activity of the human motor cortex as it arises, first, naturally and, second, from non-invasive brain stimulation. We build experimental set-ups and conduct electroencephalographic (EEG) and stimulation experiments to study the functionality of the motor cortex across different subjects in these two cases, with the ultimate goal of explaining the observed heterogeneity in the recorded activity. The work presented in this thesis is both experimental (in its first part), with non-invasive experiments on the human brain that contribute to a better understanding of the motor cortex, and theoretical, contributing four theorems in the field of causal inference and two causal feature selection methods. We first approach brain activity from a purely machine learning perspective, analysing the brain activity data of 27 healthy subjects during an upper-limb reaching task. We introduce a multi-task regression method to build personalised models that predict movement stability from limited trials. We do so by taking information from other subjects into account as a prior and updating, when necessary, the weights of the model with trials from the current subject. Although the original goal of this work was to show the superiority of this prediction method, a side observation turned out to be the key to defining the next steps of the research presented here.
The features learnt by the individual prediction models differed significantly across subjects, and although no causal claim can be made yet, since this is a correlation-based observation, it is the first hint of heterogeneity in the activity of the human motor cortex. Such a discrepancy in the frequency and location of the learnt features could also imply a discrepancy in the response to non-invasive brain stimulation techniques over the motor cortex. To examine this possibility, a new series of electrophysiological experiments, applying transcranial alternating current stimulation at 70 Hz over the motor cortex (as this has been considered to facilitate movement), is conducted on twenty healthy participants. Having observed significant variability in the behavioural response, ranging from negative to positive responders, we decided to investigate further the reasons that could explain it. An incremental method with three steps is introduced to narrow down the causal model that can explain the aforementioned discrepancy in responses. With our method, we conclude that the beta oscillatory activity over the motor cortex could play a mediating role between the gamma stimulation and the motor performance, without being able to exclude the case that GABA activity could be a hidden common cause. Having witnessed such heterogeneity, both during natural movements and under brain stimulation, we stress the importance of taking steps towards the personalisation of brain stimulation parameters. We conclude the experimental part of this work by constructing a pipeline to predict, from resting-state EEG data, the behavioural response of each subject to the stimulation treatment. Such screening could avoid redundant or even harmful stimulation sessions.
With two different stimulation studies, recruiting 42 healthy participants in total, we identify a biomarker that could be informative about the response of an individual to the aforementioned motor stimulation. In the theoretical part of this thesis, we focus on the problem of identifying direct and indirect causes of a target (e.g. motor performance) given a collection of possible candidates (e.g. brain activity in different locations and at different frequencies), while allowing for latent common causes. First, we propose and prove a theorem that introduces sufficient conditions, under assumptions that can naturally be met, to decide on the causal role of a feature with a single conditional independence test and a single conditioning variable. Given the hardness of statistical testing of conditional independences in large and dense graphs (such as the brain), limiting the necessary tests to one significantly boosts the statistical strength of the results. Application of our conditions to the aforementioned neurophysiological data further supports the validity of the method. Applying the proposed conditions independently to each individual, without prior knowledge, led to three groups of identified causal features, each one being related in a consistent manner to a different quality of movement across subjects. We discuss how such a method could contribute to the selection of personalised brain stimulation parameters. As a final step, we approach the brain signal as continuous time series data. Although time series are observed almost everywhere in nature, causal inference on such data in the presence of hidden confounders has been an unsolved problem, with the widely known Granger Causality being the only approach for almost half a century.
The final contribution of this thesis is two theorems that introduce both necessary and sufficient conditions for causal feature selection on time series, under some graph constraints, and a third theorem that relaxes one of the stricter assumptions of the aforementioned two. We demonstrate the validity of our method on both simulated and real data.

    Discovering Causal Relations and Equations from Data

    Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in the physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventional studies in the system under study. With the advent of big data and the use of data-driven methods, the fields of causal and equation discovery have grown and made progress in computer science, physics, statistics, philosophy, and many applied fields. All these domains are intertwined and can be used to discover causal relations, physical laws, and equations from observational data. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised by the efficient exploitation of observational data, modern machine learning algorithms and the interaction with domain knowledge. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems. Comment: 137 pages