
    Interpretable machine learning for genomics

    High-throughput technologies such as next-generation sequencing allow biologists to observe cell function with unprecedented resolution, but the resulting datasets are too large and complicated for humans to understand without the aid of advanced statistical methods. Machine learning (ML) algorithms, which are designed to automatically find patterns in data, are well suited to this task. Yet these models are often so complex as to be opaque, leaving researchers with few clues about underlying mechanisms. Interpretable machine learning (iML) is a burgeoning subdiscipline of computational statistics devoted to making the predictions of ML models more intelligible to end users. This article is a gentle and critical introduction to iML, with an emphasis on genomic applications. I define relevant concepts, motivate leading methodologies, and provide a simple typology of existing approaches. I survey recent examples of iML in genomics, demonstrating how such techniques are increasingly integrated into research workflows. I argue that iML solutions are required to realize the promise of precision medicine. However, several open challenges remain. I examine the limitations of current state-of-the-art tools and propose a number of directions for future research. While the horizon for iML in genomics is wide and bright, continued progress requires close collaboration across disciplines.
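
    To make the flavour of such post hoc interpretability methods concrete, here is a minimal permutation-importance sketch on synthetic expression-style data. All names, dimensions, and model choices below are illustrative assumptions, not taken from the article.

```python
# Minimal sketch: post hoc feature attribution on synthetic "expression" data.
# All names and dimensions are illustrative assumptions, not from the article.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_genes = 300, 50
X = rng.normal(size=(n_samples, n_genes))  # stand-in for expression levels
y = (X[:, 3] - 2 * X[:, 17] + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Permutation importance: drop in test accuracy when one feature is shuffled.
result = permutation_importance(model, X_te, y_te, n_repeats=30, random_state=0)
top = np.argsort(result.importances_mean)[::-1][:5]
for j in top:
    print(f"gene_{j}: {result.importances_mean[j]:.3f} +/- {result.importances_std[j]:.3f}")
```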

    No explanation without inference

    Complex algorithms are increasingly used to automate high-stakes decisions in sensitive areas like healthcare and finance. However, the opacity of such models raises problems of intelligibility and trust. Researchers in interpretable machine learning (iML) have proposed a number of solutions, including local linear approximations, rule lists, and counterfactuals. I argue that all three methods share the same fundamental flaw – namely, a disregard for severe testing. Techniques for quantifying uncertainty and error are central to scientific explanation, yet iML has largely ignored this methodological imperative. I consider examples that illustrate the dangers of such negligence, with an emphasis on issues of scoping and confounding. Drawing on recent work in philosophy of science, I conclude that there can be no explanation – algorithmic or otherwise – without inference. I propose several ways to severely test existing iML methods and evaluate the resulting trade-offs.
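
    One way to make "severe testing" concrete is to bootstrap the coefficients of a LIME-style local linear surrogate and inspect their stability. The sketch below is my own illustration under synthetic assumptions (black box, kernel width, sample sizes), not the paper's procedure.

```python
# Sketch: bootstrap uncertainty for a LIME-style local linear surrogate.
# The setup (black box, kernel width, sample sizes) is an illustrative assumption.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)

def black_box(X):
    # Stand-in for an opaque model.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

x0 = np.array([0.5, -0.2])                         # point to explain
Z = x0 + rng.normal(scale=0.3, size=(500, 2))      # local perturbations
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.25)  # proximity kernel
f = black_box(Z)

coefs = []
for _ in range(200):  # bootstrap over the perturbation sample
    idx = rng.integers(0, len(Z), len(Z))
    m = Ridge(alpha=1.0).fit(Z[idx], f[idx], sample_weight=w[idx])
    coefs.append(m.coef_)
coefs = np.array(coefs)

lo, hi = np.percentile(coefs, [2.5, 97.5], axis=0)
for j in range(2):
    # A wide interval signals that the local explanation fails a severe test.
    print(f"feature {j}: 95% CI [{lo[j]:.3f}, {hi[j]:.3f}]")
```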

    The explanation game: a formal framework for interpretable machine learning

    We propose a formal framework for interpretable machine learning. Combining elements from statistical learning, causal interventionism, and decision theory, we design an idealised explanation game in which players collaborate to find the best explanation(s) for a given algorithmic prediction. Through an iterative procedure of questions and answers, the players establish a three-dimensional Pareto frontier that describes the optimal trade-offs between explanatory accuracy, simplicity, and relevance. Multiple rounds are played at different levels of abstraction, allowing the players to explore overlapping causal patterns of variable granularity and scope. We characterise the conditions under which such a game is almost surely guaranteed to converge on a (conditionally) optimal explanation surface in polynomial time, and highlight obstacles that will tend to prevent the players from advancing beyond certain explanatory thresholds. The game serves a descriptive and a normative function, establishing a conceptual space in which to analyse and compare existing proposals, as well as design new and improved solutions.
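
    The Pareto frontier at the heart of the game can be illustrated with a simple dominance filter over candidate explanations scored on the three criteria. The scores below are synthetic placeholders; the game itself is not implemented here.

```python
# Sketch: Pareto-optimal explanations under (accuracy, simplicity, relevance).
# Candidate scores are synthetic placeholders, not the paper's procedure.
import numpy as np

rng = np.random.default_rng(2)
# Rows: candidate explanations; columns: the three criteria (higher is better).
scores = rng.uniform(size=(40, 3))

def pareto_front(S):
    """Return indices of candidates not dominated on all three criteria."""
    keep = []
    for i, s in enumerate(S):
        # i is dominated if some candidate is at least as good everywhere
        # and strictly better somewhere.
        dominated = np.any(np.all(S >= s, axis=1) & np.any(S > s, axis=1))
        if not dominated:
            keep.append(i)
    return keep

front = pareto_front(scores)
print(f"{len(front)} of {len(scores)} candidates lie on the frontier")
```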

    Causal Discovery Under a Confounder Blanket

    Inferring causal relationships from observational data is rarely straightforward, but the problem is especially difficult in high dimensions. For these applications, causal discovery algorithms typically require parametric restrictions or extreme sparsity constraints. We relax these assumptions and focus on an important but more specialized problem, namely recovering the causal order among a subgraph of variables known to descend from some (possibly large) set of confounding covariates, i.e., a confounder blanket. This is useful in many settings, for example when studying a dynamic biomolecular subsystem with genetic data providing background information. Under a structural assumption called the confounder blanket principle, which we argue is essential for tractable causal discovery in high dimensions, our method accommodates graphs of low or high sparsity while maintaining polynomial time complexity. We present a structure learning algorithm that is provably sound and complete with respect to a so-called lazy oracle. We design inference procedures with finite sample error control for linear and nonlinear systems, and demonstrate our approach on a range of simulated and real-world datasets. An accompanying R package, cbl, is available from CRAN.

    The paradox of poor representation: How voter–party incongruence curbs affective polarisation

    Research on the relationship between ideology and affective polarisation highlights ideological disagreement as a key driver of animosity between partisan groups. By operationalising disagreement on the left–right dimension, however, existing studies often overlook voter–party incongruence as a potential determinant of affective evaluations. How does incongruence on policy issues impact affective evaluations of mainstream political parties and their leaders? We tackle this question by analysing data from the British Election Study collected ahead of the 2019 UK General Election using an instrumental variable approach. Consistent with our expectations, we find that voter–party incongruence has a significant causal impact on affective evaluations. Perceived representational gaps between party and voter drive negative evaluations of the in-party and positive evaluations of the opposition, thus lowering affective polarisation overall. The results offer a more nuanced perspective on the role of ideological conflict in driving affective polarisation.
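
    The instrumental variable logic can be sketched as a minimal two-stage least squares. The variables and instrument below are synthetic stand-ins, not the British Election Study measures.

```python
# Sketch: two-stage least squares (2SLS) on synthetic data.
# Variables are illustrative stand-ins, not the British Election Study measures.
import numpy as np

rng = np.random.default_rng(3)
n = 2000
u = rng.normal(size=n)  # unobserved confounder
z = rng.normal(size=n)  # instrument (affects incongruence only)
incongruence = 0.8 * z + u + rng.normal(size=n)        # endogenous regressor
affect = -0.5 * incongruence + u + rng.normal(size=n)  # outcome; true effect = -0.5

# Stage 1: project the endogenous regressor onto the instrument.
Z = np.column_stack([np.ones(n), z])
fitted = Z @ np.linalg.lstsq(Z, incongruence, rcond=None)[0]

# Stage 2: regress the outcome on the fitted values.
X = np.column_stack([np.ones(n), fitted])
beta = np.linalg.lstsq(X, affect, rcond=None)[0]

print(f"naive OLS slope: {np.polyfit(incongruence, affect, 1)[0]:.3f}")  # biased by u
print(f"2SLS estimate:   {beta[1]:.3f}")  # approximately -0.5
```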

    Between whores and heroes: Women, voyeurism and ambiguity in Holocaust Film


    Status of HPV vaccine introduction and barriers to country uptake.

    During the last 12 years, over 80 countries have introduced national HPV vaccination programs. The majority of these countries are high or upper-middle income countries. The barriers to HPV vaccine introduction remain greatest in those countries with the highest burden of cervical cancer and the most need for vaccination. Innovation and global leadership are required to increase and sustain introductions in low income and lower-middle income countries.

    Operationalizing Complex Causes: A Pragmatic View of Mediation

    We examine the problem of causal response estimation for complex objects (e.g., text, images, genomics). In this setting, classical "atomic" interventions are often not available (e.g., changes to characters, pixels, DNA base-pairs). Instead, we only have access to indirect or "crude" interventions (e.g., enrolling in a writing program, modifying a scene, applying a gene therapy). In this work, we formalize this problem and provide an initial solution. Given a collection of candidate mediators, we propose (a) a two-step method for predicting the causal responses of crude interventions; and (b) a testing procedure to identify mediators of crude interventions. We demonstrate, on a range of simulated and real-world-inspired examples, that our approach allows us to efficiently estimate the effect of crude interventions with limited data from new treatment regimes.
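
    The two-step method can be sketched as a pair of regressions composed at prediction time: model the mediators given the crude intervention, model the outcome given the mediators, then chain the two. The data-generating process and model choices below are illustrative assumptions, not the paper's estimator.

```python
# Sketch of the two-step idea: (1) model mediators given a crude intervention,
# (2) model the outcome given mediators, then compose to predict responses.
# Data-generating process and model choices are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 1000
t = rng.integers(0, 2, size=n).astype(float)  # crude intervention (e.g., enrol or not)
M = np.column_stack([                          # candidate mediators
    1.5 * t + rng.normal(size=n),
    -0.5 * t + rng.normal(size=n),
])
y = 2.0 * M[:, 0] + rng.normal(size=n)         # outcome depends on mediator 0 only

step1 = LinearRegression().fit(t.reshape(-1, 1), M)  # intervention -> mediators
step2 = LinearRegression().fit(M, y)                 # mediators -> outcome

# Predicted causal response of the crude intervention under a new regime:
M_hat = step1.predict(np.array([[0.0], [1.0]]))
print("predicted outcomes under t=0 and t=1:", step2.predict(M_hat))
```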

    Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci.

    Genome-wide association studies (GWAS) have revealed thousands of genetic loci that underpin the complex biology of many human traits. However, the strength of GWAS, the ability to detect genetic association by linkage disequilibrium (LD), is also its limitation. Whilst the ever-increasing study size and improved design have augmented the power of GWAS to detect effects, differentiation of causal variants or genes from other highly correlated genes associated by LD remains the real challenge. This has severely hindered the biological insights and clinical translation of GWAS findings. Although thousands of disease susceptibility loci have been reported, causal genes at these loci remain elusive. Machine learning (ML) techniques offer an opportunity to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase. ML models for GWAS prioritization vary greatly in their complexity, ranging from relatively simple logistic regression approaches to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models, i.e., neural networks. Paired with functional validation, these methods show important promise for clinical translation, providing a strong evidence-based approach to direct post-GWAS research. However, as ML approaches continue to evolve to meet the challenge of causal gene identification, a critical assessment of the underlying methodologies and their applicability to the GWAS prioritization problem is needed. This review investigates the landscape of ML applications in three parts: selected models, input features, and output model performance, with a focus on prioritization of complex disease-associated loci. Overall, we explore the contributions ML has made towards reaching the GWAS end-game with consequent wide-ranging translational impact.
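
    The span of model complexity described, from logistic regression to gradient boosting, can be illustrated on synthetic locus-level features. The annotations and labels below are stand-ins for real post-GWAS data, not any benchmark from the review.

```python
# Sketch: comparing simple and complex models for locus prioritization.
# Features and labels are synthetic stand-ins for post-GWAS annotations.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_loci, n_feats = 500, 12
X = rng.normal(size=(n_loci, n_feats))  # e.g., eQTL overlap, conservation, distance to gene
logit = 1.2 * X[:, 0] - 0.8 * X[:, 1] + X[:, 2] * X[:, 3]  # nonlinear signal
y = (rng.uniform(size=n_loci) < 1 / (1 + np.exp(-logit))).astype(int)  # "causal" label

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC {auc.mean():.3f} +/- {auc.std():.3f}")
```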