
    Stable Prediction with Model Misspecification and Agnostic Distribution Shift

    For many machine learning algorithms, two main assumptions are required to guarantee performance. One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified. In real applications, however, we often have little prior knowledge of the test data or of the underlying true model. Under model misspecification, agnostic distribution shift between training and test data leads to inaccurate parameter estimation and unstable prediction across unknown test data. To address these problems, we propose a novel Decorrelated Weighting Regression (DWR) algorithm which jointly optimizes a variable decorrelation regularizer and a weighted regression model. The variable decorrelation regularizer estimates a weight for each sample such that variables are decorrelated on the weighted training data. These weights are then used in the weighted regression to improve the accuracy of the estimated effect of each variable, thus helping to improve the stability of prediction across unknown test data. Extensive experiments clearly demonstrate that our DWR algorithm can significantly improve the accuracy of parameter estimation and the stability of prediction under model misspecification and agnostic distribution shift.

    Interpretable machine learning for genomics

    High-throughput technologies such as next-generation sequencing allow biologists to observe cell function with unprecedented resolution, but the resulting datasets are too large and complicated for humans to understand without the aid of advanced statistical methods. Machine learning (ML) algorithms, which are designed to automatically find patterns in data, are well suited to this task. Yet these models are often so complex as to be opaque, leaving researchers with few clues about underlying mechanisms. Interpretable machine learning (iML) is a burgeoning subdiscipline of computational statistics devoted to making the predictions of ML models more intelligible to end users. This article is a gentle and critical introduction to iML, with an emphasis on genomic applications. I define relevant concepts, motivate leading methodologies, and provide a simple typology of existing approaches. I survey recent examples of iML in genomics, demonstrating how such techniques are increasingly integrated into research workflows. I argue that iML solutions are required to realize the promise of precision medicine. However, several open challenges remain. I examine the limitations of current state-of-the-art tools and propose a number of directions for future research. While the horizon for iML in genomics is wide and bright, continued progress requires close collaboration across disciplines.

    Human Rights Treaty Commitment and Compliance: A Machine Learning-based Causal Inference Approach

    Why do states ratify international human rights treaties? How much do human rights treaties influence state behaviors directly and indirectly? Why are some human rights treaty monitoring procedures more effective than others? What are the most predictively and causally important factors that can reduce and prevent state repression and human rights violations? This dissertation provides answers to these key causal questions in political science research, using a novel approach that combines machine learning and the structural causal model framework. The four research questions are arranged in a chronological order that reflects the causal process relating to international human rights treaties, going from (a) the causal determinants of treaty ratification to (b) the causal mechanisms of human rights treaties to (c) the causal effects of human rights treaty monitoring procedures to (d) other factors that causally influence human rights violations. Chapter 1 identifies the research traditions within which this dissertation is located, offers an overview of the methodological advances that enable this research, specifies the research questions, and previews the findings. Chapters 2, 3, 4, and 5 present in chronological order four empirical studies that answer these four research questions. Finally, Chapter 6 summarizes the substantive findings, suggests some other research questions that could be similarly investigated, and recaps the methodological approach and the contributions of the dissertation.