682 research outputs found

    Bayesian Learning in the Counterfactual World

    Recent years have witnessed surging interest in the use of machine learning tools for causal inference. In contrast to the usual large data settings where the primary goal is prediction, many disciplines, such as health, economic and social sciences, are instead interested in causal questions. Learning individualized responses to an intervention is a crucial task in many applied fields (e.g., precision medicine, targeted advertising, and precision agriculture) where the ultimate goal is to design optimal and highly personalized policies based on individual features. In this work, I thus tackle the problem of estimating causal effects of an intervention that are heterogeneous across a population of interest and depend on an individual set of characteristics (e.g., a patient's clinical record or a user's browsing history) in high-dimensional observational data settings. This is done by utilizing Bayesian Nonparametric or Probabilistic Machine Learning tools that are specifically adjusted for the causal setting and have desirable uncertainty quantification properties, with a focus on the issues of interpretability/explainability and inclusion of domain experts' prior knowledge. I begin by introducing terminology and concepts from causality and causal reasoning in the first chapter. Then I include a literature review of some of the state-of-the-art regression-based methods for heterogeneous treatment effects estimation, with an attempt to build a unifying taxonomy and lay down the finite-sample empirical properties of these models.
The chapters forming the core of the dissertation instead present some novel methods addressing existing issues in individualized causal effects estimation: Chapter 3 develops both a Bayesian tree ensemble method and a deep learning architecture to tackle interpretability, uncertainty coverage and targeted regularization; Chapter 4 instead introduces a novel multi-task Deep Kernel Learning method particularly suited for multi-outcome | multi-action scenarios. The last chapter concludes with a discussion.

    API design for machine learning software: experiences from the scikit-learn project

    Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
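
The shared interface mentioned in this abstract can be illustrated with a toy sketch. The code below does not use scikit-learn itself: `ToyScaler`, `ToyThresholdClassifier` and `ToyPipeline` are hypothetical classes written in plain Python that only mimic the `fit` / `transform` / `predict` convention, under the assumption that one-dimensional lists of numbers are enough to show how a uniform interface makes composition straightforward.

```python
from statistics import mean, pstdev


class ToyScaler:
    """Toy transformer: learns a mean and standard deviation in fit()."""

    def fit(self, X, y=None):
        self.mean_ = mean(X)
        self.scale_ = pstdev(X) or 1.0  # fall back to 1.0 for constant input
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, X):
        return [(x - self.mean_) / self.scale_ for x in X]


class ToyThresholdClassifier:
    """Toy predictor: thresholds at the midpoint between the two class means."""

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.threshold_ = (mean(pos) + mean(neg)) / 2
        return self

    def predict(self, X):
        return [1 if x > self.threshold_ else 0 for x in X]


class ToyPipeline:
    """Chains transformers and a final predictor; it exposes fit/predict too."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        # Fit each transformer in turn and pass its output to the next step.
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        # Apply the already-fitted transformers, then the final predictor.
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X_train = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y_train = [0, 0, 0, 1, 1, 1]
model = ToyPipeline([ToyScaler(), ToyThresholdClassifier()]).fit(X_train, y_train)
predictions = model.predict([2.5, 9.0])
print(predictions)
```

Because the composed object exposes the same `fit` and `predict` methods as its parts, it can be used anywhere a single estimator would be, which is the kind of reusability the abstract refers to.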

    Scalable Machine Learning Methods for Massive Biomedical Data Analysis.

    Modern data acquisition techniques have enabled biomedical researchers to collect and analyze datasets of substantial size and complexity. The massive size of these datasets allows us to comprehensively study the biological system of interest at an unprecedented level of detail, which may lead to the discovery of clinically relevant biomarkers. Nonetheless, the dimensionality of these datasets presents critical computational and statistical challenges, as traditional statistical methods break down when the number of predictors dominates the number of observations, a setting frequently encountered in biomedical data analysis. This difficulty is compounded by the fact that biological data tend to be noisy and often possess complex correlation patterns among the predictors. The central goal of this dissertation is to develop a computationally tractable machine learning framework that allows us to extract scientifically meaningful information from these massive and highly complex biomedical datasets. We motivate the scope of our study by considering two important problems with clinical relevance: (1) uncertainty analysis for biomedical image registration, and (2) psychiatric disease prediction based on functional connectomes, which are high-dimensional correlation maps generated from resting-state functional MRI. PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111354/1/takanori_1.pd

    Multimodal Approaches to Computer Vision Problems

    The goal of computer vision research is to automatically extract high-level information from images and videos. The vast majority of this research focuses specifically on visible light imagery. In this dissertation, we present approaches to computer vision problems that incorporate data obtained from alternative modalities including thermal infrared imagery, near-infrared imagery, and text. We consider approaches where other modalities are used in place of visible imagery as well as approaches that use other modalities to improve the performance of traditional computer vision algorithms. The bulk of this dissertation focuses on Heterogeneous Face Recognition (HFR). HFR is a variant of face recognition where the probe and gallery face images are obtained with different sensing modalities. We also present a method to incorporate text information into human activity recognition algorithms. We first present a kernel task-driven coupled dictionary model to represent the data across multiple domains for thermal infrared HFR. We extend a linear coupled dictionary model to use the kernel method to process the signals in a high-dimensional space; this effectively enables the dictionaries to represent the data non-linearly in the original feature space. We further improve the model by making the dictionaries task-driven. This allows us to tune the dictionaries to perform well on the classification task at hand rather than the standard reconstruction task. We show that our algorithms outperform algorithms based on standard coupled dictionaries on three datasets for thermal infrared to visible face recognition. Next, we present a deep learning-based approach to near-infrared (NIR) HFR. Most approaches to HFR involve modeling the relationship between corresponding images from the visible and sensing domains. Due to data constraints, this is typically done at the patch level and/or with shallow models to prevent overfitting.
In this approach, rather than modeling local patches or using a simple model, we use a complex, deep model to learn the relationship between the entirety of cross-modal face images. We describe a deep convolutional neural network-based method that leverages a large visible image face dataset to prevent overfitting. We present experimental results on two benchmark data sets showing its effectiveness. Third, we present a model order selection algorithm for deep neural networks. In recent years, deep learning has emerged as a dominant methodology in machine learning. While it has been shown to produce state-of-the-art results for a variety of applications, one aspect of deep networks that has not been extensively researched is how to determine the optimal network structure. This problem is generally solved by ad hoc methods. In this work we address a sub-problem of this task: determining the breadth (number of nodes) of each layer. We show how to use group-sparsity-inducing regularization to automatically select these hyper-parameters. We demonstrate the proposed method by using it to reduce the size of networks while maintaining performance for our NIR HFR deep-learning algorithm. Additionally, we demonstrate the generality of our algorithm by applying it to image classification tasks. Finally, we present a method to improve activity recognition algorithms through the use of multitask learning and information extracted from a large text corpus. Current state-of-the-art deep learning approaches are limited by the size and scope of the data set they use to train the networks. We present a multitask learning approach to expand the training data set. Specifically, we train the neural networks to recognize objects in addition to activities. This allows us to expand our training set with large, publicly available object recognition data sets and thus use deeper, state-of-the-art network architectures.
Additionally, when learning about the target activities, the algorithms are limited to the information contained in the training set. It is virtually impossible to capture all variations of the target activities in a training set. In this work, we extract information about the target activities from a large text corpus. We incorporate this information into the training algorithm by using it to select relevant object recognition classes for the multitask learning approach. We present experimental results on a benchmark activity recognition data set showing the effectiveness of our approach.

    Evaluation of cross-validation strategies in sequence-based binding prediction using deep learning

    Binding prediction between targets and drug-like compounds through deep neural networks has generated promising results in recent years, outperforming traditional machine learning-based methods. However, the generalization capability of these classification models is still an issue to be addressed. In this work, we explored how different cross-validation strategies applied to data from different molecular databases affect the performance of binding prediction proteochemometrics models. These strategies are (1) random splitting, (2) splitting based on K-means clustering (both of actives and inactives), (3) splitting based on source database, and (4) splitting based on both the clustering and the source database. These schemes are applied to a deep learning proteochemometrics model and to a simple logistic regression model used as a baseline. Additionally, two different ways of describing molecules in the model are tested: (1) by their SMILES and (2) by three fingerprints. The classification performance of our deep learning-based proteochemometrics model is comparable to the state of the art. Our results show that the lack of generalization of these models is due to a bias in public molecular databases and that a restrictive cross-validation scheme based on compound clustering leads to worse but more robust and credible results. Our results also show better performance when representing molecules by their fingerprints. Peer Reviewed. Postprint (author's final draft)
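
The difference between random splitting and cluster-based splitting described in this abstract can be sketched as follows. This is an illustrative plain-Python sketch, not the authors' procedure: the function `group_k_fold` and the `cluster_ids` list are hypothetical, and the cluster labels are assumed to have been computed beforehand (for example by K-means on compound descriptors). The point is only that whole clusters are assigned to a fold, so similar compounds never appear on both sides of a split.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    if n_splits > len(members):
        raise ValueError("n_splits cannot exceed the number of groups")

    # Greedy balancing: place the largest groups first, each into the
    # currently smallest fold, so fold sizes stay roughly even.
    folds = [[] for _ in range(n_splits)]
    ordered = sorted(members, key=lambda g: (-len(members[g]), str(g)))
    for g in ordered:
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[g])

    for i in range(n_splits):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for k in range(n_splits) if k != i for j in folds[k])
        yield train_idx, test_idx


# Hypothetical cluster label for each of ten compounds.
cluster_ids = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
splits = list(group_k_fold(cluster_ids, n_splits=2))
for train_idx, test_idx in splits:
    train_clusters = sorted({cluster_ids[i] for i in train_idx})
    test_clusters = sorted({cluster_ids[i] for i in test_idx})
    print("train clusters:", train_clusters, "test clusters:", test_clusters)
```

Splitting by source database would follow the same pattern with a database identifier in place of the cluster label, and the combined strategy could use a (cluster, database) pair as the group key.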

    Meta-Learning for Cancer Phenotype Prediction from Gene Expression Data

    Deep learning has become an essential element in various applications of technology over the past decades. Deep neural networks are now reaching performance on par with, or even beyond, human-level performance on a broad range of tasks. However, there are still several concerns and deficiencies that make these models impractical for some real-world applications. One of the important issues comes from a data-efficiency perspective. Most deep learning techniques need a large number of training samples in order to achieve high performance on a given problem. This procedure is far from human general intelligence. Humans are good at learning from a small number of samples and quickly adapting to new tasks. In this work, we leverage the meta-learning framework, in which the model can learn novel tasks by developing prior knowledge over past experiences. For this purpose, we propose a Meta-Dataset that contains 174 genomics and clinical tasks. Furthermore, we suggest a meta-model under the few-shot learning regime that can learn new genomics tasks. Finally, a comparison between the performance of the meta-learner and the performance of other classical baselines is also presented.