10 research outputs found

    Interpretable Subgroup Discovery in Treatment Effect Estimation with Application to Opioid Prescribing Guidelines

    The dearth of prescribing guidelines for physicians is one key driver of the current opioid epidemic in the United States. In this work, we analyze medical and pharmaceutical claims data to draw insights on characteristics of patients who are more prone to adverse outcomes after an initial synthetic opioid prescription. Toward this end, we propose a generative model that allows discovery from observational data of subgroups that demonstrate an enhanced or diminished causal effect due to treatment. Our approach models these sub-populations as a mixture distribution, using sparsity to enhance interpretability, while jointly learning nonlinear predictors of the potential outcomes to better adjust for confounding. The approach leads to human-interpretable insights on discovered subgroups, improving the practical utility for decision suppor

    Learning the Structure for Structured Sparsity

    Structured sparsity has recently emerged in statistics, machine learning and signal processing as a promising paradigm for learning in high-dimensional settings. All existing methods for learning under the assumption of structured sparsity rely on prior knowledge of how to weight (or how to penalize) individual subsets of variables during the subset selection process, which is not available in general. Inferring group weights from data is a key open research problem in structured sparsity. In this paper, we propose a Bayesian approach to the problem of group weight learning. We model the group weights as hyperparameters of heavy-tailed priors on groups of variables and derive an approximate inference scheme to infer these hyperparameters. We empirically show that we are able to recover the model hyperparameters when the data are generated from the model, and we demonstrate the utility of learning weights in synthetic and real denoising problems.

    Empowering differential networks using Bayesian analysis

    Differential networks (DN) are important tools for modeling the changes in conditional dependencies between multiple samples. A Bayesian approach for estimating DNs, from the classical viewpoint, is introduced with a computationally efficient threshold selection for graphical model determination. The algorithm separately estimates the precision matrices of the DN using the Bayesian adaptive graphical lasso procedure. Synthetic experiments illustrate that the Bayesian DN performs exceptionally well in numerical accuracy and graphical structure determination in comparison to state-of-the-art methods. The proposed method is applied to South African COVID-19 data to investigate the change in DN structure between various phases of the pandemic. DATA AVAILABILITY STATEMENT: The data underlying the results presented in the study are available from https://archive.ics.uci.edu/ml/datasets/spambase for the spambase dataset. The corresponding COVID-19 data are available from https://www.nicd.ac.za/diseases-a-z-index/diseaseindex-covid-19/surveillance-reports/ and https://ourworldindata.org/coronavirus/country/southafrica. SUPPORTING INFORMATION: S1 File. Supplementary material. Contains a block Gibbs sampler, as well as additional optimal threshold, adjacency heatmap and graphical network figures for dimensions p = 30 and p = 100. https://doi.org/10.1371/journal.pone.0261193.s001 The National Research Foundation (NRF) of South Africa. http://www.plosone.org

    Probabilistic Structured Models for Plant Trait Analysis

    University of Minnesota Ph.D. dissertation. March 2017. Major: Communication Sciences and Disorders. Advisor: Arindam Banerjee. 1 computer file (PDF); xii, 171 pages. Many fields in modern science and engineering such as ecology, computational biology, astronomy, signal processing, climate science, brain imaging, natural language processing, and many more involve collecting data sets in which the dimensionality of the data p exceeds the sample size n. Since it is usually impossible to obtain consistent procedures unless p < n, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse structured graphical models, low-rank matrices, and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with some regularization function that encourages the assumed structure. Of particular interest is structure learning of graphical models in the high-dimensional setting. Most statistical analyses of graphical model estimation assume that all the data are fully observed and that the data points are sampled from the same distribution, and they provide the sample complexity and convergence rate by considering only one graphical structure for all the observations. In this thesis, we extend the above results to estimate the structure of graphical models where the data are partially observed or are sampled from multiple distributions. First, we consider the problem of estimating the change in the dependency structure of two p-dimensional models, based on samples drawn from two graphical models. The change is assumed to be structured, e.g., sparse, block sparse, node-perturbed sparse, etc., such that it can be characterized by a suitable (atomic) norm. 
We present and analyze a norm-regularized estimator for directly estimating the change in structure, without having to estimate the structures of the individual graphical models. Next, we consider the problem of estimating the sparse structure of Gaussian copula distributions (corresponding to non-paranormal distributions) using samples with missing values. We prove that our proposed estimators consistently estimate the non-paranormal correlation matrix, where the convergence rate depends on the probability of missing values. In the second part of the thesis, we consider the matrix completion problem. Low-rank matrix completion methods have been successful in a variety of settings such as recommendation systems. However, most of the existing matrix completion methods only provide a point estimate of missing entries, and do not characterize uncertainties of the predictions. First, we illustrate that the posterior distribution in latent factor models, such as probabilistic matrix factorization, when marginalized over one latent factor has the Matrix Generalized Inverse Gaussian (MGIG) distribution. We show that the MGIG is unimodal, and the mode can be obtained by solving an Algebraic Riccati Equation. The characterization leads to a novel Collapsed Monte Carlo inference algorithm for such latent factor models. Next, we propose a Bayesian hierarchical probabilistic matrix factorization (BHPMF) model to 1) incorporate hierarchical side information, and 2) provide uncertainty-quantified predictions. The former yields significant performance improvements in the problem of plant trait prediction, a key problem in ecology, by leveraging the taxonomic hierarchy in the plant kingdom. The latter is helpful in identifying predictions of low confidence, which can in turn be used to guide field work for data collection efforts. Finally, we consider applications of probabilistic structured models to plant trait analysis. We apply the BHPMF model to fill the gaps in the TRY database. 
The BHPMF model is the state-of-the-art model for plant trait prediction and is gaining visibility and usage in plant trait analysis. We have submitted an R package for BHPMF to CRAN. Next, we apply the Gaussian graphical model structure estimators to obtain the trait-trait interactions. We study the trait-trait interaction structure in different climate zones and among different plant growth forms and uncover the dependence of traits on climate and on vegetation