    Stochastic Variational Inference

    We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
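
    A minimal sketch of the SVI update rule, applied to a toy conjugate Gamma-Poisson model with no local hidden variables rather than to the topic models above; the model, step-size constants, and variable names are illustrative assumptions.

        # Sketch of the stochastic variational inference update loop on a toy
        # conjugate model: x_i ~ Poisson(lam), lam ~ Gamma(a0, b0) (shape/rate).
        # With no local hidden variables the exact posterior is
        # Gamma(a0 + sum(x), b0 + N), so the noisy natural-gradient updates can be
        # checked against it.
        import numpy as np

        rng = np.random.default_rng(0)
        N = 100_000
        x = rng.poisson(lam=3.0, size=N)       # stand-in for a massive data set

        a0, b0 = 1.0, 1.0                      # Gamma(a0, b0) prior
        a, b = a0, b0                          # variational Gamma(a, b)
        tau, kappa = 1.0, 0.7                  # step sizes rho_t = (t + tau)**(-kappa)

        for t in range(1, 20_000):
            i = rng.integers(N)                # sample a single data point
            a_hat = a0 + N * x[i]              # intermediate global parameters,
            b_hat = b0 + N                     # as if x[i] were replicated N times
            rho = (t + tau) ** (-kappa)
            a = (1.0 - rho) * a + rho * a_hat  # noisy natural-gradient step
            b = (1.0 - rho) * b + rho * b_hat  # (affine in (a, b) for the Gamma)

        print("SVI posterior mean:  ", a / b)
        print("exact posterior mean:", (a0 + x.sum()) / (b0 + N))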

    Individualized Treatment Effects with Censored Data via Fully Nonparametric Bayesian Accelerated Failure Time Models

    Individuals often respond differently to identical treatments, and characterizing such variability in treatment response is an important aim in the practice of personalized medicine. In this article, we describe a non-parametric accelerated failure time model that can be used to analyze heterogeneous treatment effects (HTE) when patient outcomes are time-to-event. By utilizing Bayesian additive regression trees and a mean-constrained Dirichlet process mixture model, our approach offers a flexible model for the regression function while placing few restrictions on the baseline hazard. Our non-parametric method leads to natural estimates of individual treatment effect and has the flexibility to address many major goals of HTE assessment. Moreover, our method requires little user input in terms of tuning parameter selection or subgroup specification. We illustrate the merits of our proposed approach with a detailed analysis of two large clinical trials for the prevention and treatment of congestive heart failure using an angiotensin-converting enzyme inhibitor. The analysis revealed considerable evidence for the presence of HTE in both trials, as demonstrated by substantial estimated variation in treatment effect and by high proportions of patients exhibiting strong evidence of having treatment effects which differ from the overall treatment effect.
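
    As a hedged illustration of the individualized-treatment-effect contrast described above, the sketch below fits an ordinary least-squares accelerated-failure-time surrogate on simulated, uncensored log survival times; it stands in for the BART plus Dirichlet-process-mixture model, and the data-generating process and variable names are invented.

        # Estimating individualized treatment effects on the log survival time scale
        # with a least-squares AFT surrogate (ignoring censoring):
        # ITE(x_i) = E[log T | x_i, trt = 1] - E[log T | x_i, trt = 0].
        import numpy as np

        rng = np.random.default_rng(1)
        n = 2000
        x = rng.normal(size=n)                        # a single baseline covariate
        trt = rng.integers(0, 2, size=n)              # randomized treatment indicator
        # Simulated truth: the treatment effect varies with x (heterogeneous effect).
        log_t = 1.0 + 0.5 * x + trt * (0.3 - 0.4 * x) + rng.normal(scale=0.3, size=n)

        # Fit log T ~ 1 + x + trt + x*trt by ordinary least squares.
        design = np.column_stack([np.ones(n), x, trt, x * trt])
        beta, *_ = np.linalg.lstsq(design, log_t, rcond=None)

        # Predict under treatment and under control for every patient and contrast.
        d1 = np.column_stack([np.ones(n), x, np.ones(n), x])
        d0 = np.column_stack([np.ones(n), x, np.zeros(n), np.zeros(n)])
        ite = d1 @ beta - d0 @ beta
        print("mean ITE:", ite.mean(), "  sd of ITE (heterogeneity):", ite.std())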

    Parsimonious Topic Models with Salient Word Discovery

    We propose a parsimonious topic model for text corpora. In related models such as Latent Dirichlet Allocation (LDA), all words are modeled topic-specifically, even though many words occur with similar frequencies across different topics. Our modeling determines salient words for each topic, which have topic-specific probabilities, with the rest explained by a universal shared model. Further, in LDA all topics are in principle present in every document. By contrast, our model gives a sparse topic representation, determining the (small) subset of relevant topics for each document. We derive a Bayesian Information Criterion (BIC), balancing model complexity and goodness of fit. Here, interestingly, we identify an effective sample size and corresponding penalty specific to each parameter type in our model. We minimize BIC to jointly determine our entire model -- the topic-specific words, document-specific topics, all model parameter values, and the total number of topics -- in a wholly unsupervised fashion. Results on three text corpora and an image dataset show that our model achieves higher test set likelihood and better agreement with ground-truth class labels, compared to LDA and to a model designed to incorporate sparsity.
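
    The sketch below illustrates model-order selection by minimizing BIC in general, using a Gaussian mixture and scikit-learn's generic BIC; it does not reproduce the paper's topic model or its parameter-type-specific effective sample sizes and penalties.

        # Selecting the number of mixture components by minimizing BIC, as a generic
        # stand-in for the paper's BIC-driven topic model selection.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(2)
        X = np.vstack([rng.normal(loc=m, scale=0.5, size=(200, 2)) for m in (-3, 0, 3)])

        bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
               for k in range(1, 8)}
        best_k = min(bic, key=bic.get)
        print("BIC per k:", {k: round(v, 1) for k, v in bic.items()})
        print("selected number of components:", best_k)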

    Stochastic Search with an Observable State Variable

    In this paper we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel-based weights and, more generally, that nonparametric estimation methods provide good solutions to otherwise intractable problems.
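
    A minimal sketch of the kernel-weighting idea for a single-product newsvendor with an observable state; the Gaussian kernel, bandwidth, and cost parameters are assumptions, and the Dirichlet-process-based weights are not shown.

        # Kernel-based weighting for a state-dependent (single-product) newsvendor:
        # weight historical demands by a Gaussian kernel on the observed state and
        # order the weighted quantile of demand at the critical ratio.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 5000
        state = rng.uniform(0.0, 1.0, size=n)              # exogenous state variable
        demand = rng.poisson(lam=20 + 30 * state)           # demand depends on the state

        def newsvendor_order(query_state, price=5.0, cost=3.0, bandwidth=0.05):
            w = np.exp(-0.5 * ((state - query_state) / bandwidth) ** 2)
            w /= w.sum()                                    # kernel weights on past states
            critical_ratio = (price - cost) / price         # underage / (underage + overage)
            order = np.argsort(demand)                      # weighted demand quantile
            cum_w = np.cumsum(w[order])
            return demand[order][np.searchsorted(cum_w, critical_ratio)]

        print("order for state 0.1:", newsvendor_order(0.1))
        print("order for state 0.9:", newsvendor_order(0.9))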

    Poisson Kernel-Based Clustering on the Sphere: Convergence Properties, Identifiability, and a Method of Sampling

    Many applications of interest involve data that can be analyzed as unit vectors on a d-dimensional sphere. Specific examples include text mining, in particular clustering of documents, biology, astronomy and medicine, among others. Previous work has proposed a clustering method using mixtures of Poisson kernel-based distributions (PKBD) on the sphere. We prove identifiability of mixtures of the aforementioned model, convergence of the associated EM-type algorithm and study its operational characteristics. Furthermore, we propose an empirical densities distance plot for estimating the number of clusters in a PKBD model. Finally, we propose a method to simulate data from Poisson kernel-based densities and exemplify our methods via application to real data sets and simulation experiments.
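
    A sketch of the E-step of an EM-type algorithm for a two-component PKBD mixture, assuming the density is proportional to (1 - rho^2) / ||x - rho*mu||^d on the unit sphere so that the shared normalizing constant cancels in the responsibilities; the parameter values below are made up.

        # E-step responsibilities for a two-component mixture of Poisson kernel-based
        # distributions on the unit sphere, with responsibilities proportional to
        # pi_j * (1 - rho_j**2) / ||x - rho_j * mu_j||**d.
        import numpy as np

        rng = np.random.default_rng(4)
        d = 3
        X = rng.normal(size=(500, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)       # put the data on the sphere

        mus = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # made-up current parameters
        rhos = np.array([0.6, 0.8])
        pis = np.array([0.5, 0.5])

        def pkbd_unnormalized(X, mu, rho):
            return (1.0 - rho ** 2) / np.linalg.norm(X - rho * mu, axis=1) ** d

        resp = np.column_stack([pi * pkbd_unnormalized(X, mu, rho)
                                for pi, mu, rho in zip(pis, mus, rhos)])
        resp /= resp.sum(axis=1, keepdims=True)              # E-step responsibilities
        print(resp[:5])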

    Shared kernel Bayesian screening

    This article concerns testing for equality of distribution between groups. We focus on screening variables with shared distributional features such as common support, modes and patterns of skewness. We propose a Bayesian testing method using kernel mixtures, which improves performance by borrowing information across the different variables and groups through shared kernels and a common probability of group differences. The inclusion of shared kernels in a finite mixture, with Dirichlet priors on the weights, leads to a simple framework for testing that scales well for high-dimensional data. We provide closed asymptotic forms for the posterior probability of equivalence in two groups and prove consistency under model misspecification. The method is applied to DNA methylation array data from a breast cancer study, and compares favorably to competitors when type I error is estimated via permutation. Comment: Author version of article published in Biometrika; 23 pages, 9 figures.
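
    A crude discretized analogue of the shared-component idea: shared histogram bins stand in for shared kernels, each group's bin weights get a symmetric Dirichlet prior, and a closed-form Dirichlet-multinomial Bayes factor compares "separate weight vectors" against "one common weight vector". This is not the paper's kernel mixture or its common probability of group differences.

        # Discretized analogue of screening for a group difference in distribution.
        import numpy as np
        from scipy.special import gammaln

        def log_dir_mult(counts, alpha=1.0):
            # log marginal likelihood of an observation sequence with these bin counts
            # under a Dirichlet(alpha, ..., alpha) prior on the bin weights
            a = np.full(counts.shape, alpha)
            return (gammaln(a.sum()) - gammaln(a.sum() + counts.sum())
                    + np.sum(gammaln(a + counts) - gammaln(a)))

        rng = np.random.default_rng(5)
        bins = np.linspace(-4.0, 4.0, 21)                   # shared bins ("kernels")
        g1 = rng.normal(0.0, 1.0, size=300)
        g2 = rng.normal(0.5, 1.0, size=300)                  # shifted second group
        c1, _ = np.histogram(g1, bins=bins)
        c2, _ = np.histogram(g2, bins=bins)

        log_bf = log_dir_mult(c1) + log_dir_mult(c2) - log_dir_mult(c1 + c2)
        print("log Bayes factor (different vs. equal):", log_bf)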

    MCMC Inference for a Model with Sampling Bias: An Illustration using SAGE data

    This paper explores Bayesian inference for a biased sampling model in situations where the population of interest cannot be sampled directly, but rather through an indirect and inherently biased method. Observations are viewed as being the result of a multinomial sampling process from a tagged population which is, in turn, a biased sample from the original population of interest. This paper presents several Gibbs sampling techniques to estimate the joint posterior distribution of the original population based on the observed counts of the tagged population. These algorithms efficiently sample from the joint posterior distribution of a very large multinomial parameter vector. Samples from this method can be used to generate both joint and marginal posterior inferences. We also present an iterative optimization procedure based upon the conditional distributions of the Gibbs sampler which directly computes the mode of the posterior distribution. To illustrate our approach, we apply it to a tagged population of messenger RNAs (mRNA) generated using a common high-throughput technique, Serial Analysis of Gene Expression (SAGE). Inferences for the mRNA expression levels in the yeast Saccharomyces cerevisiae are reported.
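
    The sketch below is not the paper's Gibbs samplers; it is a small random-walk Metropolis sampler for a simplified biased-multinomial model in which observation probabilities are proportional to known bias weights times the true proportions, with the bias weights, prior, and tuning all assumed for illustration.

        # Random-walk Metropolis for a simplified biased multinomial model: true
        # proportions p have a Dirichlet prior, but categories are observed with
        # probability proportional to w_i * p_i for known bias weights w.  Sampling
        # is on the additive log-ratio scale (last coordinate fixed at 0); the
        # log-Jacobian of that transform combines with the Dirichlet(1,...,1) prior
        # into sum(alpha * log p).
        import numpy as np

        rng = np.random.default_rng(6)
        w = np.array([1.0, 2.0, 0.5, 1.5, 1.0])             # assumed known bias weights
        p_true = np.array([0.3, 0.1, 0.3, 0.1, 0.2])
        q_true = w * p_true / np.sum(w * p_true)             # biased observation probabilities
        counts = rng.multinomial(2000, q_true)
        alpha = np.ones(len(w))                               # Dirichlet(1, ..., 1) prior

        def to_simplex(theta_free):
            theta = np.append(theta_free, 0.0)
            p = np.exp(theta - theta.max())
            return p / p.sum()

        def log_post(theta_free):
            p = to_simplex(theta_free)
            q = w * p / np.sum(w * p)
            return np.sum(counts * np.log(q)) + np.sum(alpha * np.log(p))

        theta, draws = np.zeros(len(w) - 1), []
        for _ in range(20_000):
            prop = theta + 0.1 * rng.normal(size=theta.size)
            if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
                theta = prop
            draws.append(to_simplex(theta))

        print("posterior mean of p:", np.mean(draws[5000:], axis=0).round(3))
        print("true p:             ", p_true)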

    Test for the statistical significance of a treatment effect in the presence of hidden sub-populations

    For testing the statistical significance of a treatment effect, we usually compare between two parts of a population, one exposed to the treatment and the other not exposed to it. Standard parametric and nonparametric two-sample tests are often used for this comparison. But direct applications of these tests can yield misleading results, especially when the population has some hidden sub-populations, and the impact of this sub-population difference on the study variables dominates the treatment effect. This problem becomes more evident if these sub-populations have widely different proportions of representatives in the samples taken from these two parts, which are often referred to as the treatment group and the control group. In this article, we attempt to overcome this problem. Our proposed methods use suitable clustering algorithms to find the hidden sub-populations and then eliminate the sub-population effect by using suitable transformations. Standard two-sample tests, when applied to the transformed data, yield better results. Some simulated and real data sets are analyzed to show the utility of the proposed methods. Comment: This paper has been presented at the 'Contemporary Issues and Applications of Statistics' conference held at the Indian Statistical Institute, Kolkata.
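
    A sketch of the two-stage idea with illustrative choices: k-means as the clustering step, cluster-mean centering as the transformation, and Welch's t-test as the standard two-sample test; these are not necessarily the authors' choices.

        # Two-stage sketch: recover hidden sub-populations by clustering the pooled
        # data, remove the sub-population effect by centering each observation at its
        # cluster mean, then run a standard two-sample test on the residuals.  Here
        # there is no true treatment effect, but the two sub-populations are unequally
        # represented in the two groups, so the naive test is badly misled.
        import numpy as np
        from scipy import stats
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(7)
        control = np.concatenate([rng.normal(0, 1, 150), rng.normal(6, 1, 50)])
        treatment = np.concatenate([rng.normal(0, 1, 50), rng.normal(6, 1, 150)])

        pooled = np.concatenate([control, treatment]).reshape(-1, 1)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pooled)
        cluster_means = np.array([pooled[labels == c].mean() for c in range(2)])
        residuals = pooled.ravel() - cluster_means[labels]

        r_control, r_treatment = residuals[:len(control)], residuals[len(control):]
        print("naive test p-value:      ",
              stats.ttest_ind(control, treatment, equal_var=False).pvalue)
        print("transformed test p-value:",
              stats.ttest_ind(r_control, r_treatment, equal_var=False).pvalue)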

    DGEclust: differential expression analysis of clustered count data

    Most published studies on the statistical analysis of count data generated by next-generation sequencing technologies have paid surprisingly little attention to cluster analysis. We present a statistical methodology (DGEclust) for clustering digital expression data, which (contrary to alternative methods) simultaneously addresses the problem of model selection (i.e. how many clusters are supported by the data) and uncertainty in parameter estimation. We show how this methodology can be utilised in differential expression analysis, and we demonstrate its applicability to a more general class of problems and its higher accuracy compared to popular alternatives. DGEclust is freely available at https://bitbucket.org/DimitrisVavoulis/dgeclust. Comment: 26 pages, 7 figures.
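
    As a loose illustration of letting the data determine the number of clusters, the sketch below fits a truncated Dirichlet-process Gaussian mixture to log-transformed counts with scikit-learn; DGEclust itself models the raw counts hierarchically and derives differential-expression calls from the posterior clusterings, none of which is reproduced here.

        # Letting the data choose the number of clusters with a truncated
        # Dirichlet-process Gaussian mixture on log counts.
        import numpy as np
        from sklearn.mixture import BayesianGaussianMixture

        rng = np.random.default_rng(8)
        counts = np.concatenate([rng.negative_binomial(5, 0.05, 300),    # three latent
                                 rng.negative_binomial(5, 0.01, 300),    # expression
                                 rng.negative_binomial(5, 0.002, 300)])  # levels
        X = np.log1p(counts).reshape(-1, 1)

        dpgmm = BayesianGaussianMixture(
            n_components=15,                                  # truncation level
            weight_concentration_prior_type="dirichlet_process",
            max_iter=500, random_state=0).fit(X)
        used = np.unique(dpgmm.predict(X)).size
        print("clusters actually used:", used, "of", dpgmm.n_components)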

    Binomial and Multinomial Proportions: Accurate Estimation and Reliable Assessment of Accuracy

    Misestimates of $\sigma_{P_o}$, the uncertainty in $P_o$ from a 2-state Bayes equation used for binary classification, apparently arose from $\hat{\sigma}_{p_i}$, the uncertainty in underlying pdfs estimated from experimental $b$-bin histograms. To address this, several Bayesian estimator pairs $(\hat{p}_i, \hat{\sigma}_{p_i})$ were compared for agreement between nominal confidence level ($\xi$) and calculated coverage values ($C$). Large $\xi$-to-$C$ inconsistency for large $b$ and $p_i \gg \frac{1}{b}$ arises for all multinomial estimators since priors downweight low-likelihood, high $p_i$ values. To improve $\xi$-to-$C$ matching, $(\xi - C)^2$ was minimized against $\alpha_0$ in a more general prior pdf ($\mathcal{B}[\alpha_0, (b-1)\alpha_0; x]$) to obtain $(\hat{p_i})_{\xi \leftrightarrow C}$. This improved matching for $b = 2$, but for $b > 2$, $\xi$-to-$C$ matching by $(\hat{p_i})_{\xi \leftrightarrow C}$ required an effective value "$b = 2$" and renormalization, and this reduced $\hat{p}_i$-to-$p_i$ matching. Better $\hat{p}_i$-to-$p_i$ matching came from the original multinomial estimators, a new discrete-domain estimator $\hat{p}(n_i, N)$, or an earlier joint estimator $(\hat{p_i})_{\bowtie}$ that co-adjusted all estimates $p_i$ for James-Stein shrinkage to a mean vector. The best simultaneous $\xi$-to-$C$ and $\hat{p}_i$-to-$p_i$ matching came from de-noising initial estimates of the underlying pdfs. For $b = 100$, $N < 12800$, de-noised $\hat{p}$ needed $\approx 10\times$ fewer observations to achieve $\hat{p}_i$-to-$p_i$ matching equivalent to that found for $\hat{p}(n_i, N)$, $(\hat{p_i})_{\bowtie}$ or the original multinomial $\hat{p}_i$. De-noising each different type of initial estimate yielded similarly high accuracy in Monte-Carlo tests. Comment: 61 pages, 24 figures; small changes occurred (Figs 13-18, A1 & A2, Tables 1, S1) after fixing a slight bug in the source code. For comparison, version (N-1) prior to fixing the bug is at: http://www.researchgate.net/profile/Jonathan_Friedma
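
    A small Monte-Carlo check of nominal confidence level $\xi$ against realized coverage $C$ for one multinomial cell under a symmetric Dirichlet($\alpha_0$) prior, whose marginal posterior is the Beta distribution given in the code; $b$, $N$, $\alpha_0$, and the true pmf are illustrative, and none of the paper's de-noised or joint estimators are implemented.

        # Monte-Carlo comparison of the nominal confidence level (xi) with the
        # realized coverage (C) of an equal-tailed credible interval for one
        # multinomial cell, using the marginal posterior
        # Beta(alpha0 + n_i, (b - 1) * alpha0 + N - n_i).
        import numpy as np
        from scipy.stats import beta

        rng = np.random.default_rng(9)
        b, N, alpha0, xi = 100, 1600, 0.5, 0.95
        p = rng.dirichlet(np.full(b, 0.1))                  # a skewed "true" pmf

        hits, trials = 0, 2000
        for _ in range(trials):
            n = rng.multinomial(N, p)
            a_post = alpha0 + n[0]
            b_post = (b - 1) * alpha0 + N - n[0]
            lo = beta.ppf((1 - xi) / 2, a_post, b_post)
            hi = beta.ppf(1 - (1 - xi) / 2, a_post, b_post)
            hits += (lo <= p[0] <= hi)

        print(f"nominal xi = {xi}, realized coverage C = {hits / trials:.3f}")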