Stochastic Variational Inference
We develop stochastic variational inference, a scalable algorithm for
approximating posterior distributions. We derive this technique for a large
class of probabilistic models and demonstrate it with two probabilistic
topic models, latent Dirichlet allocation and the hierarchical Dirichlet
process topic model. Using stochastic variational inference, we analyze several
large collections of documents: 300K articles from Nature, 1.8M articles from
The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can
easily handle data sets of this size and outperforms traditional variational
inference, which can only handle a smaller subset. (We also show that the
Bayesian nonparametric topic model outperforms its parametric counterpart.)
Stochastic variational inference lets us apply complex Bayesian models to
massive data sets.
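As a concrete illustration of the update at the heart of the algorithm, here is a minimal sketch in Python (function names and hyperparameter defaults are illustrative, not the authors' code): for conjugate models, each iteration rescales the sufficient statistics of a sampled minibatch as if the batch were the whole corpus, and blends the resulting noisy estimate of the optimal global parameter into the current one with a decaying step size.

    def svi_step(lam, eta, n_docs, minibatch, local_stats, t,
                 tau=1.0, kappa=0.7):
        """One stochastic variational inference step (illustrative sketch).

        lam: current global variational parameters
        eta: prior hyperparameter on the global variables
        local_stats(doc, lam): expected sufficient statistics for one
            document after fitting its local variational parameters
        """
        stats = sum(local_stats(doc, lam) for doc in minibatch)
        # Noisy estimate of the optimal global parameter: rescale the
        # minibatch statistics as if the batch were the full corpus.
        lam_hat = eta + (n_docs / len(minibatch)) * stats
        rho = (t + tau) ** (-kappa)  # step size; kappa in (0.5, 1]
        return (1.0 - rho) * lam + rho * lam_hat  # natural-gradient step

Iterating this step over random minibatches converges to a local optimum of the variational objective under the usual Robbins-Monro step-size conditions.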
Individualized Treatment Effects with Censored Data via Fully Nonparametric Bayesian Accelerated Failure Time Models
Individuals often respond differently to identical treatments, and
characterizing such variability in treatment response is an important aim in
the practice of personalized medicine. In this article, we describe a
non-parametric accelerated failure time model that can be used to analyze
heterogeneous treatment effects (HTE) when patient outcomes are time-to-event.
By utilizing Bayesian additive regression trees and a mean-constrained
Dirichlet process mixture model, our approach offers a flexible model for the
regression function while placing few restrictions on the baseline hazard. Our
non-parametric method leads to natural estimates of individual treatment effect
and has the flexibility to address many major goals of HTE assessment.
Moreover, our method requires little user input in terms of tuning parameter
selection or subgroup specification. We illustrate the merits of our proposed
approach with a detailed analysis of two large clinical trials for the
prevention and treatment of congestive heart failure using an
angiotensin-converting enzyme inhibitor. The analysis revealed considerable
evidence for the presence of HTE in both trials as demonstrated by substantial
estimated variation in treatment effect and by high proportions of patients
exhibiting strong evidence of treatment effects that differ from the overall
treatment effect.
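For orientation, a sketch of the model class in our own notation (the abstract does not fix symbols): an accelerated failure time model puts the regression structure on log survival time, and the individualized treatment effect is a difference of regression surfaces,

    \log T_i = f(x_i, A_i) + \varepsilon_i,
    \qquad \varepsilon_i \stackrel{iid}{\sim} F, \quad E[\varepsilon_i] = 0,
    \qquad \Delta(x) = f(x, 1) - f(x, 0),

where f is given a Bayesian additive regression trees prior, F is the mean-constrained Dirichlet process mixture, x denotes covariates, and A the treatment indicator.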
Parsimonious Topic Models with Salient Word Discovery
We propose a parsimonious topic model for text corpora. In related models
such as Latent Dirichlet Allocation (LDA), all words are modeled
topic-specifically, even though many words occur with similar frequencies
across different topics. Our model instead determines the salient words for
each topic, which have topic-specific probabilities, with the rest explained by
a universal shared model. Further, in LDA all topics are in principle present
in every document. By contrast, our model gives a sparse topic representation, determining
the (small) subset of relevant topics for each document. We derive a Bayesian
Information Criterion (BIC), balancing model complexity and goodness of fit.
Here, interestingly, we identify an effective sample size and corresponding
penalty specific to each parameter type in our model. We minimize BIC to
jointly determine our entire model -- the topic-specific words,
document-specific topics, all model parameter values, {\it and} the total
number of topics -- in a wholly unsupervised fashion. Results on three text
corpora and an image dataset show that our model achieves higher test-set
likelihood and better agreement with ground-truth class labels than LDA and
than a model designed to incorporate sparsity.
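A hedged sketch of how such a criterion is evaluated (the per-type grouping below is illustrative; the paper derives the exact effective sample sizes): each parameter type contributes a penalty scaled by its own effective sample size, and the candidate model with the smallest criterion value is kept.

    import numpy as np

    def bic(log_likelihood, param_groups):
        """BIC with type-specific penalties (illustrative sketch).

        param_groups: (num_free_params, effective_sample_size) pairs,
        one per parameter type (e.g. salient topic-word probabilities,
        document-specific topic proportions, ...).
        """
        penalty = sum(0.5 * q * np.log(n) for q, n in param_groups)
        return -log_likelihood + penalty  # minimize across candidates

Minimizing this value over candidate configurations jointly selects the salient words, the per-document topics, and the number of topics.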
Stochastic Search with an Observable State Variable
In this paper we study convex stochastic search problems where a noisy
objective function value is observed after a decision is made. There are many
stochastic search problems whose behavior depends on an exogenous state
variable which affects the shape of the objective function. Currently, there is
no general-purpose algorithm for this class of problems. We use nonparametric
density estimation on observations from the joint state-outcome distribution
to infer the optimal decision for a
given query state. We propose two solution methods that depend on the problem
characteristics: function-based and gradient-based optimization. We examine two
weighting schemes, kernel-based weights and Dirichlet process-based weights,
for use with the solution methods. The weights and solution methods are tested
on a synthetic multi-product newsvendor problem and the hour-ahead wind
commitment problem. Our results show that in some cases Dirichlet process
weights offer substantial benefits over kernel-based weights and, more
generally, that nonparametric estimation methods provide good solutions to
otherwise intractable problems.
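A minimal sketch of the function-based idea on the newsvendor example (the Gaussian kernel and all names are our stand-ins): weight historical (state, demand) pairs by how close their state is to the query state, then order the critical-ratio quantile of the weighted demand distribution.

    import numpy as np

    def weighted_newsvendor(states, demands, query, h, price, cost):
        """Kernel-weighted newsvendor decision at a query state (sketch)."""
        w = np.exp(-0.5 * ((states - query) / h) ** 2)  # Gaussian kernel
        w /= w.sum()
        crit = (price - cost) / price              # critical ratio
        order = np.argsort(demands)
        cdf = np.cumsum(w[order])                  # weighted demand CDF
        idx = min(np.searchsorted(cdf, crit), len(cdf) - 1)
        return demands[order][idx]                 # weighted quantile

Dirichlet process-based weights would replace the kernel weights w with weights derived from a posterior over partitions of the state space.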
Poisson Kernel-Based Clustering on the Sphere: Convergence Properties, Identifiability, and a Method of Sampling
Many applications of interest involve data that can be analyzed as unit
vectors on a d-dimensional sphere. Specific examples include text mining (in
particular, clustering of documents), biology, astronomy, and medicine, among
others. Previous work has proposed a clustering method using mixtures of
Poisson kernel-based distributions (PKBD) on the sphere. We prove
identifiability of mixtures of this model and convergence of the associated
EM-type algorithm, and we study its operational characteristics.
Furthermore, we propose an empirical densities distance plot for estimating the
number of clusters in a PKBD model. Finally, we propose a method to simulate
data from Poisson kernel-based densities, and we exemplify our methods via
application to real data sets and simulation experiments.
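For reference, a sketch of the density and the E-step it induces (our reading of the PKBD; the normalizing constant is omitted since it cancels in the responsibilities): for unit vectors x and mu in R^d and concentration 0 <= rho < 1, the density is proportional to (1 - rho^2) / ||x - rho*mu||^d.

    import numpy as np

    def pkbd_logpdf(x, mu, rho):
        """Unnormalized log PKBD density on the unit sphere (sketch)."""
        d = x.shape[-1]
        sq = 1.0 + rho**2 - 2.0 * rho * (x @ mu)  # ||x - rho*mu||^2
        return np.log1p(-rho**2) - 0.5 * d * np.log(sq)

    def responsibilities(x, mus, rhos, weights):
        """E-step of an EM-type algorithm for a PKBD mixture (sketch)."""
        logp = np.stack([np.log(w) + pkbd_logpdf(x, m, r)
                         for m, r, w in zip(mus, rhos, weights)], axis=-1)
        logp -= logp.max(axis=-1, keepdims=True)  # numerical stability
        p = np.exp(logp)
        return p / p.sum(axis=-1, keepdims=True)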
Shared kernel Bayesian screening
This article concerns testing for equality of distribution between groups. We
focus on screening variables with shared distributional features such as common
support, modes and patterns of skewness. We propose a Bayesian testing method
using kernel mixtures, which improves performance by borrowing information
across the different variables and groups through shared kernels and a common
probability of group differences. The inclusion of shared kernels in a finite
mixture, with Dirichlet priors on the weights, leads to a simple framework for
testing that scales well for high-dimensional data. We provide closed
asymptotic forms for the posterior probability of equivalence in two groups and
prove consistency under model misspecification. The method is applied to DNA
methylation array data from a breast cancer study, and compares favorably to
competitors when type I error is estimated via permutation.
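A small sketch of the permutation check mentioned in the last sentence (generic, not the paper's code): rerun the full testing method on data whose group labels have been randomly permuted, so that any detected difference is a false positive, and report the rejection frequency.

    import numpy as np

    def type1_error(flags_difference, x, labels, n_perm=500, seed=0):
        """Estimate type I error of a two-group test by permutation.

        flags_difference(x, labels) -> bool: the full testing method.
        """
        rng = np.random.default_rng(seed)
        rejections = [flags_difference(x, rng.permutation(labels))
                      for _ in range(n_perm)]
        return float(np.mean(rejections))  # false-positive rate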
MCMC Inference for a Model with Sampling Bias: An Illustration using SAGE data
This paper explores Bayesian inference for a biased sampling model in
situations where the population of interest cannot be sampled directly, but
rather through an indirect and inherently biased method. Observations are
viewed as being the result of a multinomial sampling process from a tagged
population which is, in turn, a biased sample from the original population of
interest. This paper presents several Gibbs sampling techniques to estimate the
joint posterior distribution of the original population based on the observed
counts of the tagged population. These algorithms efficiently sample from the
joint posterior distribution of a very large multinomial parameter vector.
Samples from this method can be used to generate both joint and marginal
posterior inferences. We also present an iterative optimization procedure based
upon the conditional distributions of the Gibbs sampler, which directly computes
the mode of the posterior distribution. To illustrate our approach, we apply it
to a tagged population of messenger RNAs (mRNA) generated using a common
high-throughput technique, Serial Analysis of Gene Expression (SAGE).
Inferences for the mRNA expression levels in the yeast Saccharomyces cerevisiae
are reported.
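A minimal sketch of the conjugate core (a simplification: the tagging bias is taken as known here, whereas the paper treats the full hierarchy): draw tagged-population proportions from their Dirichlet posterior, then invert the bias and renormalize to obtain draws for the original population.

    import numpy as np

    def original_population_draws(counts, bias, alpha=1.0,
                                  n_draws=1000, seed=0):
        """Posterior draws of original-population proportions (sketch).

        counts: observed tag counts (multinomial data)
        bias:   relative probability that each class enters the
                tagged population (assumed known here)
        """
        rng = np.random.default_rng(seed)
        # Dirichlet posterior for the *tagged* population proportions
        theta = rng.dirichlet(alpha + counts, size=n_draws)
        p = theta / bias                  # invert the sampling bias
        return p / p.sum(axis=1, keepdims=True)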
Test for the statistical significance of a treatment effect in the presence of hidden sub-populations
For testing the statistical significance of a treatment effect, we usually
compare two parts of a population: one exposed to the treatment and the other
not. Standard parametric and nonparametric
two-sample tests are often used for this comparison. But direct applications of
these tests can yield misleading results, especially when the population has
some hidden sub-populations, and the impact of this sub-population difference
on the study variables dominates the treatment effect. This problem becomes
more evident if these sub-populations have widely different proportions of
representatives in the samples taken from these two parts, which are often
referred to as the treatment group and the control group. In this article, we
attempt to overcome this problem. Our proposed methods use suitable
clustering algorithms to find the hidden sub-populations and then eliminate the
sub-population effect by using suitable transformations. Standard two-sample
tests, when applied to the transformed data, yield better results.
Some simulated and real data sets are analyzed to show the utility of the
proposed methods.
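A rough sketch of the pipeline described above (the clustering algorithm, transformation, and test are our stand-ins; the paper's choices may differ): cluster the pooled data to find hidden sub-populations, center each cluster to remove the sub-population shift, then run a standard two-sample test on the transformed values.

    import numpy as np
    from scipy.stats import mannwhitneyu
    from sklearn.cluster import KMeans

    def cluster_adjusted_test(x_treat, x_ctrl, n_clusters=2, seed=0):
        """Two-sample test after removing sub-population shifts (sketch)."""
        x = np.concatenate([x_treat, x_ctrl]).reshape(-1, 1)
        labels = KMeans(n_clusters, random_state=seed,
                        n_init=10).fit_predict(x)
        # Subtract each cluster's mean so that sub-population location
        # differences no longer dominate the treatment effect.
        means = np.array([x[labels == k].mean() for k in range(n_clusters)])
        centered = x.ravel() - means[labels]
        n = len(x_treat)
        return mannwhitneyu(centered[:n], centered[n:])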
DGEclust: differential expression analysis of clustered count data
Most published studies on the statistical analysis of count data generated by
next-generation sequencing technologies have paid surprisingly little attention
to cluster analysis. We present a statistical methodology (DGEclust) for
clustering digital expression data, which (contrary to alternative methods)
simultaneously addresses the problem of model selection (i.e. how many clusters
are supported by the data) and uncertainty in parameter estimation. We show how
this methodology can be utilised in differential expression analysis, and we
demonstrate its applicability to a more general class of problems and its
higher accuracy compared to popular alternatives. DGEclust is freely available
at https://bitbucket.org/DimitrisVavoulis/dgeclust.
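One plausible reading of the clustering-to-differential-expression step (a guess at the mechanics, not DGEclust's exact rule): given posterior samples of cluster labels for each gene under two conditions, rank genes by the posterior probability that the conditions occupy different clusters.

    import numpy as np

    def de_probability(labels_a, labels_b):
        """Posterior probability of differential expression per gene.

        labels_a, labels_b: (n_samples, n_genes) arrays of cluster
        labels drawn from the posterior for conditions A and B.
        """
        return np.mean(labels_a != labels_b, axis=0)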
Binomial and Multinomial Proportions: Accurate Estimation and Reliable Assessment of Accuracy
Misestimates of the uncertainty in the output of a 2-state Bayes equation used
for binary classification apparently arose from the uncertainty in the
underlying pdfs estimated from experimental binned histograms. To address this,
several pairs of Bayesian estimators were compared for agreement between the
nominal confidence level and the calculated coverage. A large
nominal-to-coverage inconsistency at high uncertainty arises for all
multinomial estimators, since the priors downweight low-likelihood,
high-uncertainty values. Tuning the exponent of a more general prior pdf to
minimize the mismatch improved the matching over part of the parameter range,
but elsewhere required an effective sample size and renormalization, which in
turn degraded the matching for the uncertainty estimates. Better
nominal-to-coverage matching came from the original multinomial estimators, a
new discrete-domain estimator, or an earlier joint estimator that co-adjusted
all estimates by James-Stein shrinkage toward a mean vector. The best
simultaneous matching for both the estimates and their uncertainties came from
de-noising the initial estimates of the underlying pdfs: the de-noised
estimators needed fewer observations to achieve matching equivalent to that of
the alternatives, and de-noising each different type of initial estimate
yielded similarly high accuracy in Monte-Carlo tests.
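A small sketch of the nominal-versus-actual coverage comparison that drives the abstract (a standard check, not the paper's code): for a chosen Beta prior, simulate binomial counts at a known proportion, build the central credible interval at the nominal level, and record how often it covers the truth.

    import numpy as np
    from scipy.stats import beta

    def coverage(p_true, n, level=0.95, a0=0.5, b0=0.5,
                 n_sims=10_000, seed=0):
        """Actual coverage of a Beta(a0, b0)-prior credible interval."""
        rng = np.random.default_rng(seed)
        k = rng.binomial(n, p_true, size=n_sims)
        lo = beta.ppf((1 - level) / 2, a0 + k, b0 + n - k)
        hi = beta.ppf((1 + level) / 2, a0 + k, b0 + n - k)
        return np.mean((lo <= p_true) & (p_true <= hi))

Plotting this actual coverage against the nominal level across p_true and n exposes exactly the kind of confidence-to-coverage mismatch the abstract describes.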
- …