Revisiting the Gelman-Rubin Diagnostic
Gelman and Rubin's (1992) convergence diagnostic is one of the most popular
methods for terminating a Markov chain Monte Carlo (MCMC) sampler. Since the
seminal paper, researchers have developed sophisticated methods for estimating
variance of Monte Carlo averages. We show that these estimators find immediate
use in the Gelman-Rubin statistic, a connection not previously established in
the literature. We incorporate these estimators to upgrade both the univariate
and multivariate Gelman-Rubin statistics, leading to improved stability in MCMC
termination time. An immediate advantage is that our new Gelman-Rubin statistic
can be calculated for a single chain. In addition, we establish a one-to-one
relationship between the Gelman-Rubin statistic and effective sample size.
Leveraging this relationship, we develop a principled termination criterion for
the Gelman-Rubin statistic. Finally, we demonstrate the utility of our improved
diagnostic via examples.
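For orientation, the sketch below computes the classic multi-chain Gelman-Rubin statistic in its commonly used simplified form (without the degrees-of-freedom correction). It is not the improved estimator the abstract describes, which replaces the variance estimate and also works for a single chain. The function name `gelman_rubin` is an illustrative assumption.

```python
import statistics
from math import sqrt


def gelman_rubin(chains):
    """Classic potential scale reduction factor for m chains of equal length n.

    Simplified textbook form; NOT the improved estimator from the abstract.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")

    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])

    # Pooled estimate of the target variance, then the ratio to W.
    pooled = (n - 1) / n * within + between_over_n
    return sqrt(pooled / within)


if __name__ == "__main__":
    agreeing = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 3.0, 2.0]]
    disagreeing = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
    print(gelman_rubin(agreeing))
    print(gelman_rubin(disagreeing))
```

Values near 1 suggest the chains agree; chains centred in different places give a much larger value.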
Deep unsupervised clustering with Gaussian mixture variational autoencoders
We study a variant of the variational autoencoder model with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the standard variational approach in these models is unsuited for unsupervised clustering, and mitigate this problem by leveraging a principled information-theoretic regularisation term known as consistency violation. Adding this term to the standard variational optimisation objective yields networks with both meaningful internal representations and well-defined clusters. We demonstrate the performance of this scheme on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct and interpretable, and that they achieve higher unsupervised clustering performance than previous approaches.
Kernel estimators of asymptotic variance for adaptive Markov chain Monte Carlo
We study the asymptotic behavior of kernel estimators of asymptotic variances
(or long-run variances) for a class of adaptive Markov chains. The convergence
is studied both in L^p and almost surely. The results also apply to Markov
chains and improve on the existing literature by imposing weaker conditions. We
illustrate the results with applications to the GARCH(1,1)
Markov model and to an adaptive MCMC algorithm for Bayesian logistic
regression. Comment: Published at http://dx.doi.org/10.1214/10-AOS828 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
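As a rough illustration of what a kernel estimator of a long-run variance looks like, the sketch below uses the Bartlett (triangular) kernel on a plain sequence of draws. The kernel choice, the `bandwidth` parameter, and the function names are assumptions made for illustration; the paper's conditions for adaptive chains are not reproduced here.

```python
def autocovariance(x, lag):
    """Biased sample autocovariance of x at the given lag (divides by n)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def bartlett_long_run_variance(x, bandwidth):
    """Kernel estimate: gamma_0 + 2 * sum_{k=1}^{b-1} (1 - k/b) * gamma_k."""
    if not 1 <= bandwidth <= len(x):
        raise ValueError("bandwidth must lie between 1 and len(x)")
    total = autocovariance(x, 0)
    for k in range(1, bandwidth):
        weight = 1.0 - k / bandwidth  # Bartlett weight at lag k
        total += 2.0 * weight * autocovariance(x, k)
    return total


if __name__ == "__main__":
    draws = [1.0, 2.0, 3.0, 4.0]
    print(bartlett_long_run_variance(draws, 1))  # lag-0 term only
    print(bartlett_long_run_variance(draws, 2))  # adds a down-weighted lag-1 term
```

With `bandwidth=1` the estimate reduces to the lag-0 autocovariance; larger bandwidths add progressively down-weighted higher lags.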
Multiorder neurons for evolutionary higher-order clustering and growth
This letter proposes using multiorder neurons to cluster irregularly shaped data arrangements. Multiorder neurons are an evolutionary extension of the use of higher-order neurons in clustering. Higher-order neurons parametrically model complex neuron shapes by replacing the classic synaptic weight with higher-order tensors. The multiorder neuron goes one step further and eliminates two problems associated with higher-order neurons. First, it uses evolutionary algorithms to select the best neuron order for a given problem. Second, it obtains more information about the underlying data distribution by identifying the correct order for a given cluster of patterns. Empirically, when clustering accuracy is measured by the correlation of the clusters found with ground truth information, the proposed evolutionary multiorder neurons method outperforms other related clustering methods. The simulation results from the Iris, Wine, and Glass data sets show significant improvement when compared to the results obtained using self-organizing maps and higher-order neurons. The letter also proposes an intuitive model by which multiorder neurons can be grown, thereby determining the number of clusters in data.
Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays
A volcano plot displays an unstandardized signal (e.g. log-fold-change) against
a noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from
the t test). We review the basic and interactive uses of the volcano plot,
and its crucial role in understanding the regularized t-statistic. The joint
filtering gene selection criterion based on regularized statistics has a curved
discriminant line in the volcano plot, as compared to the two perpendicular
lines for the "double filtering" criterion. This review attempts to provide a
unifying framework for discussions on alternative measures of differential
expression, improved methods for estimating variance, and visual display of a
microarray analysis result. We also discuss the possibility of applying volcano
plots to fields beyond microarrays. Comment: 8 figures
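The two axes of a volcano plot, and the "double filtering" rule with its two perpendicular cut-offs, can be sketched as follows. This assumes expression values are already on a log2 scale and uses an ordinary Welch t-statistic, not the regularized statistic the review discusses. The function names and thresholds are hypothetical.

```python
import statistics
from math import sqrt


def volcano_point(group_a, group_b):
    """Return (log fold change, Welch t-statistic) for one gene.

    Inputs are assumed to be log2 expression values; each group needs at
    least two replicates and non-zero variance overall.
    """
    diff = statistics.fmean(group_b) - statistics.fmean(group_a)
    std_err = sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return diff, diff / std_err


def double_filter(points, min_abs_lfc, min_abs_t):
    """Keep genes passing BOTH cut-offs (two perpendicular lines on the plot)."""
    return [
        gene
        for gene, (lfc, t_stat) in points.items()
        if abs(lfc) >= min_abs_lfc and abs(t_stat) >= min_abs_t
    ]


if __name__ == "__main__":
    points = {
        "gene_1": volcano_point([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
        "gene_2": volcano_point([1.0, 2.0, 3.0], [1.5, 2.0, 3.5]),
    }
    print(points)
    print(double_filter(points, min_abs_lfc=1.0, min_abs_t=2.0))
```

A joint criterion based on a regularized statistic would replace the two independent thresholds with a single curved boundary in the same plane.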
Bounding Optimality Gap in Stochastic Optimization via Bagging: Statistical Efficiency and Stability
We study a statistical method to estimate the optimal value, and the
optimality gap of a given solution for stochastic optimization as an assessment
of the solution quality. Our approach is based on bootstrap aggregating, or
bagging, resampled sample average approximation (SAA). We show how this
approach leads to valid statistical confidence bounds for non-smooth
optimization. We also demonstrate its statistical efficiency and stability that
are especially desirable in limited-data situations, and compare these
properties with some existing methods. We present our theory that views SAA as
a kernel in an infinite-order symmetric statistic, which can be approximated
via bagging. We substantiate our theoretical findings with numerical results.