Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics,
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far smaller than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample
complexity, however, has received relatively less attention, especially in the
setting where the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last applies to exa-scale data
dimensions. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
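
As a minimal sketch of the pairwise correlation screening task described above (the dimensions, placeholder data, and threshold rho below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample-starved regime: p variables, n << p replicates.
n, p = 20, 2000
X = rng.standard_normal((n, p))      # placeholder data (rows = samples)

# Sample correlation matrix of the p variables.
Xc = X - X.mean(axis=0)
Xc /= np.linalg.norm(Xc, axis=0)     # unit-norm columns
R = Xc.T @ Xc                        # p x p matrix of pairwise correlations

# Correlation mining: screen for variable pairs whose sample correlation
# exceeds a threshold; in the fixed-n, growing-p regime the threshold must
# grow with p to control spurious discoveries.
rho = 0.9
iu = np.triu_indices(p, k=1)
mask = np.abs(R[iu]) > rho
pairs = list(zip(iu[0][mask], iu[1][mask]))
print(f"{len(pairs)} variable pairs with |correlation| > {rho}")
```

Since the placeholder variables here are independent, any pair reported is a spurious discovery; how the screening threshold must scale with p and n for such discoveries to vanish is the kind of question the paper's sample-complexity framework addresses.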
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
Comment: 61 pages
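
As a small illustration of the exploratory quantities such a review covers, here is a sketch computing the circular mean and mean resultant length of angular data (the sample angles are invented placeholders):

```python
import numpy as np

# Standard exploratory summaries for circular data: mean direction and
# mean resultant length. The sample angles are illustrative placeholders.
theta = np.deg2rad([10.0, 20.0, 350.0, 5.0, 15.0])

C, S = np.cos(theta).sum(), np.sin(theta).sum()
mean_direction = np.arctan2(S, C)        # circular mean, in radians
R_bar = np.hypot(C, S) / len(theta)      # in [0, 1]; near 1 = concentrated

print(f"mean direction: {np.rad2deg(mean_direction):.1f} degrees")
print(f"mean resultant length: {R_bar:.3f}")
```

A naive Euclidean average of these raw angles (about 80 degrees) would be badly wrong because 350 degrees is really close to 0, which is precisely why directional methods are needed.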
The Bayesian Analysis of Complex, High-Dimensional Models: Can It Be CODA?
We consider the Bayesian analysis of a few complex, high-dimensional models
and show that intuitive priors, which are not tailored to the fine details of
the model and the estimated parameters, produce estimators which perform poorly
in situations in which good, simple frequentist estimators exist. The models we
consider are: stratified sampling, the partial linear model, linear and
quadratic functionals of white noise and estimation with stopping times. We
present a strong version of Doob's consistency theorem which demonstrates that
the existence of a uniformly $\sqrt{n}$-consistent estimator ensures that the
Bayes posterior is $\sqrt{n}$-consistent for values of the parameter in subsets
of prior probability 1. We also demonstrate that it is, at least, in principle,
possible to construct Bayes priors giving both global and local minimax rates,
using a suitable combination of loss functions. We argue that there is no
contradiction in these apparently conflicting findings.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/14-STS483 by the Institute of Mathematical Statistics (http://www.imstat.org)
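
The shape of the Doob-type statement above can be sketched in assumed notation (the paper's precise conditions and rate sequence will differ):

```latex
% Sketch in assumed notation: Theta parameter set, Pi prior, X^n data.
% Uniform sqrt(n)-consistency of some estimator \hat\theta_n ...
\[
\sup_{\theta \in \Theta} P_\theta\!\left( \sqrt{n}\,\lVert \hat\theta_n - \theta \rVert > M_n \right) \to 0
\]
% ... yields posterior contraction at the same rate, for Pi-almost every theta_0:
\[
\Pi\!\left( \lVert \theta - \theta_0 \rVert > \tfrac{M_n}{\sqrt{n}} \,\middle|\, X^n \right)
\xrightarrow{\;P_{\theta_0}\;} 0 .
\]
```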
A Simple Approach to Maximum Intractable Likelihood Estimation
Approximate Bayesian Computation (ABC) can be viewed as an analytic
approximation of an intractable likelihood coupled with an elementary
simulation step. Such a view, combined with a suitable instrumental prior
distribution, permits maximum-likelihood (or maximum-a-posteriori) inference to
be conducted, approximately, using essentially the same techniques. Here we
develop an elementary approach to this problem: a nonparametric approximation
of the likelihood surface is obtained and then used as a smooth proxy for the
likelihood in a subsequent maximisation step. The convergence of this class of
algorithms is characterised theoretically. The use
of non-sufficient summary statistics in this context is considered. Applying
the proposed method to four problems demonstrates good performance. The
proposed approach provides an alternative for approximating the maximum
likelihood estimator (MLE) in complex scenarios.
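
A toy sketch of this recipe, using a tractable stand-in for the intractable likelihood so the answer can be checked (the model, tolerance eps, and instrumental prior range are illustrative assumptions, and a simple grid search replaces a proper optimiser):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Tractable toy stand-in: data from N(theta, 1), summary = sample mean,
# so the exact MLE is the sample mean and the output can be verified.
theta_true, n_obs = 2.0, 50
y_obs = rng.normal(theta_true, 1.0, size=n_obs)
s_obs = y_obs.mean()

# ABC with a flat instrumental prior on [-5, 5]: draw theta, simulate the
# summary (the mean of n_obs draws is N(theta, 1/n_obs)), keep near-matches.
eps, n_sim = 0.05, 200_000
theta_prop = rng.uniform(-5.0, 5.0, size=n_sim)
s_sim = theta_prop + rng.standard_normal(n_sim) / np.sqrt(n_obs)
accepted = theta_prop[np.abs(s_sim - s_obs) < eps]

# Nonparametric (kernel) approximation of the likelihood surface, then
# maximisation of the smooth proxy over a grid.
kde = gaussian_kde(accepted)
grid = np.linspace(accepted.min(), accepted.max(), 1000)
theta_hat = grid[np.argmax(kde(grid))]
print(f"approximate MLE: {theta_hat:.3f}  (exact MLE: {s_obs:.3f})")
```

With a flat instrumental prior the accepted parameters are distributed in proportion to the approximate likelihood, so the mode of their kernel density estimate serves as the smooth-proxy maximiser the abstract describes.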
Bayesian Subset Simulation: a kriging-based subset simulation algorithm for the estimation of small probabilities of failure
The estimation of small probabilities of failure from computer simulations is
a classical problem in engineering, and the Subset Simulation algorithm
proposed by Au & Beck (Prob. Eng. Mech., 2001) has become one of the most
popular methods for solving it. Subset Simulation has been shown to require
far fewer simulations than many other Monte Carlo approaches to achieve a
given estimation accuracy. The number of simulations nevertheless remains
quite high, and the method can be impractical for applications involving an
expensive-to-evaluate computer model. We propose a new algorithm, called
Bayesian Subset Simulation, that combines the strengths of the Subset
Simulation algorithm and of sequential Bayesian methods based on kriging
(also known as Gaussian process modeling).
The performance of this new algorithm is illustrated using a test case from the
literature. We are able to report promising results. In addition, we provide a
numerical study of the statistical properties of the estimator.
Comment: 11th International Probabilistic Safety Assessment and Management Conference (PSAM11) and the Annual European Safety and Reliability Conference (ESREL 2012), Helsinki, Finland (2012)
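
For orientation, a compact sketch of plain Subset Simulation, the Au & Beck baseline the proposed algorithm builds on. The limit-state function, level fraction p0, and the plain random-walk Metropolis kernel (instead of Au & Beck's component-wise one) are illustrative simplifications, and the kriging layer is not shown:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy limit state: "failure" when g(x) exceeds a high threshold. Under
# x ~ N(0, I_d), g(x) is exactly standard normal, so the true answer
# P(g > 4.5) = Phi(-4.5) ~ 3.4e-6 is known and can be checked.
def g(x):
    return x.sum(axis=-1) / np.sqrt(x.shape[-1])

d, N, p0, t_fail = 10, 2000, 0.1, 4.5

x = rng.standard_normal((N, d))
gx = g(x)
p_hat = 1.0

while True:
    t = np.quantile(gx, 1.0 - p0)            # next intermediate threshold
    if t >= t_fail:                          # final level reached
        p_hat *= np.mean(gx > t_fail)
        break
    p_hat *= p0                              # conditional level probability
    seeds = x[gx > t]                        # samples already past the level
    idx = rng.integers(0, len(seeds), N)     # resample seeds to N chains
    x = seeds[idx]
    # A few Metropolis steps targeting N(0, I) restricted to {g > t}.
    for _ in range(5):
        prop = x + 0.5 * rng.standard_normal((N, d))
        ratio = np.exp(0.5 * (x**2 - prop**2).sum(axis=1))
        acc = (rng.random(N) < ratio) & (g(prop) > t)
        x = np.where(acc[:, None], prop, x)
    gx = g(x)

print(f"Subset Simulation estimate of P(g > {t_fail}): {p_hat:.2e}")
```

Every iteration of this loop evaluates g many times; the Bayesian variant's idea, per the abstract, is to bring a sequentially refined kriging surrogate into this loop so that far fewer runs of the expensive model are needed.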
Probabilistic Motion Estimation Based on Temporal Coherence
We develop a theory for the temporal integration of visual motion motivated
by psychophysical experiments. The theory proposes that input data are
temporally grouped and used to predict and estimate the motion flows in the
image sequence. This temporal grouping can be considered a generalization of
the data association techniques used by engineers to study motion sequences.
Our temporal-grouping theory is expressed in terms of the Bayesian
generalization of standard Kalman filtering. To implement the theory we derive
a parallel network which shares some properties of cortical networks. Computer
simulations of this network demonstrate that our theory qualitatively accounts
for psychophysical experiments on motion occlusion and motion outliers.
Comment: 40 pages, 7 figures
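
Since the theory is expressed as a Bayesian generalization of standard Kalman filtering, a minimal baseline Kalman filter for 1-D motion may help fix ideas; all matrices, noise levels, and the constant-velocity model below are illustrative assumptions, not the paper's network:

```python
import numpy as np

# Constant-velocity Kalman filter in 1-D: state = [position, velocity],
# with position-only measurements. All parameters are illustrative.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])              # observe position only
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.25]])                  # measurement noise covariance

x = np.zeros(2)                          # state estimate
P = np.eye(2)                            # state covariance

rng = np.random.default_rng(3)
true_v = 1.0
measurements = true_v * np.arange(1, 21) + rng.normal(0, 0.5, 20)

for z in measurements:
    # Predict step: propagate the state and its uncertainty.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update step: fold in the new measurement via the Kalman gain.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.atleast_1d(z) - H @ x)
    P = (np.eye(2) - K @ H) @ P

print(f"estimated velocity: {x[1]:.2f} (true: {true_v})")
```

The paper's temporal-grouping theory generalizes this recursion along the lines of engineers' data-association techniques, allowing uncertainty over which measurements belong to which motion trajectory.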