Tailoring density estimation via reproducing kernel moment matching
Moment matching is a popular means of parametric density estimation. We extend this technique to nonparametric estimation of mixture models. Our approach works by embedding distributions into a reproducing kernel Hilbert space and performing moment matching in that space. This allows us to tailor density estimators to a function class of interest (i.e., one for which we would like to compute expectations). We show that our density estimation approach is useful in applications such as message compression in graphical models, and image classification and retrieval.
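A minimal numerical sketch of the idea, under assumptions not taken from the abstract: a Gaussian RBF kernel, one-dimensional data, and a fixed set of candidate mixture components. Each distribution is represented by its empirical kernel mean embedding, and mixture weights are fitted by (crudely) minimising the squared RKHS distance between the data embedding and the weighted combination of component embeddings.

```python
import numpy as np

def gaussian_kernel(x, y, bw=1.0):
    # Gaussian RBF kernel matrix between two 1-D sample arrays
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * bw ** 2))

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=500)        # samples from the "unknown" density

# Hypothetical candidate components (unit-variance Gaussians at fixed centres)
centres = np.array([0.0, 2.0, 4.0])
comp_samples = [rng.normal(c, 1.0, size=500) for c in centres]

# Inner products of mean embeddings: K[j, k] = <mu_j, mu_k>, b[k] = <mu_data, mu_k>
K = np.array([[gaussian_kernel(a, c).mean() for c in comp_samples]
              for a in comp_samples])
b = np.array([gaussian_kernel(data, s).mean() for s in comp_samples])

# Minimise ||mu_data - sum_k w_k mu_k||^2 over weights on the simplex
# (projected gradient descent; a crude sketch, not the paper's algorithm)
w = np.full(3, 1 / 3)
for _ in range(2000):
    w -= 0.1 * (K @ w - b)                   # gradient of the squared RKHS distance
    w = np.clip(w, 0, None)
    w /= w.sum()

print(np.round(w, 2))                        # weight concentrates on the centre-2 component
```

Because the data were drawn from the component centred at 2.0, the fitted weight vector should place almost all mass on that component; the RKHS distance acts here as the moment-matching criterion.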
Global and Local Two-Sample Tests via Regression
Two-sample testing is a fundamental problem in statistics. Despite its long
history, there has been renewed interest in this problem with the advent of
high-dimensional and complex data. Specifically, in the machine learning
literature, there have been recent methodological developments such as
classification accuracy tests. The goal of this work is to present a regression
approach to comparing multivariate distributions of complex data. Depending on
the chosen regression model, our framework can efficiently handle different
types of variables and various structures in the data, with competitive power
under many practical scenarios. Whereas previous work has been largely limited
to global tests which conceal much of the local information, our approach
naturally leads to a local two-sample testing framework in which we identify
local differences between multivariate distributions with statistical
confidence. We demonstrate the efficacy of our approach both theoretically and
empirically, under some well-known parametric and nonparametric regression
methods. Our proposed methods are applied to simulated data as well as a
challenging astronomy data set to assess their practical usefulness.
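The core regression idea can be sketched in a few lines, under assumptions of my own (synthetic one-dimensional Gaussian samples and a hand-rolled logistic regression), not the paper's actual estimators: regress the group label on the observations; under the null of equal distributions, P(label = 1 | x) is constant at 1/2, so a fitted slope away from zero (equivalently, classification accuracy above chance) signals a global difference, while regions where the fitted probability departs from 1/2 localise it.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two samples that differ only in their means (toy data)
X = np.concatenate([rng.normal(0.0, 1.0, 400), rng.normal(0.7, 1.0, 400)])
y = np.concatenate([np.zeros(400), np.ones(400)])  # group labels

# Logistic regression of the label on x, fitted by plain gradient descent
w, b = 0.0, 0.0
for _ in range(5000):
    p = 1 / (1 + np.exp(-(w * X + b)))
    w -= 0.01 * np.mean((p - y) * X)
    b -= 0.01 * np.mean(p - y)

p_hat = 1 / (1 + np.exp(-(w * X + b)))
acc = np.mean((p_hat > 0.5) == y)
print(round(acc, 2))   # accuracy above 1/2: evidence of a global difference
print(w > 0)           # positive slope: group 1 dominates at larger x (a local statement)
```

In practice the accuracy would be computed on held-out data and calibrated against its null distribution; the sketch only illustrates how a regression fit doubles as both a global and a local two-sample diagnostic.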
Optimal Bayes Classifiers for Functional Data and Density Ratios
Bayes classifiers for functional data pose a challenge. This is because
probability density functions do not exist for functional data. As a
consequence, the classical Bayes classifier using density quotients needs to be
modified. We propose to use density ratios of projections on a sequence of
eigenfunctions that are common to the groups to be classified. The density
ratios can then be factored into density ratios of individual functional
principal components whence the classification problem is reduced to a sequence
of nonparametric one-dimensional density estimates. This is an extension to
functional data of some of the very earliest nonparametric Bayes classifiers
that were based on simple density ratios in the one-dimensional case. By means
of the factorization of the density quotients, the curse of dimensionality that
would otherwise severely affect Bayes classifiers for functional data can be
avoided. We demonstrate that in the case of Gaussian functional data, the
proposed functional Bayes classifier reduces to a functional version of the
classical quadratic discriminant. A study of the asymptotic behavior of the
proposed classifiers in the large sample limit shows that under certain
conditions the misclassification rate converges to zero, a phenomenon that has
been referred to as "perfect classification". The proposed classifiers also
perform favorably in finite sample applications, as we demonstrate in
comparisons with other functional classifiers in simulations and various data
applications, including wine spectral data, functional magnetic resonance
imaging (fMRI) data for attention deficit hyperactivity disorder (ADHD)
patients, and yeast gene expression data.
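The factorization described above can be sketched numerically; everything below (the toy curves, the two-component truncation, the kernel density estimator and its bandwidth) is an illustrative assumption, not the paper's construction. Curves are projected onto eigenfunctions of the pooled covariance, one-dimensional densities of the resulting scores are estimated per group, and classification uses the product of the per-component density ratios.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)

def curves(n, shift):
    # Toy functional data: random-amplitude sine curves plus a group-specific shift
    return (np.sin(2 * np.pi * t)[None, :] * rng.normal(1, 0.2, (n, 1))
            + shift + rng.normal(0, 0.1, (n, 50)))

X0, X1 = curves(200, 0.0), curves(200, 0.5)
pool = np.vstack([X0, X1])

# Eigenfunctions common to both groups, from the pooled covariance (functional PCA)
mean = pool.mean(0)
_, _, V = np.linalg.svd(pool - mean, full_matrices=False)
phi = V[:2]                                  # keep the first two eigenfunctions

def scores(X):
    return (X - mean) @ phi.T                # functional principal component scores

def kde(z, s, bw=0.3):
    # One-dimensional Gaussian kernel density estimate at points z from sample s
    return np.exp(-(z[:, None] - s[None, :]) ** 2 / (2 * bw ** 2)).mean(1)

S0, S1 = scores(X0), scores(X1)

def classify(X):
    # Sum of log density ratios over the individual score components
    S = scores(X)
    log_ratio = sum(np.log(kde(S[:, j], S1[:, j]) + 1e-12)
                    - np.log(kde(S[:, j], S0[:, j]) + 1e-12) for j in range(2))
    return (log_ratio > 0).astype(int)

test_X = np.vstack([curves(100, 0.0), curves(100, 0.5)])
labels = np.r_[np.zeros(100), np.ones(100)]
acc = np.mean(classify(test_X) == labels)
print(round(acc, 2))                         # well above chance on this toy problem
```

The product (here, sum of logs) over components is exactly the factorization the abstract refers to: each term is a one-dimensional density ratio, so no multivariate density estimation is needed.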
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
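As a concrete taste of the exploratory analysis the review opens with, here is the standard summary of a sample of angles (a sketch with synthetic data; the constructions are textbook directional statistics, not specific to this review): the arithmetic mean is meaningless on the circle, so one averages the unit vectors instead.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy directional data: angles in radians clustered near pi/2
theta = rng.normal(np.pi / 2, 0.3, 200) % (2 * np.pi)

# Average the embedded unit vectors (cos t, sin t) rather than the angles
C, S = np.cos(theta).sum(), np.sin(theta).sum()
mean_dir = np.arctan2(S, C) % (2 * np.pi)    # circular mean direction
R = np.hypot(C, S) / len(theta)              # mean resultant length, in [0, 1]

print(round(mean_dir, 2))                    # near pi/2 for this concentrated sample
print(R > 0.9)                               # R near 1 indicates high concentration
```

The mean resultant length R plays the role that the standard deviation does in Euclidean statistics: values near 1 mean tightly concentrated directions, values near 0 mean nearly uniform ones.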