Local Component Analysis
Kernel density estimation, a.k.a. Parzen windows, is a popular density
estimation method, which can be used for outlier detection or clustering. With
multivariate data, its performance is heavily reliant on the metric used within
the kernel. Most earlier work has focused on learning only the bandwidth of the
kernel (i.e., a scalar multiplicative factor). In this paper, we propose to
learn a full Euclidean metric through an expectation-maximization (EM)
procedure, which can be seen as an unsupervised counterpart to neighbourhood
component analysis (NCA). In order to avoid overfitting with a fully
nonparametric density estimator in high dimensions, we also consider a
semi-parametric Gaussian-Parzen density model, where some of the variables are
modelled through a jointly Gaussian density, while others are modelled through
Parzen windows. For these two models, EM leads to simple closed-form updates
based on matrix inversions and eigenvalue decompositions. We show empirically
that our method leads to density estimators with higher test-likelihoods than
natural competing methods, and that the metrics may be used within most
unsupervised learning techniques that rely on such metrics, such as spectral
clustering or manifold learning methods. Finally, we present a stochastic
approximation scheme which allows for the use of this method in a large-scale
setting.
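As a rough illustration of the procedure sketched in this abstract, the following Python snippet fits a single full covariance (equivalently, a Euclidean metric) for a Gaussian Parzen window estimator by EM on the leave-one-out likelihood. It is a minimal sketch only: the function name, initialization, and fixed iteration count are assumptions, and the semi-parametric Gaussian-Parzen model and the stochastic approximation scheme are not covered.

import numpy as np
from scipy.stats import multivariate_normal

def parzen_metric_em(X, n_iter=50):
    n, d = X.shape
    Sigma = np.cov(X, rowvar=False) + 1e-3 * np.eye(d)    # crude initial metric
    diffs = X[:, None, :] - X[None, :, :]                  # (n, n, d) pairwise differences
    for _ in range(n_iter):
        # E-step: leave-one-out responsibilities r[i, j] proportional to N(x_i; x_j, Sigma)
        logp = multivariate_normal(np.zeros(d), Sigma).logpdf(diffs.reshape(-1, d)).reshape(n, n)
        np.fill_diagonal(logp, -np.inf)                     # a point never explains itself
        logp -= logp.max(axis=1, keepdims=True)             # stabilise before exponentiating
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form covariance update from responsibility-weighted outer products
        Sigma = np.einsum('ij,ijk,ijl->kl', r, diffs, diffs) / n + 1e-6 * np.eye(d)
    return Sigma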
Decomposable Principal Component Analysis
We consider principal component analysis (PCA) in decomposable Gaussian
graphical models. We exploit the prior information in these models in order to
distribute its computation. For this purpose, we reformulate the problem in the
sparse inverse covariance (concentration) domain and solve the global
eigenvalue problem using a sequence of local eigenvalue problems in each of the
cliques of the decomposable graph. We demonstrate the application of our
methodology in the context of decentralized anomaly detection in the Abilene
backbone network. Based on the topology of the network, we propose an
approximate statistical graphical model and distribute the computation of PCA.
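For context, the concentration-domain reformulation mentioned above can be sketched in a few lines of Python: for a Gaussian model, the leading principal axes of the covariance are the eigenvectors of the sparse inverse covariance (concentration) matrix with the smallest eigenvalues. This is only the centralized view; the clique-by-clique distributed solver of the paper is not reproduced, and the function name and the use of shift-invert eigsh are assumptions.

import numpy as np
import scipy.sparse.linalg as spla

def pca_from_concentration(K, n_components=2):
    # K: sparse, symmetric, positive-definite concentration (inverse covariance) matrix.
    # The smallest eigenpairs of K correspond to the leading principal axes of Sigma = inv(K).
    vals, vecs = spla.eigsh(K, k=n_components, sigma=0, which='LM')   # shift-invert around 0
    order = np.argsort(vals)                      # ascending eigenvalues of K
    return vecs[:, order], 1.0 / vals[order]      # principal directions and their variances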
Efficient independent component analysis
Independent component analysis (ICA) has been widely used for blind source
separation in many fields such as brain imaging analysis, signal processing and
telecommunication. Many statistical techniques based on M-estimates have been
proposed for estimating the mixing matrix. Recently, several nonparametric
methods have been developed, but in-depth analysis of asymptotic efficiency has
not been available. We analyze ICA using semiparametric theories and propose a
straightforward estimate based on the efficient score function by using
B-spline approximations. The estimate is asymptotically efficient under
moderate conditions and exhibits better performance than standard ICA methods
in a variety of simulations.
Comment: Published at http://dx.doi.org/10.1214/009053606000000939 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
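To make the blind source separation setting concrete, here is a small Python example using a standard ICA method (scikit-learn's FastICA) of the kind this abstract compares against. The simulated sources and mixing matrix are invented for illustration, and the B-spline efficient-score estimator of the paper is not implemented here.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]        # two independent sources
A = np.array([[1.0, 0.5], [0.4, 1.0]])                   # unknown mixing matrix
X = S @ A.T + 0.02 * rng.standard_normal((len(t), 2))    # observed noisy mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)        # recovered sources (up to scale and permutation)
A_hat = ica.mixing_                 # estimated mixing matrix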