Regularized Block Toeplitz Covariance Matrix Estimation via Kronecker Product Expansions
In this work we consider the estimation of spatio-temporal covariance
matrices in the low sample non-Gaussian regime. We impose covariance structure
in the form of a sum of Kronecker products decomposition (Tsiligkaridis et al.
2013, Greenewald et al. 2013) with diagonal correction (Greenewald et al.),
which we refer to as DC-KronPCA, in the estimation of multiframe covariance
matrices. This paper extends the approaches of (Tsiligkaridis et al.) in two
directions. First, we modify the diagonally corrected method of (Greenewald et
al.) to include a block Toeplitz constraint imposing temporal stationarity
structure. Second, we improve the conditioning of the estimate in the very low
sample regime by using Ledoit-Wolf type shrinkage regularization similar to
(Chen, Hero et al. 2010). For improved robustness to heavy tailed
distributions, we modify the KronPCA to incorporate robust shrinkage estimation
(Chen, Hero et al. 2011). Results of numerical simulations establish benefits
in terms of estimation MSE when compared to previous methods. Finally, we apply
our methods to a real-world network spatio-temporal anomaly detection problem
and achieve superior results.
Comment: To appear at IEEE SSP 2014, 4 pages
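The Ledoit-Wolf-type regularization mentioned above shrinks a poorly conditioned sample covariance toward a scaled identity. The following is a minimal sketch of that generic shrinkage step (in the spirit of Ledoit and Wolf's estimator), not the paper's DC-KronPCA estimator itself; the function name is hypothetical.

```python
import numpy as np

def lw_shrinkage(X):
    """Shrink the sample covariance of X (n samples x p variables) toward a
    scaled identity, with a data-driven shrinkage intensity.  A generic
    Ledoit-Wolf-style sketch, not the DC-KronPCA estimator."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                          # sample covariance
    mu = np.trace(S) / p                       # scale of the identity target
    d2 = np.linalg.norm(S - mu * np.eye(p), 'fro') ** 2
    # Estimate of the variance of the sample covariance entries
    b2 = sum(np.linalg.norm(np.outer(x, x) - S, 'fro') ** 2 for x in Xc) / n**2
    b2 = min(b2, d2)
    rho = b2 / d2                              # shrinkage intensity in [0, 1]
    return (1 - rho) * S + rho * mu * np.eye(p)
```

Because the identity term enters with a strictly positive weight whenever the data are not already isotropic, the output is positive definite even when n < p, which is the "very low sample regime" the abstract targets.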
A clustering algorithm for multivariate data streams with correlated components
Common clustering algorithms require multiple scans of all the data to
achieve convergence, and this is prohibitive when large databases, with data
arriving in streams, must be processed. Algorithms extending the popular
K-means method to the analysis of streaming data have been available in the
literature since 1998 (Bradley et al. in Scaling clustering algorithms to
large databases. In: KDD, pp. 9-15, 1998; O'Callaghan et al. in
Streaming-data algorithms for high-quality clustering. In: Proceedings of the
IEEE International Conference on Data Engineering, p. 685, 2001), based on the
memorization and recursive update of a small number of summary statistics, but
they either do not take into account the specific variability of the clusters,
or assume that the random vectors being processed and grouped have
uncorrelated components.
Unfortunately this is not the case in many practical situations. We here
propose a new algorithm to process data streams, with data having correlated
components and coming from clusters with different covariance matrices. Such
covariance matrices are estimated via an optimal double shrinkage method, which
provides positive-definite estimates even in the presence of few data points,
or of data whose components have small variance. This is needed to invert the
matrices and compute the Mahalanobis distances that we use for the data
assignment to the clusters. We also estimate the total number of clusters from
the data.
Comment: title changed, rewritten
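The core streaming idea above — keep only a count, a sum, and a scatter matrix per cluster, and assign each arriving point by Mahalanobis distance — can be sketched as follows. This is an illustration under stated assumptions, with the paper's optimal double-shrinkage covariance estimator replaced by a simple diagonal-loading stand-in; class and function names are hypothetical.

```python
import numpy as np

class StreamCluster:
    """Running summary statistics (count, sum, sum of outer products) for one
    cluster, updated recursively as points stream in."""
    def __init__(self, dim, eps=1e-3):
        self.n = 0
        self.s = np.zeros(dim)
        self.ss = np.zeros((dim, dim))
        self.eps = eps                       # diagonal loading: a crude
                                             # stand-in for double shrinkage

    def update(self, x):
        self.n += 1
        self.s += x
        self.ss += np.outer(x, x)

    def mahalanobis(self, x):
        mean = self.s / self.n
        cov = self.ss / self.n - np.outer(mean, mean)
        cov = cov + self.eps * np.eye(len(x))   # keep cov invertible
        d = x - mean
        return float(d @ np.linalg.solve(cov, d))

def assign(x, clusters):
    """Assign a new point to the cluster with smallest Mahalanobis distance."""
    return min(range(len(clusters)), key=lambda k: clusters[k].mahalanobis(x))
```

Because each cluster carries its own scatter matrix, correlated components and cluster-specific covariances are handled, which is exactly what the summary-statistics schemes criticized in the abstract do not do.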
Structural Variability from Noisy Tomographic Projections
In cryo-electron microscopy, the 3D electric potentials of an ensemble of
molecules are projected along arbitrary viewing directions to yield noisy 2D
images. The volume maps representing these potentials typically exhibit a great
deal of structural variability, which is described by their 3D covariance
matrix. Typically, this covariance matrix is approximately low-rank and can be
used to cluster the volumes or estimate the intrinsic geometry of the
conformation space. We formulate the estimation of this covariance matrix as a
linear inverse problem, yielding a consistent least-squares estimator. For
images of size N-by-N pixels, we propose an algorithm for calculating this
covariance estimator with computational complexity O(nN^4 + sqrt(κ) N^6 log N),
where the condition number κ is empirically in the range 10-200. Its
efficiency relies on the
observation that the normal equations are equivalent to a deconvolution problem
in 6D. This is then solved by the conjugate gradient method with an appropriate
circulant preconditioner. The result is the first computationally efficient
algorithm for consistent estimation of 3D covariance from noisy projections. It
also compares favorably in runtime with respect to previously proposed
non-consistent estimators. Motivated by the recent success of eigenvalue
shrinkage procedures for high-dimensional covariance matrices, we introduce a
shrinkage procedure that improves accuracy at lower signal-to-noise ratios. We
evaluate our methods on simulated datasets and achieve classification results
comparable to state-of-the-art methods in shorter running time. We also present
results on clustering volumes in an experimental dataset, illustrating the
power of the proposed algorithm for practical determination of structural
variability.
Comment: 52 pages, 11 figures
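The eigenvalue shrinkage mentioned above exploits the fact that, for high-dimensional sample covariances, eigenvalues inside the Marchenko-Pastur noise bulk carry no signal. A minimal stand-in for that idea, assuming white noise of known variance, is sketched below; it is not the paper's exact procedure, and the function name is hypothetical.

```python
import numpy as np

def denoise_covariance(S, sigma2, n):
    """Shrink the eigenvalues of a p-by-p sample covariance S computed from n
    samples with noise variance sigma2: eigenvalues below the Marchenko-Pastur
    bulk edge are set to zero, the rest are shrunk toward the signal level.
    A simple hard-thresholding sketch of eigenvalue shrinkage."""
    p = S.shape[0]
    edge = sigma2 * (1 + np.sqrt(p / n)) ** 2   # bulk edge of the noise spectrum
    vals, vecs = np.linalg.eigh(S)
    vals = np.where(vals > edge, vals - sigma2, 0.0)
    return (vecs * vals) @ vecs.T
```

At low signal-to-noise ratios this suppresses the spurious directions that a raw least-squares covariance estimate would otherwise report, which is the accuracy gain the abstract attributes to shrinkage.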