Euclidean Distance Matrices: Essential Theory, Algorithms and Applications
Euclidean distance matrices (EDM) are matrices of squared distances between
points. The definition is deceptively simple: thanks to their many useful
properties they have found applications in psychometrics, crystallography,
machine learning, wireless sensor networks, acoustics, and more. Despite the
usefulness of EDMs, they seem to be insufficiently known in the signal
processing community. Our goal is to rectify this mishap in a concise tutorial.
We review the fundamental properties of EDMs, such as rank or
(non)definiteness. We show how various EDM properties can be used to design
algorithms for completing and denoising distance data. Along the way, we
demonstrate applications to microphone position calibration, ultrasound
tomography, room reconstruction from echoes and phase retrieval. By spelling
out the essential algorithms, we hope to fast-track the readers in applying
EDMs to their own problems. Matlab code for all the described algorithms, and
to generate the figures in the paper, is available online. Finally, we suggest
directions for further research.
Comment: 17 pages, 12 figures, to appear in IEEE Signal Processing Magazine; change of title in the last revision
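As a small illustration of the definition above (an EDM holds the squared distances between points), the following sketch builds an EDM directly and again from the Gram matrix of inner products. The Gram identity D[i][j] = G[i][i] - 2 G[i][j] + G[j][j] is a standard fact that the abstract does not state; the function names and the three sample points are hypothetical and chosen only for this example.

```python
def gram(points):
    """Gram matrix of inner products G[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def edm_direct(points):
    """EDM from the definition: D[i][j] = ||x_i - x_j||^2 (squared distance)."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def edm_from_gram(G):
    """Same EDM via the identity D[i][j] = G[i][i] - 2*G[i][j] + G[j][j]."""
    n = len(G)
    return [[G[i][i] - 2 * G[i][j] + G[j][j] for j in range(n)]
            for i in range(n)]


# Hypothetical example: the corners of a 3-4-5 right triangle in the plane.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
D = edm_direct(points)
D_gram = edm_from_gram(gram(points))
```

Both routes give the same matrix, with squared side lengths 9, 16 and 25 off the diagonal and zeros on it.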
Asymptotics for high-dimensional covariance matrices and quadratic forms with applications to the trace functional and shrinkage
We establish large sample approximations for an arbitrary number of bilinear
forms of the sample variance-covariance matrix of a high-dimensional vector
time series using $\ell_1$-bounded and small $\ell_2$-bounded weighting
vectors. Estimation of the asymptotic covariance structure is also discussed.
The results hold true without any constraint on the dimension, the number of
forms and the sample size or their ratios. Concrete and potential applications
are widespread and cover high-dimensional data science problems such as tests
for large numbers of covariances, sparse portfolio optimization and projections
onto sparse principal components or more general spanning sets as frequently
considered, e.g. in classification and dictionary learning. As two specific
applications of our results, we study in greater detail the asymptotics of the
trace functional and shrinkage estimation of covariance matrices. In shrinkage
estimation, it turns out that the asymptotics differs for weighting vectors
bounded away from orthogonality and nearly orthogonal ones in the sense that
their inner product converges to 0.
Comment: 42 pages
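To make the objects named in this abstract concrete, here is a minimal sketch of a bilinear form v'Sw of a sample covariance matrix S, its trace, and a linear shrinkage estimator. The shrinkage target (tr(S)/p) times the identity, the fixed weight, the 1/(n-1) normalisation, the toy data and all function names are assumptions made for illustration; the abstract does not specify them.

```python
def sample_covariance(rows):
    """Sample covariance matrix with the 1/(n-1) normalisation (an assumption)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def bilinear_form(v, S, w):
    """Bilinear form v' S w for weighting vectors v and w."""
    return sum(v[i] * S[i][j] * w[j]
               for i in range(len(v)) for j in range(len(w)))


def trace(S):
    """Trace functional: sum of the diagonal entries."""
    return sum(S[i][i] for i in range(len(S)))


def shrink(S, weight):
    """Linear shrinkage (1 - weight) * S + weight * (tr(S)/p) * I.

    The scaled-identity target is an illustrative choice, not taken
    from the abstract.
    """
    p = len(S)
    mu = trace(S) / p
    return [[(1 - weight) * S[i][j] + (weight * mu if i == j else 0.0)
             for j in range(p)] for i in range(p)]


# Hypothetical toy data: three observations of a two-dimensional vector.
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
S = sample_covariance(data)
cross = bilinear_form([1.0, 0.0], S, [0.0, 1.0])  # picks out the entry S[0][1]
S_shrunk = shrink(S, 0.5)
```

With unit vectors as weights, the bilinear form simply reads off one entry of S, which is why families of such forms can describe covariances, projections and the trace alike.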
Graph kernels between point clouds
Point clouds are sets of points in two or three dimensions. Most kernel
methods for learning on sets of points have not yet dealt with the specific
geometrical invariances and practical constraints associated with point clouds
in computer vision and graphics. In this paper, we present extensions of graph
kernels for point clouds, which allow kernel methods to be used for such objects
as shapes, line drawings, or any three-dimensional point clouds. In order to
design rich and numerically efficient kernels with as few free parameters as
possible, we use kernels between covariance matrices and their factorizations
on graphical models. We derive polynomial time dynamic programming recursions
and present applications to recognition of handwritten digits and Chinese
characters from few training examples.
Distributed Machine Learning via Sufficient Factor Broadcasting
Matrix-parametrized models, including multiclass logistic regression and
sparse coding, are used in machine learning (ML) applications ranging from
computer vision to computational biology. When these models are applied to
large-scale ML problems starting at millions of samples and tens of thousands
of classes, their parameter matrix can grow at an unexpected rate, resulting in
high parameter synchronization costs that greatly slow down distributed
learning. To address this issue, we propose a Sufficient Factor Broadcasting
(SFB) computation model for efficient distributed learning of a large family of
matrix-parameterized models, which share the following property: the parameter
update computed on each data sample is a rank-1 matrix, i.e., the outer product
of two "sufficient factors" (SFs). By broadcasting the SFs among worker
machines and reconstructing the update matrices locally at each worker, SFB
improves communication efficiency --- communication costs are linear in the
parameter matrix's dimensions, rather than quadratic --- without affecting
computational correctness. We present a theoretical convergence analysis of
SFB, and empirically corroborate its efficiency on four different
matrix-parametrized ML models.
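The communication argument above can be sketched in a few lines: a worker sends the two sufficient factors u and v instead of the full update matrix, and each receiver rebuilds the rank-1 update as their outer product. This is only a toy illustration under stated assumptions; the function names, the additive update rule and the example vectors are hypothetical and are not the authors' implementation.

```python
def outer(u, v):
    """Rank-1 matrix u v' rebuilt locally from the two sufficient factors."""
    return [[ui * vj for vj in v] for ui in u]


def apply_update(W, u, v):
    """Add the reconstructed rank-1 update to a local copy of the parameters.

    The plain additive rule is an assumption made for this sketch.
    """
    delta = outer(u, v)
    return [[W[i][j] + delta[i][j] for j in range(len(v))]
            for i in range(len(u))]


def full_matrix_cost(u, v):
    """Numbers sent when the whole update matrix is transmitted: J * K."""
    return len(u) * len(v)


def sufficient_factor_cost(u, v):
    """Numbers sent when only the two factors are broadcast: J + K."""
    return len(u) + len(v)


# Hypothetical factors for a 3 x 2 parameter matrix.
u = [1.0, 2.0, 3.0]
v = [0.5, -1.0]
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
W_new = apply_update(W, u, v)
costs = (full_matrix_cost(u, v), sufficient_factor_cost(u, v))
```

For this tiny matrix the saving is marginal (6 numbers versus 5), but for a J by K matrix the gap is J*K versus J+K, which is the linear-versus-quadratic difference the abstract describes.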