
    Euclidean Distance Matrices: Essential Theory, Algorithms and Applications

    Euclidean distance matrices (EDMs) are matrices of squared distances between points. The definition is deceptively simple: thanks to their many useful properties, they have found applications in psychometrics, crystallography, machine learning, wireless sensor networks, acoustics, and more. Despite their usefulness, EDMs seem to be insufficiently known in the signal processing community. Our goal is to rectify this in a concise tutorial. We review the fundamental properties of EDMs, such as rank and (non)definiteness. We show how various EDM properties can be used to design algorithms for completing and denoising distance data. Along the way, we demonstrate applications to microphone position calibration, ultrasound tomography, room reconstruction from echoes, and phase retrieval. By spelling out the essential algorithms, we hope to fast-track readers in applying EDMs to their own problems. Matlab code for all the described algorithms, and for generating the figures in the paper, is available online. Finally, we suggest directions for further research.
    Comment: 17 pages, 12 figures, to appear in IEEE Signal Processing Magazine; title changed in the last revision.
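    As a pointer to the rank property mentioned above, here is a minimal Python sketch (not the authors' Matlab code; the point set and dimensions are made up) that builds an EDM from a point set and checks that its rank is at most d + 2:

        import numpy as np

        # Illustrative only: construct an EDM from random points and verify the
        # classical rank bound rank(EDM) <= d + 2, independent of the number of points.
        rng = np.random.default_rng(0)
        d, n = 3, 10                      # ambient dimension and number of points
        X = rng.standard_normal((d, n))   # points stored as columns

        G = X.T @ X                       # Gram matrix
        sq_norms = np.diag(G)
        edm = sq_norms[:, None] + sq_norms[None, :] - 2 * G   # squared distances

        print(np.linalg.matrix_rank(edm))  # at most d + 2 = 5, even though n = 10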

    Asymptotics for high-dimensional covariance matrices and quadratic forms with applications to the trace functional and shrinkage

    We establish large sample approximations for an arbitrary number of bilinear forms of the sample variance-covariance matrix of a high-dimensional vector time series, using ℓ1-bounded and small ℓ2-bounded weighting vectors. Estimation of the asymptotic covariance structure is also discussed. The results hold true without any constraint on the dimension, the number of forms, the sample size, or their ratios. Concrete and potential applications are widespread and cover high-dimensional data science problems such as tests for large numbers of covariances, sparse portfolio optimization, and projections onto sparse principal components or more general spanning sets, as frequently considered, e.g., in classification and dictionary learning. As two specific applications of our results, we study in greater detail the asymptotics of the trace functional and shrinkage estimation of covariance matrices. In shrinkage estimation, it turns out that the asymptotics differs for weighting vectors bounded away from orthogonality and for nearly orthogonal ones (in the sense that their inner product converges to 0).
    Comment: 42 pages
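    For concreteness, here is a minimal Python sketch of the objects involved, assuming i.i.d. data for simplicity (the paper treats vector time series) and a fixed, illustrative shrinkage weight rather than the paper's estimator:

        import numpy as np

        # Bilinear forms v' S w of a high-dimensional sample covariance matrix S,
        # evaluated with sparse (l1-bounded) weighting vectors, plus a simple
        # linear shrinkage of S toward the identity. All constants are made up.
        rng = np.random.default_rng(1)
        n, p = 200, 500                   # sample size and dimension, p > n
        X = rng.standard_normal((n, p))   # rows are observations
        S = np.cov(X, rowvar=False)       # sample variance-covariance matrix

        v = np.zeros(p); v[0] = 1.0       # ||v||_1 = 1
        w = np.zeros(p); w[:2] = 0.5      # ||w||_1 = 1
        print(v @ S @ w)                  # one bilinear form of S

        rho = 0.3                         # illustrative shrinkage weight
        S_shrunk = (1 - rho) * S + rho * np.eye(p)
        print(np.trace(S_shrunk) / p)     # the normalized trace functional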

    Graph kernels between point clouds

    Point clouds are sets of points in two or three dimensions. Most kernel methods for learning on sets of points have not yet dealt with the specific geometrical invariances and practical constraints associated with point clouds in computer vision and graphics. In this paper, we present extensions of graph kernels for point clouds, which allow to use kernel methods for such ob jects as shapes, line drawings, or any three-dimensional point clouds. In order to design rich and numerically efficient kernels with as few free parameters as possible, we use kernels between covariance matrices and their factorizations on graphical models. We derive polynomial time dynamic programming recursions and present applications to recognition of handwritten digits and Chinese characters from few training examples
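    As a hedged illustration of the covariance-matrix ingredient, the sketch below implements one standard positive-definite kernel between covariance matrices (the Bhattacharyya affinity of two zero-mean Gaussians); it is a generic example, not the paper's exact kernel on graphical-model factorizations:

        import numpy as np

        def covariance_kernel(A, B):
            """Bhattacharyya kernel k(A, B) = |A|^(1/4) |B|^(1/4) / |(A+B)/2|^(1/2)."""
            num = np.linalg.det(A) ** 0.25 * np.linalg.det(B) ** 0.25
            return num / np.sqrt(np.linalg.det(0.5 * (A + B)))

        rng = np.random.default_rng(2)
        P = rng.standard_normal((2, 50))        # a 2-D point cloud, points as columns
        Q = 1.5 * rng.standard_normal((2, 50))  # a second, rescaled cloud
        A, B = np.cov(P), np.cov(Q)             # 2x2 covariance matrix of each cloud
        print(covariance_kernel(A, A))          # equals 1 by construction
        print(covariance_kernel(A, B))          # at most 1, smaller for dissimilar clouds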

    Distributed Machine Learning via Sufficient Factor Broadcasting

    Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems, starting at millions of samples and tens of thousands of classes, their parameter matrix can grow at an unexpected rate, resulting in high parameter synchronization costs that greatly slow down distributed learning. To address this issue, we propose a Sufficient Factor Broadcasting (SFB) computation model for efficient distributed learning of a large family of matrix-parametrized models, which share the following property: the parameter update computed on each data sample is a rank-1 matrix, i.e., the outer product of two "sufficient factors" (SFs). By broadcasting the SFs among worker machines and reconstructing the update matrices locally at each worker, SFB improves communication efficiency: communication costs are linear in the parameter matrix's dimensions, rather than quadratic, without affecting computational correctness. We present a theoretical convergence analysis of SFB and empirically corroborate its efficiency on four different matrix-parametrized ML models.
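    The communication saving is easy to see in a single-machine Python sketch; the matrix sizes below are made up, and a real SFB deployment would broadcast the two factors over the network rather than form the outer product on the sender:

        import numpy as np

        # A rank-1 update is the outer product of two sufficient factors u and v.
        # Exchanging u and v costs O(J + D) values; exchanging the full update
        # matrix would cost O(J * D). Receivers rebuild the matrix locally.
        rng = np.random.default_rng(3)
        J, D = 1_000, 2_000         # e.g., number of classes x feature dimension
        u = rng.standard_normal(J)  # sufficient factor 1
        v = rng.standard_normal(D)  # sufficient factor 2

        sent = u.size + v.size      # linear in the dimensions: 3,000 values
        update = np.outer(u, v)     # reconstructed locally: 2,000,000 entries
        print(sent, update.size)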