19 research outputs found

    Preconditioned Data Sparsification for Big Data with Applications to PCA and K-means

    We analyze a compression scheme for large data sets that randomly keeps a small percentage of the components of each data sample. The benefit is that the output is a sparse matrix, so subsequent processing such as PCA or K-means is significantly faster, especially in a distributed-data setting. Furthermore, the sampling is single-pass and applicable to streaming data. The sampling mechanism is a variant of methods previously proposed in the literature, combined with a randomized preconditioning that smooths the data. We provide guarantees for PCA in terms of the covariance matrix, and guarantees for K-means in terms of the error in the center estimates at a given step. We present numerical evidence showing that our bounds are nearly tight, that our algorithms provide a real benefit on standard test data sets, and that they offer certain advantages over related sampling approaches. Comment: 28 pages, 10 figures
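    The abstract describes a two-step recipe: precondition the data so no single entry dominates, then keep a small random fraction of entries (rescaled to stay unbiased) and hand the resulting sparse matrix to PCA or K-means. Below is a minimal sketch of that idea, assuming an n x d dense matrix and using a random-sign-plus-random-orthogonal mixing step as a stand-in for the paper's randomized preconditioner; the function names and the 10% keep fraction are illustrative, not the authors' reference implementation.

```python
# Sketch: randomized preconditioning followed by uniform entry sampling.
import numpy as np
from scipy import sparse

def precondition(X, rng):
    """Mix coordinates with a random sign flip and a random orthogonal
    rotation so that row energy is spread across all entries."""
    d = X.shape[1]
    signs = rng.choice([-1.0, 1.0], size=d)           # random diagonal of +/-1
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
    return (X * signs) @ Q

def sparsify(X, keep_frac, rng):
    """Keep each entry independently with probability keep_frac and
    rescale by 1/keep_frac so the sparse output is unbiased for X."""
    mask = rng.random(X.shape) < keep_frac
    return sparse.csr_matrix(np.where(mask, X / keep_frac, 0.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))                   # toy data set
Xs = sparsify(precondition(X, rng), keep_frac=0.1, rng=rng)

# Downstream processing operates on the sparse surrogate, e.g. the
# covariance estimate that PCA would diagonalize:
cov_est = (Xs.T @ Xs) / Xs.shape[0]
print(Xs.nnz / np.prod(X.shape))                      # ~10% of entries retained
```

    Because the sampling touches each row once and independently, the same routine can be applied to streaming or distributed data, which is the single-pass property the abstract emphasizes.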

    Unsupervised Learning for Subterranean Junction Recognition Based on 2D Point Cloud

    This article proposes a novel unsupervised learning framework for detecting the number of tunnel junctions in subterranean environments based on acquired 2D point clouds. The framework provides valuable information for high-level mission planners navigating an aerial platform in unknown areas or in robot homing missions. It relies on spectral clustering, which is capable of uncovering hidden structures from connected data points lying on non-linear manifolds. The spectral clustering algorithm computes a spectral embedding of the original 2D point cloud from the eigendecomposition of a matrix derived from the pairwise similarities of the points. We validate the developed framework on multiple data sets, collected both from realistic simulations and from real flights in underground environments, demonstrating the performance and merits of the proposed methodology.
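    The pipeline summarized above (pairwise similarities, eigendecomposition, clustering in the embedded space) can be sketched as follows. This is only a generic spectral-clustering illustration on a synthetic 2D scan, assuming a Gaussian similarity kernel, a hand-picked bandwidth `sigma`, scikit-learn's KMeans for the final grouping, and a fixed branch count of 3; none of these choices are taken from the paper.

```python
# Sketch: spectral embedding of a 2D point cloud followed by K-means.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_embed(points, n_components, sigma=1.0):
    """Embed 2D points using the eigenvectors of the symmetric normalized
    Laplacian built from pairwise Gaussian similarities."""
    W = np.exp(-cdist(points, points, "sqeuclidean") / (2.0 * sigma**2))
    d = W.sum(axis=1)
    L_sym = np.eye(len(points)) - W / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :n_components]                          # smallest eigenvalues
    return U / np.linalg.norm(U, axis=1, keepdims=True)    # row-normalize

# Toy 2D scan: three "tunnel branches" radiating from a junction.
rng = np.random.default_rng(1)
angles = np.repeat([0.0, 2.1, 4.2], 100)
radii = rng.uniform(0.5, 5.0, size=300)
scan = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
scan += 0.05 * rng.standard_normal(scan.shape)

k = 3                                                      # hypothesised branch count
labels = KMeans(n_clusters=k, n_init=10).fit_predict(spectral_embed(scan, k))
print(np.bincount(labels))                                 # points per detected branch
```

    In the embedded space, points lying on the same non-linear branch of the scan fall close together, so an ordinary K-means step separates the branches; the number of well-separated clusters is what informs the junction estimate.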