Online low-rank representation learning for joint multi-subspace recovery and clustering
Benefiting from global rank constraints, the low-rank
representation (LRR) method has been shown to be an
effective solution to subspace learning. However, the global
mechanism also means that the LRR model is not suitable for
handling large-scale data or dynamic data. For large-scale data,
the LRR method suffers from high time complexity, and for
dynamic data, it has to recompute a complex rank minimization
for the entire data set whenever new samples are dynamically
added, making it prohibitively expensive. Existing attempts at
online LRR either take a stochastic approach or build the
representation purely from a small sample set and treat
new input as out-of-sample data. The former often requires
multiple runs to achieve good performance and thus takes longer
to run, while the latter formulates online LRR as an out-of-sample
classification problem and is less robust to noise. In
this paper, a novel online low-rank representation subspace
learning method is proposed for both large-scale and dynamic
data. The proposed algorithm is composed of two stages: static
learning and dynamic updating. In the first stage, the subspace
structure is learned from a small number of data samples. In
the second stage, the intrinsic principal components of the entire
data set are computed incrementally by utilizing the learned
subspace structure, and the low-rank representation matrix can
also be incrementally solved by an efficient online singular value
decomposition (SVD) algorithm. The time complexity is reduced
dramatically for large-scale data, and repeated computation is
avoided for dynamic problems. We further perform theoretical
analysis comparing the proposed online algorithm with the batch
LRR method. Finally, experimental results on typical tasks
of subspace recovery and subspace clustering show that the
proposed algorithm performs comparably or better than batch
methods including the batch LRR, and significantly outperforms
state-of-the-art online methods.
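The dynamic-updating stage described above folds newly arrived samples into an existing low-rank factorization without recomputing it from scratch. A minimal sketch of a standard incremental (column-update) SVD step of the kind such a stage could use is given below; the function name, interfaces, and fixed-rank truncation policy are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def incremental_svd(U, s, Vt, C, rank):
    """Update a rank-truncated SVD X ~ U @ diag(s) @ Vt when new columns C
    arrive, without recomputing the SVD of the full augmented matrix.
    (Generic column-update sketch; not the paper's online SVD algorithm.)"""
    k, c = s.size, C.shape[1]
    # Project new columns onto the current left subspace.
    L = U.T @ C
    # Orthonormalize the residual of C outside span(U).
    Q, R = np.linalg.qr(C - U @ L)
    # Small core matrix whose SVD yields the updated factors.
    K = np.zeros((k + Q.shape[1], k + c))
    K[:k, :k] = np.diag(s)
    K[:k, k:] = L
    K[k:, k:] = R
    Uk, sk, Vtk = np.linalg.svd(K, full_matrices=False)
    # Rotate the enlarged bases and truncate back to the working rank.
    U_new = np.hstack([U, Q]) @ Uk[:, :rank]
    # Extend Vt: old samples keep their codes, new samples are appended.
    n_old = Vt.shape[1]
    V_ext = np.zeros((k + c, n_old + c))
    V_ext[:k, :n_old] = Vt
    V_ext[k:, n_old:] = np.eye(c)
    Vt_new = Vtk[:rank, :] @ V_ext
    return U_new, sk[:rank], Vt_new
```

The cost of one update is dominated by the SVD of the small core matrix rather than of the full data matrix, which is the source of the complexity reduction such incremental schemes offer over batch recomputation.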
Sampling and Subspace Methods for Learning Sparse Group Structures in Computer Vision
The unprecedented growth of data in volume and dimension has led to an increased number of computationally demanding, data-driven decision-making methods in many disciplines, such as computer vision, genomics, and finance. Research on big data aims to understand and describe trends in massive volumes of high-dimensional data. High volume and dimension are the determining factors in both the computational and time complexity of algorithms. The challenge grows when the data are formed of the union of group-structures of different dimensions embedded in a high-dimensional ambient space.

To address the problem of high volume, we propose a sampling method referred to as the Sparse Withdrawal of Inliers in a First Trial (SWIFT), which determines the smallest sample size needed in a single grab so that all group-structures are adequately represented and discovered with high probability. The key features of SWIFT are: (i) sparsity, i.e., a sample size independent of the population size; (ii) no required prior knowledge of the distribution of the data or of the number of underlying group-structures; and (iii) robustness in the presence of an overwhelming number of outliers. We report a comprehensive study of the proposed sampling method in terms of accuracy, functionality, and effectiveness in reducing the computational cost of various computer vision applications.

In the second part of this dissertation, we study dimensionality reduction for multi-structural data. We propose a probabilistic subspace clustering method that unifies soft and hard clustering in a single framework. This is achieved by delaying the association of uncertain points with lower-dimensional subspaces until a confidence measure warrants it. Delayed association yields higher accuracy when clustering subspaces with ambiguities, e.g., due to intersections or a high level of outliers/noise, and hence leads to a more accurate self-representation of the underlying subspaces.
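The central quantity in the sampling problem above, the smallest one-grab sample size that represents every group-structure with high probability, can be illustrated with a simple binomial tail calculation. The sketch below is an assumed simplification with illustrative parameter names (`p_min` for the minimum inlier fraction, `d` for structure dimension, `delta` for the allowed failure probability); the dissertation derives the actual SWIFT bound, which this does not reproduce.

```python
import math

def min_sample_size(p_min, d, delta):
    """Smallest n such that a single uniform draw of n points contains at
    least d + 1 points from a structure whose inlier fraction is >= p_min,
    with probability >= 1 - delta. (Illustrative binomial-tail sketch,
    not the SWIFT bound itself.)"""
    n = d + 1  # a d-dimensional structure needs at least d + 1 points
    while True:
        # Probability of drawing fewer than d + 1 inliers in n trials.
        p_miss = sum(
            math.comb(n, k) * p_min**k * (1 - p_min) ** (n - k)
            for k in range(d + 1)
        )
        if p_miss <= delta:
            return n
        n += 1
```

For instance, hitting a structure that contains half the data with at least one point (d = 0) at 95% confidence requires only five samples. Note that the answer depends on the inlier fraction and the confidence level but not on the population size, mirroring the sparsity property claimed for SWIFT.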
Altogether, this dissertation addresses key theoretical and practical issues of size and dimension in big data analysis.