8 research outputs found

    Subspace clustering via good neighbors

    Finding the informative clusters of a high-dimensional dataset is at the core of numerous applications in computer vision, where spectral-based subspace clustering algorithms are arguably the most widely studied methods due to their empirical performance and provable guarantees under various assumptions. It is well known that the sparsity and connectivity of the affinity graph play important roles in effective subspace clustering. However, it is difficult to optimize both factors simultaneously because they conflict, and most existing methods are designed to deal with only one of them. In this paper, we propose an algorithm that optimizes both sparsity and connectivity by finding good neighbors, which induce key connections among samples within a subspace. First, an initial coefficient matrix is generated from the input dataset. For each sample, we find its good neighbors, which not only have large coefficients but are also strongly connected to each other. We reassign the coefficients of the good neighbors and eliminate all other entries to generate a new coefficient matrix, which can be used by spectral clustering methods. Experiments on five benchmark datasets show that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy, with a negligible increase in running time.
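
    The neighbor-selection step described in this abstract can be sketched roughly as follows. This is only an illustrative sketch, not the paper's algorithm: the function name `good_neighbor_affinity`, the parameters `k` (candidate neighbors per sample) and `m` (good neighbors kept), and the scoring rule (sum of affinities to the other candidates) are all assumptions made for the example.

```python
import numpy as np

def good_neighbor_affinity(C, k=3, m=2):
    """Hypothetical sketch: for each sample, keep the m candidates that are
    best connected to the other candidates among its k largest coefficients."""
    A = np.abs(C) + np.abs(C).T           # symmetric affinity from coefficients
    np.fill_diagonal(A, 0.0)              # ignore self-connections
    W = np.zeros_like(A)
    for i in range(A.shape[0]):
        cand = np.argsort(A[i])[::-1][:k]          # k largest coefficients
        sub = A[np.ix_(cand, cand)]                # affinities among candidates
        score = sub.sum(axis=1)                    # mutual connectivity (assumed rule)
        good = cand[np.argsort(score)[::-1][:m]]   # best-connected candidates
        W[i, good] = A[i, good]                    # keep good neighbors, drop the rest
    return np.maximum(W, W.T)                      # symmetric matrix for spectral clustering
```

    The returned matrix is sparse (at most `m` entries kept per sample before symmetrization) and could be passed to any spectral clustering routine that accepts a precomputed affinity.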

    Simultaneous subspace clustering and cluster number estimating based on triplet relationship

    In this paper, we propose a unified framework that simultaneously discovers the number of clusters and groups the data points into clusters using subspace clustering. Real data distributed in a high-dimensional space can be disentangled into a union of low-dimensional subspaces, which can benefit various applications. To explore this intrinsic structure, state-of-the-art subspace clustering approaches often optimize a self-representation problem among all samples to construct a pairwise affinity graph for spectral clustering. However, a graph with pairwise similarities lacks robustness for segmentation, especially for samples that lie on the intersection of two subspaces. To address this problem, we design a hyper-correlation-based data structure termed the triplet relationship, which reveals high relevance and local compactness among three samples. The triplet relationship can be derived from the self-representation matrix and used to iteratively assign the data points to clusters. Based on the triplet relationship, we propose a unified optimization scheme to automatically calculate clustering assignments. Specifically, we optimize a model selection reward and a fusion reward by simultaneously maximizing the similarity of triplets from different clusters while minimizing the correlation of triplets from the same cluster. The proposed algorithm also automatically reveals the number of clusters and fuses groups to avoid over-segmentation. Extensive experimental results on both synthetic and real-world datasets validate the effectiveness and robustness of the proposed method.
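
    As a rough illustration of how triplets might be derived from a self-representation matrix, the sketch below treats three samples as a triplet when each one lists the other two among its `k` strongest neighbors. This mutual-neighbor rule, the function name `find_triplets`, and the parameter `k` are assumptions for the example; the abstract does not give the paper's exact definition.

```python
import numpy as np
from itertools import combinations

def find_triplets(C, k=2):
    """Hypothetical sketch: return index triples whose members are all
    mutually within each other's k strongest neighbors."""
    A = (np.abs(C) + np.abs(C).T).astype(float)   # symmetric affinity
    np.fill_diagonal(A, -np.inf)                  # a sample is never its own neighbor
    n = A.shape[0]
    # k strongest neighbors of each sample, stored as sets of plain ints
    nbrs = [set(np.argsort(A[i])[::-1][:k].tolist()) for i in range(n)]
    return [
        (i, j, l)
        for i, j, l in combinations(range(n), 3)
        if {j, l} <= nbrs[i] and {i, l} <= nbrs[j] and {i, j} <= nbrs[l]
    ]
```

    The exhaustive loop over all triples costs O(n^3) and is meant only to make the idea concrete on small inputs.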

    Improved image analysis by maximised statistical use of geometry-shape constraints

    Identifying the underlying models in a set of data points contaminated by noise and outliers leads to a highly complex multi-model fitting problem. This problem can be posed as a clustering problem by constructing higher-order affinities between data points in the form of a hypergraph, which can then be partitioned using spectral clustering. Calculating the weights of all hyperedges is computationally expensive, so an approximation is required. The aim of this thesis is to find an efficient and effective approximation that produces an excellent segmentation outcome. Firstly, the effect of hyperedge size on the speed and accuracy of the clustering is investigated. Almost all previous work on hypergraph clustering in computer vision has considered the smallest possible hyperedge size, due to the lack of research into the potential benefits of large hyperedges and of effective algorithms to generate them. In this thesis, it is shown that large hyperedges are better from both theoretical and empirical standpoints. The efficiency of this technique on various higher-order grouping problems is investigated. In particular, we show that our approach improves the accuracy and efficiency of motion segmentation from dense, long-term trajectories. A shortcoming of the above approach is that the probability of a generated sample being impure increases as the size of the sample increases. To address this issue, a novel guided sampling strategy for large hyperedges, based on the concept of minimizing the largest residual, is also included. It is proposed to guide each sample by optimizing a cost function based on k-th order statistics. Samples are generated using a greedy algorithm coupled with a data sub-sampling strategy. The experimental analysis shows that this proposed step is both accurate and computationally efficient compared to state-of-the-art robust multi-model fitting techniques.
    However, the optimization method for guiding samples involves hard-to-tune parameters. Thus, a sampling method is eventually developed that significantly facilitates solving the segmentation problem, using a new form of the Markov chain Monte Carlo (MCMC) method to sample efficiently from the hyperedge distribution. To sample from this distribution effectively, the proposed Markov chain includes new types of long and short jumps to perform exploration and exploitation of all structures. Unlike common sampling methods, this method does not require any specific prior knowledge about the distribution of models. The output set of samples leads to a clustering solution from which the final model parameters for each segment are obtained. The overall method competes favorably with the state of the art in terms of both computational cost and segmentation accuracy.
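
    The idea of guiding a sample with a k-th order statistic cost, mentioned in the abstract above, can be illustrated for 2D line fitting. This is a minimal sketch under stated assumptions, not the thesis's method: the helper names `fit_line`, `kth_residual`, and `guided_sample`, the sample size `h`, and the simple greedy update (refit on the `h` points with the smallest residuals until the cost stops decreasing) are all hypothetical choices for the example.

```python
import numpy as np

def fit_line(pts):
    """Total least squares line fit: unit normal n and offset d with n.x = d."""
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)
    n = vt[-1]                      # direction of least variance
    return n, n @ c

def kth_residual(pts, n, d, k):
    """k-th smallest absolute point-to-line residual (the assumed cost)."""
    r = np.abs(pts @ n - d)
    return np.partition(r, k - 1)[k - 1]

def guided_sample(pts, k, h, iters=20, seed=0):
    """Greedy sketch: start from a random pair, then repeatedly refit on the
    h points closest to the current line while the k-th residual decreases."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pts), size=2, replace=False)
    best_idx, best_cost = idx, np.inf
    for _ in range(iters):
        n, d = fit_line(pts[idx])
        cost = kth_residual(pts, n, d, k)
        if cost >= best_cost:       # no improvement: stop
            break
        best_idx, best_cost = idx, cost
        idx = np.argsort(np.abs(pts @ n - d))[:h]   # next, larger sample
    return best_idx, best_cost
```

    A single greedy run like this can stall on a poor starting pair, so in practice it would be repeated from several random starts.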

    A Deterministic Analysis for LRR

    No full text