47,682 research outputs found

    Low-Rank Matrices on Graphs: Generalized Recovery & Applications

    Get PDF
    Many real world datasets subsume a linear or non-linear low-rank structure in a very low-dimensional space. Unfortunately, one often has very little or no information about the geometry of the space, resulting in a highly under-determined recovery problem. Under certain circumstances, state-of-the-art algorithms provide an exact recovery for linear low-rank structures but at the expense of highly inscalable algorithms which use nuclear norm. However, the case of non-linear structures remains unresolved. We revisit the problem of low-rank recovery from a totally different perspective, involving graphs which encode pairwise similarity between the data samples and features. Surprisingly, our analysis confirms that it is possible to recover many approximate linear and non-linear low-rank structures with recovery guarantees with a set of highly scalable and efficient algorithms. We call such data matrices as \textit{Low-Rank matrices on graphs} and show that many real world datasets satisfy this assumption approximately due to underlying stationarity. Our detailed theoretical and experimental analysis unveils the power of the simple, yet very novel recovery framework \textit{Fast Robust PCA on Graphs

    Fast Robust PCA on Graphs

    Get PDF
    Mining useful clusters from high dimensional data has received significant attention of the computer vision and pattern recognition community in the recent years. Linear and non-linear dimensionality reduction has played an important role to overcome the curse of dimensionality. However, often such methods are accompanied with three different problems: high computational complexity (usually associated with the nuclear norm minimization), non-convexity (for matrix factorization methods) and susceptibility to gross corruptions in the data. In this paper we propose a principal component analysis (PCA) based solution that overcomes these three issues and approximates a low-rank recovery method for high dimensional datasets. We target the low-rank recovery by enforcing two types of graph smoothness assumptions, one on the data samples and the other on the features by designing a convex optimization problem. The resulting algorithm is fast, efficient and scalable for huge datasets with O(nlog(n)) computational complexity in the number of data samples. It is also robust to gross corruptions in the dataset as well as to the model parameters. Clustering experiments on 7 benchmark datasets with different types of corruptions and background separation experiments on 3 video datasets show that our proposed model outperforms 10 state-of-the-art dimensionality reduction models. Our theoretical analysis proves that the proposed model is able to recover approximate low-rank representations with a bounded error for clusterable data

    Robust Principal Component Analysis on Graphs

    Get PDF
    Principal Component Analysis (PCA) is the most widely used tool for linear dimensionality reduction and clustering. Still it is highly sensitive to outliers and does not scale well with respect to the number of data samples. Robust PCA solves the first issue with a sparse penalty term. The second issue can be handled with the matrix factorization model, which is however non-convex. Besides, PCA based clustering can also be enhanced by using a graph of data similarity. In this article, we introduce a new model called "Robust PCA on Graphs" which incorporates spectral graph regularization into the Robust PCA framework. Our proposed model benefits from 1) the robustness of principal components to occlusions and missing values, 2) enhanced low-rank recovery, 3) improved clustering property due to the graph smoothness assumption on the low-rank matrix, and 4) convexity of the resulting optimization problem. Extensive experiments on 8 benchmark, 3 video and 2 artificial datasets with corruptions clearly reveal that our model outperforms 10 other state-of-the-art models in its clustering and low-rank recovery tasks

    Scalable and Robust Community Detection with Randomized Sketching

    Full text link
    This paper explores and analyzes the unsupervised clustering of large partially observed graphs. We propose a scalable and provable randomized framework for clustering graphs generated from the stochastic block model. The clustering is first applied to a sub-matrix of the graph's adjacency matrix associated with a reduced graph sketch constructed using random sampling. Then, the clusters of the full graph are inferred based on the clusters extracted from the sketch using a correlation-based retrieval step. Uniform random node sampling is shown to improve the computational complexity over clustering of the full graph when the cluster sizes are balanced. A new random degree-based node sampling algorithm is presented which significantly improves upon the performance of the clustering algorithm even when clusters are unbalanced. This algorithm improves the phase transitions for matrix-decomposition-based clustering with regard to computational complexity and minimum cluster size, which are shown to be nearly dimension-free in the low inter-cluster connectivity regime. A third sampling technique is shown to improve balance by randomly sampling nodes based on spatial distribution. We provide analysis and numerical results using a convex clustering algorithm based on matrix completion

    Compressive PCA for Low-Rank Matrices on Graphs

    Get PDF
    We introduce a novel framework for an approxi- mate recovery of data matrices which are low-rank on graphs, from sampled measurements. The rows and columns of such matrices belong to the span of the first few eigenvectors of the graphs constructed between their rows and columns. We leverage this property to recover the non-linear low-rank structures efficiently from sampled data measurements, with a low cost (linear in n). First, a Resrtricted Isometry Property (RIP) condition is introduced for efficient uniform sampling of the rows and columns of such matrices based on the cumulative coherence of graph eigenvectors. Secondly, a state-of-the-art fast low-rank recovery method is suggested for the sampled data. Finally, several efficient, parallel and parameter-free decoders are presented along with their theoretical analysis for decoding the low-rank and cluster indicators for the full data matrix. Thus, we overcome the computational limitations of the standard linear low-rank recovery methods for big datasets. Our method can also be seen as a major step towards efficient recovery of non- linear low-rank structures. For a matrix of size n X p, on a single core machine, our method gains a speed up of p2/kp^2/k over Robust Principal Component Analysis (RPCA), where k << p is the subspace dimension. Numerically, we can recover a low-rank matrix of size 10304 X 1000, 100 times faster than Robust PCA

    Principal manifolds and graphs in practice: from molecular biology to dynamical systems

    Full text link
    We present several applications of non-linear data modeling, using principal manifolds and principal graphs constructed using the metaphor of elasticity (elastic principal graph approach). These approaches are generalizations of the Kohonen's self-organizing maps, a class of artificial neural networks. On several examples we show advantages of using non-linear objects for data approximation in comparison to the linear ones. We propose four numerical criteria for comparing linear and non-linear mappings of datasets into the spaces of lower dimension. The examples are taken from comparative political science, from analysis of high-throughput data in molecular biology, from analysis of dynamical systems.Comment: 12 pages, 9 figure
    • …
    corecore