16 research outputs found

    Scalable and Robust Community Detection with Randomized Sketching

    Full text link
    This paper explores and analyzes the unsupervised clustering of large partially observed graphs. We propose a scalable and provable randomized framework for clustering graphs generated from the stochastic block model. The clustering is first applied to a sub-matrix of the graph's adjacency matrix associated with a reduced graph sketch constructed using random sampling. Then, the clusters of the full graph are inferred based on the clusters extracted from the sketch using a correlation-based retrieval step. Uniform random node sampling is shown to improve the computational complexity over clustering of the full graph when the cluster sizes are balanced. A new random degree-based node sampling algorithm is presented which significantly improves upon the performance of the clustering algorithm even when clusters are unbalanced. This algorithm improves the phase transitions for matrix-decomposition-based clustering with regard to computational complexity and minimum cluster size, which are shown to be nearly dimension-free in the low inter-cluster connectivity regime. A third sampling technique is shown to improve balance by randomly sampling nodes based on spatial distribution. We provide analysis and numerical results using a convex clustering algorithm based on matrix completion

    Framework for Contextual Outlier Identification using Multivariate Analysis approach and Unsupervised Learning

    Get PDF
    Majority of the existing commercial application for video surveillance system only captures the event frames where the accuracy level of captures is too poor. We reviewed the existing system to find that at present there is no such research technique that offers contextual-based scene identification of outliers. Therefore, we presented a framework that uses unsupervised learning approach to perform precise identification of outliers for a given video frames concerning the contextual information of the scene. The proposed system uses matrix decomposition method using multivariate analysis to maintain an equilibrium better faster response time and higher accuracy of the abnormal event/object detection as an outlier. Using an analytical methodology, the proposed system blocking operation followed by sparsity to perform detection. The study outcome shows that proposed system offers an increasing level of accuracy in contrast to the existing system with faster response time

    Robust PCA by Manifold Optimization

    Full text link
    Robust PCA is a widely used statistical procedure to recover a underlying low-rank matrix with grossly corrupted observations. This work considers the problem of robust PCA as a nonconvex optimization problem on the manifold of low-rank matrices, and proposes two algorithms (for two versions of retractions) based on manifold optimization. It is shown that, with a proper designed initialization, the proposed algorithms are guaranteed to converge to the underlying low-rank matrix linearly. Compared with a previous work based on the Burer-Monterio decomposition of low-rank matrices, the proposed algorithms reduce the dependence on the conditional number of the underlying low-rank matrix theoretically. Simulations and real data examples confirm the competitive performance of our method

    Randomized Robust Subspace Recovery for High Dimensional Data Matrices

    Full text link
    This paper explores and analyzes two randomized designs for robust Principal Component Analysis (PCA) employing low-dimensional data sketching. In one design, a data sketch is constructed using random column sampling followed by low dimensional embedding, while in the other, sketching is based on random column and row sampling. Both designs are shown to bring about substantial savings in complexity and memory requirements for robust subspace learning over conventional approaches that use the full scale data. A characterization of the sample and computational complexity of both designs is derived in the context of two distinct outlier models, namely, sparse and independent outlier models. The proposed randomized approach can provably recover the correct subspace with computational and sample complexity that are almost independent of the size of the data. The results of the mathematical analysis are confirmed through numerical simulations using both synthetic and real data

    Provable Self-Representation Based Outlier Detection in a Union of Subspaces

    Full text link
    Many computer vision tasks involve processing large amounts of data contaminated by outliers, which need to be detected and rejected. While outlier detection methods based on robust statistics have existed for decades, only recently have methods based on sparse and low-rank representation been developed along with guarantees of correct outlier detection when the inliers lie in one or more low-dimensional subspaces. This paper proposes a new outlier detection method that combines tools from sparse representation with random walks on a graph. By exploiting the property that data points can be expressed as sparse linear combinations of each other, we obtain an asymmetric affinity matrix among data points, which we use to construct a weighted directed graph. By defining a suitable Markov Chain from this graph, we establish a connection between inliers/outliers and essential/inessential states of the Markov chain, which allows us to detect outliers by using random walks. We provide a theoretical analysis that justifies the correctness of our method under geometric and connectivity assumptions. Experimental results on image databases demonstrate its superiority with respect to state-of-the-art sparse and low-rank outlier detection methods.Comment: 16 pages. CVPR 2017 spotlight oral presentatio

    Detection of Thin Boundaries between Different Types of Anomalies in Outlier Detection using Enhanced Neural Networks

    Full text link
    Outlier detection has received special attention in various fields, mainly for those dealing with machine learning and artificial intelligence. As strong outliers, anomalies are divided into the point, contextual and collective outliers. The most important challenges in outlier detection include the thin boundary between the remote points and natural area, the tendency of new data and noise to mimic the real data, unlabelled datasets and different definitions for outliers in different applications. Considering the stated challenges, we defined new types of anomalies called Collective Normal Anomaly and Collective Point Anomaly in order to improve a much better detection of the thin boundary between different types of anomalies. Basic domain-independent methods are introduced to detect these defined anomalies in both unsupervised and supervised datasets. The Multi-Layer Perceptron Neural Network is enhanced using the Genetic Algorithm to detect newly defined anomalies with higher precision so as to ensure a test error less than that calculated for the conventional Multi-Layer Perceptron Neural Network. Experimental results on benchmark datasets indicated reduced error of anomaly detection process in comparison to baselines

    High Dimensional Low Rank plus Sparse Matrix Decomposition

    Full text link
    This paper is concerned with the problem of low rank plus sparse matrix decomposition for big data. Conventional algorithms for matrix decomposition use the entire data to extract the low-rank and sparse components, and are based on optimization problems with complexity that scales with the dimension of the data, which limits their scalability. Furthermore, existing randomized approaches mostly rely on uniform random sampling, which is quite inefficient for many real world data matrices that exhibit additional structures (e.g. clustering). In this paper, a scalable subspace-pursuit approach that transforms the decomposition problem to a subspace learning problem is proposed. The decomposition is carried out using a small data sketch formed from sampled columns/rows. Even when the data is sampled uniformly at random, it is shown that the sufficient number of sampled columns/rows is roughly O(r\mu), where \mu is the coherency parameter and r the rank of the low rank component. In addition, adaptive sampling algorithms are proposed to address the problem of column/row sampling from structured data. We provide an analysis of the proposed method with adaptive sampling and show that adaptive sampling makes the required number of sampled columns/rows invariant to the distribution of the data. The proposed approach is amenable to online implementation and an online scheme is proposed.Comment: IEEE Transactions on Signal Processin
    corecore