    Robust Principal Component Analysis on Graphs

    Principal Component Analysis (PCA) is the most widely used tool for linear dimensionality reduction and clustering. Still, it is highly sensitive to outliers and does not scale well with respect to the number of data samples. Robust PCA solves the first issue with a sparse penalty term. The second issue can be handled with the matrix factorization model, which is, however, non-convex. Besides, PCA-based clustering can also be enhanced by using a graph of data similarity. In this article, we introduce a new model called "Robust PCA on Graphs" which incorporates spectral graph regularization into the Robust PCA framework. Our proposed model benefits from 1) the robustness of principal components to occlusions and missing values, 2) enhanced low-rank recovery, 3) improved clustering property due to the graph smoothness assumption on the low-rank matrix, and 4) convexity of the resulting optimization problem. Extensive experiments on 8 benchmark, 3 video and 2 artificial datasets with corruptions clearly reveal that our model outperforms 10 other state-of-the-art models in its clustering and low-rank recovery tasks.
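
For orientation, the "sparse penalty term" can be illustrated with the standard Robust PCA formulation (principal component pursuit), which splits a data matrix X into a low-rank part L and a sparse part S. The second expression is only a sketch of how a spectral graph regularizer could be added: the abstract does not state the exact objective, so the Laplacian symbol \Phi and the weights \lambda and \gamma are assumptions made here for illustration.

```latex
% Standard Robust PCA (principal component pursuit):
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad X = L + S

% Sketch of a graph-regularized variant (assumed form), where \Phi is the
% Laplacian of a similarity graph built over the data samples:
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
  + \gamma \, \operatorname{tr}\!\left( L \, \Phi \, L^{\top} \right)
\quad \text{subject to} \quad X = L + S
```

Both objectives are convex: the nuclear norm and the \ell_1 norm are convex, and the trace term is a convex quadratic in L because a graph Laplacian is positive semidefinite.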

    Robust Optimization using a new Volume-Based Clustering approach

    We propose a new data-driven technique for constructing uncertainty sets for robust optimization problems. The technique captures the underlying structure of sparse data through volume-based clustering, resulting in less conservative solutions than most commonly used robust optimization approaches. This can aid management in making informed decisions under uncertainty, allowing a better understanding of the potential outcomes and risks associated with possible decisions. The paper demonstrates how clustering can be performed using any desired geometry and provides a mathematical optimization formulation for generating clusters and constructing the uncertainty set. Because the method may be computationally expensive, we explore different approaches to solving the problem efficiently. This contribution to the field provides a novel data-driven approach to uncertainty set construction for robust optimization that can be applied to real-world scenarios.

    Mean Robust Optimization

    Robust optimization is a tractable and expressive technique for decision-making under uncertainty, but it can lead to overly conservative decisions when pessimistic assumptions are made on the uncertain parameters. Wasserstein distributionally robust optimization can reduce conservatism by being data-driven, but it often leads to very large problems with prohibitive solution times. We introduce mean robust optimization, a general framework that combines the best of both worlds by providing a trade-off between computational effort and conservatism. We propose uncertainty sets constructed based on clustered data rather than on observed data points directly, thereby significantly reducing problem size. By varying the number of clusters, our method bridges between robust and Wasserstein distributionally robust optimization. We show finite-sample performance guarantees and explicitly control the potential additional pessimism introduced by any clustering procedure. In addition, we prove conditions under which, when the uncertainty enters linearly in the constraints, clustering does not affect the optimal solution. We illustrate the efficiency and performance preservation of our method on several numerical examples, obtaining multiple orders of magnitude speedups in solution time with little-to-no effect on the solution quality.

    PRISMA: PRoximal Iterative SMoothing Algorithm

    Motivated by learning problems including max-norm regularized matrix completion and clustering, robust PCA and sparse inverse covariance selection, we propose a novel optimization algorithm for minimizing a convex objective which decomposes into three parts: a smooth part, a simple non-smooth Lipschitz part, and a simple non-smooth non-Lipschitz part. We use a time-variant smoothing strategy that allows us to obtain a guarantee that does not depend on knowing in advance either the total number of iterations or a bound on the domain.

    Sketched sparse subspace clustering for large-scale hyperspectral images

    Sparse subspace clustering (SSC) has achieved state-of-the-art performance in clustering of hyperspectral images (HSIs). However, the computational complexity of SSC-based methods is prohibitive for large-scale problems. We propose a large-scale SSC-based method, which efficiently processes large-scale HSIs without sacrificing clustering accuracy. The proposed approach incorporates sketching of the self-representation dictionary, thereby greatly reducing the number of optimization variables. In addition, we employ a total variation (TV) regularization of the sparse matrix, resulting in a robust sparse representation. We derive a solver based on the alternating direction method of multipliers (ADMM) for the resulting optimization problem. Experimental results on real data show improvements over the traditional SSC-based methods in terms of accuracy and running time.

    Measuring Cluster Stability for Bayesian Nonparametrics Using the Linear Bootstrap

    Clustering procedures typically estimate which data points are clustered together, a quantity of primary importance in many analyses. Because clustering is often used as a preliminary step for dimensionality reduction or to facilitate interpretation, finding robust and stable clusters is often crucial for appropriate downstream analysis. In the present work, we consider Bayesian nonparametric (BNP) models, a particularly popular set of Bayesian models for clustering due to their flexibility. Because of its complexity, the Bayesian posterior often cannot be computed exactly, and approximations must be employed. Mean-field variational Bayes (MFVB) forms a posterior approximation by solving an optimization problem and is widely used due to its speed. An exact BNP posterior might vary dramatically when presented with different data. As such, stability and robustness of the clustering should be assessed. A popular means of assessing stability is to apply the bootstrap by resampling the data and rerunning the clustering for each simulated data set. The time cost is thus often very high, especially for the sort of exploratory analysis where clustering is typically used. We propose to use a fast and automatic approximation to the full bootstrap called the "linear bootstrap", which can be understood in terms of local data perturbation. In this work, we demonstrate how to apply this idea to a data analysis pipeline, consisting of an MFVB approximation to a BNP clustering posterior of time course gene expression data. We show that using auto-differentiation tools, the necessary calculations can be done automatically, and that the linear bootstrap is a fast but approximate alternative to the bootstrap. Comment: 9 pages, NIPS 2017 Advances in Approximate Bayesian Inference Workshop
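
The full bootstrap that this abstract treats as the expensive baseline (resample the data, rerun the clustering, compare the results) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy 1-D two-means routine stands in for the BNP/MFVB clustering, and the function names `two_means_1d` and `bootstrap_coclustering` are hypothetical. It does not implement the linear bootstrap itself.

```python
import random
from itertools import combinations


def two_means_1d(values, iters=20):
    """Toy stand-in for a clustering procedure: 1-D k-means with k=2."""
    lo, hi = min(values), max(values)  # deterministic initial centroids
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to the nearer centroid (ties go to cluster 0).
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        left = [v for v, lab in zip(values, labels) if lab == 0]
        right = [v for v, lab in zip(values, labels) if lab == 1]
        if left:
            lo = sum(left) / len(left)
        if right:
            hi = sum(right) / len(right)
    return labels


def bootstrap_coclustering(values, n_boot=200, seed=0):
    """Estimate how often each pair of points lands in the same cluster.

    Each round resamples the data with replacement and reruns the
    clustering, which is why the full bootstrap costs n_boot refits.
    """
    rng = random.Random(seed)
    n = len(values)
    pairs = list(combinations(range(n), 2))
    together = {p: 0 for p in pairs}
    seen = {p: 0 for p in pairs}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        labels = two_means_1d([values[i] for i in idx])
        # Copies of the same point have equal values, hence equal labels.
        label_of = dict(zip(idx, labels))
        for i, j in combinations(sorted(label_of), 2):
            seen[(i, j)] += 1
            together[(i, j)] += label_of[i] == label_of[j]
    # Only report pairs that appeared together in at least one resample.
    return {p: together[p] / seen[p] for p in pairs if seen[p] > 0}


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    freq = bootstrap_coclustering(data)
    for pair, f in sorted(freq.items()):
        print(pair, round(f, 3))
```

Co-clustering frequencies near 0 or 1 indicate a stable assignment for that pair, while intermediate values flag pairs whose grouping depends on which data happened to be sampled.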