
    Robust Subspace Learning: Robust PCA, Robust Subspace Tracking, and Robust Subspace Recovery

    PCA is one of the most widely used dimension reduction techniques. A related, easier problem is "subspace learning" or "subspace estimation". Given relatively clean data, both are easily solved via the singular value decomposition (SVD). The problem of subspace learning or PCA in the presence of outliers is called robust subspace learning or robust PCA (RPCA). For long data sequences, if one tries to use a single low-dimensional subspace to represent the data, the required subspace dimension may end up being quite large. For such data, a better model is to assume that the data lie in a low-dimensional subspace that can change over time, albeit gradually. The problem of tracking such data (and the underlying subspaces) while being robust to outliers is called robust subspace tracking (RST). This article provides a magazine-style overview of the entire field of robust subspace learning and tracking. In particular, solutions for three problems are discussed in detail: RPCA via sparse + low-rank matrix decomposition (S+LR), RST via S+LR, and "robust subspace recovery (RSR)". RSR assumes that an entire data vector is either an outlier or an inlier. The S+LR formulation instead assumes that outliers occur on only a few data vector indices and hence are well modeled as sparse corruptions.
    Comment: To appear, IEEE Signal Processing Magazine, July 2018
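    The S+LR idea can be made concrete with a small numerical sketch. The Python snippet below is a minimal, illustrative alternating-thresholding heuristic for splitting a data matrix into a low-rank part plus sparse outliers; it is not one of the specific provable algorithms surveyed in the article, and the rank bound, outlier threshold, and iteration count are assumptions chosen for the toy example.

```python
# Minimal sketch of a sparse + low-rank (S+LR) split for RPCA.
# Assumptions: the rank bound, outlier threshold, and iteration count
# are illustrative choices, not the surveyed algorithms' settings.
import numpy as np

def s_plus_lr(M, rank, outlier_thresh, n_iters=50):
    """Alternately estimate a low-rank part L and a sparse outlier part S of M."""
    S = np.zeros_like(M)
    for _ in range(n_iters):
        # Low-rank step: project (M - S) onto its top-`rank` singular components.
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * sig[:rank]) @ Vt[:rank, :]
        # Sparse step: keep only large-magnitude residual entries as outliers.
        R = M - L
        S = np.where(np.abs(R) > outlier_thresh, R, 0.0)
    return L, S

# Toy usage: rank-5 data with roughly 2% large sparse corruptions.
rng = np.random.default_rng(0)
L_true = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 200))
S_true = np.zeros_like(L_true)
mask = rng.random(L_true.shape) < 0.02
S_true[mask] = 10.0 * rng.standard_normal(mask.sum())
L_hat, _ = s_plus_lr(L_true + S_true, rank=5, outlier_thresh=3.0)
print("relative low-rank error:",
      np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))
```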

    Fast and Sample-Efficient Federated Low Rank Matrix Recovery from Column-wise Linear and Quadratic Projections

    This work studies the following problem and its magnitude-only extension: develop a federated solution to recover an $n \times q$ rank-$r$ matrix, $X^* = [x^*_1, x^*_2, \dots, x^*_q]$, from $m$ independent linear projections of each of its columns, i.e., from $y_k := A_k x^*_k$, $k \in [q]$, where $y_k$ is an $m$-length vector. Even though low-rank recovery problems have been extensively studied in the last decade, this particular problem has received surprisingly little attention. There exist only two provable solutions with a reasonable sample complexity, both of which are slow, have sub-optimal sample complexity, and cannot be federated efficiently. We introduce a novel gradient descent (GD) based solution called GD-min that needs only $\Omega((n+q) r^2 \log(1/\epsilon))$ samples and $O(mqnr \log(1/\epsilon))$ time to obtain an $\epsilon$-accurate estimate. Based on comparison with other well-studied problems, this is the best achievable sample-complexity guarantee for a non-convex solution to the above problem. The time complexity is nearly linear and cannot be improved significantly either. Finally, in a federated setting, our solution has low communication cost and maintains privacy of the nodes' data and of the corresponding column estimates.
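    To make the factored, column-wise structure of the problem concrete, the following Python sketch approximates an alternating "minimize over columns, gradient step on the shared subspace" update of the kind the abstract describes. It is a hedged illustration only: the crude spectral-style initialization, step-size rule, and iteration count are assumptions, not the paper's exact GD-min procedure or its federated implementation.

```python
# Hedged sketch of an alternating "min over columns, GD on the shared
# subspace" update for recovering a rank-r matrix X = U B from column-wise
# projections y_k = A_k x_k. Assumptions: initialization, step-size rule,
# and iteration count are illustrative, not the paper's exact choices.
import numpy as np

def altgd_min_sketch(y_list, A_list, r, n_iters=200):
    # Initialization (assumption): SVD of the back-projected columns A_k^T y_k.
    X0 = np.column_stack([A.T @ y for A, y in zip(A_list, y_list)])
    U = np.linalg.svd(X0, full_matrices=False)[0][:, :r]
    for _ in range(n_iters):
        # Min step: each column solves a small r-dimensional least squares.
        B = np.column_stack([np.linalg.lstsq(A @ U, y, rcond=None)[0]
                             for A, y in zip(A_list, y_list)])
        # GD step on the shared subspace U, then re-orthonormalize via QR.
        grad = sum(A.T @ np.outer(A @ U @ b - y, b)
                   for A, y, b in zip(A_list, y_list, B.T))
        step = 0.5 / np.linalg.norm(B, 2) ** 2   # assumed step-size rule
        U = np.linalg.qr(U - step * grad)[0]
    return U @ B   # estimate of the n x q matrix X

# Toy usage: m < n, so no column is recoverable on its own, but the shared
# rank-r structure across columns makes joint recovery possible.
rng = np.random.default_rng(1)
n, q, r, m = 40, 30, 2, 20
X_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, q))
A_list = [rng.standard_normal((m, n)) / np.sqrt(m) for _ in range(q)]
y_list = [A @ X_true[:, k] for k, A in enumerate(A_list)]
X_hat = altgd_min_sketch(y_list, A_list, r)
print("relative error:", np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))
```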