
    Low-Rank Matrix Approximation with Weights or Missing Data is NP-hard

    Weighted low-rank approximation (WLRA), a dimensionality reduction technique for data analysis, has been successfully used in several applications, such as in collaborative filtering to design recommender systems or in computer vision to recover structure from motion. In this paper, we study the computational complexity of WLRA and prove that it is NP-hard to find an approximate solution, even when a rank-one approximation is sought. Our proofs are based on a reduction from the maximum-edge biclique problem, and apply to strictly positive weights as well as binary weights (the latter corresponding to low-rank matrix approximation with missing data). Comment: Proof of Lemma 4 (Lemma 3 in v1) has been corrected. Some remarks and comments have been added. Accepted in SIAM Journal on Matrix Analysis and Application
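    Because WLRA is NP-hard even for rank one, practical solvers are local heuristics. The sketch below shows one common heuristic, alternating weighted least squares, for illustration only; it is not taken from the paper and may stop at a local solution. It minimizes $\sum_{ij} W_{ij}(M_{ij} - u_i v_j)^2$, and a binary $W$ (0 marking a missing entry) gives rank-one approximation with missing data. The function names and parameters (`wlra_rank_one`, `iters`, `seed`, `eps`) are hypothetical.

```python
import numpy as np


def weighted_error(M, W, u, v):
    """Weighted squared error: sum_ij W_ij * (M_ij - u_i * v_j)^2."""
    return float(np.sum(W * (M - np.outer(u, v)) ** 2))


def wlra_rank_one(M, W, iters=100, seed=0, eps=1e-12):
    """Alternating weighted least squares for a rank-one approximation u v^T.

    Heuristic only: each update exactly minimizes the weighted error over one
    factor with the other held fixed (up to the eps guard for empty rows or
    columns), so the result can be a local rather than a global solution.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[1])
    u = np.zeros(M.shape[0])
    WM = W * M
    for _ in range(iters):
        # u_i = sum_j W_ij M_ij v_j / sum_j W_ij v_j^2   (v fixed)
        u = WM @ v / np.maximum(W @ v**2, eps)
        # v_j = sum_i W_ij M_ij u_i / sum_i W_ij u_i^2   (u fixed)
        v = WM.T @ u / np.maximum(W.T @ u**2, eps)
    return u, v
```

    With all weights equal to one this reduces to a power-iteration-like scheme for the leading singular pair; with general weights no closed form exists, which is consistent with the hardness result.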

    Gradient descent for sparse rank-one matrix completion for crowd-sourced aggregation of sparsely interacting workers

    We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlations between workers. We show that the correlation matrix can be successfully recovered and the skills are identifiable if and only if the sampling matrix (observed components) is irreducible and aperiodic. We then propose an efficient gradient descent scheme and show that the skill estimates converge to the desired global optimum for such sampling matrices. Our proof is original, and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds for the noisy case in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-the-art performance on a number of real-world datasets. Published version
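    A minimal sketch of the two ingredients named above, under assumptions the abstract does not state: observed correlations are modeled as $C_{ij} \approx s_i s_j$ on a list of worker pairs (each pair listed once), and plain gradient descent with a fixed, hand-picked step size is run on the squared residuals. For a symmetric sampling pattern viewed as a graph, irreducible means connected and, for a connected graph, aperiodic means non-bipartite (it has an odd cycle). On a bipartite pattern, scaling one side by $c$ and the other by $1/c$ leaves every observed product unchanged, so skills cannot be identified. All names and parameters are hypothetical, and this is not claimed to be the paper's exact scheme.

```python
from collections import deque

import numpy as np


def is_connected_nonbipartite(n, pairs):
    """Irreducible + aperiodic check for a symmetric 0/1 sampling pattern.

    Viewed as an undirected graph on n workers, the pattern is irreducible iff
    the graph is connected, and (when connected) aperiodic iff it is not
    bipartite. Uses a BFS 2-coloring from worker 0.
    """
    adj = [[] for _ in range(n)]
    for i, j in pairs:
        adj[i].append(j)
        adj[j].append(i)
    color = [-1] * n
    color[0] = 0
    queue = deque([0])
    bipartite = True
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if color[b] == -1:
                color[b] = 1 - color[a]
                queue.append(b)
            elif color[b] == color[a]:
                bipartite = False  # edge inside one color class => odd cycle
    connected = all(c != -1 for c in color)
    return connected and not bipartite


def rank_one_gd(n, pairs, values, lr=0.05, iters=5000, seed=0):
    """Gradient descent on f(s) = sum over observed (i, j) of (C_ij - s_i s_j)^2."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.5, 1.0, size=n)  # hypothetical positive initialization
    I = np.array([i for i, _ in pairs])
    J = np.array([j for _, j in pairs])
    c = np.asarray(values, dtype=float)
    for _ in range(iters):
        r = c - s[I] * s[J]  # residuals on the observed pairs
        grad = np.zeros(n)
        np.add.at(grad, I, -2.0 * r * s[J])  # contribution to df/ds_i
        np.add.at(grad, J, -2.0 * r * s[I])  # contribution to df/ds_j
        s -= lr * grad
    return s
```

    Note that $s$ and $-s$ fit the data equally well, so the recovered skills are determined only up to a global sign.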

    Matrix Factorization with Binary Components

    Motivated by an application in computational biology, we consider low-rank matrix factorization with $\{0,1\}$-constraints on one of the factors and, optionally, convex constraints on the second one. In addition to the non-convexity shared with other matrix factorization schemes, our problem is further complicated by a combinatorial constraint set of size $2^{m \cdot r}$, where $m$ is the dimension of the data points and $r$ the rank of the factorization. Despite the apparent intractability, we provide, in the line of recent work on non-negative matrix factorization by Arora et al. (2012), an algorithm that provably recovers the underlying factorization in the exact case with $O(m r 2^r + mnr + r^2 n)$ operations for $n$ data points. To obtain this result, we use theory around the Littlewood-Offord lemma from combinatorics. Comment: appeared in NIPS 201
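    The exact model can be illustrated as $D = TA$ with a binary factor $T \in \{0,1\}^{m \times r}$ and a real factor $A \in \mathbb{R}^{r \times n}$. The sketch below is not the paper's algorithm: it assumes, hypothetically, that $A$ is already known. In that case the rows of $T$ decouple, and each can be found by enumerating all $2^r$ binary vectors, which shows where a $2^r$ factor per data dimension can arise. `recover_binary_factor` is a made-up name.

```python
import itertools

import numpy as np


def recover_binary_factor(D, A):
    """Given D (m x n) and a known real factor A (r x n), return the binary
    T (m x r) minimizing ||D - T A||_F by brute force over 2^r rows."""
    r = A.shape[0]
    # All 2^r binary row vectors of length r.
    C = np.array(list(itertools.product([0, 1], repeat=r)), dtype=float)
    B = C @ A  # every candidate row of T A, shape (2^r, n)
    # Squared distance from each row of D to each candidate, shape (m, 2^r).
    dist = ((D[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return C[dist.argmin(axis=1)].astype(int)
```

    For a generic real $A$, distinct binary combinations of its rows are distinct, so in the exact case each row of $T$ is recovered uniquely.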

    Learning Output Kernels for Multi-Task Problems

    Simultaneously solving multiple related learning tasks is beneficial under a variety of circumstances, but the prior knowledge necessary to correctly model task relationships is rarely available in practice. In this paper, we develop a novel kernel-based multi-task learning technique that automatically reveals structural inter-task relationships. Building on the framework of output kernel learning (OKL), we introduce a method that jointly learns multiple functions and a low-rank multi-task kernel by solving a non-convex regularization problem. Optimization is carried out via a block coordinate descent strategy, where each subproblem is solved using suitable conjugate gradient (CG) type iterative methods for linear operator equations. The effectiveness of the proposed approach is demonstrated on pharmacological and collaborative filtering data.

    Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations

    Low-rank matrix approximation is an important tool in machine learning. Given a data matrix, a low-rank approximation helps to find factors and patterns and provides a concise representation of the data. Research on low-rank approximation usually focuses on real matrices. However, in many applications the data are binary (categorical) rather than continuous. This leads to the problem of low-rank approximation of binary matrices. Here we are given a $d \times n$ binary matrix $A$ and a small integer $k$. The goal is to find two binary matrices $U$ and $V$ of sizes $d \times k$ and $k \times n$, respectively, so that the Frobenius norm of $A - UV$ is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: the $\mathrm{GF}(2)$ model and the Boolean semiring model. Unlike low-rank approximation of real matrices, which can be solved efficiently by singular value decomposition, approximation of binary matrices is NP-hard even for $k=1$. In this paper, we consider the problem of Column Subset Selection (CSS), in which one low-rank matrix must be formed by $k$ columns of the data matrix. We characterize the approximation ratio of CSS for binary matrices. For the $\mathrm{GF}(2)$ model, we show that the approximation ratio of CSS is bounded by $\frac{k}{2}+1+\frac{k}{2(2^k-1)}$ and that this bound is asymptotically tight. For the Boolean model, it turns out that CSS is no longer sufficient to obtain a bound. We then develop a Generalized CSS (GCSS) procedure in which the columns of one low-rank matrix are generated from Boolean formulas operating bitwise on columns of the data matrix. We show that the approximation ratio of GCSS is bounded by $2^{k-1}+1$, and that the exponential dependency on $k$ is inherent. Comment: 38 pages
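    For $k = 1$ the $\mathrm{GF}(2)$ and Boolean products coincide (each entry of $UV$ is simply $u_i v_j$), and for binary matrices the squared Frobenius error equals the number of mismatched entries. The sketch below, with hypothetical function names, illustrates CSS in this case: each column of $A$ is tried as $u$, and each column of $uv$ is then set to either $u$ or the zero vector, whichever is closer. A brute-force search over all $2^d$ binary vectors $u$ gives the exact optimum for tiny matrices so the two can be compared; at $k = 1$ the $\mathrm{GF}(2)$ bound above evaluates to $\frac{1}{2} + 1 + \frac{1}{2} = 2$.

```python
import itertools

import numpy as np


def best_rank1_given_u(A, u):
    """Mismatch count of the best binary v for a fixed binary u.

    Column j of u v^T is either u (v_j = 1) or all zeros (v_j = 0),
    so each column independently takes the cheaper option.
    """
    cost_zero = A.sum(axis=0)                    # mismatches if column is 0
    cost_u = np.abs(A - u[:, None]).sum(axis=0)  # mismatches if column is u
    return int(np.minimum(cost_zero, cost_u).sum())


def css_rank1(A):
    """Column subset selection for k = 1: try every column of A as u."""
    costs = [best_rank1_given_u(A, A[:, i]) for i in range(A.shape[1])]
    i = int(np.argmin(costs))
    return i, costs[i]


def brute_force_rank1(A):
    """Exact rank-one optimum by enumerating all 2^d binary u (tiny d only)."""
    d = A.shape[0]
    return min(best_rank1_given_u(A, np.array(u))
               for u in itertools.product([0, 1], repeat=d))
```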