Search CORE

10,901 research outputs found

Low-Rank Matrix Approximation with Weights or Missing Data is NP-hard

Author: Gillis Nicolas
Glineur François
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2010
Field of study

Weighted low-rank approximation (WLRA), a dimensionality reduction technique for data analysis, has been successfully used in several applications, such as in collaborative filtering to design recommender systems or in computer vision to recover structure from motion. In this paper, we study the computational complexity of WLRA and prove that it is NP-hard to find an approximate solution, even when a rank-one approximation is sought. Our proofs are based on a reduction from the maximum-edge biclique problem, and apply to strictly positive weights as well as binary weights (the latter corresponding to low-rank matrix approximation with missing data).Comment: Proof of Lemma 4 (Lemma 3 in v1) has been corrected. Some remarks and comments have been added. Accepted in SIAM Journal on Matrix Analysis and Application

arXiv.org e-Print Archive

CiteSeerX

Crossref

DIAL UCLouvain

Gradient descent for sparse rank-one matrix completion for crowd-sourced aggregation of sparsely interacting workers

Author: Czepesvari Csaba
Ma Yao
Olshevsky Alexander
Saligrama Venkatesh
Publication venue
Publication date: 10/07/2018
Field of study

We consider worker skill estimation for the singlecoin Dawid-Skene crowdsourcing model. In practice skill-estimation is challenging because worker assignments are sparse and irregular due to the arbitrary, and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlation between workers. We show that the correlation matrix can be successfully recovered and skills identifiable if and only if the sampling matrix (observed components) is irreducible and aperiodic. We then propose an efficient gradient descent scheme and show that skill estimates converges to the desired global optima for such sampling matrices. Our proof is original and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP hard in general. Next we derive sample complexity bounds for the noisy case in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-art performance on a number of real-world datasets.Published versio

Boston University Institutional Repository (OpenBU)

Low-rank matrix approximation with weights or missing data is NP-hard

Author: GILLIS Nicolas
GLINEUR François
Publication venue
Publication date
Field of study

Weighted low-rank approximation (WLRA), a dimensionality reduction technique for data analysis, has been successfully used in several applications, such as in collaborative filtering to design recommender systems or in computer vision to recover structure from motion. In this paper, we study the computational complexity of WLRA and prove that it is NP-hard to find an approximate solution, even when a rank-one approximation is sought. Our proofs are based on a reduction from the maximum-edge biclique problem, and apply to strictly positive weights as well as binary weights (the latter corresponding to low-rank matrix approximation with missing data).low-rank matrix approximation, weighted low-rank approximation, missing data, matrix completion with noise, PCA with missing data, computational complexity, maximum-edge biclique problem

Research Papers in Economics

Matrix factorization with Binary Components

Author: Hein Matthias
Lutsik Pavlo
Slawski Martin
Publication venue
Publication date: 01/01/2013
Field of study

Motivated by an application in computational biology, we consider low-rank matrix factorization with

\{0,1\}

-constraints on one of the factors and optionally convex constraints on the second one. In addition to the non-convexity shared with other matrix factorization schemes, our problem is further complicated by a combinatorial constraint set of size

2^{m \cdot r}

, where

m

is the dimension of the data points and

r

the rank of the factorization. Despite apparent intractability, we provide - in the line of recent work on non-negative matrix factorization by Arora et al. (2012) - an algorithm that provably recovers the underlying factorization in the exact case with

O(m r 2^r + mnr + r^2 n)

operations for

n

datapoints. To obtain this result, we use theory around the Littlewood-Offord lemma from combinatorics.Comment: appeared in NIPS 201

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Learning Output Kernels for Multi-Task Problems

Author: Dinuzzo Francesco
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

Simultaneously solving multiple related learning tasks is beneficial under a variety of circumstances, but the prior knowledge necessary to correctly model task relationships is rarely available in practice. In this paper, we develop a novel kernel-based multi-task learning technique that automatically reveals structural inter-task relationships. Building over the framework of output kernel learning (OKL), we introduce a method that jointly learns multiple functions and a low-rank multi-task kernel by solving a non-convex regularization problem. Optimization is carried out via a block coordinate descent strategy, where each subproblem is solved using suitable conjugate gradient (CG) type iterative methods for linear operator equations. The effectiveness of the proposed approach is demonstrated on pharmacological and collaborative filtering data

arXiv.org e-Print Archive

CiteSeerX

Publikationsserver der Universität Tübingen

MPG.PuRe

Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations

Author: Dan Chen
Hansen Kristoffer Arnsfelt
Jiang He
Wang Liwei
Zhou Yuchen
Publication venue
Publication date: 20/04/2017
Field of study

Low rank matrix approximation is an important tool in machine learning. Given a data matrix, low rank approximation helps to find factors, patterns and provides concise representations for the data. Research on low rank approximation usually focus on real matrices. However, in many applications data are binary (categorical) rather than continuous. This leads to the problem of low rank approximation of binary matrix. Here we are given a

d \times n

binary matrix

A

and a small integer

k

. The goal is to find two binary matrices

U

and

V

of sizes

d \times k

and

k \times n

respectively, so that the Frobenius norm of

A - U V

is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: The

\mathrm{GF}(2)

model and the Boolean semiring model. Unlike low rank approximation of real matrix which can be efficiently solved by Singular Value Decomposition, approximation of binary matrix is

NP

-hard even for

k=1

. In this paper, we consider the problem of Column Subset Selection (CSS), in which one low rank matrix must be formed by

k

columns of the data matrix. We characterize the approximation ratio of CSS for binary matrices. For

GF(2)

model, we show the approximation ratio of CSS is bounded by

\frac{k}{2}+1+\frac{k}{2(2^k-1)}

and this bound is asymptotically tight. For Boolean model, it turns out that CSS is no longer sufficient to obtain a bound. We then develop a Generalized CSS (GCSS) procedure in which the columns of one low rank matrix are generated from Boolean formulas operating bitwise on columns of the data matrix. We show the approximation ratio of GCSS is bounded by

2^{k-1}+1

, and the exponential dependency on

k

is inherent.Comment: 38 page

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server