Low-Rank Matrix Approximation with Weights or Missing Data is NP-hard
Weighted low-rank approximation (WLRA), a dimensionality reduction technique
for data analysis, has been successfully used in several applications, such as
in collaborative filtering to design recommender systems or in computer vision
to recover structure from motion. In this paper, we study the computational
complexity of WLRA and prove that it is NP-hard to find an approximate
solution, even when a rank-one approximation is sought. Our proofs are based on
a reduction from the maximum-edge biclique problem, and apply to strictly
positive weights as well as binary weights (the latter corresponding to
low-rank matrix approximation with missing data). Comment: Proof of Lemma 4 (Lemma 3 in v1) has been corrected. Some remarks and
comments have been added. Accepted in SIAM Journal on Matrix Analysis and
Applications.
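As a rough illustration of the rank-one WLRA objective described in the abstract above, the following sketch minimises the weighted squared error sum_ij W_ij (M_ij - u_i v_j)^2 by alternating closed-form updates. The function names `wlra_objective` and `rank_one_wlra_als` are hypothetical, and alternating minimisation is a common heuristic assumed here for illustration, not the paper's method. In line with the hardness result, it carries no guarantee of reaching a global optimum.

```python
import numpy as np

def wlra_objective(M, W, u, v):
    """Weighted squared error: sum_ij W_ij * (M_ij - u_i * v_j)**2."""
    R = M - np.outer(u, v)
    return float(np.sum(W * R * R))

def rank_one_wlra_als(M, W, iters=50, seed=0):
    """Heuristic alternating minimisation for rank-one WLRA.

    Binary weights (0 = missing entry) give the missing-data variant.
    This is a local method; it may stop at a non-optimal point.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    v = rng.standard_normal(n)
    u = np.zeros(m)
    eps = 1e-12  # guards against division by zero for fully unobserved rows/columns
    WM = W * M
    for _ in range(iters):
        # With v fixed, each u_i solves a one-dimensional weighted least squares.
        u = (WM @ v) / (W @ (v * v) + eps)
        # With u fixed, the same holds for each v_j.
        v = (WM.T @ u) / (W.T @ (u * u) + eps)
    return u, v
```

On a fully observed, exactly rank-one matrix the updates recover the matrix; with general weights the result depends on the starting point.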
Gradient descent for sparse rank-one matrix completion for crowd-sourced aggregation of sparsely interacting workers
We consider worker skill estimation for the single-coin
Dawid-Skene crowdsourcing model. In
practice, skill estimation is challenging because
worker assignments are sparse and irregular due
to the arbitrary and uncontrolled availability of
workers. We formulate skill estimation as a
rank-one correlation-matrix completion problem,
where the observed components correspond to
observed label correlation between workers. We
show that the correlation matrix can be successfully
recovered and skills are identifiable if and only
if the sampling matrix (observed components) is
irreducible and aperiodic. We then propose an
efficient gradient descent scheme and show that
skill estimates converge to the desired global optima
for such sampling matrices. Our proof is
original and the results are surprising in light of
the fact that even the weighted rank-one matrix
factorization problem is NP-hard in general. Next
we derive sample complexity bounds for the noisy
case in terms of spectral properties of the signless
Laplacian of the sampling matrix. Our proposed
scheme achieves state-of-the-art performance on a
number of real-world datasets. Published version
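To make the rank-one completion formulation above concrete, here is a minimal sketch that fits a skill vector s to the observed entries of a correlation matrix C by plain gradient descent on a squared loss. The function name `rank_one_completion_gd`, the squared-loss objective, the step size, and the initialisation are all assumptions for illustration; the abstract does not specify the authors' exact scheme.

```python
import numpy as np

def rank_one_completion_gd(C, mask, lr=0.05, iters=2000, seed=0):
    """Fit s so that s_i * s_j matches C_ij on observed entries.

    mask[i, j] = 1 where C[i, j] is observed, 0 otherwise.
    Minimises sum_ij mask_ij * (s_i * s_j - C_ij)**2.
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    # Positive initialisation, assuming workers are better than random.
    s = rng.uniform(0.1, 1.0, size=n)
    for _ in range(iters):
        R = mask * (np.outer(s, s) - C)   # residual on observed entries only
        grad = 2.0 * (R + R.T) @ s        # gradient of the masked squared loss
        s = s - lr * grad
    return s
```

For example, three workers whose pairwise correlations are all observed (a triangle, which is irreducible and aperiodic) can be fitted with `rank_one_completion_gd(C, 1 - np.eye(3))`, leaving the diagonal unobserved.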
Low-rank matrix approximation with weights or missing data is NP-hard
Weighted low-rank approximation (WLRA), a dimensionality reduction technique for data analysis, has been successfully used in several applications, such as in collaborative filtering to design recommender systems or in computer vision to recover structure from motion. In this paper, we study the computational complexity of WLRA and prove that it is NP-hard to find an approximate solution, even when a rank-one approximation is sought. Our proofs are based on a reduction from the maximum-edge biclique problem, and apply to strictly positive weights as well as binary weights (the latter corresponding to low-rank matrix approximation with missing data). Keywords: low-rank matrix approximation, weighted low-rank approximation, missing data, matrix completion with noise, PCA with missing data, computational complexity, maximum-edge biclique problem
Matrix factorization with Binary Components
Motivated by an application in computational biology, we consider low-rank
matrix factorization with $\{0,1\}$-constraints on one of the factors and
optionally convex constraints on the second one. In addition to the
non-convexity shared with other matrix factorization schemes, our problem is
further complicated by a combinatorial constraint set of size $2^{m \cdot r}$,
where $m$ is the dimension of the data points and $r$ the rank of the
factorization. Despite apparent intractability, we provide - in the line of
recent work on non-negative matrix factorization by Arora et al. (2012) - an
algorithm that provably recovers the underlying factorization in the exact case
with $O(m r 2^r + mnr + r^2 n)$ operations for $n$ datapoints. To obtain this
result, we use theory around the Littlewood-Offord lemma from combinatorics. Comment: appeared in NIPS 201
Learning Output Kernels for Multi-Task Problems
Simultaneously solving multiple related learning tasks is beneficial under a
variety of circumstances, but the prior knowledge necessary to correctly model
task relationships is rarely available in practice. In this paper, we develop a
novel kernel-based multi-task learning technique that automatically reveals
structural inter-task relationships. Building on the framework of output
kernel learning (OKL), we introduce a method that jointly learns multiple
functions and a low-rank multi-task kernel by solving a non-convex
regularization problem. Optimization is carried out via a block coordinate
descent strategy, where each subproblem is solved using suitable conjugate
gradient (CG) type iterative methods for linear operator equations. The
effectiveness of the proposed approach is demonstrated on pharmacological and
collaborative filtering data.
Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations
Low rank matrix approximation is an important tool in machine learning. Given
a data matrix, low rank approximation helps to find factors and patterns and
provides concise representations for the data. Research on low rank
approximation usually focuses on real matrices. However, in many applications
data are binary (categorical) rather than continuous. This leads to the problem
of low rank approximation of binary matrices. Here we are given a $d \times n$
binary matrix $A$ and a small integer $k$. The goal is to find two binary
matrices $U$ and $V$ of sizes $d \times k$ and $k \times n$ respectively, so
that the Frobenius norm of $A - UV$ is minimized. There are two models of this
problem, depending on the definition of the dot product of binary vectors: the
$GF(2)$ model and the Boolean semiring model. Unlike low rank
approximation of real matrices, which can be efficiently solved by Singular Value
Decomposition, approximation of binary matrices is NP-hard even for $k=1$.
In this paper, we consider the problem of Column Subset Selection (CSS), in
which one low rank matrix must be formed by columns of the data matrix. We
characterize the approximation ratio of CSS for binary matrices. For the $GF(2)$
model, we show the approximation ratio of CSS is bounded by
$\frac{k}{2}+1+\frac{k}{2(2^k-1)}$ and this bound is asymptotically tight. For the
Boolean model, it turns out that CSS is no longer sufficient to obtain a bound.
We then develop a Generalized CSS (GCSS) procedure in which the columns of one
low rank matrix are generated from Boolean formulas operating bitwise on
columns of the data matrix. We show the approximation ratio of GCSS is bounded
by $2^{k-1}+1$, and the exponential dependency on $k$ is inherent. Comment: 38 pages
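The Column Subset Selection idea in the abstract above can be sketched for the GF(2) model as follows: pick k columns of the data matrix as one factor, then approximate every data column by its nearest mod-2 combination of the chosen columns. The function name `css_gf2`, the symbol names, and the exhaustive search over column subsets are assumptions for illustration only; the search is exponential in k and polynomial of degree k in the number of columns, so it suits small examples, and it is not claimed to be the procedure analysed in the paper.

```python
from itertools import combinations, product
import numpy as np

def css_gf2(A, k):
    """Brute-force column subset selection over GF(2).

    Returns (err, U, V): U holds k columns of A, V is a k x n binary matrix,
    and err counts the entries where (U @ V) % 2 differs from A. For binary
    matrices this count equals the squared Frobenius norm of the difference.
    """
    A = np.asarray(A, dtype=np.uint8)
    d, n = A.shape
    # All 2^k coefficient vectors, stored as columns of a k x 2^k matrix.
    coeffs = np.array(list(product([0, 1], repeat=k)), dtype=np.uint8).T
    best = None
    for cols in combinations(range(n), k):
        U = A[:, list(cols)]                      # d x k, columns taken from A
        span = (U @ coeffs) % 2                   # d x 2^k, every GF(2) combination
        # Hamming distance from each data column to each combination.
        dist = (A[:, :, None] != span[:, None, :]).sum(axis=0)  # n x 2^k
        choice = dist.argmin(axis=1)
        err = int(dist.min(axis=1).sum())
        if best is None or err < best[0]:
            best = (err, U, coeffs[:, choice])
    return best
```

For a matrix whose third column is the mod-2 sum of the first two, calling `css_gf2(A, 2)` reproduces the matrix exactly, while `css_gf2(A, 1)` must leave some entries unmatched.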