Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations
Low rank matrix approximation is an important tool in machine learning. Given
a data matrix, low rank approximation helps to find factors and patterns, and
provides concise representations for the data. Research on low rank
approximation usually focuses on real matrices. However, in many applications
data are binary (categorical) rather than continuous. This leads to the problem
of low rank approximation of binary matrices. Here we are given an m x n
binary matrix A and a small integer k. The goal is to find two binary
matrices U and V of sizes m x k and k x n respectively, so
that the Frobenius norm of A - UV is minimized. There are two models of this
problem, depending on the definition of the dot product of binary vectors: the
GF(2) model and the Boolean semiring model. Unlike low rank
approximation of real matrices, which can be efficiently solved by Singular Value
Decomposition, approximation of binary matrices is NP-hard even for k = 1.
In this paper, we consider the problem of Column Subset Selection (CSS), in
which one low rank factor must be formed by columns of the data matrix. We
characterize the approximation ratio of CSS for binary matrices. For the
GF(2) model, we show that the approximation ratio of CSS is bounded and that
this bound is asymptotically tight. For the
Boolean model, it turns out that CSS is no longer sufficient to obtain a bound.
We then develop a Generalized CSS (GCSS) procedure in which the columns of one
low rank factor are generated from Boolean formulas operating bitwise on
columns of the data matrix. We show that the approximation ratio of GCSS is
bounded by a function exponential in k, and that this exponential dependency on
k is inherent. Comment: 38 pages
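The two models above differ only in how the dot product of binary vectors is evaluated: modulo 2 in the GF(2) model, and as an OR of ANDs in the Boolean semiring model. A minimal pure-Python sketch (function names are illustrative, not from the paper) makes the difference concrete:

```python
# Toy illustration of the two binary matrix-product models discussed above:
# the same factors U (m x k) and V (k x n) can yield different products.

def matmul_gf2(U, V):
    # Dot products taken modulo 2 (the GF(2) model).
    return [[sum(U[i][t] * V[t][j] for t in range(len(V))) % 2
             for j in range(len(V[0]))] for i in range(len(U))]

def matmul_boolean(U, V):
    # Dot products in the Boolean semiring: OR of ANDs, so 1 + 1 = 1.
    return [[int(any(U[i][t] and V[t][j] for t in range(len(V))))
             for j in range(len(V[0]))] for i in range(len(U))]

U = [[1, 1],
     [1, 0]]
V = [[1, 0],
     [1, 1]]

print(matmul_gf2(U, V))      # top-left entry: 1*1 + 1*1 = 0 mod 2
print(matmul_boolean(U, V))  # top-left entry: (1 AND 1) OR (1 AND 1) = 1
```

The top-left entry is 0 under GF(2) but 1 under the Boolean semiring, which is why the two models lead to different approximation problems.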
A divide-and-conquer algorithm for binary matrix completion
We propose an algorithm for low rank matrix completion for matrices with
binary entries which obtains explicit binary factors. Our algorithm, which we
call TBMC (\emph{Tiling for Binary Matrix Completion}), gives interpretable
output in the form of binary factors which represent a decomposition of the
matrix into tiles. Our approach is inspired by a popular algorithm from the
data mining community called PROXIMUS: it adopts the same recursive
partitioning approach while extending to missing data. The algorithm relies
upon rank-one approximations of incomplete binary matrices, and we propose a
linear programming (LP) approach for solving this subproblem. We also prove an
approximation guarantee for the LP approach which holds for any level of
subsampling and for any subsampling pattern. Our numerical experiments show
that TBMC outperforms existing methods on recommender systems arising in the
context of real datasets. Comment: 14 pages, 4 figures
Boolean and Fp-Matrix Factorization: From Theory to Practice
Boolean Matrix Factorization (BMF) aims to find an approximation of a given binary matrix as the Boolean product of two low-rank binary matrices. Binary data is ubiquitous in many fields, and representing data by binary matrices is common in medicine, natural language processing, bioinformatics, and computer graphics, among many others. Factorizing a matrix into low-rank matrices is used to gain more information about the data, like discovering relationships between features and samples, roles and users, topics and articles, etc. In many applications, the binary nature of the factor matrices can greatly increase the interpretability of the data. Unfortunately, BMF is computationally hard, and heuristic algorithms are used to compute Boolean factorizations. Very recently, a theoretical breakthrough was obtained independently by two research groups: Ban et al. (SODA 2019) and Fomin et al. (Trans. Algorithms 2020) show that BMF admits an efficient polynomial-time approximation scheme (EPTAS). However, despite their theoretical importance, the double-exponential dependence of the running times on the rank makes these algorithms unimplementable in practice. The primary research question motivating our work is whether the theoretical advances on BMF could lead to practical algorithms. The main conceptual contribution of our work is the following: while EPTAS for BMF is a purely theoretical advance, the general approach behind these algorithms can serve as the basis for designing better heuristics. We also use this strategy to develop new algorithms for the related Fp-Matrix Factorization problem. Here, given a matrix A over a finite field GF(p), where p is a prime, and an integer r, our objective is to find a matrix B over the same field with GF(p)-rank at most r minimizing some norm of A - B. Our empirical research on synthetic and real-world data demonstrates the advantage of the new algorithms over previous works on BMF and Fp-Matrix Factorization. © 2022 IEEE
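The GF(p)-rank used in the Fp objective above is ordinary matrix rank with all arithmetic done modulo p. A minimal stdlib sketch of computing it by Gaussian elimination (illustrative only; this is standard linear algebra, not the paper's factorization algorithm):

```python
# Rank of an integer matrix over GF(p), p prime, via Gaussian elimination.
def rank_mod_p(A, p):
    M = [[x % p for x in row] for row in A]
    rank, rows, cols = 0, len(A), len(A[0])
    for col in range(cols):
        # Find a row with a nonzero entry in this column below the pivots.
        pivot = next((r for r in range(rank, rows) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], -1, p)  # modular inverse (Python 3.8+)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(rows):
            if r != rank and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
        if rank == rows:
            break
    return rank

print(rank_mod_p([[0, 1, 1], [1, 0, 1], [1, 1, 0]], 2))  # rows sum to 0 mod 2
```

Over GF(2) the three rows above sum to zero, so the rank is 2 even though the same matrix has real rank 3; this gap is one reason finite-field factorizations behave differently from real ones.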
On the Complexity and Approximation of Binary Evidence in Lifted Inference
Lifted inference algorithms exploit symmetries in probabilistic models to
speed up inference. They show impressive performance when calculating
unconditional probabilities in relational models, but often resort to
non-lifted inference when computing conditional probabilities. The reason is
that conditioning on evidence breaks many of the model's symmetries, which can
preempt standard lifting techniques. Recent theoretical results show, for
example, that conditioning on evidence which corresponds to binary relations is
#P-hard, suggesting that no lifting is to be expected in the worst case. In
this paper, we balance this negative result by identifying the Boolean rank of
the evidence as a key parameter for characterizing the complexity of
conditioning in lifted inference. In particular, we show that conditioning on
binary evidence with bounded Boolean rank is efficient. This opens up the
possibility of approximating evidence by a low-rank Boolean matrix
factorization, which we investigate both theoretically and empirically. Comment: To appear in Advances in Neural Information Processing Systems 26
(NIPS), Lake Tahoe, USA, December 2013
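The Boolean rank used as the key parameter above is the smallest r such that the evidence matrix equals the Boolean product of an m x r and an r x n binary matrix. For tiny matrices it can be found by exhaustive search (a sketch for intuition only; computing Boolean rank is NP-hard in general, and this is not how the paper uses it):

```python
from itertools import product

# Brute-force Boolean rank of a tiny binary matrix: smallest r such that
# A equals the Boolean (OR-of-ANDs) product of m x r and r x n factors.
def boolean_rank(A):
    m, n = len(A), len(A[0])
    if all(x == 0 for row in A for x in row):
        return 0
    for r in range(1, min(m, n) + 1):
        for Ubits in product([0, 1], repeat=m * r):
            U = [Ubits[i * r:(i + 1) * r] for i in range(m)]
            for Vbits in product([0, 1], repeat=r * n):
                V = [Vbits[t * n:(t + 1) * n] for t in range(r)]
                B = [[int(any(U[i][t] and V[t][j] for t in range(r)))
                      for j in range(n)] for i in range(m)]
                if B == A:
                    return r
    return min(m, n)

print(boolean_rank([[1, 0], [0, 1]]))  # identity is not a single "rectangle"
```

A Boolean rank-one matrix is a combinatorial rectangle (all-ones submatrix pattern), so the 2 x 2 identity already requires rank 2, while an all-ones matrix has Boolean rank 1.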
Low-Rank Boolean Matrix Approximation by Integer Programming
Low-rank approximations of data matrices are an important dimensionality
reduction tool in machine learning and regression analysis. We consider the
case of categorical variables, where it can be formulated as the problem of
finding low-rank approximations to Boolean matrices. In this paper we give what
is, to the best of our knowledge, the first integer programming formulation that
relies on only polynomially many variables and constraints; we discuss how to
solve it computationally and report numerical tests on synthetic and real-world
data.
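The abstract does not spell out the formulation. One standard way to express the Boolean product with polynomially many binary variables (a sketch of the textbook linearization under assumed notation A (m x n), rank r; the paper's actual model may differ) is to introduce y_{ijl} for the products u_{il} AND v_{lj} and z_{ij} for their OR:

```latex
\min \sum_{i,j} \bigl( A_{ij}(1 - z_{ij}) + (1 - A_{ij})\, z_{ij} \bigr)
\quad \text{s.t.}\quad
y_{ij\ell} \le u_{i\ell},\;
y_{ij\ell} \le v_{\ell j},\;
y_{ij\ell} \ge u_{i\ell} + v_{\ell j} - 1,
\]
\[
z_{ij} \ge y_{ij\ell} \;\;\forall \ell,\qquad
z_{ij} \le \sum_{\ell=1}^{r} y_{ij\ell},\qquad
u, v, y, z \in \{0,1\}.
```

The objective counts entry-wise mismatches between A and the Boolean product z, and the variable count is O(mnr), i.e. polynomial in the input size and the rank.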