
    Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations

    Low rank matrix approximation is an important tool in machine learning. Given a data matrix, low rank approximation helps to find factors and patterns, and provides concise representations of the data. Research on low rank approximation usually focuses on real matrices. However, in many applications data are binary (categorical) rather than continuous, which leads to the problem of low rank approximation of binary matrices. Here we are given a $d \times n$ binary matrix $A$ and a small integer $k$. The goal is to find two binary matrices $U$ and $V$ of sizes $d \times k$ and $k \times n$ respectively, so that the Frobenius norm of $A - UV$ is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: the $\mathrm{GF}(2)$ model and the Boolean semiring model. Unlike low rank approximation of real matrices, which can be efficiently solved by Singular Value Decomposition, approximation of binary matrices is NP-hard even for $k=1$. In this paper, we consider the problem of Column Subset Selection (CSS), in which one low rank factor must be formed by $k$ columns of the data matrix. We characterize the approximation ratio of CSS for binary matrices. For the $\mathrm{GF}(2)$ model, we show the approximation ratio of CSS is bounded by $\frac{k}{2}+1+\frac{k}{2(2^k-1)}$ and this bound is asymptotically tight. For the Boolean model, it turns out that CSS is no longer sufficient to obtain a bound. We then develop a Generalized CSS (GCSS) procedure in which the columns of one low rank factor are generated from Boolean formulas operating bitwise on columns of the data matrix. We show the approximation ratio of GCSS is bounded by $2^{k-1}+1$, and the exponential dependence on $k$ is inherent. Comment: 38 pages
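
    The two dot-product models in this abstract differ only in how addition behaves: over $\mathrm{GF}(2)$ addition is XOR, while over the Boolean semiring $1+1=1$. A minimal numpy sketch of both products and the squared Frobenius error for binary matrices, with toy matrices chosen purely for illustration:

```python
import numpy as np

def gf2_product(U, V):
    # Matrix product over GF(2): addition is XOR, i.e. arithmetic mod 2.
    return (U.astype(int) @ V.astype(int)) % 2

def boolean_product(U, V):
    # Matrix product over the Boolean semiring: entry is OR of ANDs, so 1+1 = 1.
    return ((U.astype(int) @ V.astype(int)) > 0).astype(int)

def frobenius_error(A, B):
    # For 0/1 matrices, the squared Frobenius norm of A - B is simply
    # the number of entries where A and B disagree.
    return int(np.sum(A != B))

# Toy instance with d = 3, n = 4, k = 2 (illustrative values).
A = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 0]])
U = np.array([[1, 0], [1, 1], [0, 1]])
V = np.array([[1, 0, 0, 1], [0, 1, 1, 0]])

print("GF(2) error:   ", frobenius_error(A, gf2_product(U, V)))
print("Boolean error: ", frobenius_error(A, boolean_product(U, V)))
```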

    A divide-and-conquer algorithm for binary matrix completion

    We propose an algorithm for low rank matrix completion for matrices with binary entries, which obtains explicit binary factors. Our algorithm, which we call TBMC (\emph{Tiling for Binary Matrix Completion}), gives interpretable output in the form of binary factors which represent a decomposition of the matrix into tiles. Our approach is inspired by a popular algorithm from the data mining community called PROXIMUS: it adopts the same recursive partitioning approach while extending to missing data. The algorithm relies upon rank-one approximations of incomplete binary matrices, and we propose a linear programming (LP) approach for solving this subproblem. We also prove a 2-approximation result for the LP approach which holds for any level of subsampling and for any subsampling pattern. Our numerical experiments show that TBMC outperforms existing methods on recommender systems arising in the context of real datasets. Comment: 14 pages, 4 figures
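
    The core subproblem in TBMC is fitting a rank-one binary factorization to a partially observed binary matrix. The paper solves it with an LP that carries the 2-approximation guarantee; as a rough illustration of the subproblem itself, the sketch below instead uses a simple alternating majority-vote heuristic over the observed entries (the mask convention, iteration count, and update rule are illustrative choices, not the paper's LP method):

```python
import numpy as np

def rank_one_binary_completion(A, mask, iters=20, seed=0):
    """Fit a rank-one binary model u v^T to the observed entries of A
    (mask == True where A is observed). Simplified stand-in for TBMC's
    LP subproblem: each step sets every bit of one factor to whichever
    value minimizes mismatches on the observed entries it touches."""
    rng = np.random.default_rng(seed)
    d, n = A.shape
    v = rng.integers(0, 2, size=n)
    u = np.zeros(d, dtype=int)
    for _ in range(iters):
        # Fix v; choose each u[i] to minimize observed mismatches in row i.
        for i in range(d):
            obs = mask[i]
            err0 = np.sum(A[i, obs] != 0)       # u[i] = 0 predicts all zeros
            err1 = np.sum(A[i, obs] != v[obs])  # u[i] = 1 predicts v
            u[i] = int(err1 < err0)
        # Fix u; symmetric update for each v[j] over column j.
        for j in range(n):
            obs = mask[:, j]
            err0 = np.sum(A[obs, j] != 0)
            err1 = np.sum(A[obs, j] != u[obs])
            v[j] = int(err1 < err0)
    return u, v
```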

    Boolean and Fp-Matrix Factorization: From Theory to Practice

    Boolean Matrix Factorization (BMF) aims to find an approximation of a given binary matrix as the Boolean product of two low-rank binary matrices. Binary data is ubiquitous in many fields, and representing data by binary matrices is common in medicine, natural language processing, bioinformatics, computer graphics, among many others. Factorizing a matrix into low-rank matrices is used to gain more information about the data, like discovering relationships between the features and samples, roles and users, topics and articles, etc. In many applications, the binary nature of the factor matrices could enormously increase the interpretability of the data. Unfortunately, BMF is computationally hard, and heuristic algorithms are used to compute Boolean factorizations. Very recently, a theoretical breakthrough was obtained independently by two research groups. Ban et al. (SODA 2019) and Fomin et al. (Trans. Algorithms 2020) show that BMF admits an efficient polynomial-time approximation scheme (EPTAS). However, despite the theoretical importance, the high double-exponential dependence of the running times on the rank makes these algorithms unimplementable in practice. The primary research question motivating our work is whether the theoretical advances on BMF could lead to practical algorithms. The main conceptual contribution of our work is the following: while EPTAS for BMF is a purely theoretical advance, the general approach behind these algorithms could serve as the basis for designing better heuristics. We also use this strategy to develop new algorithms for the related Fp-Matrix Factorization. Here, given a matrix A over a finite field GF(p), where p is a prime, and an integer r, our objective is to find a matrix B over the same field with GF(p)-rank at most r minimizing some norm of A - B. Our empirical research on synthetic and real-world data demonstrates the advantage of the new algorithms over previous works on BMF and Fp-Matrix Factorization. © 2022 IEEE
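
    To make the Fp-Matrix Factorization objective concrete: with U of size d × r and V of size r × n over Z_p, one seeks to minimize the number of entries where (UV mod p) differs from A. A small alternating-minimization sketch, where each half-step enumerates all p^r candidate factor rows or columns; this is feasible only for small p and r and illustrates the objective, not the EPTAS or the paper's heuristics:

```python
import itertools
import numpy as np

def fp_factorize(A, p, r, iters=10, seed=0):
    """Alternating heuristic for F_p-matrix factorization: find U (d x r)
    and V (r x n) over Z_p so that (U V mod p) disagrees with A on as
    few entries as possible. Each half-step exhaustively searches the
    p^r possible rows/columns of a factor, so keep p and r tiny."""
    rng = np.random.default_rng(seed)
    d, n = A.shape
    V = rng.integers(0, p, size=(r, n))
    U = np.zeros((d, r), dtype=int)
    candidates = np.array(list(itertools.product(range(p), repeat=r)))
    for _ in range(iters):
        # Fix V; pick the best row of U by exhaustive search over Z_p^r.
        preds = candidates @ V % p                    # (p^r, n) predicted rows
        for i in range(d):
            U[i] = candidates[np.argmin((preds != A[i]).sum(axis=1))]
        # Fix U; symmetric exhaustive update for the columns of V.
        preds = U @ candidates.T % p                  # (d, p^r) predicted columns
        for j in range(n):
            V[:, j] = candidates[np.argmin((preds != A[:, j][:, None]).sum(axis=0))]
    return U, V
```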

    On the Complexity and Approximation of Binary Evidence in Lifted Inference

    Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model's symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. Comment: To appear in Advances in Neural Information Processing Systems 26 (NIPS), Lake Tahoe, USA, December 2013
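
    The upshot is that an evidence matrix of low Boolean rank, i.e. one expressible as the OR of a few rank-one tiles u v^T, keeps conditioning tractable, so accuracy can be traded for speed by replacing the evidence with a low-Boolean-rank approximation. A greedy tiling sketch of such an approximation follows; the column seeding and gain/cost rule are illustrative choices, not the paper's factorization procedure:

```python
import numpy as np

def boolean_rank_k_approx(E, k):
    """Greedily approximate a binary evidence matrix E by the OR of k
    rank-one Boolean tiles. Each tile seeds u with the densest
    still-uncovered column, then keeps every column where the tile
    covers more uncovered 1s than it wrongly turns on."""
    d, n = E.shape
    B = np.zeros((d, n), dtype=int)            # current Boolean approximation
    for _ in range(k):
        R = (E == 1) & (B == 0)                # 1-entries not yet covered
        j = int(np.argmax(R.sum(axis=0)))      # densest uncovered column
        u = R[:, j].astype(int)
        if u.sum() == 0:
            break                              # all 1-entries are covered
        # Keep column j' if the tile covers more uncovered 1s there than
        # the number of 0-entries of E it would wrongly set to 1.
        gain = (R & u[:, None].astype(bool)).sum(axis=0)
        cost = ((E == 0) & (u[:, None] == 1)).sum(axis=0)
        v = (gain > cost).astype(int)
        B |= np.outer(u, v)                    # OR the tile into the approximation
    return B
```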

    Low-Rank Boolean Matrix Approximation by Integer Programming

    Low-rank approximations of data matrices are an important dimensionality reduction tool in machine learning and regression analysis. We consider the case of categorical variables, where it can be formulated as the problem of finding low-rank approximations to Boolean matrices. In this paper we give what is, to the best of our knowledge, the first integer programming formulation that relies on only polynomially many variables and constraints; we discuss how to solve it computationally and report numerical tests on synthetic and real-world data.
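
    To give a flavor of such a formulation in the simplest case: for rank one, introduce binary variables u_i, v_j and z_ij = u_i AND v_j, linearize the AND with three inequalities, and count disagreements with A in the objective. A sketch using PuLP with its bundled CBC solver; this rank-1 model is purely illustrative and much smaller than the paper's general rank-k formulation:

```python
import numpy as np
import pulp

def rank_one_boolean_ip(A):
    """Exact rank-one Boolean matrix approximation as a small MILP.
    z_ij = u_i AND v_j is linearized by three inequalities, and the
    objective counts entries where z differs from the 0/1 matrix A."""
    d, n = A.shape
    prob = pulp.LpProblem("rank1_bmf", pulp.LpMinimize)
    u = [pulp.LpVariable(f"u_{i}", cat="Binary") for i in range(d)]
    v = [pulp.LpVariable(f"v_{j}", cat="Binary") for j in range(n)]
    z = [[pulp.LpVariable(f"z_{i}_{j}", cat="Binary") for j in range(n)]
         for i in range(d)]
    for i in range(d):
        for j in range(n):
            # Standard linearization of z_ij = u_i AND v_j.
            prob += z[i][j] <= u[i]
            prob += z[i][j] <= v[j]
            prob += z[i][j] >= u[i] + v[j] - 1
    # |A_ij - z_ij| equals 1 - z_ij where A_ij = 1 and z_ij where A_ij = 0.
    prob += pulp.lpSum(1 - z[i][j] if A[i, j] else z[i][j]
                       for i in range(d) for j in range(n))
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return (np.array([int(x.value()) for x in u]),
            np.array([int(x.value()) for x in v]))

# Example: u, v = rank_one_boolean_ip(np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]]))
```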