Low-Rank Boolean Matrix Approximation by Integer Programming
Low-rank approximations of data matrices are an important dimensionality
reduction tool in machine learning and regression analysis. We consider the
case of categorical variables, where it can be formulated as the problem of
finding low-rank approximations to Boolean matrices. In this paper we give what
is, to the best of our knowledge, the first integer programming formulation that
relies on only polynomially many variables and constraints. We discuss how to
solve it computationally and report numerical tests on synthetic and real-world
data.
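As a minimal illustration of what a low-rank Boolean approximation measures, the sketch below computes a Boolean matrix product (OR of ANDs instead of sum of products) and counts the entries where it disagrees with the data matrix. The function names `boolean_product` and `hamming_error` and the example matrices are hypothetical and not taken from the paper; the integer programming formulation itself is not reproduced here.

```python
def boolean_product(A, B):
    """Boolean product: entry (i, j) is OR over k of (A[i][k] AND B[k][j])."""
    inner, cols = len(B), len(B[0])
    return [
        [int(any(row[k] and B[k][j] for k in range(inner))) for j in range(cols)]
        for row in A
    ]


def hamming_error(X, Y):
    """Number of entries in which two equally sized 0/1 matrices differ."""
    return sum(x != y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


# Hypothetical 3x3 data matrix and a Boolean rank-2 factorization of it.
X = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
A = [[1, 0],
     [1, 1],
     [0, 1]]
B = [[1, 1, 0],
     [0, 1, 1]]

approx = boolean_product(A, B)
error = hamming_error(X, approx)
```

Under ordinary arithmetic the middle entry of the product would be 2; the Boolean product caps it at 1, which is why this small example is reproduced exactly by two factors.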
Boolean Matrix Factorization Meets Consecutive Ones Property
Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layouts, where nodes are ordered in a circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbons. We show that OBMF can be used for edge bundling combined with circular or linear layout techniques. We demonstrate that not only is this problem NP-hard, but we also cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best rank-1 factorization. Since even obtaining a rank-1 factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using PQ-trees. We also extend the problem to the cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.
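The "fix one side, find the other, reverse the roles, and repeat" idea can be sketched for a plain rank-1 Boolean factorization. This is a simplified illustration only: it omits the consecutive ones constraint and the PQ-tree step described in the abstract, and the function name `rank1_alternating`, the update rule, and the example data are assumptions rather than the authors' algorithm.

```python
def rank1_alternating(X, b, rounds=10):
    """Alternately update row indicator a and column indicator b.

    With b fixed, a row is selected when the columns chosen by b cover
    more ones than zeros in that row; the column update is symmetric.
    """
    n, m = len(X), len(X[0])
    a = [0] * n
    for _ in range(rounds):
        # Gain of selecting row i: +1 per covered one, -1 per covered zero.
        new_a = [
            int(sum(2 * X[i][j] - 1 for j in range(m) if b[j]) > 0)
            for i in range(n)
        ]
        new_b = [
            int(sum(2 * X[i][j] - 1 for i in range(n) if new_a[i]) > 0)
            for j in range(m)
        ]
        if new_a == a and new_b == b:
            break  # neither side changed, so the iteration has converged
        a, b = new_a, new_b
    return a, b


# Hypothetical data: a 2x2 block of ones plus one isolated entry.
X = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
rows, cols = rank1_alternating(X, b=[1, 0, 0])
```

Starting from a single seed column, the iteration grows the selection to the full 2x2 block and then stops. Like any alternating heuristic, it can settle on a local optimum that depends on the starting vector.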
Using Underapproximations for Sparse Nonnegative Matrix Factorization
Nonnegative Matrix Factorization consists in (approximately) factorizing a
nonnegative data matrix by the product of two low-rank nonnegative matrices. It
has been successfully applied as a data analysis technique in numerous domains,
e.g., text mining, image processing, microarray data analysis, collaborative
filtering, etc.
We introduce a novel approach to solve NMF problems, based on the use of an
underapproximation technique, and show its effectiveness to obtain sparse
solutions. This approach, based on Lagrangian relaxation, allows the resolution
of NMF problems in a recursive fashion. We also prove that the
underapproximation problem is NP-hard for any fixed factorization rank, using a
reduction of the maximum edge biclique problem in bipartite graphs.
We test two variants of our underapproximation approach on several standard
image datasets and show that they provide sparse part-based representations
with low reconstruction error. Our results are comparable and sometimes
superior to those obtained by two standard Sparse Nonnegative Matrix
Factorization techniques.
Comment: Version 2 removed the section about convex reformulations, which was
not central to the development of our main results; added material to the
introduction; added a review of previous related work (section 2.3);
completely rewrote the last part (section 4) to provide extensive numerical
results supporting our claims. Accepted in J. of Pattern Recognition.
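To make the recursive use of underapproximation concrete, the sketch below assumes the usual reading of the term: a rank-1 nonnegative product u v^T that never exceeds the data matrix M entrywise. Under that assumption the residual M - u v^T stays nonnegative and can itself be factorized in the next step. The helper names and the example numbers are hypothetical, and the Lagrangian relaxation used to find u and v is not shown.

```python
def outer(u, v):
    """Rank-1 matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def is_underapproximation(M, u, v):
    """True when u v^T <= M holds in every entry."""
    return all(
        p <= m
        for row_p, row_m in zip(outer(u, v), M)
        for p, m in zip(row_p, row_m)
    )


def residual(M, u, v):
    """M - u v^T, which is nonnegative for an underapproximation."""
    return [
        [m - p for p, m in zip(row_p, row_m)]
        for row_p, row_m in zip(outer(u, v), M)
    ]


# Hypothetical nonnegative data matrix and rank-1 factor.
M = [[2, 1],
     [4, 3]]
u, v = [1, 2], [1, 1]

ok = is_underapproximation(M, u, v)
R = residual(M, u, v)
```

The zero that appears in the residual hints at why this approach tends to produce sparse factors: later rank-1 terms cannot place weight on entries that have already been fully explained.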
Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition
Product reviews and ratings on e-commerce websites provide customers with
detailed insights about various aspects of the product such as quality,
usefulness, etc. Since they influence customers' buying decisions, product
reviews have become a fertile ground for abuse by sellers (colluding with
reviewers) to promote their own products or to tarnish the reputation of
competitor's products. In this paper, our focus is on detecting such abusive
entities (both sellers and reviewers) by applying tensor decomposition on the
product reviews data. While tensor decomposition is mostly unsupervised, we
formulate our problem as a semi-supervised binary multi-target tensor
decomposition, to take advantage of currently known abusive entities. We
empirically show that our multi-target semi-supervised model achieves higher
precision and recall in detecting abusive entities as compared to unsupervised
techniques. Finally, we show that our proposed stochastic partial natural
gradient inference for our model empirically achieves faster convergence than
stochastic gradient and Online-EM with sufficient statistics.
Comment: Accepted to the 25th ACM SIGKDD Conference on Knowledge Discovery and
Data Mining, 2019. Contains supplementary material. arXiv admin note: text
overlap with arXiv:1804.0383
A Map of the Inorganic Ternary Metal Nitrides
Exploratory synthesis in novel chemical spaces is the essence of solid-state
chemistry. However, uncharted chemical spaces can be difficult to navigate,
especially when materials synthesis is challenging. Nitrides represent one such
space, where stringent synthesis constraints have limited the exploration of
this important class of functional materials. Here, we employ a suite of
computational materials discovery and informatics tools to construct a large
stability map of the inorganic ternary metal nitrides. Our map clusters the
ternary nitrides into chemical families with distinct stability and
metastability, and highlights hundreds of promising new ternary nitride spaces
for experimental investigation--from which we experimentally realized 7 new Zn-
and Mg-based ternary nitrides. By extracting the mixed metallicity, ionicity,
and covalency of solid-state bonding from the DFT-computed electron density, we
reveal the complex interplay between chemistry, composition, and electronic
structure in governing large-scale stability trends in ternary nitride
materials.