Search CORE

11,301 research outputs found

Parameterized Low-Rank Binary Matrix Approximation

Author: Fomin Fedor V.
Golovach Petr A.
Panolan Fahad
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Publication date: 01/01/2018
Field of study

We provide a number of algorithmic results for the following family of problems: For a given binary m x n matrix A and a nonnegative integer k, decide whether there is a "simple" binary matrix B which differs from A in at most k entries. For an integer r, the "simplicity" of B is characterized as follows. - Binary r-Means: Matrix B has at most r different columns. This problem is known to be NP-complete already for r=2. We show that the problem is solvable in time 2^{O(k log k)}*(nm)^O(1) and thus is fixed-parameter tractable parameterized by k. We also complement this result by showing that when being parameterized by r and k, the problem admits an algorithm of running time 2^{O(r^{3/2}* sqrt{k log k})}(nm)^O(1), which is subexponential in k for r in o((k/log k)^{1/3}). - Low GF(2)-Rank Approximation: Matrix B is of GF(2)-rank at most r. This problem is known to be NP-complete already for r=1. It is also known to be W[1]-hard when parameterized by k. Interestingly, when parameterized by r and k, the problem is not only fixed-parameter tractable, but it is solvable in time 2^{O(r^{3/2}* sqrt{k log k})}(nm)^O(1), which is subexponential in k for r in o((k/log k)^{1/3}). - Low Boolean-Rank Approximation: Matrix B is of Boolean rank at most r. The problem is known to be NP-complete for k=0 as well as for r=1. We show that it is solvable in subexponential in k time 2^{O(r2^r * sqrt{k log k})}(nm)^O(1)

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

On the Complexity and Approximation of Binary Evidence in Lifted Inference

Author: Broeck Guy Van den
Darwiche Adnan
Publication venue
Publication date: 26/11/2013
Field of study

Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model's symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically.Comment: To appear in Advances in Neural Information Processing Systems 26 (NIPS), Lake Tahoe, USA, December 201

arXiv.org e-Print Archive

Lirias

Crowdsourcing with Sparsely Interacting Workers

Author: Ma Yao
Olshevsky Alex
Saligrama Venkatesh
Szepesvari Csaba
Publication venue
Publication date: 01/01/2017
Field of study

We consider estimation of worker skills from worker-task interaction data (with unknown labels) for the single-coin crowd-sourcing binary classification model in symmetric noise. We define the (worker) interaction graph whose nodes are workers and an edge between two nodes indicates whether or not the two workers participated in a common task. We show that skills are asymptotically identifiable if and only if an appropriate limiting version of the interaction graph is irreducible and has odd-cycles. We then formulate a weighted rank-one optimization problem to estimate skills based on observations on an irreducible, aperiodic interaction graph. We propose a gradient descent scheme and show that for such interaction graphs estimates converge asymptotically to the global minimum. We characterize noise robustness of the gradient scheme in terms of spectral properties of signless Laplacians of the interaction graph. We then demonstrate that a plug-in estimator based on the estimated skills achieves state-of-art performance on a number of real-world datasets. Our results have implications for rank-one matrix completion problem in that gradient descent can provably recover

W \times W

rank-one matrices based on

W+1

off-diagonal observations of a connected graph with a single odd-cycle

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

Parameterized Complexity of Feature Selection for Categorical Data Clustering

Author: Bandyapadhyay Sayan
Fomin Fedor V.
Golovach Petr A.
Simonov Kirill
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)
Publication date: 01/01/2021
Field of study

We develop new algorithmic methods with provable guarantees for feature selection in regard to categorical data clustering. While feature selection is one of the most common approaches to reduce dimensionality in practice, most of the known feature selection methods are heuristics. We study the following mathematical model. We assume that there are some inadvertent (or undesirable) features of the input data that unnecessarily increase the cost of clustering. Consequently, we want to select a subset of the original features from the data such that there is a small-cost clustering on the selected features. More precisely, for given integers l (the number of irrelevant features) and k (the number of clusters), budget B, and a set of n categorical data points (represented by m-dimensional vectors whose elements belong to a finite set of values Σ), we want to select m-l relevant features such that the cost of any optimal k-clustering on these features does not exceed B. Here the cost of a cluster is the sum of Hamming distances (l0-distances) between the selected features of the elements of the cluster and its center. The clustering cost is the total sum of the costs of the clusters. We use the framework of parameterized complexity to identify how the complexity of the problem depends on parameters k, B, and |Σ|. Our main result is an algorithm that solves the Feature Selection problem in time f(k,B,|Σ|)⋅m^{g(k,|Σ|)}⋅n² for some functions f and g. In other words, the problem is fixed-parameter tractable parameterized by B when |Σ| and k are constants. Our algorithm for Feature Selection is based on a solution to a more general problem, Constrained Clustering with Outliers. In this problem, we want to delete a certain number of outliers such that the remaining points could be clustered around centers satisfying specific constraints. One interesting fact about Constrained Clustering with Outliers is that besides Feature Selection, it encompasses many other fundamental problems regarding categorical data such as Robust Clustering, Binary and Boolean Low-rank Matrix Approximation with Outliers, and Binary Robust Projective Clustering. Thus as a byproduct of our theorem, we obtain algorithms for all these problems. We also complement our algorithmic findings with complexity lower bounds.publishedVersio

University of Bergen

Dagstuhl Research Online Publication Server

NORA - Norwegian Open Research Archives