A polynomial-time algorithm to approximately count contingency tables when the number of rows is constant.

Author: Barvinok
Chung
Diaconis
Dyer
Hernek
Jerrum
Kannan
Martin Dyer
Mary Cryan
Sinclair
Publication venue: 'Elsevier BV'
Publication date: 01/01/2003

Abstract: We consider the problem of counting the number of contingency tables with given row and column sums. This problem is known to be #P-complete, even when there are only two rows (Random Structures Algorithms 10(4) (1997) 487). In this paper we present the first fully polynomial randomized approximation scheme for counting contingency tables when the number of rows is constant. A novel feature of our algorithm is that it is a hybrid of an exact counting technique with an approximation algorithm, giving two distinct phases. In the first, the columns are partitioned into “small” and “large”. We show that the number of contingency tables can be expressed as the weighted sum of a polynomial number of new instances of the problem, where each instance consists of some new row sums and the original large column sums. In the second phase, we show how to approximately count contingency tables when all the column sums are large. In this case, we show that the solution lies in approximating the volume of a single convex body, a problem which is known to be solvable in polynomial time (J. ACM 38 (1) (1991) 1).
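To make the counting problem in this abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's approximation scheme: it enumerates tables row by row in exponential time and is only practical for tiny margins. The function name `count_contingency_tables` is hypothetical.

```python
from functools import lru_cache


def count_contingency_tables(row_sums, col_sums):
    """Exactly count non-negative integer tables with the given margins.

    Brute-force illustration only (exponential time), not the paper's method.
    """
    if sum(row_sums) != sum(col_sums):
        return 0
    rows = tuple(row_sums)

    def fillings(total, caps):
        # Yield every tuple x with 0 <= x[j] <= caps[j] and sum(x) == total.
        if not caps:
            if total == 0:
                yield ()
            return
        for first in range(min(total, caps[0]) + 1):
            for rest in fillings(total - first, caps[1:]):
                yield (first,) + rest

    @lru_cache(maxsize=None)
    def count(i, residual):
        # residual[j] is what column j still needs from rows i, i+1, ...
        if i == len(rows):
            return 1 if all(c == 0 for c in residual) else 0
        return sum(
            count(i + 1, tuple(c - x for c, x in zip(residual, row)))
            for row in fillings(rows[i], residual)
        )

    return count(0, tuple(col_sums))
```

For example, `count_contingency_tables([2, 2], [2, 2])` enumerates the three tables [[a, 2-a], [2-a, a]] for a = 0, 1, 2.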

Elsevier - Publisher Connector

Crossref

Edinburgh Research Explorer

Enumerating contingency tables via random permanents

Author: Barvinok Alexander
Publication date: 01/01/2005

Given $m$ positive integers $R=(r_i)$ and $n$ positive integers $C=(c_j)$ such that $\sum r_i = \sum c_j = N$, and $mn$ non-negative weights $W=(w_{ij})$, we consider the total weight $T=T(R, C; W)$ of non-negative integer matrices (contingency tables) $D=(d_{ij})$ with row sums $r_i$ and column sums $c_j$, where the weight of $D$ equals $\prod w_{ij}^{d_{ij}}$. We present a randomized algorithm, of complexity polynomial in $N$, which computes a number $T'=T'(R,C; W)$ such that $T' < T < \alpha(R, C) T'$, where $\alpha(R,C) = \min\{\prod r_i! r_i^{-r_i}, \prod c_j! c_j^{-c_j}\} N^N/N!$. In many cases, $\ln T'$ provides an asymptotically accurate estimate of $\ln T$. The idea of the algorithm is to express $T$ as the expectation of the permanent of an $N \times N$ random matrix with exponentially distributed entries and to approximate the expectation by the integral $T'$ of an efficiently computable log-concave function on $\mathbb{R}^{mn}$. Applications to counting integer flows in graphs are also discussed.
Comment: 19 pages, bounds are sharpened, references are added
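The approximation factor $\alpha(R, C)$ quoted in this abstract can be evaluated directly from the margins. The sketch below computes $\ln \alpha(R, C)$ with log-gamma to avoid overflow. It only evaluates the stated bound and is not the paper's permanent-based algorithm; the function name `log_alpha` is hypothetical.

```python
import math


def log_alpha(row_sums, col_sums):
    """Return ln(alpha(R, C)) for positive integer margins R and C,

    where alpha = min(prod r_i! r_i^(-r_i), prod c_j! c_j^(-c_j)) * N^N / N!.
    """
    n_total = sum(row_sums)
    if n_total != sum(col_sums):
        raise ValueError("row sums and column sums must have the same total")

    def log_margin_term(margins):
        # ln of prod m! * m^(-m), using lgamma(m + 1) = ln(m!)
        return sum(math.lgamma(m + 1) - m * math.log(m) for m in margins)

    log_ratio = n_total * math.log(n_total) - math.lgamma(n_total + 1)
    return min(log_margin_term(row_sums), log_margin_term(col_sums)) + log_ratio
```

For $R = C = (1, 1)$ both products equal 1 and $N^N/N! = 4/2$, so $\alpha = 2$.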

arXiv.org e-Print Archive

CiteSeerX

Counting magic squares in quasi-polynomial time

Author: Barvinok Alexander
Samorodnitsky Alex
Yong Alexander
Publication date: 08/03/2007

We present a randomized algorithm which, given positive integers $n$ and $t$ and a real number $0 < \epsilon < 1$, computes the number $\Sigma(n, t)$ of $n \times n$ non-negative integer matrices (magic squares) with the row and column sums equal to $t$ within relative error $\epsilon$. The computational complexity of the algorithm is polynomial in $1/\epsilon$ and quasi-polynomial in $N=nt$, that is, of the order $N^{\log N}$. A simplified version of the algorithm works in time polynomial in $1/\epsilon$ and $N$ and estimates $\Sigma(n,t)$ within a factor of $N^{\log N}$. This simplified version has been implemented. We present results of the implementation, state some conjectures, and discuss possible generalizations.
Comment: 30 pages
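For very small $n$ and $t$, $\Sigma(n, t)$ can be computed exactly and used as a sanity check on any estimator. The following is a brute-force sketch, not the randomized algorithm of the abstract. It assumes only the definition above, and it sorts the column residuals because the number of completions is unchanged by permuting columns. The name `magic_square_count` is hypothetical.

```python
from functools import lru_cache


def magic_square_count(n, t):
    """Exactly count n x n non-negative integer matrices with all line sums t."""

    def rows(total, caps):
        # Yield every row x with 0 <= x[j] <= caps[j] and sum(x) == total.
        if not caps:
            if total == 0:
                yield ()
            return
        for first in range(min(total, caps[0]) + 1):
            for rest in rows(total - first, caps[1:]):
                yield (first,) + rest

    @lru_cache(maxsize=None)
    def count(rows_left, residual):
        # residual is kept sorted so column permutations share one cache entry.
        if rows_left == 0:
            return 1 if not any(residual) else 0
        return sum(
            count(rows_left - 1,
                  tuple(sorted(c - x for c, x in zip(residual, row))))
            for row in rows(t, residual)
        )

    return count(n, (t,) * n)
```

With $t = 1$ the matrices are permutation matrices, so the count is $n!$; for $n = 2$ it is $t + 1$.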

arXiv.org e-Print Archive

CiteSeerX

Sequential importance sampling for multiway tables

Author: Chen Yuguo
Dinwoodie Ian H.
Sullivant Seth
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2005

We describe an algorithm for the sequential sampling of entries in multiway contingency tables with given constraints. The algorithm can be used for computations in exact conditional inference. To justify the algorithm, a theory relates sampling values at each step to properties of the associated toric ideal using computational commutative algebra. In particular, the property of interval cell counts at each step is related to exponents on lead indeterminates of a lexicographic Gröbner basis. Also, the approximation of integer programming by linear programming for sampling is related to initial terms of a toric ideal. We apply the algorithm to examples of contingency tables which appear in the social and medical sciences. The numerical results demonstrate that the theory is applicable and that the algorithm performs well.
Comment: Published at http://dx.doi.org/10.1214/009053605000000822 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
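The idea of sampling cells one at a time can be illustrated in the much simpler two-way case, where the feasible values of each cell always form an interval. The sketch below is an assumption-laden simplification and not the multiway algorithm of the paper. Each cell is drawn uniformly from its feasible interval, and the product of interval sizes serves as an importance weight whose average estimates the number of tables. The names `sis_sample` and `estimate_table_count` are hypothetical.

```python
import random


def sis_sample(row_sums, col_sums, rng):
    """Draw one two-way table cell by cell; return (table, importance weight)."""
    if sum(row_sums) != sum(col_sums):
        raise ValueError("margins must have equal totals")
    col_left = list(col_sums)
    table, weight = [], 1
    for r in row_sums:
        row, row_left = [], r
        for j in range(len(col_left)):
            later_capacity = sum(col_left[j + 1:])
            # The cell must leave no more than the later columns can absorb,
            # and cannot exceed what the row or the column still needs.
            lo = max(0, row_left - later_capacity)
            hi = min(row_left, col_left[j])
            x = rng.randint(lo, hi)
            weight *= hi - lo + 1  # 1 / (probability of this uniform choice)
            row.append(x)
            row_left -= x
            col_left[j] -= x
        table.append(row)
    return table, weight


def estimate_table_count(row_sums, col_sums, samples=10000, seed=0):
    """Average the importance weights to estimate the number of tables."""
    rng = random.Random(seed)
    total = sum(sis_sample(row_sums, col_sums, rng)[1] for _ in range(samples))
    return total / samples
```

Every sampled table satisfies the margins, and every valid table has positive probability, so the average weight is an unbiased estimate of the table count in this two-way setting.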

arXiv.org e-Print Archive

CiteSeerX

Crossref

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

Author: Lee M. S.
Moore A.
Publication date: 01/01/1997

This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its leaves. We show how the ADtree can be used to accelerate Bayes net structure finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss the possible uses of ADtrees in other machine learning methods, and discuss the merits of ADtrees in comparison with alternative representations such as kd-trees, R-trees and Frequent Sets.
Comment: See http://www.jair.org/ for any accompanying file
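For orientation, the "traditional direct counting" baseline that the abstract compares against can be sketched in a few lines. This is not an ADtree: it scans every record, so its cost grows with the dataset size. It does share one property named in the abstract, namely that cells with a count of zero are never stored. The record layout (a list of dicts) and both function names are assumptions made for illustration.

```python
from collections import Counter


def build_contingency_table(records, attributes):
    """Direct counting: one pass over the records, sparse storage of cells."""
    return Counter(tuple(record[a] for a in attributes) for record in records)


def count_matching(table, attributes, query):
    """Count records matching a conjunctive query such as {"sex": "f"}."""
    positions = [(attributes.index(name), value) for name, value in query.items()]
    return sum(
        n for cell, n in table.items()
        if all(cell[pos] == value for pos, value in positions)
    )
```

A query over a subset of the table's attributes is answered by summing the matching non-zero cells, without touching the original records again.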

arXiv.org e-Print Archive

CiteSeerX

Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

Author: Dwork Cynthia
Nikolov Aleksandar
Talwar Kunal
Publication date: 06/08/2013

Consider a database of $n$ people, each represented by a bit-string of length $d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, and a $|S|$-dimensional binary vector $\beta$ specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $\beta$. Privately releasing approximate answers to a set of $k$-way marginal queries is one of the most important and well-motivated problems in differential privacy. Information theoretically, the error complexity of marginal queries is well-understood: the per-query additive error is known to be at least $\Omega(\min\{\sqrt{n},d^{\frac{k}{2}}\})$ and at most $\tilde{O}(\min\{\sqrt{n} d^{1/4},d^{\frac{k}{2}}\})$. However, no polynomial time algorithm with error complexity as low as the information theoretic upper bound is known for small $n$. In this work we present a polynomial time algorithm that, for any distribution on marginal queries, achieves average error at most $\tilde{O}(\sqrt{n} d^{\frac{\lceil k/2 \rceil}{4}})$. This error bound is as good as the best known information theoretic upper bounds for $k=2$. This bound is an improvement over previous work on efficiently releasing marginals when $k$ is small and when error $o(n)$ is desirable. Using private boosting we are also able to give nearly matching worst-case error bounds. Our algorithms are based on the geometric techniques of Nikolov, Talwar, and Zhang. The main new ingredients are convex relaxations and careful use of the Frank-Wolfe algorithm for constrained convex minimization. To design our relaxations, we rely on the Grothendieck inequality from functional analysis.
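The definition of a $k$-way marginal query translates directly into code. The sketch below computes the exact, non-private answer; it adds no noise and implements none of the private release mechanisms discussed in the abstract. The database is assumed to be a list of bit tuples, and the name `marginal_query` is hypothetical.

```python
def marginal_query(database, subset, beta):
    """Exact answer to a marginal query.

    database: iterable of length-d bit tuples, one per person.
    subset:   indices S of the k attributes being queried.
    beta:     required bit values, one per index in subset.
    """
    if len(subset) != len(beta):
        raise ValueError("beta must have one value per attribute in subset")
    return sum(
        all(row[i] == b for i, b in zip(subset, beta))
        for row in database
    )
```

For instance, with `subset=(0, 2)` and `beta=(1, 0)` the function counts the people whose first attribute is 1 and whose third attribute is 0.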

arXiv.org e-Print Archive

CiteSeerX