Learning pseudo-Boolean k-DNF and Submodular Functions

Author: Raskhodnikova Sofya
Yaroslavtsev Grigory
Publication date: 10/08/2012

We prove that any submodular function $f: \{0,1\}^n \to \{0,1,\ldots,k\}$ can be represented as a pseudo-Boolean $2k$-DNF formula. Pseudo-Boolean DNFs are a natural generalization of the DNF representation to functions with integer range: each term in such a formula has an associated integral constant. We show that an analog of Håstad's switching lemma holds for pseudo-Boolean $k$-DNFs if all constants associated with the terms of the formula are bounded. This allows us to generalize Mansour's PAC-learning algorithm for $k$-DNFs to pseudo-Boolean $k$-DNFs, and hence gives a PAC-learning algorithm with membership queries under the uniform distribution for submodular functions of the form $f: \{0,1\}^n \to \{0,1,\ldots,k\}$. Our algorithm runs in time polynomial in $n$, $k^{O(k \log k / \epsilon)}$, $1/\epsilon$ and $\log(1/\delta)$ and works even in the agnostic setting. The line of previous work on learning submodular functions [Balcan, Harvey (STOC '11), Gupta, Hardt, Roth, Ullman (STOC '11), Cheraghchi, Klivans, Kothari, Lee (SODA '12)] implies only $n^{O(k)}$ query complexity for learning submodular functions in this setting, for fixed $\epsilon$ and $\delta$. Our learning algorithm implies a property tester for submodularity of functions $f: \{0,1\}^n \to \{0,\ldots,k\}$ with query complexity polynomial in $n$ for $k = O((\log n / \log\log n)^{1/2})$ and constant proximity parameter $\epsilon$.
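The abstract only says that each term of a pseudo-Boolean DNF carries an integral constant. The sketch below illustrates one way to read that definition, under the assumption that the formula evaluates to the largest constant among its satisfied terms (and 0 if none is satisfied). The term encoding and the names `eval_pb_dnf` and `is_submodular` are hypothetical, and the brute-force submodularity check is exponential in $n$, so it is only meant for toy inputs.

```python
from itertools import product

# Hypothetical encoding: a term is (constant, positive_vars, negated_vars).
# Assumption: the formula's value is the largest constant among satisfied
# terms, and 0 when no term is satisfied.
def eval_pb_dnf(terms, x):
    best = 0
    for const, pos, neg in terms:
        if all(x[i] == 1 for i in pos) and all(x[i] == 0 for i in neg):
            best = max(best, const)
    return best

def is_submodular(f, n):
    """Brute-force check of f(x|y) + f(x&y) <= f(x) + f(y) on {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            join = tuple(a | b for a, b in zip(x, y))
            meet = tuple(a & b for a, b in zip(x, y))
            if f(join) + f(meet) > f(x) + f(y):
                return False
    return True

# f(x) = min(x0 + x1 + x2, 2), written with terms of width at most 2.
capped_sum = [(1, (i,), ()) for i in range(3)] + [
    (2, (i, j), ()) for i in range(3) for j in range(i + 1, 3)
]
# A single AND term with constant 1, which violates submodularity.
and_term = [(1, (0, 1), ())]

print(is_submodular(lambda x: eval_pb_dnf(capped_sum, x), 3))
print(is_submodular(lambda x: eval_pb_dnf(and_term, x), 2))
```

Here `capped_sum` has range $\{0,1,2\}$ and uses terms of width 2, which is within the $2k$ width bound stated in the abstract for $k = 2$.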

arXiv.org e-Print Archive

CiteSeerX

DNF Sparsification and a Faster Deterministic Counting Algorithm

Author: Gopalan Parikshit
Meka Raghu
Reingold Omer
Publication date: 01/01/2012

Given a DNF formula $f$ on $n$ variables, the two natural size measures are the number of terms, or size $s(f)$, and the maximum width of a term $w(f)$. It is folklore that short DNF formulas can be made narrow. We prove a converse, showing that narrow formulas can be sparsified. More precisely, any width-$w$ DNF, irrespective of its size, can be $\epsilon$-approximated by a width-$w$ DNF with at most $(w\log(1/\epsilon))^{O(w)}$ terms. We combine our sparsification result with the work of Luby and Velickovic to give a faster deterministic algorithm for approximately counting the number of satisfying solutions to a DNF. Given a formula on $n$ variables with $\mathrm{poly}(n)$ terms, we give a deterministic $n^{\tilde{O}(\log \log(n))}$ time algorithm that computes an additive $\epsilon$ approximation to the fraction of satisfying assignments of $f$ for $\epsilon = 1/\mathrm{poly}(\log n)$. The previous best result, due to Luby and Velickovic from nearly two decades ago, had a run-time of $n^{\exp(O(\sqrt{\log \log n}))}$.

Comment: To appear in the IEEE Conference on Computational Complexity, 201
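As a small illustration of the quantities named in this abstract, the following sketch computes the size $s(f)$, the width $w(f)$, and the exact fraction of satisfying assignments of a toy DNF by exhaustive enumeration. This is not the paper's deterministic counting algorithm; it takes $2^n$ time and only pins down what is being approximated. The dictionary-based term encoding and all function names are assumptions made for the example.

```python
from itertools import product

# Hypothetical encoding: each term maps a variable index to the value it requires.
# Example formula: (x0 AND x1) OR (NOT x2)
dnf = [{0: True, 1: True}, {2: False}]

def size(dnf):
    """Number of terms, s(f)."""
    return len(dnf)

def width(dnf):
    """Maximum number of literals in a term, w(f)."""
    return max((len(term) for term in dnf), default=0)

def satisfies(dnf, x):
    """True if assignment x satisfies at least one term."""
    return any(all(x[i] == v for i, v in term.items()) for term in dnf)

def satisfying_fraction(dnf, n):
    """Exact fraction of satisfying assignments, by brute force over {0,1}^n."""
    hits = sum(satisfies(dnf, x) for x in product((False, True), repeat=n))
    return hits / 2 ** n

print(size(dnf), width(dnf), satisfying_fraction(dnf, 3))
```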

arXiv.org e-Print Archive

CiteSeerX

Learning Coverage Functions and Private Release of Marginals

Author: Feldman Vitaly
Kothari Pravesh
Publication date: 27/05/2014

We study the problem of approximating and learning coverage functions. A function $c: 2^{[n]} \rightarrow \mathbf{R}^{+}$ is a coverage function if there exists a universe $U$ with non-negative weights $w(u)$ for each $u \in U$ and subsets $A_1, A_2, \ldots, A_n$ of $U$ such that $c(S) = \sum_{u \in \cup_{i \in S} A_i} w(u)$. Alternatively, coverage functions can be described as non-negative linear combinations of monotone disjunctions. They are a natural subclass of submodular functions and arise in a number of applications. We give an algorithm that for any $\gamma,\delta>0$, given random and uniform examples of an unknown coverage function $c$, finds a function $h$ that approximates $c$ within factor $1+\gamma$ on all but a $\delta$-fraction of the points in time $\mathrm{poly}(n,1/\gamma,1/\delta)$. This is the first fully-polynomial algorithm for learning an interesting class of functions in the demanding PMAC model of Balcan and Harvey (2011). Our algorithms are based on several new structural properties of coverage functions. Using the results in (Feldman and Kothari, 2014), we also show that coverage functions are learnable agnostically with excess $\ell_1$-error $\epsilon$ over all product and symmetric distributions in time $n^{\log(1/\epsilon)}$. In contrast, we show that, without assumptions on the distribution, learning coverage functions is at least as hard as learning polynomial-size disjoint DNF formulas, a class of functions for which the best known algorithm runs in time $2^{\tilde{O}(n^{1/3})}$ (Klivans and Servedio, 2004). As an application of our learning results, we give simple differentially-private algorithms for releasing monotone conjunction counting queries with low average error. In particular, for any $k \leq n$, we obtain private release of $k$-way marginals with average error $\bar{\alpha}$ in time $n^{O(\log(1/\bar{\alpha}))}$.
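The two descriptions of a coverage function given in this abstract can be checked against each other on a toy instance. The sketch below evaluates $c(S)$ directly from the definition and again as a non-negative combination of monotone disjunctions, one disjunction per universe element. The universe, weights, sets and function names are made up for illustration.

```python
from itertools import combinations

# Toy instance (illustrative values only).
weights = {"a": 1.0, "b": 2.0, "c": 0.5}   # w(u) for each u in U
sets = [{"a", "b"}, {"b", "c"}, {"c"}]     # A_1, A_2, A_3 (0-indexed here)

def coverage(weights, sets, S):
    """c(S): total weight of the union of A_i over i in S."""
    covered = set().union(*(sets[i] for i in S))
    return sum(weights[u] for u in covered)

def coverage_as_disjunctions(weights, sets, S):
    """Same value as sum over u of w(u) * OR_{i : u in A_i} [i in S]."""
    total = 0.0
    for u, w in weights.items():
        if any(u in sets[i] for i in S):
            total += w
    return total

for r in range(len(sets) + 1):
    for S in combinations(range(len(sets)), r):
        assert coverage(weights, sets, S) == coverage_as_disjunctions(weights, sets, S)

print(coverage(weights, sets, (0,)), coverage(weights, sets, (0, 1)))
```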

arXiv.org e-Print Archive

CiteSeerX

Learning DNF Expressions from Fourier Spectrum

Author: Feldman Vitaly
Publication date: 01/01/2012

Since its introduction by Valiant in 1984, PAC learning of DNF expressions has remained one of the central problems in learning theory. We consider this problem in the setting where the underlying distribution is uniform, or more generally, a product distribution. Kalai, Samorodnitsky and Teng (2009) showed that in this setting a DNF expression can be efficiently approximated from its "heavy" low-degree Fourier coefficients alone. This is in contrast to previous approaches where boosting was used and thus Fourier coefficients of the target function modified by various distributions were needed. This property is crucial for learning of DNF expressions over smoothed product distributions, a learning model introduced by Kalai et al. (2009) and inspired by the seminal smoothed analysis model of Spielman and Teng (2001). We introduce a new approach to learning (or approximating) polynomial threshold functions, which is based on creating a function with range $[-1,1]$ that approximately agrees with the unknown function on low-degree Fourier coefficients. We then describe conditions under which this is sufficient for learning polynomial threshold functions. Our approach yields a new, simple algorithm for approximating any polynomial-size DNF expression from its "heavy" low-degree Fourier coefficients alone. Our algorithm greatly simplifies the proof of learnability of DNF expressions over smoothed product distributions. We also describe an application of our algorithm to learning monotone DNF expressions over product distributions. Building on the work of Servedio (2001), we give an algorithm that runs in time $\mathrm{poly}((s \cdot \log(s/\epsilon))^{\log(s/\epsilon)}, n)$, where $s$ is the size of the target DNF expression and $\epsilon$ is the accuracy. This improves on the $\mathrm{poly}((s \cdot \log(ns/\epsilon))^{\log(s/\epsilon) \cdot \log(1/\epsilon)}, n)$ bound of Servedio (2001).

Comment: Appears in Conference on Learning Theory (COLT) 201
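To make "heavy low-degree Fourier coefficients" concrete, the sketch below computes the coefficients of a tiny DNF exactly under the uniform distribution and keeps those of degree at most `degree` and magnitude at least `theta`. This is not the learning algorithm from the paper, which works from estimates rather than full enumeration. The $\pm 1$ encoding (true maps to $+1$), the character $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$, and the function names are assumptions chosen for the example.

```python
from itertools import combinations, product

def fourier_coefficient(f, n, S):
    """Exact E_x[f(x) * chi_S(x)] over uniform x in {0,1}^n."""
    total = 0
    for x in product((0, 1), repeat=n):
        chi = -1 if sum(x[i] for i in S) % 2 else 1
        total += f(x) * chi
    return total / 2 ** n

def heavy_low_degree(f, n, degree, theta):
    """Coefficients with |S| <= degree and absolute value >= theta."""
    heavy = {}
    for d in range(degree + 1):
        for S in combinations(range(n), d):
            coeff = fourier_coefficient(f, n, S)
            if abs(coeff) >= theta:
                heavy[S] = coeff
    return heavy

# Toy DNF (x0 AND x1) OR x2, encoded as a +1/-1 valued function.
def f(x):
    return 1 if (x[0] and x[1]) or x[2] else -1

print(heavy_low_degree(f, 3, degree=1, theta=0.5))
```

With `degree=3` and `theta=0.0` the function returns the full spectrum, whose squared coefficients sum to 1 by Parseval's identity.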

arXiv.org e-Print Archive

CiteSeerX

Achieving New Upper Bounds for the Hypergraph Duality Problem through Logic

Author: Eiter T.
Gaur D. R.
Gottlob G.
Hagen M.
Henzinger T. A.
Kavvadias D. J.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2014

The hypergraph duality problem DUAL is defined as follows: given two simple hypergraphs $\mathcal{G}$ and $\mathcal{H}$, decide whether $\mathcal{H}$ consists precisely of all minimal transversals of $\mathcal{G}$ (in which case we say that $\mathcal{G}$ is the dual of $\mathcal{H}$). This problem is equivalent to deciding whether two given non-redundant monotone DNFs are dual. It is known that non-DUAL, the complementary problem to DUAL, is in $\mathrm{GC}(\log^2 n,\mathrm{PTIME})$, where $\mathrm{GC}(f(n),\mathcal{C})$ denotes the complexity class of all problems that, after a nondeterministic guess of $O(f(n))$ bits, can be decided (checked) within complexity class $\mathcal{C}$. It was conjectured that non-DUAL is in $\mathrm{GC}(\log^2 n,\mathrm{LOGSPACE})$. In this paper we prove this conjecture and actually place the non-DUAL problem into the complexity class $\mathrm{GC}(\log^2 n,\mathrm{TC}^0)$, which is a subclass of $\mathrm{GC}(\log^2 n,\mathrm{LOGSPACE})$. We here refer to the logtime-uniform version of $\mathrm{TC}^0$, which corresponds to $\mathrm{FO(COUNT)}$, i.e., first-order logic augmented by counting quantifiers. We achieve the latter bound in two steps. First, based on existing problem decomposition methods, we develop a new nondeterministic algorithm for non-DUAL that requires guessing $O(\log^2 n)$ bits. We then proceed by a logical analysis of this algorithm, allowing us to formulate its deterministic part in $\mathrm{FO(COUNT)}$. From this result, by the well-known inclusion $\mathrm{TC}^0\subseteq\mathrm{LOGSPACE}$, it follows that DUAL also belongs to $\mathrm{DSPACE}[\log^2 n]$. Finally, by exploiting the principles on which the proposed nondeterministic algorithm is based, we devise a deterministic algorithm that, given two hypergraphs $\mathcal{G}$ and $\mathcal{H}$, computes in quadratic logspace a transversal of $\mathcal{G}$ missing in $\mathcal{H}$.

Comment: Restructured the presentation in order to be the extended version of a paper that will shortly appear in SIAM Journal on Computing
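The definition of DUAL can be illustrated with a naive check that enumerates every vertex subset. The sketch below is exponential-time and is unrelated to the guess-and-check or quadratic-logspace algorithms described in the abstract. It only shows what "minimal transversal" and "a transversal of $\mathcal{G}$ missing in $\mathcal{H}$" mean on a toy input. Hypergraphs are represented as lists of Python sets, and all function names are hypothetical.

```python
from itertools import combinations

def is_transversal(T, edges):
    """T is a transversal if it intersects every hyperedge."""
    return all(T & e for e in edges)

def minimal_transversals(edges):
    """All minimal transversals, found by brute force over vertex subsets."""
    vertices = sorted(set().union(*edges))
    result = []
    for r in range(len(vertices) + 1):
        for combo in combinations(vertices, r):
            T = frozenset(combo)
            # Supersets of transversals are transversals, so T is minimal
            # exactly when removing any single vertex breaks the property.
            if is_transversal(T, edges) and not any(
                is_transversal(T - {v}, edges) for v in T
            ):
                result.append(T)
    return result

def is_dual(G, H):
    """True if H consists precisely of the minimal transversals of G."""
    return set(minimal_transversals(G)) == {frozenset(h) for h in H}

def missing_transversal(G, H):
    """A minimal transversal of G that is absent from H, or None."""
    have = {frozenset(h) for h in H}
    for T in minimal_transversals(G):
        if T not in have:
            return T
    return None

G = [{0, 1}, {1, 2}]
print(is_dual(G, [{1}, {0, 2}]))
print(missing_transversal(G, [{1}]))
```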

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open Research Exeter

Oxford University Research Archive

King's Research Portal