
449 research outputs found

On Boosting Sparse Parities

Author: Lev Reyzin
Publication date: 03/04/2020

While boosting has been extensively studied, considerably less attention has been devoted to the task of designing good weak learning algorithms. In this paper we consider the problem of designing weak learners that are especially well suited to the boosting procedure, and specifically to the AdaBoost algorithm. First we describe conditions desirable for a weak learning algorithm. We then propose using sparse parity functions, which have many of our desired properties, as weak learners in boosting. Our experimental tests show the proposed weak learners to be competitive with the most widely used ones: decision stumps and pruned decision trees.
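As a rough illustration of what a sparse parity weak learner might look like, the sketch below exhaustively searches all parities on at most `k` variables and returns the one with the lowest weighted error, which is the quantity a booster such as AdaBoost asks a weak learner to minimize. The function names, the {-1, +1} label encoding, and the brute-force search are assumptions made for this example; the abstract does not say how the authors select parities.

```python
from itertools import combinations


def parity(x, subset):
    """Return +1 if the bits of x indexed by subset sum to an even number, else -1."""
    return 1 - 2 * (sum(x[i] for i in subset) % 2)


def best_sparse_parity(X, y, weights, k):
    """Exhaustively pick the parity on at most k variables, together with a
    sign, that has the lowest weighted error. Labels in y are -1 or +1.

    Hypothetical helper for illustration only, not the paper's procedure.
    Returns a tuple (weighted_error, subset, sign).
    """
    n = len(X[0])
    total = sum(weights)
    best = None
    for size in range(1, k + 1):
        for subset in combinations(range(n), size):
            # Weighted error of the parity used as-is (sign = +1).
            err = sum(w for x, label, w in zip(X, y, weights)
                      if parity(x, subset) != label)
            # Negating the parity turns error err into total - err.
            for sign, e in ((1, err), (-1, total - err)):
                if best is None or e < best[0]:
                    best = (e, subset, sign)
    return best


# Usage: labels are the parity of bits 0 and 2 on all 3-bit inputs.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [parity(x, (0, 2)) for x in X]
weights = [1 / len(X)] * len(X)
result = best_sparse_parity(X, y, weights, k=2)
```

The search costs on the order of n^k evaluations, so it is only practical for small `k`.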

CiteSeerX

Agnostic Learning of Disjunctions on Symmetric Distributions

Authors: Vitaly Feldman, Pravesh Kothari
Publication date: 25/05/2015

We consider the problem of approximating and learning disjunctions (or equivalently, conjunctions) on symmetric distributions over $\{0,1\}^n$. Symmetric distributions are distributions whose PDF is invariant under any permutation of the variables. We give a simple proof that for every symmetric distribution $\mathcal{D}$, there exists a set of $n^{O(\log(1/\epsilon))}$ functions $\mathcal{S}$, such that for every disjunction $c$, there is a function $p$, expressible as a linear combination of functions in $\mathcal{S}$, such that $p$ $\epsilon$-approximates $c$ in $\ell_1$ distance on $\mathcal{D}$, that is, $\mathbf{E}_{x \sim \mathcal{D}}[|c(x)-p(x)|] \leq \epsilon$. This directly gives an agnostic learning algorithm for disjunctions on symmetric distributions that runs in time $n^{O(\log(1/\epsilon))}$. The best known previous bound is $n^{O(1/\epsilon^4)}$ and follows from approximation of the more general class of halfspaces (Wimmer, 2010). We also show that there exists a symmetric distribution $\mathcal{D}$ such that the minimum degree of a polynomial that $1/3$-approximates the disjunction of all $n$ variables in $\ell_1$ distance on $\mathcal{D}$ is $\Omega(\sqrt{n})$. Therefore the learning result above cannot be achieved via $\ell_1$-regression with a polynomial basis used in most other agnostic learning algorithms. Our technique also gives a simple proof that for any product distribution $\mathcal{D}$ and every disjunction $c$, there exists a polynomial $p$ of degree $O(\log(1/\epsilon))$ such that $p$ $\epsilon$-approximates $c$ in $\ell_1$ distance on $\mathcal{D}$. This was first proved by Blais et al. (2008) via a more involved argument.

arXiv.org e-Print Archive

CiteSeerX

Learning using Local Membership Queries

Authors: Pranjal Awasthi, Vitaly Feldman, Varun Kanade
Publication date: 17/04/2013

We introduce a new model of membership query (MQ) learning, where the learning algorithm is restricted to query points that are close to random examples drawn from the underlying distribution. The learning model is intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where the queries are allowed to be arbitrary points). Membership query algorithms are not popular among machine learning practitioners. Apart from the obvious difficulty of adaptively querying labelers, it has also been observed that querying unnatural points leads to increased noise from human labelers (Lang and Baum, 1992). This motivates our study of learning algorithms that make queries that are close to examples generated from the data distribution. We restrict our attention to functions defined on the $n$-dimensional Boolean hypercube and say that a membership query is local if its Hamming distance from some example in the (random) training data is at most $O(\log(n))$. We show the following results in this model: (i) The class of sparse polynomials (with coefficients in $\mathbb{R}$) over $\{0,1\}^n$ is polynomial time learnable under a large class of locally smooth distributions using $O(\log(n))$-local queries. This class also includes the class of $O(\log(n))$-depth decision trees. (ii) The class of polynomial-sized decision trees is polynomial time learnable under product distributions using $O(\log(n))$-local queries. (iii) The class of polynomial size DNF formulas is learnable under the uniform distribution using $O(\log(n))$-local queries in time $n^{O(\log(\log(n)))}$. (iv) In addition we prove a number of results relating the proposed model to the traditional PAC model and the PAC+MQ model.
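The locality condition in this abstract can be made concrete with a small check. The sketch below tests whether a query point lies within a given Hamming radius of at least one training example. The names `hamming_distance` and `is_local_query` are hypothetical, and the explicit `radius` argument is an assumption: the abstract only bounds the radius by O(log n) without fixing a constant.

```python
def hamming_distance(a, b):
    """Number of coordinates in which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(u != v for u, v in zip(a, b))


def is_local_query(query, examples, radius):
    """Return True if query is within Hamming distance radius of some example.

    radius stands in for the O(log n) bound; its concrete value is an
    assumption of this sketch.
    """
    return any(hamming_distance(query, x) <= radius for x in examples)


# Usage with two training examples on the 4-dimensional hypercube.
examples = [(0, 0, 0, 0), (1, 1, 1, 1)]
near = is_local_query((0, 0, 1, 0), examples, radius=1)  # one bit away from the first example
far = is_local_query((0, 0, 1, 1), examples, radius=1)   # two bits away from both examples
```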

arXiv.org e-Print Archive

CiteSeerX

Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

Authors: Cynthia Dwork, Aleksandar Nikolov, Kunal Talwar
Publication date: 06/08/2013

Consider a database of $n$ people, each represented by a bit-string of length $d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, and a $|S|$-dimensional binary vector $\beta$ specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $\beta$. Privately releasing approximate answers to a set of $k$-way marginal queries is one of the most important and well-motivated problems in differential privacy. Information theoretically, the error complexity of marginal queries is well-understood: the per-query additive error is known to be at least $\Omega(\min\{\sqrt{n},d^{\frac{k}{2}}\})$ and at most $\tilde{O}(\min\{\sqrt{n} d^{1/4},d^{\frac{k}{2}}\})$. However, no polynomial time algorithm with error complexity as low as the information theoretic upper bound is known for small $n$. In this work we present a polynomial time algorithm that, for any distribution on marginal queries, achieves average error at most $\tilde{O}(\sqrt{n} d^{\frac{\lceil k/2 \rceil}{4}})$. This error bound is as good as the best known information theoretic upper bounds for $k=2$. This bound is an improvement over previous work on efficiently releasing marginals when $k$ is small and when error $o(n)$ is desirable. Using private boosting we are also able to give nearly matching worst-case error bounds. Our algorithms are based on the geometric techniques of Nikolov, Talwar, and Zhang. The main new ingredients are convex relaxations and careful use of the Frank-Wolfe algorithm for constrained convex minimization. To design our relaxations, we rely on the Grothendieck inequality from functional analysis.
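To make the query definition concrete, the sketch below computes the exact answer to a single k-way marginal query on a small database. It is the non-private baseline only: no noise is added, and none of the paper's machinery (convex relaxations, Frank-Wolfe) is involved. The function name `marginal_count` and the tuple-of-bits row encoding are assumptions of this example.

```python
def marginal_count(database, S, beta):
    """Exact (non-private) answer to a k-way marginal query.

    database: list of rows, each a sequence of d bits.
    S: indices of the k attributes in the query.
    beta: the k bit values those attributes must take.
    Returns the number of rows that agree with beta on S.
    """
    if len(S) != len(beta):
        raise ValueError("S and beta must have the same length")
    return sum(
        1 for row in database
        if all(row[i] == b for i, b in zip(S, beta))
    )


# Usage: n = 4 people, d = 3 attributes, a 2-way marginal on attributes 0 and 2.
database = [
    (1, 0, 1),
    (1, 1, 1),
    (0, 0, 1),
    (1, 0, 0),
]
count = marginal_count(database, S=(0, 2), beta=(1, 1))
```

A private mechanism would release a perturbed version of this count; the error bounds in the abstract describe how small that perturbation can be.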

arXiv.org e-Print Archive

CiteSeerX

On the hardness of learning sparse parities

Authors: Arnab Bhattacharyya, Ameet Gadekar, Suprovat Ghoshal, Rishi Saket
Publication date: 25/11/2015

This work investigates the hardness of computing sparse solutions to systems of linear equations over $\mathbb{F}_2$. Consider the $k$-EvenSet problem: given a homogeneous system of linear equations over $\mathbb{F}_2$ on $n$ variables, decide if there exists a nonzero solution of Hamming weight at most $k$ (i.e., a $k$-sparse solution). While there is a simple $O(n^{k/2})$-time algorithm for it, establishing fixed parameter intractability for $k$-EvenSet has been a notorious open problem. Towards this goal, we show that unless $k$-Clique can be solved in $n^{o(k)}$ time, $k$-EvenSet has no $\mathrm{poly}(n) 2^{o(\sqrt{k})}$ time algorithm and no polynomial time algorithm when $k = (\log n)^{2+\eta}$ for any $\eta > 0$. Our work also shows that the non-homogeneous generalization of the problem, which we call $k$-VectorSum, is W[1]-hard on instances where the number of equations is $O(k \log n)$, improving on previous reductions which produced $\Omega(n)$ equations. We also show that for any constant $\varepsilon > 0$, given a system of $O(\exp(O(k)) \log n)$ linear equations, it is W[1]-hard to decide if there is a $k$-sparse linear form satisfying all the equations or if every function on at most $k$ variables ($k$-junta) satisfies at most a $(1/2 + \varepsilon)$-fraction of the equations. In the setting of computational learning, this shows hardness of approximate non-proper learning of $k$-parities. In a similar vein, we use the hardness of $k$-EvenSet to show that for any constant $d$, unless $k$-Clique can be solved in $n^{o(k)}$ time, there is no $\mathrm{poly}(m, n) 2^{o(\sqrt{k})}$ time algorithm to decide whether a given set of $m$ points in $\mathbb{F}_2^n$ satisfies: (i) there exists a non-trivial $k$-sparse homogeneous linear form evaluating to 0 on all the points, or (ii) any non-trivial degree $d$ polynomial $P$ supported on at most $k$ variables evaluates to zero on approximately a $\Pr_{\mathbb{F}_2^n}[P(z) = 0]$ fraction of the points, i.e., $P$ is fooled by the set of points.
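The k-EvenSet problem defined in this abstract can be illustrated with a naive decision procedure. The sketch below tries every support of size at most `k` and checks whether setting exactly those variables to 1 satisfies every equation modulo 2. It is a plain brute-force search taking roughly n^k steps, not the O(n^{k/2})-time algorithm the abstract mentions; the function name and the row-list matrix encoding are assumptions of this example.

```python
from itertools import combinations


def find_sparse_solution(A, n, k):
    """Brute-force k-EvenSet: look for a nonzero x in F_2^n with Ax = 0
    and Hamming weight at most k.

    A: list of equations, each a list of n coefficients in {0, 1}.
    Returns the support of a solution as a tuple of variable indices,
    or None if no such solution exists.
    """
    for weight in range(1, k + 1):
        for support in combinations(range(n), weight):
            # x is 1 exactly on support, so each equation reduces to the
            # sum of its coefficients on support, taken modulo 2.
            if all(sum(row[j] for j in support) % 2 == 0 for row in A):
                return support
    return None


# Usage: x0 + x1 = 0, x1 + x2 = 0, x3 = 0 over F_2.
# The only nonzero solution is x = (1, 1, 1, 0), which has weight 3.
A = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
no_solution = find_sparse_solution(A, n=4, k=2)
solution = find_sparse_solution(A, n=4, k=3)
```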

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Algorithms and lower bounds for de Morgan formulas of low-communication leaf gates

Authors: Igor Carboni Oliveira, Valentine Kabanets, Sajin Koroth, Zhenjian Lu, Dimitrios Myrisiotis
Publication date: 01/01/2020

The class $FORMULA[s] \circ \mathcal{G}$ consists of Boolean functions computable by size-$s$ de Morgan formulas whose leaves are any Boolean functions from a class $\mathcal{G}$. We give lower bounds and (SAT, Learning, and PRG) algorithms for $FORMULA[n^{1.99}] \circ \mathcal{G}$, for classes $\mathcal{G}$ of functions with low communication complexity. Let $R^{(k)}(\mathcal{G})$ be the maximum $k$-party NOF randomized communication complexity of $\mathcal{G}$. We show: (1) The Generalized Inner Product function $GIP^k_n$ cannot be computed in $FORMULA[s] \circ \mathcal{G}$ on more than $1/2+\varepsilon$ fraction of inputs for $s = o\left(\frac{n^2}{\left(k \cdot 4^k \cdot R^{(k)}(\mathcal{G}) \cdot \log(n/\varepsilon) \cdot \log(1/\varepsilon)\right)^{2}}\right)$. As a corollary, we get an average-case lower bound for $GIP^k_n$ against $FORMULA[n^{1.99}] \circ PTF^{k-1}$. (2) There is a PRG of seed length $n/2 + O\left(\sqrt{s} \cdot R^{(2)}(\mathcal{G}) \cdot \log(s/\varepsilon) \cdot \log(1/\varepsilon)\right)$ that $\varepsilon$-fools $FORMULA[s] \circ \mathcal{G}$. For $FORMULA[s] \circ LTF$, we get the better seed length $O\left(n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon)\right)$. This gives the first non-trivial PRG (with seed length $o(n)$) for intersections of $n$ half-spaces in the regime where $\varepsilon \leq 1/n$. (3) There is a randomized $2^{n-t}$-time #SAT algorithm for $FORMULA[s] \circ \mathcal{G}$, where $t = \Omega\left(\frac{n}{\sqrt{s} \cdot \log^2(s) \cdot R^{(2)}(\mathcal{G})}\right)^{1/2}$. In particular, this implies a nontrivial #SAT algorithm for $FORMULA[n^{1.99}] \circ LTF$. (4) The Minimum Circuit Size Problem is not in $FORMULA[n^{1.99}] \circ XOR$. On the algorithmic side, we show that $FORMULA[n^{1.99}] \circ XOR$ can be PAC-learned in time $2^{O(n/\log n)}$.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Warwick Research Archives Portal Repository

Embedding Hard Learning Problems Into Gaussian Space

Authors: Adam Klivans, Pravesh Kothari
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014)
Publication date: 01/01/2014

We give the first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution. We reduce from the problem of learning sparse parities with noise with respect to the uniform distribution on the hypercube (sparse LPN), a notoriously hard problem in theoretical computer science, and show that any algorithm for agnostically learning halfspaces requires $n^{\Omega(\log(1/\epsilon))}$ time under the assumption that $k$-sparse LPN requires $n^{\Omega(k)}$ time, ruling out a polynomial time algorithm for the problem. As far as we are aware, this is the first representation-independent hardness result for supervised learning when the underlying distribution is restricted to be a Gaussian. We also show that the problem of agnostically learning sparse polynomials with respect to the Gaussian distribution in polynomial time is as hard as PAC learning DNFs on the uniform distribution in polynomial time. This complements the surprising result of Andoni et al. (2013), who show that sparse polynomials are learnable under random Gaussian noise in polynomial time. Taken together, these results show the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions. Our results use a novel embedding of random labeled examples from the uniform distribution on the Boolean hypercube into random labeled examples from the Gaussian distribution that allows us to relate the hardness of learning problems on two different domains and distributions.

CiteSeerX

Dagstuhl Research Online Publication Server