Search CORE

3,864 research outputs found

Sample Complexity Bounds on Differentially Private Learning via Communication Complexity

Author: Feldman Vitaly
Xiao David
Publication venue
Publication date: 13/09/2015
Field of study

In this work we analyze the sample complexity of classification by differentially private algorithms. Differential privacy is a strong and well-studied notion of privacy introduced by Dwork et al. (2006) that ensures that the output of an algorithm leaks little information about the data point provided by any of the participating individuals. Sample complexity of private PAC and agnostic learning was studied in a number of prior works starting with (Kasiviswanathan et al., 2008) but a number of basic questions still remain open, most notably whether learning with privacy requires more samples than learning without privacy. We show that the sample complexity of learning with (pure) differential privacy can be arbitrarily higher than the sample complexity of learning without the privacy constraint or the sample complexity of learning with approximate differential privacy. Our second contribution and the main tool is an equivalence between the sample complexity of (pure) differentially private learning of a concept class

C

(or

SCDP(C)

) and the randomized one-way communication complexity of the evaluation problem for concepts from

C

. Using this equivalence we prove the following bounds: 1.

SCDP(C) = \Omega(LDim(C))

, where

LDim(C)

is the Littlestone's (1987) dimension characterizing the number of mistakes in the online-mistake-bound learning model. Known bounds on

LDim(C)

then imply that

SCDP(C)

can be much higher than the VC-dimension of

C

. 2. For any

t

, there exists a class

C

such that

LDim(C)=2

but

SCDP(C) \geq t

. 3. For any

t

, there exists a class

C

such that the sample complexity of (pure)

\alpha

-differentially private PAC learning is

\Omega(t/\alpha)

but the sample complexity of the relaxed

(\alpha,\beta)

-differentially private PAC learning is

O(\log(1/\beta)/\alpha)

. This resolves an open problem of Beimel et al. (2013b).Comment: Extended abstract appears in Conference on Learning Theory (COLT) 201

arXiv.org e-Print Archive

CiteSeerX

A Survey of Quantum Learning Theory

Author: Arunachalam Srinivasan
de Wolf Ronald
Publication venue
Publication date: 01/06/2017
Field of study

This paper surveys quantum learning theory: the theoretical aspects of machine learning using quantum computers. We describe the main results known for three models of learning: exact learning from membership queries, and Probably Approximately Correct (PAC) and agnostic learning from classical or quantum examples.Comment: 26 pages LaTeX. v2: many small changes to improve the presentation. This version will appear as Complexity Theory Column in SIGACT News in June 2017. v3: fixed a small ambiguity in the definition of gamma(C) and updated a referenc

arXiv.org e-Print Archive

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

UvA-DARE

COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

Author: Basilico Justin D.
Dixon Kevin R.
Kegelmeyer W. Philip
Kolda Tamara G.
Munson M. Arthur
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more

arXiv.org e-Print Archive

CiteSeerX

Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas

Author: Feldman Vitaly
Vondrak Jan
Publication venue
Publication date: 30/03/2015
Field of study

We investigate the approximability of several classes of real-valued functions by functions of a small number of variables ({\em juntas}). Our main results are tight bounds on the number of variables required to approximate a function

f:\{0,1\}^n \rightarrow [0,1]

within

\ell_2

-error

\epsilon

over the uniform distribution: 1. If

f

is submodular, then it is

\epsilon

-close to a function of

O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon})

variables. This is an exponential improvement over previously known results. We note that

\Omega(\frac{1}{\epsilon^2})

variables are necessary even for linear functions. 2. If

f

is fractionally subadditive (XOS) it is

\epsilon

-close to a function of

2^{O(1/\epsilon^2)}

variables. This result holds for all functions with low total

\ell_1

-influence and is a real-valued analogue of Friedgut's theorem for boolean functions. We show that

2^{\Omega(1/\epsilon)}

variables are necessary even for XOS functions. As applications of these results, we provide learning algorithms over the uniform distribution. For XOS functions, we give a PAC learning algorithm that runs in time

2^{poly(1/\epsilon)} poly(n)

. For submodular functions we give an algorithm in the more demanding PMAC learning model (Balcan and Harvey, 2011) which requires a multiplicative

1+\gamma

factor approximation with probability at least

1-\epsilon

over the target distribution. Our uniform distribution algorithm runs in time

2^{poly(1/(\gamma\epsilon))} poly(n)

. This is the first algorithm in the PMAC model that over the uniform distribution can achieve a constant approximation factor arbitrarily close to 1 for all submodular functions. As follows from the lower bounds in (Feldman et al., 2013) both of these algorithms are close to optimal. We also give applications for proper learning, testing and agnostic learning with value queries of these classes.Comment: Extended abstract appears in proceedings of FOCS 201

arXiv.org e-Print Archive

Crossref

H-word: Supporting job scheduling in Hadoop with workload-driven data redistribution

Author: Abelló Gamazo Alberto
Calders Toon
Jovanovic Petar
Romero Moral Óscar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality for reducing network traffic. In such systems the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we addressthe challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies required transfers of input data a priori, for timely bringing data close to the execution. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques

Author: Karchmer Ari
Publication venue
Publication date: 11/11/2023
Field of study

(Abridged) Designing computationally efficient algorithms in the agnostic learning model (Haussler, 1992; Kearns et al., 1994) is notoriously difficult. In this work, we consider agnostic learning with membership queries for touchstone classes at the frontier of agnostic learning, with a focus on how much computation can be saved over the trivial runtime of 2^n$. This approach is inspired by and continues the study of ``learning with nontrivial savings'' (Servedio and Tan, 2017). To this end, we establish multiple agnostic learning algorithms, highlighted by: 1. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, which can each be any function computable by a sublogarithmic degree k polynomial threshold function (the depth of the circuit is bounded only by size). This algorithm runs in time 2^{n -s(n)} for s(n) \approx n/(k+1), and learns over the uniform distribution over unlabelled examples on \{0,1\}^n. 2. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, where each can be any function computable by a \sym^+ circuit of subexponential size and sublogarithmic degree k. This algorithm runs in time 2^{n-s(n)} for s(n) \approx n/(k+1), and learns over distributions of unlabelled examples that are products of k+1 arbitrary and unknown distributions, each over \{0,1\}^{n/(k+1)} (assume without loss of generality that k+1 divides n)

arXiv.org e-Print Archive