3,864 research outputs found

    Sample Complexity Bounds on Differentially Private Learning via Communication Complexity

    Full text link
    In this work we analyze the sample complexity of classification by differentially private algorithms. Differential privacy is a strong and well-studied notion of privacy introduced by Dwork et al. (2006) that ensures that the output of an algorithm leaks little information about the data point provided by any of the participating individuals. Sample complexity of private PAC and agnostic learning was studied in a number of prior works starting with (Kasiviswanathan et al., 2008) but a number of basic questions still remain open, most notably whether learning with privacy requires more samples than learning without privacy. We show that the sample complexity of learning with (pure) differential privacy can be arbitrarily higher than the sample complexity of learning without the privacy constraint or the sample complexity of learning with approximate differential privacy. Our second contribution and the main tool is an equivalence between the sample complexity of (pure) differentially private learning of a concept class CC (or SCDP(C)SCDP(C)) and the randomized one-way communication complexity of the evaluation problem for concepts from CC. Using this equivalence we prove the following bounds: 1. SCDP(C)=Ω(LDim(C))SCDP(C) = \Omega(LDim(C)), where LDim(C)LDim(C) is the Littlestone's (1987) dimension characterizing the number of mistakes in the online-mistake-bound learning model. Known bounds on LDim(C)LDim(C) then imply that SCDP(C)SCDP(C) can be much higher than the VC-dimension of CC. 2. For any tt, there exists a class CC such that LDim(C)=2LDim(C)=2 but SCDP(C)tSCDP(C) \geq t. 3. For any tt, there exists a class CC such that the sample complexity of (pure) α\alpha-differentially private PAC learning is Ω(t/α)\Omega(t/\alpha) but the sample complexity of the relaxed (α,β)(\alpha,\beta)-differentially private PAC learning is O(log(1/β)/α)O(\log(1/\beta)/\alpha). This resolves an open problem of Beimel et al. (2013b).Comment: Extended abstract appears in Conference on Learning Theory (COLT) 201

    A Survey of Quantum Learning Theory

    Get PDF
    This paper surveys quantum learning theory: the theoretical aspects of machine learning using quantum computers. We describe the main results known for three models of learning: exact learning from membership queries, and Probably Approximately Correct (PAC) and agnostic learning from classical or quantum examples.Comment: 26 pages LaTeX. v2: many small changes to improve the presentation. This version will appear as Complexity Theory Column in SIGACT News in June 2017. v3: fixed a small ambiguity in the definition of gamma(C) and updated a referenc

    COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

    Full text link
    COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more

    Optimal Bounds on Approximation of Submodular and XOS Functions by Juntas

    Full text link
    We investigate the approximability of several classes of real-valued functions by functions of a small number of variables ({\em juntas}). Our main results are tight bounds on the number of variables required to approximate a function f:{0,1}n[0,1]f:\{0,1\}^n \rightarrow [0,1] within 2\ell_2-error ϵ\epsilon over the uniform distribution: 1. If ff is submodular, then it is ϵ\epsilon-close to a function of O(1ϵ2log1ϵ)O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon}) variables. This is an exponential improvement over previously known results. We note that Ω(1ϵ2)\Omega(\frac{1}{\epsilon^2}) variables are necessary even for linear functions. 2. If ff is fractionally subadditive (XOS) it is ϵ\epsilon-close to a function of 2O(1/ϵ2)2^{O(1/\epsilon^2)} variables. This result holds for all functions with low total 1\ell_1-influence and is a real-valued analogue of Friedgut's theorem for boolean functions. We show that 2Ω(1/ϵ)2^{\Omega(1/\epsilon)} variables are necessary even for XOS functions. As applications of these results, we provide learning algorithms over the uniform distribution. For XOS functions, we give a PAC learning algorithm that runs in time 2poly(1/ϵ)poly(n)2^{poly(1/\epsilon)} poly(n). For submodular functions we give an algorithm in the more demanding PMAC learning model (Balcan and Harvey, 2011) which requires a multiplicative 1+γ1+\gamma factor approximation with probability at least 1ϵ1-\epsilon over the target distribution. Our uniform distribution algorithm runs in time 2poly(1/(γϵ))poly(n)2^{poly(1/(\gamma\epsilon))} poly(n). This is the first algorithm in the PMAC model that over the uniform distribution can achieve a constant approximation factor arbitrarily close to 1 for all submodular functions. As follows from the lower bounds in (Feldman et al., 2013) both of these algorithms are close to optimal. We also give applications for proper learning, testing and agnostic learning with value queries of these classes.Comment: Extended abstract appears in proceedings of FOCS 201

    H-word: Supporting job scheduling in Hadoop with workload-driven data redistribution

    Get PDF
    The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality for reducing network traffic. In such systems the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we addressthe challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies required transfers of input data a priori, for timely bringing data close to the execution. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.Peer ReviewedPostprint (author's final draft

    Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques

    Full text link
    (Abridged) Designing computationally efficient algorithms in the agnostic learning model (Haussler, 1992; Kearns et al., 1994) is notoriously difficult. In this work, we consider agnostic learning with membership queries for touchstone classes at the frontier of agnostic learning, with a focus on how much computation can be saved over the trivial runtime of 2^n$. This approach is inspired by and continues the study of ``learning with nontrivial savings'' (Servedio and Tan, 2017). To this end, we establish multiple agnostic learning algorithms, highlighted by: 1. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, which can each be any function computable by a sublogarithmic degree k polynomial threshold function (the depth of the circuit is bounded only by size). This algorithm runs in time 2^{n -s(n)} for s(n) \approx n/(k+1), and learns over the uniform distribution over unlabelled examples on \{0,1\}^n. 2. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, where each can be any function computable by a \sym^+ circuit of subexponential size and sublogarithmic degree k. This algorithm runs in time 2^{n-s(n)} for s(n) \approx n/(k+1), and learns over distributions of unlabelled examples that are products of k+1 arbitrary and unknown distributions, each over \{0,1\}^{n/(k+1)} (assume without loss of generality that k+1 divides n)
    corecore