    Improved Bounds on Quantum Learning Algorithms

    In this article we give several new results on the complexity of algorithms that learn Boolean functions from quantum queries and quantum examples. Hunziker et al. conjectured that for any class C of Boolean functions, the number of quantum black-box queries required to exactly identify an unknown function from C is $O\left(\frac{\log |C|}{\sqrt{\hat{\gamma}^{C}}}\right)$, where $\hat{\gamma}^{C}$ is a combinatorial parameter of the class C. We essentially resolve this conjecture in the affirmative by giving a quantum algorithm that, for any class C, identifies any unknown function from C using $O\left(\frac{\log |C| \log \log |C|}{\sqrt{\hat{\gamma}^{C}}}\right)$ quantum black-box queries. We consider a range of natural problems intermediate between the exact learning problem (in which the learner must obtain all bits of information about the black-box function) and the usual problem of computing a predicate (in which the learner must obtain only one bit of information about the black-box function). We give positive and negative results on when the quantum and classical query complexities of these intermediate problems are polynomially related to each other. Finally, we improve the known lower bounds on the number of quantum examples (as opposed to quantum black-box queries) required for $(\epsilon,\delta)$-PAC learning any concept class of Vapnik-Chervonenkis dimension d over the domain $\{0,1\}^n$ from $\Omega\left(\frac{d}{n}\right)$ to $\Omega\left(\frac{1}{\epsilon}\log \frac{1}{\delta}+d+\frac{\sqrt{d}}{\epsilon}\right)$. This new lower bound comes closer to matching known upper bounds for classical PAC learning. Comment: Minor corrections. 18 pages. To appear in Quantum Information Processing. Requires: algorithm.sty, algorithmic.sty to buil

    Database Learning: Toward a Database that Becomes Smarter Every Time

    In today's databases, previous query answers rarely help answer future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query, because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea, learning from past query answers, Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems. Comment: This manuscript is an extended report of the work published in ACM SIGMOD conference 201

    A Complete Characterization of Statistical Query Learning with Applications to Evolvability

    The statistical query (SQ) learning model of Kearns (1993) is a natural restriction of the PAC learning model in which a learning algorithm is allowed to obtain estimates of statistical properties of the examples but cannot see the examples themselves. We describe a new and simple characterization of the query complexity of learning in the SQ learning model. Unlike the previously known bounds on SQ learning, our characterization preserves the accuracy and the efficiency of learning. The preservation of accuracy implies that our characterization gives the first characterization of SQ learning in the agnostic learning framework. The preservation of efficiency is achieved using a new boosting technique and allows us to derive a new approach to the design of evolutionary algorithms in Valiant's (2006) model of evolvability. We use this approach to demonstrate the existence of a large class of monotone evolutionary learning algorithms based on square loss performance estimation. These results differ significantly from the few known evolutionary algorithms and give evidence that evolvability in Valiant's model is a more versatile phenomenon than there had been previous reason to suspect. Comment: Simplified Lemma 3.8 and its application
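
To make the SQ restriction concrete, here is a minimal sketch of a statistical query oracle. It is only an illustration, not the paper's construction: the uniform distribution over a small finite domain, the random (rather than adversarial) perturbation, and the names `make_sq_oracle` and `find_dictator` are all assumptions introduced for this example. The learner never sees a labeled example; it only receives estimates of expectations that are accurate to within a tolerance `tau`.

```python
import random
from itertools import product

def make_sq_oracle(target, domain, seed=0):
    """Return stat(query, tau): an estimate of E[query(x, target(x))] within tau.

    Assumption: x is uniform over the finite list `domain`, and the oracle
    perturbs the exact expectation by random noise in [-tau, tau].
    """
    rng = random.Random(seed)

    def stat(query, tau):
        exact = sum(query(x, target(x)) for x in domain) / len(domain)
        return exact + rng.uniform(-tau, tau)

    return stat

def find_dictator(stat, n, tau=0.1):
    """Hypothetical learner: find i such that the target is (+1 if x_i else -1).

    It issues one correlational query per variable and keeps the best one.
    """
    def correlation(i):
        return stat(lambda x, y: y * (1 if x[i] else -1), tau)

    return max(range(n), key=correlation)

domain = list(product((0, 1), repeat=3))
oracle = make_sq_oracle(lambda x: 1 if x[1] else -1, domain)
print(find_dictator(oracle, 3))
```

The learner above uses n queries of tolerance 0.1; counting such queries is what "query complexity" measures in this model.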

    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds: $\circ$ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$. $\circ$ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$. We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms, which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.
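
The heuristic described in this abstract can be sketched by brute force for small n. This is an illustrative sketch under stated assumptions, not the authors' algorithm: the abstract does not say which leaf is split next, so the rule used here (split the leaf and variable maximizing influence weighted by the leaf's probability mass) is an assumption, as are the function names and the representation of a tree as a list of (partial assignment, label) leaves. Influence and error are computed exactly under the uniform distribution by enumerating inputs.

```python
from itertools import product

def completions(n, fixed):
    """Yield every x in {0,1}^n that agrees with the partial assignment `fixed`."""
    free = [j for j in range(n) if j not in fixed]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for j, b in fixed.items():
            x[j] = b
        for j, b in zip(free, bits):
            x[j] = b
        yield tuple(x)

def influence(f, n, i, fixed):
    """Pr[f(x) != f(x with bit i flipped)] for uniform x consistent with `fixed`."""
    differ = total = 0
    for x in completions(n, {**fixed, i: 0}):
        y = x[:i] + (1,) + x[i + 1:]
        differ += f(x) != f(y)
        total += 1
    return differ / total

def leaf_error(f, n, fixed):
    """Return (error mass of the best constant label at this leaf, that label)."""
    values = [f(x) for x in completions(n, fixed)]
    plus = sum(v == 1 for v in values)
    minus = len(values) - plus
    return min(plus, minus) / 2 ** n, (1 if plus >= minus else -1)

def build_top_down(f, n, eps):
    """Grow leaves until the tree disagrees with f on at most an eps fraction of inputs."""
    leaves = [{}]
    while sum(leaf_error(f, n, leaf)[0] for leaf in leaves) > eps:
        best = None
        for leaf in leaves:
            for i in range(n):
                if i in leaf:
                    continue
                # Assumed selection rule: influence weighted by leaf mass 2^-depth.
                score = influence(f, n, i, leaf) / 2 ** len(leaf)
                if best is None or score > best[0]:
                    best = (score, leaf, i)
        _, leaf, i = best
        leaves.remove(leaf)
        leaves += [{**leaf, i: 0}, {**leaf, i: 1}]
    return [(leaf, leaf_error(f, n, leaf)[1]) for leaf in leaves]

def evaluate(tree, x):
    """Return the label of the unique leaf whose partial assignment matches x."""
    for leaf, label in tree:
        if all(x[j] == b for j, b in leaf.items()):
            return label
```

For example, with `f = lambda x: 1 if x[0] and x[1] else -1` on three variables, `build_top_down(f, 3, 0.0)` never splits on the irrelevant variable `x[2]`, since its influence is zero at every leaf.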

    Query Complexity of Approximate Equilibria in Anonymous Games

    We study the computation of equilibria of anonymous games, via algorithms that may proceed via a sequence of adaptive queries to the game's payoff function, assumed to be unknown initially. The general topic we consider is query complexity, that is, how many queries are necessary or sufficient to compute an exact or approximate Nash equilibrium. We show that exact equilibria cannot be found via query-efficient algorithms. We also give an example of a 2-strategy, 3-player anonymous game that does not have any exact Nash equilibrium in rational numbers. However, more positive query-complexity bounds are attainable if either further symmetries of the utility functions are assumed or we focus on approximate equilibria. We investigate four sub-classes of anonymous games previously considered by \cite{bfh09, dp14}. Our main result is a new randomized query-efficient algorithm that finds an $O(n^{-1/4})$-approximate Nash equilibrium querying $\tilde{O}(n^{3/2})$ payoffs and runs in time $\tilde{O}(n^{3/2})$. This improves on the running time of pre-existing algorithms for approximate equilibria of anonymous games, and is the first one to obtain an inverse polynomial approximation in poly-time. We also show how this can be utilized as an efficient polynomial-time approximation scheme (PTAS). Furthermore, we prove that $\Omega(n \log n)$ payoffs must be queried in order to find any $\epsilon$-well-supported Nash equilibrium, even by randomized algorithms.
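
As a small illustration of the query model, the sketch below checks whether a given pure strategy profile of a 2-strategy anonymous game is an ε-approximate Nash equilibrium, counting payoff queries as it goes. It is not the paper's algorithm (which handles mixed strategies and is randomized); the function name and the payoff signature `payoff(i, s, k)`, meaning player i's payoff for playing s when k of the other players play strategy 1, are assumptions for this example.

```python
def is_eps_pure_nash(payoff, profile, eps):
    """Check a pure profile of a 2-strategy anonymous game; return (ok, queries used).

    Assumed interface: payoff(i, s, k) is player i's payoff for strategy s in {0, 1}
    when exactly k of the other players choose strategy 1.
    """
    ones = sum(profile)
    queries = 0
    for i, s in enumerate(profile):
        k = ones - s                     # other players currently on strategy 1
        current = payoff(i, s, k)
        deviate = payoff(i, 1 - s, k)    # unilateral deviation leaves k unchanged
        queries += 2
        if deviate > current + eps:      # player i gains more than eps by switching
            return False, queries
    return True, queries

# Hypothetical 4-player "minority" game: a player is paid the fraction of
# the other three players who chose the opposite strategy.
def minority(i, s, k):
    return (3 - k) / 3 if s == 1 else k / 3

print(is_eps_pure_nash(minority, (1, 1, 0, 0), 0.1))
```

Verifying one profile this way costs at most 2n queries; the bounds in the abstract concern the harder task of finding an approximate equilibrium.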

    Privately Releasing Conjunctions and the Statistical Query Barrier

    Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? + We show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern. + We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. While interesting from a learning theoretic point of view, our main applications are in privacy-preserving data analysis: here, our second result leads to the first algorithm that efficiently releases differentially private answers to all Boolean conjunctions with 1% average error. This presents significant progress on a key open problem in privacy-preserving data analysis. Our first result, on the other hand, gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms, but also most known private algorithms can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ model as a new barrier in the design of differentially private algorithms.