    Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

    The paper studies machine learning problems where each example is described using a set of Boolean features and where hypotheses are represented by linear threshold elements. One method of increasing the expressiveness of learned hypotheses in this context is to expand the feature set to include conjunctions of basic features. This can be done explicitly or, where possible, by using a kernel function. Focusing on the well-known Perceptron and Winnow algorithms, the paper demonstrates a tradeoff between the computational efficiency with which the algorithm can be run over the expanded feature space and the generalization ability of the corresponding learning algorithm. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over a feature space of exponentially many conjunctions; however, we also show that using such kernels, the Perceptron algorithm can provably make an exponential number of mistakes even when learning simple functions. We then consider the question of whether kernel functions can analogously be used to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. Known upper bounds imply that the Winnow algorithm can learn Disjunctive Normal Form (DNF) formulae with a polynomial mistake bound in this setting. However, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set. This implies that the kernel functions which correspond to running Winnow for this problem are not efficiently computable, and that there is no general construction that can run Winnow with kernels.
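
    The idea of running the Perceptron over all conjunctions through a kernel can be illustrated with a minimal sketch. It assumes the "all conjunctions over literals" feature space, where the number of conjunctions satisfied by both x and y is 2^same(x, y), with same(x, y) the number of coordinates on which x and y agree (the empty conjunction is counted). The function names, the threshold at zero, and the epoch cap are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# (Boolean feature vector, label in {-1, +1})
Example = Tuple[Sequence[int], int]


def conjunction_kernel(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of conjunctions over literals that both x and y satisfy.

    A conjunction is satisfied by both points only if it uses variables on
    which they agree, so the count is 2 ** (number of agreeing coordinates).
    """
    same = sum(1 for a, b in zip(x, y) if a == b)
    return 2 ** same


def predict(mistakes: List[Example], x: Sequence[int]) -> int:
    """Sign of the implicit weight vector's inner product with phi(x)."""
    score = sum(label * conjunction_kernel(m, x) for m, label in mistakes)
    return 1 if score > 0 else -1


def kernel_perceptron(examples: List[Example], epochs: int = 100) -> List[Example]:
    """Dual-form Perceptron: the hypothesis is the list of mistaken examples."""
    mistakes: List[Example] = []
    for _ in range(epochs):
        clean_pass = True
        for x, label in examples:
            if predict(mistakes, x) != label:
                mistakes.append((x, label))  # additive update in feature space
                clean_pass = False
        if clean_pass:
            break
    return mistakes
```

    Each prediction costs one kernel evaluation per stored mistake, so the exponentially large feature space is never built explicitly. The abstract's negative result concerns how many such mistakes can accumulate, which this sketch does not address.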

    Tight Bounds on Proper Equivalence Query Learning of DNF

    We prove a new structural lemma for partial Boolean functions $f$, which we call the seed lemma for DNF. Using the lemma, we give the first subexponential algorithm for proper learning of DNF in Angluin's Equivalence Query (EQ) model. The algorithm has time and query complexity $2^{\tilde{O}(\sqrt{n})}$, which is optimal. We also give a new result on certificates for DNF-size, a simple algorithm for properly PAC-learning DNF, and new results on EQ-learning $\log n$-term DNF and decision trees.

    From average case complexity to improper learning complexity

    The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a. representation independent learning). The difficulty in proving lower bounds for improper learning is that the standard reductions from $\mathbf{NP}$-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in (Kearns and Valiant 89) and relies on cryptographic assumptions. We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption (Feige 02) about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far-reaching implications. In particular, 1. Learning $\mathrm{DNF}$'s is hard. 2. Agnostically learning halfspaces with a constant approximation ratio is hard. 3. Learning an intersection of $\omega(1)$ halfspaces is hard. Comment: 34 pages

    The Consistency dimension and distribution-dependent learning from queries

    We prove a new combinatorial characterization of polynomial learnability from equivalence queries, and state some of its consequences relating the learnability of a class with the learnability via equivalence and membership queries of its subclasses obtained by restricting the instance space. Then we propose and study two models of query learning in which there is a probability distribution on the instance space, both as an application of the tools developed from the combinatorial characterization and as models of independent interest. Postprint (published version)

    Quantum machine learning: a classical perspective

    Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning techniques to impressive results in regression, classification, data-generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets are motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical machine learning algorithms. Here we review the literature in quantum machine learning and discuss perspectives for a mixed readership of classical machine learning and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in machine learning are identified as promising directions for the field. Practical questions, like how to upload classical data into quantum form, will also be addressed. Comment: v3, 33 pages; typos corrected and references added

    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds. Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$. Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$. We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms, which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.
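
    The splitting rule described in this abstract can be sketched by brute force for small n. The sketch below is a simplification under stated assumptions: influence is computed exactly by enumerating all inputs consistent with the current restriction, and a branch stops when the restricted subfunction is within eps of a constant, a local stopping rule that implies (but is not identical to) the global ε-approximation condition in the abstract. All names are hypothetical.

```python
from itertools import product


def inputs(n, fixed):
    """Yield every point of {0,1}^n that agrees with the restriction `fixed`."""
    free = [j for j in range(n) if j not in fixed]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for j, v in fixed.items():
            x[j] = v
        for j, v in zip(free, bits):
            x[j] = v
        yield tuple(x)


def influence(f, n, i, fixed):
    """Fraction of restricted inputs on which flipping bit i changes f."""
    points = list(inputs(n, fixed))
    flips = sum(1 for x in points if f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]))
    return flips / len(points)


def build_tree(f, n, eps=0.0, fixed=None):
    """Return a leaf (+1 or -1) or a node (variable, subtree_if_0, subtree_if_1)."""
    fixed = fixed or {}
    values = [f(x) for x in inputs(n, fixed)]
    majority = 1 if sum(values) >= 0 else -1
    error = sum(1 for v in values if v != majority) / len(values)
    free = [j for j in range(n) if j not in fixed]
    if error <= eps or not free:
        return majority
    # Split on the most influential variable of the restricted subfunction.
    i = max(free, key=lambda j: influence(f, n, j, fixed))
    return (i,
            build_tree(f, n, eps, {**fixed, i: 0}),
            build_tree(f, n, eps, {**fixed, i: 1}))


def evaluate(tree, x):
    """Follow the path for x down to a leaf label."""
    while isinstance(tree, tuple):
        i, if_zero, if_one = tree
        tree = if_one if x[i] else if_zero
    return tree
```

    With eps=0 the tree computes f exactly; the enumeration is exponential in n, so this is only meant to make the recursion concrete.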

    Decision lists and related Boolean functions

    We consider Boolean functions represented by decision lists, and study their relationships to other classes of Boolean functions. It turns out that the elementary class of 1-decision lists has interesting relationships to independently defined classes such as disguised Horn functions, read-once functions, nested differences of concepts, threshold functions, and 2-monotonic functions. In particular, 1-decision lists coincide with fragments of the mentioned classes. We further investigate the recognition problem for this class, as well as the extension problem in the context of partially defined Boolean functions (pdBfs). We show that finding an extension of a given pdBf in the class of 1-decision lists is possible in linear time. This improves on previous results. Moreover, we present an algorithm for enumerating all such extensions with polynomial delay.
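
    To make the extension problem concrete, here is a minimal sketch of the standard greedy procedure for finding a 1-decision list consistent with a pdBf given as sets of true and false points. It is not the linear-time algorithm the abstract refers to; it is a simpler polynomial-time illustration, and the representation (rules as `(variable, value, output)` triples plus a default output) is an assumption made for this example.

```python
from itertools import product


def find_1dl_extension(true_points, false_points):
    """Greedily build a 1-decision list consistent with the given points.

    Returns (rules, default), where each rule (var, val, out) reads
    "if x[var] == val then output out", or None if no 1-decision list fits.
    """
    examples = [(tuple(x), 1) for x in true_points]
    examples += [(tuple(x), 0) for x in false_points]
    n = len(examples[0][0]) if examples else 0
    rules = []
    while True:
        labels = {label for _, label in examples}
        if len(labels) <= 1:
            # Remaining points share one label: use it as the default output.
            default = labels.pop() if labels else 0
            return rules, default
        for var, val in product(range(n), (0, 1)):
            covered = [label for x, label in examples if x[var] == val]
            if covered and len(set(covered)) == 1:
                # This literal isolates points of a single label: emit a rule.
                rules.append((var, val, covered[0]))
                examples = [(x, label) for x, label in examples if x[var] != val]
                break
        else:
            return None  # no literal separates the remaining points


def evaluate_1dl(decision_list, x):
    """Output of the first rule whose literal x satisfies, else the default."""
    rules, default = decision_list
    for var, val, out in rules:
        if x[var] == val:
            return out
    return default
```

    For example, two-bit parity has no 1-decision list extension, so the function returns `None` for true points `{(0,1), (1,0)}` and false points `{(0,0), (1,1)}`.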