
    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$: place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ in the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds:
    ∘ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.
    ∘ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.
    We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms, which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.
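
    The heuristic itself is simple enough to sketch in code. The following is a minimal Python sketch, not the paper's exact procedure: it assumes black-box query access to $f$, estimates influences and approximation error by Monte Carlo sampling, and uses a local stopping rule (the current subcube is $\varepsilon$-close to a constant) in place of the paper's global $\varepsilon$-approximation check; the tuple encoding of trees is invented for the example.

        import random

        def uniform_point(n):
            # A uniformly random input in {0,1}^n.
            return tuple(random.randint(0, 1) for _ in range(n))

        def estimate_influence(f, n, i, samples=2000):
            # Influence of x_i: the probability, over a uniform x,
            # that flipping coordinate i changes f(x).
            flips = 0
            for _ in range(samples):
                x = uniform_point(n)
                y = x[:i] + (1 - x[i],) + x[i + 1:]
                flips += f(x) != f(y)
            return flips / samples

        def restrict(f, i, b):
            # The subfunction f_{x_i = b}: coordinate i is forced to b.
            return lambda x: f(x[:i] + (b,) + x[i + 1:])

        def build_tree(f, n, eps, free=None, samples=2000):
            free = set(range(n)) if free is None else free
            # Estimate how far f is from constant on the current subcube.
            vals = [f(uniform_point(n)) for _ in range(samples)]
            maj = 1 if sum(v == 1 for v in vals) * 2 >= len(vals) else -1
            err = sum(v != maj for v in vals) / len(vals)
            if err <= eps or not free:
                return ('leaf', maj)
            # Split on the (estimated) most influential free variable,
            # then recurse on the two restrictions.
            i = max(free, key=lambda j: estimate_influence(f, n, j, samples))
            return ('node', i,
                    build_tree(restrict(f, i, 0), n, eps, free - {i}, samples),
                    build_tree(restrict(f, i, 1), n, eps, free - {i}, samples))

        # Toy usage: 3-bit majority as a {±1}-valued function.
        maj3 = lambda x: 1 if x[0] + x[1] + x[2] >= 2 else -1
        print(build_tree(maj3, 3, eps=0.05))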

    Tools and Techniques for Decision Tree Learning

    Decision tree learning is an important field of machine learning. In this study we examine both formal and practical aspects of decision tree learning, aiming to answer two important needs: the need for better-motivated decision tree learners, and for an environment facilitating experimentation with inductive learning algorithms. As results we obtain new practical tools and useful techniques for decision tree learning. First, we derive the practical decision tree learner Rank, based on the Findmin protocol of Ehrenfeucht and Haussler. The motivation for the changes introduced to the method comes from empirical experience, but we prove the correctness of the modifications in the probably approximately correct learning framework. The algorithm is enhanced by extending it to multiclass situations, making it capable of working in the incremental setting, and adding noise tolerance. Together these modifications yield practicability through a formal development.
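
    The Findmin protocol searches for a consistent decision tree of small rank, so the rank measure is the natural quantity to pin down. Below is the standard Ehrenfeucht and Haussler rank, written for the tuple-encoded trees used in the sketch above; the encoding is this listing's convention, not the thesis's.

        def rank(tree):
            # Decision-tree rank: a leaf has rank 0; an internal node
            # has rank max(r0, r1) when its subtree ranks differ, and
            # r0 + 1 when they are equal.
            if tree[0] == 'leaf':
                return 0
            _, _, left, right = tree
            r0, r1 = rank(left), rank(right)
            return max(r0, r1) if r0 != r1 else r0 + 1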

    Learning Stochastic Decision Trees


    : Méthodes d'Inférence Symbolique pour les Bases de Données

    This dissertation is a summary of a line of research, which I was actively involved in, on learning in databases from examples. This research focused on traditional as well as novel database models and languages for querying, transforming, and describing the schema of a database. In the case of schemas, our contributions involve proposing original languages for the emerging data models of Unordered XML and RDF. We have studied learning from examples of schemas for Unordered XML, schemas for RDF, twig queries for XML, join queries for relational databases, and XML transformations defined with a novel model of tree-to-word transducers. Investigating the learnability of the proposed languages required us to examine closely a number of their fundamental properties, often of independent interest, including normal forms, minimization, containment and equivalence, consistency of a set of examples, and finite characterizability. A good understanding of these properties allowed us to devise learning algorithms that explore a possibly large search space, with the help of a diligently designed set of generalization operations, in search of an appropriate solution. Learning (or inference) is a problem with two parameters: the precise class of languages we wish to infer and the type of input that the user can provide. We focused on the setting where the user input consists of positive examples, i.e., elements that belong to the goal language, and negative examples, i.e., elements that do not belong to the goal language. In general, using both negative and positive examples allows learning richer classes of goal languages than using positive examples alone. However, using negative examples is often difficult because, together with positive examples, they may cause the search space to take a very complex shape, and its exploration may turn out to be computationally challenging.
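
    The kind of search this summary describes can be pictured abstractly: start from a specific candidate and apply generalization operations, accepting a step only if it gains positive coverage without admitting a negative example. Everything below (the operation set, the covers predicate, the greedy control loop) is a hypothetical placeholder for the language-specific machinery the dissertation actually develops.

        def learn(start, positives, negatives, generalizations, covers):
            # Greedy exploration of the search space: generalize until
            # every positive example is covered, rejecting any step that
            # either admits a negative example or gains no coverage.
            def pos_cov(h):
                return sum(covers(h, p) for p in positives)

            current = start
            while not all(covers(current, p) for p in positives):
                for op in generalizations:
                    candidate = op(current)
                    if (candidate is not None
                            and not any(covers(candidate, n) for n in negatives)
                            and pos_cov(candidate) > pos_cov(current)):
                        current = candidate
                        break
                else:
                    return None  # no consistent generalization step remains
            return current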

    Probabilistic models of language processing and acquisition

    Probabilistic methods are providing new explanatory approaches to fundamental cognitive science questions of how humans structure, process and acquire language. This review examines probabilistic models defined over traditional symbolic structures. Language comprehension and production involve probabilistic inference in such models, and acquisition involves choosing the best model, given innate constraints and linguistic and other input. Probabilistic models can account for the learning and processing of language, while maintaining the sophistication of symbolic models. A recent burgeoning of theoretical developments and online corpus creation has enabled large models to be tested, revealing probabilistic constraints in processing, undermining acquisition arguments based on a perceived poverty of the stimulus, and suggesting fruitful links with probabilistic theories of categorization and ambiguity resolution in perception.
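
    As a toy illustration of a probabilistic model defined over a symbolic structure, the following assigns a probability to a context-free derivation as the product of the probabilities of the rules it uses; the grammar and all numbers are invented for the example.

        # A toy probabilistic context-free grammar: each rewrite rule
        # carries a probability, and a derivation's probability is the
        # product over the rules it applies.
        rules = {
            ('S',  ('NP', 'VP')): 1.0,
            ('NP', ('dogs',)):    0.5,
            ('NP', ('cats',)):    0.5,
            ('VP', ('bark',)):    0.7,
            ('VP', ('sleep',)):   0.3,
        }

        def derivation_probability(steps):
            # Each step is one (lhs, rhs) rule application.
            p = 1.0
            for step in steps:
                p *= rules[step]
            return p

        # P(S -> NP VP -> dogs VP -> dogs bark) = 1.0 * 0.5 * 0.7
        print(derivation_probability([('S', ('NP', 'VP')),
                                      ('NP', ('dogs',)),
                                      ('VP', ('bark',))]))  # 0.35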

    Tensor decompositions for learning latent variable models

    This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation, which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
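
    The power update at the core of the method is short. The sketch below assumes numpy and a symmetric, orthogonally decomposable third-order tensor $T = \sum_i w_i\, v_i^{\otimes 3}$; it omits the random restarts and the robustness machinery that the paper actually analyzes.

        import numpy as np

        def tensor_power_component(T, iters=200, seed=0):
            # Power update v <- T(I, v, v) / ||T(I, v, v)|| for a
            # symmetric 3-tensor; for an orthogonally decomposable T
            # this converges to one component v_i, with weight
            # w_i = T(v, v, v).
            rng = np.random.default_rng(seed)
            v = rng.normal(size=T.shape[0])
            v /= np.linalg.norm(v)
            for _ in range(iters):
                u = np.einsum('ijk,j,k->i', T, v, v)
                v = u / np.linalg.norm(u)
            w = np.einsum('ijk,i,j,k->', T, v, v, v)
            return w, v

        def deflate(T, w, v):
            # Subtract the recovered rank-one term w * v (x) v (x) v;
            # repeating power iteration + deflation recovers the
            # remaining components one at a time.
            return T - w * np.einsum('i,j,k->ijk', v, v, v)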

    Logic Programs as Declarative and Procedural Bias in Inductive Logic Programming

    Machine Learning is necessary for the development of Artificial Intelligence, as pointed out by Turing in his 1950 article "Computing Machinery and Intelligence". It is in the same article that Turing suggested the use of computational logic and background knowledge for learning. This thesis follows a logic-based machine learning approach called Inductive Logic Programming (ILP), which is advantageous over other machine learning approaches in terms of relational learning and utilising background knowledge. ILP uses logic programs as a uniform representation for hypotheses, background knowledge and examples, but its declarative bias is usually encoded using metalogical statements. This thesis advocates the use of logic programs to represent declarative and procedural bias, which results in a framework of single-language representation. We show in this thesis that using a logic program called the top theory as declarative bias leads to a sound and complete multi-clause learning system, MC-TopLog. It overcomes the entailment-incompleteness of Progol, and thus outperforms Progol in terms of predictive accuracy on learning grammars and strategies for playing the game of Nim. MC-TopLog has been applied to two real-world applications funded by Syngenta, an agricultural company. A higher-order extension of top theories results in meta-interpreters, which allow the introduction of new predicate symbols. Thus the resulting ILP system, Metagol, can do predicate invention, which is an intrinsically higher-order logic operation. Metagol also leverages the procedural semantics of Prolog to encode procedural bias, so that it outperforms both its ASP version and ILP systems without an equivalent procedural bias in terms of efficiency and accuracy. This is demonstrated by experiments on learning Regular, Context-free and Natural grammars. Metagol is also applied to non-grammar learning tasks involving recursion and predicate invention, such as learning a definition of staircases and robot strategy learning. Both MC-TopLog and Metagol are based on a $\top$-directed framework, which is different from other multi-clause learning systems based on Inverse Entailment, such as CF-Induction, XHAIL and IMPARO. Compared to TAL, another $\top$-directed multi-clause learning system, Metagol allows higher-order assumptions to be encoded explicitly in the form of meta-rules.
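
    Metagol's meta-rules can be illustrated with its well-known chain rule, P(X,Y) :- Q(X,Z), R(Z,Y). The toy Python search below instantiates Q and R over given background binary relations until the composition fits the examples; the task and relations are invented for the illustration, and the real system's Prolog meta-interpreter, recursion, and predicate invention are not modeled.

        from itertools import product

        def chain_search(background, positives, negatives):
            # Instantiate the chain meta-rule P(X,Y) :- Q(X,Z), R(Z,Y)
            # over the background relations; accept the first (Q, R)
            # whose composition covers every positive example and no
            # negative one.
            def compose(q, r):
                return {(x, y) for (x, z) in background[q]
                               for (z2, y) in background[r] if z == z2}
            for q, r in product(background, repeat=2):
                ext = compose(q, r)
                if positives <= ext and not (negatives & ext):
                    return q, r
            return None

        # Invented toy task: learn grandparent/2 from parent/2 facts.
        bg = {'parent': {('ann', 'bob'), ('bob', 'carl')}}
        print(chain_search(bg, {('ann', 'carl')}, set()))  # ('parent', 'parent')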
    • 

    corecore