Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a Boolean function
$f$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on
the two subfunctions of $f$ obtained by fixing $x_i$, on the left and right
subtrees respectively; terminate once the tree is an $\varepsilon$-approximation
of $f$. We analyze the quality of this heuristic, obtaining near-matching upper
and lower bounds:
Upper bound: For every with decision tree size and every
, this heuristic builds a decision tree of size
at most .
Lower bound: For every and , there is an with decision tree size such that
this heuristic builds a decision tree of size .
We also obtain upper and lower bounds for monotone functions:
and
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
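The top-down heuristic described at the start of this abstract can be sketched in Python. Assumptions not in the abstract: inputs are encoded as $\pm 1$, $f$ is a Python callable evaluated on the whole cube (so only tiny $n$ is practical), leaves are labeled with the majority value, influence is $\mathrm{Inf}_i(f) = \Pr_x[f(x) \neq f(x^{\oplus i})]$ under the uniform distribution, and the next leaf to split is the one maximizing (probability of reaching it) × (influence of its best variable). That leaf-selection rule is an illustrative choice, not necessarily the paper's, and all function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Point = Tuple[int, ...]
Restriction = Dict[int, int]      # fixed variable index -> value in {-1, +1}
BoolFn = Callable[[Point], int]   # f: {-1, +1}^n -> {-1, +1}

def points(n: int, rho: Restriction) -> Iterator[Point]:
    """Enumerate every point of {-1, +1}^n that agrees with the restriction rho."""
    free = [i for i in range(n) if i not in rho]
    for vals in product((-1, 1), repeat=len(free)):
        x = dict(rho)
        x.update(zip(free, vals))
        yield tuple(x[i] for i in range(n))

def influence(f: BoolFn, n: int, rho: Restriction, i: int) -> float:
    """Inf_i of the subfunction f|rho: fraction of its inputs where flipping x_i changes f."""
    pts = list(points(n, rho))
    changed = sum(f(x) != f(x[:i] + (-x[i],) + x[i + 1:]) for x in pts)
    return changed / len(pts)

def leaf_error(f: BoolFn, n: int, rho: Restriction) -> Tuple[float, int]:
    """Error and label when leaf rho is labeled with the majority value of f|rho."""
    vals = [f(x) for x in points(n, rho)]
    pos = vals.count(1)
    label = 1 if 2 * pos >= len(vals) else -1
    return min(pos, len(vals) - pos) / len(vals), label

def weight(rho: Restriction) -> float:
    """Probability that a uniformly random input reaches the leaf rho."""
    return 2.0 ** -len(rho)

def top_down(f: BoolFn, n: int, eps: float) -> List[Tuple[Restriction, int]]:
    """Split leaves on their most influential variable until the error is <= eps."""
    leaves: List[Restriction] = [{}]  # the root is the empty restriction
    while sum(weight(r) * leaf_error(f, n, r)[0] for r in leaves) > eps:
        best: Optional[Tuple[float, Restriction, int]] = None  # (score, leaf, var)
        for r in leaves:
            free = [i for i in range(n) if i not in r]
            if not free:
                continue
            i = max(free, key=lambda j: influence(f, n, r, j))
            score = weight(r) * influence(f, n, r, i)
            if best is None or score > best[0]:
                best = (score, r, i)
        # Some leaf has positive error, so it is non-constant and has a free
        # variable with positive influence; therefore best is not None here.
        _, r, i = best
        leaves.remove(r)
        leaves += [{**r, i: -1}, {**r, i: 1}]
    return [(r, leaf_error(f, n, r)[1]) for r in leaves]
```

For the parity $x_0 x_1$ on three variables, the sketch splits on $x_0$ and then on $x_1$ in each branch, ignoring the irrelevant $x_2$.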
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms, which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
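Learning under the uniform distribution from random examples requires estimating influences from data. One standard fact makes this easy for monotone functions: for monotone $f: \{-1,1\}^n \to \{-1,1\}$, $\mathrm{Inf}_i(f) = \mathbb{E}_x[f(x)\,x_i]$ under the uniform distribution. A minimal sketch of that empirical estimator follows; whether the algorithms in this work estimate influences this way is not stated here, and the function name, $\pm 1$ encoding, and sample sizes are hypothetical.

```python
import random
from typing import Callable, List, Sequence

def estimate_influences(f: Callable[[Sequence[int]], int], n: int, m: int,
                        seed: int = 0) -> List[float]:
    """Estimate Inf_i(f) = E[f(x) * x_i] for a monotone f: {-1,1}^n -> {-1,1}
    from m uniformly random labeled examples."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(m):
        x = [rng.choice((-1, 1)) for _ in range(n)]  # uniform random example
        y = f(x)                                      # its label
        for i in range(n):
            totals[i] += y * x[i]
    return [t / m for t in totals]
```

For majority on three of four variables, each relevant variable has influence $1/2$ and the fourth has influence $0$; the estimates concentrate around those values as $m$ grows.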
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime.
Tools and Techniques for Decision Tree Learning
Decision tree learning is an important field of machine learning. In this study we examine both formal and practical aspects of decision tree learning. We aim to address two important needs: the need for better-motivated decision tree learners, and the need for an environment that facilitates experimentation with inductive learning algorithms. As a result, we obtain new practical tools and useful techniques for decision tree learning. First, we derive the practical decision tree learner Rank, based on the Findmin protocol of Ehrenfeucht and Haussler. The changes introduced to the method are motivated by empirical experience, but we prove the correctness of the modifications in the probably approximately correct (PAC) learning framework. The algorithm is enhanced by extending it to multiclass settings, enabling it to work in the incremental setting, and making it tolerant to noise. Together, these modifications achieve practicality through a formal development.
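The Findmin protocol mentioned above rests on the Ehrenfeucht and Haussler recursion for finding a decision tree of rank at most $r$ consistent with a labeled sample. A leaf has rank 0; an internal node has the larger of its children's ranks if they differ, and one more than their common rank otherwise. The sketch below implements that recursion plus a `findmin` wrapper that tries $r = 0, 1, 2, \dots$ in turn (our reading of "Findmin"). It is not the Rank learner itself, which adds multiclass, incremental, and noise-tolerant extensions; binary features, the tuple-based tree encoding, and all names are assumptions.

```python
from typing import FrozenSet, List, Optional, Tuple, Union

Example = Tuple[Tuple[int, ...], int]          # (0/1 feature vector, class label)
Tree = Union[int, Tuple[int, "Tree", "Tree"]]  # leaf label, or (var, tree_if_0, tree_if_1)

def find(S: List[Example], vars_: FrozenSet[int], r: int) -> Optional[Tree]:
    """Return a decision tree of rank <= r over vars_ consistent with S, else None."""
    labels = {y for _, y in S}
    if len(labels) <= 1:
        return labels.pop() if labels else 0  # empty sample: any label is consistent
    if r == 0:
        return None
    for v in sorted(vars_):
        rest = vars_ - {v}                     # v is not tested again below this node
        s0 = [(x, y) for x, y in S if x[v] == 0]
        s1 = [(x, y) for x, y in S if x[v] == 1]
        t0, t1 = find(s0, rest, r - 1), find(s1, rest, r - 1)
        if t0 is not None and t1 is None:
            t1 = find(s1, rest, r)             # one side rank <= r-1, the other <= r
        elif t1 is not None and t0 is None:
            t0 = find(s0, rest, r)
        if t0 is not None and t1 is not None:
            return (v, t0, t1)
    return None

def findmin(S: List[Example], n: int) -> Optional[Tuple[int, Tree]]:
    """Smallest rank r admitting a consistent tree, together with such a tree."""
    for r in range(n + 1):  # a consistent tree over n variables never needs rank > n
        t = find(S, frozenset(range(n)), r)
        if t is not None:
            return r, t
    return None  # the sample is contradictory (same input, different labels)

def predict(t: Tree, x: Tuple[int, ...]) -> int:
    """Follow the tests from the root to a leaf and return its label."""
    while not isinstance(t, int):
        v, t0, t1 = t
        t = t1 if x[v] else t0
    return t
```

For example, $x_0 \wedge x_1$ has a rank-1 tree (a decision list), while $x_0 \oplus x_1$ needs rank 2.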
Symbolic Inference Methods for Databases
This dissertation summarizes a line of research, in which I was actively involved, on learning in databases from examples. This research focused on traditional as well as novel database models and languages for querying, transforming, and describing the schema of a database. For schemas, our contributions include original schema languages for the emerging data models of Unordered XML and RDF. We have studied learning from examples of schemas for Unordered XML, schemas for RDF, twig queries for XML, join queries for relational databases, and XML transformations defined with a novel model of tree-to-word transducers.
Investigating the learnability of the proposed languages required us to examine closely a number of their fundamental properties, often of independent interest, including normal forms, minimization, containment and equivalence, consistency of a set of examples, and finite characterizability. A good understanding of these properties allowed us to devise learning algorithms that explore a possibly large search space with the help of a carefully designed set of generalization operations in search of an appropriate solution.
Learning (or inference) is a problem with two parameters: the precise class of languages we wish to infer and the type of input that the user can provide. We focused on the setting where the user input consists of positive examples, i.e., elements that belong to the goal language, and negative examples, i.e., elements that do not belong to the goal language. In general, using both positive and negative examples makes it possible to learn richer classes of goal languages than using positive examples alone. However, using negative examples is often difficult because, together with positive examples, they may give the search space a very complex shape, and its exploration may turn out to be computationally challenging.
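To make learning join queries from positive and negative examples, and the consistency of a set of examples, concrete, here is a minimal sketch for equijoin predicates (conjunctions of equalities between an attribute of R and an attribute of S). It is a textbook-style construction, not necessarily the algorithm of this dissertation: take the most specific predicate satisfied by every positive pair, then check that it rejects every negative pair. The data representation and all names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, object]           # one tuple, as attribute -> value
Pair = Tuple[Row, Row]            # a candidate (R-tuple, S-tuple) pair
Predicate = Set[Tuple[str, str]]  # equalities R.a = S.b required by the join

def satisfied(pair: Pair) -> Predicate:
    """All equalities R.a = S.b that hold on this pair of tuples."""
    r, s = pair
    return {(a, b) for a in r for b in s if r[a] == s[b]}

def learn_join(pos: List[Pair], neg: List[Pair]) -> Optional[Predicate]:
    """Most specific equijoin predicate selecting every positive pair, or None
    if no equijoin predicate is consistent with the examples."""
    if not pos:
        return None  # illustrative choice: require at least one positive example
    theta = set.intersection(*(satisfied(p) for p in pos))
    # Any consistent predicate is a subset of theta and therefore selects at
    # least the pairs theta selects; so theta is consistent iff it rejects
    # every negative pair.
    if any(theta <= satisfied(n) for n in neg):
        return None
    return theta
```

The consistency check is exactly where negative examples enter: a single negative pair that satisfies every equality shared by the positives rules out all equijoin predicates at once.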
Probabilistic models of language processing and acquisition
Probabilistic methods are providing new explanatory approaches to fundamental cognitive science questions of how humans structure, process and acquire language. This review examines probabilistic models defined over traditional symbolic structures. Language comprehension and production involve probabilistic inference in such models; and acquisition involves choosing the best model, given innate constraints and linguistic and other input. Probabilistic models can account for the learning and processing of language, while maintaining the sophistication of symbolic models. A recent burgeoning of theoretical developments and online corpus creation has enabled large models to be tested, revealing probabilistic constraints in processing, undermining acquisition arguments based on a perceived poverty of the stimulus, and suggesting fruitful links with probabilistic theories of categorization and ambiguity resolution in perception.
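The idea that acquisition means choosing the best model given innate constraints and input can be read as Bayesian model comparison, $P(h \mid D) \propto P(h)\,P(D \mid h)$, with the prior $P(h)$ standing in for innate constraints. The toy sketch below assumes each hypothetical "grammar" is a finite set of strings that generates its members uniformly at random; this is an illustration, not a model from the review.

```python
from typing import Dict, List, Set

def posterior(hyps: Dict[str, Set[str]], prior: Dict[str, float],
              data: List[str]) -> Dict[str, float]:
    """P(h | data) ∝ P(h) * prod_d P(d | h), where each hypothesis h is a finite
    language that generates its strings uniformly at random."""
    unnorm = {}
    for h, lang in hyps.items():
        like = 1.0
        for d in data:
            like *= (1.0 / len(lang)) if d in lang else 0.0
        unnorm[h] = prior[h] * like
    z = sum(unnorm.values())  # assumes some hypothesis can generate all the data
    return {h: p / z for h, p in unnorm.items()}
```

With equal priors, a two-string grammar {"ab", "aabb"} and a four-string grammar that also contains "ba" and "bbaa", the data ["ab", "aabb", "ab"] give the smaller grammar posterior 8/9: both fit the data, but the smaller one assigns it higher probability. A single "ba" in the data would rule the smaller grammar out entirely.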
Tensor decompositions for learning latent variable models
This work considers a computationally and statistically efficient parameter
estimation method for a wide class of latent variable models (including
Gaussian mixture models, hidden Markov models, and latent Dirichlet
allocation) that exploits a certain tensor structure in their low-order
observable moments (typically, of second and third order). Specifically,
parameter estimation is reduced to the problem of extracting a certain
(orthogonal) decomposition of a symmetric tensor derived from the moments; this
decomposition can be viewed as a natural generalization of the singular value
decomposition for matrices. Although tensor decompositions are generally
intractable to compute, the decomposition of these specially structured tensors
can be efficiently obtained by a variety of approaches, including power
iterations and maximization approaches (similar to the case of matrices). A
detailed analysis of a robust tensor power method is provided, establishing an
analogue of Wedin's perturbation theorem for the singular vectors of matrices.
This implies a robust and computationally tractable estimation approach for
several popular latent variable models.
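The power-iteration approach mentioned above can be illustrated on an exactly orthogonally decomposable symmetric tensor $T = \sum_i \lambda_i\, v_i \otimes v_i \otimes v_i$ with $\lambda_i > 0$: for almost every starting vector, repeating $u \leftarrow T(I,u,u)/\|T(I,u,u)\|$ converges to one of the $v_i$, and deflation $T \leftarrow T - \lambda\, u \otimes u \otimes u$ with $\lambda = T(u,u,u)$ removes it. This pure-Python sketch omits moment estimation, whitening, and the random restarts of the robust variant analyzed in the work; the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # d x d x d nested lists, assumed symmetric

def contract(T: Tensor3, u: List[float]) -> List[float]:
    """Compute T(I, u, u): contract the second and third modes of T with u."""
    d = len(u)
    return [sum(T[a][b][c] * u[b] * u[c] for b in range(d) for c in range(d))
            for a in range(d)]

def normalize(x: List[float]) -> List[float]:
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]

def tensor_power_method(T: Tensor3, k: int, iters: int = 100,
                        seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Recover k (eigenvalue, eigenvector) pairs by power iteration plus deflation."""
    rng = random.Random(seed)
    d = len(T)
    T = [[row[:] for row in mat] for mat in T]  # copy: deflation below edits T
    pairs = []
    for _ in range(k):
        u = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            u = normalize(contract(T, u))       # u <- T(I,u,u) / ||T(I,u,u)||
        lam = sum(p * q for p, q in zip(contract(T, u), u))  # lam = T(u,u,u)
        pairs.append((lam, u))
        for a in range(d):                       # deflate: T <- T - lam * u⊗u⊗u
            for b in range(d):
                for c in range(d):
                    T[a][b][c] -= lam * u[a] * u[b] * u[c]
    return pairs
```

In the coordinates of the $v_i$, one iteration maps $c_i \mapsto \lambda_i c_i^2$ (up to normalization), so the ratios of the $\lambda_i c_i$ are squared at every step; this is the quadratic convergence that separates the tensor case from the matrix power method.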
Logic Programs as Declarative and Procedural Bias in Inductive Logic Programming
Machine Learning is necessary for the development of Artificial Intelligence, as pointed out by Turing in his 1950 article "Computing Machinery and Intelligence". It is in the same article that Turing suggested the use of computational logic and background knowledge for learning. This thesis follows a logic-based machine learning approach called Inductive Logic Programming (ILP), which has advantages over other machine learning approaches for relational learning and for utilising background knowledge. ILP uses logic programs as a uniform representation for hypotheses, background knowledge and examples, but its declarative bias is usually encoded using metalogical statements. This thesis advocates the use of logic programs to represent declarative and procedural bias, which results in a framework with a single-language representation.
We show in this thesis that using a logic program called the top theory as declarative bias leads to a sound and complete multi-clause learning system, MC-TopLog. It overcomes the entailment-incompleteness of Progol and thus outperforms Progol in predictive accuracy on learning grammars and strategies for playing the game of Nim. MC-TopLog has been applied to two real-world applications funded by Syngenta, an agricultural company.
A higher-order extension of top theories results in meta-interpreters, which allow the introduction of new predicate symbols. Thus the resulting ILP system Metagol can do predicate invention, which is an intrinsically higher-order logic operation. Metagol also leverages the procedural semantics of Prolog to encode procedural bias, so that it can outperform both its ASP version and ILP systems without an equivalent procedural bias in terms of efficiency and accuracy. This is demonstrated by experiments on learning regular, context-free and natural-language grammars. Metagol is also applied to non-grammar learning tasks involving recursion and predicate invention, such as learning a definition of staircases and learning robot strategies. Both MC-TopLog and Metagol are based on a top-directed framework, which differs from other multi-clause learning systems based on Inverse Entailment, such as CF-Induction, XHAIL and IMPARO. Compared to TAL, another top-directed multi-clause learning system, Metagol allows higher-order assumptions to be encoded explicitly in the form of meta-rules.
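Meta-rules act as declarative bias by fixing the shape of candidate clauses. The minimal Python sketch below (not Metagol, which is a Prolog meta-interpreter) tries every instantiation of the chain meta-rule $P(A,B) \leftarrow Q(A,C), R(C,B)$ with background predicates and returns the first clause that covers all positive and no negative examples. It performs no predicate invention or recursion; the names and data are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Set, Tuple

Rel = Set[Tuple[str, str]]  # extension of a binary background predicate

def compose(q: Rel, r: Rel) -> Rel:
    """Pairs (A, B) with q(A, C) and r(C, B) for some C: the chain meta-rule body."""
    return {(a, b) for (a, c1) in q for (c2, b) in r if c1 == c2}

def learn_chain(target: str, bk: Dict[str, Rel], pos: Rel, neg: Rel) -> Optional[str]:
    """Try every instantiation of P(A,B) :- Q(A,C), R(C,B) over background
    predicates Q, R; return the first clause consistent with the examples."""
    for (qn, q), (rn, r) in product(bk.items(), repeat=2):
        covered = compose(q, r)
        if pos <= covered and not (neg & covered):
            return f"{target}(A,B) :- {qn}(A,C), {rn}(C,B)."
    return None
```

Given `parent` facts for ann → bob → {cat, dan}, positives grandparent(ann, cat) and grandparent(ann, dan), and negatives such as grandparent(ann, bob), the search returns `grandparent(A,B) :- parent(A,C), parent(C,B).`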