Search CORE

9 research outputs found

: Méthodes d'Inférence Symbolique pour les Bases de Données

Author: Staworko Slawomir
Publication venue: HAL CCSD
Publication date: 14/12/2015
Field of study

This dissertation is a summary of a line of research, that I wasactively involved in, on learning in databases from examples. Thisresearch focused on traditional as well as novel database models andlanguages for querying, transforming, and describing the schema of adatabase. In case of schemas our contributions involve proposing anoriginal languages for the emerging data models of Unordered XML andRDF. We have studied learning from examples of schemas for UnorderedXML, schemas for RDF, twig queries for XML, join queries forrelational databases, and XML transformations defined with a novelmodel of tree-to-word transducers.Investigating learnability of the proposed languages required us toexamine closely a number of their fundamental properties, often ofindependent interest, including normal forms, minimization,containment and equivalence, consistency of a set of examples, andfinite characterizability. Good understanding of these propertiesallowed us to devise learning algorithms that explore a possibly largesearch space with the help of a diligently designed set ofgeneralization operations in search of an appropriate solution.Learning (or inference) is a problem that has two parameters: theprecise class of languages we wish to infer and the type of input thatthe user can provide. We focused on the setting where the user inputconsists of positive examples i.e., elements that belong to the goallanguage, and negative examples i.e., elements that do not belong tothe goal language. In general using both negative and positiveexamples allows to learn richer classes of goal languages than usingpositive examples alone. However, using negative examples is oftendifficult because together with positive examples they may cause thesearch space to take a very complex shape and its exploration may turnout to be computationally challenging.Ce mémoire est une courte présentation d’une direction de recherche, à laquelle j’ai activementparticipé, sur l’apprentissage pour les bases de données à partir d’exemples. Cette recherches’est concentrée sur les modèles et les langages, aussi bien traditionnels qu’émergents, pourl’interrogation, la transformation et la description du schéma d’une base de données. Concernantles schémas, nos contributions consistent en plusieurs langages de schémas pour les nouveaumodèles de bases de données que sont XML non-ordonné et RDF. Nous avons ainsi étudiél’apprentissage à partir d’exemples des schémas pour XML non-ordonné, des schémas pour RDF,des requêtes twig pour XML, les requêtes de jointure pour bases de données relationnelles et lestransformations XML définies par un nouveau modèle de transducteurs arbre-à-mot.Pour explorer si les langages proposés peuvent être appris, nous avons été obligés d’examinerde près un certain nombre de leurs propriétés fondamentales, souvent souvent intéressantespar elles-mêmes, y compris les formes normales, la minimisation, l’inclusion et l’équivalence, lacohérence d’un ensemble d’exemples et la caractérisation finie. Une bonne compréhension de cespropriétés nous a permis de concevoir des algorithmes d’apprentissage qui explorent un espace derecherche potentiellement très vaste grâce à un ensemble d’opérations de généralisation adapté àla recherche d’une solution appropriée.L’apprentissage (ou l’inférence) est un problème à deux paramètres : la classe précise delangage que nous souhaitons inférer et le type d’informations que l’utilisateur peut fournir. Nousnous sommes placés dans le cas où l’utilisateur fournit des exemples positifs, c’est-à-dire deséléments qui appartiennent au langage cible, ainsi que des exemples négatifs, c’est-à-dire qui n’enfont pas partie. En général l’utilisation à la fois d’exemples positifs et négatifs permet d’apprendredes classes de langages plus riches que l’utilisation uniquement d’exemples positifs. Toutefois,l’utilisation des exemples négatifs est souvent difficile parce que les exemples positifs et négatifspeuvent rendre la forme de l’espace de recherche très complexe, et par conséquent, son explorationinfaisable

Thèses en Ligne

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Learning categorial grammars

Author: Costa Florêncio C.
Publication venue
Publication date: 14/11/2003
Field of study

In 1967 E. M. Gold published a paper in which the language classes from the Chomsky-hierarchy were analyzed in terms of learnability, in the technical sense of identification in the limit. His results were mostly negative, and perhaps because of this his work had little impact on linguistics. In the early eighties there was renewed interest in the paradigm, mainly because of work by Angluin and Wright. Around the same time, Arikawa and his co-workers refined the paradigm by applying it to so-called Elementary Formal Systems. By making use of this approach Takeshi Shinohara was able to come up with an impressive result; any class of context-sensitive grammars with a bound on its number of rules is learnable. Some linguistically motivated work on learnability also appeared from this point on, most notably Wexler & Culicover 1980 and Kanazawa 1994. The latter investigates the learnability of various classes of categorial grammar, inspired by work by Buszkowski and Penn, and raises some interesting questions. We follow up on this work by exploring complexity issues relevant to learning these classes, answering an open question from Kanazawa 1994, and applying the same kind of approach to obtain (non)learnable classes of Combinatory Categorial Grammars, Tree Adjoining Grammars, Minimalist grammars, Generalized Quantifiers, and some variants of Lambek Grammars. We also discuss work on learning tree languages and its application to learning Dependency Grammars. Our main conclusions are: - formal learning theory is relevant to linguistics, - identification in the limit is feasible for non-trivial classes, - the `Shinohara approach' -i.e., placing a numerical bound on the complexity of a grammar- can lead to a learnable class, but this completely depends on the specific nature of the formalism and the notion of complexity. We give examples of natural classes of commonly used linguistic formalisms that resist this kind of approach, - learning is hard work. Our results indicate that learning even `simple' classes of languages requires a lot of computational effort, - dealing with structure (derivation-, dependency-) languages instead of string languages offers a useful and promising approach to learnabilty in a linguistic contex

Utrecht University Repository

Model checking infinite-state systems: generic and specific approaches

Author: To Anthony Widjaja
Publication venue: The University of Edinburgh
Publication date: 01/01/2010
Field of study

Model checking is a fully-automatic formal verification method that has been extremely successful in validating and verifying safety-critical systems in the past three decades. In the past fifteen years, there has been a lot of work in extending many model checking algorithms over finite-state systems to finitely representable infinitestate systems. Unlike in the case of finite systems, decidability can easily become a problem in the case of infinite-state model checking. In this thesis, we present generic and specific techniques that can be used to derive decidability with near-optimal computational complexity for various model checking problems over infinite-state systems. Generic techniques and specific techniques primarily differ in the way in which a decidability result is derived. Generic techniques is a “top-down” approach wherein we start with a Turing-powerful formalismfor infinitestate systems (in the sense of being able to generate the computation graphs of Turing machines up to isomorphisms), and then impose semantic restrictions whereby the desired model checking problem becomes decidable. In other words, to show that a subclass of the infinite-state systems that is generated by this formalism is decidable with respect to the model checking problem under consideration, we will simply have to prove that this subclass satisfies the semantic restriction. On the other hand, specific techniques is a “bottom-up” approach in the sense that we restrict to a non-Turing powerful formalism of infinite-state systems at the outset. The main benefit of generic techniques is that they can be used as algorithmic metatheorems, i.e., they can give unified proofs of decidability of various model checking problems over infinite-state systems. Specific techniques are more flexible in the sense they can be used to derive decidability or optimal complexity when generic techniques fail. In the first part of the thesis, we adopt word/tree automatic transition systems as a generic formalism of infinite-state systems. Such formalisms can be used to generate many interesting classes of infinite-state systems that have been considered in the literature, e.g., the computation graphs of counter systems, Turing machines, pushdown systems, prefix-recognizable systems, regular ground-tree rewrite systems, PAprocesses, order-2 collapsible pushdown systems. Although the generality of these formalisms make most interesting model checking problems (even safety) undecidable, they are known to have nice closure and algorithmic properties. We use these nice properties to obtain several algorithmic metatheorems over word/tree automatic systems, e.g., for deriving decidability of various model checking problems including recurrent reachability, and Linear Temporal Logic (LTL) with complex fairness constraints. These algorithmic metatheorems can be used to uniformly prove decidability with optimal (or near-optimal) complexity of various model checking problems over many classes of infinite-state systems that have been considered in the literature. In fact, many of these decidability/complexity results were not previously known in the literature. In the second part of the thesis, we study various model checking problems over subclasses of counter systems that were already known to be decidable. In particular, we consider reversal-bounded counter systems (and their extensions with discrete clocks), one-counter processes, and networks of one-counter processes. We shall derive optimal complexity of various model checking problems including: model checking LTL, EF-logic, and first-order logic with reachability relations (and restrictions thereof). In most cases, we obtain a single/double exponential reduction in the previously known upper bounds on the complexity of the problems

Edinburgh Research Archive

Proceedings of the 19th Amsterdam Colloquium

Author
Publication venue: ILLC, University of Amsterdam
Publication date: 01/01/2013
Field of study

International Migration, Integration and Social Cohesion online publications

32nd International Symposium on Theoretical Aspects of Computer Science: STACS '15, March 4 - 7, 2015, Garching, Germany

Author: STAC <32. 2015, Garching>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/02/2015
Field of study

Digitale Bibliothek Thüringen

Testing, Learning, Sampling, Sketching

Author: Harms Nathaniel
Publication venue: 'University of Waterloo'
Publication date: 19/08/2022
Field of study

We study several problems about sublinear algorithms, presented in two parts. Part I: Property testing and learning. There are two main goals of research in property testing and learning theory. The first is to understand the relationship between testing and learning, and the second is to develop efficient testing and learning algorithms. We present results towards both goals. - An oft-repeated motivation for property testing algorithms is to help with model selection in learning: to efficiently check whether the chosen hypothesis class (i.e. learning model) will successfully learn the target function. We present in this thesis a proof that, for many of the most useful and natural hypothesis classes (including halfspaces, polynomial threshold functions, intersections of halfspaces, etc.), the sample complexity of testing in the distribution-free model is nearly equal to that of learning. This shows that testing does not give a significant advantage in model selection in this setting. - We present a simple and general technique for transforming testing and learning algorithms designed for the uniform distribution over {0, 1}^d or [n]^d into algorithms that work for arbitrary product distributions over R d . This leads to an improvement and simplification of state-of-the-art results for testing monotonicity, learning intersections of halfspaces, learning polynomial threshold functions, and others. Part II. Adjacency and distance sketching for graphs. We initiate the thorough study of adjacency and distance sketching for classes of graphs. Two open problems in sublinear algorithms are: 1) to understand the power of randomization in communication; and 2) to characterize the sketchable distance metrics. We observe that constant-cost randomized communication is equivalent to adjacency sketching in a hereditary graph class, which in turn implies the existence of an efficient adjacency labeling scheme, the subject of a major open problem in structural graph theory. Therefore characterizing the adjacency sketchable graph classes (i.e. the constant-cost communication problems) is the probabilistic equivalent of this open problem, and an essential step towards understanding the power of randomization in communication. This thesis gives the first results towards a combined theory of these problems and uses this connection to obtain optimal adjacency labels for subgraphs of Cartesian products, resolving some questions from the literature. More generally, we begin to develop a theory of graph sketching for problems that generalize adjacency, including different notions of distance sketching. This connects the well-studied areas of distance sketching in sublinear algorithms, and distance labeling in structural graph theory

University of Waterloo's Institutional Repository

LIPIcs, Volume 261, ICALP 2023, Complete Volume

Author: Etessami Kousha
Feige Uriel
Puppis Gabriele
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)
Publication date: 01/01/2023
Field of study

LIPIcs, Volume 261, ICALP 2023, Complete Volum

Dagstuhl Research Online Publication Server

Characterizing PAC-learnability of Semilinear Sets

Author: Naoki Abe
Publication venue
Publication date
Field of study

The learnability of the class of letter-counts of regular languages (semilinear sets) and other related classes of subsets of N d with respect to the distribution-free learning model of Valiant (PAC-learning model) is characterized. Using the notion of reducibility among learning problems due to Pitt and Warmuth called "prediction preserving reducibility," and a special case thereof, a number of positive and partially negative results are obtained. On the positive side the class of semilinear sets of dimension 1 or 2 is shown to be learnable when the integers are encoded in unary. On the neutral to negative side it is shown that when the integers are encoded in binary the learning problem for semilinear sets as well as a class of subsets of Z d much simpler than semilinear sets is as hard as learning DNF, a central open problem in the field. A number of hardness results for related learning problems are also given. 3 Most of the research reported herein was conducted while the aut..

CiteSeerX