326 research outputs found

    : Méthodes d'Inférence Symbolique pour les Bases de Données

    This dissertation summarizes a line of research, in which I was actively involved, on learning in databases from examples. This research focused on traditional as well as novel database models and languages for querying, transforming, and describing the schema of a database. In the case of schemas, our contributions involve proposing original languages for the emerging data models of Unordered XML and RDF. We have studied learning from examples of schemas for Unordered XML, schemas for RDF, twig queries for XML, join queries for relational databases, and XML transformations defined with a novel model of tree-to-word transducers. Investigating the learnability of the proposed languages required us to examine closely a number of their fundamental properties, often of independent interest, including normal forms, minimization, containment and equivalence, consistency of a set of examples, and finite characterizability. A good understanding of these properties allowed us to devise learning algorithms that explore a possibly large search space with the help of a carefully designed set of generalization operations in search of an appropriate solution. Learning (or inference) is a problem that has two parameters: the precise class of languages we wish to infer and the type of input that the user can provide. We focused on the setting where the user input consists of positive examples, i.e., elements that belong to the goal language, and negative examples, i.e., elements that do not belong to the goal language. In general, using both negative and positive examples makes it possible to learn richer classes of goal languages than using positive examples alone. However, using negative examples is often difficult because, together with positive examples, they may cause the search space to take a very complex shape, and its exploration may turn out to be computationally challenging.

    Set systems: order types, continuous nondeterministic deformations, and quasi-orders

    By reformulating a learning process of a set system L as a game between Teacher and Learner, we define the order type of L to be the order type of the game tree, if the tree is well-founded. The order type of L (in symbols, dim L) has the following features: (1) any well-quasi-order (wqo for short) can be represented by the set system L of its upper-closed sets, in such a way that the maximal order type of the wqo equals dim L; (2) dim L is an upper bound on the mind-change complexity of L. Moreover, dim L is defined iff L has finite elasticity (fe for short), where, according to computational learning theory, an indexed family of recursive languages with fe is learnable by an algorithm from positive data. Regarding set systems as subspaces of Cantor spaces, we prove that fe of set systems is preserved by any continuous function that is monotone with respect to set inclusion. Using this, we prove that finite elasticity is preserved by various (nondeterministic) language operators (Kleene-closure, shuffle-closure, union, product, intersection, ...). The monotone continuous functions represent nondeterministic computations. If a monotone continuous function has a computation tree in which each node has at most n immediate successors, and the order type of a set system L is $\alpha$, then the direct image of L is a set system of order type at most the n-adic diagonal Ramsey number of $\alpha$. Furthermore, we provide an order-type-preserving contravariant embedding from the category of quasi-orders and finitely branching simulations between them into the complete category of subspaces of Cantor spaces and monotone continuous functions having Girard's linearity between them. Keyword: finite elasticity, shuffle-closur

    Computation with Advice

    Computation with advice is proposed as a generalization of both computation with discrete advice and Type-2 Nondeterminism. Several embodiments of the generic concept are discussed, and the close connection to Weihrauch reducibility is pointed out. As a novel concept, computability with random advice is studied, which corresponds to correct solutions being guessable with positive probability. In the framework of computation with advice, it is possible to define computational complexity for certain concepts of hypercomputation. Finally, some examples are given that illuminate the interplay of uniform and non-uniform techniques in investigating both computability with advice and the Weihrauch lattice.

    Inklusion von Patternsprachen und verwandte Probleme

    A pattern is a word that consists of variables and terminal symbols. The pattern language generated by a pattern A is the set of all terminal words that can be obtained from A by uniform replacement of variables with terminal words. For example, the pattern A = a x y a x (where x and y are variables, and the letter a is a terminal symbol) generates the set of all words that have some word of the form a x both as a prefix and as a suffix (where these two occurrences of a x do not overlap). Due to their simple definition, pattern languages have connections to a wide range of other areas in theoretical computer science and mathematics, among them combinatorics on words, logic, and the theory of free semigroups. On the other hand, many of the canonical questions of formal language theory lead, for pattern languages, to combinatorial problems of surprising difficulty. The present thesis discusses various aspects of the inclusion problem of pattern languages. It can be divided into two parts. The first part examines the decidability of the inclusion problem for languages generated by patterns with a bounded number of variables over terminal alphabets of bounded size. In addition, the minimizability of regular expressions with repetition operators is studied. The second part deals with descriptive patterns, that is, patterns generating the smallest generalizations of a given language through pattern languages ("smallest" with respect to the inclusion relation). The main questions are the existence and the discoverability of descriptive patterns for arbitrary languages.
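
The definition of a pattern language above can be made concrete with a small membership test. The following is only a minimal sketch, not a method taken from the thesis: the function name `matches`, the representation of a pattern as a list of symbols, and the `erasing` flag are assumptions made for illustration. The abstract does not say whether variables may be replaced by the empty word, so the sketch defaults to non-empty replacements and exposes the other reading as an option. The search is plain backtracking and can take exponential time.

```python
from typing import Dict, Sequence, Set


def matches(pattern: Sequence[str], variables: Set[str], word: str,
            erasing: bool = False) -> bool:
    """Return True if `word` can be obtained from `pattern` by replacing each
    variable uniformly (same variable, same image) with a terminal word.

    Hypothetical helper: `erasing=True` additionally allows empty images.
    """
    min_len = 0 if erasing else 1

    def walk(i: int, pos: int, binding: Dict[str, str]) -> bool:
        if i == len(pattern):
            # The whole pattern is consumed; the word must be consumed too.
            return pos == len(word)
        symbol = pattern[i]
        if symbol not in variables:
            # Terminal symbol: it must appear literally at the current position.
            return word.startswith(symbol, pos) and walk(i + 1, pos + len(symbol), binding)
        if symbol in binding:
            # Variable seen before: uniform replacement forces the same image.
            image = binding[symbol]
            return word.startswith(image, pos) and walk(i + 1, pos + len(image), binding)
        # Fresh variable: try every candidate image starting at `pos`.
        for end in range(pos + min_len, len(word) + 1):
            binding[symbol] = word[pos:end]
            if walk(i + 1, end, binding):
                return True
            del binding[symbol]
        return False

    return walk(0, 0, {})


# The example pattern from the abstract: A = a x y a x.
A = ["a", "x", "y", "a", "x"]
VARS = {"x", "y"}
print(matches(A, VARS, "abcab"))                # x = "b", y = "c"
print(matches(A, VARS, "abab"))                 # too short without empty images
print(matches(A, VARS, "abab", erasing=True))   # x = "b", y = ""
```

Under the non-erasing reading, every word generated by a x y a x has length at least five, which is why "abab" is rejected unless empty images are allowed.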

    Levels of discontinuity, limit-computability, and jump operators

    We develop a general theory of jump operators, which is intended to provide an abstraction of the notion of "limit-computability" on represented spaces. Jump operators also provide a framework with a strong categorical flavor for investigating degrees of discontinuity of functions and hierarchies of sets on represented spaces. We will provide a thorough investigation within this framework of a hierarchy of $\Delta^0_2$-measurable functions between arbitrary countably based $T_0$-spaces, which captures the notion of computing with ordinal mind-change bounds. Our abstract approach not only raises new questions but also sheds new light on previous results. For example, we introduce a notion of "higher order" descriptive set theoretical objects, we generalize a recent characterization of the computability theoretic notion of "lowness" in terms of adjoint functors, and we show that our framework encompasses ordinal quantifications of the non-constructiveness of Hilbert's finite basis theorem.

    IST Austria Thesis

    This dissertation concerns the automatic verification of probabilistic systems and programs with arrays by statistical and logical methods. Although statistical and logical methods are different in nature, we show that they can be successfully combined for system analysis. In the first part of the dissertation we present a new statistical algorithm for the verification of probabilistic systems with respect to unbounded properties, including linear temporal logic. Our algorithm often performs faster than the previous approaches, and at the same time requires less information about the system. In addition, our method can be generalized to unbounded quantitative properties such as mean-payoff bounds. In the second part, we introduce two techniques for comparing probabilistic systems. Probabilistic systems are typically compared using the notion of equivalence, which requires the systems to assign equal probability to all behaviors. However, this notion is often too strict, since probabilities are typically only empirically estimated, and any imprecision may break the relation between processes. On the one hand, we propose to replace the Boolean notion of equivalence by a quantitative distance of similarity. For this purpose, we introduce a statistical framework for estimating distances between Markov chains based on their simulation runs, and we investigate which distances can be approximated in our framework. On the other hand, we propose to compare systems with respect to a new qualitative logic, which expresses that behaviors occur with probability one or with positive probability. This qualitative analysis is robust with respect to modeling errors and applicable to many domains. In the last part, we present a new quantifier-free logic for integer arrays, which allows us to express counting. Counting properties are prevalent in array-manipulating programs; however, they cannot be expressed in the quantified fragments of the theory of arrays. We present a decision procedure for our logic and provide several complexity results.

    Learning algebraic structures with the help of Borel equivalence relations

    We study algorithmic learning of algebraic structures. In our framework, a learner receives larger and larger pieces of an arbitrary copy of a computable structure and, at each stage, is required to output a conjecture about the isomorphism type of such a structure. The learning is successful if the conjectures eventually stabilize to a correct guess. We prove that a family of structures is learnable if and only if its learning domain is continuously reducible to the relation E0 of eventual agreement on reals. This motivates a novel research program, that is, using descriptive set theoretic tools to calibrate the (learning) complexity of nonlearnable families. Here, we focus on the learning power of well-known benchmark Borel equivalence relations (i.e., E1, E2, E3, Z0, and Eset)

    Inferring noncompensatory choice heuristics

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2006. Includes bibliographical references (p. 121-128). Human decision making is a topic of great interest to marketers, psychologists, economists, and others. People are often modeled as rational utility maximizers with unlimited mental resources. However, due to the structure of the environment as well as cognitive limitations, people frequently use simplifying heuristics for making quick yet accurate decisions. In this research, we apply discrete optimization to infer from observed data whether a person is behaving in a way consistent with a choice heuristic (e.g., a noncompensatory lexicographic decision rule). We analyze the computational complexity of several inference-related problems, showing that while some are easy due to possessing a greedoid language structure, many are hard and likely do not have polynomial-time solutions. For the hard problems we develop an exact dynamic programming algorithm that is robust and scalable in practice, and we analyze several local search heuristics. We conduct an empirical study of SmartPhone preferences and find that the behavior of many respondents can be explained by lexicographic strategies. Furthermore, we find that lexicographic decision rules predict better on holdout data than some standard compensatory models. Finally, we look at a more general form of noncompensatory decision process in the context of consideration set formation. Specifically, we analyze the computational complexity of rule-based consideration set formation, develop solution techniques for inferring rules given observed consideration data, and apply the techniques to a real dataset. By Michael J. Yee. Ph.D.
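
To illustrate what "consistent with a lexicographic decision rule" can mean, here is a minimal sketch under stated assumptions. It is not the thesis's dynamic programming algorithm: it enumerates all aspect orderings by brute force, which is only feasible for a handful of aspects. The function names, the 0/1 aspect encoding, and the three hypothetical phones are all invented for illustration.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def lexicographic_choice(options: Dict[str, Dict[str, int]],
                         aspect_order: Sequence[str]) -> List[str]:
    """Screen options aspect by aspect, most important aspect first.

    At each step only the options that score best on the current aspect
    survive; later aspects never compensate for an earlier loss.
    """
    remaining = list(options)
    for aspect in aspect_order:
        best = max(options[name].get(aspect, 0) for name in remaining)
        remaining = [name for name in remaining
                     if options[name].get(aspect, 0) == best]
        if len(remaining) == 1:
            break
    return remaining


def consistent_orders(options: Dict[str, Dict[str, int]],
                      aspects: Sequence[str],
                      observed_choice: str) -> List[Tuple[str, ...]]:
    """Brute force: keep every aspect ordering that reproduces the observed choice."""
    return [order for order in permutations(aspects)
            if lexicographic_choice(options, order) == [observed_choice]]


# Hypothetical data: 1 means the phone has the aspect, 0 means it does not.
phones = {
    "A": {"camera": 1, "keyboard": 0, "low_price": 1},
    "B": {"camera": 1, "keyboard": 1, "low_price": 0},
    "C": {"camera": 0, "keyboard": 1, "low_price": 1},
}
aspects = ["camera", "keyboard", "low_price"]

# Which importance orderings would explain a respondent choosing phone B?
print(consistent_orders(phones, aspects, "B"))
```

In this toy data, a respondent who picks B can only be explained by orderings that rank price last, since any ordering that puts price first or second eliminates B before its other aspects are considered.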

    Probabilistic Computability and Choice

    We study the computational power of randomized computations on infinite objects, such as real numbers. In particular, we introduce the concept of a Las Vegas computable multi-valued function, which is a function that can be computed on a probabilistic Turing machine that receives a random binary sequence as auxiliary input. The machine can take advantage of this random sequence, but it always has to produce a correct result or stop the computation after finite time if the random advice is not successful. The random advice has to be successful with positive probability. We characterize the class of Las Vegas computable functions in the Weihrauch lattice with the help of probabilistic choice principles and Weak Weak Kőnig's Lemma. Among other things, we prove an Independent Choice Theorem which implies that Las Vegas computable functions are closed under composition. In a case study we show that Nash equilibria are Las Vegas computable, while zeros of continuous functions with sign changes cannot be computed on Las Vegas machines. However, we show that the latter problem admits randomized algorithms with weaker failure recognition mechanisms. The last-mentioned results can be interpreted as saying that the Intermediate Value Theorem is reducible to the jump of Weak Weak Kőnig's Lemma, but not to Weak Weak Kőnig's Lemma itself. These examples also demonstrate that Las Vegas computable functions form a proper superclass of the class of computable functions and a proper subclass of the class of non-deterministically computable functions. We also study the impact of specific lower bounds on the success probabilities, which leads to a strict hierarchy of classes. In particular, the classical technique of probability amplification fails for computations on infinite objects. We also investigate the dependency on the underlying probability space. Comment: Information and Computation (accepted for publication)