Symbolic Inference Methods for Databases (Méthodes d'Inférence Symbolique pour les Bases de Données)
This dissertation summarizes a line of research, in which I was actively involved, on learning in databases from examples. This research focused on traditional as well as novel database models and languages for querying, transforming, and describing the schema of a database. In the case of schemas, our contributions involve proposing original languages for the emerging data models of Unordered XML and RDF. We have studied learning from examples of schemas for Unordered XML, schemas for RDF, twig queries for XML, join queries for relational databases, and XML transformations defined with a novel model of tree-to-word transducers. Investigating learnability of the proposed languages required us to examine closely a number of their fundamental properties, often of independent interest, including normal forms, minimization, containment and equivalence, consistency of a set of examples, and finite characterizability. A good understanding of these properties allowed us to devise learning algorithms that explore a possibly large search space with the help of a diligently designed set of generalization operations in search of an appropriate solution. Learning (or inference) is a problem that has two parameters: the precise class of languages we wish to infer and the type of input that the user can provide. We focused on the setting where the user input consists of positive examples, i.e., elements that belong to the goal language, and negative examples, i.e., elements that do not belong to the goal language. In general, using both negative and positive examples allows learning richer classes of goal languages than using positive examples alone. However, using negative examples is often difficult because, together with positive examples, they may cause the search space to take a very complex shape, and its exploration may turn out to be computationally challenging.
Set systems: order types, continuous nondeterministic deformations, and quasi-orders
By reformulating a learning process of a set system L as a game between
Teacher and Learner, we define the order type of L to be the order type of the
game tree, if the tree is well-founded. The features of the order type of L
(in symbols, dim L) are: (1) we can represent any well-quasi-order (wqo for short)
by the set system L of the upper-closed sets of the wqo such that the maximal
order type of the wqo is equal to dim L. (2) dim L is an upper bound of the
mind-change complexity of L. Moreover, dim L is defined iff L has finite elasticity (fe
for short), where, according to computational learning theory, if an indexed
family of recursive languages has fe then it is learnable by an algorithm from
positive data. Regarding set systems as subspaces of Cantor spaces, we prove
that fe of set systems is preserved by any continuous function which is
monotone with respect to set inclusion. Using this, we prove that finite
elasticity is preserved by various (nondeterministic) language operators
(Kleene-closure, shuffle-closure, union, product, intersection, ...). The
monotone continuous functions represent nondeterministic computations. If a
monotone continuous function has a computation tree with each node followed by
at most n immediate successors and the order type of a set system L is
α, then the direct image of L is a set system of order type at most the
n-adic diagonal Ramsey number of α. Furthermore, we provide an
order-type-preserving contravariant embedding from the category of quasi-orders
and finitely branching simulations between them, into the complete category of
subspaces of Cantor spaces and monotone continuous functions having Girard's
linearity between them. Keyword: finite elasticity, shuffle-closur
Computation with Advice
Computation with advice is suggested as a generalization of both computation
with discrete advice and Type-2 Nondeterminism. Several embodiments of the
generic concept are discussed, and the close connection to Weihrauch
reducibility is pointed out. As a novel concept, computability with random
advice is studied, which corresponds to correct solutions being guessable with
positive probability. In the framework of computation with advice, it is
possible to define computational complexity for certain concepts of
hypercomputation. Finally, some examples are given which illuminate the
interplay of uniform and non-uniform techniques in order to investigate both
computability with advice and the Weihrauch lattice
Inclusion of Pattern Languages and Related Problems (Inklusion von Patternsprachen und verwandte Probleme)
A pattern is a word that consists of variables and terminal symbols. The pattern language generated by a pattern A is the set of all terminal words that can be obtained from A by uniform replacement of variables with terminal words. For example, the pattern A = a x y a x (where x and y are variables, and the letter a is a terminal symbol) generates the set of all words that have some word a x both as prefix and as suffix (where these two occurrences of a x do not overlap). Due to their simple definition, pattern languages have connections to a wide range of other areas in theoretical computer science and mathematics, among them combinatorics on words, logic, and the theory of free semigroups. On the other hand, many of the canonical questions of formal language theory lead, for pattern languages, to combinatorial problems of surprising difficulty. The present thesis discusses various aspects of the inclusion problem of pattern languages. It can be divided into two parts. The first part examines the decidability of the inclusion problem for languages generated by patterns with a limited number of variables over terminal alphabets of bounded size. In addition, the minimizability of regular expressions with repetition operators (back-references) is studied. The second part deals with descriptive patterns, that is, patterns generating the smallest generalizations of a given language through pattern languages ("smallest" with respect to the inclusion relation). The main questions are the existence and the discoverability of descriptive patterns for arbitrary languages.
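The definition of a pattern language in the abstract above can be illustrated with a small membership test. The following is only a sketch under stated assumptions: the function name `matches` and the `erasing` flag are hypothetical and not taken from the thesis, patterns are given as lists of symbols, and variables are replaced by nonempty words unless `erasing=True` is passed. It uses plain backtracking and makes no claim about efficiency.

```python
from typing import Dict, Sequence, Set


def matches(pattern: Sequence[str], variables: Set[str], word: str,
            erasing: bool = False) -> bool:
    """Return True if `word` can be obtained from `pattern` by replacing
    every variable uniformly with a terminal word (hypothetical helper)."""

    def go(i: int, pos: int, subst: Dict[str, str]) -> bool:
        if i == len(pattern):
            # All pattern symbols consumed: the word must be used up too.
            return pos == len(word)
        sym = pattern[i]
        if sym not in variables:
            # Terminal symbol: must occur literally at this position.
            return word.startswith(sym, pos) and go(i + 1, pos + len(sym), subst)
        if sym in subst:
            # Variable already bound: uniform replacement forces the same value.
            val = subst[sym]
            return word.startswith(val, pos) and go(i + 1, pos + len(val), subst)
        # Unbound variable: try every admissible substitution length.
        shortest = 0 if erasing else 1
        for length in range(shortest, len(word) - pos + 1):
            subst[sym] = word[pos:pos + length]
            if go(i + 1, pos + length, subst):
                return True
            del subst[sym]
        return False

    return go(0, 0, {})


# The pattern a x y a x from the abstract, with variables x and y.
PATTERN = ["a", "x", "y", "a", "x"]
VARIABLES = {"x", "y"}
```

With this sketch, `matches(PATTERN, VARIABLES, "abcab")` succeeds via x = b and y = c, since the word has "ab" as both prefix and suffix without overlap.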
Levels of discontinuity, limit-computability, and jump operators
We develop a general theory of jump operators, which is intended to provide
an abstraction of the notion of "limit-computability" on represented spaces.
Jump operators also provide a framework with a strong categorical flavor for
investigating degrees of discontinuity of functions and hierarchies of sets on
represented spaces. We will provide a thorough investigation within this
framework of a hierarchy of $\Delta^0_2$-measurable functions between arbitrary
countably based $T_0$-spaces, which captures the notion of computing with
ordinal mind-change bounds. Our abstract approach not only raises new questions
but also sheds new light on previous results. For example, we introduce a
notion of "higher order" descriptive set theoretical objects, we generalize a
recent characterization of the computability theoretic notion of "lowness" in
terms of adjoint functors, and we show that our framework encompasses ordinal
quantifications of the non-constructiveness of Hilbert's finite basis theorem
IST Austria Thesis
This dissertation concerns the automatic verification of probabilistic systems and programs with arrays by statistical and logical methods. Although statistical and logical methods are different in nature, we show that they can be successfully combined for system analysis. In the first part of the dissertation we present a new statistical algorithm for the verification of probabilistic systems with respect to unbounded properties, including linear temporal logic. Our algorithm often performs faster than the previous approaches, and at the same time requires less information about the system. In addition, our method can be generalized to unbounded quantitative properties such as mean-payoff bounds. In the second part, we introduce two techniques for comparing probabilistic systems. Probabilistic systems are typically compared using the notion of equivalence, which requires the systems to assign equal probabilities to all behaviors. However, this notion is often too strict, since probabilities are typically only empirically estimated, and any imprecision may break the relation between processes. On the one hand, we propose to replace the Boolean notion of equivalence by a quantitative distance of similarity. For this purpose, we introduce a statistical framework for estimating distances between Markov chains based on their simulation runs, and we investigate which distances can be approximated in our framework. On the other hand, we propose to compare systems with respect to a new qualitative logic, which expresses that behaviors occur with probability one or a positive probability. This qualitative analysis is robust with respect to modeling errors and applicable to many domains. In the last part, we present a new quantifier-free logic for integer arrays, which allows us to express counting. Counting properties are prevalent in array-manipulating programs; however, they cannot be expressed in the quantified fragments of the theory of arrays. We present a decision procedure for our logic, and provide several complexity results.
Learning algebraic structures with the help of Borel equivalence relations
We study algorithmic learning of algebraic structures. In our framework, a learner receives larger and larger pieces of an arbitrary copy of a computable structure and, at each stage, is required to output a conjecture about the isomorphism type of such a structure. The learning is successful if the conjectures eventually stabilize to a correct guess. We prove that a family of structures is learnable if and only if its learning domain is continuously reducible to the relation E0 of eventual agreement on reals. This motivates a novel research program, that is, using descriptive set theoretic tools to calibrate the (learning) complexity of nonlearnable families. Here, we focus on the learning power of well-known benchmark Borel equivalence relations (i.e., E1, E2, E3, Z0, and Eset)
Inferring noncompensatory choice heuristics
Thesis (Ph.D.), Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2006, by Michael J. Yee. Includes bibliographical references (p. 121-128). Human decision making is a topic of great interest to marketers, psychologists, economists, and others. People are often modeled as rational utility maximizers with unlimited mental resources. However, due to the structure of the environment as well as cognitive limitations, people frequently use simplifying heuristics for making quick yet accurate decisions. In this research, we apply discrete optimization to infer from observed data if a person is behaving in a way consistent with a choice heuristic (e.g., a noncompensatory lexicographic decision rule). We analyze the computational complexity of several inference-related problems, showing that while some are easy due to possessing a greedoid language structure, many are hard and likely do not have polynomial time solutions. For the hard problems we develop an exact dynamic programming algorithm that is robust and scalable in practice, as well as analyze several local search heuristics. We conduct an empirical study of SmartPhone preferences and find that the behavior of many respondents can be explained by lexicographic strategies. Furthermore, we find that lexicographic decision rules predict better on holdout data than some standard compensatory models. Finally, we look at a more general form of noncompensatory decision process in the context of consideration set formation. Specifically, we analyze the computational complexity of rule-based consideration set formation, develop solution techniques for inferring rules given observed consideration data, and apply the techniques to a real dataset.
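To make the notion of a noncompensatory lexicographic decision rule concrete, here is a minimal sketch. It is not the inference algorithm of the thesis; it only shows what such a rule does once an aspect ordering is fixed. The names `lexicographic_choice` and `is_consistent`, the binary aspect encoding, and the example phone data are all assumptions made for illustration.

```python
from typing import Dict, List

Profile = Dict[str, int]  # aspect name -> 1 if the profile has it, else 0


def lexicographic_choice(profiles: Dict[str, Profile],
                         aspect_order: List[str]) -> str:
    """Return the name of the profile preferred under a lexicographic rule:
    compare on the first aspect, break ties on the second, and so on.
    A deficit on an earlier aspect cannot be compensated by later ones."""
    # Python compares tuples lexicographically, which is exactly the rule.
    return max(profiles,
               key=lambda name: tuple(profiles[name][a] for a in aspect_order))


def is_consistent(profiles: Dict[str, Profile], aspect_order: List[str],
                  observed_choice: str) -> bool:
    """Check whether an observed choice agrees with the given aspect order."""
    return lexicographic_choice(profiles, aspect_order) == observed_choice


# Hypothetical example data (not from the thesis).
PHONES = {
    "A": {"camera": 1, "flip": 0, "cheap": 1},
    "B": {"camera": 1, "flip": 1, "cheap": 0},
    "C": {"camera": 0, "flip": 1, "cheap": 1},
}
```

Under the order camera, flip, cheap the rule picks B; under cheap, camera, flip it picks A. Inferring which orders are consistent with a set of observed choices is the kind of problem the abstract describes, and a brute-force search over all orders would grow factorially with the number of aspects.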
Probabilistic Computability and Choice
We study the computational power of randomized computations on infinite
objects, such as real numbers. In particular, we introduce the concept of a Las
Vegas computable multi-valued function, which is a function that can be
computed on a probabilistic Turing machine that receives a random binary
sequence as auxiliary input. The machine can take advantage of this random
sequence, but it always has to produce a correct result or to stop the
computation after finite time if the random advice is not successful. With
positive probability the random advice has to be successful. We characterize
the class of Las Vegas computable functions in the Weihrauch lattice with the
help of probabilistic choice principles and Weak Weak Kőnig's Lemma. Among
other things we prove an Independent Choice Theorem that implies that Las Vegas
computable functions are closed under composition. In a case study we show that
Nash equilibria are Las Vegas computable, while zeros of continuous functions
with sign changes cannot be computed on Las Vegas machines. However, we show
that the latter problem admits randomized algorithms with weaker failure
recognition mechanisms. The last mentioned results can be interpreted such that
the Intermediate Value Theorem is reducible to the jump of Weak Weak
Kőnig's Lemma, but not to Weak Weak Kőnig's Lemma itself. These
examples also demonstrate that Las Vegas computable functions form a proper
superclass of the class of computable functions and a proper subclass of the
class of non-deterministically computable functions. We also study the impact
of specific lower bounds on the success probabilities, which leads to a strict
hierarchy of classes. In particular, the classical technique of probability
amplification fails for computations on infinite objects. We also investigate
the dependency on the underlying probability space. Comment: Information and Computation (accepted for publication)