    Towards Practical Typechecking for Macro Tree Transducers

    Macro tree transducers (mtt) are an important model that both covers many useful XML transformations and allows decidable exact typechecking. This paper reports our first step toward implementing a practically efficient mtt typechecker. Our approach is to represent the input type obtained from backward inference as an alternating tree automaton, in a style similar to Tozawa's XSLT0 typechecking. In this approach, typechecking reduces to checking emptiness of an alternating tree automaton. We propose several optimizations of the backward inference process (Cartesian factorization, state partitioning) that produce much smaller alternating tree automata than the naive algorithm. We also present an efficient algorithm for checking emptiness of alternating tree automata, which exploits the explicit representation of alternation for local optimizations. Our preliminary experiments confirm that our algorithm is efficient enough to typecheck simple transformations with respect to the full XHTML in reasonable time.
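    As a rough illustration of the emptiness problem described above, the following Python sketch decides emptiness of a small alternating tree automaton over binary trees with a naive fixpoint. It computes every "type" (the set of states from which some tree is accepted) that can be built bottom-up. It is not the optimized algorithm of the paper: the tuple encoding of positive Boolean formulas and the functions `holds` and `is_empty` are assumptions made for illustration. The number of types can grow exponentially in the number of states, which is why local optimizations of the kind the abstract mentions matter.

```python
from itertools import product

# Illustrative encoding (an assumption, not the paper's): a positive Boolean
# formula is True, False, ("atom", child, state), ("and", f, g) or ("or", f, g),
# where child is 1 (left) or 2 (right) of a binary node.


def holds(formula, left_type, right_type):
    """Evaluate a formula, given the state sets that accept each child."""
    if formula is True or formula is False:
        return formula
    tag = formula[0]
    if tag == "atom":
        _, child, state = formula
        return state in (left_type if child == 1 else right_type)
    if tag == "and":
        return (holds(formula[1], left_type, right_type)
                and holds(formula[2], left_type, right_type))
    if tag == "or":
        return (holds(formula[1], left_type, right_type)
                or holds(formula[2], left_type, right_type))
    raise ValueError(f"unknown formula: {formula!r}")


def is_empty(states, leaf_delta, node_delta, initial):
    """Return True iff no finite tree is accepted from state `initial`.

    leaf_delta[(q, a)] is a bool for a leaf label a; node_delta[(q, a)] is a
    formula for a binary label a. Missing entries mean False.
    """
    leaf_labels = {a for (_, a) in leaf_delta}
    node_labels = {a for (_, a) in node_delta}
    # The type of a tree is the set of states from which it is accepted.
    types = {frozenset(q for q in states if leaf_delta.get((q, a), False))
             for a in leaf_labels}
    changed = True
    while changed:  # saturate: combine known types until no new one appears
        changed = False
        for a, (t1, t2) in product(node_labels, product(list(types), repeat=2)):
            new = frozenset(q for q in states
                            if holds(node_delta.get((q, a), False), t1, t2))
            if new not in types:
                types.add(new)
                changed = True
    return not any(initial in t for t in types)
```

    Alternation shows up in formulas such as ("and", ("atom", 1, "p"), ("atom", 1, "r")), which require the same subtree to be accepted from two states at once; the fixpoint handles this by tracking whole state sets rather than single states.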

    Frontiers of tractability for typechecking simple XML transformations

    Typechecking consists of statically verifying whether the output of an XML transformation always conforms to an output type for documents satisfying a given input type. We focus on complete algorithms, which always produce the correct answer. We consider top-down XML transformations incorporating XPath expressions, and we abstract document types as grammars and tree automata. By restricting schema languages and transformations, we identify several practical settings in which typechecking can be done in polynomial time. Moreover, the resulting framework provides a rather complete picture, as we show that most scenarios cannot be enlarged without rendering the typechecking problem intractable. The present research thus sheds light on when to use fast complete algorithms and when to resort to sound but incomplete ones.
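    To make the idea of polynomial-time typechecking concrete, here is a deliberately simplified, string-level analogue rather than the framework of the paper. Schemas are DFAs over child sequences, and the transformation rewrites each input letter into an output string (a homomorphism `h`). With a deterministic output type, typechecking reduces to exploring reachable pairs of input and output states, which takes polynomial time. The DFA encoding and the names `typechecks` and `run_out` are hypothetical.

```python
from collections import deque

# Hypothetical string-level model, for illustration only: a DFA is
# (start, accepting_set, delta) with delta[(state, letter)] -> state, and
# missing transitions reject.


def typechecks(dfa_in, dfa_out, h):
    """Return True iff h(w) is in L(dfa_out) for every w in L(dfa_in)."""
    s_in, acc_in, d_in = dfa_in
    s_out, acc_out, d_out = dfa_out

    def run_out(q, word):
        # Run the output DFA from q over word; None means it got stuck.
        for c in word:
            q = d_out.get((q, c))
            if q is None:
                return None
        return q

    start = (s_in, s_out)
    seen = {start}
    queue = deque([start])
    while queue:
        p, q = queue.popleft()
        if p in acc_in and (q is None or q not in acc_out):
            return False  # some valid input is mapped outside the output type
        for (src, a), dst in d_in.items():
            if src != p:
                continue
            nq = None if q is None else run_out(q, h[a])
            if (dst, nq) not in seen:
                seen.add((dst, nq))
                queue.append((dst, nq))
    return True
```

    The search visits at most |Q_in| × (|Q_out| + 1) pairs. Determinism of the output type is what keeps the check polynomial in this toy setting; with a nondeterministic output automaton, the same question would require a subset construction.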

    Static Analysis of Graph Database Transformations

    We investigate graph transformations, defined using Datalog-like rules based on acyclic conjunctive two-way regular path queries (acyclic C2RPQs), and we study two fundamental static analysis problems: type checking and equivalence of transformations in the presence of graph schemas. Additionally, we investigate the problem of target schema elicitation, which aims to construct a schema that closely captures all outputs of a transformation over graphs conforming to the input schema. We show that all these problems are in EXPTIME by reducing them to C2RPQ containment modulo schema, and we also provide matching lower bounds. We use cycle reversing to reduce query containment to the problem of unrestricted (finite or infinite) satisfiability of C2RPQs modulo a theory expressed in a description logic.
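    The basic building block of a C2RPQ is a two-way regular path query (2RPQ). The sketch below evaluates a single 2RPQ on a labeled graph by searching the product of the graph with an NFA, where a query symbol `a-` traverses an `a`-edge backwards. It only illustrates query evaluation, not the containment or schema reasoning studied in the paper; the encoding and the name `eval_2rpq` are assumptions.

```python
from collections import deque

# Assumed encoding, for illustration only: edges are (source, label, target)
# triples; the NFA maps (state, symbol) to a set of successor states, where
# symbol "a" follows an a-edge forwards and "a-" follows it backwards.


def eval_2rpq(edges, nfa, start, finals):
    """Return all pairs (x, y) joined by a path that spells a word of the NFA."""
    steps = {}  # (node, symbol) -> set of neighbouring nodes
    nodes = set()
    for u, a, v in edges:
        nodes.update((u, v))
        steps.setdefault((u, a), set()).add(v)
        steps.setdefault((v, a + "-"), set()).add(u)  # inverse edge
    answers = set()
    for x in nodes:
        # Breadth-first search over (graph node, NFA state) pairs from x.
        seen = {(x, start)}
        queue = deque([(x, start)])
        while queue:
            node, state = queue.popleft()
            if state in finals:
                answers.add((x, node))
            for (q, sym), targets in nfa.items():
                if q != state:
                    continue
                for nxt_node in steps.get((node, sym), ()):
                    for nxt_state in targets:
                        if (nxt_node, nxt_state) not in seen:
                            seen.add((nxt_node, nxt_state))
                            queue.append((nxt_node, nxt_state))
    return answers
```

    For example, the query `knows · knows-` relates two people who know a common person. An acyclic conjunctive query would join several such binary relations on shared variables.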

    Symbolic Inference Methods for Databases (Méthodes d'Inférence Symbolique pour les Bases de Données)

    This dissertation summarizes a line of research, in which I was actively involved, on learning in databases from examples. This research focused on traditional as well as novel database models and languages for querying, transforming, and describing the schema of a database. For schemas, our contributions include original schema languages for the emerging data models of Unordered XML and RDF. We have studied learning from examples of schemas for Unordered XML, schemas for RDF, twig queries for XML, join queries for relational databases, and XML transformations defined with a novel model of tree-to-word transducers. Investigating the learnability of the proposed languages required us to examine closely a number of their fundamental properties, often of independent interest, including normal forms, minimization, containment and equivalence, consistency of a set of examples, and finite characterizability. A good understanding of these properties allowed us to devise learning algorithms that explore a possibly large search space with the help of a carefully designed set of generalization operations in search of an appropriate solution. Learning (or inference) is a problem with two parameters: the precise class of languages we wish to infer and the type of input that the user can provide. We focused on the setting where the user input consists of positive examples, i.e., elements that belong to the goal language, and negative examples, i.e., elements that do not belong to the goal language. In general, using both negative and positive examples makes it possible to learn richer classes of goal languages than using positive examples alone. However, using negative examples is often difficult because, together with positive examples, they may give the search space a very complex shape, and its exploration may turn out to be computationally challenging.
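    As a toy instance of learning from positive and negative examples, the sketch below learns an equijoin predicate (a conjunction of equalities `r[i] == s[j]`) from labeled pairs of tuples. It generalizes from the positive examples by keeping only the equalities they all satisfy, then checks consistency with the negative examples. This is a simplified illustration under our own assumptions (uniform tuple arity, the hypothetical function `learn_equijoin`), not the algorithms of the dissertation.

```python
# Hypothetical toy learner, for illustration only: examples are (r, s) pairs
# of tuples, and a hypothesis is a set of (i, j) meaning r[i] == s[j].


def learn_equijoin(positives, negatives):
    """Most specific equijoin predicate consistent with the examples, or None."""
    if not positives:
        raise ValueError("need at least one positive example")
    r0, s0 = positives[0]
    # Keep exactly the equalities satisfied by every positive example.
    predicate = {
        (i, j)
        for i in range(len(r0))
        for j in range(len(s0))
        if all(r[i] == s[j] for r, s in positives)
    }

    def selects(r, s):
        return all(r[i] == s[j] for i, j in predicate)

    # Adding equalities can only shrink the selected set, so the most specific
    # predicate selects the fewest pairs: if it still selects a negative
    # example, no equijoin predicate is consistent with the examples.
    if any(selects(r, s) for r, s in negatives):
        return None
    return predicate
```

    Here the positive examples alone fix the hypothesis, and the negative examples only serve to detect inconsistency. For richer classes, such as twig queries, negative examples can also steer the search, which is where the complex search space mentioned above arises.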

    Programming Using Automata and Transducers

    Automata, the simplest model of computation, have proven to be an effective tool in reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decidable procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently? First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders. Next, we develop a theory of tree transducers over infinite alphabets and design FAST, a language for analysing tree-manipulating programs. We use FAST to detect vulnerabilities in HTML sanitizers, check whether augmented reality taggers conflict, and optimize and analyze functional programs that operate over lists and trees. Finally, we focus on laying the foundations of stream processing of hierarchical data such as XML files and program traces. We introduce two new efficient and executable models that can process the input in a left-to-right linear pass: symbolic visibly pushdown automata and streaming tree transducers. Symbolic visibly pushdown automata are closed under Boolean operations and can specify and efficiently monitor complex properties for hierarchical structures over infinite alphabets. 
Streaming tree transducers can express and efficiently process complex XML transformations while enjoying decidable procedures.
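    The key idea behind symbolic automata over large or infinite alphabets is to label transitions with predicates from a decidable algebra rather than with individual characters. The following sketch assumes the simplest such algebra, inclusive intervals of code points, and shows how intersection and emptiness can be computed symbolically. It is not the BEX or FAST implementation; the tuple encoding and the functions `intersect_guards`, `sfa_product` and `sfa_is_empty` are hypothetical.

```python
from collections import deque

# Assumed encoding, for illustration only: a guard is an inclusive interval
# (lo, hi) of code points; an SFA is (start, finals, transitions), where each
# transition is (src, guard, dst) and every guard is satisfiable (lo <= hi).


def intersect_guards(g1, g2):
    """Conjunction of two interval guards, or None if it is unsatisfiable."""
    lo, hi = max(g1[0], g2[0]), min(g1[1], g2[1])
    return (lo, hi) if lo <= hi else None


def sfa_product(a, b):
    """Symbolic product automaton accepting L(a) ∩ L(b)."""
    (s1, f1, t1), (s2, f2, t2) = a, b
    trans = []
    for p1, g1, q1 in t1:
        for p2, g2, q2 in t2:
            g = intersect_guards(g1, g2)
            if g is not None:  # drop transitions whose guard is unsatisfiable
                trans.append(((p1, p2), g, (q1, q2)))
    finals = {(x, y) for x in f1 for y in f2}
    return (s1, s2), finals, trans


def sfa_is_empty(sfa):
    """Reachability check; valid because every stored guard is satisfiable."""
    start, finals, trans = sfa
    seen, queue = {start}, deque([start])
    while queue:
        p = queue.popleft()
        if p in finals:
            return False
        for src, _, dst in trans:
            if src == p and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return True
```

    The number of transitions depends on the number of guards, not on the size of the alphabet (Unicode has over a million code points), which is what makes such models succinct.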

    XQTC: A Static Type-Checker for XQuery Using Backward Type Inference

    We present a novel technique and a tool for static type-checking of XQuery programs. The tool looks for errors in the program by jointly analyzing its source code and the input and output schemas, which respectively describe the sets of documents admissible as input and as output of the program. The crux and novelty of our results lie in the joint use of backward type inference and a two-way logic to represent inferred tree type portions. This allowed us to design and implement a type-checker for XQuery that is more precise and supports a larger XQuery fragment than the approaches previously proposed in the literature, in particular the few static type-checkers that have actually been implemented, such as the one in Galax. The whole system uses compilers and a satisfiability solver for deciding containment of two-way regular tree expressions. Our tool takes an XQuery program and two schemas Sin and Sout as input. If the program is found to be incorrect, the tool automatically generates a counter-example that is valid w.r.t. Sin and for which the program produces an output that is invalid w.r.t. Sout. The programmer can use this counter-example to fix the program.
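    Backward type inference and counter-example generation can be illustrated on strings. Assume, hypothetically, that the program rewrites each input letter `a` into an output string `h[a]` and that Sin and Sout are DFAs. Under those assumptions, the preimage of Sout under the program is again a DFA. Checking that Sin is contained in this preimage, and extracting a shortest witness when it is not, is a breadth-first search. This sketch does not model XQuery, the two-way logic, or the solver used by the tool; `preimage` and `counterexample` are names of our own.

```python
from collections import deque

# Hypothetical string-level model, for illustration only: DFAs are
# (start, accepting, delta) with delta[(state, letter)] -> state, and the
# program maps each input letter a to the output string h[a].


def preimage(dfa_out, h, alphabet):
    """Backward inference: a complete DFA for {w | h(w) in L(dfa_out)}."""
    start, acc, delta = dfa_out

    def run(q, word):
        for c in word:  # None is a rejecting sink for stuck runs
            q = delta.get((q, c)) if q is not None else None
        return q

    states, queue, new_delta = {start}, deque([start]), {}
    while queue:
        q = queue.popleft()
        for a in alphabet:
            nq = run(q, h[a])
            new_delta[(q, a)] = nq
            if nq not in states:
                states.add(nq)
                queue.append(nq)
    return start, set(acc), new_delta


def counterexample(dfa_in, dfa_pre, alphabet):
    """Shortest w in L(dfa_in) but not in L(dfa_pre), or None if contained."""
    (s1, acc1, d1), (s2, acc2, d2) = dfa_in, dfa_pre
    start = (s1, s2)
    parent = {start: None}
    queue = deque([start])
    while queue:
        p, q = queue.popleft()
        if p in acc1 and q not in acc2:
            word, node = [], (p, q)
            while parent[node] is not None:  # walk back to rebuild the input
                node, letter = parent[node]
                word.append(letter)
            return "".join(reversed(word))
        for a in alphabet:
            np = d1.get((p, a))
            if np is None:
                continue  # the input DFA rejects here, nothing to check
            pair = (np, d2[(q, a)])
            if pair not in parent:
                parent[pair] = ((p, q), a)
                queue.append(pair)
    return None
```

    The returned word plays the role of the tool's counter-example: it is valid w.r.t. Sin, yet its image under the program falls outside Sout.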

    Formal Aspects of Component Software

    These are the pre-proceedings of the 6th International Workshop on Formal Aspects of Component Software (FACS'09).

    Acta Cybernetica: Volume 18, Number 3.

    Proceedings of the 4th DIKU-IST Joint Workshop on the Foundations of Software

