
    Efficient asymmetric inclusion of regular expressions with interleaving and counting for XML type-checking

    The inclusion of Regular Expressions (REs) is the kernel of any type-checking algorithm for XML manipulation languages. XML applications would benefit from the extension of REs with interleaving and counting, but this is not feasible in general, since inclusion is EXPSPACE-complete for such extended REs. In Colazzo et al. (2009) [1] we introduced a notion of "conflict-free REs", which are extended REs with excellent complexity behaviour, including a polynomial inclusion algorithm [1] and linear membership (Ghelli et al., 2008 [2]). Conflict-free REs have interleaving and counting, but the complexity is tamed by the "conflict-free" limitations, which have been found to be satisfied by the vast majority of the content models published on the Web. However, a type-checking algorithm needs to compare machine-generated subtypes against human-defined supertypes. The conflict-free restriction, while quite harmless for the human-defined supertype, is far too restrictive for the subtype. We show here that the PTIME inclusion algorithm can actually be extended to deal with totally unrestricted REs with counting and interleaving in the subtype position, provided that the supertype is conflict-free. This is exactly the expressive power that we need in order to use subtyping inside type-checking algorithms, and the cost of this generalized algorithm is only quadratic, which is as good as the best algorithm we have for the symmetric case (see [1]). The result is extremely surprising, since we had previously found that symmetric inclusion becomes NP-hard as soon as the candidate subtype is enriched with binary intersection, a generalization that looked much more innocent than what we achieve here.
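As a concrete illustration of interleaving and counting, the sketch below checks membership for one very restricted shape of content model: an interleaving of distinct symbols, each carrying a counting range. For that shape, a word belongs to the language exactly when every symbol count lies within its range, so a single pass over the word suffices. This is not the algorithm of [1] or [2]; the function name, the `bounds` encoding and the sample model are assumptions made for illustration.

```python
from collections import Counter


def matches_counted_interleaving(word, bounds):
    """Check a word against an interleaving of counted symbols.

    bounds maps each allowed symbol to (min, max); max=None means unbounded.
    Hypothetical encoding: it only covers models of the form
    a{m1,n1} & b{m2,n2} & ... with pairwise distinct symbols.
    """
    counts = Counter(word)
    # Any symbol outside the model makes the word invalid.
    if any(symbol not in bounds for symbol in counts):
        return False
    # Order is irrelevant under interleaving, so only the counts matter.
    for symbol, (low, high) in bounds.items():
        n = counts.get(symbol, 0)
        if n < low or (high is not None and n > high):
            return False
    return True


# Sample model: a{1,2} & b{1,1} & c{0,unbounded}
model = {"a": (1, 2), "b": (1, 1), "c": (0, None)}
print(matches_counted_interleaving("bac", model))
print(matches_counted_interleaving("aaab", model))
```

The first call accepts the word regardless of symbol order; the second is rejected because `a` occurs more often than its upper bound allows.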

    Distributed XML Design

    A distributed XML document is an XML document that spans several machines. We assume that a distribution design of the document tree is given, consisting of an XML kernel-document T[f1,...,fn] where some leaves are "docking points" for external resources providing XML subtrees (f1,...,fn, standing, e.g., for Web services or peers at remote locations). The top-down design problem consists in, given a type (a schema document that may vary from a DTD to a tree automaton) for the distributed document, locally "propagating" this type into a collection of types, which we call a typing, while preserving desirable properties. We also consider the bottom-up design problem, which consists in, given a type for each external resource, exhibiting a global type that is enforced by the local types, again with natural desirable properties. In the article, we lay out the fundamentals of a theory of distributed XML design, analyze problems concerning typing issues in this setting, and study their complexity. Comment: 56 pages, 4 figures.

    Flexible query processing of SPARQL queries

    SPARQL is the predominant language for querying RDF data, which is the standard model for representing web data and more specifically Linked Open Data (a collection of heterogeneous connected data). Datasets in RDF form can be hard to query for a user who does not have full knowledge of the structure of the dataset. Moreover, many datasets in Linked Data are extracted from actual web page content, which might lead to incomplete or inaccurate data. We extend SPARQL 1.1 with two operators, APPROX and RELAX, previously introduced in the context of regular path queries. Using these operators we are able to support flexible querying over the property path queries of SPARQL 1.1. We call this new language SPARQLAR. Using SPARQLAR, users are able to query complex and heterogeneous RDF datasets without knowing precisely how the data is structured. APPROX and RELAX encapsulate different aspects of query flexibility: finding different answers and finding more answers, respectively. One of the open problems we address is how to combine the APPROX and RELAX operators with a pragmatic language such as SPARQL. We also devise an implementation of a system that evaluates SPARQLAR queries in order to study the performance of the new language. We begin by defining the semantics of SPARQLAR and the complexity of query evaluation. We then present a query processing technique for evaluating SPARQLAR queries based on a rewriting algorithm and prove its soundness and completeness. During the evaluation of a SPARQLAR query we generate multiple SPARQL 1.1 queries that are evaluated against the dataset. Each such query generates answers with a cost that indicates their distance from the exact form of the original SPARQLAR query.
    Our prototype implementation incorporates three optimisation techniques that aim to enhance query execution performance. The first optimisation is a pre-computation technique that caches the answers of parts of the queries generated by the rewriting algorithm; these answers are then reused to avoid re-executing those sub-queries. The second optimisation uses a summary of the dataset to discard queries that are known to return no answers. The third optimisation uses query containment to discard queries whose answers would be returned by another query at the same or lower cost. We conclude by conducting a performance study of the system on three different RDF datasets: LUBM (Lehigh University Benchmark), YAGO and DBpedia.
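The idea of generating several rewritten queries, each tagged with a cost, can be sketched as a best-first enumeration. The abstract does not spell out the rewriting algorithm, so everything below is an assumption for illustration: a property path is modelled as a plain sequence of labels, the only rewriting steps are deleting a label or replacing it with the wildcard `_`, and both steps have a hypothetical unit cost.

```python
import heapq


def approx_rewritings(path, max_cost, delete_cost=1, substitute_cost=1):
    """Yield (cost, rewritten_path) pairs in non-decreasing cost order.

    Illustrative only: the edit operations and their costs are assumptions,
    not the operators' actual definition.
    """
    start = tuple(path)
    best = {start: 0}          # cheapest known cost per rewritten path
    heap = [(0, start)]
    while heap:
        cost, current = heapq.heappop(heap)
        if cost > best.get(current, float("inf")):
            continue           # stale heap entry, a cheaper one was handled
        yield cost, current
        for i, label in enumerate(current):
            # Deleting the label at position i.
            candidates = [(cost + delete_cost, current[:i] + current[i + 1:])]
            # Replacing the label with a wildcard that matches any property.
            if label != "_":
                candidates.append(
                    (cost + substitute_cost, current[:i] + ("_",) + current[i + 1:])
                )
            for new_cost, new_path in candidates:
                if new_cost <= max_cost and new_cost < best.get(new_path, float("inf")):
                    best[new_path] = new_cost
                    heapq.heappush(heap, (new_cost, new_path))


# Hypothetical property names, used only as sample input.
for cost, rewritten in approx_rewritings(["actedIn", "directedBy"], max_cost=1):
    print(cost, "/".join(rewritten))
```

Because rewritings come out in order of increasing cost, a caller could evaluate each one against the dataset and stop once enough answers have been collected.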

    Context-free games on strings and nested words

    Context-free games are a formal model that, in its simplest form, generalizes the derivation mechanism of context-free grammars to a game for two players (called Juliet and Romeo): in a given sentential form (i.e., a string of terminal and nonterminal symbols), Juliet selects a nonterminal symbol to be replaced, whereupon Romeo decides, according to the derivation rules, what this nonterminal symbol is replaced with. The winning condition for Juliet in such a game is reaching a sentential form from a given target language, whereas Romeo's task is to prevent this. The central algorithmic problem for context-free games asks, given a game and an initial sentential form, whether Juliet has a winning strategy in the given game on that sentential form. The main practical application of context-free games lies in modelling Active XML documents, i.e., XML documents containing references to external sources that can be called at run time to insert up-to-date data into the document. Against this background, it makes sense to consider extensions of the above context-free games to nested words, that is, XML-like linearizations of trees into strings. Further practically motivated generalizations include, among others, modelling the syntactic or semantic handling of call parameters when external references are invoked. The goal of this dissertation is to provide a largely complete overview of the current state of research on context-free games on strings and nested words.
    To this end, it provides complexity-theoretic classifications of the winning problem (and related problems) for various variants of context-free games on strings and nested words, and gives an overview of proof methods and algorithmic techniques for handling this winning problem. As part of this investigation, it also presents fundamental results on relevant automata models on nested words, including variants of alternating automata and transducers.
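The basic game on strings can be sketched as a bounded search: Juliet wins from a sentential form if it is already in the target language, or if she can pick a nonterminal such that every replacement available to Romeo leads to a form she still wins from. The code below is a minimal illustration under stated assumptions, not one of the dissertation's algorithms: symbols are single characters, nonterminals are exactly the keys of `rules`, the target language is given as a Python predicate, and the number of moves is capped by `max_moves` so that the recursion terminates.

```python
def juliet_wins(form, rules, in_target, max_moves):
    """Return True if Juliet can force a form in the target language
    within max_moves replacement steps (hypothetical bounded variant)."""
    if in_target(form):
        return True
    if max_moves == 0:
        return False
    for i, symbol in enumerate(form):
        options = rules.get(symbol)
        if not options:
            continue  # terminal symbol, or a nonterminal without rules
        # Juliet chooses position i; Romeo may answer with any rule,
        # so Juliet must win against every possible replacement.
        if all(
            juliet_wins(form[:i] + replacement + form[i + 1:],
                        rules, in_target, max_moves - 1)
            for replacement in options
        ):
            return True
    return False


# Sample game: uppercase letters are nonterminals, and the target language
# contains the all-lowercase strings that end in "b".
rules = {"A": ["ab", "b"]}
target = lambda s: s.islower() and s.endswith("b")
print(juliet_wins("cA", rules, target, max_moves=1))
```

In this sample game both of Romeo's answers to `A` produce a string ending in `b`, so Juliet wins in one move; with `rules = {"A": ["b", "c"]}` Romeo could answer `c` and Juliet would lose.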