27 research outputs found

    Web and Semantic Web Query Languages

    Get PDF
    A number of techniques have been developed to facilitate powerful data retrieval on the Web and Semantic Web. Three categories of Web query languages can be distinguished, according to the format of the data they can retrieve: XML, RDF and Topic Maps. This article introduces the spectrum of languages falling into these categories and summarises their salient aspects. The languages are introduced using common sample data and query types. Key aspects of the query languages considered are stressed in a conclusion

    Dokumentverifikation mit Temporaler Beschreibungslogik

    Get PDF
    The thesis proposes a new formal framework for checking the content of web documents along individual reading paths. It is vital for the readability of web documents that their content is consistent and coherent along the possible browsing paths through the document. Manually ensuring the coherence of content along the possibly huge number of different browsing paths in a web document is time-consuming and error-prone. Existing methods for document validation and verification are not sufficiently expressive and efficient. The innovative core idea of this thesis is to combine the temporal logic CTL and description logic ALC for the representation of consistency criteria. The resulting new temporal description logics ALCCTL can - in contrast to existing specification formalisms - compactly represent coherence criteria on documents. Verification of web documents is modelled as a model checking problem of ALCCTL. The decidability and polynomial complexity of the ALCCTL model checking problem is proven and a sound, complete, and optimal model checking algorithm is presented. Case studies on real and realistic web documents demonstrate the performance and adequacy of the proposed methods. Existing methods such as symbolic model checking or XML-based document validation are outperformed in both expressiveness and speed.Die Dissertation stellt ein neues formales Framework für die automatische Prüfung inhaltlich-struktureller Konsistenzkriterien an Web-Dokumente vor. Viele Informationen werden heute in Form von Web-Dokumenten zugänglich gemacht. Komplexe Dokumente wie Lerndokumente oder technische Dokumentationen müssen dabei vielfältige Qualitätskriterien erfüllen. Der Informationsgehalt des Dokuments muss aktuell, vollständig und in sich stimmig sein. Die Präsentationsstruktur muss unterschiedlichen Zielgruppen mit unterschiedlichen Informationsbedürfnissen genügen. Die Sicherstellung grundlegender Konsistenzeigenschaften von Dokumenten ist angesichts der Vielzahl der Anforderungen und Nutzungskontexte eines elektronischen Dokuments nicht trivial. In dieser Arbeit werden aus der Hard-/Softwareverifikation bekannte Model-Checking-Verfahren mit Methoden zur Repräsentation von Ontologien kombiniert, um sowohl die Struktur des Dokuments als auch inhaltliche Zusammenhänge bei der Prüfung von Konsistenzkriterien berücksichtigen zu können. Als Spezifikationssprache für Konsistenzkriterien wird die neue temporale Beschreibungslogik ALCCTL vorgeschlagen. Grundlegende Eigenschaften wie Entscheidbarkeit, Ausdruckskraft und Komplexität werden untersucht. Die Adäquatheit und Praxistauglichkeit des Ansatzes werden in Fallstudien mit eLearning-Dokumenten evaluiert. Die Ergebnisse übertreffen bekannte Ansätze wie symbolisches Model-Checking oder Methoden zur Validierung von XML-Dokumenten in Performanz, Ausdruckskraft hinsichtlich der prüfbaren Kriterien und Flexibilität hinsichtlich des Dokumenttyps und -formats

    Implementation of Web Query Languages Reconsidered

    Get PDF
    Visions of the next generation Web such as the "Semantic Web" or the "Web 2.0" have triggered the emergence of a multitude of data formats. These formats have different characteristics as far as the shape of data is concerned (for example tree- vs. graph-shaped). They are accompanied by a puzzlingly large number of query languages each limited to one data format. Thus, a key feature of the Web, namely to make it possible to access anything published by anyone, is compromised. This thesis is devoted to versatile query languages capable of accessing data in a variety of Web formats. The issue is addressed from three angles: language design, common, yet uniform semantics, and common, yet uniform evaluation. % Thus it is divided in three parts: First, we consider the query language Xcerpt as an example of the advocated class of versatile Web query languages. Using this concrete exemplar allows us to clarify and discuss the vision of versatility in detail. Second, a number of query languages, XPath, XQuery, SPARQL, and Xcerpt, are translated into a common intermediary language, CIQLog. This language has a purely logical semantics, which makes it easily amenable to optimizations. As a side effect, this provides the, to the best of our knowledge, first logical semantics for XQuery and SPARQL. It is a very useful tool for understanding the commonalities and differences of the considered languages. Third, the intermediate logical language is translated into a query algebra, CIQCAG. The core feature of CIQCAG is that it scales from tree- to graph-shaped data and queries without efficiency losses when tree-data and -queries are considered: it is shown that, in these cases, optimal complexities are achieved. CIQCAG is also shown to evaluate each of the aforementioned query languages with a complexity at least as good as the best known evaluation methods so far. For example, navigational XPath is evaluated with space complexity O(q d) and time complexity O(q n) where q is the query size, n the data size, and d the depth of the (tree-shaped) data. CIQCAG is further shown to provide linear time and space evaluation of tree-shaped queries for a larger class of graph-shaped data than any method previously proposed. This larger class of graph-shaped data, called continuous-image graphs, short CIGs, is introduced for the first time in this thesis. A (directed) graph is a CIG if its nodes can be totally ordered in such a manner that, for this order, the children of any node form a continuous interval. CIQCAG achieves these properties by employing a novel data structure, called sequence map, that allows an efficient evaluation of tree-shaped queries, or of tree-shaped cores of graph-shaped queries on any graph-shaped data. While being ideally suited to trees and CIGs, the data structure gracefully degrades to unrestricted graphs. It yields a remarkably efficient evaluation on graph-shaped data that only a few edges prevent from being trees or CIGs

    Complex Event Processing with XChangeEQ

    Get PDF
    The emergence of event-driven architectures, automation of business processes, drastic cost-reductions in sensor technology, and a growing need to monitor IT systems (as well as other systems) due to legal, contractual, or operational considerations lead to an increasing generation of events. This development is accompanied by a growing demand for managing and processing events in an automated and systematic way. Complex Event Processing (CEP) encompasses the (automatable) tasks involved in making sense of all events in a system by deriving higher-level knowledge from lower-level events while the events occur, i.e., in a timely, online fashion and permanently. At the core of CEP are queries which monitor streams of "simple" events for so-called complex events, that is, events or situations that manifest themselves in certain combinations of several events occurring (or not occurring) over time and that cannot be detected from looking only at single events. Querying events is fundamentally different from traditional querying and reasoning with database or Web data, since event queries are standing queries that are evaluated permanently over time against incoming streams of event data. In order to express complex events that are of interest to a particular application or user in a convenient, concise, cost-effective and maintainable manner, special purpose Event Query Languages (EQLs) are needed. This thesis investigates practical and theoretical issues related to querying complex events, covering the spectrum from language design over declarative semantics to operational semantics for incremental query evaluation. Its central topic is the development of the high-level event query language XChangeEQ. In contrast to previous data stream and event query languages, XChangeEQ's language design recognizes the four querying dimensions of data extractions, event composition, temporal relationships, and, for non-monotonic queries involving negation or aggregation, event accumulation. XChangeEQ deals with complex structured data in event messages, thus addressing the need to query events communicated in XML formats over the Web. It supports deductive rules as an abstraction and reasoning mechanism for events. To achieve a full coverage of the four querying dimensions, it builds upon a separation of concerns of the four querying dimensions, which makes it easy-to-use and highly expressive. A recurrent theme in the formal foundations of XChangeEQ is that, despite the fundamental differences between traditional database queries and event queries, many well-known results from databases and logic programming are, with some importance changes, applicable to event queries. Declarative semantics for XChangeEQ are given as a (Tarski-style) model theory with accompanying fixpoint theory. This approach accounts well for (1) data in events and (2) deductive rules defining new events from existing ones, two aspects often neglected in previous work of semantics of EQLs. For the evaluation of event queries, this work introduces operational semantics based on an extended and tailored form of relational algebra and query plans with materialization points. Materialization points account for storing and maintaining information about those received events that are relevant for, i.e., can contribute to, future query answers, as well as for an incremental evaluation that avoids recomputing certain intermediate results. Efficient state maintenance in incremental evaluation is approached by "differentiating" algebra expressions, i.e., by deriving expressions for computing only the changes to materialization points. Knowing how long an event is relevant is a prerequisite for performing garbage collection during event query evaluation and also of central importance for developing cost-based query planners. To this end, this thesis introduces a notion of relevance of events (to a given query plan) and develops methods for determining temporal relevance, a particularly useful form based on time-related information

    Overview of query optimization in XML database systems

    Get PDF

    A Decision Procedure for XPath Containment

    Get PDF
    XPath is the standard language for addressing parts of an XML document. We present a sound and complete decision procedure for containment of XPath queries. The considered XPath fragment covers most of the language features used in practice. Specifically, we show how XPath queries can be translated into equivalent formulas in monadic second-order logic. Using this translation, we construct an optimized logical formulation of the containment problem, which is decided using tree automata. When the containment relation does not hold between two XPath expressions, a counter-example XML tree is generated. We provide a complexity analysis together with practical experiments that illustrate the efficiency of the decision procedure for realistic scenarios

    An Algebraic Approach to XQuery Optimization

    Get PDF
    As more data is stored in XML and more applications need to process this data, XML query optimization becomes performance critical. While optimization techniques for relational databases have been developed over the last thirty years, the optimization of XML queries poses new challenges. Query optimizers for XQuery, the standard query language for XML data, need to consider both document order and sequence order. Nevertheless, algebraic optimization proved powerful in query optimizers in relational and object oriented databases. Thus, this dissertation presents an algebraic approach to XQuery optimization. In this thesis, an algebra over sequences is presented that allows for a simple translation of XQuery into this algebra. The formal definitions of the operators in this algebra allow us to reason formally about algebraic optimizations. This thesis leverages the power of this formalism when unnesting nested XQuery expressions. In almost all cases unnesting nested queries in XQuery reduces query execution times from hours to seconds or milliseconds. Moreover, this dissertation presents three basic algebraic patterns of nested queries. For every basic pattern a decision tree is developed to select the most effective unnesting equivalence for a given query. Query unnesting extends the search space that can be considered during cost-based optimization of XQuery. As a result, substantially more efficient query execution plans may be detected. This thesis presents two more important cases where the number of plan alternatives leads to substantially shorter query execution times: join ordering and reordering location steps in path expressions. Our algebraic framework detects cases where document order or sequence order is destroyed. However, state-of-the-art techniques for order optimization in cost-based query optimizers have efficient mechanisms to repair order in these cases. The results obtained for query unnesting and cost-based optimization of XQuery underline the need for an algebraic approach to XQuery optimization for efficient XML query processing. Moreover, they are applicable to optimization in relational databases where order semantics are considered
    corecore