12 research outputs found
XPath satisfiability in the presence of DTDs
We study the satisfiability problem associated with XPath in the presence of DTDs. This is the problem of determining, given a query p in an XPath fragment and a DTD D, whether or not there exists an XML document T such that T conforms to D and the answer of p on T is nonempty. We consider a variety of XPath fragments widely used in practice, and investigate the impact of different XPath operators on the satisfiability analysis. We first study the problem for negation-free XPath fragments with and without upward axes, recursion and data-value joins, identifying which factors lead to tractability and which to NP-completeness. We then turn to fragments with negation but without data values, establishing lower and upper bounds in the absence and in the presence of upward modalities and recursion. We show that with negation the complexity ranges from PSPACE to EXPTIME. Moreover, when both data values and negation are in place, we find that the complexity ranges from NEXPTIME to undecidable. Furthermore, we give a finer analysis of the problem for particular classes of DTDs, exploring the impact of various DTD constructs, identifying tractable cases, as well as providing the complexity in the query size alone. Finally, we investigate the problem for XPath fragments with sibling axes, exploring the impact of horizontal modalities on the satisfiability analysis. © 2008 ACM
Mu-Calculus Based Resolution of XPath Decision Problems
XPath is the standard declarative notation for navigating XML data and returning a set of matching nodes. In the context of XSLT/XQuery analysis, query optimization, and XML type checking, XPath decision problems arise naturally. They notably include XPath containment (whether or not for any tree the result of a particular query is included in the result of a second one), and XPath satisfiability (whether or not an expression yields a non-empty result), in the presence (or the absence) of XML DTDs. In this paper, we propose a unifying logic for XML, namely the alternation-free modal mu-calculus with converse. We show how to translate major XML concepts such as XPath and DTDs into this logic. Based on these embeddings, we show how XPath decision problems can be solved using a state-of-the-art EXPTIME decision procedure for mu-calculus satisfiability. We provide preliminary experiments which shed light, for the first time, on the cost of solving XPath decision problems in practice
Web and Semantic Web Query Languages
A number of techniques have been developed to facilitate
powerful data retrieval on the Web and Semantic Web. Three categories
of Web query languages can be distinguished, according to the format
of the data they can retrieve: XML, RDF and Topic Maps. This article
introduces the spectrum of languages falling into these categories
and summarises their salient aspects. The languages are introduced using
common sample data and query types. Key aspects of the query
languages considered are stressed in a conclusion
Schemas for Unordered XML on a DIME
We investigate schema languages for unordered XML having no relative order
among siblings. First, we propose unordered regular expressions (UREs),
essentially regular expressions with unordered concatenation instead of
standard concatenation, that define languages of unordered words to model the
allowed content of a node (i.e., collections of the labels of children).
However, unrestricted UREs are computationally too expensive as we show the
intractability of two fundamental decision problems for UREs: membership of an
unordered word to the language of a URE and containment of two UREs.
Consequently, we propose a practical and tractable restriction of UREs,
disjunctive interval multiplicity expressions (DIMEs).
Next, we employ DIMEs to define languages of unordered trees and propose two
schema languages: disjunctive interval multiplicity schema (DIMS), and its
restriction, disjunction-free interval multiplicity schema (IMS). We study the
complexity of the following static analysis problems: schema satisfiability,
membership of a tree to the language of a schema, schema containment, as well
as twig query satisfiability, implication, and containment in the presence of
schema. Finally, we study the expressive power of the proposed schema languages
and compare them with yardstick languages of unordered trees (FO, MSO, and
Presburger constraints) and DTDs under commutative closure. Our results show
that the proposed schema languages are capable of expressing many practical
languages of unordered trees and enjoy desirable computational properties.Comment: Theory of Computing System
XML Schema Mappings: Data Exchange and Metadata Management
Relational schema mappings have been extensively studied in connection with data integration and exchange problems, but mappings between XML schemas have not received the same amount of attention. Our goal is to develop a theory of expressive XML schema mappings. Such mappings should be able to use various forms of navigation in a document, and specify conditions on data values. We develop a language for XML schema mappings, and study both data exchange with such mappings and metadata management problems. Specifically, we concentrate on four types of problems: complexity of mappings, query answering, consistency issues, and composition. We first analyze the complexity of mappings, i.e., recognizing pairs of documents such that one can be mapped into the other, and provide a classification based on sets of features used in mappings. Next, we chart the tractability frontier for the query answering problem. We show that the problem is tractable for expressive schema mappings and simple queries, but not vice versa. Then we move to static analysis. We study the complexity of the consistency problem, i.e., deciding whether it is possible to map some document of a source schema into a document of the target schema. Finally, we look at composition of XML schema mappings. We analyze its complexity and show that it is harder to achieve closure under composition for XML than for relational mappings. Nevertheless, we find a robust class of XML schema mappings that, i
Child Prime Label Approaches to Evaluate XML Structured Queries
The adoption of the eXtensible Markup Language (XML) as the standard format to store and exchange semi-structure data has been gaining momentum. The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to retrieve XML data efficiently. Due to the importance of twig pattern matching in XML retrieval systems, finding all matching occurrences of a tree pattern query in an XML document is often considered as a specific task for XML databases as well as a core operation in XML query processing. This thesis presents a design and implementation of a new indexing technique, called the Child Prime Label (CPL) which exploits the property of prime numbers to identify Parent-Child (P-C) edges in twig pattern queries (TPQs) during query evaluation. The CPL approach can be incorporated efficiently within the existing labelling schemes. The major contributions of this thesis can be seen as a set of novel twig matching algorithms which apply the CPL approach and focus on reducing the overhead of storing useless elements and performing unnecessary computations during the output enumeration. The research presented here is the first to provide an efficient and general solution for TPQs containing ordering constraints and positional predicates specified by the XML query languages. To evaluate the CPL approaches, the holistic model was implemented as an experimental prototype in which the approaches proposed are compared against state-of-the-art holistic twig algorithms. Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of the CPL approaches over the previous indexing and querying methods. The experimental results demonstrate the validity and improvements of the new algorithms over other related methods on common various subclasses of TPQs. Moreover, the scalability tests reveal that the new algorithms are more suitable for processing large XML datasets
Data Integration on the (Semantic) Web with Rules and Rich Unification
For the last decade a multitude of new data formats for the World Wide Web
have been developed, and a huge amount of heterogeneous semi-structured data
is flourishing online. With the ever increasing number of documents on the
Web, rules have been identified as the means of choice for reasoning about
this
data, transforming and integrating it. Query languages such as SPARQL and rule
languages such as Xcerpt use compound queries that are matched or unified with
semi-structured data. This notion of unification is different from the one
that is known from logic programming engines in that it (i) provides
constructs that allow queries to be incomplete in several ways (ii) in that
variables may have different types, (iii) in that it results in sets of
substitutions for the variables in the query instead of a single substitution
and (iv) in that subsumption between queries is much harder to decide than in
logic programming.
This thesis abstracts from Xcerpt query term simulation, SPARQL graph pattern
matching and XPath XML document matching, and shows that all of them can be
considered as a form of rich unification. Given a set of mappings between
substitution sets of different languages, this abstraction opens up the
possibility for format-versatile querying, i.e. combination of queries in
different formats, or transformation of one format into another format within
a single rule.
To show the superiority of this approach, this thesis introduces an extension
of Xcerpt called Xcrdf, and describes use-cases for the combined querying
and integration of RDF and XML data. With XML being the predominant Web
format, and RDF the predominant Semantic Web format, Xcrdf extends Xcerpt
by a set of RDF query terms and construct terms, including query primitives
for RDF containers collections and reifications. Moreover, Xcrdf includes
an RDF path query language called RPL that is more expressive than previously
proposed polynomial-time RDF path query languages, but can still be evaluated
in polynomial time combined complexity.
Besides the introduction of this framework for data integration based on rich
unification, this thesis extends the theoretical knowledge about Xcerpt in
several ways: We show that Xcerpt simulation unification is decidable, and
give complexity bounds for subsumption in several fragments of Xcerpt query
terms. The proof is based on a set of subsumption monotone query term
transformations, and is only feasible because of the injectivity requirement
on subterms of Xcerpt queries. The proof gives rise to an algorithm for
deciding Xcerpt query term simulation. Moreover, we give a semantics to
locally and weakly stratified Xcerpt programs, but this semantics is
applicable not only to Xcerpt, but to any rule language with rich unification,
including multi-rule SPARQL programs. Finally, we show how Xcerpt grouping
stratification can be reduced to Xcerpt negation stratification, thereby also
introducing the notion of local grouping stratification and weak grouping
stratification
Regular Rooted Graph Grammars
In dieser Arbeit wir ein pragmatischer Ansatz zur Typisierung, statischen Analyse und Optimierung von Web-Anfragespachen, speziell Xcerpt, untersucht. Pragmatisch ist der Ansatz in dem Sinne, dass dem Benutzer keinerlei Einschränkungen aus Entscheidbarkeits- oder Effizienzgründen auf modellierbare Typen gestellt werden. Effizienz und Entscheidbarkeit werden stattdessen, falls nötig, durch Vergröberungen bei der Typprüfung erkauft.
Eine Typsprache zur Typisierung von Graph-strukturierten Daten im Web wird eingeführt. Modellierbare Graphen sind so genannte gewurzelte Graphen, welche aus einem Spannbaum und Querreferenzen aufgebaut sind. Die Typsprache basiert auf
reguläre Baum Grammatiken, welche um typisierte Referenzen erweitert wurde. Neben wie im Web mit XML üblichen geordneten strukturierten Daten, sind auch ungeordnete Daten, wie etwa in Xcerpt oder RDF üblich, modellierbar. Der dazu verwendete Ansatz---ungeordnete Interpretation Regulärer Ausdrücke---ist neu. Eine operationale Semantik für geordnete wie ungeordnete Typen wird auf Basis spezialisierter Baumautomaten und sog. Counting Constraints (welche wiederum auf presburgerarithmetische Ausdrücke) basieren. Es wird ferner statische Typ-Prüfung und -Inferenz von Xcerpt Anfrage- und Konstrukttermen, wie auch Optimierung von Xcerpt Anfragen auf Basis von Typinformation eingeführt.This thesis investigates a pragmatic approach to typing, static analysis and static
optimization of Web query languages, in special the Web query language Xcerpt. The
approach is pragmatic in the sense, that no restriction on the types are made for
decidability or efficiency reasons, instead precision is given up if necessary.
Pragmatics on the dynamic side means to use types not only to ensure validity of objects
operating on, but also influencing query selection based on types.
A typing language for typing of graph structured data on the Web is introduced.
The Graphs in mind are based on spanning trees with references, the typing languages
is based on regular tree grammars with typed reference extensions. Beside ordered data
in the spirit of XML, unordered data (i.e. in the spirit of the Xcerpt data model or
RDF) can be modelled using regular expressions under unordered interpretation – this
approach is new. An operational semantics for ordered and unordered types is given
based on specialized regular tree automata and counting constraints (them again based
on Presburger arithmetic formulae). Static type checking of Xcerpt query and construct
terms is introduced, as well as optimization of Xcerpt query terms based on schema
information
Enhancing systems biology models through semantic data integration
Studying and modelling biology at a systems level requires a large amount of data of different experimental types. Historically, each of these types is stored in its own distinct format, with its own internal structure for holding the data produced by those experiments. While the use of community data standards can reduce the need for specialised, independent formats by providing a common syntax, standards uptake is not universal and a single standard cannot yet describe all biological data. In the work described in this thesis, a variety of integrative methods have been developed to reuse and restructure already extant systems biology data. SyMBA is a simple Web interface which stores experimental metadata in a published, common format. The creation of accurate quantitative SBML models is a time-intensive manual process. Modellers need to understand both the systems they are modelling and the intricacies of the SBML format. However, the amount of relevant data for even a relatively small and well-scoped model can be overwhelming. Saint is a Web application which accesses a number of external Web services and which provides suggested annotation for SBML and CellML models. MFO was developed to formalise all of the knowledge within the multiple SBML specification documents in a manner which is both human and computationally accessible. Rule-based mediation, a form of semantic data integration, is a useful way of reusing and re-purposing heterogeneous datasets which cannot, or are not, structured according to a common standard. This method of ontology-based integration is generic and can be used in any context, but has been implemented specifically to integrate systems biology data and to enrich systems biology models through the creation of new biological annotations. The work described in this thesis is one step towards the formalisation of biological knowledge useful to systems biology. Experimental metadata has been transformed into common structures, a Web application has been created for the retrieval of data appropriate to the annotation of systems biology models and multiple data models have been formalised and made accessible to semantic integration techniques.EThOS - Electronic Theses Online ServiceBBSRCEPSRCGBUnited Kingdo