47 research outputs found

    The Fourth International VLDB Workshop on Management of Uncertain Data

    QUERY FROM EXAMPLES

    Ph.D. DOCTOR OF PHILOSOPHY

    Data Integration on the (Semantic) Web with Rules and Rich Unification

    For the last decade a multitude of new data formats for the World Wide Web have been developed, and a huge amount of heterogeneous semi-structured data is flourishing online. With the ever-increasing number of documents on the Web, rules have been identified as the means of choice for reasoning about this data, transforming and integrating it. Query languages such as SPARQL and rule languages such as Xcerpt use compound queries that are matched or unified with semi-structured data. This notion of unification differs from the one known from logic programming engines in that (i) it provides constructs that allow queries to be incomplete in several ways, (ii) variables may have different types, (iii) it results in sets of substitutions for the variables in the query instead of a single substitution, and (iv) subsumption between queries is much harder to decide than in logic programming. This thesis abstracts from Xcerpt query term simulation, SPARQL graph pattern matching and XPath XML document matching, and shows that all of them can be considered as a form of rich unification. Given a set of mappings between substitution sets of different languages, this abstraction opens up the possibility of format-versatile querying, i.e. the combination of queries in different formats, or the transformation of one format into another within a single rule. To show the superiority of this approach, this thesis introduces an extension of Xcerpt called Xcrdf, and describes use cases for the combined querying and integration of RDF and XML data. With XML being the predominant Web format, and RDF the predominant Semantic Web format, Xcrdf extends Xcerpt by a set of RDF query terms and construct terms, including query primitives for RDF containers, collections and reifications. Moreover, Xcrdf includes an RDF path query language called RPL that is more expressive than previously proposed polynomial-time RDF path query languages, but can still be evaluated in polynomial time combined complexity. Besides introducing this framework for data integration based on rich unification, this thesis extends the theoretical knowledge about Xcerpt in several ways. We show that Xcerpt simulation unification is decidable, and give complexity bounds for subsumption in several fragments of Xcerpt query terms. The proof is based on a set of subsumption-monotone query term transformations, and is only feasible because of the injectivity requirement on subterms of Xcerpt queries. The proof gives rise to an algorithm for deciding Xcerpt query term simulation. Moreover, we give a semantics to locally and weakly stratified Xcerpt programs; this semantics is applicable not only to Xcerpt, but to any rule language with rich unification, including multi-rule SPARQL programs. Finally, we show how Xcerpt grouping stratification can be reduced to Xcerpt negation stratification, thereby also introducing the notions of local grouping stratification and weak grouping stratification.

    Doctor of Philosophy

    Linked data are the de facto standard for publishing and sharing data on the web. To date, we have been inundated with large amounts of ever-increasing linked data in constantly evolving structures. The proliferation of the data and the need to access and harvest knowledge from distributed data sources motivate us to revisit several classic problems in query processing and query optimization. The problem of answering queries over views is commonly encountered in a number of settings, including enforcing security policies for access to linked data and integrating data from disparate sources. We approach this problem by efficiently rewriting queries over the views to equivalent queries over the underlying linked data, thus avoiding the costs entailed by view materialization and maintenance. An outstanding problem of query rewriting is that the number of rewritten queries is exponential in the size of the query and the views, which motivates us to study the problem of multiquery optimization in the context of linked data. Our solutions are declarative and make no assumptions about the underlying storage, i.e., they are store-independent. Unlike relational and XML data, linked data are schema-less. While tracking the evolution of schema for linked data is hard, keyword search is an ideal tool to perform data integration. Existing works make crippling assumptions about the data and hence fall short in handling massive linked data with tens to hundreds of millions of facts. Our study of keyword search on linked data brings together classical techniques from the literature and our novel ideas, leading to much better query efficiency and result quality. Linked data also contain rich temporal semantics. To cope with the ever-increasing data, we have investigated how to partition and store large temporal or multiversion linked data for distributed and parallel computation, in an effort to achieve load balancing and support scalable data analytics for massive linked data.

    IDEAS-1997-2021-Final-Programs

    This document records the final program for each of the 26 meetings of the International Database Engineering and Applications Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021).

    Topics in Programming Languages, a Philosophical Analysis through the case of Prolog

    Programming languages seldom find proper anchorage in philosophy of logic, language and science. What is more, philosophy of language seems to be restricted to natural languages and linguistics, and even philosophy of logic is rarely framed in terms of programming language topics. The logic programming paradigm and Prolog are, thus, the most adequate paradigm and programming language to work on this subject, combining natural language processing and linguistics, logic programming and constriction methodology on both algorithms and procedures, on an overall philosophizing declarative status. Not only this, but the dimension of the Fifth Generation Computer system related to strong AI, wherein Prolog took a major role, and its historical frame in the very crucial dialectic between procedural and declarative paradigms, structuralist and empiricist biases, serve, in exemplary form, to treat directly philosophy of logic, language and science in the contemporary age as well. In recounting Prolog's philosophical, mechanical and algorithmic harbingers, the opportunity is open to various routes. We herein shall exemplify some: the mechanical-computational background explored by Pascal, Leibniz, Boole, Jacquard, Babbage and Konrad Zuse, up to the ACE (Alan Turing) and EDVAC (von Neumann), which offers the backbone of computer architecture, and the work of Turing, Church, Gödel, Kleene, von Neumann, Shannon and others on computability, studied thoroughly along parallel lines, permit us to interpret the evolving realm of programming languages. The proper line from lambda-calculus to the Algol family, the declarative and procedural split with the C language and Prolog, and the ensuing branching and explosion of programming languages and their further delimitation are thereupon inspected so as to relate them to the proper syntax, semantics and philosophical élan of logic programming and Prolog.

    Using Ontologies to Improve Answer Quality in Databases

    One of the known shortcomings of relational and XML databases is that they overlook the semantics of terms when answering queries. Ontologies constitute a useful tool to convey the semantics of terms in databases. However, the problem of effectively using semantic information from ontologies is challenging. We first address this problem for relational databases through the notion of an ontology extended relation (OER). An OER contains an ordinary relation as well as an associated ontology that conveys semantic meaning about the terms being used. We then extend the relational algebra to query OERs. We build a prototype for the OER model and show that the system scales to handle large datasets. We then propose the concept of a similarity enhanced ontology (SEO), which brings a notion of similarity to a graph ontology. We extend TAX, one of the best-known algebras for XML databases, with SEOs. The result is our TOSS system, which provides a much higher answer quality than TAX does alone. We experimentally evaluate the TOSS system on the DBLP and SIGMOD bibliographic databases and show that TOSS has acceptable performance. These two projects have involved ontology integration for supporting semantic queries across heterogeneous databases. We show how to efficiently compute the canonical witness to the integrability of graph ontologies given a set of interoperation constraints. We have also developed a polynomial-time algorithm to compute a minimal witness to the integrability of RDF ontologies under a set of Horn clauses and negative constraints, and experimentally show that our algorithm works very well on real-life ontologies and scales to massive ontologies. We finally present our work on ontology-based similarity measures for finding relationships between ontologies and searching for similar objects. These measures are applicable to practical classification systems, where ontologies can be DAG-structured, objects can be labeled with multiple terms, and ambiguity can be introduced by an evolving ontology or by classifiers with imperfect knowledge. The experiments on a bioinformatics application show that our measures outperform previous approaches.

    Fuzzy expert systems in civil engineering

    Imperial Users onl