
    LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs

    The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are increasingly confronted with various "big data" problems, and query processing in the presence of inferences is one of them. For instance, to complete the answer set of SPARQL queries, RDF database systems evaluate semantic RDFS relationships (subPropertyOf, subClassOf) through time-consuming query rewriting algorithms or space-consuming data materialization solutions. To reduce the memory footprint and ease the exchange of large datasets, these systems generally apply a dictionary approach that compresses triples by replacing resource identifiers (IRIs), blank nodes and literals with integer values. In this article, we present a structured resource identification scheme that encodes concept and property hierarchies so that the most common RDFS entailment rules can be evaluated efficiently while minimizing triple materialization and query rewriting. We show how this encoding can be computed by a scalable parallel algorithm and implemented directly over the Apache Spark framework. The efficiency of our encoding scheme is demonstrated by an evaluation conducted over both synthetic and real-world datasets. (Comment: 8 pages, 1 figure)
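
    The abstract does not spell out the encoding itself, but the general idea of turning subClassOf and subPropertyOf checks into cheap integer operations can be illustrated with a prefix-style encoding: every class receives a binary code that begins with the code of each of its ancestors, so subsumption reduces to a prefix test. The Python sketch below is an assumption-laden illustration of that idea, not the LiteMat encoding itself, and the class names are hypothetical.

    # Illustrative sketch only (not the LiteMat algorithm): assign each class a
    # binary code such that a class's code is a prefix of all descendants' codes.
    def encode_hierarchy(children, root):
        codes = {}

        def assign(node, prefix):
            codes[node] = prefix
            kids = children.get(node, [])
            if not kids:
                return
            width = len(kids).bit_length()  # fixed width per level avoids prefix clashes
            for i, kid in enumerate(kids, start=1):
                assign(kid, prefix + format(i, f"0{width}b"))

        assign(root, "1")
        return codes

    def is_subclass(codes, sub, sup):
        # sub rdfs:subClassOf sup  iff  sup's code is a prefix of sub's code
        return codes[sub].startswith(codes[sup])

    # Hypothetical toy hierarchy
    children = {"Thing": ["Person", "Place"], "Person": ["Student", "Employee"]}
    codes = encode_hierarchy(children, "Thing")
    assert is_subclass(codes, "Student", "Person")
    assert not is_subclass(codes, "Place", "Person")

    With an encoding of this kind, a query over a class can be answered with a prefix or range filter on encoded identifiers instead of rewriting the query into a union over all of its subclasses.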

    SUMA: A Partial Materialization-Based Scalable Query Answering in OWL 2 DL

    Ontology-mediated querying (OMQ) provides a paradigm for query answering in which users query not only the records stored in the database but also implicit information inferred from the ontology. A key challenge in OMQ is that the implicit information may be infinite, in which case it can neither be stored in the database nor queried by an off-the-shelf query engine. The commonly adopted technique for dealing with infinite entailments is query rewriting, which, however, incurs the cost of rewriting queries at runtime. In this work, a partial materialization method is proposed to ensure that the materialized extension is always finite. Partial materialization does not rewrite the query; instead, it computes partial consequences entailed by the ontology before queries are posed online. In addition, a query analysis algorithm is designed to ensure the completeness of answering rooted and Boolean conjunctive queries over the partial materialization. We also extend our method, soundly but incompletely, to support the highly expressive ontology language OWL 2 DL. Finally, we further optimize materialization efficiency with a role rewriting algorithm and implement our approach as a prototype system, SUMA, by integrating an efficient off-the-shelf SPARQL query engine. The experiments show that SUMA is complete on every test ontology and every test query, matching Pellet and outperforming PAGOdA, and that it is highly scalable on large datasets.
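
    As a toy illustration of the partial-materialization idea, the sketch below pre-computes, before query time, the rdf:type facts entailed by subclass axioms, so that a stock query engine can answer queries without runtime rewriting. It is a hedged simplification that covers only subclass reasoning, not SUMA's algorithm or its OWL 2 DL support, and the data is hypothetical.

    from collections import defaultdict

    def transitive_superclasses(subclass_of):
        """subclass_of maps a class to its direct superclasses."""
        closure = defaultdict(set)

        def supers(cls, seen):
            for sup in subclass_of.get(cls, []):
                if sup not in seen:
                    seen.add(sup)
                    supers(sup, seen)
            return seen

        for cls in subclass_of:
            closure[cls] = supers(cls, set())
        return closure

    def materialize_types(type_facts, subclass_of):
        """Expand (individual, class) facts with all entailed superclass memberships."""
        closure = transitive_superclasses(subclass_of)
        extended = set(type_facts)
        for individual, cls in type_facts:
            for sup in closure.get(cls, set()):
                extended.add((individual, sup))
        return extended

    # Hypothetical toy data
    subclass_of = {"GradStudent": ["Student"], "Student": ["Person"]}
    facts = {("alice", "GradStudent")}
    print(materialize_types(facts, subclass_of))
    # {('alice', 'GradStudent'), ('alice', 'Student'), ('alice', 'Person')}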

    Doctor of Philosophy

    Linked data are the de facto standard for publishing and sharing data on the web. To date, we have been inundated with large amounts of ever-increasing linked data in constantly evolving structures. The proliferation of the data and the need to access and harvest knowledge from distributed data sources motivate us to revisit several classic problems in query processing and query optimization. The problem of answering queries over views is commonly encountered in a number of settings, including while enforcing security policies to access linked data, or when integrating data from disparate sources. We approach this problem by efficiently rewriting queries over the views into equivalent queries over the underlying linked data, thus avoiding the costs entailed by view materialization and maintenance. An outstanding problem of query rewriting is that the number of rewritten queries is exponential in the size of the query and the views, which motivates us to study the problem of multiquery optimization in the context of linked data. Our solutions are declarative and make no assumptions about the underlying storage, i.e., they are store-independent. Unlike relational and XML data, linked data are schema-less. While tracking the evolution of the schema for linked data is hard, keyword search is an ideal tool to perform data integration. Existing works make crippling assumptions about the data and hence fall short in handling massive linked data with tens to hundreds of millions of facts. Our study of keyword search on linked data brings together classical techniques from the literature and our novel ideas, leading to much better query efficiency and quality of results. Linked data also contain rich temporal semantics. To cope with the ever-increasing data, we have investigated how to partition and store large temporal or multiversion linked data for distributed and parallel computation, in an effort to achieve load balancing and support scalable data analytics for massive linked data.

    Peer-based query rewriting in SPARQL for semantic integration of linked data

    In this proposal we address the problem of ontology-based SPARQL query answering over distributed Linked Data sources, where the ontology is given by conjunctive mappings between the source schemas in a peer-to-peer fashion and by equality constraints between constants. In our setting, the data is not materialised in a single datastore: it is accessed in a distributed environment through SPARQL endpoints. We aim to achieve query answering by generating the perfect rewriting of the original query and then processing the rewritten query over the distributed SPARQL endpoints. We identify a subset of ontology constraints that enjoy the first-order rewritability property, and we perform a preliminary empirical evaluation that takes only such restricted constraints into account. For future work, we aim to tackle the query answering problem in the general case.
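
    To make the flavour of mapping-based rewriting concrete, the toy sketch below unfolds a single query atom through peer-to-peer conjunctive mappings, producing a union of conjunctive rewritings. It deliberately ignores SPARQL syntax, equality constraints between constants, and the first-order rewritability conditions discussed in the proposal; all predicate names are hypothetical.

    # Each mapping says: an atom over the target vocabulary is implied by a
    # conjunction of atoms over a peer's vocabulary (toy representation).
    mappings = {
        "worksFor": [
            [("employee", "?x", "?e"), ("employerOf", "?e", "?y")],
            [("affiliatedWith", "?x", "?y")],
        ]
    }

    def rewrite(atom):
        """Return the union of conjunctive rewritings of a single target atom."""
        pred, subj, obj = atom
        rewritings = []
        for body in mappings.get(pred, []):
            subst = {"?x": subj, "?y": obj}  # rename the mapping's head variables
            rewritings.append(
                [(p, subst.get(a, a), subst.get(b, b)) for p, a, b in body]
            )
        return rewritings

    for conjunct in rewrite(("worksFor", "?person", "?org")):
        print(conjunct)
    # [('employee', '?person', '?e'), ('employerOf', '?e', '?org')]
    # [('affiliatedWith', '?person', '?org')]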

    GUN: An Efficient Execution Strategy for Querying the Web of Data

    Local-As-View (LAV) mediators provide a uniform interface to a federation of heterogeneous data sources and execute queries against that federation. LAV mediators rely on query rewriters to translate mediator queries into equivalent queries over the federated data sources. The query rewriting problem in LAV mediators has been shown to be NP-complete, and there may be an exponential number of rewritings, making it infeasible to execute, or even generate, all the rewritings for some queries. The problem is particularly acute when queries and data sources are described as SPARQL conjunctive queries, for which millions of rewritings can be generated. We aim to provide an efficient solution to the problem of executing LAV SPARQL query rewritings while keeping the gathered answer as complete as possible. We formulate the Result-Maximal k-Execution problem (ReMakE) as the problem of maximizing the query results obtained from the execution of only k rewritings. Additionally, a novel query execution strategy called GUN is proposed to solve the ReMakE problem. Our experimental evaluation demonstrates that GUN outperforms traditional techniques in terms of answer completeness and execution time.
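
    The ReMakE objective (choose k rewritings whose combined results are as complete as possible) can be read as a coverage-maximization problem. The sketch below is a plain greedy max-coverage baseline over hypothetical per-rewriting answer sets; it is not the GUN strategy, which does not assume the answers of each rewriting are known in advance.

    def pick_k_rewritings(answer_sets, k):
        """answer_sets maps a rewriting id to the set of answers it would return."""
        chosen, covered = [], set()
        candidates = dict(answer_sets)
        for _ in range(min(k, len(candidates))):
            # pick the rewriting that contributes the most new answers
            best = max(candidates, key=lambda r: len(candidates[r] - covered))
            chosen.append(best)
            covered |= candidates.pop(best)
        return chosen, covered

    # Hypothetical answer sets for three rewritings
    answer_sets = {"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {5}}
    print(pick_k_rewritings(answer_sets, 2))  # (['r1', 'r2'], {1, 2, 3, 4})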

    Optimizing Analytical Queries over Semantic Web Sources


    Efficient and scalable techniques for minimization and rewriting of conjunctive queries

    Query rewriting as an approach to query answering has been a challenging issue in database and information integration systems. In general, rewriting a conjunctive query Q using a set of views in conjunctive form consists of two phases: (1) generating proper building blocks using the views, and (2) combining them into a union of conjunctive queries that is maximally contained in Q. While the problem of query rewriting is known to be exponential in the number of subgoals of Q, there is a demand for increased efficiency for practical queries. We revisit this problem for conjunctive queries and show that Stirling numbers can be used to determine the optimal number of combinations in the second phase, and hence the number of rules in the generated union of conjunctive queries. Based on these numbers, we introduce the notion of combination patterns and develop a rewriting algorithm that uses these patterns to break the large combinatorial problem in the second phase down into several smaller ones. The results of our numerous experiments indicate that the proposed rewriting technique outperforms existing techniques, including the MiniCon algorithm, in terms of computation time, memory requirements, and scalability. In a related context, we studied query minimization, motivated by the fact that queries with fewer or no redundant subgoals can, in general, be evaluated faster. Such redundancies are often present in automatically generated queries. We propose an algorithm that, given a conjunctive query, repeatedly identifies and eliminates all redundant subgoals. We also illustrate its performance superiority over existing minimization algorithms. It has been shown that query rewriting naturally generates queries with redundant subgoals, and we show that redundant subgoals in the input of query rewriting result in redundant rules in its output. Based on this, we investigate the impact of minimization as pre-processing and post-processing phases of the query rewriting technique. Our experimental results on different synthetic data show that our query rewriting technique, coupled with pre/post minimization phases, produces the best quality of rewriting in a more efficient way than existing rewriting techniques, including the Treewise algorithm. It has also been shown that extending conjunctive queries with constraints adds to the complexity of query rewriting. Previous studies identified classes of conjunctive queries with constraints in the form of arithmetic comparisons for which the complexity of rewriting does not change; such classes are said to satisfy the homomorphism property. We identify new classes of conjunctive queries with linear arithmetic constraints that enjoy this property, and extend our query rewriting algorithm accordingly to support such queries.
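
    The abstract does not state which Stirling numbers are meant or exactly how they are applied; assuming Stirling numbers of the second kind S(n, k), which count the ways of partitioning n items into k non-empty blocks, the standard recurrence below makes the quantity concrete. Reading n as the number of building blocks to combine and k as the number of resulting groups is an illustrative assumption, not the paper's construction.

    from functools import lru_cache

    # Stirling numbers of the second kind (assumed variant): S(n, k) counts the
    # partitions of n items into k non-empty blocks.
    @lru_cache(maxsize=None)
    def stirling2(n, k):
        if n == k:
            return 1
        if k == 0 or k > n:
            return 0
        # item n either starts a new block or joins one of the k existing blocks
        return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

    print(stirling2(4, 2))  # 7 ways to group four subgoals into two non-empty blocks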

    Complete and equivalent query rewriting using views.
