259 research outputs found

    Semantic Query Reformulation in Social PDMS

    Full text link
    We consider social peer-to-peer data management systems (PDMS), where each peer maintains both semantic mappings between its schema and some acquaintances, and social links with peer friends. In this context, reformulating a query from a peer's schema into other peer's schemas is a hard problem, as it may generate as many rewritings as the set of mappings from that peer to the outside and transitively on, by eventually traversing the entire network. However, not all the obtained rewritings are relevant to a given query. In this paper, we address this problem by inspecting semantic mappings and social links to find only relevant rewritings. We propose a new notion of 'relevance' of a query with respect to a mapping, and, based on this notion, a new semantic query reformulation approach for social PDMS, which achieves great accuracy and flexibility. To find rapidly the most interesting mappings, we combine several techniques: (i) social links are expressed as FOAF (Friend of a Friend) links to characterize peer's friendship and compact mapping summaries are used to obtain mapping descriptions; (ii) local semantic views are special views that contain information about external mappings; and (iii) gossiping techniques improve the search of relevant mappings. Our experimental evaluation, based on a prototype on top of PeerSim and a simulated network demonstrate that our solution yields greater recall, compared to traditional query translation approaches proposed in the literature.Comment: 29 pages, 8 figures, query rewriting in PDM

    View Selection in Semantic Web Databases

    Get PDF
    We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs. Starting from an existing relational view selection method, we devise new algorithms for recommending view sets, and show that they scale significantly beyond the existing relational ones when adapted to the RDF context. To account for implicit triples in query answers, we propose a novel RDF query reformulation algorithm and an innovative way of incorporating it into view selection in order to avoid a combinatorial explosion in the complexity of the selection process. The interest of our techniques is demonstrated through a set of experiments.Comment: VLDB201

    Ontology-based data access with databases: a short course

    Get PDF
    Ontology-based data access (OBDA) is regarded as a key ingredient of the new generation of information systems. In the OBDA paradigm, an ontology defines a high-level global schema of (already existing) data sources and provides a vocabulary for user queries. An OBDA system rewrites such queries and ontologies into the vocabulary of the data sources and then delegates the actual query evaluation to a suitable query answering system such as a relational database management system or a datalog engine. In this chapter, we mainly focus on OBDA with the ontology language OWL 2QL, one of the three profiles of the W3C standard Web Ontology Language OWL 2, and relational databases, although other possible languages will also be discussed. We consider different types of conjunctive query rewriting and their succinctness, different architectures of OBDA systems, and give an overview of the OBDA system Ontop

    Towards Intelligent Databases

    Get PDF
    This article is a presentation of the objectives and techniques of deductive databases. The deductive approach to databases aims at extending with intensional definitions other database paradigms that describe applications extensionaUy. We first show how constructive specifications can be expressed with deduction rules, and how normative conditions can be defined using integrity constraints. We outline the principles of bottom-up and top-down query answering procedures and present the techniques used for integrity checking. We then argue that it is often desirable to manage with a database system not only database applications, but also specifications of system components. We present such meta-level specifications and discuss their advantages over conventional approaches

    Doctor of Philosophy

    Get PDF
    dissertationLinked data are the de-facto standard in publishing and sharing data on the web. To date, we have been inundated with large amounts of ever-increasing linked data in constantly evolving structures. The proliferation of the data and the need to access and harvest knowledge from distributed data sources motivate us to revisit several classic problems in query processing and query optimization. The problem of answering queries over views is commonly encountered in a number of settings, including while enforcing security policies to access linked data, or when integrating data from disparate sources. We approach this problem by efficiently rewriting queries over the views to equivalent queries over the underlying linked data, thus avoiding the costs entailed by view materialization and maintenance. An outstanding problem of query rewriting is the number of rewritten queries is exponential to the size of the query and the views, which motivates us to study problem of multiquery optimization in the context of linked data. Our solutions are declarative and make no assumption for the underlying storage, i.e., being store-independent. Unlike relational and XML data, linked data are schema-less. While tracking the evolution of schema for linked data is hard, keyword search is an ideal tool to perform data integration. Existing works make crippling assumptions for the data and hence fall short in handling massive linked data with tens to hundreds of millions of facts. Our study for keyword search on linked data brought together the classical techniques in the literature and our novel ideas, which leads to much better query efficiency and quality of the results. Linked data also contain rich temporal semantics. To cope with the ever-increasing data, we have investigated how to partition and store large temporal or multiversion linked data for distributed and parallel computation, in an effort to achieve load-balancing to support scalable data analytics for massive linked data

    Rewriting Complex Queries from Cloud to Fog under Capability Constraints to Protect the Users' Privacy

    Get PDF
    In this paper we show how existing query rewriting and query containment techniques can be used to achieve an efficient and privacy-aware processing of queries. To achieve this, the whole network structure, from data producing sensors up to cloud computers, is utilized to create a database machine consisting of billions of devices from the Internet of Things. Based on previous research in the field of database theory, especially query rewriting, we present a concept to split a query into fragment and remainder queries. Fragment queries can operate on resource limited devices to filter and preaggregate data. Remainder queries take these data and execute the last, complex part of the original queries on more powerful devices. As a result, less data is processed and forwarded in the network and the privacy principle of data minimization is accomplished

    Automatic physical database design : recommending materialized views

    Get PDF
    This work discusses physical database design while focusing on the problem of selecting materialized views for improving the performance of a database system. We first address the satisfiability and implication problems for mixed arithmetic constraints. The results are used to support the construction of a search space for view selection problems. We proposed an approach for constructing a search space based on identifying maximum commonalities among queries and on rewriting queries using views. These commonalities are used to define candidate views for materialization from which an optimal or near-optimal set can be chosen as a solution to the view selection problem. Using a search space constructed this way, we address a specific instance of the view selection problem that aims at minimizing the view maintenance cost of multiple materialized views using multi-query optimization techniques. Further, we study this same problem in the context of a commercial database management system in the presence of memory and time restrictions. We also suggest a heuristic approach for maintaining the views while guaranteeing that the restrictions are satisfied. Finally, we consider a dynamic version of the view selection problem where the workload is a sequence of query and update statements. In this case, the views can be created (materialized) and dropped during the execution of the workload. We have implemented our approaches to the dynamic view selection problem and performed extensive experimental testing. Our experiments show that our approaches perform in most cases better than previous ones in terms of effectiveness and efficiency

    Tree-like Queries in OWL 2 QL: Succinctness and Complexity Results

    Get PDF
    This paper investigates the impact of query topology on the difficulty of answering conjunctive queries in the presence of OWL 2 QL ontologies. Our first contribution is to clarify the worst-case size of positive existential (PE), non-recursive Datalog (NDL), and first-order (FO) rewritings for various classes of tree-like conjunctive queries, ranging from linear queries to bounded treewidth queries. Perhaps our most surprising result is a superpolynomial lower bound on the size of PE-rewritings that holds already for linear queries and ontologies of depth 2. More positively, we show that polynomial-size NDL-rewritings always exist for tree-shaped queries with a bounded number of leaves (and arbitrary ontologies), and for bounded treewidth queries paired with bounded depth ontologies. For FO-rewritings, we equate the existence of polysize rewritings with well-known problems in Boolean circuit complexity. As our second contribution, we analyze the computational complexity of query answering and establish tractability results (either NL- or LOGCFL-completeness) for a range of query-ontology pairs. Combining our new results with those from the literature yields a complete picture of the succinctness and complexity landscapes for the considered classes of queries and ontologies.Comment: This is an extended version of a paper accepted at LICS'15. It contains both succinctness and complexity results and adopts FOL notation. The appendix contains proofs that had to be omitted from the conference version for lack of space. The previous arxiv version (a long version of our DL'14 workshop paper) only contained the succinctness results and used description logic notatio

    The ViP2P Platform: XML Views in P2P

    Get PDF
    The growing volumes of XML data sources on the Web or produced by enterprises, organizations etc. raise many performance challenges for data management applications. In this work, we are concerned with the distributed, peer-to-peer management of large corpora of XML documents, based on distributed hash table (or DHT, in short) overlay networks. We present ViP2P (standing for Views in Peer-to-Peer), a distributed platform for sharing XML documents based on a structured P2P network infrastructure (DHT). At the core of ViP2P stand distributed materialized XML views, defined by arbitrary XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. ViP2P allows user queries to be evaluated over XML documents published by peers in two modes. First, a long-running subscription mode, when a query can be registered in the system and receive answers incrementally when and if published data matches the query. Second, queries can also be asked in an ad-hoc, snapshot mode, where results are required immediately and must be computed based on the results of other long-running, subscription queries. ViP2P innovates over other similar DHT-based XML sharing platforms by using a very expressive structured XML query language. This expressivity leads to a very flexible distribution of XML content in the ViP2P network, and to efficient snapshot query execution. ViP2P has been tested in real deployments of hundreds of computers. We present the platform architecture, its internal algorithms, and demonstrate its efficiency and scalability through a set of experiments. Our experimental results outgrow by orders of magnitude similar competitor systems in terms of data volumes, network size and data dissemination throughput.Comment: RR-7812 (2011
    corecore