Semantic Query Reformulation in Social PDMS
We consider social peer-to-peer data management systems (PDMS), where each
peer maintains both semantic mappings between its schema and some
acquaintances, and social links with peer friends. In this context,
reformulating a query from a peer's schema into other peers' schemas is a hard
problem: it may generate as many rewritings as there are mappings from that
peer outward, and transitively on, possibly traversing the entire
network. However, not all of the obtained rewritings are relevant to a given
query. In this paper, we address this problem by inspecting semantic mappings
and social links to find only relevant rewritings. We propose a new notion of
'relevance' of a query with respect to a mapping, and, based on this notion, a
new semantic query reformulation approach for social PDMS, which achieves great
accuracy and flexibility. To rapidly find the most interesting mappings, we
combine several techniques: (i) social links are expressed as FOAF (Friend of a
Friend) links to characterize peers' friendships, and compact mapping summaries
are used to obtain mapping descriptions; (ii) local semantic views are special
views that contain information about external mappings; and (iii) gossiping
techniques improve the search for relevant mappings. Our experimental
evaluation, based on a prototype on top of PeerSim and a simulated network,
demonstrates that our solution yields greater recall than traditional query
translation approaches proposed in the literature.
Comment: 29 pages, 8 figures, query rewriting in PDM
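The idea of keeping only rewritings relevant to a query can be illustrated with a toy filter. This is our own simplification (the paper defines relevance formally over mappings and social links, not as attribute overlap): score each acquaintance's mapping by the fraction of the query's attributes it can translate, and keep only mappings above a threshold.

```python
# Hypothetical relevance filter: coverage of query attributes by a mapping.
# This is an illustrative simplification, not the paper's formal notion.

def relevance(query_attrs, mapping_attrs):
    """Fraction of the query's attributes that the mapping covers."""
    query_attrs = set(query_attrs)
    if not query_attrs:
        return 0.0
    return len(query_attrs & set(mapping_attrs)) / len(query_attrs)

def relevant_mappings(query_attrs, mappings, threshold=0.5):
    """Keep only the mappings whose coverage meets the threshold."""
    return [peer for peer, attrs in mappings.items()
            if relevance(query_attrs, attrs) >= threshold]

# Toy network: each acquaintance exposes the attributes its mapping handles.
mappings = {
    "peer_A": {"title", "author", "year"},
    "peer_B": {"price", "isbn"},
    "peer_C": {"title", "isbn"},
}
print(relevant_mappings({"title", "author"}, mappings))
# -> ['peer_A', 'peer_C']  (peer_B covers none of the query attributes)
```

A real PDMS would combine such a score with the FOAF-based friendship information and mapping summaries described above, rather than raw attribute sets.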
Functional Dependencies Unleashed for Scalable Data Exchange
We address the problem of efficiently evaluating target functional
dependencies (fds) in the Data Exchange (DE) process. Target fds naturally
occur in many DE scenarios, including the ones in Life Sciences in which
multiple source relations need to be structured under a constrained target
schema. However, despite their wide use, target fds' evaluation is still a
bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL
approach typically do not support target fds unless additional information is
provided. Alternatively, DE engines that do include these dependencies
typically pay the price of a significant drop in performance and scalability.
In this paper, we present a novel chase-based algorithm that can efficiently
handle arbitrary fds on the target. Our approach essentially relies on
exploiting the interactions between source-to-target (s-t) tuple-generating
dependencies (tgds) and target fds. This allows us to tame the size of the
intermediate chase results, by playing on a careful ordering of chase steps
interleaving fds and (chosen) tgds. As a direct consequence, we significantly
reduce the fd application scope, often a central cause of the dramatic
overhead induced by target fds. Moreover, reasoning on dependency interaction
further leads us to interesting parallelization opportunities, yielding
additional scalability gains. We provide a proof-of-concept implementation of
our chase-based algorithm and an experimental study aiming at gauging its
scalability with respect to a number of parameters, among which the size of
source instances and the number of dependencies of each tested scenario.
Finally, we empirically compare with the latest DE engines, and show that our
algorithm outperforms them.
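A single fd chase step can be sketched as follows. This is a minimal single-pass illustration (not the paper's interleaved chase), assuming tuples are dicts and labelled nulls are strings starting with "N": tuples that agree on the fd's left-hand side must be made to agree on the right-hand side, nulls are equated to values, and two conflicting constants mean the chase fails.

```python
# Minimal sketch of applying one target fd during the chase.
# Assumptions (ours, for illustration): tuples are dicts, labelled
# nulls are strings like "N1", and we do a single pass per fd.

def is_null(v):
    return isinstance(v, str) and v.startswith("N")

def chase_fd(tuples, lhs, rhs):
    """Enforce the fd lhs -> rhs on a list of tuples.

    Tuples agreeing on lhs must agree on rhs: nulls are substituted
    by values; two distinct constants mean a hard violation.
    """
    subst = {}
    groups = {}
    for t in tuples:
        key = tuple(t[a] for a in lhs)
        groups.setdefault(key, []).append(t)
    for group in groups.values():
        for a in rhs:
            vals = {subst.get(t[a], t[a]) for t in group}
            consts = {v for v in vals if not is_null(v)}
            if len(consts) > 1:
                raise ValueError("chase failure: fd violated on constants")
            rep = consts.pop() if consts else sorted(vals)[0]
            for v in vals:
                if v != rep:
                    subst[v] = rep
    return [{a: subst.get(v, v) for a, v in t.items()} for t in tuples]

# Two tuples for the same employee: the null N1 is chased to "r&d".
out = chase_fd([{"emp": "ann", "dept": "N1"},
                {"emp": "ann", "dept": "r&d"}], ["emp"], ["dept"])
print(out)
```

The paper's contribution is precisely about *when* such steps fire relative to the s-t tgds, so that the intermediate instances this step runs over stay small.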
Inconsistency-tolerant Query Answering in Ontology-based Data Access
Ontology-based data access (OBDA) is receiving great attention as a new paradigm for managing information systems through semantic technologies. According to this paradigm, a Description Logic ontology provides an abstract and formal representation of the domain of interest to the information system, and is used as a sophisticated schema for accessing the data and formulating queries over them. In this paper, we address the problem of dealing with inconsistencies in OBDA. Our general goal is both to study DL semantic frameworks that are inconsistency-tolerant, and to devise techniques for answering unions of conjunctive queries under such inconsistency-tolerant semantics. Our work is inspired by approaches to consistent query answering in databases, which are based on the idea of living with inconsistencies in the database while trying to obtain only consistent information during query answering, by relying on the notion of database repair. We first adapt the notion of database repair to our context, and show that, according to such a notion, inconsistency-tolerant query answering is intractable, even for very simple DLs. Therefore, we propose a different repair-based semantics, with the goal of reaching a good compromise between the expressive power of the semantics and the computational complexity of inconsistency-tolerant query answering. Indeed, we show that query answering under the new semantics is first-order rewritable in OBDA, even if the ontology is expressed in one of the most expressive members of the DL-Lite family.
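The repair-based intuition can be shown on a relational toy example (the paper works with DL ontologies, not plain key constraints, so this is only an analogy): repairs are the maximal consistent subsets of the data, and the consistent ("certain") answers are those true in every repair.

```python
# Brute-force sketch of database repairs and consistent query answering
# under a key constraint (person -> city). Illustrative analogy only;
# the paper's semantics is defined over DL ontologies.
from itertools import combinations

def is_consistent(facts):
    """Key constraint: each person maps to at most one city."""
    seen = {}
    for person, city in facts:
        if seen.setdefault(person, city) != city:
            return False
    return True

def repairs(facts):
    """All maximal consistent subsets (exponential brute force)."""
    facts = list(facts)
    out = []
    for r in range(len(facts), 0, -1):
        for subset in combinations(facts, r):
            s = set(subset)
            if is_consistent(s) and not any(s < o for o in out):
                out.append(s)
    return out

def certain_answers(facts, query):
    """Answers returned by the query in *every* repair."""
    answer_sets = [set(filter(None, map(query, rep)))
                   for rep in repairs(facts)]
    return set.intersection(*answer_sets) if answer_sets else set()

facts = [("ann", "paris"), ("ann", "rome"), ("bob", "oslo")]
# "ann lives in paris" holds in only one of the two repairs,
# so ann is not a certain answer.
print(certain_answers(facts, lambda f: f[0] if f[1] == "paris" else None))
```

The intractability result in the abstract corresponds to the exponential blow-up visible here; the proposed alternative semantics is what makes the problem first-order rewritable instead.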
View Selection in Semantic Web Databases
We consider the setting of a Semantic Web database, containing both explicit
data encoded in RDF triples, and implicit data, implied by the RDF semantics.
Based on a query workload, we address the problem of selecting a set of views
to be materialized in the database, minimizing a combination of query
processing, view storage, and view maintenance costs. Starting from an existing
relational view selection method, we devise new algorithms for recommending
view sets, and show that they scale significantly beyond the existing
relational ones when adapted to the RDF context. To account for implicit
triples in query answers, we propose a novel RDF query reformulation algorithm
and an innovative way of incorporating it into view selection in order to avoid
a combinatorial explosion in the complexity of the selection process. The
interest of our techniques is demonstrated through a set of experiments.
Comment: VLDB201
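At its simplest, view selection trades query-processing savings against storage and maintenance costs. A greedy sketch under assumed, illustrative numbers (the paper's algorithms are more elaborate, and RDF reformulation is not modeled here) picks candidates by benefit-per-cost while a budget lasts:

```python
# Greedy view-selection sketch: pick candidate views by saving/cost
# ratio while staying within a storage budget. Candidate names and
# numbers are hypothetical; costs are assumed to be positive.

def select_views(candidates, budget):
    """candidates: {view: (query_saving, storage_cost)}."""
    chosen, used = [], 0
    ranked = sorted(candidates.items(),
                    key=lambda kv: kv[1][0] / kv[1][1],
                    reverse=True)
    for view, (saving, cost) in ranked:
        # materialize only if it pays for itself and fits the budget
        if saving > cost and used + cost <= budget:
            chosen.append(view)
            used += cost
    return chosen

candidates = {"v1": (10, 2), "v2": (3, 4), "v3": (8, 4)}
print(select_views(candidates, budget=6))  # -> ['v1', 'v3']
```

Maintenance cost could be folded into the per-view cost in the same way; the hard part the abstract points to is generating good candidates under RDF entailment without a combinatorial explosion.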
Confidentiality-Preserving Publish/Subscribe: A Survey
Publish/subscribe (pub/sub) is an attractive communication paradigm for
large-scale distributed applications running across multiple administrative
domains. Pub/sub allows event-based information dissemination based on
constraints on the nature of the data rather than on pre-established
communication channels. It is a natural fit for deployment in untrusted
environments such as public clouds linking applications across multiple sites.
However, pub/sub in untrusted environments leads to major confidentiality
concerns stemming from the content-centric nature of the communications. This
survey classifies and analyzes different approaches to confidentiality
preservation for pub/sub, from applications of trust and access control models
to novel encryption techniques. It provides an overview of the current
challenges posed by confidentiality concerns and points to future research
directions in this promising field.
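One family of approaches the survey covers can be sketched in a few lines: publishers and subscribers share a key the broker does not have, and equality constraints are matched on keyed hashes, so the broker routes events without seeing plaintext values. This toy uses HMAC and handles only equality predicates; real schemes must also hide value frequencies and support richer predicates such as ranges.

```python
# Toy confidentiality-preserving matching: the broker compares keyed
# hashes (HMAC) instead of plaintext values. Key and attribute names
# are illustrative; equality predicates only.
import hmac
import hashlib

KEY = b"shared-by-publishers-and-subscribers"  # unknown to the broker

def blind(value):
    """Keyed hash of an attribute value."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

def publish(event):
    """Publisher sends only blinded attribute/value pairs to the broker."""
    return {attr: blind(val) for attr, val in event.items()}

def broker_match(blinded_event, blinded_subscription):
    """Broker evaluates equality constraints without learning plaintext."""
    return all(blinded_event.get(a) == v
               for a, v in blinded_subscription.items())

sub = {"topic": blind("storage"), "region": blind("eu")}
ev = publish({"topic": "storage", "region": "eu", "size": "large"})
print(broker_match(ev, sub))  # -> True
```

Note the leakage that remains even in this toy: the broker still learns which events match which subscriptions, one of the open problems the survey discusses.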
Algorithms for Core Computation in Data Exchange
We describe the state of the art in the area of core computation for data exchange. Two main approaches are considered: post-processing core computation, applied to a canonical universal solution constructed by chasing a given schema mapping, and direct core computation, where the mapping is first rewritten in order to create core universal solutions by chasing it.
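The post-processing approach can be illustrated by brute force: search for an endomorphism of the canonical instance that remaps labelled nulls (constants stay fixed) and take the smallest image that is still contained in the instance. This exponential search is only a sketch of the definition; the surveyed algorithms are far more efficient.

```python
# Brute-force core computation sketch: try every remapping of the
# labelled nulls (strings starting with "N") and keep the smallest
# homomorphic image inside the instance. Exponential; for intuition only.
from itertools import product

def is_null(v):
    return isinstance(v, str) and v.startswith("N")

def apply_hom(h, instance):
    """Image of the instance under a mapping h on nulls."""
    return {tuple(h.get(v, v) for v in t) for t in instance}

def core(instance):
    instance = set(instance)
    nulls = sorted({v for t in instance for v in t if is_null(v)})
    domain = sorted({v for t in instance for v in t})
    best = instance
    for values in product(domain, repeat=len(nulls)):
        h = dict(zip(nulls, values))
        image = apply_hom(h, instance)
        # a valid endomorphism maps facts to existing facts
        if image <= instance and len(image) < len(best):
            best = image
    return best

# R(a, N1) is subsumed by R(a, b): mapping N1 -> b yields the core.
print(core({("a", "N1"), ("a", "b")}))  # -> {('a', 'b')}
```

The direct approach mentioned in the abstract avoids this post-hoc minimization by rewriting the mapping so that the chase produces core solutions in the first place.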
The Vadalog System: Datalog-based Reasoning for Knowledge Graphs
Over the past years, there has been a resurgence of Datalog-based systems in
the database community as well as in industry. In this context, it has been
recognized that to handle the complex knowledge-based scenarios encountered
today, such as reasoning over large knowledge graphs, Datalog has to be
extended with features such as existential quantification. Yet, Datalog-based
reasoning in the presence of existential quantification is in general
undecidable. Many efforts have been made to define decidable fragments. Warded
Datalog+/- is a very promising one, as it captures PTIME complexity while
allowing ontological reasoning. Yet so far, no implementation of Warded
Datalog+/- was available. In this paper we present the Vadalog system, a
Datalog-based system for performing complex logic reasoning tasks, such as
those required in advanced knowledge graphs. The Vadalog system is Oxford's
contribution to the VADA research programme, a joint effort of the universities
of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the
main contribution of this paper, we illustrate the first implementation of
Warded Datalog+/-, a high-performance Datalog+/- system utilizing an aggressive
termination control strategy. We also provide a comprehensive experimental
evaluation.
Comment: Extended version of VLDB paper
<https://doi.org/10.14778/3213880.3213888>
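The Datalog fixpoint reasoning that Vadalog builds on can be sketched without the existential features: plain Datalog is evaluated bottom-up until no rule derives a new fact. Warded Datalog+/- adds existential quantification on top of this, which is exactly where the termination-control strategy mentioned above becomes necessary; this naive evaluator (our own encoding, with uppercase strings as variables) would not terminate there.

```python
# Naive bottom-up evaluation of plain Datalog (no existentials).
# Encoding (ours): atoms are (predicate, terms); uppercase strings
# are variables, lowercase strings are constants.

def evaluate(facts, rules):
    """Least fixpoint of rules: list of (head_atom, body_atoms)."""
    db = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for env in matches(body, db, {}):
                fact = (head[0], tuple(env.get(t, t) for t in head[1]))
                if fact not in db:
                    new.add(fact)
        if not new:
            return db
        db |= new

def matches(body, db, env):
    """Yield all variable bindings satisfying the body atoms in db."""
    if not body:
        yield env
        return
    pred, terms = body[0]
    for fpred, fterms in db:
        if fpred != pred or len(fterms) != len(terms):
            continue
        e, ok = dict(env), True
        for t, v in zip(terms, fterms):
            if t.isupper():                 # variable: bind or check
                if e.setdefault(t, v) != v:
                    ok = False
                    break
            elif t != v:                    # constant: must match
                ok = False
                break
        if ok:
            yield from matches(body[1:], db, e)

facts = {("edge", ("a", "b")), ("edge", ("b", "c"))}
rules = [
    (("path", ("X", "Y")), [("edge", ("X", "Y"))]),
    (("path", ("X", "Z")), [("edge", ("X", "Y")), ("path", ("Y", "Z"))]),
]
print(("path", ("a", "c")) in evaluate(facts, rules))  # -> True
```

With existential quantification in rule heads, the fixpoint may generate fresh nulls forever; the wardedness condition is what lets the Vadalog system bound that generation aggressively.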
Conclave: secure multi-party computation on big data (extended TR)
Secure Multi-Party Computation (MPC) allows mutually distrusting parties to
run joint computations without revealing private data. Current MPC algorithms
scale poorly with data size, which makes MPC on "big data" prohibitively slow
and inhibits its practical use.
Many relational analytics queries can maintain MPC's end-to-end security
guarantee without using cryptographic MPC techniques for all operations.
Conclave is a query compiler that accelerates such queries by transforming them
into a combination of data-parallel, local cleartext processing and small MPC
steps. When parties trust others with specific subsets of the data, Conclave
applies new hybrid MPC-cleartext protocols to run additional steps outside of
MPC and improve scalability further.
Our Conclave prototype generates code for cleartext processing in Python and
Spark, and for secure MPC using the Sharemind and Obliv-C frameworks. Conclave
scales to data sets between three and six orders of magnitude larger than
state-of-the-art MPC frameworks support on their own. Thanks to its hybrid
protocols, Conclave also substantially outperforms SMCQL, the most similar
existing system.
Comment: Extended technical report for EuroSys 2019 paper
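The hybrid split can be illustrated on a distributed sum: each party aggregates its own rows in cleartext, and only the small per-party partials enter a secure step, here simulated with additive secret sharing modulo a prime. This is a toy protocol of our own, not the Sharemind or Obliv-C machinery Conclave actually generates code for.

```python
# Toy hybrid MPC/cleartext sum: local cleartext aggregation followed by
# a simulated secure sum via additive secret sharing mod a prime.
# Illustrative protocol only (semi-honest, no networking).
import random

P = 2**61 - 1  # prime modulus for additive shares

def share(value, n_parties):
    """Split a value into n additive shares mod P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def secure_sum(values):
    """Each party shares its input; only the total is revealed."""
    n = len(values)
    all_shares = [share(v, n) for v in values]
    # party i adds up the i-th share of every input...
    partials = [sum(s[i] for s in all_shares) % P for i in range(n)]
    # ...and combining the partials reveals exactly the sum.
    return sum(partials) % P

# Local cleartext phase: each party aggregates its own rows cheaply.
party_rows = [[3, 5, 2], [10, 1], [7]]
partial_sums = [sum(rows) for rows in party_rows]
print(secure_sum(partial_sums))  # -> 28
```

The performance argument in the abstract follows from this shape: the expensive secure step touches one value per party instead of every row, which is why Conclave scales orders of magnitude beyond running the whole query under MPC.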