211 research outputs found
Comparative Analysis of Five XML Query Languages
XML is becoming the most relevant new standard for data representation and
exchange on the WWW. Novel languages for extracting and restructuring the XML
content have been proposed, some in the tradition of database query languages
(i.e. SQL, OQL), others more closely inspired by XML. No standard for XML query
language has yet been decided, but the discussion is ongoing within the World
Wide Web Consortium and within many academic institutions and Internet-related
major companies. We present a comparison of five, representative query
languages for XML, highlighting their common features and differences.Comment: TeX v3.1415, 17 pages, 6 figures, to be published in ACM Sigmod
Record, March 200
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph
summarization methods, but especially to focus on the most recent approaches
and novel research trends on this topic, not yet covered by previous surveys.Comment: To appear in the Encyclopedia of Big Data Technologie
Functional Dependencies Unleashed for Scalable Data Exchange
We address the problem of efficiently evaluating target functional
dependencies (fds) in the Data Exchange (DE) process. Target fds naturally
occur in many DE scenarios, including the ones in Life Sciences in which
multiple source relations need to be structured under a constrained target
schema. However, despite their wide use, target fds' evaluation is still a
bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL
approach typically do not support target fds unless additional information is
provided. Alternatively, DE engines that do include these dependencies
typically pay the price of a significant drop in performance and scalability.
In this paper, we present a novel chase-based algorithm that can efficiently
handle arbitrary fds on the target. Our approach essentially relies on
exploiting the interactions between source-to-target (s-t) tuple-generating
dependencies (tgds) and target fds. This allows us to tame the size of the
intermediate chase results, by playing on a careful ordering of chase steps
interleaving fds and (chosen) tgds. As a direct consequence, we importantly
diminish the fd application scope, often a central cause of the dramatic
overhead induced by target fds. Moreover, reasoning on dependency interaction
further leads us to interesting parallelization opportunities, yielding
additional scalability gains. We provide a proof-of-concept implementation of
our chase-based algorithm and an experimental study aiming at gauging its
scalability with respect to a number of parameters, among which the size of
source instances and the number of dependencies of each tested scenario.
Finally, we empirically compare with the latest DE engines, and show that our
algorithm outperforms them
An Analytical Study of Large SPARQL Query Logs
With the adoption of RDF as the data model for Linked Data and the Semantic
Web, query specification from end- users has become more and more common in
SPARQL end- points. In this paper, we conduct an in-depth analytical study of
the queries formulated by end-users and harvested from large and up-to-date
query logs from a wide variety of RDF data sources. As opposed to previous
studies, ours is the first assessment on a voluminous query corpus, span- ning
over several years and covering many representative SPARQL endpoints. Apart
from the syntactical structure of the queries, that exhibits already
interesting results on this generalized corpus, we drill deeper in the
structural char- acteristics related to the graph- and hypergraph represen-
tation of queries. We outline the most common shapes of queries when visually
displayed as pseudographs, and char- acterize their (hyper-)tree width.
Moreover, we analyze the evolution of queries over time, by introducing the
novel con- cept of a streak, i.e., a sequence of queries that appear as
subsequent modifications of a seed query. Our study offers several fresh
insights on the already rich query features of real SPARQL queries formulated
by real users, and brings us to draw a number of conclusions and pinpoint
future di- rections for SPARQL query evaluation, query optimization, tuning,
and benchmarking
A Trichotomy for Regular Simple Path Queries on Graphs
Regular path queries (RPQs) select nodes connected by some path in a graph.
The edge labels of such a path have to form a word that matches a given regular
expression. We investigate the evaluation of RPQs with an additional constraint
that prevents multiple traversals of the same nodes. Those regular simple path
queries (RSPQs) find several applications in practice, yet they quickly become
intractable, even for basic languages such as (aa)* or a*ba*.
In this paper, we establish a comprehensive classification of regular
languages with respect to the complexity of the corresponding regular simple
path query problem. More precisely, we identify the fragment that is maximal in
the following sense: regular simple path queries can be evaluated in polynomial
time for every regular language L that belongs to this fragment and evaluation
is NP-complete for languages outside this fragment. We thus fully characterize
the frontier between tractability and intractability for RSPQs, and we refine
our results to show the following trichotomy: Evaluations of RSPQs is either
AC0, NL-complete or NP-complete in data complexity, depending on the regular
language L. The fragment identified also admits a simple characterization in
terms of regular expressions.
Finally, we also discuss the complexity of the following decision problem:
decide, given a language L, whether finding a regular simple path for L is
tractable. We consider several alternative representations of L: DFAs, NFAs or
regular expressions, and prove that this problem is NL-complete for the first
representation and PSPACE-complete for the other two. As a conclusion we extend
our results from edge-labeled graphs to vertex-labeled graphs and vertex-edge
labeled graphs.Comment: 15 pages, conference submissio
Repairing mappings under policy views
The problem of data exchange involves a source schema, a target schema and a
set of mappings from transforming the data between the two schemas. We study
the problem of data exchange in the presence of privacy restrictions on the
source. The privacy restrictions are expressed as a set of policy views
representing the information that is safe to expose over all instances of the
source. We propose a protocol that provides formal privacy guarantees and is
data-independent, i.e., if certain criteria are met, then the protocol
guarantees that the mappings leak no sensitive information independently of the
data that lies in the source. We also propose an algorithm for repairing an
input mapping w.r.t. a set of policy views, in cases where the input mapping
leaks sensitive information. The empirical evaluation of our work shows that
the proposed algorithm is quite efficient, repairing sets of 300 s-t tgds in an
average time of 5s on a commodity machine. To the best of our knowledge, our
work is the first one that studies the problems of exchanging data and
repairing mappings under such privacy restrictions. Furthermore, our work is
the first to provide practical algorithms for a logical privacy-preservation
paradigm, described as an open research challenge in previous work on this
area.Comment: 12 page
Ontological Matchmaking in Recommender Systems
The electronic marketplace offers great potential for the recommendation of
supplies. In the so called recommender systems, it is crucial to apply
matchmaking strategies that faithfully satisfy the predicates specified in the
demand, and take into account as much as possible the user preferences. We
focus on real-life ontology-driven matchmaking scenarios and identify a number
of challenges, being inspired by such scenarios. A key challenge is that of
presenting the results to the users in an understandable and clear-cut fashion
in order to facilitate the analysis of the results. Indeed, such scenarios
evoke the opportunity to rank and group the results according to specific
criteria. A further challenge consists of presenting the results to the user in
an asynchronous fashion, i.e. the 'push' mode, along with the 'pull' mode, in
which the user explicitly issues a query, and displays the results. Moreover,
an important issue to consider in real-life cases is the possibility of
submitting a query to multiple providers, and collecting the various results.
We have designed and implemented an ontology-based matchmaking system that
suitably addresses the above challenges. We have conducted a comprehensive
experimental study, in order to investigate the usability of the system, the
performance and the effectiveness of the matchmaking strategies with real
ontological datasets.Comment: 28 pages, 8 figure
Mapping-equivalence and oid-equivalence of single-function object-creating conjunctive queries
Conjunctive database queries have been extended with a mechanism for object
creation to capture important applications such as data exchange, data
integration, and ontology-based data access. Object creation generates new
object identifiers in the result, that do not belong to the set of constants in
the source database. The new object identifiers can be also seen as Skolem
terms. Hence, object-creating conjunctive queries can also be regarded as
restricted second-order tuple-generating dependencies (SO tgds), considered in
the data exchange literature.
In this paper, we focus on the class of single-function object-creating
conjunctive queries, or sifo CQs for short. We give a new characterization for
oid-equivalence of sifo CQs that is simpler than the one given by Hull and
Yoshikawa and places the problem in the complexity class NP. Our
characterization is based on Cohen's equivalence notions for conjunctive
queries with multiplicities. We also solve the logical entailment problem for
sifo CQs, showing that also this problem belongs to NP. Results by Pichler et
al. have shown that logical equivalence for more general classes of SO tgds is
either undecidable or decidable with as yet unknown complexity upper bounds.Comment: This revised version has been accepted on 11 January 2016 for
publication in The VLDB Journa
Semantic Query Reformulation in Social PDMS
We consider social peer-to-peer data management systems (PDMS), where each
peer maintains both semantic mappings between its schema and some
acquaintances, and social links with peer friends. In this context,
reformulating a query from a peer's schema into other peer's schemas is a hard
problem, as it may generate as many rewritings as the set of mappings from that
peer to the outside and transitively on, by eventually traversing the entire
network. However, not all the obtained rewritings are relevant to a given
query. In this paper, we address this problem by inspecting semantic mappings
and social links to find only relevant rewritings. We propose a new notion of
'relevance' of a query with respect to a mapping, and, based on this notion, a
new semantic query reformulation approach for social PDMS, which achieves great
accuracy and flexibility. To find rapidly the most interesting mappings, we
combine several techniques: (i) social links are expressed as FOAF (Friend of a
Friend) links to characterize peer's friendship and compact mapping summaries
are used to obtain mapping descriptions; (ii) local semantic views are special
views that contain information about external mappings; and (iii) gossiping
techniques improve the search of relevant mappings. Our experimental
evaluation, based on a prototype on top of PeerSim and a simulated network
demonstrate that our solution yields greater recall, compared to traditional
query translation approaches proposed in the literature.Comment: 29 pages, 8 figures, query rewriting in PDM
- …