Using Ontologies for Semantic Data Integration
While big data analytics is considered one of the most important paths to competitive advantage for today’s enterprises, data scientists spend a comparatively large amount of time in the data preparation and data integration phases of a big data project. This shows that data integration is still a major challenge in IT applications. Over the past two decades, the idea of using semantics for data integration has become increasingly crucial, and has received much attention in the AI, database, web, and data mining communities. Here, we focus on a specific paradigm for semantic data integration, called Ontology-Based Data Access (OBDA). The goal of this paper is to provide an overview of OBDA, pointing out both the techniques at the basis of the paradigm and the main challenges that remain to be addressed.
Secure Querying of Recursive XML Views: A Standard XPath-based Technique
Most state-of-the-art approaches for securing XML documents allow users to
access data only through authorized views defined by annotating an XML grammar
(e.g. DTD) with a collection of XPath expressions. To prevent improper
disclosure of confidential information, user queries posed on these views need
to be rewritten into equivalent queries on the underlying documents. This
rewriting enables us to avoid the overhead of view materialization and
maintenance. A major concern here is that query rewriting for recursive XML
views is still an open problem. To overcome this problem, some works have been
proposed to translate XPath queries into non-standard ones, called Regular
XPath queries. However, a query rewritten under Regular XPath can be of
exponential size, as it relies on an automaton model. Most importantly, Regular
XPath remains a theoretical achievement. Indeed, it is not commonly used in
practice as translation and evaluation tools are not available. In this paper,
we show that query rewriting is always possible for recursive XML views using
only the expressive power of the standard XPath. We investigate the extension
of the downward class of XPath, composed only of child and descendant axes,
with some axes and operators and we propose a general approach to rewrite
queries under recursive XML views. Unlike Regular XPath-based works, we provide
a rewriting algorithm which processes the query only over the annotated DTD
grammar and which can run in linear time in the size of the query. An
experimental evaluation demonstrates that our algorithm is efficient and scales
well. Comment: (2011
Ontology-based data access with databases: a short course
Ontology-based data access (OBDA) is regarded as a key ingredient of the new generation of information systems. In the OBDA paradigm, an ontology defines a high-level global schema of (already existing) data sources and provides a vocabulary for user queries. An OBDA system rewrites such queries and ontologies into the vocabulary of the data sources and then delegates the actual query evaluation to a suitable query answering system such as a relational database management system or a datalog engine. In this chapter, we mainly focus on OBDA with the ontology language OWL 2 QL, one of the three profiles of the W3C standard Web Ontology Language OWL 2, and relational databases, although other possible languages will also be discussed. We consider different types of conjunctive query rewriting and their succinctness, different architectures of OBDA systems, and give an overview of the OBDA system Ontop.
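The rewrite-then-delegate idea described above can be illustrated with a minimal sketch. It is not how Ontop or any other OBDA system is implemented: the class names, the `SUBCLASS_OF` axioms, the `MAPPINGS` table, and the SQLite schema are all hypothetical, and only atomic class queries with subclass axioms are handled.

```python
import sqlite3

# Hypothetical ontology: subclass axioms, child class -> parent class.
SUBCLASS_OF = {"Professor": "Teacher", "Lecturer": "Teacher"}

# Hypothetical mappings: ontology class -> SQL view over the data source.
MAPPINGS = {
    "Professor": "SELECT name FROM professors",
    "Lecturer": "SELECT name FROM lecturers",
}

def subclasses(cls):
    """Return cls together with all of its direct and indirect subclasses."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

def rewrite(cls):
    """Rewrite the ontology query cls(x) into SQL over the source tables."""
    parts = [MAPPINGS[c] for c in sorted(subclasses(cls)) if c in MAPPINGS]
    if not parts:
        raise ValueError(f"no mapping covers class {cls!r}")
    return " UNION ".join(parts)

def answer(conn, cls):
    """Delegate evaluation of the rewritten query to the database engine."""
    return sorted(row[0] for row in conn.execute(rewrite(cls)))

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE professors(name TEXT);
CREATE TABLE lecturers(name TEXT);
INSERT INTO professors VALUES ('ada');
INSERT INTO lecturers VALUES ('bob');
""")
teachers = answer(conn, "Teacher")
```

Although no table stores teachers directly, the query for `Teacher` is answered from the two source tables because the ontology is compiled into the SQL before evaluation.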
On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark
Querying very large RDF data sets in an efficient manner requires a
sophisticated distribution strategy. Several innovative solutions have recently
been proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution approaches. To achieve fair
experimental results, we use Apache Spark as a common parallel computing
framework and rewrite the algorithms concerned using the Spark API. Spark
provides guarantees in terms of fault tolerance, high availability and
scalability which are essential in such systems. Our different implementations
aim to highlight the fundamental implementation-independent characteristics of
each approach in terms of data preparation, load balancing, data replication
and, to some extent, query answering cost and performance. The presented
measures are obtained by testing each system on one synthetic and one
real-world data set over query workloads with differing characteristics and
different partitioning constraints. Comment: 16 pages, 3 figures
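The abstract does not name the five distribution approaches, so the following is only a generic illustration of what distributing RDF data means: a hash partitioning of triples by subject, written in plain Python rather than against the Spark API. The function name and the sample triples are assumptions made for this sketch.

```python
import zlib
from collections import defaultdict

def partition_by_subject(triples, num_partitions):
    """Assign each (subject, predicate, object) triple to a partition.

    A stable CRC32 hash of the subject is used, so all triples sharing a
    subject land in the same partition.
    """
    partitions = defaultdict(list)
    for subject, predicate, obj in triples:
        index = zlib.crc32(subject.encode("utf-8")) % num_partitions
        partitions[index].append((subject, predicate, obj))
    return dict(partitions)

# Hypothetical sample data.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:bob", "ex:knows", "ex:carol"),
]
parts = partition_by_subject(triples, num_partitions=4)
```

Subject hashing keeps star-shaped queries around one subject local to a single partition, while queries that join across subjects may need data from several partitions.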
The Bag Semantics of Ontology-Based Data Access
Ontology-based data access (OBDA) is a popular approach for integrating and
querying multiple data sources by means of a shared ontology. The ontology is
linked to the sources using mappings, which assign views over the data to
ontology predicates. Motivated by the need for OBDA systems supporting
database-style aggregate queries, we propose a bag semantics for OBDA, where
duplicate tuples in the views defined by the mappings are retained, as is the
case in standard databases. We show that bag semantics makes conjunctive query
answering in OBDA coNP-hard in data complexity. To regain tractability, we
consider a rather general class of queries and show its rewritability to a
generalisation of the relational calculus to bags.
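Why duplicates matter for aggregate queries can be seen in a small sketch. The two sources, the `works_in` view, and the helper names are hypothetical, and the code illustrates only the difference between set and bag semantics, not the rewriting studied in the paper.

```python
from collections import Counter

# Hypothetical source rows feeding one ontology predicate: (employee, department).
source_a = [("ann", "sales"), ("bob", "sales")]
source_b = [("ann", "sales")]

def works_in_bag(*sources):
    """Mapping view under bag semantics: duplicate tuples keep their multiplicity."""
    bag = Counter()
    for rows in sources:
        bag.update(rows)
    return bag

def works_in_set(*sources):
    """Mapping view under set semantics: duplicate tuples collapse."""
    return set(works_in_bag(*sources))

def count_by_department(tuples_with_multiplicity):
    """COUNT aggregate grouped by department over (tuple, multiplicity) pairs."""
    counts = Counter()
    for (_, department), multiplicity in tuples_with_multiplicity:
        counts[department] += multiplicity
    return counts

bag_counts = count_by_department(works_in_bag(source_a, source_b).items())
set_counts = count_by_department((t, 1) for t in works_in_set(source_a, source_b))
```

With these inputs the bag view counts three tuples for `sales` and the set view counts two, which is the discrepancy a database-style aggregate would expose.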
XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme
Query evaluation in an XML database requires reconstructing XML subtrees
rooted at nodes found by an XML query. Since XML subtree reconstruction can be
expensive, one approach to improve query response time is to use reconstruction
views - materialized XML subtrees of an XML document, whose nodes are
frequently accessed by XML queries. For this approach to be efficient, the
principal requirement is a framework for view selection. In this work, we are
the first to formalize and study the problem of XML reconstruction view
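selection. The input is a tree $T$, in which every node $i$ has a size $s_i$
and profit $p_i$, and the size limitation $C$. The target is to find a subset
of subtrees rooted at nodes $i_1, \cdots, i_k$ respectively such that
$s_{i_1} + \cdots + s_{i_k} \le C$, and $p_{i_1} + \cdots + p_{i_k}$ is maximal.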
Furthermore, there is no overlap between any two subtrees selected in the
solution. We prove that this problem is NP-hard and present a fully
polynomial-time approximation scheme (FPTAS) as a solution.
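For small instances with integer sizes, the selection problem can be solved exactly by a knapsack-style dynamic program over the tree. The sketch below is that exact pseudo-polynomial method, not the FPTAS from the paper. The function name, the dictionary-based tree encoding, and the example values are assumptions, and each node's size is taken to be the cost of materializing the whole subtree rooted at it.

```python
from typing import Dict, List

def select_views(children: Dict[str, List[str]], size: Dict[str, int],
                 profit: Dict[str, int], root: str, capacity: int) -> int:
    """Best total profit of non-overlapping subtrees whose sizes sum to <= capacity."""

    def best(node: str) -> List[int]:
        # table[c]: best profit from subtrees strictly below `node`, total size <= c.
        table = [0] * (capacity + 1)
        for child in children.get(node, []):
            child_table = best(child)
            # Split the budget c between earlier children and this child.
            table = [
                max(table[c - k] + child_table[k] for k in range(c + 1))
                for c in range(capacity + 1)
            ]
        # Alternatively select the subtree rooted at `node` itself; this excludes
        # every descendant, so its profit is not combined with the children.
        for c in range(size[node], capacity + 1):
            table[c] = max(table[c], profit[node])
        return table

    return best(root)[capacity]

# Hypothetical example: root r with two leaf children a and b.
children = {"r": ["a", "b"]}
size = {"r": 5, "a": 2, "b": 2}
profit = {"r": 10, "a": 4, "b": 3}
```

With a capacity of 4 the two leaves are selected together, and with a capacity of 5 the root subtree alone is the better choice. The running time grows with the square of the capacity, which is why an approximation scheme becomes attractive for large size limits.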