4,342 research outputs found
Using Ontologies for Semantic Data Integration
While big data analytics is considered as one of the most important paths to competitive advantage of today’s enterprises, data scientists spend a comparatively large amount of time in the data preparation and data integration phase of a big data project. This shows that data integration is still a major challenge in IT applications. Over the past two decades, the idea of using semantics for data integration has become increasingly crucial, and has received much attention in the AI, database, web, and data mining communities. Here, we focus on a specific paradigm for semantic data integration, called Ontology-Based Data Access (OBDA). The goal of this paper is to provide an overview of OBDA, pointing out both the techniques that are at the basis of the paradigm, and the main challenges that remain to be addressed
Inconsistency-tolerant Query Answering in Ontology-based Data Access
Ontology-based data access (OBDA) is receiving great attention as a new paradigm for managing information systems through semantic technologies. According to this paradigm, a Description Logic ontology provides an abstract and formal representation of the domain of interest to the information system, and is used as a sophisticated schema for accessing the data and formulating queries over them. In this paper, we address the problem of dealing with inconsistencies in OBDA. Our general goal is both to study DL semantical frameworks that are inconsistency-tolerant, and to devise techniques for answering unions of conjunctive queries under such inconsistency-tolerant semantics. Our work is inspired by the approaches to consistent query answering in databases, which are based on the idea of living with inconsistencies in the database, but trying to obtain only consistent information during query answering, by relying on the notion of database repair. We first adapt the notion of database repair to our context, and show that, according to such a notion, inconsistency-tolerant query answering is intractable, even for very simple DLs. Therefore, we propose a different repair-based semantics, with the goal of reaching a good compromise between the expressive power of the semantics and the computational complexity of inconsistency-tolerant query answering. Indeed, we show that query answering under the new semantics is first-order rewritable in OBDA, even if the ontology is expressed in one of the most expressive members of the DL-Lite family
Circuit Complexity Meets Ontology-Based Data Access
Ontology-based data access is an approach to organizing access to a database
augmented with a logical theory. In this approach query answering proceeds
through a reformulation of a given query into a new one which can be answered
without any use of theory. Thus the problem reduces to the standard database
setting.
However, the size of the query may increase substantially during the
reformulation. In this survey we review a recently developed framework on
proving lower and upper bounds on the size of this reformulation by employing
methods and results from Boolean circuit complexity.Comment: To appear in proceedings of CSR 2015, LNCS 9139, Springe
Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors from IoT deployments are able to generate data
streams at high velocity, that include information from a variety of domains,
and accumulate to large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinctive gaps for query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present and future events. These allow
accessible analytics over data streams having properties from different
disciplines, and help span the velocity (real-time) and volume (persistent)
dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads introduced by evaluating semantic predicates and in
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems,
October 27, 201
On the Succinctness of Query Rewriting over OWL 2 QL Ontologies with Shallow Chases
We investigate the size of first-order rewritings of conjunctive queries over
OWL 2 QL ontologies of depth 1 and 2 by means of hypergraph programs computing
Boolean functions. Both positive and negative results are obtained. Conjunctive
queries over ontologies of depth 1 have polynomial-size nonrecursive datalog
rewritings; tree-shaped queries have polynomial positive existential
rewritings; however, in the worst case, positive existential rewritings can
only be of superpolynomial size. Positive existential and nonrecursive datalog
rewritings of queries over ontologies of depth 2 suffer an exponential blowup
in the worst case, while first-order rewritings are superpolynomial unless
. We also analyse rewritings of
tree-shaped queries over arbitrary ontologies and observe that the query
entailment problem for such queries is fixed-parameter tractable
Rewritability in Monadic Disjunctive Datalog, MMSNP, and Expressive Description Logics
We study rewritability of monadic disjunctive Datalog programs, (the
complements of) MMSNP sentences, and ontology-mediated queries (OMQs) based on
expressive description logics of the ALC family and on conjunctive queries. We
show that rewritability into FO and into monadic Datalog (MDLog) are decidable,
and that rewritability into Datalog is decidable when the original query
satisfies a certain condition related to equality. We establish
2NExpTime-completeness for all studied problems except rewritability into MDLog
for which there remains a gap between 2NExpTime and 3ExpTime. We also analyze
the shape of rewritings, which in the MMSNP case correspond to obstructions,
and give a new construction of canonical Datalog programs that is more
elementary than existing ones and also applies to formulas with free variables
Tractable Query Answering and Optimization for Extensions of Weakly-Sticky Datalog+-
We consider a semantic class, weakly-chase-sticky (WChS), and a syntactic
subclass, jointly-weakly-sticky (JWS), of Datalog+- programs. Both extend that
of weakly-sticky (WS) programs, which appear in our applications to data
quality. For WChS programs we propose a practical, polynomial-time query
answering algorithm (QAA). We establish that the two classes are closed under
magic-sets rewritings. As a consequence, QAA can be applied to the optimized
programs. QAA takes as inputs the program (including the query) and semantic
information about the "finiteness" of predicate positions. For the syntactic
subclasses JWS and WS of WChS, this additional information is computable.Comment: To appear in Proc. Alberto Mendelzon WS on Foundations of Data
Management (AMW15
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios such as diagnostics of turbines in Siemens. OBDA
approach has a great potential to facilitate such tasks; however, it has a
number of limitations in dealing with analytics that restrict its use in
important industrial applications. Based on our experience with Siemens, we
argue that in order to overcome those limitations OBDA should be extended and
become analytics, source, and cost aware. In this work we propose such an
extension. In particular, we propose an ontology, mapping, and query language
for OBDA, where aggregate and other analytical functions are first class
citizens. Moreover, we develop query optimisation techniques that allow to
efficiently process analytical tasks over static and streaming data. We
implement our approach in a system and evaluate our system with Siemens turbine
data
- …