4,917 research outputs found
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios such as diagnostics of turbines in Siemens. OBDA
approach has a great potential to facilitate such tasks; however, it has a
number of limitations in dealing with analytics that restrict its use in
important industrial applications. Based on our experience with Siemens, we
argue that in order to overcome those limitations OBDA should be extended and
become analytics, source, and cost aware. In this work we propose such an
extension. In particular, we propose an ontology, mapping, and query language
for OBDA, where aggregate and other analytical functions are first class
citizens. Moreover, we develop query optimisation techniques that allow to
efficiently process analytical tasks over static and streaming data. We
implement our approach in a system and evaluate our system with Siemens turbine
data
Mapping-equivalence and oid-equivalence of single-function object-creating conjunctive queries
Conjunctive database queries have been extended with a mechanism for object
creation to capture important applications such as data exchange, data
integration, and ontology-based data access. Object creation generates new
object identifiers in the result, that do not belong to the set of constants in
the source database. The new object identifiers can be also seen as Skolem
terms. Hence, object-creating conjunctive queries can also be regarded as
restricted second-order tuple-generating dependencies (SO tgds), considered in
the data exchange literature.
In this paper, we focus on the class of single-function object-creating
conjunctive queries, or sifo CQs for short. We give a new characterization for
oid-equivalence of sifo CQs that is simpler than the one given by Hull and
Yoshikawa and places the problem in the complexity class NP. Our
characterization is based on Cohen's equivalence notions for conjunctive
queries with multiplicities. We also solve the logical entailment problem for
sifo CQs, showing that also this problem belongs to NP. Results by Pichler et
al. have shown that logical equivalence for more general classes of SO tgds is
either undecidable or decidable with as yet unknown complexity upper bounds.Comment: This revised version has been accepted on 11 January 2016 for
publication in The VLDB Journa
Worst-case Optimal Query Answering for Greedy Sets of Existential Rules and Their Subclasses
The need for an ontological layer on top of data, associated with advanced
reasoning mechanisms able to exploit the semantics encoded in ontologies, has
been acknowledged both in the database and knowledge representation
communities. We focus in this paper on the ontological query answering problem,
which consists of querying data while taking ontological knowledge into
account. More specifically, we establish complexities of the conjunctive query
entailment problem for classes of existential rules (also called
tuple-generating dependencies, Datalog+/- rules, or forall-exists-rules. Our
contribution is twofold. First, we introduce the class of greedy
bounded-treewidth sets (gbts) of rules, which covers guarded rules, and their
most well-known generalizations. We provide a generic algorithm for query
entailment under gbts, which is worst-case optimal for combined complexity with
or without bounded predicate arity, as well as for data complexity and query
complexity. Secondly, we classify several gbts classes, whose complexity was
unknown, with respect to combined complexity (with both unbounded and bounded
predicate arity) and data complexity to obtain a comprehensive picture of the
complexity of existential rule fragments that are based on diverse guardedness
notions. Upper bounds are provided by showing that the proposed algorithm is
optimal for all of them
Multi modal multi-semantic image retrieval
PhDThe rapid growth in the volume of visual information, e.g. image, and video can
overwhelm users’ ability to find and access the specific visual information of interest
to them. In recent years, ontology knowledge-based (KB) image information retrieval
techniques have been adopted into in order to attempt to extract knowledge from these
images, enhancing the retrieval performance. A KB framework is presented to
promote semi-automatic annotation and semantic image retrieval using multimodal
cues (visual features and text captions). In addition, a hierarchical structure for the KB
allows metadata to be shared that supports multi-semantics (polysemy) for concepts.
The framework builds up an effective knowledge base pertaining to a domain specific
image collection, e.g. sports, and is able to disambiguate and assign high level
semantics to ‘unannotated’ images.
Local feature analysis of visual content, namely using Scale Invariant Feature
Transform (SIFT) descriptors, have been deployed in the ‘Bag of Visual Words’
model (BVW) as an effective method to represent visual content information and to
enhance its classification and retrieval. Local features are more useful than global
features, e.g. colour, shape or texture, as they are invariant to image scale, orientation
and camera angle. An innovative approach is proposed for the representation,
annotation and retrieval of visual content using a hybrid technique based upon the use
of an unstructured visual word and upon a (structured) hierarchical ontology KB
model. The structural model facilitates the disambiguation of unstructured visual
words and a more effective classification of visual content, compared to a vector
space model, through exploiting local conceptual structures and their relationships.
The key contributions of this framework in using local features for image
representation include: first, a method to generate visual words using the semantic
local adaptive clustering (SLAC) algorithm which takes term weight and spatial
locations of keypoints into account. Consequently, the semantic information is
preserved. Second a technique is used to detect the domain specific ‘non-informative
visual words’ which are ineffective at representing the content of visual data and
degrade its categorisation ability. Third, a method to combine an ontology model with
xi
a visual word model to resolve synonym (visual heterogeneity) and polysemy
problems, is proposed. The experimental results show that this approach can discover
semantically meaningful visual content descriptions and recognise specific events,
e.g., sports events, depicted in images efficiently.
Since discovering the semantics of an image is an extremely challenging problem, one
promising approach to enhance visual content interpretation is to use any associated
textual information that accompanies an image, as a cue to predict the meaning of an
image, by transforming this textual information into a structured annotation for an
image e.g. using XML, RDF, OWL or MPEG-7. Although, text and image are distinct
types of information representation and modality, there are some strong, invariant,
implicit, connections between images and any accompanying text information.
Semantic analysis of image captions can be used by image retrieval systems to
retrieve selected images more precisely. To do this, a Natural Language Processing
(NLP) is exploited firstly in order to extract concepts from image captions. Next, an
ontology-based knowledge model is deployed in order to resolve natural language
ambiguities. To deal with the accompanying text information, two methods to extract
knowledge from textual information have been proposed. First, metadata can be
extracted automatically from text captions and restructured with respect to a semantic
model. Second, the use of LSI in relation to a domain-specific ontology-based
knowledge model enables the combined framework to tolerate ambiguities and
variations (incompleteness) of metadata. The use of the ontology-based knowledge
model allows the system to find indirectly relevant concepts in image captions and
thus leverage these to represent the semantics of images at a higher level.
Experimental results show that the proposed framework significantly enhances image
retrieval and leads to narrowing of the semantic gap between lower level machinederived
and higher level human-understandable conceptualisation
The Requirements for Ontologies in Medical Data Integration: A Case Study
Evidence-based medicine is critically dependent on three sources of
information: a medical knowledge base, the patients medical record and
knowledge of available resources, including where appropriate, clinical
protocols. Patient data is often scattered in a variety of databases and may,
in a distributed model, be held across several disparate repositories.
Consequently addressing the needs of an evidence-based medicine community
presents issues of biomedical data integration, clinical interpretation and
knowledge management. This paper outlines how the Health-e-Child project has
approached the challenge of requirements specification for (bio-) medical data
integration, from the level of cellular data, through disease to that of
patient and population. The approach is illuminated through the requirements
elicitation and analysis of Juvenile Idiopathic Arthritis (JIA), one of three
diseases being studied in the EC-funded Health-e-Child project.Comment: 6 pages, 1 figure. Presented at the 11th International Database
Engineering & Applications Symposium (Ideas2007). Banff, Canada September
200
On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used for both deriving
the topics that run through the system as well as their clustering. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that inferred topics
through performing topic analysis on the contextual representations are more
meaningful compared to the plain representation of the documents. The proposed
approach in introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization and
topic analysis
Special issue on bio-ontologies and phenotypes
The bio-ontologies and phenotypes special issue includes eight papers selected from the 11 papers presented at the Bio-Ontologies SIG (Special Interest Group) and the Phenotype Day at ISMB (Intelligent Systems for Molecular Biology) conference in Boston in 2014. The selected papers span a wide range of topics including the automated re-use and update of ontologies, quality assessment of ontological resources, and the systematic description of phenotype variation, driven by manual, semi- and fully automatic means
- …