1,758 research outputs found
Learning-based SPARQL query performance modeling and prediction
One of the challenges of managing an RDF database is predicting performance of SPARQL queries before they are executed. Performance characteristics, such as the execution time and memory usage, can help data consumers identify unexpected long-running queries before they start and estimate the system workload for query scheduling. Extensive works address such performance prediction problem in traditional SQL queries but they are not directly applicable to SPARQL queries. In this paper, we adopt machine learning techniques to predict the performance of SPARQL queries. Our work focuses on modeling features of a SPARQL query to a vector representation. Our feature modeling method does not depend on the knowledge of underlying systems and the structure of the underlying data, but only on the nature of SPARQL queries. Then we use these features to train prediction models. We propose a two-step prediction process and consider performances in both cold and warm stages. Evaluations are performed on real world SPRAQL queries, whose execution time ranges from milliseconds to hours. The results demonstrate that the proposed approach can effectively predict SPARQL query performance and outperforms state-of-the-art approaches
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines themost essential datamining entities in a three-layered
ontological structure comprising of a specification, an implementation and an application
layer. It provides a representational framework for the description of mining
structured data, and in addition provides taxonomies of datasets, data mining tasks,
generalizations, data mining algorithms and constraints, based on the type of data.
OntoDM-core is designed to support a wide range of applications/use cases, such as
semantic annotation of data mining algorithms, datasets and results; annotation of
QSAR studies in the context of drug discovery investigations; and disambiguation of
terms in text mining. The ontology has been thoroughly assessed following the practices
in ontology engineering, is fully interoperable with many domain resources and
is easy to extend
SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting
Learning knowledge graph (KG) embeddings is an emerging technique for a
variety of downstream tasks such as summarization, link prediction, information
retrieval, and question answering. However, most existing KG embedding models
neglect space and, therefore, do not perform well when applied to (geo)spatial
data and tasks. For those models that consider space, most of them primarily
rely on some notions of distance. These models suffer from higher computational
complexity during training while still losing information beyond the relative
distance between entities. In this work, we propose a location-aware KG
embedding model called SE-KGE. It directly encodes spatial information such as
point coordinates or bounding boxes of geographic entities into the KG
embedding space. The resulting model is capable of handling different types of
spatial reasoning. We also construct a geographic knowledge graph as well as a
set of geographic query-answer pairs called DBGeo to evaluate the performance
of SE-KGE in comparison to multiple baselines. Evaluation results show that
SE-KGE outperforms these baselines on the DBGeo dataset for geographic logic
query answering task. This demonstrates the effectiveness of our
spatially-explicit model and the importance of considering the scale of
different geographic entities. Finally, we introduce a novel downstream task
called spatial semantic lifting which links an arbitrary location in the study
area to entities in the KG via some relations. Evaluation on DBGeo shows that
our model outperforms the baseline by a substantial margin.Comment: Accepted to Transactions in GI
Correcting Knowledge Base Assertions
The usefulness and usability of knowledge bases (KBs) is often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions, and present a general correction framework which combines lexical matching, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
Joint Video and Text Parsing for Understanding Events and Answering Queries
We propose a framework for parsing video and text jointly for understanding
events and answering user queries. Our framework produces a parse graph that
represents the compositional structures of spatial information (objects and
scenes), temporal information (actions and events) and causal information
(causalities between events and fluents) in the video and text. The knowledge
representation of our framework is based on a spatial-temporal-causal And-Or
graph (S/T/C-AOG), which jointly models possible hierarchical compositions of
objects, scenes and events as well as their interactions and mutual contexts,
and specifies the prior probabilistic distribution of the parse graphs. We
present a probabilistic generative model for joint parsing that captures the
relations between the input video/text, their corresponding parse graphs and
the joint parse graph. Based on the probabilistic model, we propose a joint
parsing system consisting of three modules: video parsing, text parsing and
joint inference. Video parsing and text parsing produce two parse graphs from
the input video and text respectively. The joint inference module produces a
joint parse graph by performing matching, deduction and revision on the video
and text parse graphs. The proposed framework has the following objectives:
Firstly, we aim at deep semantic parsing of video and text that goes beyond the
traditional bag-of-words approaches; Secondly, we perform parsing and reasoning
across the spatial, temporal and causal dimensions based on the joint S/T/C-AOG
representation; Thirdly, we show that deep joint parsing facilitates subsequent
applications such as generating narrative text descriptions and answering
queries in the forms of who, what, when, where and why. We empirically
evaluated our system based on comparison against ground-truth as well as
accuracy of query answering and obtained satisfactory results
Documenting Knowledge Graph Embedding and Link Prediction using Knowledge Graphs
In recent years, sub-symbolic learning, i.e., Knowledge Graph Embedding (KGE) incorporated with Knowledge Graphs (KGs) has gained significant attention in various downstream tasks (e.g., Link Prediction (LP)). These techniques learn a latent vector representation of KG's semantical structure to infer missing links. Nonetheless, the KGE models remain a black box, and the decision-making process behind them is not clear. Thus, the trustability and reliability of the model's outcomes have been challenged. While many state-of-the-art approaches provide data-driven frameworks to address these issues, they do not always provide a complete understanding, and the interpretations are not machine-readable. That is why, in this work, we extend a hybrid interpretable framework, InterpretME, in the field of the KGE models, especially for translation distance models, which include TransE, TransH, TransR, and TransD. The experimental evaluation on various benchmark KGs supports the validity of this approach, which we term Trace KGE. Trace KGE, in particular, contributes to increased interpretability and understanding of the perplexing KGE model's behavior
- …