85 research outputs found
Keyword Search on RDF Graphs - A Query Graph Assembly Approach
Keyword search provides ordinary users an easy-to-use interface for querying
RDF data. Given the input keywords, in this paper, we study how to assemble a
query graph that is to represent user's query intention accurately and
efficiently. Based on the input keywords, we first obtain the elementary query
graph building blocks, such as entity/class vertices and predicate edges. Then,
we formally define the query graph assembly (QGA) problem. Unfortunately, we
prove theoretically that QGA is a NP-complete problem. In order to solve that,
we design some heuristic lower bounds and propose a bipartite graph
matching-based best-first search algorithm. The algorithm's time complexity is
, where is the number of the keywords and is a
tunable parameter, i.e., the maximum number of candidate entity/class vertices
and predicate edges allowed to match each keyword. Although QGA is intractable,
both and are small in practice. Furthermore, the algorithm's time
complexity does not depend on the RDF graph size, which guarantees the good
scalability of our system in large RDF graphs. Experiments on DBpedia and
Freebase confirm the superiority of our system on both effectiveness and
efficiency
Challenges in Bridging Social Semantics and Formal Semantics on the Web
This paper describes several results of Wimmics, a research lab which names
stands for: web-instrumented man-machine interactions, communities, and
semantics. The approaches introduced here rely on graph-oriented knowledge
representation, reasoning and operationalization to model and support actors,
actions and interactions in web-based epistemic communities. The re-search
results are applied to support and foster interactions in online communities
and manage their resources
Completeness and Consistency Analysis for Evolving Knowledge Bases
Assessing the quality of an evolving knowledge base is a challenging task as
it often requires to identify correct quality assessment procedures.
Since data is often derived from autonomous, and increasingly large data
sources, it is impractical to manually curate the data, and challenging to
continuously and automatically assess their quality.
In this paper, we explore two main areas of quality assessment related to
evolving knowledge bases: (i) identification of completeness issues using
knowledge base evolution analysis, and (ii) identification of consistency
issues based on integrity constraints, such as minimum and maximum cardinality,
and range constraints.
For completeness analysis, we use data profiling information from consecutive
knowledge base releases to estimate completeness measures that allow predicting
quality issues. Then, we perform consistency checks to validate the results of
the completeness analysis using integrity constraints and learning models.
The approach has been tested both quantitatively and qualitatively by using a
subset of datasets from both DBpedia and 3cixty knowledge bases. The
performance of the approach is evaluated using precision, recall, and F1 score.
From completeness analysis, we observe a 94% precision for the English DBpedia
KB and 95% precision for the 3cixty Nice KB. We also assessed the performance
of our consistency analysis by using five learning models over three sub-tasks,
namely minimum cardinality, maximum cardinality, and range constraint. We
observed that the best performing model in our experimental setup is the Random
Forest, reaching an F1 score greater than 90% for minimum and maximum
cardinality and 84% for range constraints.Comment: Accepted for Journal of Web Semantic
Semantic Systems. In the Era of Knowledge Graphs
This open access book constitutes the refereed proceedings of the 16th International Conference on Semantic Systems, SEMANTiCS 2020, held in Amsterdam, The Netherlands, in September 2020. The conference was held virtually due to the COVID-19 pandemic
Automated Knowledge Base Quality Assessment and Validation based on Evolution Analysis
In recent years, numerous efforts have been put towards sharing Knowledge Bases (KB) in the Linked Open Data (LOD) cloud. These KBs are being used for various tasks, including performing data analytics or building question answering systems. Such KBs evolve continuously: their data (instances) and schemas can be updated, extended, revised and refactored. However, unlike in more controlled types of knowledge bases, the evolution of KBs exposed in the LOD cloud is usually unrestrained, what may cause data to suffer from a variety of quality issues, both at a semantic level and at a pragmatic level. This situation affects negatively data stakeholders – consumers, curators, etc. –. Data quality is commonly related to the perception of the fitness for use, for a certain application or use case. Therefore, ensuring the quality of the data of a knowledge base that evolves is vital. Since data is derived from autonomous, evolving, and increasingly large data providers, it is impractical to do manual data curation, and at the same time, it is very challenging to do a continuous automatic assessment of data quality. Ensuring the quality of a KB is a non-trivial task since they are based on a combination of structured information supported by models, ontologies, and vocabularies, as well as queryable endpoints, links, and mappings. Thus, in this thesis, we explored two main areas in assessing KB quality: (i) quality assessment using KB evolution analysis, and (ii) validation using machine learning models. The evolution of a KB can be analyzed using fine-grained “change” detection at low-level or using “dynamics” of a dataset at high-level. In this thesis, we present a novel knowledge base quality assessment approach using evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. However, the first step in building the quality assessment approach was to identify the quality characteristics. Using high-level change detection as measurement functions, in this thesis we present four quality characteristics: Persistency, Historical Persistency, Consistency and Completeness. Persistency and historical persistency measures concern the degree of changes and lifespan of any entity type. Consistency and completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases, eleven releases of DBpedia and eight releases of 3cixty Nice. However, high-level changes, being coarse-grained, cannot capture all possible quality issues. In this context, we present a validation strategy whose rationale is twofold. First, using manual validation from qualitative analysis to identify causes of quality issues. Then, use RDF data profiling information to generate integrity constraints. The validation approach relies on the idea of inducing RDF shape by exploiting SHALL constraint components. In particular, this approach will learn, what are the integrity constraints that can be applied to a large KB by instructing a process of statistical analysis, which is followed by a learning model. We illustrate the performance of our validation approach by using five learning models over three sub-tasks, namely minimum cardinality, maximum cardinality, and range constraint. The techniques of quality assessment and validation developed during this work are automatic and can be applied to different knowledge bases independently of the domain. Furthermore, the measures are based on simple statistical operations that make the solution both flexible and scalable
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
- …