
    Knowledge Base Evolution Analysis: A Case Study in the Tourism Domain

    Stakeholders in the tourism domain (curators, consumers, etc.) routinely need to combine and compare statistical indicators about tourism. In this context, various Knowledge Bases (KBs) have been designed and published in the Linked Open Data (LOD) cloud to support decision-making processes in the tourism domain. Such KBs evolve over time: their data (instances) and schemas can be updated, extended, revised, and refactored. However, unlike in more controlled types of knowledge bases, the evolution of KBs exposed in the LOD cloud is usually unrestrained, which may cause the data to suffer from a variety of issues. This paper addresses the impact of KB evolution in the tourism domain by showing how entities evolve over time in the 3cixty KB. We show that comparing multiple versions of the KB over time can help to uncover inconsistencies in the data collection process.
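
    A minimal sketch of the kind of cross-release comparison the paper describes: count the instances of one entity type in two KB releases and flag a drop as a potential data-collection problem. The release file names and the lode:Event class are illustrative assumptions, not artifacts from the paper.

        # Sketch: compare per-type entity counts across two KB releases.
        from rdflib import Graph, RDF, URIRef

        EVENT = URIRef("http://linkedevents.org/ontology/Event")  # assumed 3cixty event type

        def count_instances(path, cls):
            g = Graph()
            g.parse(path, format="turtle")  # hypothetical release dump
            return sum(1 for _ in g.subjects(RDF.type, cls))

        old = count_instances("3cixty-release-1.ttl", EVENT)
        new = count_instances("3cixty-release-2.ttl", EVENT)
        # A drop between consecutive releases flags a potential problem in
        # the collection pipeline rather than genuine decay of the domain.
        print(f"lode:Event: {old} -> {new} ({'suspicious drop' if new < old else 'ok'})")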

    Completeness and Consistency Analysis for Evolving Knowledge Bases

    Assessing the quality of an evolving knowledge base is a challenging task, as it often requires identifying correct quality assessment procedures. Since data is often derived from autonomous and increasingly large data sources, it is impractical to curate the data manually, and challenging to assess its quality continuously and automatically. In this paper, we explore two main areas of quality assessment for evolving knowledge bases: (i) identification of completeness issues using knowledge base evolution analysis, and (ii) identification of consistency issues based on integrity constraints, such as minimum and maximum cardinality and range constraints. For completeness analysis, we use data profiling information from consecutive knowledge base releases to estimate completeness measures that allow predicting quality issues. We then perform consistency checks to validate the results of the completeness analysis using integrity constraints and learning models. The approach has been tested both quantitatively and qualitatively on a subset of datasets from the DBpedia and 3cixty knowledge bases. Its performance is evaluated using precision, recall, and F1 score. From the completeness analysis, we observe 94% precision for the English DBpedia KB and 95% precision for the 3cixty Nice KB. We also assessed the performance of our consistency analysis using five learning models over three sub-tasks, namely minimum cardinality, maximum cardinality, and range constraints. The best-performing model in our experimental setup is the Random Forest, reaching an F1 score greater than 90% for minimum and maximum cardinality and 84% for range constraints. (Accepted for the Journal of Web Semantics)
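
    A hedged sketch of the completeness idea described in the abstract: profile how often each property is used in consecutive releases and flag properties whose frequency shrinks, which the authors use to predict completeness issues. The property names and counts below are illustrative, not the paper's data.

        # Sketch: flag properties that lose triples between two releases.
        def completeness_flags(profile_prev, profile_curr):
            """Each profile maps a property IRI to the number of triples using it."""
            ratios = {}
            for prop, curr in profile_curr.items():
                prev = profile_prev.get(prop, 0)
                # Ratio < 1 means the property lost triples between releases.
                ratios[prop] = curr / prev if prev else float("inf")
            return {p: r for p, r in ratios.items() if r < 1.0}

        prev = {"dbo:birthPlace": 1_200_000, "dbo:birthDate": 900_000}  # illustrative
        curr = {"dbo:birthPlace": 1_150_000, "dbo:birthDate": 950_000}
        print(completeness_flags(prev, curr))  # {'dbo:birthPlace': 0.958...}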

    Analyzing the Evolution of Vocabulary Terms and Their Impact on the LOD Cloud

    Vocabularies are used for modeling data in Knowledge Graphs (KGs) such as the Linked Open Data Cloud and Wikidata. During their lifetime, vocabularies are subject to changes: new terms are coined, while existing terms are modified or deprecated. We first quantify the amount and frequency of changes in vocabularies. Subsequently, we investigate to what extent, and when, the changes are adopted in the evolution of KGs. We conduct our experiments on three large-scale KGs: the Billion Triple Challenge datasets, the Dynamic Linked Data Observatory dataset, and Wikidata. Our results show that the change frequency of terms is rather low, but changes can have high impact due to the large amount of distributed graph data on the web. Furthermore, not all coined terms are used, and most of the deprecated terms are still used by data publishers. The adoption time of terms from different vocabularies ranges from very fast (a few days) to very slow (a few years). Surprisingly, we could observe some adoptions before the vocabulary changes were published. Understanding the evolution of vocabulary terms is important to avoid wrong assumptions about the modeling status of data published on the web, which may result in difficulties when querying the data from distributed sources.
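
    An illustrative sketch (not the authors' pipeline) of how an adoption lag like the one measured in the paper can be computed: given the date a vocabulary change was published and dated KG snapshots in which the term does or does not appear, take the first sighting. The dates below are toy assumptions; a negative lag corresponds to the surprising "adoption before publication" cases the paper reports.

        # Sketch: adoption lag of a vocabulary term across dated snapshots.
        from datetime import date

        def adoption_lag(published, usages):
            """usages maps snapshot date -> whether the term appears in that snapshot."""
            first_seen = min((d for d, used in usages.items() if used), default=None)
            if first_seen is None:
                return None  # coined but never adopted, as observed for some terms
            return (first_seen - published).days  # negative => adopted before publication

        usages = {date(2015, 1, 10): False, date(2015, 4, 12): True, date(2015, 7, 3): True}
        print(adoption_lag(date(2015, 2, 1), usages))  # 70 days in this toy example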

    Automated Knowledge Base Quality Assessment and Validation based on Evolution Analysis

    In recent years, numerous efforts have been put towards sharing Knowledge Bases (KBs) in the Linked Open Data (LOD) cloud. These KBs are used for various tasks, including performing data analytics and building question answering systems. Such KBs evolve continuously: their data (instances) and schemas can be updated, extended, revised, and refactored. However, unlike in more controlled types of knowledge bases, the evolution of KBs exposed in the LOD cloud is usually unrestrained, which may cause the data to suffer from a variety of quality issues, both at the semantic and at the pragmatic level. This situation negatively affects data stakeholders (consumers, curators, etc.). Data quality is commonly understood as fitness for use for a certain application or use case, so ensuring the quality of an evolving knowledge base is vital. Since data is derived from autonomous, evolving, and increasingly large data providers, manual data curation is impractical, and continuous automatic assessment of data quality is very challenging. Ensuring the quality of a KB is also a non-trivial task, since KBs combine structured information supported by models, ontologies, and vocabularies with queryable endpoints, links, and mappings.
    Thus, in this thesis we explore two main areas in assessing KB quality: (i) quality assessment using KB evolution analysis, and (ii) validation using machine learning models. The evolution of a KB can be analyzed using fine-grained "change" detection at a low level or using the "dynamics" of a dataset at a high level. We present a novel knowledge base quality assessment approach based on evolution analysis: it applies data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. The first step in building the approach was to identify the quality characteristics. Using high-level change detection as measurement functions, we define four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness. The Persistency and Historical Persistency measures concern the degree of change and the lifespan of any entity type, while the Consistency and Completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty Nice.
    However, high-level changes, being coarse-grained, cannot capture all possible quality issues. In this context, we present a validation strategy whose rationale is twofold: first, use manual validation from the qualitative analysis to identify the causes of quality issues; then, use RDF data profiling information to generate integrity constraints. The validation approach relies on inducing RDF shapes by exploiting SHACL constraint components. In particular, it learns which integrity constraints can be applied to a large KB by running a statistical analysis followed by a learning model. We illustrate the performance of our validation approach using five learning models over three sub-tasks, namely minimum cardinality, maximum cardinality, and range constraints.
    The quality assessment and validation techniques developed in this work are automatic and can be applied to different knowledge bases independently of the domain. Furthermore, the measures are based on simple statistical operations, which makes the solution both flexible and scalable.
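
    A hedged sketch of one plausible reading of the Persistency measures named above: Persistency is 1 when an entity type's instance count does not drop between consecutive releases, and Historical Persistency averages that signal over the release series. The exact formulas in the thesis may differ; the counts below are illustrative.

        # Sketch: Persistency over a sequence of release-level instance counts.
        def persistency(counts):
            """counts: instance counts of one entity type, ordered by release."""
            return [1 if curr >= prev else 0 for prev, curr in zip(counts, counts[1:])]

        def historical_persistency(counts):
            p = persistency(counts)
            return sum(p) / len(p) if p else 1.0

        releases = [10_500, 11_200, 9_800, 12_000]  # toy series, not DBpedia data
        print(persistency(releases))             # [1, 0, 1] -- the drop breaks persistency
        print(historical_persistency(releases))  # 0.666...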

    Collaborative ontology evolution and data quality: an empirical analysis

    For more than a decade, theoretical research on ontology evolution has been published, and several frameworks for managing ontology changes have been proposed. However, there are fewer studies that analyze widely used, collaboratively developed ontologies to understand community-driven ontology evolution in practice. In this paper, we perform an empirical analysis of how four well-known ontologies (DBpedia, Schema.org, PROV-O, and FOAF) have evolved through their lifetime, together with an analysis of the data quality issues caused by some of the ontology changes. To that end, the paper discusses the composition of the communities that developed these ontologies and the ontology development processes they followed. Further, the paper analyzes the changes across the 53 versions of these ontologies examined in this study. Depending on the use case, the community involved, and other factors, different approaches to the ontology development and evolution process are used (e.g., a bottom-up approach with high automation or a top-down approach with extensive manual curation). This paper concludes that one model for managing changes does not fit all. Furthermore, none of the selected ontologies follow the theoretical frameworks found in the literature. Nevertheless, in communities where industrial participants are dominant, more rigorous editorial processes are followed, largely influenced by software development tools and processes. Based on the analysis, the most common quality problems caused by ontology changes are the use of abandoned classes and properties in data and the introduction of duplicate classes and properties.
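
    A minimal sketch of the "abandoned terms" check the paper reports as a common quality problem: find classes used in instance data that the latest ontology version no longer defines, or explicitly marks as deprecated. The file names are assumptions for illustration.

        # Sketch: detect abandoned or deprecated classes still used in data.
        from rdflib import Graph, RDF, RDFS, OWL

        ontology, data = Graph(), Graph()
        ontology.parse("ontology-latest.ttl", format="turtle")  # hypothetical files
        data.parse("instance-data.ttl", format="turtle")

        defined = set(ontology.subjects(RDF.type, OWL.Class)) | \
                  set(ontology.subjects(RDF.type, RDFS.Class))
        deprecated = set(ontology.subjects(OWL.deprecated, None))

        used = set(data.objects(None, RDF.type))
        abandoned = {c for c in used if c not in defined or c in deprecated}
        for c in sorted(abandoned):
            print("abandoned or deprecated class still in use:", c)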