Search CORE

8 research outputs found

Tracking the History and Evolution of Entities: Entity-centric Temporal Analysis of Large Social Media Archives

Author: Fafalios Pavlos
Iosifidis Vasileios
Ntoutsi Eirini
Stefanidis Kostas
Publication date: 24/10/2018

How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as old news articles or social media archives. In particular, user-generated content posted in social networks, like Twitter and Facebook, can be seen as a comprehensive documentation of our society, and thus meaningful analysis methods over such archived data are of immense value for sociologists, historians and other interested parties who want to study the history and evolution of entities and events. To this end, in this paper we propose an entity-centric approach to analyze social media archives and we define measures that allow studying how entities were reflected in social media in different time periods and under different aspects, like popularity, attitude, controversiality, and connectedness with other entities. A case study using a large Twitter archive of four years illustrates the insights that can be gained by such an entity-centric and multi-aspect analysis.

Comment: This is a preprint of an article accepted for publication in the International Journal on Digital Libraries (2018).
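To make the four aspects named in this abstract concrete, here is a minimal sketch of per-period entity measures over a tweet archive. The formulas are illustrative assumptions, not the definitions from the paper: popularity is taken as the share of a period's tweets that mention the entity, attitude as the mean sentiment of those tweets, controversiality as how evenly positive and negative mentions are split, and connectedness as the most frequently co-mentioned entities. The `Tweet` record, its fields, and the sentiment range [-1, 1] are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    period: str          # time bucket, e.g. "2015-07" (hypothetical granularity)
    entities: frozenset  # identifiers of the entities linked in the tweet
    sentiment: float     # assumed polarity score in [-1, 1]


def entity_measures(tweets, entity):
    """Hypothetical per-period popularity, attitude, controversiality, connectedness."""
    totals = Counter(t.period for t in tweets)
    mentions = defaultdict(list)
    for t in tweets:
        if entity in t.entities:
            mentions[t.period].append(t)

    result = {}
    for period in sorted(mentions):
        ts = mentions[period]
        pos = sum(1 for t in ts if t.sentiment > 0)
        neg = sum(1 for t in ts if t.sentiment < 0)
        polar = pos + neg
        co = Counter(e for t in ts for e in t.entities if e != entity)
        result[period] = {
            # share of all tweets in the period that mention the entity
            "popularity": len(ts) / totals[period],
            # mean sentiment of the tweets mentioning the entity
            "attitude": sum(t.sentiment for t in ts) / len(ts),
            # 1.0 when positive and negative mentions are balanced, 0.0 when one side has all
            "controversiality": 1 - abs(pos - neg) / polar if polar else 0.0,
            # most frequently co-mentioned entities
            "connected": co.most_common(3),
        }
    return result


# Toy archive with placeholder entity names (not data from the paper).
archive = [
    Tweet("2015-01", frozenset({"GreekPM", "PartyX"}), 0.8),
    Tweet("2015-01", frozenset({"GreekPM"}), -0.5),
    Tweet("2015-01", frozenset({"OtherLeader"}), 0.1),
    Tweet("2015-07", frozenset({"GreekPM", "Referendum"}), -0.6),
]
for period, measures in entity_measures(archive, "GreekPM").items():
    print(period, measures)
```

Bucketing by period keeps each measure a simple aggregate, so a timeline per aspect falls out directly; a real archive would additionally need entity linking and a sentiment classifier upstream.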

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

RDF data evolution: efficient detection and semantic representation of changes

Author: Daniel Mercier
Fatiha Saïs
Nathalie Pernelle
Sujeeban Thuraisamy
Publication date: 31/03/2020

Many RDF data sources change constantly at both the data and the vocabulary (ontology) level, and many integration tasks are affected by these changes. In this context, it is important to develop approaches that detect and represent these changes. Many studies have focused on the detection, representation and management of changes at the ontology level. In this paper, we present an approach that detects and represents the elementary and complex changes that can be identified when focusing only on the data level. A first experiment was conducted on different versions of DBpedia.
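A minimal sketch of data-level change detection between two RDF versions, assuming each version is already loaded as a set of (subject, predicate, object) string tuples. Elementary changes are computed as set differences (added and deleted triples); as one hypothetical example of a complex change, a deleted and an added triple sharing subject and predicate are paired as a value update. This is not the authors' change taxonomy, and the `ex:` identifiers are placeholders.

```python
from collections import defaultdict
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def elementary_changes(old: Set[Triple], new: Set[Triple]):
    """Elementary changes: triples added in `new` and triples deleted from `old`."""
    return new - old, old - new


def value_updates(added: Set[Triple], deleted: Set[Triple]) -> List[Tuple[str, str, str, str]]:
    """Hypothetical complex change: a deleted and an added triple with the same
    subject and predicate are reported together as (s, p, old_value, new_value)."""
    old_values = defaultdict(set)
    for s, p, o in deleted:
        old_values[(s, p)].add(o)
    updates = []
    for s, p, o in sorted(added):
        for previous in sorted(old_values.get((s, p), ())):
            updates.append((s, p, previous, o))
    return updates


v1 = {("ex:Paris", "ex:population", "100"), ("ex:Paris", "rdf:type", "ex:City")}
v2 = {("ex:Paris", "ex:population", "120"), ("ex:Paris", "rdf:type", "ex:City"),
      ("ex:Paris", "ex:mayor", "ex:Someone")}
added, deleted = elementary_changes(v1, v2)
print(value_updates(added, deleted))
```

Here the `ex:mayor` triple stays a plain addition, while the population triples collapse into one update, which is the kind of grouping that makes a change log easier to read than a raw triple diff.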

CiteSeerX

On Recommending Evolution Measures: A Human-aware Approach

Author: Kondylakis Haridimos
Stefanidis Kostas
Troullinou Georgia
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/11/2019

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

DELTA-R: a change detection approach for RDF datasets

Author: Brennan Rob
O’Sullivan Declan
Singh Anuj
Publication venue: CEUR-WS
Publication date: 01/06/2018

This paper presents the DELTA-R approach, which detects and classifies the changes between two versions of a linked dataset. It contributes to the state of the art in two ways: first, by proposing a more granular classification of resource-level changes, and second, by automatically selecting the appropriate resource properties to identify the same resources across different versions of a linked dataset when they have different URIs and similar representations. The paper also presents the DELTA-R change model to represent the changes detected by the DELTA-R approach. This model bridges the gap between resource-centric and triple-centric views of changes in linked datasets. As a result, a single change detection mechanism can support use cases such as interlink maintenance and dataset or replica synchronization. Additionally, the paper describes an experiment conducted to examine the accuracy of the DELTA-R approach in detecting the changes between two versions of a linked dataset. The results indicate that DELTA-R outperforms state-of-the-art approaches in accuracy by up to 4%. The proposed, more granular classification of changes helped to identify up to 1529 additional updated resources compared to X. By means of a case study, we demonstrate how the DELTA-R approach and change model support an interlink maintenance use case. The results show that 100% of the broken interlinks between DBpedia person snapshot 3.7 and Freebase were repaired.
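The resource-centric view described above can be illustrated with a simplified sketch: triples are grouped into per-resource descriptions, and resources are matched across versions either by URI or, when the URI changed, by the value of an identifying property. DELTA-R selects such properties automatically; this sketch instead takes a single hypothetical `key_prop` as an argument, and its labels (`unchanged`, `updated`, `renamed`, `deleted`, `added`) are illustrative, not the DELTA-R change model.

```python
from collections import defaultdict


def describe(triples):
    """Group (s, p, o) triples into resource descriptions {subject: frozenset of (p, o)}."""
    desc = defaultdict(set)
    for s, p, o in triples:
        desc[s].add((p, o))
    return {s: frozenset(po) for s, po in desc.items()}


def key_of(po, key_prop):
    """Values of the identifying property, or None if the resource lacks it."""
    values = sorted(o for p, o in po if p == key_prop)
    return tuple(values) if values else None


def classify(old_triples, new_triples, key_prop):
    """Resource-level change labels between two dataset versions (simplified sketch)."""
    old, new = describe(old_triples), describe(new_triples)
    # Only URIs that first appear in the new version can be the renamed form of an old resource.
    fresh = {}
    for s in sorted(set(new) - set(old)):
        k = key_of(new[s], key_prop)
        if k is not None:
            fresh.setdefault(k, s)

    changes, matched = {}, set()
    for s in sorted(old):
        if s in new:  # same URI in both versions
            changes[s] = "unchanged" if old[s] == new[s] else "updated"
            matched.add(s)
            continue
        k = key_of(old[s], key_prop)
        target = fresh.get(k) if k is not None else None
        if target is not None and target not in matched:
            same = old[s] == new[target]
            changes[s] = ("renamed:" if same else "renamed+updated:") + target
            matched.add(target)
        else:
            changes[s] = "deleted"
    for s in sorted(set(new) - matched):
        changes[s] = "added"
    return changes


old_v = {("ex:A", "ex:name", "Alice"), ("ex:A", "ex:age", "30"),
         ("ex:B", "ex:name", "Bob"), ("ex:C", "ex:name", "Carol")}
new_v = {("ex:A", "ex:name", "Alice"), ("ex:A", "ex:age", "31"),
         ("ex:B2", "ex:name", "Bob"), ("ex:D", "ex:name", "Dan")}
print(classify(old_v, new_v, "ex:name"))
```

Without the key-based match, `ex:B` would show up as one deletion plus one unrelated addition; pairing it with `ex:B2` is what lets interlinks pointing at the old URI be repaired.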

DCU Online Research Access Service

Multi-aspect Entity-Centric Analysis of Big Social Media Archives

Author: Fafalios Pavlos
Iosifidis Vasileios
Ntoutsi Eirini
Stefanidis Kostas
Publication venue: Springer Science and Business Media LLC
Publication date: 18/11/2019

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

Automated Knowledge Base Quality Assessment and Validation based on Evolution Analysis

Author: Rashid MOHAMMAD RIFAT AHMMAD
Publication venue: Politecnico di Torino

In recent years, numerous efforts have been made towards sharing Knowledge Bases (KBs) in the Linked Open Data (LOD) cloud. These KBs are used for various tasks, including data analytics and building question answering systems. Such KBs evolve continuously: their data (instances) and schemas can be updated, extended, revised and refactored. However, unlike in more controlled types of knowledge bases, the evolution of KBs exposed in the LOD cloud is usually unrestrained, which may cause the data to suffer from a variety of quality issues, at both a semantic and a pragmatic level. This situation negatively affects data stakeholders (consumers, curators, etc.). Data quality is commonly related to the perception of fitness for use for a certain application or use case. Therefore, ensuring the quality of the data of an evolving knowledge base is vital. Since data is derived from autonomous, evolving, and increasingly large data providers, manual data curation is impractical, and at the same time, continuous automatic assessment of data quality is very challenging. Ensuring the quality of a KB is a non-trivial task, since KBs are based on a combination of structured information supported by models, ontologies, and vocabularies, as well as queryable endpoints, links, and mappings. Thus, in this thesis, we explore two main areas in assessing KB quality: (i) quality assessment using KB evolution analysis, and (ii) validation using machine learning models. The evolution of a KB can be analyzed using fine-grained “change” detection at a low level or using the “dynamics” of a dataset at a high level. In this thesis, we present a novel knowledge base quality assessment approach using evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues.
The first step in building the quality assessment approach was to identify the quality characteristics. Using high-level change detection as measurement functions, this thesis presents four quality characteristics: Persistency, Historical Persistency, Consistency and Completeness. Persistency and historical persistency measures concern the degree of change and the lifespan of any entity type. Consistency and completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty Nice. However, high-level changes, being coarse-grained, cannot capture all possible quality issues. In this context, we present a validation strategy whose rationale is twofold: first, manual validation from qualitative analysis is used to identify the causes of quality issues; then, RDF data profiling information is used to generate integrity constraints. The validation approach relies on the idea of inducing RDF shapes by exploiting SHACL constraint components. In particular, this approach learns which integrity constraints can be applied to a large KB through a process of statistical analysis followed by a learning model. We illustrate the performance of our validation approach using five learning models over three sub-tasks, namely minimum cardinality, maximum cardinality, and range constraints. The quality assessment and validation techniques developed during this work are automatic and can be applied to different knowledge bases independently of the domain. Furthermore, the measures are based on simple statistical operations, which makes the solution both flexible and scalable.
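As a rough illustration of profiling consecutive releases, the sketch below computes a persistency flag per release transition and a historical persistency ratio from per-type entity counts, plus an observed (min, max) value count per predicate that could seed SHACL `sh:minCount` / `sh:maxCount` candidates. The persistency definitions here are assumptions (a count that does not drop is treated as persistent), and the cardinality profile only records observed extremes; the thesis itself learns constraints with trained models, which this sketch does not attempt.

```python
from collections import Counter, defaultdict


def persistency(counts):
    """One flag per pair of consecutive releases: 1 if the entity count of a type
    did not decrease, else 0 (assumed definition)."""
    return [1 if cur >= prev else 0 for prev, cur in zip(counts, counts[1:])]


def historical_persistency(counts):
    """Share of consecutive-release pairs without a drop (assumed definition)."""
    flags = persistency(counts)
    return sum(flags) / len(flags) if flags else 1.0


def cardinality_profile(triples, subjects):
    """Observed (min, max) number of values per predicate over the given subjects,
    i.e. naive candidates for SHACL sh:minCount / sh:maxCount."""
    per_subject = defaultdict(Counter)
    for s, p, _ in triples:
        if s in subjects:
            per_subject[s][p] += 1
    predicates = sorted({p for c in per_subject.values() for p in c})
    # A Counter returns 0 for a missing predicate, so a subject without the predicate counts as 0.
    return {p: (min(per_subject[s][p] for s in subjects),
                max(per_subject[s][p] for s in subjects)) for p in predicates}


# Hypothetical entity counts for one type across four releases.
release_counts = [100, 120, 110, 130]
print(persistency(release_counts), historical_persistency(release_counts))

kb = {("ex:a", "ex:name", "A"), ("ex:a", "ex:email", "x"),
      ("ex:a", "ex:email", "y"), ("ex:b", "ex:name", "B")}
print(cardinality_profile(kb, {"ex:a", "ex:b"}))
```

A drop between releases (the 120 to 110 transition above) is only a signal to inspect, since entities can also be removed deliberately; that is one reason the abstract pairs the measures with manual validation.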

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

An evaluation of the challenges of Multilingualism in Data Warehouse development

Author: Dedić Nedim
STANIER Clare
Publication date: 01/01/2016

In this paper we discuss Business Intelligence and define what is meant by support for multilingualism in a Business Intelligence reporting context. We identify support for multilingualism as a challenging issue with implications for data warehouse design and reporting performance. Data warehouses are a core component of most Business Intelligence systems, and the star schema is the approach most widely used to develop data warehouses and dimensional data marts. We discuss the way in which multilingualism can be supported in the star schema and find that current approaches have serious limitations, including data redundancy as well as data manipulation, performance and maintenance issues. We propose a new approach to enable the optimal application of multilingualism in Business Intelligence. The proposed approach was found to produce satisfactory results when used in a proof-of-concept environment. Future work will include testing the approach in an enterprise environment.

Crossref

STORE - Staffordshire Online Repository