Co-evolution of RDF Datasets
Linked Data initiatives have fostered the publication of a large number of RDF
datasets in the Linked Open Data (LOD) cloud, as well as the development of
query processing infrastructures to access these data in a federated fashion.
However, several experimental studies have shown that the availability of LOD
datasets cannot always be ensured, making RDF data replication necessary for
building reliable federated query frameworks. Although it enhances data
availability, RDF data replication requires synchronization and conflict
resolution when replicas and source datasets are allowed to change data over
time, i.e., co-evolution management needs to be provided to ensure consistency.
In this paper, we tackle the problem of RDF data co-evolution and devise an
approach for conflict resolution during the co-evolution of RDF datasets. Our
proposed approach is property-oriented and allows for exploiting semantics
about RDF properties during co-evolution management. The quality of our
approach is empirically evaluated in different scenarios on the DBpedia-live
dataset. Experimental results suggest that the proposed techniques have a
positive impact on the quality of data in both source datasets and replicas.
Comment: 18 pages, 4 figures, Accepted in ICWE, 201
31st International Conference on Very Large Data Bases
A report on the 31st International Conference on Very Large Data Bases
A Probabilistic Data Fusion Modeling Approach for Extracting True Values from Uncertain and Conflicting Attributes
Real-world data obtained from integrating heterogeneous data sources are often multi-valued, uncertain, imprecise, error-prone, outdated, and have different degrees of accuracy and correctness. It is critical to resolve data uncertainty and conflicts to present quality data that reflect actual world values. This task is called data fusion. In this paper, we deal with the problem of data fusion based on probabilistic entity linkage and uncertainty management in conflicting data. Data fusion has been widely explored in the research community. However, concerns such as explicit uncertainty management and on-demand data fusion, which can cope with dynamic data sources, have not been studied well. This paper proposes a new probabilistic data fusion modeling approach that attempts to find true data values under conditions of uncertain or conflicting multi-valued attributes. These attributes are generated from the probabilistic linkage and merging alternatives of multi-corresponding entities. Consequently, the paper identifies and formulates several data fusion cases and sample spaces that require further conditional computation using our computational fusion method. The identification is established to fit a real-world data fusion problem. In the real world, there is always the possibility of heterogeneous data sources, the integration of probabilistic entities, single or multiple truth values for certain attributes, and different combinations of attribute values as alternatives for each generated entity. We validate our probabilistic data fusion approach through mathematical representation based on three data sources with different reliability scores. The validity of the approach was assessed via implementation in our probabilistic integration system to show how it can manage and resolve different cases of data conflicts and inconsistencies. The outcome showed improved accuracy in identifying true values due to the association of constructive evidence.
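The core selection step can be sketched with a simple reliability-weighted vote over conflicting attribute values; this assumes a plain weighted-vote model and invented reliability scores, and is far simpler than the paper's actual probabilistic formulation.

```python
# Hedged sketch: choosing the most probable value for a conflicting
# multi-valued attribute by weighting each observation with its
# source's reliability score. Scores and value names are illustrative.
from collections import defaultdict

def fuse(observations, reliability):
    """observations: list of (source, value) pairs.
    reliability: source -> weight in (0, 1].
    Returns (most probable value, its normalized probability)."""
    scores = defaultdict(float)
    for src, val in observations:
        scores[val] += reliability[src]
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total
```

With three sources of differing reliability, a value asserted by two weaker sources can still outweigh one asserted by a single stronger source, which is the intuition behind fusing evidence rather than trusting any single source.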
Automated conflict resolution in collaborative data sharing systems using community feedbacks
In collaborative data sharing systems (CDSS), groups of users usually work on disparate schemas and database instances, and agree to share the related data among them (periodically). Each group can extend, curate, and revise its own database instance in a disconnected mode. At some later point, the group can publish its updates to other groups and receive updates from them (if any). The reconciliation operation in the CDSS engine is responsible for propagating updates and handling any data disagreements between the different groups. If a conflict is found, the involved updates are temporarily rejected and marked as deferred. Deferred updates are not accepted by the reconciliation operation until a user resolves the conflict manually. In this paper, we propose an automated conflict resolution approach that depends on community feedback to handle the conflicts that may arise in collaborative data sharing communities with potentially disparate schemas and data instances. The experimental results show that extending the CDSS with our proposed approach can resolve such conflicts in an accurate and efficient manner.
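The feedback-driven resolution of a deferred update can be illustrated with a minimal sketch; the boolean vote format and the 0.5 acceptance threshold are assumptions made for illustration, not details from the paper.

```python
# Illustrative sketch: deciding the fate of a deferred update from
# community feedback. Vote format and threshold are assumptions.
def resolve_deferred(votes, threshold=0.5):
    """votes: list of booleans (True = accept, False = reject).
    Accept the deferred update when the accept ratio exceeds threshold;
    with no feedback yet, the update stays deferred."""
    if not votes:
        return "deferred"
    ratio = sum(votes) / len(votes)
    return "accepted" if ratio > threshold else "rejected"
```

The point of such a scheme is that the reconciliation engine no longer blocks on a single user resolving each conflict manually; accumulated community feedback decides instead.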
An Ontology-Oriented Architecture for Dealing With Heterogeneous Data Applied to Telemedicine Systems
Current trends in medicine regarding issues of accessibility to and the quantity and quality of information and quality of service are very different compared to former decades. The current state requires new methods for addressing the challenge of dealing with enormous amounts of data present and growing on the Web and other heterogeneous data sources such as sensors and social networks and unstructured data, normally referred to as big data. Traditional approaches are not enough, at least on their own, although they were frequently used in hybrid architectures in the past. In this paper, we propose an architecture to process big data, including heterogeneous sources of information. We have defined an ontology-oriented architecture, where a core ontology has been used as a knowledge base and allows data integration of different heterogeneous sources. We have used natural language processing and artificial intelligence methods to process and mine data in the health sector to uncover the knowledge hidden in diverse data sources. Our approach has been applied to the field of personalized medicine (study, diagnosis, and treatment of diseases customized for each patient) and it has been used in a telemedicine system. A case study focused on diabetes is presented to prove the validity of the proposed model.
This work was supported in part by the Spanish Ministry of Economy and Competitiveness (MINECO) under Project SEQUOIA-UA (TIN2015-63502-C3-3-R) and Project RESCATA (TIN2015-65100-R) and in part by the Spanish Research Agency (AEI) and the European Regional Development Fund (FEDER) under Project CloudDriver4Industry (TIN2017-89266-R).
Data quality evaluation through data quality rules and data provenance.
The application and exploitation of large amounts of data play an ever-increasing role in today’s research, government, and economy. Data understanding and decision making heavily rely on high quality data; therefore, in many different contexts, it is important to assess the quality of a dataset in order to determine if it is suitable to be used for a specific purpose. Moreover, as the access to and the exchange of datasets have become easier and more frequent, and as scientists increasingly use the World Wide Web to share scientific data, there is a growing need to know the provenance of a dataset (i.e., information about the processes and data sources that lead to its creation) in order to evaluate its trustworthiness. In this work, data quality rules and data provenance are used to evaluate the quality of datasets.
Concerning the first topic, the applied solution consists of identifying types of data constraints that can be useful as data quality rules and of developing a software tool to evaluate a dataset on the basis of a set of rules expressed in the XML markup language. We selected some of the data constraints and dependencies already considered in the data quality field, but we also used order dependencies and existence constraints as quality rules. In addition, we developed some algorithms to discover the types of dependencies used in the tool. To deal with the provenance of data, the Open Provenance Model (OPM) was adopted, an experimental query language for querying OPM graphs stored in a relational database was implemented, and an approach to designing OPM graphs was proposed.
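Evaluating a dataset against XML-expressed quality rules can be sketched as follows; the `<existence column="..."/>` rule vocabulary is a hypothetical example, not the actual schema used by the tool described above.

```python
# Minimal sketch: checking tabular rows against quality rules written
# in XML. The rule vocabulary below is an invented example.
import xml.etree.ElementTree as ET

RULES_XML = """
<rules>
  <existence column="id"/>
  <existence column="email"/>
</rules>
"""

def check(rows, rules_xml):
    """rows: list of dicts (column -> value).
    Returns a list of (rule, row_index) violations."""
    root = ET.fromstring(rules_xml)
    violations = []
    for rule in root.findall("existence"):
        col = rule.get("column")
        for i, row in enumerate(rows):
            # An existence constraint fails on a missing or empty value.
            if not row.get(col):
                violations.append((f"existence:{col}", i))
    return violations
```

Keeping the rules in a declarative XML document, as the work describes, lets the same checking engine be reused across datasets by swapping rule files rather than code.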
Automatic Data Fusion with HumMer
Heterogeneous and dirty data is abundant. It is stored