
67 research outputs found

A vocabulary-independent generation framework for DBpedia and beyond

Author: De Meester Ben
Dimou Anastasia
Hellmann S.
Kontokostas D.
Lehmann J.
Mannens Erik
Maroy Wouter
Verborgh Ruben
Publication date: 01/01/2017

The DBpedia Extraction Framework, the generation framework behind one of the Linked Open Data cloud’s central hubs, has limitations that lead to quality issues in the DBpedia dataset. We therefore provide a new take on the Extraction Framework that yields a sustainable, general-purpose Linked Data generation framework by adopting a semantic-driven approach. The proposed approach declaratively decouples the execution of extraction, transformation, and mapping rules. Among other benefits, this supports interchanging different schema annotations, whereas the current DBpedia Extraction Framework is coupled to a single ontology and can only generate a dataset with one semantic representation. In this paper, we shed more light on the added value that this aspect brings. We provide an extracted DBpedia dataset using a different vocabulary, and give users the opportunity to generate a new DBpedia dataset using a custom combination of vocabularies.

Ghent University Academic Bibliography

Fraunhofer-ePrints

Test-driven assessment of [R2]RML mappings to improve dataset quality

Author: Dimou Anastasia
Freudenberg M.
Hellmann S.
Kontokostas D.
Lehmann J.
Mannens Erik
Van de Walle Rik
Verborgh Ruben
Publication date: 01/01/2015

Ghent University Academic Bibliography

Co-evolution of RDF Datasets

Author: A Motro
C Buil-Aranda
G Tummarello
L-D Ibáñez
M Saleem
M Schmachtenberg
R Verborgh
S Auer
T Knap
Publication date: 01/01/2016

Linking Data initiatives have fostered the publication of a large number of RDF datasets in the Linked Open Data (LOD) cloud, as well as the development of query processing infrastructures to access these data in a federated fashion. However, experimental studies have shown that the availability of LOD datasets cannot always be ensured, so RDF data replication is required for reliable federated query frameworks. Although it enhances data availability, RDF data replication requires synchronization and conflict resolution when replicas and source datasets are allowed to change over time; that is, co-evolution management is needed to ensure consistency. In this paper, we tackle the problem of RDF data co-evolution and devise an approach for conflict resolution during co-evolution of RDF datasets. Our approach is property-oriented and exploits semantics about RDF properties during co-evolution management. The quality of our approach is empirically evaluated in different scenarios on the DBpedia-live dataset. Experimental results suggest that the proposed techniques have a positive impact on the quality of data in source datasets and replicas.
Comment: 18 pages, 4 figures, Accepted in ICWE, 201

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints

CORE

Assessing and refining mappings to RDF to improve dataset quality

Author: Dimou Anastasia
Freudenberg Markus
Hellmann Sebastian
Kontokostas Dimitris
Lehmann Jens
Mannens Erik
Van de Walle Rik
Verborgh Ruben
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

RDF dataset quality assessment is currently performed primarily after data is published. However, there is no systematic way to incorporate its results into the dataset, nor to incorporate the assessment into the publishing workflow. Adjustments are applied manually, and rarely. Moreover, the root of the violations, which often derive from the mappings that specify how the RDF dataset will be generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for RDF datasets stemming originally from (semi-)structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving their mappings. We (i) incorporate a test-driven approach for assessing the mappings instead of the RDF dataset itself, as mappings reflect how the dataset will be formed when generated; and (ii) perform semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to diverse cases, e.g., large, crowdsourced datasets such as DBpedia, or newly generated ones, such as iLastic. Our evaluation indicates the efficiency of our workflow, as it significantly improves the overall quality of an RDF dataset in the observed cases.

Ghent University Academic Bibliography

Automated metadata generation for linked data generation and publishing workflows

Author: De Nies Tom
Dimou Anastasia
Mannens Erik
Mechant Peter
Van de Walle Rik
Verborgh Ruben
Publication venue: CEUR-WS.org
Publication date: 01/01/2016

Ghent University Academic Bibliography

Opportunistic linked data querying through approximate membership metadata

Author: BH Bloom
C Buil-Aranda
E Oren
G Aluç
I Ermilov
I Filali
M Schmachtenberg
R Gallager
R Verborgh
X Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can be achieved, among other ways, through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments require a significant portion of membership subqueries, which check the presence of a specific triple rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions make it possible to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
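
The role of an approximate membership function in the abstract above can be illustrated with a minimal Bloom filter sketch. This is not the paper's implementation: the class name, the filter size, the number of hashes, the salted SHA-256 hashing scheme, and the string encoding of a triple are all assumptions chosen for illustration. It only shows the general idea that a client can skip a membership request when the filter reports a triple as definitely absent.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: never a false negative, sometimes a false positive."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 4) -> None:
        # size_bits and num_hashes are illustrative defaults, not tuned values.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def triple_key(subject: str, predicate: str, obj: str) -> str:
    """Hypothetical encoding of a triple as a single string key."""
    return f"{subject} {predicate} {obj}"


def needs_request(bloom: BloomFilter, triple: tuple) -> bool:
    """A membership request is only needed when the filter cannot rule the triple out."""
    return bloom.might_contain(triple_key(*triple))


# Server side (sketch): build the filter over the triples of a fragment.
known = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
]
bloom = BloomFilter()
for t in known:
    bloom.add(triple_key(*t))
```

On the client side, `needs_request(bloom, triple)` returning `False` means the triple is certainly not in the fragment, so the HTTP request can be skipped. A `True` result may be a false positive, which is the temporarily lower precision the abstract mentions.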

Crossref

Ghent University Academic Bibliography

Blocking for Entity Resolution in the Web of Data: Challenges and Algorithms

Author: Kostas Stefanidis
Publication venue: Springer Science and Business Media LLC
Publication date: 14/11/2019

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University