
    Interest-based RDF Update Propagation

    Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and serve large numbers of requests from diverse applications. Many data products and services rely on full or partial local LOD replicas to ensure faster querying and processing. While such replicas enhance the flexibility of information sharing and integration infrastructures, they also introduce data duplication with all its undesirable consequences. Because the original, authoritative datasets keep evolving, keeping replicas consistent and up to date requires frequent replacements at great cost. In this paper, we introduce an approach for interest-based RDF update propagation, which propagates only the interesting parts of updates from the source to the target dataset. Effectively, this enables remote applications to 'subscribe' to relevant datasets and consistently reflect the necessary changes locally, without frequently replacing the entire dataset (or a relevant subset of it). Our approach is based on a formal definition of graph-pattern-based interest expressions that is used to filter the interesting parts of updates at the source. We implement the approach in the iRap framework and perform a comprehensive evaluation based on DBpedia Live updates to confirm the validity and value of our approach. (Comment: 16 pages. Keywords: Change Propagation, Dataset Dynamics, Linked Data, Replication)
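
    The following is a minimal conceptual sketch of the filtering idea, not the iRap implementation: an interest expression is written as a SPARQL CONSTRUCT query and applied to a changeset of added triples so that only matching triples are propagated. The changeset data and the interest pattern are made-up examples.

```python
# Conceptual sketch (not iRap itself): filter a changeset of added triples
# against a graph-pattern interest expression using rdflib.
from rdflib import Graph

# A hypothetical fragment of a changeset with newly added triples.
changeset = Graph()
changeset.parse(data="""
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix ex:  <http://example.org/> .
ex:Berlin   a dbo:City ; dbo:populationTotal 3769495 .
ex:SomeBand a dbo:Band ; dbo:genre ex:Rock .
""", format="turtle")

# Interest expression: the subscriber only cares about cities and their population.
INTEREST = """
PREFIX dbo: <http://dbpedia.org/ontology/>
CONSTRUCT { ?city a dbo:City ; dbo:populationTotal ?pop . }
WHERE     { ?city a dbo:City ; dbo:populationTotal ?pop . }
"""

def filter_interesting(update: Graph, interest_query: str) -> Graph:
    """Return only the part of the update that matches the interest pattern."""
    return update.query(interest_query).graph

interesting = filter_interesting(changeset, INTEREST)
print(interesting.serialize(format="turtle"))  # only the ex:Berlin triples survive
```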

    How Many and What Types of SPARQL Queries can be Answered through Zero-Knowledge Link Traversal?

    The current de-facto way to query the Web of Data is through the SPARQL protocol, where a client sends queries to a server through a SPARQL endpoint. Contrary to an HTTP server, providing and maintaining a robust and reliable endpoint requires a significant effort that not all publishers are willing or able to make. An alternative query evaluation method is link traversal, where a query is answered by dereferencing online web resources (URIs) in real time. While several approaches for such lookup-based query evaluation have been proposed, there exists no analysis of the types (patterns) of queries that can be directly answered on the live Web, without accessing local or remote endpoints and without a priori knowledge of available data sources. In this paper, we first provide a method for checking whether a SPARQL query (intended for evaluation on a SPARQL endpoint) can be answered through zero-knowledge link traversal (without accessing the endpoint), and analyse a large corpus of real SPARQL query logs to find the frequency and distribution of answerable and non-answerable query patterns. Subsequently, we provide an algorithm for transforming answerable queries into SPARQL-LD queries that bypass the endpoints. We report experimental results on the efficiency of the transformed queries and discuss the benefits and the limitations of this query evaluation method. (Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019).)
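
    To illustrate the flavour of such an answerability check, here is a deliberately simplified heuristic, not the paper's exact criterion: a query is treated as answerable by zero-knowledge link traversal if every triple pattern has a subject that is either a constant URI or a variable that an already-answerable pattern can bind. Triple patterns are modelled as plain string tuples; names starting with '?' are variables.

```python
# Simplified, illustrative answerability check (not the paper's exact criterion).
from typing import List, Tuple

Pattern = Tuple[str, str, str]

def is_uri(term: str) -> bool:
    return term.startswith("http://") or term.startswith("https://")

def answerable_by_link_traversal(patterns: List[Pattern]) -> bool:
    bound = set()            # variables expected to be bound to dereferenceable URIs
    remaining = list(patterns)
    progress = True
    while remaining and progress:
        progress = False
        for p in list(remaining):
            s, _, o = p
            if is_uri(s) or s in bound:   # we can look up the subject document
                # dereferencing the subject may bind its variables
                bound.update(t for t in (s, o) if t.startswith("?"))
                remaining.remove(p)
                progress = True
    return not remaining

# Answerable: start from a known URI, then follow the bound ?director.
q1 = [("http://dbpedia.org/resource/Pulp_Fiction", "dbo:director", "?director"),
      ("?director", "dbo:birthPlace", "?place")]
# Not answerable: there is no starting URI to dereference.
q2 = [("?film", "dbo:director", "?director")]
print(answerable_by_link_traversal(q1), answerable_by_link_traversal(q2))  # True False
```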

    Using an existing website as a queryable low-cost LOD publishing interface

    Maintaining an Open Dataset comes at an extra recurring cost when it is published through a dedicated Web interface. As publishing a dataset publicly rarely yields a direct financial return, these extra costs need to be minimized. We therefore want to explore reusing existing infrastructure by enriching existing websites with Linked Data. In this demonstrator, we advised the data owner to annotate a digital heritage website with JSON-LD snippets, resulting in a dataset of more than three million triples that is now available and officially maintained. The website itself is paged, so Hydra partial collection view controls were added to the snippets. We then extended the modular query engine Comunica to follow page controls and extract data from HTML documents while querying. This way, a SPARQL or GraphQL query over multiple heterogeneous data sources can power automated data reuse. While the query performance of such an interface is visibly poor, it becomes easy to create composite data dumps. As a result of implementing these building blocks in Comunica, any paged collection and any enriched HTML page becomes queryable by the query engine. This enables heterogeneous data interfaces to share functionality and become technically interoperable.
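
    The sketch below illustrates the harvesting idea outside of Comunica: extract the JSON-LD snippets embedded in the HTML pages of a paged website and follow the Hydra "next" control from page to page. The start URL is a placeholder, and the code assumes rdflib >= 6 (which parses JSON-LD natively), requests, and BeautifulSoup.

```python
# Illustrative sketch (not Comunica): harvest embedded JSON-LD and follow
# Hydra partial-collection-view "next" controls across pages.
import requests
from bs4 import BeautifulSoup
from rdflib import Graph, Namespace

HYDRA = Namespace("http://www.w3.org/ns/hydra/core#")

def harvest(start_url: str, max_pages: int = 5) -> Graph:
    data, url, pages = Graph(), start_url, 0
    while url and pages < max_pages:
        html = requests.get(url, timeout=30).text
        page_graph = Graph()
        # Extract every <script type="application/ld+json"> snippet on the page.
        for tag in BeautifulSoup(html, "html.parser").find_all(
                "script", type="application/ld+json"):
            if tag.string:
                page_graph.parse(data=tag.string, format="json-ld", publicID=url)
        data += page_graph
        # Follow the Hydra "next" control, if present.
        url = next((str(o) for o in page_graph.objects(None, HYDRA.next)), None)
        pages += 1
    return data

# graph = harvest("https://heritage.example.org/collection?page=1")  # placeholder URL
# print(len(graph), "triples harvested")
```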

    Effect of heuristics on serendipity in path-based storytelling with linked data

    Path-based storytelling with Linked Data on the Web lets users discover concepts in an entertaining and educational way. Given a query context, many state-of-the-art pathfinding approaches aim at telling a story that matches the user's expectations by investigating paths over Linked Data on the Web. By taking serendipity in storytelling into account, we aim to improve and tailor existing approaches to better fit user expectations, so that users can discover interesting knowledge without feeling unsure or even lost in the story facts. To this end, we propose to optimize both the link estimation between, and the selection of, facts in a story by increasing the consistency and relevancy of links between facts through additional domain delineation and refinement steps. In order to address multiple aspects of serendipity, we propose and investigate combinations of weights and heuristics for the paths that form the essential building blocks of each story. Our experimental findings with stories based on DBpedia indicate the improvements achieved by the optimized algorithm.
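
    As a rough illustration of combining weights and heuristics in pathfinding (the concept graph, heuristics, and weights below are invented for illustration and are not the paper's scoring functions), each edge cost can blend several heuristic scores, and a cheapest-path search then yields the story path.

```python
# Illustrative weighted pathfinding over a small concept graph.
import heapq
from typing import Dict, List, Tuple

# concept graph: node -> list of (neighbour, relevancy in [0,1], rarity in [0,1])
GRAPH: Dict[str, List[Tuple[str, float, float]]] = {
    "Vincent_van_Gogh": [("Post-Impressionism", 0.9, 0.4), ("Arles", 0.6, 0.7)],
    "Post-Impressionism": [("Paul_Gauguin", 0.8, 0.5)],
    "Arles": [("Paul_Gauguin", 0.7, 0.8)],
    "Paul_Gauguin": [],
}

def edge_cost(relevancy: float, rarity: float,
              w_rel: float = 0.7, w_rar: float = 0.3) -> float:
    # Lower cost = better edge; high relevancy is rewarded, some rarity adds serendipity.
    return w_rel * (1.0 - relevancy) + w_rar * (1.0 - rarity)

def best_story_path(start: str, goal: str) -> List[str]:
    queue = [(0.0, [start])]
    seen = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nbr, rel, rar in GRAPH.get(node, []):
            heapq.heappush(queue, (cost + edge_cost(rel, rar), path + [nbr]))
    return []

print(best_story_path("Vincent_van_Gogh", "Paul_Gauguin"))
```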

    Weaving the Web(VTT) of Data

    Video has become a first-class citizen on the Web, with broad support in all common Web browsers. Where with structured mark-up on webpages we have made the vision of the Web of Data a reality, in this paper we propose a new vision that we name the Web(VTT) of Data, alongside concrete steps to realize this vision. It is based on the evolving standards WebVTT, for adding timed text tracks to videos, and JSON-LD, a JSON-based format to serialize Linked Data. Just like the Web of Data is based on the relationships among structured data, the Web(VTT) of Data is based on relationships among videos through WebVTT files, which we use as Web-native spatiotemporal Linked Data containers with JSON-LD payloads. In a first step, we provide the necessary background on the technologies we use. In a second step, we perform a large-scale analysis of the 148 terabyte Common Crawl corpus in order to get a better understanding of the status quo of Web video deployment, and address the challenge of integrating the detected videos in the Common Crawl corpus into the Web(VTT) of Data. In a third step, we open-source an online video annotation creation and consumption tool, targeted at videos not contained in the Common Crawl corpus and at integrating future video creations, allowing for weaving the Web(VTT) of Data tighter, video by video.
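
    A minimal sketch of the core container idea: a WebVTT cue whose payload is a JSON-LD annotation describing what appears in a time interval of the video. The vocabulary and terms used in the payload are illustrative placeholders, not the exact schema proposed in the paper.

```python
# Emit a WebVTT file with one metadata cue carrying a JSON-LD payload.
import json

def webvtt_cue(cue_id: str, start: str, end: str, payload: dict) -> str:
    return f"{cue_id}\n{start} --> {end}\n{json.dumps(payload, indent=2)}\n"

annotation = {
    "@context": {"@vocab": "http://schema.org/"},       # placeholder vocabulary
    "@type": "VideoObject",
    "about": "http://dbpedia.org/resource/Eiffel_Tower",
    "description": "The Eiffel Tower appears in this scene",
}

vtt = "WEBVTT\n\n" + webvtt_cue("cue-1", "00:00:05.000", "00:00:12.000", annotation)
print(vtt)
```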

    HDTQ: Managing RDF Datasets in Compressed Space

    HDT (Header-Dictionary-Triples) is a compressed representation of RDF data that supports retrieval features without prior decompression. Yet, RDF datasets often carry additional graph information, such as the origin, version, or validity time of a triple, which traditional HDT cannot handle. This work introduces HDTQ (HDT Quads), an extension of HDT that is able to represent quadruples (or quads) while remaining highly compact and queryable. Two HDTQ-based approaches are introduced, Annotated Triples and Annotated Graphs, and their performance is compared to leading open-source RDF stores on the market. Results show that HDTQ achieves the best compression rates and is a competitive alternative to well-established systems.
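
    The following is a conceptual sketch of the "Annotated Triples" idea only: keep a single list of distinct triples and, per triple, a bitmap recording in which graphs it occurs. It deliberately ignores HDT's dictionaries and succinct bit sequences; the class and its layout are illustrative, not HDTQ's actual data structures.

```python
# Conceptual "Annotated Triples" sketch: triples + per-graph membership bitmaps.
from typing import List, Tuple

Triple = Tuple[str, str, str]

class AnnotatedTriples:
    def __init__(self, graphs: List[str]):
        self.graphs = graphs                      # fixed graph order
        self.triples: List[Triple] = []           # distinct triples
        self.membership: List[List[int]] = []     # one bitmap per triple

    def add_quad(self, s: str, p: str, o: str, g: str) -> None:
        t = (s, p, o)
        if t not in self.triples:
            self.triples.append(t)
            self.membership.append([0] * len(self.graphs))
        self.membership[self.triples.index(t)][self.graphs.index(g)] = 1

    def quads(self, graph: str) -> List[Triple]:
        gi = self.graphs.index(graph)
        return [t for t, bits in zip(self.triples, self.membership) if bits[gi]]

store = AnnotatedTriples(["v1", "v2"])
store.add_quad("ex:Alice", "ex:knows", "ex:Bob", "v1")
store.add_quad("ex:Alice", "ex:knows", "ex:Bob", "v2")   # same triple, another graph
store.add_quad("ex:Bob", "ex:age", '"42"', "v2")
print(store.quads("v2"))   # both triples are present in graph v2
```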

    Facilitating the analysis of COVID-19 literature through a knowledge graph

    At the end of 2019, Chinese authorities alerted the World Health Organization (WHO) to the outbreak of a new strain of coronavirus, SARS-CoV-2, which grew into an unprecedented disaster for humanity a few months later. In response to this pandemic, a publicly available dataset containing information on over 63,000 papers was released on Kaggle. To facilitate the analysis of this large body of literature, we have created a knowledge graph based on this dataset. Within this knowledge graph, all information from the original dataset is linked together, which makes it easier to search for relevant information. The knowledge graph is also enriched with additional links to appropriate, already existing external resources. In this paper, we elaborate on the different steps performed to construct such a knowledge graph from structured documents. Moreover, we discuss, on a conceptual level, several possible applications and analyses that can be built on top of this knowledge graph. As such, we aim to provide a resource that allows people to more easily build applications that give more insight into the COVID-19 pandemic.
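
    To make the construction step concrete, here is a small sketch (not the paper's actual pipeline) that maps one structured paper record to RDF triples and links it to an existing external resource via its DOI. The record contents, DOI, and vocabulary choices are placeholders.

```python
# Illustrative record-to-RDF mapping with rdflib.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

BIBO = Namespace("http://purl.org/ontology/bibo/")
EX = Namespace("http://example.org/covid-kg/")

record = {                                        # placeholder metadata
    "id": "paper-0001",
    "title": "Example COVID-19 study",
    "doi": "10.1000/example-doi",                 # placeholder DOI
    "authors": ["A. Author", "B. Author"],
}

g = Graph()
paper = EX[record["id"]]
g.add((paper, RDF.type, BIBO.AcademicArticle))
g.add((paper, DCTERMS.title, Literal(record["title"])))
# Link to the already existing external resource identified by the DOI.
g.add((paper, BIBO.doi, URIRef("https://doi.org/" + record["doi"])))
for name in record["authors"]:
    g.add((paper, DCTERMS.creator, Literal(name)))

print(g.serialize(format="turtle"))
```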

    EcoDaLo : federating advertisement targeting with linked data

    A key source of revenue for the media and entertainment domain is ad targeting: serving advertisements to a select set of visitors based on various captured visitor traits. Compared to global media companies such as Google and Facebook, which aggregate data from various sources (with all the privacy concerns these aggregations bring), local companies only capture a small number of (high-quality) traits and receive a disproportionately small amount of revenue. To increase these local publishers' competitive advantage, they need to join forces, whilst taking the visitors' privacy concerns into account. The EcoDaLo consortium, located in Belgium and consisting of Adlogix, Pebble Media, and Roularta Media Group as founding partners, aims to combine local publishers' data without requiring these partners to share this data across the consortium. Using Semantic Web technologies enables a decentralized approach in which federated querying allows local companies to combine their captured visitor traits and better target visitors, without aggregating all data. To increase potential uptake, the technical complexity of joining the consortium is kept minimal, and established technology is used where possible. The solution was showcased in Belgium, which provided the participating partners with valuable insights and suggested future research challenges. Perspectives are to enlarge the consortium and provide measurable impact in ad targeting for local publishers.
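
    A sketch of the federated-querying idea, under stated assumptions: one partner's SPARQL endpoint evaluates the query and pulls in the other partner's traits through a SERVICE clause, so neither dataset is copied or centrally aggregated. The endpoint URLs and the trait vocabulary are placeholders, not EcoDaLo's actual setup.

```python
# Federated SPARQL query across two hypothetical partner endpoints (SPARQLWrapper).
from SPARQLWrapper import SPARQLWrapper, JSON

FEDERATED_QUERY = """
PREFIX traits: <http://example.org/traits/>
SELECT ?visitor ?interest ?ageGroup WHERE {
  ?visitor traits:interest ?interest .                   # traits held by publisher A
  SERVICE <https://publisher-b.example.org/sparql> {
    ?visitor traits:ageGroup ?ageGroup .                 # traits held by publisher B
  }
  FILTER (?interest = traits:Cycling && ?ageGroup = traits:Age25to34)
}
"""

endpoint = SPARQLWrapper("https://publisher-a.example.org/sparql")  # placeholder endpoint
endpoint.setQuery(FEDERATED_QUERY)
endpoint.setReturnFormat(JSON)
# results = endpoint.query().convert()
# for row in results["results"]["bindings"]:
#     print(row["visitor"]["value"])
```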