
    Substring filtering for low-cost linked data interfaces

    Recently, Triple Pattern Fragments (TPF) were introduced as a low-cost server-side interface for cases where high numbers of clients need to evaluate SPARQL queries. Scalability is achieved by moving part of the query execution to the client, at the cost of elevated query times. Since the TPF interface purposely does not support complex constructs such as SPARQL filters, queries that use them need to be executed mostly on the client, resulting in long execution times. We therefore investigated the impact of adding a literal substring matching feature to the TPF interface, with the goal of improving query performance while maintaining low server cost. In this paper, we discuss the client/server setup and compare the performance of SPARQL queries on multiple implementations, including Elasticsearch and a case-insensitive FM-index. Our evaluations indicate that these improvements allow for faster query execution without significantly increasing the load on the server. Offering the substring feature on TPF servers allows users to obtain faster responses for filter-based SPARQL queries. Furthermore, substring matching can be used to support other filters such as complete regular expressions or range queries.
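
    The kind of query this targets contains a FILTER over literal substrings, which a plain TPF client must evaluate locally after downloading all candidate literals. A minimal Python sketch of the idea follows; the fragment URL and the "substring" parameter name are illustrative assumptions, not the actual interface described in the paper:

        import requests

        # A SPARQL query whose FILTER a plain TPF client evaluates locally.
        query = """
        SELECT ?person ?name WHERE {
          ?person <http://xmlns.com/foaf/0.1/name> ?name .
          FILTER(CONTAINS(LCASE(?name), "ghent"))
        }
        """

        # With a substring-capable interface, the client could instead ask the
        # server for only the matching literals (parameter name is hypothetical).
        fragment_url = "http://example.org/fragments/dataset"   # placeholder URL
        params = {
            "predicate": "http://xmlns.com/foaf/0.1/name",
            "substring": "ghent",
        }
        response = requests.get(fragment_url, params=params,
                                headers={"Accept": "text/turtle"})
        print(response.status_code)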

    Emergent relational schemas for RDF


    Modelling and Querying Lists in RDF. A Pragmatic Study

    Many Linked Data datasets model elements in their domains in the form of lists: a countable number of ordered resources. When publishing these lists in RDF, an important concern is making them easy to consume. Therefore, a well-known recommendation is to find an existing list modelling solution, and reuse it. However, a specific domain model can be implemented in different ways and vocabularies may provide alternative solutions. In this paper, we argue that a wrong decision could have a significant impact in terms of performance and, ultimately, the availability of the data. We take the case of RDF Lists and make the hypothesis that the efficiency of retrieving sequential linked data depends primarily on how they are modelled (triple-store invariance hypothesis). To demonstrate this, we survey different solutions for modelling sequences in RDF, and propose a pragmatic approach for assessing their impact on data availability. Finally, we derive good (and bad) practices on how to publish lists as linked open data. By doing this, we sketch the foundations of an empirical, task-oriented methodology for benchmarking linked data modelling solutions.
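
    As a small illustration of the alternative modellings such a survey compares, the same three-element sequence can be published as an rdf:List (a chain of rdf:first/rdf:rest nodes) or as an rdf:Seq (numbered membership properties). The namespace and resource names below are invented for the example:

        from rdflib import Graph, BNode, Namespace, RDF
        from rdflib.collection import Collection

        EX = Namespace("http://example.org/")                 # placeholder namespace
        RDFNS = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")
        tracks = [EX.track1, EX.track2, EX.track3]

        # Option 1: rdf:List -- linked rdf:first/rdf:rest cells.
        g1 = Graph()
        head = BNode()
        Collection(g1, head, tracks)
        g1.add((EX.album, EX.trackList, head))

        # Option 2: rdf:Seq -- a container with rdf:_1, rdf:_2, ... properties.
        g2 = Graph()
        seq = BNode()
        g2.add((seq, RDF.type, RDF.Seq))
        for i, track in enumerate(tracks, start=1):
            g2.add((seq, RDFNS[f"_{i}"], track))
        g2.add((EX.album, EX.trackList, seq))

        print(g1.serialize(format="turtle"))
        print(g2.serialize(format="turtle"))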

    Storing and querying evolving knowledge graphs on the web


    S2ST: A Relational RDF Database Management System

    The explosive growth of RDF data on the Semantic Web drives the need for novel database systems that can efficiently store and query large RDF datasets. To achieve good performance and scalability of query processing, most existing RDF storage systems use a relational database management system as a backend to manage RDF data. In this paper, we describe the design and implementation of a Relational RDF Database Management System. Our main research contributions are: (1) We propose a formal model of a Relational RDF Database Management System (RRDBMS), (2) We propose generic algorithms for schema, data and query mapping, (3) We implement the first and only RRDBMS, S2ST, that supports multiple relational database management systems, user-customizable schema mapping, schema-independent data mapping, and semantics-preserving query translation.
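
    The general idea behind such mappings can be shown with a deliberately naive sketch (a single triple table in SQLite and a hand-translated triple pattern); S2ST's actual schema, data and query mapping is considerably more involved and configurable:

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
        conn.executemany(
            "INSERT INTO triples VALUES (?, ?, ?)",
            [
                ("ex:alice", "foaf:knows", "ex:bob"),
                ("ex:alice", "foaf:name", "Alice"),
                ("ex:bob", "foaf:name", "Bob"),
            ],
        )

        # Triple pattern { ?person foaf:name ?name } translated into SQL.
        rows = conn.execute(
            "SELECT s AS person, o AS name FROM triples WHERE p = ?",
            ("foaf:name",),
        ).fetchall()
        print(rows)   # [('ex:alice', 'Alice'), ('ex:bob', 'Bob')]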

    BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains

    The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.

    Generating public transport data based on population distributions for RDF benchmarking

    When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility. Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PODiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PODiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PODiGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.
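
    The core intuition, population-weighted placement of transport infrastructure, can be sketched in a few lines of Python; the grid, inhabitant counts and number of stops below are invented, and this is not PODiGG's actual algorithm:

        import random

        random.seed(42)

        # A tiny "population grid": cell coordinates -> inhabitants (made-up numbers).
        population = {
            (0, 0): 12000, (0, 1): 300,  (0, 2): 150,
            (1, 0): 8000,  (1, 1): 2500, (1, 2): 90,
            (2, 0): 400,   (2, 1): 700,  (2, 2): 16000,
        }

        cells = list(population)
        weights = [population[c] for c in cells]

        # Sample stop locations proportionally to population, so that dense
        # cells end up with more public transport stops than sparse ones.
        stops = random.choices(cells, weights=weights, k=20)

        for cell in sorted(set(stops)):
            print(cell, stops.count(cell), "stops")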

    A Learning Based Framework for Improving Querying on Web Interfaces of Curated Knowledge Bases

    Knowledge Bases (KBs) are widely used as one of the fundamental components in Semantic Web applications as they provide facts and relationships that can be automatically understood by machines. Curated knowledge bases usually use Resource Description Framework (RDF) as the data representation model. To query the RDF-presented knowledge in curated KBs, Web interfaces are built via SPARQL Endpoints. Currently, querying SPARQL Endpoints has problems like network instability and latency, which affect the query efficiency. To address these issues, we propose a client-side caching framework, SPARQL Endpoint Caching Framework (SECF), aiming at accelerating the overall querying speed over SPARQL Endpoints. SECF identifies queries that clients are likely to issue by leveraging the querying patterns learned from clients’ historical queries, and prefetches/caches these queries. In particular, we develop a distance function based on graph edit distance to measure the similarity of SPARQL queries. We propose a feature modelling method to transform SPARQL queries into vector representations that are fed into machine-learning algorithms. A time-aware smoothing-based method, Modified Simple Exponential Smoothing (MSES), is developed for cache replacement. Extensive experiments performed on real-world queries showcase the effectiveness of our approach, which outperforms the state-of-the-art work in terms of the overall querying speed.
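
    The cache-replacement side of this can be sketched with plain simple exponential smoothing: each cached query keeps a smoothed hit score, and the lowest-scoring entry is evicted first. The alpha value and eviction rule below are illustrative, not the MSES variant from the paper:

        # Toy cache: score entries with s_t = alpha * x_t + (1 - alpha) * s_{t-1},
        # where x_t is 1 on a hit and 0 otherwise; evict the lowest-scoring entry.
        class SmoothedCache:
            def __init__(self, capacity, alpha=0.3):
                self.capacity = capacity
                self.alpha = alpha                  # illustrative smoothing factor
                self.entries = {}                   # query -> (result, score)

            def _update(self, query, hit):
                result, score = self.entries[query]
                self.entries[query] = (result, self.alpha * hit + (1 - self.alpha) * score)

            def get(self, query):
                if query in self.entries:
                    self._update(query, 1.0)
                    return self.entries[query][0]
                for q in list(self.entries):        # decay resident entries on a miss
                    self._update(q, 0.0)
                return None

            def put(self, query, result):
                if query not in self.entries and len(self.entries) >= self.capacity:
                    victim = min(self.entries, key=lambda q: self.entries[q][1])
                    del self.entries[victim]
                self.entries[query] = (result, 1.0)

        cache = SmoothedCache(capacity=2)
        cache.put("SELECT ?s WHERE { ?s a <http://example.org/Person> }", ["ex:alice"])
        print(cache.get("SELECT ?s WHERE { ?s a <http://example.org/Person> }"))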