314 research outputs found

    On correctness in RDF stream processor benchmarking

    Two complementary benchmarks have been proposed so far for the evaluation and continuous improvement of RDF stream processors: SRBench and LSBench. They focus on different features of the evaluated systems, including coverage of the streaming extensions of SPARQL supported by each processor, query processing throughput, and an early analysis of query evaluation correctness, based on comparing the results obtained by different processors for a set of queries. However, neither of them has analysed the operational semantics of these processors in order to assess the correctness of query evaluation results. In this paper, we propose a characterization of the operational semantics of RDF stream processors, adapting well-known models used in the stream processing engine community: CQL and SECRET. Through this formalization, we address correctness in RDF stream processor benchmarks, making it possible to determine the multiple answers that systems should provide. Finally, we present CSRBench, an extension of SRBench that addresses query result correctness verification using an automatic method.

    Heaven Test Stand: Towards Comparative Research on RSP Engines


    Expressive Stream Reasoning with Laser

    An increasing number of use cases require a timely extraction of non-trivial knowledge from semantically annotated data streams, especially on the Web and for the Internet of Things (IoT). Often, this extraction requires expressive reasoning, which is challenging to compute on large streams. We propose Laser, a new reasoner that supports a pragmatic, non-trivial fragment of the logic LARS, which extends Answer Set Programming (ASP) for streams. At its core, Laser implements a novel evaluation procedure which annotates formulae to avoid the re-computation of duplicates at multiple time points. This procedure, combined with a judicious implementation of the LARS operators, yields significantly better runtimes than those of other state-of-the-art systems such as C-SPARQL and CQELS, or of an implementation of LARS which runs on the ASP solver Clingo. This enables the application of expressive logic-based reasoning to large streams and opens the door to a wider range of stream reasoning use cases. Comment: 19 pages, 5 figures. Extended version of accepted paper at ISWC 201

    Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML

    OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets). These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption, and as XML through a web service interface. As an intermediate format to facilitate statistical computations, CSV is generated internally. To interlink the OpenAIRE data with related data on the Web, we aim at exporting them as Linked Open Data (LOD). The LOD export is required to integrate into the overall data processing workflow, where derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach. We evaluated the performance of creating LOD by a MapReduce job on top of HBase, by mapping the intermediate CSV files, and by mapping the XML output. Comment: Accepted in 0th Metadata and Semantics Research Conferenc

    Filtering Real-Time Linked Data Streams

    The amount of linked data on the Web has increased rapidly in recent years. Linked data, often encoded in RDF, is considered five-star data in the context of open data due to its usability and potential. Although there has been progress in the development of linked data technologies and data processing models, the full potential of linked data has not yet been realized. One of the challenges is reasoning over linked data streams, which has only recently gained momentum in research. As a result, query languages such as C-SPARQL have been proposed and corresponding stream reasoning engines have been implemented. However, such implementations have so far been evaluated mostly in academic settings.
    This work describes a fully functional proof-of-concept implementation of a stream reasoning system for message-oriented systems, which is capable of exposing a message queue as a linked data stream that can be filtered using C-SPARQL, one of the earliest linked data processing engines. The performance of the C-SPARQL engine, which lies at the heart of the implementation, is evaluated using the CityBench benchmark with the settings of an enterprise-scale real-time economy application, Inforegister NOW!, which is currently under development.

    SRBench: A streaming RDF/SPARQL benchmark

    We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, completely based on real-world data sets from the Linked Open Data cloud. With the increasing problem of too much streaming data but not enough tools to gain knowledge from them, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users compare streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the abilities of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen, such that they represent a realistic and relevant usage of streaming data. The benchmark defines a concise, yet comprehensive set of queries that cover the major aspects of strRS processing. Finally, our work is complemented with a functional evaluation on three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to give a first baseline and illustrate the state of the art.

    Stream WatDiv - A Streaming RDF Benchmark

    Modern applications are required to process stream data which are semantically tagged. Sometimes static background data interlinked with stream data are also needed to answer a query. To meet these requirements, streaming RDF processing (SRP) engines have emerged in recent years. Although most SRP engines adopt the same streaming RDF data model, in which a streaming RDF triple is an RDF triple annotated with a timestamp, there is no standard query language, which means every engine has its own language syntax. In addition, these engines are quite primitive: different engines support limited and differing sets of query operations. Moreover, they are fragile in the face of complex queries, high stream rates, or large static datasets. This poses many challenges for evaluating SRP engines. In our work, we show that previous streaming RDF benchmarks do not provide workloads sufficient to understand an engine's performance. The queries in those workloads are either not executable on existing engines, or very limited in number. The goal of this work is to propose a benchmark which provides diversified datasets and workloads. In our work, we extend WatDiv to generate streaming data and streaming queries, and propose a new streaming RDF benchmark, called Stream WatDiv. WatDiv is an RDF benchmark designed for diversified stress testing of RDF data management engines. It introduces a collection of query features, which is used to assess the diversity of datasets and workloads. Through proper data schema design and query generation, WatDiv shows a good coverage of values of these query features. We demonstrate the feasibility of applying the same idea in the streaming RDF domain. The Stream WatDiv benchmark suite contains a data generator to generate scalable streaming data and static data, a query generator to generate scalable workloads, and a testbed to monitor the engine's output.
    We evaluate two engines, C-SPARQL and CQELS, and measure the correctness of engine output, latency and memory consumption. The findings fall into two parts. First, we validate results reported for these two engines in previous work. (1) CQELS is more robust and efficient than C-SPARQL at processing streaming RDF queries in most cases. (2) Increasing the streaming rate and integrating static data significantly degrade C-SPARQL's performance, while CQELS is only marginally affected. (3) C-SPARQL is more memory-efficient than CQELS. Second, the diversity of Stream WatDiv workloads helps detect engine issues that were not captured before. Queries can be grouped into different types based on the query features. These types of queries can be used to evaluate specific engine features. (1) The triple pattern count of a query influences C-SPARQL's performance. (2) Both C-SPARQL and CQELS show a significant latency increase when the query has larger result cardinality. (3) Neither of these two engines is biased toward processing linear, star or snowflake queries. (4) CQELS is more efficient at handling queries with variously selective triple patterns, while C-SPARQL performs better for queries with equally selective triple patterns than for queries with variously selective triple patterns.

    RDF Stream Processing: Let's React

    Stream processing has recently gained a prominent role in Computer Science research. From networks or databases to information theory or programming languages, a lot of work has been dedicated to conceiving ways of representing, transmitting, processing and understanding infinite sequences of data. Nevertheless, there are still aspects that need time to reach a mature state. In particular, heterogeneity in stream data management and event processing is both a challenging topic and a key enabler for the rising Web of Things, where smart devices continuously sense properties of the surrounding world. Different proposals on RDF and Linked Data streams have shown promising results for managing this type of data, while keeping explicit semantics on the data streams, and linking them to other datasets in a web-friendly way. With time, these efforts led to the emergence of initiatives such as the RDF Stream Processing (RSP) W3C community group, aiming at specifying a base RDF stream model and query language for that model. Although these works produced interesting results in defining overarching model definitions, there are still multiple orthogonal challenges that need to be addressed. In this work we identify some of these challenges, and we link them to the characteristics of what are nowadays called reactive systems. This paradigm includes natively supporting event-driven asynchronous message passing, non-blocking data communication and processing through all layers, and on-demand flexible scalability. We argue that RDF stream systems, combined with reactive techniques, can lead to powerful, resilient and interoperable systems at Web scale.