
2,321 research outputs found

An On-the-fly Provenance Tracking Mechanism for Stream Processing Systems

Author: Moreau Luc
Sansrimahachai Watsawee
Weal Mark J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2013

Applications that operate over streaming data with high-volume and real-time processing requirements are becoming increasingly important. These applications process streaming data in real-time and deliver instantaneous responses to support precise and on-time decisions. In such systems, traceability - the ability to verify and investigate the source of a particular output - in real-time is extremely important. This ability allows raw streaming data to be checked and processing steps to be verified and validated in a timely manner. Therefore, it is crucial that stream systems have a mechanism for dynamically tracking provenance - the process that produced result data - at execution time, which we refer to as on-the-fly stream provenance tracking. In this paper, we propose a novel on-the-fly provenance tracking mechanism that enables provenance queries to be performed dynamically without requiring provenance assertions to be stored persistently. We demonstrate how our provenance mechanism works by means of an on-the-fly provenance tracking algorithm. The experimental evaluation shows that our provenance solution does not have a significant effect on the normal processing of stream systems given a 7% overhead. Moreover, our provenance solution offers low-latency processing (0.3 ms per additional component) with reasonable memory consumption.

Southampton (e-Prints Soton)

Data Provenance Inference in Logic Programming: Reducing Effort of Instance-driven Debugging

Author: Huq Mohammad Rezwanul
Mileo Alessandra
Wombacher Andreas
Publication venue: University of Twente, Centre for Telematica and Information Technology (CTIT)
Publication date: 01/01/2013

Data provenance allows scientists in different domains to validate their models and algorithms in order to find anomalies and unexpected behaviors. In previous work, we described on-the-fly interpretation of (Python) scripts to build a workflow provenance graph automatically and then infer fine-grained provenance information based on the workflow provenance graph and the availability of data. To broaden the scope of our approach and demonstrate its viability, in this paper we extend it beyond procedural languages, to be used for purely declarative languages such as logic programming under the stable model semantics. For experiments and validation, we use the Answer Set Programming solver oClingo, which makes it possible to formulate and solve stream reasoning problems in a purely declarative fashion. We demonstrate how the benefits of provenance inference over explicit provenance still hold in a declarative setting, and we briefly discuss the potential impact for declarative programming, in particular for instance-driven debugging of the model in declarative problem solving.

University of Twente Research Information

Knowledge-Driven Harmonization of Sensor Observations: Exploiting Linked Open Data for IoT Data Streams

Author: Frank Matthias T.
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2021

The rise of the Internet of Things leads to an unprecedented number of continuous sensor observations that are available as IoT data streams. Harmonization of such observations is a labor-intensive task due to heterogeneity in format, syntax, and semantics. We aim to reduce the effort for such harmonization tasks by employing a knowledge-driven approach. To this end, we pursue the idea of exploiting the large body of formalized public knowledge represented as statements in Linked Open Data

KITopen

Directory of Open Access Books (DOAB)

CloudNotes: Annotation Management in Cloud-Based Platforms

Author: Lu Yue
Publication venue: Digital WPI
Publication date: 24/04/2014

We present an annotation management system for cloud-based platforms, which is called “CloudNotes”. CloudNotes enables the annotation management feature in the scalable Hadoop and MapReduce platforms. In the CloudNotes system, every piece of data may have one or more annotations associated with it, and these annotations will be propagated when the data is being transformed through the MapReduce jobs. Such an annotation management system is important for understanding the provenance and quality of data, especially in applications that deal with integration of scientific and biological data at unprecedented scale and complexity. We propose several extensions to the Hadoop platform that allow end-users to add and retrieve annotations seamlessly. Annotations in CloudNotes will be generated, propagated and managed in a distributed manner. We address several challenges that include attaching annotations to data at various granularities in Hadoop, annotating data in flat files with no known schema until query time, and creating and storing the annotations in a distributed fashion. We also present new storage mechanisms and novel indexing techniques that enable adding the annotations in small increments although Hadoop’s file system is optimized for large batch processing.

DigitalCommons@WPI

A Three Tier Architecture Applied to LiDAR Processing and Monitoring

Author
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2006

Crossref

Tracing Distributed Data Stream Processing Systems

Author: Benczúr András
Hermann Gábor
Szabo PGN
Zvara Zoltán
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Crossref

SZTAKI Publication Repository