Designing Traceability into Big Data Systems
Providing an appropriate level of accessibility and traceability to data or
process elements (so-called Items) in large volumes of data, often
Cloud-resident, is an essential requirement in the Big Data era.
Enterprise-wide data systems need to be designed from the outset to support
usage of such Items across the spectrum of business use rather than from any
specific application view. The design philosophy advocated in this paper is to
drive the design process using a so-called description-driven approach, which
enriches models with metadata and descriptions and focuses the design on
Item re-use, thereby promoting traceability. Details are given of the
description-driven design of big data systems at CERN, in health informatics
and in business process management. Evidence is presented that the approach
leads to design simplicity and consequent ease of management thanks to loose
typing and the adoption of a unified approach to Item management and usage.
Comment: 10 pages; 6 figures in Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore, July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575
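The description-driven idea above can be pictured with a minimal Python sketch: a generic, loosely typed Item carries its own descriptive metadata rather than conforming to one application's schema. All class and field names here are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch of a description-driven "Item": a loosely typed
# object whose structure is conveyed by attached metadata rather than a
# fixed application schema. Names are hypothetical.

class Item:
    def __init__(self, item_id, description, **attributes):
        self.item_id = item_id          # stable identifier, supports traceability
        self.description = description  # metadata describing this Item's role
        self.attributes = attributes    # loosely typed payload

    def trace(self):
        # Traceability: an Item can always report what it is,
        # independent of any specific application view.
        return f"{self.item_id}: {self.description}"

# The same Item type serves different business uses (re-use by design).
sample = Item("run-2012-017", "detector calibration data set", site="CERN")
print(sample.trace())  # run-2012-017: detector calibration data set
```

Because Items are self-describing, new uses need no schema migration; they only read or extend the metadata, which is the source of the claimed design simplicity.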
Design discussion on the ISDA Common Domain Model
A new initiative from the International Swaps and Derivatives Association
(ISDA) aims to establish a "Common Domain Model" (ISDA CDM): a new standard for
data and process representation across the full range of derivatives
instruments. Design of the ISDA CDM is at an early stage and the draft
definition contains considerable complexity. This paper contributes by offering
insight, analysis and discussion relating to key topics in the design space
such as data lineage, timestamps, consistency, operations, events, state and
state transitions.
Comment: 19 pages
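Several of the design topics named above (timestamps, events, state transitions, lineage) can be sketched together in a few lines of Python. This is an invented illustration in the spirit of the discussion, not the actual ISDA CDM definitions.

```python
# Hypothetical sketch of event-driven trade state transitions with
# timestamps and lineage. Not the ISDA CDM's actual model.
from datetime import datetime, timezone

class TradeState:
    def __init__(self, state, previous=None):
        self.state = state
        self.timestamp = datetime.now(timezone.utc)  # when this state arose
        self.previous = previous                     # lineage: prior state

    def apply(self, event):
        # Each event yields a new immutable state; old states are never
        # mutated, so the full history remains traceable.
        return TradeState(event, previous=self)

    def lineage(self):
        node, states = self, []
        while node is not None:
            states.append(node.state)
            node = node.previous
        return list(reversed(states))

s = TradeState("executed").apply("confirmed").apply("settled")
print(s.lineage())  # ['executed', 'confirmed', 'settled']
```

Keeping states immutable and linked makes consistency questions (who changed what, and when) answerable from the data itself, which is one way to read the data-lineage concerns raised in the paper.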
Recording accurate process documentation in the presence of failures
Scientific and business communities place unprecedented requirements on provenance, where the provenance of a data item is the process that led to that item. Previous work conceived a computer-based representation of past executions for determining provenance, termed process documentation, and developed a protocol, PReP, to record process documentation in service-oriented architectures. However, PReP assumes a failure-free environment. Failures may lead to inaccurate process documentation that does not reflect reality and hence cannot be trusted or used. This paper outlines our solution, F-PReP, a protocol for recording accurate process documentation in the presence of failures
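The accuracy problem described above can be sketched minimally: if recording an assertion of process documentation can fail, the recorder must retry (or report failure explicitly) rather than drop documentation silently. The classes and retry policy below are invented for illustration and are not the actual F-PReP protocol.

```python
# Minimal sketch: recording process-documentation assertions with a
# retry step, so a transient store failure does not silently produce
# inaccurate documentation. The API is hypothetical, not F-PReP itself.

class ProvenanceStore:
    def __init__(self, fail_times=0):
        self.records = []
        self._fail_times = fail_times   # simulate transient failures

    def record(self, assertion):
        if self._fail_times > 0:
            self._fail_times -= 1
            raise IOError("store temporarily unavailable")
        self.records.append(assertion)

def record_with_retry(store, assertion, attempts=3):
    # Retrying preserves accuracy across transient failures; if every
    # attempt fails, the caller is told explicitly instead of losing data.
    for _ in range(attempts):
        try:
            store.record(assertion)
            return True
        except IOError:
            continue
    return False

store = ProvenanceStore(fail_times=2)
ok = record_with_retry(store, {"actor": "serviceA", "produced": "item42"})
print(ok, len(store.records))  # True 1
```

The key property is that the recorded documentation either reflects what actually happened or the failure to record is surfaced; it is never silently incomplete.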
A Blockchain-based Approach for Data Accountability and Provenance Tracking
The recent approval of the General Data Protection Regulation (GDPR) imposes
new data protection requirements on data controllers and processors with
respect to the processing of European Union (EU) residents' data. These
requirements consist of a single set of rules that have binding legal status
and should be enforced in all EU member states. In light of these requirements,
we propose in this paper the use of a blockchain-based approach to support data
accountability and provenance tracking. Our approach relies on the use of
publicly auditable contracts deployed in a blockchain that increase the
transparency with respect to the access and usage of data. We identify and
discuss three different models for our approach with different granularity and
scalability requirements where contracts can be used to encode data usage
policies and provenance tracking information in a privacy-friendly way. From
these three models we designed, implemented, and evaluated a model where
contracts are deployed by data subjects for each data controller, and a model
where subjects join contracts deployed by data controllers in case they accept
the data handling conditions. Our implementations show in practice the
feasibility and limitations of contracts for the purposes identified in this
paper
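The variant in which subjects join a contract deployed by a data controller can be modelled in a few lines of plain Python. This is a toy model of the idea (opt-in under stated conditions, with an auditable access log), not blockchain code, and every name below is an assumption for illustration.

```python
# Toy model of the "subjects join a controller's contract" variant:
# a controller deploys a contract stating its data-handling conditions,
# subjects opt in, and every access attempt lands in an auditable log.

class UsageContract:
    def __init__(self, controller, conditions):
        self.controller = controller
        self.conditions = conditions    # permitted purposes of processing
        self.subjects = set()
        self.audit_log = []             # transparent trail of access attempts

    def join(self, subject):
        # A subject joins only if it accepts the stated conditions.
        self.subjects.add(subject)

    def access(self, subject, purpose):
        allowed = subject in self.subjects and purpose in self.conditions
        self.audit_log.append((subject, purpose, allowed))
        return allowed

c = UsageContract("acme-controller", conditions={"billing"})
c.join("alice")
print(c.access("alice", "billing"))    # True: joined, purpose permitted
print(c.access("alice", "marketing"))  # False: purpose outside conditions
print(len(c.audit_log))                # 2: denied attempts are logged too
```

The transparency claim rests on the log recording denied as well as granted accesses; on an actual blockchain that log would be publicly auditable by construction.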
An On-the-fly Provenance Tracking Mechanism for Stream Processing Systems
Applications that operate over streaming data with high-volume and real-time processing requirements are becoming increasingly important. These applications process streaming data in real time and deliver instantaneous responses to support precise and on-time decisions. In such systems, traceability - the ability to verify and investigate the source of a particular output - in real time is extremely important. This ability allows raw streaming data to be checked and processing steps to be verified and validated in a timely manner. Therefore, it is crucial that stream systems have a mechanism for dynamically tracking provenance - the process that produced result data - at execution time, which we refer to as on-the-fly stream provenance tracking. In this paper, we propose a novel on-the-fly provenance tracking mechanism that enables provenance queries to be performed dynamically without requiring provenance assertions to be stored persistently. We demonstrate how our provenance mechanism works by means of an on-the-fly provenance tracking algorithm. The experimental evaluation shows that our provenance solution does not have a significant effect on the normal processing of stream systems, given a 7% overhead. Moreover, our provenance solution offers low-latency processing (0.3 ms per additional component) with reasonable memory consumption.
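One simple way to realize on-the-fly tracking is to let each in-flight tuple carry the identifiers of the source tuples that contributed to it, so a provenance query on an output is answered at execution time with nothing persisted. The sketch below illustrates that general idea in Python; the operator and naming scheme are assumptions, not the paper's actual algorithm.

```python
# Sketch of on-the-fly provenance: each tuple travels with the set of
# source-tuple ids that produced it, so provenance is queryable at
# execution time without a persistent assertion store.
# All names are illustrative.

def source(values):
    # Tag each raw input with its own id as initial provenance.
    return [(v, {f"src-{i}"}) for i, v in enumerate(values)]

def window_sum(stream, size):
    # An aggregating operator unions the provenance of its inputs,
    # propagating lineage downstream alongside the data.
    out = []
    for i in range(0, len(stream), size):
        chunk = stream[i:i + size]
        total = sum(v for v, _ in chunk)
        prov = set().union(*(p for _, p in chunk))
        out.append((total, prov))
    return out

results = window_sum(source([1, 2, 3, 4]), size=2)
# First window sums 1+2 with provenance {src-0, src-1}.
print(results[0])
```

The trade-off this makes explicit is the one the evaluation measures: provenance rides along with normal processing (some per-tuple overhead) instead of requiring a separate store and offline reconstruction.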
Provenance Threat Modeling
Provenance systems are used to capture history metadata; applications include ownership attribution and determining the quality of a particular data set. Provenance systems are also used for debugging, process improvement, understanding data, proof of ownership, certification of validity, etc. The provenance of data includes information about the processes and source data that lead to its current representation. In this paper we study the security risks to which provenance systems might be exposed and recommend security solutions to better protect the provenance information.
Comment: 4 pages, 1 figure, conference