Designing Traceability into Big Data Systems
Providing an appropriate level of accessibility and traceability to data or
process elements (so-called Items) in large volumes of data, often
Cloud-resident, is an essential requirement in the Big Data era.
Enterprise-wide data systems need to be designed from the outset to support
usage of such Items across the spectrum of business use rather than from any
specific application view. The design philosophy advocated in this paper is to
drive the design process using a so-called description-driven approach, which
enriches models with metadata and descriptions and focuses the design on
Item re-use, thereby promoting traceability. Details are given of the
description-driven design of big data systems at CERN, in health informatics
and in business process management. Evidence is presented that the approach
leads to design simplicity and consequent ease of management thanks to loose
typing and the adoption of a unified approach to Item management and usage.
Comment: 10 pages; 6 figures in Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore, July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575
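The description-driven idea above can be pictured with a minimal Python sketch: a generic, loosely typed Item carries its own descriptive metadata rather than conforming to one application's schema. All class and field names here are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch of a description-driven "Item": a loosely typed
# object whose structure is conveyed by attached metadata rather than a
# fixed application schema. Names are hypothetical.

class Item:
    def __init__(self, item_id, description, **attributes):
        self.item_id = item_id          # stable identifier, supports traceability
        self.description = description  # metadata describing this Item's role
        self.attributes = attributes    # loosely typed payload

    def trace(self):
        # Traceability: an Item can always report what it is,
        # independent of any specific application view.
        return f"{self.item_id}: {self.description}"

# The same Item type serves different business uses (re-use by design).
sample = Item("run-2012-017", "detector calibration data set", site="CERN")
print(sample.trace())  # run-2012-017: detector calibration data set
```

Because Items are self-describing, new uses need no schema migration; they only read or extend the metadata, which is the source of the claimed design simplicity.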
Design discussion on the ISDA Common Domain Model
A new initiative from the International Swaps and Derivatives Association
(ISDA) aims to establish a "Common Domain Model" (ISDA CDM): a new standard for
data and process representation across the full range of derivatives
instruments. Design of the ISDA CDM is at an early stage and the draft
definition contains considerable complexity. This paper contributes by offering
insight, analysis and discussion relating to key topics in the design space
such as data lineage, timestamps, consistency, operations, events, state and
state transitions.
Comment: 19 pages
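Several of the design topics named above (timestamps, events, state transitions, lineage) can be sketched together in a few lines of Python. This is an invented illustration in the spirit of the discussion, not the actual ISDA CDM definitions.

```python
# Hypothetical sketch of event-driven trade state transitions with
# timestamps and lineage. Not the ISDA CDM's actual model.
from datetime import datetime, timezone

class TradeState:
    def __init__(self, state, previous=None):
        self.state = state
        self.timestamp = datetime.now(timezone.utc)  # when this state arose
        self.previous = previous                     # lineage: prior state

    def apply(self, event):
        # Each event yields a new immutable state; old states are never
        # mutated, so the full history remains traceable.
        return TradeState(event, previous=self)

    def lineage(self):
        node, states = self, []
        while node is not None:
            states.append(node.state)
            node = node.previous
        return list(reversed(states))

s = TradeState("executed").apply("confirmed").apply("settled")
print(s.lineage())  # ['executed', 'confirmed', 'settled']
```

Keeping states immutable and linked makes consistency questions (who changed what, and when) answerable from the data itself, which is one way to read the data-lineage concerns raised in the paper.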
Recording accurate process documentation in the presence of failures
Scientific and business communities place unprecedented requirements on provenance, where the provenance of a data item is the process that led to that item. Previous work conceived a computer-based representation of past executions for determining provenance, termed process documentation, and developed a protocol, PReP, to record process documentation in service-oriented architectures. However, PReP assumes a failure-free environment. Failures may lead to inaccurate process documentation that does not reflect reality and hence cannot be trusted or used. This paper outlines our solution, F-PReP, a protocol for recording accurate process documentation in the presence of failures
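The accuracy problem described above can be sketched minimally: if recording an assertion of process documentation can fail, the recorder must retry (or report failure explicitly) rather than drop documentation silently. The classes and retry policy below are invented for illustration and are not the actual F-PReP protocol.

```python
# Minimal sketch: recording process-documentation assertions with a
# retry step, so a transient store failure does not silently produce
# inaccurate documentation. The API is hypothetical, not F-PReP itself.

class ProvenanceStore:
    def __init__(self, fail_times=0):
        self.records = []
        self._fail_times = fail_times   # simulate transient failures

    def record(self, assertion):
        if self._fail_times > 0:
            self._fail_times -= 1
            raise IOError("store temporarily unavailable")
        self.records.append(assertion)

def record_with_retry(store, assertion, attempts=3):
    # Retrying preserves accuracy across transient failures; if every
    # attempt fails, the caller is told explicitly instead of losing data.
    for _ in range(attempts):
        try:
            store.record(assertion)
            return True
        except IOError:
            continue
    return False

store = ProvenanceStore(fail_times=2)
ok = record_with_retry(store, {"actor": "serviceA", "produced": "item42"})
print(ok, len(store.records))  # True 1
```

The key property is that the recorded documentation either reflects what actually happened or the failure to record is surfaced; it is never silently incomplete.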
A Blockchain-based Approach for Data Accountability and Provenance Tracking
The recent approval of the General Data Protection Regulation (GDPR) imposes
new data protection requirements on data controllers and processors with
respect to the processing of European Union (EU) residents' data. These
requirements consist of a single set of rules that have binding legal status
and should be enforced in all EU member states. In light of these requirements,
we propose in this paper the use of a blockchain-based approach to support data
accountability and provenance tracking. Our approach relies on the use of
publicly auditable contracts deployed in a blockchain that increase the
transparency with respect to the access and usage of data. We identify and
discuss three different models for our approach with different granularity and
scalability requirements where contracts can be used to encode data usage
policies and provenance tracking information in a privacy-friendly way. From
these three models we designed, implemented, and evaluated a model where
contracts are deployed by data subjects for each data controller, and a model
where subjects join contracts deployed by data controllers in case they accept
the data handling conditions. Our implementations show in practice the
feasibility and limitations of contracts for the purposes identified in this
paper
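The variant in which subjects join a contract deployed by a data controller can be modelled in a few lines of plain Python. This is a toy model of the idea (opt-in under stated conditions, with an auditable access log), not blockchain code, and every name below is an assumption for illustration.

```python
# Toy model of the "subjects join a controller's contract" variant:
# a controller deploys a contract stating its data-handling conditions,
# subjects opt in, and every access attempt lands in an auditable log.

class UsageContract:
    def __init__(self, controller, conditions):
        self.controller = controller
        self.conditions = conditions    # permitted purposes of processing
        self.subjects = set()
        self.audit_log = []             # transparent trail of access attempts

    def join(self, subject):
        # A subject joins only if it accepts the stated conditions.
        self.subjects.add(subject)

    def access(self, subject, purpose):
        allowed = subject in self.subjects and purpose in self.conditions
        self.audit_log.append((subject, purpose, allowed))
        return allowed

c = UsageContract("acme-controller", conditions={"billing"})
c.join("alice")
print(c.access("alice", "billing"))    # True: joined, purpose permitted
print(c.access("alice", "marketing"))  # False: purpose outside conditions
print(len(c.audit_log))                # 2: denied attempts are logged too
```

The transparency claim rests on the log recording denied as well as granted accesses; on an actual blockchain that log would be publicly auditable by construction.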
An On-the-fly Provenance Tracking Mechanism for Stream Processing Systems
Applications that operate over streaming data with high-volume and real-time processing requirements are becoming increasingly important. These applications process streaming data in real time and deliver instantaneous responses to support precise and on-time decisions. In such systems, traceability - the ability to verify and investigate the source of a particular output - in real time is extremely important. This ability allows raw streaming data to be checked and processing steps to be verified and validated in a timely manner. Therefore, it is crucial that stream systems have a mechanism for dynamically tracking provenance - the process that produced result data - at execution time, which we refer to as on-the-fly stream provenance tracking. In this paper, we propose a novel on-the-fly provenance tracking mechanism that enables provenance queries to be performed dynamically without requiring provenance assertions to be stored persistently. We demonstrate how our provenance mechanism works by means of an on-the-fly provenance tracking algorithm. The experimental evaluation shows that our provenance solution does not have a significant effect on the normal processing of stream systems, given a 7% overhead. Moreover, our provenance solution offers low-latency processing (0.3 ms per additional component) with reasonable memory consumption.
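One simple way to realize on-the-fly tracking is to let each in-flight tuple carry the identifiers of the source tuples that contributed to it, so a provenance query on an output is answered at execution time with nothing persisted. The sketch below illustrates that general idea in Python; the operator and naming scheme are assumptions, not the paper's actual algorithm.

```python
# Sketch of on-the-fly provenance: each tuple travels with the set of
# source-tuple ids that produced it, so provenance is queryable at
# execution time without a persistent assertion store.
# All names are illustrative.

def source(values):
    # Tag each raw input with its own id as initial provenance.
    return [(v, {f"src-{i}"}) for i, v in enumerate(values)]

def window_sum(stream, size):
    # An aggregating operator unions the provenance of its inputs,
    # propagating lineage downstream alongside the data.
    out = []
    for i in range(0, len(stream), size):
        chunk = stream[i:i + size]
        total = sum(v for v, _ in chunk)
        prov = set().union(*(p for _, p in chunk))
        out.append((total, prov))
    return out

results = window_sum(source([1, 2, 3, 4]), size=2)
# First window sums 1+2 with provenance {src-0, src-1}.
print(results[0])
```

The trade-off this makes explicit is the one the evaluation measures: provenance rides along with normal processing (some per-tuple overhead) instead of requiring a separate store and offline reconstruction.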
Provenance Threat Modeling
Provenance systems are used to capture history metadata; applications include ownership attribution and determining the quality of a particular data set. Provenance systems are also used for debugging, process improvement, understanding data, proof of ownership, certification of validity, etc. The provenance of data includes information about the processes and source data that lead to its current representation. In this paper we study the security risks to which provenance systems might be exposed and recommend security solutions to better protect the provenance information.
Comment: 4 pages, 1 figure, conference