Data, Responsibly: Fairness, Neutrality and Transparency in Data Analysis
ABSTRACT Big data technology holds incredible promise of improving people's lives, accelerating scientific discovery and innovation, and bringing about positive societal change. Yet, if not used responsibly, this technology can propel economic inequality, destabilize global markets and affirm systemic bias. While the potential benefits of big data are well-accepted, the importance of using these techniques in a fair and transparent manner is rarely considered. The primary goal of this tutorial is to draw the attention of the data management community to the important emerging subject of responsible data management and analysis. We will offer our perspective on the issue, give an overview of existing technical work, primarily from the data mining and algorithms communities, and motivate future research directions.
Provenance and Probabilities in Relational Databases: From Theory to Practice
We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation. Finally, we explain how provenance is practically used for probabilistic query evaluation in probabilistic databases.
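The semiring idea mentioned above can be illustrated with a small sketch: each input tuple carries an abstract annotation (a token), and annotations propagate through a query by multiplying when tuples are used jointly and adding when alternative derivations exist. The relations, tokens, and query below are illustrative, not taken from the paper.

```python
# Minimal sketch of semiring-style how-provenance for a join query.
# Each tuple of R(a, b) and S(b, c) is annotated with a token; the
# provenance of an output tuple is a polynomial over those tokens:
# product = tuples used together, sum = alternative derivations.
from collections import defaultdict

R = [((1, 2), "r1"), ((1, 3), "r2")]   # R(a, b) with annotations
S = [((2, 5), "s1"), ((3, 5), "s2")]   # S(b, c) with annotations

# Query: q(a, c) :- R(a, b), S(b, c).
prov = defaultdict(list)
for (a, b), tr in R:
    for (b2, c), ts in S:
        if b == b2:
            prov[(a, c)].append(f"{tr}*{ts}")  # one derivation

result = {t: " + ".join(terms) for t, terms in prov.items()}
print(result)  # {(1, 5): 'r1*s1 + r2*s2'}
```

Here the output tuple (1, 5) has two independent derivations, so its provenance polynomial is a sum of two products; specializing the semiring (e.g. to Booleans or probabilities) recovers different provenance notions from the same polynomial.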
Provenance Tools
The importance of provenance has arisen across all kinds of sciences in recent years. During research on data provenance, several tools have been developed to use provenance in a practical way. We chose seven of those tools and exhaustively tested five of them: Trio, ORCHESTRA, Perm, GProM, and ProvSQL. In this article, we first introduce the basics of data provenance, especially where-, why-, and how-provenance. After that, we present the results of our tool tests.
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models
The ubiquitous use of machine learning algorithms brings new challenges to traditional database problems such as incremental view update. Much effort is being put into better understanding and debugging machine learning models, as well as into identifying and repairing errors in training datasets. Our focus is on how to assist these activities when the machine learning model must be retrained after removing problematic training samples during cleaning, or after selecting different subsets of training data for interpretability. This paper presents an efficient provenance-based approach, PrIU, and its optimized version, PrIU-opt, for incrementally updating model parameters without sacrificing prediction accuracy. We prove the correctness and convergence of the incrementally updated model parameters, and validate them experimentally. Experimental results show that PrIU-opt achieves speed-ups of up to two orders of magnitude compared to simply retraining the model from scratch, while obtaining highly similar models. (28 pages; published in the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020.)
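The core intuition behind updating a model rather than retraining it can be sketched for a simple case. This is not the PrIU algorithm itself (PrIU targets iteratively trained regression models); it is a hedged illustration using ordinary least squares, where the model depends on the training data only through sufficient statistics, so deleted rows can be subtracted out instead of recomputing from scratch.

```python
# Illustration (not PrIU itself): for ordinary least squares, the
# solution depends on the data only through X^T X and X^T y, so
# removing training rows amounts to subtracting their contributions
# ("downdating") and re-solving a small linear system.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

XtX = X.T @ X          # sufficient statistics of the full dataset
Xty = X.T @ y

remove = [3, 17, 42]   # indices of problematic samples to delete
Xr, yr = X[remove], y[remove]
theta_inc = np.linalg.solve(XtX - Xr.T @ Xr,   # downdated statistics
                            Xty - Xr.T @ yr)

# Sanity check: matches retraining on only the remaining rows.
keep = np.setdiff1d(np.arange(100), remove)
theta_full = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
print(np.allclose(theta_inc, theta_full))  # True
```

The speed-up comes from the downdate costing time proportional to the number of deleted rows rather than the full dataset; PrIU generalizes this kind of reuse to gradient-based training via provenance tracking.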
Conceptual Modeling of Data with Provenance
Traditional database systems manage data, but often do not address its provenance. In the past, users were often implicitly familiar with data they used, how it was created (and hence how it might be appropriately used), and from which sources it came. Today, users may be physically and organizationally remote from the data they use, so this information may not be easily accessible to them. In recent years, several models have been proposed for recording provenance of data. Our work is motivated by opportunities to make provenance easy to manage and query. For example, current approaches model provenance as expressions that may be easily stored alongside data, but are difficult to parse and reconstruct for querying, and are difficult to query with available languages. We contribute a conceptual model for data and provenance, and evaluate how well it addresses these opportunities. We compare the expressive power of our model's language to that of other models. We also define a benchmark suite with which to study performance of our model, and use this suite to study key model aspects implemented on existing software platforms. We discover some salient performance bottlenecks in these implementations, and suggest future work to explore improvements. Finally, we show that our implementations can comprise a logical model that faithfully supports our conceptual model.
Content sensitivity based access control model for big data
Big data technologies have seen tremendous growth in recent years. They are being widely used in both industry and academia. In spite of such exponential growth, these technologies lack adequate measures to protect the data from misuse or abuse. Corporations that collect data from multiple sources are at risk of liabilities due to exposure of sensitive information. In the current implementation of Hadoop, only file-level access control is feasible. Giving users the ability to access data based on attributes in a dataset or based on their role is complicated by the sheer volume and multiple formats (structured, unstructured and semi-structured) of data. In this dissertation, an access control framework that enforces access control policies dynamically based on the sensitivity of the data is proposed. This framework enforces access control policies by harnessing the data context, usage patterns and information sensitivity. Information sensitivity changes over time with the addition and removal of datasets, which can lead to modifications in the access control decisions, and the proposed framework accommodates these changes. The proposed framework is automated to a large extent and requires minimal user intervention. The experimental results show that the proposed framework is capable of enforcing access control policies on non-multimedia datasets with minimal overhead.