Search CORE

16 research outputs found

Security and Privacy for Big Data: A Systematic Literature Review

Author: Nelson Boel
Olovsson Tomas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Big data is currently a hot research topic, with four million hits on Google scholar in October 2016. One reason for the popularity of big data research is the knowledge that can be extracted from analyzing these large data sets. However, data can contain sensitive information, and data must therefore be sufficiently protected as it is stored and processed. Furthermore, it might also be required to provide meaningful, proven, privacy guarantees if the data can be linked to individuals. To the best of our knowledge, there exists no systematic overview of the overlap between big data and the area of security and privacy. Consequently, this review aims to explore security and privacy research within big data, by outlining and providing structure to what research currently exists. Moreover, we investigate which papers connect security and privacy with big data, and which categories these papers cover. Ultimately, is security and privacy research for big data different from the rest of the research within the security and privacy domain? To answer these questions, we perform a systematic literature review (SLR), where we collect recent papers from top conferences, and categorize them in order to provide an overview of the security and privacy topics present within the context of big data. Within each category we also present a qualitative analysis of papers representative for that specific area. Furthermore, we explore and visualize the relationship between the categories. Thus, the objective of this review is to provide a snapshot of the current state of security and privacy research for big data, and to discover where further research is required

Chalmers Research

SoK: Chasing Accuracy and Privacy, and Catching Both in Differentially Private Histogram Publication

Author: Nelson Boel
Reuben Jenni
Publication venue
Publication date: 01/01/2020
Field of study

Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, containing no obvious sensitive attributes, can result in privacy leakage. To enable data analysis, a strong notion of privacy is required to avoid risking unintended privacy violations.Such a strong notion of privacy is differential privacy, a statistical notion of privacy that makes privacy leakage quantifiable. The caveat regarding differential privacy is that while it has strong guarantees for privacy, privacy comes at a cost of accuracy. Despite this trade-off being a central and important issue in the adoption of differential privacy, there exists a gap in the literature regarding providing an understanding of the trade-off and how to address it appropriately. Through a systematic literature review (SLR), we investigate the state-of-the-art within accuracy improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is two-fold: 1) we identify trends and connections in the contributions to the field of differential privacy for histograms and synthetic data and 2) we provide an understanding of the privacy/accuracy trade-off challenge by crystallizing different dimensions to accuracy improvement. Accordingly, we position and visualize the ideas in relation to each other and external work, and deconstruct each algorithm to examine the building blocks separately with the aim of pinpointing which dimension of accuracy improvement each technique/approach is targeting. Hence, this systematization of knowledge (SoK) provides an understanding of in which dimensions and how accuracy improvement can be pursued without sacrificing privacy

Chalmers Research

Private Graphon Estimation for Sparse Graphs

Author: Borgs Christian
Chayes Jennifer T.
Smith Adam
Publication venue
Publication date: 01/01/2015
Field of study

We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information of individual members. Given a sparse input graph

G

, our algorithms output a node-differentially-private nonparametric block model approximation. By node-differentially-private, we mean that our output hides the insertion or removal of a vertex and all its adjacent edges. If

G

is an instance of the network obtained from a generative nonparametric model defined in terms of a graphon

W

, our model guarantees consistency, in the sense that as the number of vertices tends to infinity, the output of our algorithm converges to

W

in an appropriate version of the

L_2

norm. In particular, this means we can estimate the sizes of all multi-way cuts in

G

. Our results hold as long as

W

is bounded, the average degree of

G

grows at least like the log of the number of vertices, and the number of blocks goes to infinity at an appropriate rate. We give explicit error bounds in terms of the parameters of the model; in several settings, our bounds improve on or match known nonprivate results.Comment: 36 page

arXiv.org e-Print Archive

CiteSeerX

Revealing Network Structure, Confidentially: Improved Rates for Node-Private Graphon Estimation

Author: Borgs Christian
Chayes Jennifer
Smith Adam
Zadik Ilias
Publication venue
Publication date: 01/01/2018
Field of study

Motivated by growing concerns over ensuring privacy on social networks, we develop new algorithms and impossibility results for fitting complex statistical models to network data subject to rigorous privacy guarantees. We consider the so-called node-differentially private algorithms, which compute information about a graph or network while provably revealing almost no information about the presence or absence of a particular node in the graph. We provide new algorithms for node-differentially private estimation for a popular and expressive family of network models: stochastic block models and their generalization, graphons. Our algorithms improve on prior work, reducing their error quadratically and matching, in many regimes, the optimal nonprivate algorithm. We also show that for the simplest random graph models (

G(n,p)

and

G(n,m)

), node-private algorithms can be qualitatively more accurate than for more complex models---converging at a rate of

\frac{1}{\epsilon^2 n^{3}}

instead of

\frac{1}{\epsilon^2 n^2}

. This result uses a new extension lemma for differentially private algorithms that we hope will be broadly useful

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

Security and Privacy for Big Data: A Systematic Literature Review

Author: Boel Nelson
Tomas Olovsson
Publication venue
Publication date: 23/04/2020
Field of study

Abstract-Big data is currently a hot research topic, with four million hits on Google scholar in October 2016. One reason for the popularity of big data research is the knowledge that can be extracted from analyzing these large data sets. However, data can contain sensitive information, and data must therefore be sufficiently protected as it is stored and processed. Furthermore, it might also be required to provide meaningful, proven, privacy guarantees if the data can be linked to individuals. To the best of our knowledge, there exists no systematic overview of the overlap between big data and the area of security and privacy. Consequently, this review aims to explore security and privacy research within big data, by outlining and providing structure to what research currently exists. Moreover, we investigate which papers connect security and privacy with big data, and which categories these papers cover. Ultimately, is security and privacy research for big data different from the rest of the research within the security and privacy domain? To answer these questions, we perform a systematic literature review (SLR), where we collect recent papers from top conferences, and categorize them in order to provide an overview of the security and privacy topics present within the context of big data. Within each category we also present a qualitative analysis of papers representative for that specific area. Furthermore, we explore and visualize the relationship between the categories. Thus, the objective of this review is to provide a snapshot of the current state of security and privacy research for big data, and to discover where further research is required

CiteSeerX

Differential Privacy - A Balancing Act

Author: Nelson Boel
Publication venue
Publication date: 01/01/2021
Field of study

Data privacy is an ever important aspect of data analyses. Historically, a plethora of privacy techniques have been introduced to protect data, but few have stood the test of time. From investigating the overlap between big data research, and security and privacy research, I have found that differential privacy presents itself as a promising defender of data privacy.Differential privacy is a rigorous, mathematical notion of privacy. Nevertheless, privacy comes at a cost. In order to achieve differential privacy, we need to introduce some form of inaccuracy (i.e. error) to our analyses. Hence, practitioners need to engage in a balancing act between accuracy and privacy when adopting differential privacy. As a consequence, understanding this accuracy/privacy trade-off is vital to being able to use differential privacy in real data analyses.In this thesis, I aim to bridge the gap between differential privacy in theory, and differential privacy in practice. Most notably, I aim to convey a better understanding of the accuracy/privacy trade-off, by 1) implementing tools to tweak accuracy/privacy in a real use case, 2) presenting a methodology for empirically predicting error, and 3) systematizing and analyzing known accuracy improvement techniques for differentially private algorithms. Additionally, I also put differential privacy into context by investigating how it can be applied in the automotive domain. Using the automotive domain as an example, I introduce the main challenges that constitutes the balancing act, and provide advice for moving forward

Chalmers Research