14,764 research outputs found
PUBA: Privacy-Preserving User-Data Bookkeeping and Analytics
In this paper we propose Privacy-preserving User-data Bookkeeping & Analytics (PUBA), a building block destined to enable the implementation of business models (e.g., targeted advertising) and regulations (e.g., fraud detection) requiring user-data analysis in a privacy-preserving way. In PUBA, users keep an unlinkable but authenticated cryptographic logbook containing their historic data on their device. This logbook can only be updated by the operator while its content is not revealed. Users can take part in a privacy-preserving analytics computation, where it is ensured that their logbook is up-to-date and authentic while the potentially secret analytics function is verified to be privacy-friendly. Taking constrained devices into account, users may also outsource analytic computations (to a potentially malicious proxy not colluding with the operator).We model our novel building block in the Universal Composability framework and provide a practical protocol instantiation. To demonstrate the flexibility of PUBA, we sketch instantiations of privacy-preserving fraud detection and targeted advertising, although it could be used in many more scenarios, e.g. data analytics for multi-modal transportation systems. We implemented our bookkeeping protocols and an exemplary outsourced analytics computation based on logistic regression using the MP-SPDZ MPC framework. Performance evaluations using a smartphone as user device and more powerful hardware for operator and proxy suggest that PUBA for smaller logbooks can indeed be practical
LDP-IDS: Local Differential Privacy for Infinite Data Streams
Streaming data collection is essential to real-time data analytics in various
IoTs and mobile device-based systems, which, however, may expose end users'
privacy. Local differential privacy (LDP) is a promising solution to
privacy-preserving data collection and analysis. However, existing few LDP
studies over streams are either applicable to finite streams only or suffering
from insufficient protection. This paper investigates this problem by proposing
LDP-IDS, a novel -event LDP paradigm to provide practical privacy guarantee
for infinite streams at users end, and adapting the popular budget division
framework in centralized differential privacy (CDP). By constructing a unified
error analysi for LDP, we first develop two adatpive budget division-based LDP
methods for LDP-IDS that can enhance data utility via leveraging the
non-deterministic sparsity in streams. Beyond that, we further propose a novel
population division framework that can not only avoid the high sensitivity of
LDP noise to budget division but also require significantly less communication.
Based on the framework, we also present two adaptive population division
methods for LDP-IDS with theoretical analysis. We conduct extensive experiments
on synthetic and real-world datasets to evaluate the effectiveness and
efficiency pf our proposed frameworks and methods. Experimental results
demonstrate that, despite the effectiveness of the adaptive budget division
methods, the proposed population division framework and methods can further
achieve much higher effectiveness and efficiency.Comment: accepted to SIGMOD'2
PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn
Preserving privacy of users is a key requirement of web-scale analytics and
reporting applications, and has witnessed a renewed focus in light of recent
data breaches and new regulations such as GDPR. We focus on the problem of
computing robust, reliable analytics in a privacy-preserving manner, while
satisfying product requirements. We present PriPeARL, a framework for
privacy-preserving analytics and reporting, inspired by differential privacy.
We describe the overall design and architecture, and the key modeling
components, focusing on the unique challenges associated with privacy,
coverage, utility, and consistency. We perform an experimental study in the
context of ads analytics and reporting at LinkedIn, thereby demonstrating the
tradeoffs between privacy and utility needs, and the applicability of
privacy-preserving mechanisms to real-world data. We also highlight the lessons
learned from the production deployment of our system at LinkedIn.Comment: Conference information: ACM International Conference on Information
and Knowledge Management (CIKM 2018
Supporting Regularized Logistic Regression Privately and Efficiently
As one of the most popular statistical and machine learning models, logistic
regression with regularization has found wide adoption in biomedicine, social
sciences, information technology, and so on. These domains often involve data
of human subjects that are contingent upon strict privacy regulations.
Increasing concerns over data privacy make it more and more difficult to
coordinate and conduct large-scale collaborative studies, which typically rely
on cross-institution data sharing and joint analysis. Our work here focuses on
safeguarding regularized logistic regression, a widely-used machine learning
model in various disciplines while at the same time has not been investigated
from a data security and privacy perspective. We consider a common use scenario
of multi-institution collaborative studies, such as in the form of research
consortia or networks as widely seen in genetics, epidemiology, social
sciences, etc. To make our privacy-enhancing solution practical, we demonstrate
a non-conventional and computationally efficient method leveraging distributing
computing and strong cryptography to provide comprehensive protection over
individual-level and summary data. Extensive empirical evaluation on several
studies validated the privacy guarantees, efficiency and scalability of our
proposal. We also discuss the practical implications of our solution for
large-scale studies and applications from various disciplines, including
genetic and biomedical studies, smart grid, network analysis, etc
Scather: programming with multi-party computation and MapReduce
We present a prototype of a distributed computational infrastructure, an associated high level programming language, and an underlying formal framework that allow multiple parties to leverage their own cloud-based computational resources (capable of supporting MapReduce [27] operations) in concert with multi-party computation (MPC) to execute statistical analysis algorithms that have privacy-preserving properties. Our architecture allows a data analyst unfamiliar with MPC to: (1) author an analysis algorithm that is agnostic with regard to data privacy policies, (2) to use an automated process to derive algorithm implementation variants that have different privacy and performance properties, and (3) to compile those implementation variants so that they can be deployed on an infrastructures that allows computations to take place locally within each participant’s MapReduce cluster as well as across all the participants’ clusters using an MPC protocol. We describe implementation details of the architecture, discuss and demonstrate how the formal framework enables the exploration of tradeoffs between the efficiency and privacy properties of an analysis algorithm, and present two example applications that illustrate how such an infrastructure can be utilized in practice.This work was supported in part by NSF Grants: #1430145, #1414119, #1347522, and #1012798
Big Data Privacy Context: Literature Effects On Secure Informational Assets
This article's objective is the identification of research opportunities in
the current big data privacy domain, evaluating literature effects on secure
informational assets. Until now, no study has analyzed such relation. Its
results can foster science, technologies and businesses. To achieve these
objectives, a big data privacy Systematic Literature Review (SLR) is performed
on the main scientific peer reviewed journals in Scopus database. Bibliometrics
and text mining analysis complement the SLR. This study provides support to big
data privacy researchers on: most and least researched themes, research
novelty, most cited works and authors, themes evolution through time and many
others. In addition, TOPSIS and VIKOR ranks were developed to evaluate
literature effects versus informational assets indicators. Secure Internet
Servers (SIS) was chosen as decision criteria. Results show that big data
privacy literature is strongly focused on computational aspects. However,
individuals, societies, organizations and governments face a technological
change that has just started to be investigated, with growing concerns on law
and regulation aspects. TOPSIS and VIKOR Ranks differed in several positions
and the only consistent country between literature and SIS adoption is the
United States. Countries in the lowest ranking positions represent future
research opportunities.Comment: 21 pages, 9 figure
Secure multi-party computation for analytics deployed as a lightweight web application
We describe the definition, design, implementation, and deployment of a secure multi-party computation protocol and web application. The protocol and application allow groups of cooperating parties with minimal expertise and no specialized resources to compute basic statistical analytics on their collective data sets without revealing the contributions of individual participants. The application was developed specifically to support a Boston Women’s Workforce Council (BWWC) study of wage disparities within employer organizations in the Greater Boston Area. The application has been deployed successfully to support two data collection sessions (in 2015 and in 2016) to obtain data pertaining to compensation levels across genders and demographics. Our experience provides insights into the particular security and usability requirements (and tradeoffs) a successful “MPC-as-a-service” platform design and implementation must negotiate.We would like to acknowledge all the members of the Boston Women’s Workforce Council, and to thank in particular MaryRose Mazzola, Christina M. Knowles, and Katie A. Johnston, who led the efforts to organize participants and deploy the protocol as part of the 100% Talent: The Boston Women’s Compact [31], [32] data collections. We also thank the Boston University Initiative on Cities (IOC), and in particular Executive Director Katherine Lusk, who brought this potential application of secure multi-party computation to our attention. The BWWC, the IOC, and several sponsors contributed funding to complete this work. Support was also provided in part by Smart-city Cloud-based Open Platform and Ecosystem (SCOPE), an NSF Division of Industrial Innovation and Partnerships PFI:BIC project under award #1430145, and by Modular Approach to Cloud Security (MACS), an NSF CISE CNS SaTC Frontier project under award #1414119
- …