Advances in privacy-preserving machine learning
Building useful predictive models often involves learning from personal data. For instance, companies use customer data to target advertisements, online education platforms collect student data to recommend content and improve user engagement, and medical researchers fit diagnostic models to patient data. A recent line of research aims to design learning algorithms that provide rigorous privacy guarantees for user data, in the sense that their outputs---models or predictions---leak as little information as possible about individuals in the training data. The goal of this dissertation is to design private learning algorithms with performance comparable to the best possible non-private ones. We quantify privacy using \emph{differential privacy}, a well-studied privacy notion that limits how much information is leaked about an individual by the output of an algorithm. Training a model using a differentially private algorithm prevents an adversary from confidently determining whether a specific person's data was used for training the model.
We begin by presenting a technique for practical differentially private convex optimization that can leverage any off-the-shelf optimizer as a black box. We also perform an extensive empirical evaluation of the state-of-the-art algorithms on a range of publicly available datasets, as well as in an industry application.
Next, we present a learning algorithm that outputs a private classifier when given black-box access to a non-private learner and a limited amount of unlabeled public data. We prove that the accuracy guarantee of our private algorithm in the PAC model of learning is comparable to that of the underlying non-private learner. Such a guarantee is not possible, in general, without public data.
Lastly, we consider building recommendation systems, which we model using matrix completion. We present the first algorithm for matrix completion with provable user-level privacy and accuracy guarantees. Our algorithm consistently outperforms the state-of-the-art private algorithms on a suite of datasets. Along the way, we give an optimal algorithm for differentially private singular vector computation which leads to significant savings in space and time when operating on sparse matrices. It can also be used for private low-rank approximation.
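The differential privacy guarantee described above can be illustrated with the classical Gaussian mechanism for releasing a mean (a minimal sketch, not an algorithm from the dissertation; the function name and parameters are illustrative):

```python
import numpy as np

def private_mean(data, lo, hi, epsilon, delta):
    """Differentially private mean via the Gaussian mechanism.

    Each record is clipped to [lo, hi], so changing one person's data
    moves the mean by at most (hi - lo) / n (the L2 sensitivity)."""
    n = len(data)
    clipped = np.clip(data, lo, hi)
    sensitivity = (hi - lo) / n
    # Standard noise calibration for (epsilon, delta)-DP, valid for epsilon <= 1.
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return clipped.mean() + np.random.normal(0.0, sigma)

# With n = 10,000 records the added noise is tiny relative to the signal,
# yet an adversary cannot confidently tell whether any one record was used.
m_priv = private_mean(np.ones(10_000), lo=0.0, hi=1.0, epsilon=1.0, delta=1e-5)
```

Because sensitivity shrinks as 1/n, the privacy cost of a fixed noise level vanishes as the dataset grows, which is why private estimates can approach non-private accuracy at scale.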
Unintended Memorization in Large ASR Models, and How to Mitigate It
It is well-known that neural networks can unintentionally memorize their
training examples, causing privacy concerns. However, auditing memorization in
large non-auto-regressive automatic speech recognition (ASR) models has been
challenging due to the high compute cost of existing methods such as hardness
calibration. In this work, we design a simple auditing method to measure
memorization in large ASR models without the extra compute overhead.
Concretely, we speed up randomly-generated utterances to create a mapping
between vocal and text information that is difficult to learn from typical
training examples. Hence, accurate predictions only for sped-up training
examples can serve as clear evidence for memorization, and the corresponding
accuracy can be used to measure memorization. Using the proposed method, we
showcase memorization in state-of-the-art ASR models. To mitigate
memorization, we apply gradient clipping during training to bound the influence
of any individual example on the final model. We empirically show that clipping
each example's gradient can mitigate memorization for sped-up training examples
with up to 16 repetitions in the training set. Furthermore, we show that in
large-scale distributed training, clipping the average gradient on each compute
core maintains neutral model quality and compute cost while providing strong
privacy protection.
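The per-example clipping mitigation described above can be sketched as follows (an illustrative NumPy toy, not the paper's training code; the function name is my own):

```python
import numpy as np

def clip_per_example(grads, max_norm):
    """Clip each example's gradient to L2 norm <= max_norm, then average.

    This bounds the influence of any single training example on the
    model update, the mitigation described in the abstract above."""
    clipped = []
    for g in grads:
        norm = np.linalg.norm(g)
        scale = min(1.0, max_norm / (norm + 1e-12))  # shrink only if too large
        clipped.append(g * scale)
    return np.mean(clipped, axis=0)

# One outlier gradient cannot dominate the averaged update.
batch = [np.array([0.1, 0.2]), np.array([100.0, 0.0])]
update = clip_per_example(batch, max_norm=1.0)
```

Clipping the per-core average gradient instead of each example's gradient, as in the distributed variant mentioned above, trades a weaker per-example bound for zero extra compute over standard training.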
Guaranteed validity for empirical approaches to adaptive data analysis
We design a general framework for answering
adaptive statistical queries that focuses on
providing explicit confidence intervals along
with point estimates. Prior work in this area
has either focused on providing tight confidence
intervals for specific analyses, or providing
general worst-case bounds for point estimates.
Unfortunately, as we observe, these
worst-case bounds are loose in many settings
— often not even beating simple baselines like
sample splitting. Our main contribution is
to design a framework for providing valid,
instance-specific confidence intervals for point
estimates that can be generated by heuristics.
When paired with good heuristics, this
method gives guarantees that are orders of
magnitude better than the best worst-case
bounds. We provide a Python library implementing
our method. http://proceedings.mlr.press/v108/rogers20a.htm
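The sample-splitting baseline mentioned above can be sketched as follows (illustrative names, not the paper's library; the confidence interval uses a standard normal approximation on the held-out half):

```python
import numpy as np

def split_estimate(data, select_query, alpha_z=1.96):
    """Sample-splitting baseline for adaptive data analysis.

    The query is chosen on one half of the data; the point estimate and
    a ~95% normal-approximation confidence interval come from the other
    half, which is valid because the held-out half is independent of
    the (possibly adaptive) selection step."""
    n = len(data)
    explore, holdout = data[: n // 2], data[n // 2 :]
    query = select_query(explore)      # adaptive choice may overfit explore
    vals = query(holdout)              # fresh, unbiased evaluation
    est = vals.mean()
    half_width = alpha_z * vals.std(ddof=1) / np.sqrt(len(vals))
    return est, (est - half_width, est + half_width)

rng = np.random.default_rng(0)
data = rng.normal(size=1000)
# After peeking at the explore half, the analyst settles on "mean of x**2".
est, ci = split_estimate(data, lambda explore_half: (lambda h: h**2))
```

The cost of this baseline is that every answer uses only half the data; the framework described above aims to beat it by giving instance-specific intervals on the full sample.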
Revealing and protecting labels in distributed training
Distributed learning paradigms such as federated learning often involve transmitting
model updates, or gradients, over a network, thereby avoiding transmission of the
raw private data. However, it is possible for sensitive information about the training
data to be revealed from such gradients. Prior works have demonstrated that labels
can be revealed analytically from the last layer of certain models (e.g., ResNet),
or they can be reconstructed jointly with model inputs by using Gradients Matching
[1] with additional knowledge about the current state of the model. In this work,
we propose a method to discover the set of labels of training samples from only the
gradient of the last layer and the id-to-label mapping. Our method is applicable to
a wide variety of model architectures across multiple domains. We demonstrate
the effectiveness of our method for model training in two domains: image classification
and automatic speech recognition. Furthermore, we show that existing
reconstruction techniques improve their efficacy when used in conjunction with our
method. Conversely, we demonstrate that gradient quantization and sparsification
can significantly reduce the success of the attack.
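The analytic label leakage from the last layer can be illustrated in the well-known single-example case with softmax cross-entropy, where the gradient with respect to the last-layer bias equals p - y and is therefore negative only at the true class (a toy sketch of that observation, not the paper's batch method):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def reveal_label(bias_grad):
    """For one example under softmax cross-entropy, the last-layer bias
    gradient is p - y.  Since 0 < p_j < 1 and y is one-hot, the gradient
    is negative exactly at the true class, so argmin recovers the label."""
    return int(np.argmin(bias_grad))

logits = np.array([1.0, 2.0, 0.5, -1.0])
true_label = 2
y = np.eye(4)[true_label]
bias_grad = softmax(logits) - y  # analytic CE gradient w.r.t. the bias
```

The attack in the abstract generalizes this single-example observation to recover the set of labels present in an averaged batch gradient.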
Sex differences in the Simon task help to interpret sex differences in selective attention.
In the last decade, a number of studies have reported sex differences in selective attention, but a unified explanation for these effects is still missing. This study aims to better understand these differences and place them in an evolutionary psychological context. A total of 418 adult participants performed a computer-based Simon task, in which they responded to the direction of a left- or right-pointing arrow appearing left or right of a fixation point. Women were more strongly influenced by task-irrelevant spatial information than men (i.e., the Simon effect was larger in women, Cohen's d = 0.39). Further, the analysis of sex differences in behavioral adjustment to errors revealed that women slow down more than men following mistakes (d = 0.53). Based on the combined results of previous studies and the current data, it is proposed that sex differences in selective attention are caused by underlying sex differences in core abilities, such as spatial or verbal cognition.
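The two reported effects can be made concrete with a toy computation of the per-participant Simon effect and the between-group Cohen's d (the data below are illustrative, not the study's):

```python
import numpy as np

def simon_effect(rt_congruent, rt_incongruent):
    """Simon effect for one participant: mean reaction time on
    incongruent trials (arrow direction conflicts with its location)
    minus mean reaction time on congruent trials, in milliseconds."""
    return np.mean(rt_incongruent) - np.mean(rt_congruent)

def cohens_d(group_a, group_b):
    """Cohen's d between two groups using the pooled standard deviation."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

# Toy per-group Simon effects (ms); d = 0.39 in the study corresponds to
# a small-to-medium standardized difference like the one computed here.
women_effects = [40.0, 45.0, 50.0]
men_effects = [20.0, 25.0, 30.0]
d = cohens_d(women_effects, men_effects)
```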