Search CORE

7 research outputs found

Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Authors: Nguyen Lam M.; Nguyen Nhuong V.; Nguyen Phuong Ha; Nguyen Toan N.; Tran-Dinh Quoc; van Dijk Marten
Publication date: 26/02/2021

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD), where multiple threads access a common repository of training data in parallel, perform SGD iterations, and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way, and we wish to move SGD computations to the local compute nodes where the local data resides. The results of these local SGD computations are aggregated by a central "aggregator", which mimics Hogwild!. We show how local compute nodes can start with small mini-batch sizes that increase to larger ones in order to reduce communication cost (rounds of interaction with the aggregator). We improve on the state-of-the-art literature and show $O(\sqrt{K})$ communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we prove a tight and novel non-trivial convergence analysis for strongly convex problems with heterogeneous data, which does not use the bounded gradient assumption seen in many existing publications. The tightness follows from our proofs of lower and upper bounds on the convergence rate, which differ only by a constant factor. We show experimental results for plain convex and non-convex problems with biased (i.e., heterogeneous) and unbiased local data sets.
Comment: arXiv admin note: substantial text overlap with arXiv:2007.09208 AISTATS 202
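To make the $O(\sqrt{K})$ round count concrete, here is a back-of-the-envelope sketch, not the authors' algorithm. It assumes a hypothetical schedule in which round $i$ (0-based) uses a mini-batch of $s_0 + a\,i$ gradient computations. After $R$ rounds this spends $R s_0 + aR(R-1)/2$ computations, so a budget of $K$ needs roughly $\sqrt{2K/a}$ rounds, whereas a constant mini-batch size $s$ needs $\lceil K/s \rceil$ rounds. The function `rounds_for_budget` and its parameters `s0` and `a` are illustrative names only.

```python
import math


def rounds_for_budget(K: int, s0: int, a: int) -> int:
    """Count rounds needed to spend K gradient computations when round i
    (0-based) uses a mini-batch of s0 + a*i samples.

    Hypothetical linearly increasing schedule, for illustration only.
    """
    if s0 < 1 or a < 0:
        raise ValueError("need s0 >= 1 and a >= 0")
    rounds, spent = 0, 0
    while spent < K:
        spent += s0 + a * rounds  # mini-batch size used in this round
        rounds += 1
    return rounds


if __name__ == "__main__":
    for K in (10_000, 40_000, 160_000, 640_000):
        linear = rounds_for_budget(K, s0=1, a=1)
        constant = math.ceil(K / 100)  # fixed mini-batch of 100, for contrast
        print(f"K={K:>7}  linear={linear:>5}  "
              f"sqrt(2K)={math.sqrt(2 * K):8.1f}  constant={constant:>5}")
```

With `s0=1` and `a=1`, quadrupling $K$ roughly doubles the number of rounds (the $\sqrt{K}$ behaviour), while the constant-batch count grows linearly in $K$. The paper's actual schedule and constants may differ.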

arXiv.org e-Print Archive

CWI's Institutional Repository

Bringing Differential Private SGD to Practice: On the Independence of Gaussian Noise and the Number of Training Rounds

Authors: Dijk M.E. (Marten) van; Nguyen L. M. (Lam); Nguyen N. V. (Nhuong); Nguyen P.H. (Phuong Ha); Nguyen T. N. (Toan)
Publication date: 01/02/2022

CWI's Institutional Repository

On the Tightness of the Moment Accountant for DP-SGD

Authors: Dijk M.E. (Marten) van; Nguyen L. M. (Lam); Nguyen N. V. (Nhuong); Nguyen P.H. (Phuong Ha); Nguyen T. N. (Toan)
Publication date: 30/05/2023

In Differentially Private SGD (DP-SGD), Gaussian noise with standard deviation $\sigma$ is added to local SGD updates after a clipping operation in order to provide differential privacy. By non-trivially improving the moment accountant method, we prove a closed-form $(\epsilon,\delta)$-DP guarantee: DP-SGD is $(\epsilon \le 1/2, \delta = 1/N)$-DP if $\sigma = \sqrt{2(\epsilon + \ln(1/\delta))/\epsilon}$, with $T$ at least $\approx 2k^2/\epsilon$ and $(2/e)^2 k^2 - 1/2 \ge \ln(N)$. Here $T$ is the total number of rounds, and $K = kN$ is the total number of gradient computations, where $k$ expresses $K$ as a number of epochs of size $N$ over the local data set. We prove that our expression is close to tight: if $T$ is more than a constant factor $\approx 8$ smaller than the lower bound $\approx 2k^2/\epsilon$, then the $(\epsilon,\delta)$-DP guarantee is violated. Choosing the smallest possible value $T \approx 2k^2/\epsilon$ not only leads to a close-to-tight DP guarantee but also minimizes the total number of communicated updates. This means that the least amount of noise is aggregated into the global model and, in addition, accuracy is optimized, as confirmed by simulations.
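The closed-form conditions above can be evaluated directly. The sketch below only plugs numbers into the expressions quoted in the abstract ($\delta = 1/N$, $\sigma = \sqrt{2(\epsilon+\ln(1/\delta))/\epsilon}$, $T \approx 2k^2/\epsilon$, and $(2/e)^2k^2 - 1/2 \ge \ln N$); it is not a privacy accountant. The function name `dp_sgd_closed_form` and the example values are hypothetical, and `t_violation` simply divides the approximate lower bound by the abstract's approximate factor of 8.

```python
import math


def dp_sgd_closed_form(eps: float, N: int, k: float) -> dict:
    """Plug numbers into the closed-form expressions quoted in the abstract.

    eps: target epsilon (the quoted guarantee covers eps <= 1/2)
    N:   size of the local data set (delta is fixed to 1/N)
    k:   number of epochs, so K = k*N gradient computations in total
    """
    if not 0 < eps <= 0.5:
        raise ValueError("the quoted guarantee is stated for 0 < eps <= 1/2")
    delta = 1.0 / N
    sigma = math.sqrt(2 * (eps + math.log(1 / delta)) / eps)
    t_min = 2 * k**2 / eps  # approximate minimum number of rounds T
    return {
        "delta": delta,
        "sigma": sigma,
        "t_min": t_min,
        # the abstract reports the guarantee is violated roughly below this T
        "t_violation": t_min / 8,
        "side_condition_holds": (2 / math.e) ** 2 * k**2 - 0.5 >= math.log(N),
    }


if __name__ == "__main__":
    print(dp_sgd_closed_form(eps=0.5, N=60_000, k=50))
```

For $\epsilon = 1/2$, $N = 60{,}000$ and $k = 50$, this gives $\sigma \approx 6.78$ and $T \approx 10{,}000$; with very few epochs (e.g. $k = 2$) the side condition on $\ln N$ fails.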

CWI's Institutional Repository

Hogwild! over distributed local data sets with linearly increasing mini-batch sizes

Authors: Dijk M.E. (Marten) van; Nguyen L. M. (Lam); Nguyen N. V. (Nhuong); Nguyen P.H. (Phuong Ha); Nguyen T. N. (Toan); Tran-Dinh Q. (Quoc)
Publication date: 27/02/2021

CWI's Institutional Repository

Bringing Differential Private SGD to Practice: On the Independence of Gaussian Noise and the Number of Training Rounds

Authors: Nguyen Lam M.; Nguyen Nhuong V.; Nguyen Phuong Ha; Nguyen Toan N.; van Dijk Marten
Publication date: 31/01/2022

In DP-SGD, each round communicates a local SGD update, which leaks some new information about the underlying local data set to the outside world. To provide privacy, Gaussian noise with standard deviation $\sigma$ is added to local SGD updates after a clipping operation. We show that for attaining $(\epsilon,\delta)$-differential privacy, $\sigma$ can be chosen equal to $\sqrt{2(\epsilon+\ln(1/\delta))/\epsilon}$ for $\epsilon=\Omega(T/N^2)$, where $T$ is the total number of rounds and $N$ is the size of the local data set. In many existing machine learning problems, $N$ is large and $T=O(N)$. Hence, $\sigma$ becomes "independent" of any choice of $T=O(N)$ with $\epsilon=\Omega(1/N)$. This means that our $\sigma$ depends only on $N$ rather than on $T$. As shown in our paper, this differential privacy characterization allows one to select the parameters of DP-SGD a priori, based on a fixed privacy budget (in terms of $\epsilon$ and $\delta$), so as to best optimize the anticipated utility (test accuracy). This ability to plan ahead, together with the independence of $\sigma$ from $T$ (which allows local gradient computations to be split among as many rounds as needed, even for large $T$, as usually happens in practice), leads to a proactive DP-SGD algorithm that allows a client to balance its privacy budget with the accuracy of the learned global model based on local test data. We note that the current state-of-the-art differential privacy accountant method based on $f$-DP has a closed form for computing the privacy loss for DP-SGD. However, due to its interpretation complexity, it cannot be used in a simple way to plan ahead. Instead, accountant methods are only used to keep track of how the privacy budget has been spent (after the fact).
Comment: arXiv admin note: text overlap with arXiv:2007.0920
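The mechanism described in this abstract (clip each local update, then add Gaussian noise, with $\sigma$ fixed from $(\epsilon,\delta)$ before training) can be sketched as follows. This is a minimal standard-library illustration, not the authors' proactive DP-SGD algorithm. The names `sigma_for_budget` and `privatize_update` are hypothetical; clipping is to L2 norm `clip_norm`, and the per-coordinate noise standard deviation `sigma * clip_norm` is an assumption borrowed from the common DP-SGD convention. The paper's exact noise scaling and its asymptotic condition $\epsilon = \Omega(T/N^2)$ are not modelled.

```python
import math
import random


def sigma_for_budget(eps: float, delta: float) -> float:
    """sigma = sqrt(2(eps + ln(1/delta))/eps); the number of rounds T does not appear."""
    return math.sqrt(2 * (eps + math.log(1 / delta)) / eps)


def privatize_update(update, clip_norm, sigma, rng):
    """Clip `update` to L2 norm <= clip_norm, then add Gaussian noise.

    Assumption: per-coordinate noise std is sigma * clip_norm (a common
    DP-SGD convention); the paper's exact scaling may differ.
    """
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale + rng.gauss(0.0, sigma * clip_norm) for x in update]


if __name__ == "__main__":
    N = 10_000
    sigma = sigma_for_budget(eps=0.5, delta=1 / N)  # chosen before training; no T needed
    rng = random.Random(0)
    toy_update = [3.0, 4.0]  # toy local SGD update with L2 norm 5
    print(sigma, privatize_update(toy_update, clip_norm=1.0, sigma=sigma, rng=rng))
```

Because `sigma_for_budget` takes only $\epsilon$ and $\delta$, the same noise level can be reused however many rounds the local computation is split into, which is the planning-ahead property the abstract emphasizes.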

arXiv.org e-Print Archive

Global profiling of protein–DNA and protein–nucleosome binding affinities using quantitative mass spectrometry

Authors: Benjamin M. Foster; Cathrin Gräwe; Matthew M. Makowski; Michiel Vermeulen; Nhuong V. Nguyen; Till Bartke
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Quantitative mass spectrometry enables the proteome-wide assessment of biomolecular binding affinities. While previous approaches mainly focused on protein–small molecule interactions, the authors present a method to probe protein–DNA and protein–nucleosome binding affinities at proteome scale.

Crossref

Directory of Open Access Journals

PuSH

The University of Manchester - Institutional Repository