Tempered Sigmoid Activations for Deep Learning with Differential Privacy
Because learning sometimes involves sensitive data, machine learning
algorithms have been extended to offer privacy for training data. In practice,
this has been mostly an afterthought, with privacy-preserving models obtained
by re-running training with a different optimizer, but using the model
architectures that already performed well in a non-privacy-preserving setting.
This approach leads to less than ideal privacy/utility tradeoffs, as we show
here. Instead, we propose that model architectures are chosen ab initio
explicitly for privacy-preserving training.
To provide guarantees under the gold standard of differential privacy, one
must bound as strictly as possible how individual training points can possibly
affect model updates. In this paper, we are the first to observe that the
choice of activation function is central to bounding the sensitivity of
privacy-preserving deep learning. We demonstrate analytically and
experimentally how a general family of bounded activation functions, the
tempered sigmoids, consistently outperforms unbounded activation functions such
as ReLU. Using this paradigm, we achieve new state-of-the-art accuracy on MNIST,
FashionMNIST, and CIFAR10 without any modification of the learning procedure
fundamentals or differential privacy analysis.
Robust PAC^m: Training Ensemble Models Under Model Misspecification and Outliers
Standard Bayesian learning is known to have suboptimal generalization
capabilities under model misspecification and in the presence of outliers.
PAC-Bayes theory demonstrates that the free energy criterion minimized by
Bayesian learning is a bound on the generalization error for Gibbs predictors
(i.e., for single models drawn at random from the posterior) under the
assumption of sampling distributions uncontaminated by outliers. This viewpoint
provides a justification for the limitations of Bayesian learning when the
model is misspecified, requiring ensembling, and when data is affected by
outliers. In recent work, PAC-Bayes bounds - referred to as PAC^m bounds - were
derived to introduce free energy metrics that account for the performance of
ensemble predictors, obtaining enhanced performance under misspecification.
This work presents a novel robust free energy criterion that combines the
generalized logarithm score function with PAC^m ensemble bounds. The proposed
free energy training criterion produces predictive distributions that are able
to concurrently counteract the detrimental effects of model misspecification
and outliers.
Comment: Submitted to IEEE Transactions on Neural Networks and Learning Systems
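The generalized logarithm score mentioned above can be sketched with the standard t-logarithm; the choice t = 0.9 below is a hypothetical robustness setting for illustration, not a value taken from the paper:

```python
import numpy as np

def log_t(x, t=0.9):
    """Generalized (t-)logarithm: log_t(x) = (x**(1 - t) - 1) / (1 - t).

    Recovers the natural log as t -> 1. For t < 1 it is bounded below on
    (0, 1], so the loss contributed by any single point is capped.
    """
    if t == 1.0:
        return np.log(x)
    return (np.power(x, 1.0 - t) - 1.0) / (1.0 - t)

# Negative score of predictive probabilities: as p -> 0 (an outlier the
# model cannot explain), -log(p) diverges, while -log_t(p) with t < 1
# saturates at 1 / (1 - t), limiting the outlier's influence on training.
p = np.array([0.9, 0.5, 1e-6])
print(-log_t(p))
```

Replacing the log score with log_t inside the ensemble free energy is what lets the criterion stay well-behaved when a fraction of the data is contaminated.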