Training machine learning models with differential privacy (DP) has received
increasing interest in recent years. One of the most popular algorithms for
training differentially private models is differentially private stochastic
gradient descent (DPSGD) and its variants, where at each step gradients are
clipped and combined with some noise. Given the increasing usage of DPSGD, we
ask the question: is DPSGD alone sufficient to find a good minimizer for every
dataset under privacy constraints? As a first step towards answering this
question, we show that even for the simple case of linear classification,
unlike non-private optimization, (private) feature preprocessing is vital for
differentially private optimization. In detail, we first show theoretically
that there exists an example where without feature preprocessing, DPSGD incurs
a privacy error proportional to the maximum norm of features over all samples.
We then propose an algorithm called DPSGD-F, which combines DPSGD with feature
preprocessing, and prove that for classification tasks it incurs a privacy
error proportional to the diameter of the features $\max_{x, x' \in D} \|x - x'\|_2$. We then demonstrate the practicality of our algorithm on image
classification benchmarks.
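
As a rough illustration of the quantities contrasted above, the following sketch shows a generic DPSGD update (per-example clipping followed by Gaussian noise) together with the maximum feature norm and the feature diameter of a small dataset. It is not the authors' DPSGD-F implementation: the function names (`clip`, `dpsgd_step`, `max_norm`, `diameter`), the parameters (`clip_norm`, `noise_multiplier`, `lr`), and the toy data are all assumptions made for illustration, and no private preprocessing or privacy accounting is performed.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def clip(v, clip_norm):
    """Rescale v so that its L2 norm is at most clip_norm."""
    norm = l2_norm(v)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in v]


def dpsgd_step(w, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One generic DPSGD update (hypothetical helper, not DPSGD-F).

    Each per-example gradient is clipped, the clipped gradients are summed,
    Gaussian noise with standard deviation noise_multiplier * clip_norm is
    added per coordinate, and the noisy sum is averaged over the batch.
    """
    total = [0.0] * len(w)
    for g in per_example_grads:
        total = [t + c for t, c in zip(total, clip(g, clip_norm))]
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    n = len(per_example_grads)
    return [wi - lr * ni / n for wi, ni in zip(w, noisy)]


def max_norm(features):
    """Maximum L2 norm over all feature vectors."""
    return max(l2_norm(x) for x in features)


def diameter(features):
    """Maximum pairwise L2 distance between feature vectors."""
    return max(
        l2_norm([a - b for a, b in zip(x, y)])
        for x in features
        for y in features
    )


# Toy data (assumed): points far from the origin but close to each other,
# so the maximum norm is large while the diameter is small.
features = [[100.0, 100.0], [101.0, 100.0], [100.0, 101.0]]

if __name__ == "__main__":
    print("max norm:", max_norm(features))
    print("diameter:", diameter(features))
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.0]]
    print("noisy update:", dpsgd_step([0.0, 0.0], grads, 1.0, 1.0, 0.1, rng))
```

On this toy dataset the maximum norm is over 100 while the diameter is about 1.41, which is the kind of gap between the two error scales that the abstract describes.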