As machine learning has been deployed ubiquitously across applications in
modern data science, algorithmic fairness has become a growing concern, and
a variety of fairness criteria have been proposed. Among them, imposing
fairness constraints during learning, i.e., in-processing fair training, has
been a popular type of training method because it does not require access to
sensitive attributes at test time, in contrast to post-processing methods.
Although imposing fairness constraints has been studied extensively for
classical machine learning models, the effect these techniques have on deep
neural networks is still unclear. Recent research has shown that adding
fairness constraints to the objective function leads to severe over-fitting to
fairness criteria in large models, and how to solve this challenge is an
important open question. To address this challenge, we leverage the paradigm
of pre-training and fine-tuning and develop a simple yet novel framework
to train fair neural networks in an efficient and inexpensive way. We conduct
comprehensive experiments on two popular image datasets with state-of-the-art
architectures under different fairness notions and show that last-layer
fine-tuning is sufficient to promote fairness in deep neural networks.
Our framework provides new insights into representation learning for training
fair neural networks.
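
To make the idea concrete, the following is a minimal sketch (not the authors' released code) of what fairness-aware last-layer fine-tuning can look like in PyTorch: a pre-trained backbone is frozen, and only the final linear head is retrained with a demographic-parity penalty added to the task loss. The choice of backbone, penalty form, and the `lambda_fair` weight are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of fairness-aware last-layer fine-tuning (illustrative only):
# freeze a pre-trained backbone and retrain only the final linear head with a
# demographic-parity regularizer added to the binary classification loss.
import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()           # expose the 512-d penultimate features
for p in backbone.parameters():       # freeze all pre-trained layers
    p.requires_grad = False
backbone.eval()

head = nn.Linear(512, 1)              # the only part that is trained
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
lambda_fair = 1.0                     # fairness penalty weight (hypothetical value)

def demographic_parity_gap(logits, sensitive):
    """Soft demographic-parity gap: absolute difference in mean predicted
    positive rate between the two sensitive groups in a mini-batch
    (assumes both groups are present in the batch)."""
    probs = torch.sigmoid(logits)
    rate_a = probs[sensitive == 0].mean()
    rate_b = probs[sensitive == 1].mean()
    return (rate_a - rate_b).abs()

def fine_tune_step(x, y, sensitive):
    with torch.no_grad():             # features come from the frozen backbone
        feats = backbone(x)
    logits = head(feats).squeeze(-1)
    loss = criterion(logits, y.float()) + lambda_fair * demographic_parity_gap(logits, sensitive)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The same structure extends to other in-processing fairness notions by swapping the penalty term, e.g., an equalized-odds gap computed per true label; only the regularizer changes, while the frozen backbone and linear head stay fixed.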