Removing information from a machine learning model is a non-trivial task that
requires partially reverting the training process. This task is unavoidable
when sensitive data, such as credit card numbers or passwords, accidentally
enter the model and need to be removed afterwards. Recently, different concepts
for machine unlearning have been proposed to address this problem. While these
approaches are effective in removing individual data points, they do not scale
to scenarios where larger groups of features and labels need to be reverted. In
this paper, we propose the first method for unlearning features and labels. Our
approach builds on the concept of influence functions and realizes unlearning
through closed-form updates of model parameters. It makes it possible to adapt the
influence of training data on a learning model retrospectively, thereby
correcting data leaks and privacy issues. For learning models with strongly
convex loss functions, our method provides certified unlearning with
theoretical guarantees. For models with non-convex losses, we empirically show
that unlearning features and labels is effective and significantly faster than
other strategies.

Comment: Network and Distributed System Security Symposium (NDSS) 202
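To make the idea of an influence-based, closed-form update more concrete, the following is a minimal sketch, not the authors' implementation. It assumes an L2-regularized logistic regression (a strongly convex loss) and applies a single Hessian-preconditioned step, theta_new = theta - H^-1 (sum of gradients on the corrected points Z~ minus sum of gradients on the affected points Z), evaluated at the trained parameters. The helper names (`data_grad`, `hessian`, `train`, `unlearn`), the synthetic data, and the choice of zeroing one feature to model a leaked attribute are all hypothetical illustrations. Retraining from scratch serves as the reference point.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def data_grad(theta, X, y):
    """Sum of per-sample logistic-loss gradients (regularizer excluded)."""
    return X.T @ (sigmoid(X @ theta) - y)


def hessian(theta, X, lam):
    """Hessian of the full objective: sum of logistic losses + (lam/2)*||theta||^2."""
    p = sigmoid(X @ theta)
    return X.T @ (X * (p * (1.0 - p))[:, None]) + lam * np.eye(X.shape[1])


def train(X, y, lam, iters=30):
    """Fit the strongly convex objective with Newton's method (retraining baseline)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        g = data_grad(theta, X, y) + lam * theta
        theta = theta - np.linalg.solve(hessian(theta, X, lam), g)
    return theta


def unlearn(theta, X, y, idx, X_fix, y_fix, lam):
    """Hypothetical closed-form update: replace the affected points
    Z = (X[idx], y[idx]) by corrected points Z~ = (X_fix, y_fix)
    with one Hessian-preconditioned step, without retraining."""
    H = hessian(theta, X, lam)  # Hessian of the original objective at theta
    # The regularizer cancels in the difference, so only data gradients appear.
    delta = data_grad(theta, X_fix, y_fix) - data_grad(theta, X[idx], y[idx])
    return theta - np.linalg.solve(H, delta)


rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0).astype(float)
theta = train(X, y, lam)

# Hypothetical leak: column 2 of the first 20 samples must be removed,
# which this sketch models by setting it to zero.
idx = np.arange(20)
X_fix = X[idx].copy()
X_fix[:, 2] = 0.0
y_fix = y[idx]

theta_upd = unlearn(theta, X, y, idx, X_fix, y_fix, lam)

# Reference: retrain from scratch on the corrected data.
X_new = X.copy()
X_new[idx] = X_fix
theta_retrain = train(X_new, y, lam)

print("distance before update:", np.linalg.norm(theta - theta_retrain))
print("distance after update: ", np.linalg.norm(theta_upd - theta_retrain))
```

Because the regularized loss is strongly convex, H is positive definite and the linear solve is always well defined; for non-convex models this does not hold in general. The update costs one Hessian solve, whereas retraining repeats the full optimization.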