403 research outputs found
Netflix and Forget: Efficient and Exact Machine Unlearning from Bi-linear Recommendations
People break up, miscarry, and lose loved ones. Their online streaming and
shopping recommendations, however, do not necessarily update, and may serve as
unhappy reminders of their loss. When users want to renege on their past
actions, they expect the recommender platforms to erase selective data at the
model level. Ideally, given any specified user history, the recommender can
unwind or "forget", as if the record was not part of training. To that end,
this paper focuses on simple but widely deployed bi-linear models for
recommendations based on matrix completion. Without incurring the cost of
re-training, and without degrading the model unnecessarily, we develop
Unlearn-ALS by making a few key modifications to the fine-tuning procedure
under Alternating Least Squares optimisation, which makes it applicable to any
bi-linear model regardless of its training procedure. We show that Unlearn-ALS is
consistent with retraining without any model degradation and exhibits
rapid convergence, making it suitable for a large class of existing
recommenders.
Comment: 8 pages, 8 figures
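The fine-tuning view of ALS lends itself to a compact sketch. Below is a minimal NumPy illustration of matrix-completion ALS in which one user's history is dropped from the observation mask and a few extra ALS sweeps are run. The dimensions, `lam`, and sweep counts are illustrative, and this shows only the flavor of unlearning-by-ALS-fine-tuning, not the paper's exact Unlearn-ALS modifications:

```python
import numpy as np

def als_sweep(R, mask, U, V, lam=0.1):
    """One alternating-least-squares pass: solve each user factor given the
    item factors, then each item factor given the user factors, using only
    observed entries (mask == True)."""
    k = U.shape[1]
    for u in range(R.shape[0]):
        o = mask[u]
        if o.any():                      # users with no surviving history are skipped
            Vo = V[o]
            U[u] = np.linalg.solve(Vo.T @ Vo + lam * np.eye(k), Vo.T @ R[u, o])
    for i in range(R.shape[1]):
        o = mask[:, i]
        if o.any():
            Uo = U[o]
            V[i] = np.linalg.solve(Uo.T @ Uo + lam * np.eye(k), Uo.T @ R[o, i])
    return U, V

rng = np.random.default_rng(0)
n_users, n_items, k = 30, 20, 3
R = rng.normal(size=(n_users, k)) @ rng.normal(size=(k, n_items))  # rank-k ratings
mask = rng.random((n_users, n_items)) < 0.5                        # observed entries

U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_items, k))
for _ in range(20):                       # fit the bi-linear model
    U, V = als_sweep(R, mask, U, V)

# "Forget" user 0's history: remove their observations from the mask,
# then fine-tune with a few more ALS sweeps on the surviving data.
mask_del = mask.copy()
mask_del[0] = False
for _ in range(5):
    U, V = als_sweep(R, mask_del, U, V)
```

After the extra sweeps, the factors fit only the surviving observations; user 0's interactions no longer constrain the item factors.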
Machine Unlearning: A Survey
Machine learning has attracted widespread attention and evolved into an
enabling technology for a wide range of highly successful applications, such as
intelligent computer vision, speech recognition, medical diagnosis, and more.
Yet a special need has arisen where, due to privacy, usability, and/or the
right to be forgotten, information about some specific samples needs to be
removed from a model, called machine unlearning. This emerging technology has
drawn significant interest from both academics and industry due to its
innovation and practicality. At the same time, this ambitious problem has led
to numerous research efforts aimed at confronting its challenges. To the best
of our knowledge, no study has analyzed this complex topic or compared the
feasibility of existing unlearning solutions in different kinds of scenarios.
Accordingly, with this survey, we aim to capture the key concepts of unlearning
techniques. The existing solutions are classified and summarized based on their
characteristics within an up-to-date and comprehensive review of each
category's advantages and limitations. The survey concludes by highlighting
some of the outstanding issues with unlearning techniques, along with some
feasible directions for new research opportunities.
Ticketed Learning-Unlearning Schemes
We consider the learning--unlearning paradigm defined as follows. First, given
a dataset, the goal is to learn a good predictor, such as one minimizing a
certain loss. Subsequently, given any subset of examples that wish to be
unlearnt, the goal is to learn, without the knowledge of the original training
dataset, a good predictor that is identical to the predictor that would have
been produced when learning from scratch on the surviving examples.
We propose a new ticketed model for learning--unlearning wherein the learning
algorithm can send back additional information in the form of a small-sized
(encrypted) "ticket" to each participating training example, in addition to
retaining a small amount of "central" information for later. Subsequently,
the examples that wish to be unlearnt present their tickets to the unlearning
algorithm, which additionally uses the central information to return a new
predictor. We provide space-efficient ticketed learning--unlearning schemes for
a broad family of concept classes, including thresholds, parities, and
intersection-closed classes.
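The ticketed interface can be illustrated with a deliberately naive scheme for one of the simplest classes mentioned, thresholds on the real line (h_t(x) = 1 iff x >= t). This toy stores far more than the paper's space-efficient constructions — each ticket is the example itself and the central information is the full multiset of positives — and is meant only to show the shape of the learn/unlearn contract:

```python
def learn(examples):
    """Return (predictor threshold, per-example tickets, central info).

    The learned threshold is the smallest positive example; each example
    gets a ticket recording its own contribution."""
    positives = sorted(x for x, y in examples if y == 1)
    threshold = positives[0] if positives else float("inf")
    tickets = [(x, y) for x, y in examples]   # ticket i = example i itself (naive)
    central = positives                       # central info = all positives (naive)
    return threshold, tickets, central

def unlearn(presented_tickets, central):
    """Rebuild the predictor from the central info minus the presented
    tickets, without access to the original training dataset."""
    remaining = list(central)
    for x, y in presented_tickets:
        if y == 1:
            remaining.remove(x)
    threshold = min(remaining) if remaining else float("inf")
    return threshold, remaining
```

Unlearning via tickets matches retraining from scratch on the surviving examples, which is exactly the correctness requirement of the paradigm; the paper's contribution is achieving this with small tickets and small central state.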
En route, we introduce the count-to-zero problem, where during unlearning,
the goal is to simply know if there are any examples that survived. We give a
ticketed learning--unlearning scheme for this problem that relies on the
construction of Sperner families with certain properties, which might be of
independent interest.
Comment: Conference on Learning Theory (COLT) 2023
Certifiable Unlearning Pipelines for Logistic Regression: An Experimental Study
Machine unlearning is the task of updating machine learning (ML) models after
a subset of the training data they were trained on is deleted. Methods for the
task are desired to combine effectiveness and efficiency: they should
effectively “unlearn” deleted data, but without requiring excessive
computational effort (e.g., a full retraining) for a small number of
deletions. Such a combination is typically achieved by tolerating some amount
of approximation in the unlearning. In addition, laws and regulations in the
spirit of “the right to be forgotten” have given rise to requirements for
certifiability, i.e., the ability to demonstrate that the deleted data has
indeed been unlearned by the ML model. In this paper, we present an
experimental study of three state-of-the-art approximate unlearning methods
for logistic regression and demonstrate the trade-offs between efficiency,
effectiveness, and certifiability offered by each method. In implementing
this study, we extend some of the existing works and describe a common
unlearning pipeline to compare and evaluate the unlearning methods on six
real-world datasets and a variety of settings. We provide insights into the
effect of the quantity and distribution of the deleted data on ML models and
the performance of each unlearning method in different settings. We also
propose a practical online strategy to determine when the accumulated error
from approximate unlearning is large enough to warrant a full retraining of
the ML model.
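One common family of approximate unlearners for logistic regression applies a single Newton correction on the surviving data, in the spirit of certified-removal methods; the specific methods compared in the study may differ. The sketch below (plain NumPy, illustrative hyperparameters `lam`, `lr`, `iters`) also returns the gradient-residual norm on the surviving data, the kind of quantity an online strategy can monitor to decide when accumulated error warrants a full retraining:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lam=0.1, iters=500, lr=0.5):
    """L2-regularised logistic regression via gradient descent (labels in {0,1})."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        g = X.T @ (sigmoid(X @ w) - y) / n + lam * w
        w -= lr * g
    return w

def newton_unlearn(w, X, y, delete_idx, lam=0.1):
    """Approximate unlearning: one Newton step on the objective restricted to
    the surviving rows. Returns the corrected weights and the gradient-residual
    norm on the surviving data (small residual ~ close to exact retraining)."""
    keep = np.setdiff1d(np.arange(len(y)), delete_idx)
    Xk, yk = X[keep], y[keep]
    n, d = Xk.shape
    p = sigmoid(Xk @ w)
    g = Xk.T @ (p - yk) / n + lam * w                       # gradient on survivors
    H = (Xk.T * (p * (1 - p))) @ Xk / n + lam * np.eye(d)   # Hessian on survivors
    w_new = w - np.linalg.solve(H, g)
    p2 = sigmoid(Xk @ w_new)
    residual = np.linalg.norm(Xk.T @ (p2 - yk) / n + lam * w_new)
    return w_new, residual

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

w_full = train_logreg(X, y)
w_un, res = newton_unlearn(w_full, X, y, np.arange(5))      # forget the first 5 rows
```

Because the regularised objective is strongly convex, one Newton step from the full-data optimum lands close to the exact retrain-from-scratch solution when few points are deleted; as deletions accumulate, the residual grows and can trigger retraining.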
An Introduction to Machine Unlearning
Removing the influence of a specified subset of training data from a machine
learning model may be required to address issues such as privacy, fairness, and
data quality. Retraining the model from scratch on the remaining data after
removal of the subset is an effective but often infeasible option, due to its
computational expense. The past few years have therefore seen several novel
approaches towards efficient removal, forming the field of "machine
unlearning"; however, many aspects of the literature published thus far are
disparate and lack consensus. In this paper, we summarise and compare seven
state-of-the-art machine unlearning algorithms, consolidate definitions of core
concepts used in the field, reconcile different approaches for evaluating
algorithms, and discuss issues related to applying machine unlearning in
practice.