When a machine learning model produces an unsatisfactory prediction, it is
crucial to investigate the underlying reasons and to explore whether the
outcome can be reversed. We ask: can we flip the prediction on a test point
$x_t$ by relabeling the smallest subset $S_t$ of the training data before the
model is trained? We propose an efficient procedure that identifies and
relabels such a subset via an extended influence function. We find
that relabeling fewer than 1% of the training points can often flip the model's
prediction. This mechanism can serve multiple purposes: (1) providing an
approach to challenge a model prediction by recovering influential training
subsets; (2) evaluating model robustness via the cardinality of the subset
(i.e., $|S_t|$); we show that $|S_t|$ is strongly related to the noise ratio
of the training set, and that $|S_t|$ is correlated with but complementary to
predicted probabilities; and (3) revealing training points that lead to group
attribution bias. To the best of our knowledge, we are the first
to investigate identifying and relabeling the minimal training subset required
to flip a given prediction.
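
To make the idea concrete, here is a minimal sketch of how such an
influence-based relabeling procedure could look for L2-regularized logistic
regression with labels in $\{0, 1\}$. The function names (`flip_influences`,
`smallest_flip_set`), the first-order Newton-step approximation, and the
greedy accumulation are illustrative assumptions, not the paper's exact
extended influence function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def flip_influences(X, y, x_test, C=1.0):
    """Approximate how flipping each training label would move the test
    margin theta^T x_test, via a one-step (Newton) influence estimate:
        dtheta_i ~= -H^{-1} (grad with flipped label - grad with y_i),
    where H is the Hessian of sklearn's objective 0.5||w||^2 + C*sum(loss).
    """
    n, d = X.shape
    clf = LogisticRegression(C=C, fit_intercept=False).fit(X, y)
    theta = clf.coef_.ravel()
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))          # P(y=1 | x_i)
    H = np.eye(d) + C * (X * (p * (1 - p))[:, None]).T @ X
    H_inv = np.linalg.inv(H)
    # Flipping y_i in {0,1} changes point i's loss gradient by C*(2*y_i - 1)*x_i.
    grad_delta = C * (2 * y - 1)[:, None] * X
    dtheta = -grad_delta @ H_inv                    # row i: approx. parameter shift
    return dtheta @ x_test, float(x_test @ theta)   # per-point margin change, margin

def smallest_flip_set(X, y, x_test, C=1.0):
    """Greedily relabel the points whose flips push the test margin hardest
    toward the opposite sign, until the approximated margin flips."""
    dmargin, margin = flip_influences(X, y, x_test, C)
    order = np.argsort(dmargin if margin > 0 else -dmargin)
    subset, total = [], margin
    for i in order:
        if (dmargin[i] >= 0) == (margin > 0):
            break                                   # remaining flips cannot help
        subset.append(int(i))
        total += dmargin[i]
        if (total > 0) != (margin > 0):
            return subset                           # predicted sign has flipped
    return None                                     # no flip under this approximation
```

Because the per-point effects are simply summed, this sketch ignores
interactions between flips; in practice one would retrain the model (or
re-estimate the influences) after relabeling the selected subset to verify
that the test prediction actually changes.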