Relabeling Minimal Training Subset to Flip a Prediction

Abstract

When facing an unsatisfactory prediction from a machine learning model, it is crucial to investigate the underlying reasons and explore the potential for reversing the outcome. We ask: can we flip the prediction on a test point $x_t$ by relabeling the smallest subset $\mathcal{S}_t$ of the training data before the model is trained? We propose an efficient procedure to identify and relabel such a subset via an extended influence function. We find that relabeling fewer than 1% of the training points can often flip the model's prediction. This mechanism serves multiple purposes: (1) providing an approach to challenge a model prediction by recovering the influential training subset; (2) evaluating model robustness via the cardinality of the subset, $|\mathcal{S}_t|$; we show that $|\mathcal{S}_t|$ is strongly related to the noise ratio in the training set and is correlated with, yet complementary to, predicted probabilities; (3) revealing training points that lead to group attribution bias. To the best of our knowledge, we are the first to investigate identifying and relabeling the minimal training subset required to flip a given prediction.

Comment: Under review
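To make the idea concrete, below is a minimal sketch (not the authors' exact algorithm) of how an influence-function estimate can rank training points by how much relabeling each one would shift a test prediction, followed by greedy relabeling until the prediction flips. It assumes binary logistic regression; the function names (`influence_of_relabeling`, `flip_subset`), the one-shot scoring, and the greedy selection rule are illustrative assumptions, not the paper's extended influence function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit(X, y):
    # Weak regularization so the Hessian below approximates the true loss curvature.
    return LogisticRegression(C=1e4, max_iter=1000).fit(X, y)

def influence_of_relabeling(model, X, y, x_t):
    """First-order estimate of the change in the test margin w^T x_t
    if each training point's label were flipped."""
    p = model.predict_proba(X)[:, 1]                     # sigma(w^T x_i)
    # Hessian of the logistic loss at the fitted parameters (+ small ridge).
    H = (X * (p * (1 - p))[:, None]).T @ X + 1e-6 * np.eye(X.shape[1])
    H_inv_xt = np.linalg.solve(H, x_t)
    # Gradient on point i is (p_i - y_i) x_i; flipping y_i -> 1 - y_i
    # changes it by (2 y_i - 1) x_i.
    delta_grad = (2 * y - 1)[:, None] * X
    # Influence-function approximation: delta(w^T x_t) ~= -delta_grad^T H^{-1} x_t.
    return -delta_grad @ H_inv_xt

def flip_subset(X, y, x_t, max_relabels=50):
    """Greedily relabel the highest-influence points, retraining after each
    relabel, until the prediction on x_t flips (or the budget runs out)."""
    model = fit(X, y)
    original = model.predict(x_t[None])[0]
    scores = influence_of_relabeling(model, X, y, x_t)
    # Relabel first the points predicted to push the margin toward flipping.
    sign = -1 if original == 1 else 1
    order = np.argsort(-sign * scores)
    y_new, relabeled = y.copy(), []
    for i in order[:max_relabels]:
        y_new[i] = 1 - y_new[i]
        relabeled.append(int(i))
        model = fit(X, y_new)
        if model.predict(x_t[None])[0] != original:
            break
    return relabeled, model
```

In this sketch, `len(relabeled)` plays the role of $|\mathcal{S}_t|$: a small value suggests the prediction rests on few training labels, while a large value (or no flip within the budget) suggests a more robust prediction.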
