Towards Efficient Data Valuation Based on the Shapley Value
"How much is my data worth?" is an increasingly common question posed by
organizations and individuals alike. An answer to this question could allow,
for instance, fairly distributing profits among multiple data contributors and
determining prospective compensation when data breaches happen. In this paper,
we study the problem of data valuation by utilizing the Shapley value, a
popular notion of value which originated in cooperative game theory. The
Shapley value defines a unique payoff scheme that satisfies many desiderata for
the notion of data value. However, the Shapley value often requires exponential
time to compute. To meet this challenge, we propose a repertoire of efficient
algorithms for approximating the Shapley value. We also demonstrate the value
of each training instance for various benchmark datasets
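
The exponential cost mentioned in the abstract, and the general idea of approximating by sampling, can be illustrated with a minimal sketch. This is a generic permutation-sampling estimator, not necessarily one of the algorithms proposed in the paper. The function names, the `utility` callback, and the toy weights are all hypothetical: in a data-valuation setting, `utility` would typically be the performance of a model trained on the given subset of points.

```python
import itertools
import random
from typing import Callable, FrozenSet, List

# Hypothetical type: maps a set of data-point indices to a utility score.
Utility = Callable[[FrozenSet[int]], float]


def _marginals(order, utility: Utility, values: List[float]) -> None:
    """Add each point's marginal contribution along one ordering."""
    coalition: FrozenSet[int] = frozenset()
    prev = utility(coalition)
    for i in order:
        coalition = coalition | {i}
        cur = utility(coalition)
        values[i] += cur - prev
        prev = cur


def exact_shapley(n: int, utility: Utility) -> List[float]:
    """Exact Shapley values by averaging over all n! orderings (tiny n only)."""
    values = [0.0] * n
    count = 0
    for perm in itertools.permutations(range(n)):
        _marginals(perm, utility, values)
        count += 1
    return [v / count for v in values]


def monte_carlo_shapley(n: int, utility: Utility,
                        num_permutations: int, seed: int = 0) -> List[float]:
    """Estimate Shapley values from randomly sampled orderings."""
    rng = random.Random(seed)
    values = [0.0] * n
    order = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(order)
        _marginals(order, utility, values)
    return [v / num_permutations for v in values]


# Toy utility (an assumption for illustration): squared total weight.
WEIGHTS = [1.0, 2.0, 3.0]


def toy_utility(subset: FrozenSet[int]) -> float:
    return sum(WEIGHTS[i] for i in subset) ** 2


if __name__ == "__main__":
    print(exact_shapley(3, toy_utility))
    print(monte_carlo_shapley(3, toy_utility, num_permutations=200))
```

The exact routine evaluates the utility along every one of the n! orderings, which is why it is only feasible for very small n. The sampled version trades that cost for estimation error, and its estimates still sum to the utility of the full dataset minus that of the empty set, because the marginal contributions along each ordering telescope.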
Axiomatic Characterization of Data-Driven Influence Measures for Classification
We study the following problem: given a labeled dataset and a specific
datapoint x, how did the i-th feature influence the classification for x? We
identify a family of numerical influence measures - functions that, given a
datapoint x, assign a numeric value phi_i(x) to every feature i, corresponding
to how altering i's value would influence the outcome for x. This family, which
we term monotone influence measures (MIM), is uniquely derived from a set of
desirable properties, or axioms. The MIM family constitutes a provably sound
methodology for measuring feature influence in classification domains; the
values generated by MIM are based on the dataset alone, and do not make any
queries to the classifier. While this requirement naturally limits the scope of
our framework, we demonstrate its effectiveness on data
- …