Towards Efficient Data Valuation Based on the Shapley Value
"How much is my data worth?" is an increasingly common question posed by
organizations and individuals alike. An answer to this question could allow,
for instance, fairly distributing profits among multiple data contributors and
determining prospective compensation when data breaches happen. In this paper,
we study the problem of data valuation by utilizing the Shapley value, a
popular notion of value which originated in cooperative game theory. The
Shapley value defines a unique payoff scheme that satisfies many desiderata for
the notion of data value. However, the Shapley value often requires exponential
time to compute. To meet this challenge, we propose a repertoire of efficient
algorithms for approximating the Shapley value. We also demonstrate the value
of each training instance on various benchmark datasets.
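The exponential cost mentioned above stems from the 2^n coalitions over which marginal contributions are averaged; the most common remedy is Monte Carlo estimation over random permutations. The sketch below illustrates only that generic idea, not the paper's specific algorithms, and the additive toy utility is a hypothetical placeholder:

```python
import random

def mc_shapley(n_points, utility, n_permutations=200, seed=0):
    """Estimate each data point's Shapley value by permutation sampling.

    `utility` maps a set of point indices to a real-valued score; here it
    is an arbitrary placeholder supplied by the caller.
    """
    rng = random.Random(seed)
    values = [0.0] * n_points
    for _ in range(n_permutations):
        perm = list(range(n_points))
        rng.shuffle(perm)
        coalition = set()
        prev_u = utility(coalition)
        for i in perm:
            coalition.add(i)
            new_u = utility(coalition)
            values[i] += new_u - prev_u  # marginal contribution of point i
            prev_u = new_u
    return [v / n_permutations for v in values]

# Toy additive utility: point i contributes weight i+1, so the exact
# Shapley value of point i is i+1 and the estimator recovers it exactly.
weights = [1.0, 2.0, 3.0]
est = mc_shapley(3, lambda S: sum(weights[i] for i in S))
```

Because the toy utility is additive, every marginal contribution of point i equals its weight, so the estimate is exact here; for a real model-based utility the estimate converges at the usual Monte Carlo rate.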
DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation
Many machine learning problems require performing dataset valuation, i.e.,
quantifying the incremental gain, with respect to some relevant pre-defined
utility, of aggregating an individual dataset with others. As seminal examples, dataset
valuation has been leveraged in collaborative and federated learning to create
incentives for data sharing across several data owners. The Shapley value has
recently been proposed as a principled tool to achieve this goal due to its
formal axiomatic justification. Since its computation often requires exponential time,
standard approximation strategies based on Monte Carlo integration have been
considered. Such generic approximation methods, however, remain expensive in
some cases. In this paper, we exploit the knowledge about the structure of the
dataset valuation problem to devise more efficient Shapley value estimators. We
propose a novel approximation of the Shapley value, referred to as discrete
uniform Shapley (DU-Shapley) which is expressed as an expectation under a
discrete uniform distribution with support of reasonable size. We justify the
relevance of the proposed framework via asymptotic and non-asymptotic
theoretical guarantees and show that DU-Shapley tends towards the Shapley value
when the number of data owners is large. The benefits of the proposed framework
are finally illustrated on several dataset valuation benchmarks. DU-Shapley
outperforms other Shapley value approximations, even when the number of data
owners is small.
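The Shapley value of a data owner can be written as an expectation in which the coalition size is uniform on {0, …, n−1}; the sketch below samples that expectation directly. It illustrates the size-uniform structure DU-Shapley exploits, but it is not the paper's estimator, which replaces the inner coalition sampling with a closed-form discrete uniform expectation. The additive utility is again a hypothetical placeholder:

```python
import random

def size_uniform_shapley(i, n, utility, n_samples=500, seed=1):
    """Estimate owner i's Shapley value by drawing a coalition size
    uniformly from {0, ..., n-1}, then a uniform random coalition of that
    size from the other owners, and averaging marginal gains. This is an
    unbiased Monte Carlo form of the size-uniform Shapley expectation,
    NOT the DU-Shapley estimator itself.
    """
    rng = random.Random(seed)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for _ in range(n_samples):
        k = rng.randrange(n)               # coalition size, uniform on {0,...,n-1}
        S = set(rng.sample(others, k))     # uniform coalition of that size
        total += utility(S | {i}) - utility(S)
    return total / n_samples

# Toy additive utility: owner j contributes weight j+1.
weights = [1.0, 2.0, 3.0]
est = size_uniform_shapley(0, 3, lambda S: sum(weights[j] for j in S))
```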
Optimal Transfers and Participation Decisions in International Environmental Agreements
The literature on international environmental agreements (IEAs) has recognized the role transfers play in encouraging participation in such agreements, but the few results achieved so far are overly specific and do not exploit the full potential of transfers for successful treaty-making. In this paper, we therefore develop a framework that enables us to study the role of transfers in a more systematic way. We propose a design for transfers, using both internal and external financial resources, that makes “welfare optimal agreements” self-enforcing. To illustrate the relevance of our transfer scheme for actual treaty-making, we use a well-known integrated assessment model of climate change to show how appropriate transfers may be able to induce almost all countries into signing a self-enforcing climate treaty.
Keywords: Self-enforcing international environmental agreements, Climate policy, Transfers
2D-Shapley: A Framework for Fragmented Data Valuation
Data valuation -- quantifying the contribution of individual data sources to
certain predictive behaviors of a model -- is of great importance to enhancing
the transparency of machine learning and designing incentive systems for data
sharing. Existing work has focused on evaluating data sources with the shared
feature or sample space. How to value fragmented data sources, each of which
contains only partial features and samples, remains an open question. We start
by presenting a method to calculate the counterfactual of removing a fragment
from the aggregated data matrix. Based on the counterfactual calculation, we
further propose 2D-Shapley, a theoretical framework for fragmented data
valuation that uniquely satisfies some appealing axioms in the fragmented data
context. 2D-Shapley empowers a range of new use cases, such as selecting useful
data fragments, providing interpretation for sample-wise data values, and
fine-grained data issue diagnosis.
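As a rough illustration of the fragment counterfactual, the snippet below deletes one owner's sample rows and feature columns from an aggregated data matrix. The paper's counterfactual is defined with respect to a model utility, so this shows only the matrix-manipulation step, with hypothetical indices:

```python
import numpy as np

def counterfactual_remove(X, owner_rows, owner_cols):
    """Return the aggregated data matrix with one fragment's sample rows
    and feature columns deleted. Illustrative only: a placeholder for the
    paper's model-utility-based counterfactual."""
    keep_r = np.setdiff1d(np.arange(X.shape[0]), owner_rows)
    keep_c = np.setdiff1d(np.arange(X.shape[1]), owner_cols)
    return X[np.ix_(keep_r, keep_c)]

X = np.arange(12).reshape(4, 3)               # 4 samples x 3 features
X_minus = counterfactual_remove(X, [1], [2])  # drop sample 1, feature 2
```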
Improving Fairness for Data Valuation in Horizontal Federated Learning
Federated learning is an emerging decentralized machine learning scheme that
allows multiple data owners to work collaboratively while ensuring data
privacy. The success of federated learning depends largely on the participation
of data owners. To sustain and encourage data owners' participation, it is
crucial to fairly evaluate the quality of the data provided by the data owners
and reward them correspondingly. Federated Shapley value, recently proposed by
Wang et al. [Federated Learning, 2020], is a measure for data value under the
framework of federated learning that satisfies many desired properties for data
valuation. However, there are still factors of potential unfairness in the
design of federated Shapley value because two data owners with the same local
data may not receive the same evaluation. We propose a new measure called
completed federated Shapley value to improve the fairness of federated Shapley
value. The design depends on completing a matrix consisting of all the possible
contributions by different subsets of the data owners. Leveraging concepts and
tools from optimization, we show under mild conditions that this matrix is
approximately low-rank. Both theoretical analysis and empirical evaluation
verify that the proposed measure does improve fairness in many circumstances.
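A generic way to complete an approximately low-rank contribution matrix is to alternate a truncated-SVD projection with re-imposing the observed entries (the classic hard-impute scheme). The sketch below applies that generic method to a toy rank-1 matrix; it is not the paper's optimization procedure:

```python
import numpy as np

def complete_low_rank(M, mask, rank=1, n_iters=200):
    """Fill missing entries of an (approximately) low-rank matrix by
    alternating truncated-SVD projection with re-imposing the observed
    entries. A generic hard-impute sketch, not the paper's method.
    """
    X = np.where(mask, M, 0.0)                 # unobserved entries start at 0
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # project onto rank-r matrices
        X[mask] = M[mask]                         # keep observed entries fixed
    return X

# Toy rank-1 contribution matrix with one hidden entry (true value 6.0).
M = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
mask = np.ones_like(M, dtype=bool)
mask[2, 1] = False
rec = complete_low_rank(M, mask)
```

Because the observed entries of a rank-1 matrix with one hidden entry determine it uniquely, the iteration recovers the missing value to high accuracy.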