18 research outputs found
The Design of Arbitrage-Free Data Pricing Schemes
Motivated by a growing market that involves buying and selling data over the
web, we study pricing schemes that assign value to queries issued over a
database. Previous work studied pricing mechanisms that compute the price of a
query by extending a data seller's explicit prices on certain queries, or
investigated the properties that a pricing function should exhibit without
detailing a generic construction. In this work, we present a formal framework
for pricing queries over data that allows the construction of general families
of pricing functions, with the main goal of avoiding arbitrage. We consider two
types of pricing schemes: instance-independent schemes, where the price depends
only on the structure of the query, and answer-dependent schemes, where the
price also depends on the query output. Our main result is a complete
characterization of the structure of pricing functions in both settings, by
relating it to properties of a function over a lattice. We use our
characterization, together with information-theoretic methods, to construct a
variety of arbitrage-free pricing functions. Finally, we discuss various
tradeoffs in the design space and present techniques for efficient computation
of the proposed pricing functions. Comment: full pape
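As a rough illustration of what "avoiding arbitrage" can mean, the sketch below checks two conditions on a toy price function: monotonicity (a query bundle that reveals less never costs more) and subadditivity (buying two bundles together never costs more than buying them separately). Treating these two conditions as the arbitrage-freeness test is an assumption made here for illustration, as is modelling a query by the set of attributes it reveals; the attribute names, weights, and function names are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt

# Hypothetical attribute weights; a "query bundle" is modelled as the set of
# attributes it reveals (an assumption made for this sketch only).
WEIGHTS = {"name": 1.0, "age": 2.0, "zip": 3.0, "income": 6.0}


def subsets(universe):
    """Yield every subset of the universe as a frozenset."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_arbitrage_free(price, universe, tol=1e-9):
    """Check monotonicity and subadditivity of `price` over all subset pairs."""
    all_sets = list(subsets(universe))
    for a in all_sets:
        for b in all_sets:
            # Monotone: revealing less must not cost more.
            if a <= b and price(a) > price(b) + tol:
                return False
            # Subadditive: the combined bundle must not cost more than its parts.
            if price(a | b) > price(a) + price(b) + tol:
                return False
    return True


def weighted_coverage(bundle):
    """Price = total weight of the revealed attributes."""
    return sum(WEIGHTS[x] for x in bundle)


def concave_coverage(bundle):
    """Price = square root of the total weight (a volume discount)."""
    return sqrt(weighted_coverage(bundle))


def squared_size(bundle):
    """Price grows quadratically with bundle size, so splitting is cheaper."""
    return float(len(bundle) ** 2)


results = {
    f.__name__: is_arbitrage_free(f, set(WEIGHTS))
    for f in (weighted_coverage, concave_coverage, squared_size)
}
print(results)
```

The quadratic price fails because two single-attribute purchases cost less than the combined bundle, which is exactly the kind of splitting a buyer could exploit.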
A Cooperative Game Approach
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, February 2021. Lee Deok-Joo.
As machine learning thrives in both academia and industry, data plays a salient role in training and validating models. Meanwhile, few works have addressed the economic evaluation of data in the data exchange market. The contribution of our work is two-fold. First, we take advantage of semi-values from cooperative game theory to model the revenue distribution problem. Second, we construct a model consisting of provider, firm, and market while considering the privacy and fairness of machine learning. We show that the Banzhaf value could be a reliable alternative to the Shapley value in calculating the contribution of each datum. We also formulate the firm's revenue maximization problem and present a numerical analysis for the case of a binary classifier with classical data examples. Assuming the firm only uses high-quality data, we analyze its behavior in four different scenarios that vary the data's fairness and the cost of compensating data providers for their privacy. It turns out that the Banzhaf value is more sensitive to the fairness of data than the Shapley value. We analyze the maximum proportion of revenue that the firm gives away to data providers, as well as the range of the number of data the firm would acquire. We also find that the fairer the data, the larger the revenue the firm guarantees to data providers, and that as the fixed cost decreases, the firm shares revenue with data providers through the variable cost.
Chapter 1 Introduction
1.1 Research Background
1.2 Problem Description
1.3 Organization of the Thesis
Chapter 2 Literature Review
2.1 Fair Machine Learning
2.2 Private Machine Learning
2.3 Data Valuation
2.3.1 Dataset Price Estimation
2.3.2 Equitable Price Estimation
Chapter 3 Data Market Model
3.1 Basic Assumptions and Model Settings
3.2 Firm's Profit Maximizing Problem
3.3 Data Valuation
3.4 Binary Classification Setting
Chapter 4 Analysis
4.1 Semi-value Approximation
4.1.1 Convergence Analysis
4.1.2 Group Data Calculation
4.2 Binary Classification
4.2.1 Parameter Analysis
4.2.2 Scenario Analysis
4.2.2.1 Description
4.2.2.2 Synthetic Data
4.2.2.3 Shapley Value Based Valuation
4.2.2.4 Banzhaf Value Based Valuation
4.2.2.5 Comparative Analysis
4.3 Data Pricing
Chapter 5 Conclusion
Bibliography
Abstract in Korean
Maste
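The thesis above compares two semi-values, the Shapley value and the Banzhaf value, as measures of each datum's contribution. A minimal sketch of the two standard definitions follows, computed exactly on a three-point toy game. The utility table (a stand-in for model accuracy on each subset of data points) is entirely hypothetical and is not taken from the thesis.

```python
from itertools import combinations
from math import factorial

# Hypothetical utility table: accuracy of a model trained on each subset of
# three data points, where "a" and "b" are clean and "c" is noisy.
UTILITY = {
    frozenset(): 0.50,
    frozenset("a"): 0.70,
    frozenset("b"): 0.70,
    frozenset("c"): 0.55,
    frozenset("ab"): 0.80,
    frozenset("ac"): 0.72,
    frozenset("bc"): 0.72,
    frozenset("abc"): 0.81,
}
PLAYERS = ["a", "b", "c"]


def marginal_contributions(players, utility, i):
    """Yield (|S|, U(S + i) - U(S)) for every coalition S that excludes i."""
    others = [p for p in players if p != i]
    for r in range(len(others) + 1):
        for combo in combinations(others, r):
            s = frozenset(combo)
            yield len(s), utility[s | {i}] - utility[s]


def shapley_value(players, utility, i):
    """Weight each marginal contribution by |S|! (n - |S| - 1)! / n!."""
    n = len(players)
    return sum(
        factorial(k) * factorial(n - k - 1) / factorial(n) * m
        for k, m in marginal_contributions(players, utility, i)
    )


def banzhaf_value(players, utility, i):
    """Average the marginal contributions uniformly over all 2^(n-1) coalitions."""
    n = len(players)
    return sum(m for _, m in marginal_contributions(players, utility, i)) / 2 ** (n - 1)


shapley = {p: shapley_value(PLAYERS, UTILITY, p) for p in PLAYERS}
banzhaf = {p: banzhaf_value(PLAYERS, UTILITY, p) for p in PLAYERS}
print("Shapley:", shapley)
print("Banzhaf:", banzhaf)
```

In this toy game the Shapley values add up to the total gain U(all) - U(empty), while the Banzhaf values do not, which is one practical difference to keep in mind when either is used to split revenue.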
Improving Fairness for Data Valuation in Horizontal Federated Learning
Federated learning is an emerging decentralized machine learning scheme that
allows multiple data owners to work collaboratively while ensuring data
privacy. The success of federated learning depends largely on the participation
of data owners. To sustain and encourage data owners' participation, it is
crucial to fairly evaluate the quality of the data provided by the data owners
and reward them correspondingly. Federated Shapley value, recently proposed by
Wang et al. [Federated Learning, 2020], is a measure for data value under the
framework of federated learning that satisfies many desired properties for data
valuation. However, there are still factors of potential unfairness in the
design of federated Shapley value because two data owners with the same local
data may not receive the same evaluation. We propose a new measure called
completed federated Shapley value to improve the fairness of federated Shapley
value. The design depends on completing a matrix consisting of all the possible
contributions by different subsets of the data owners. It is shown under mild
conditions that this matrix is approximately low-rank by leveraging concepts
and tools from optimization. Both theoretical analysis and empirical evaluation
verify that the proposed measure does improve fairness in many circumstances.
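The unfairness described above can be reproduced on a toy example. The sketch assumes that the federated Shapley value is computed round by round: in each round only the selected owners play a Shapley game on that round's utility, unselected owners receive zero, and the per-round values are summed. That reading, the owner names, the quality scores, and the per-round utilities are all assumptions made for illustration; the completed federated Shapley value proposed in the paper is not implemented here.

```python
from collections import defaultdict
from itertools import combinations
from math import factorial

# Hypothetical data quality per owner; A and B hold identical local data.
QUALITY = {"A": 1.0, "B": 1.0, "C": 0.5}


def make_round_utility(scale):
    """Return a toy per-round utility: scaled total quality of the coalition."""
    def utility(coalition):
        return scale * sum(QUALITY[o] for o in coalition)
    return utility


def shapley(players, utility):
    """Exact Shapley values of `utility` restricted to `players`."""
    players = sorted(players)
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = frozenset(combo)
                total += weight * (utility(s | {i}) - utility(s))
        values[i] = total
    return values


def federated_shapley(rounds):
    """Sum per-round Shapley values; owners not selected in a round get zero."""
    totals = defaultdict(float)
    for selected, utility in rounds:
        for owner, value in shapley(selected, utility).items():
            totals[owner] += value
    return dict(totals)


# Round 1 (large gains early in training) selects A and C;
# round 2 (smaller gains later) selects B and C.
ROUNDS = [
    ({"A", "C"}, make_round_utility(1.0)),
    ({"B", "C"}, make_round_utility(0.4)),
]
fedsv = federated_shapley(ROUNDS)
print(fedsv)
```

Owners A and B hold the same data, yet A ends up with a larger value simply because it was selected in the round where the utility was larger.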
On Shapley Value in Data Assemblage Under Independent Utility
In many applications, an organization may want to acquire data from many data
owners. Data marketplaces allow data owners to produce data assemblage needed
by data buyers through coalition. To encourage coalitions to produce data, it
is critical to allocate revenue to data owners in a fair manner according to
their contributions. Although Shapley fairness and its alternatives have been
well explored in the literature to facilitate revenue allocation in data assemblage,
computing the exact Shapley value for many data owners and large assembled data
sets through coalition remains challenging due to the combinatorial nature of
the Shapley value. In this paper, we explore the decomposability of utility in data
assemblage by formulating the independent utility assumption. We argue that
independent utility enjoys many applications. Moreover, we identify interesting
properties of independent utility and develop fast computation techniques for
exact Shapley value under independent utility. Our experimental results on a
series of benchmark data sets show that our new approach not only guarantees
the exactness of Shapley value, but also achieves faster computation by orders
of magnitude. Comment: Accepted by VLDB 202
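To see why a decomposable utility can make exact Shapley computation cheap, consider the following sketch. It assumes one possible reading of "independent utility": every assembled record carries its own utility, which is realized only when all owners who contributed to that record are in the coalition, and the coalition's utility is the sum over realized records. Under that assumption, linearity of the Shapley value means each record's utility is split equally among its contributing owners. The records and owner names are hypothetical, and the paper's actual definition and algorithms may differ.

```python
from itertools import combinations
from math import factorial

# Hypothetical assembled records: (utility, owners whose data the record needs).
RECORDS = [
    (6.0, frozenset({"A", "B"})),
    (3.0, frozenset({"B"})),
    (9.0, frozenset({"A", "B", "C"})),
    (2.0, frozenset({"C"})),
]
OWNERS = ["A", "B", "C"]


def utility(coalition):
    """Sum the utility of every record whose contributors are all present."""
    return sum(u for u, needed in RECORDS if needed <= coalition)


def shapley_fast(owners):
    """Split each record's utility equally among its contributors (linear time)."""
    values = {o: 0.0 for o in owners}
    for u, needed in RECORDS:
        for o in needed:
            values[o] += u / len(needed)
    return values


def shapley_brute_force(owners):
    """Exact Shapley values by enumerating all coalitions (exponential time)."""
    n = len(owners)
    values = {}
    for i in owners:
        others = [o for o in owners if o != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = frozenset(combo)
                total += weight * (utility(s | {i}) - utility(s))
        values[i] = total
    return values


fast = shapley_fast(OWNERS)
slow = shapley_brute_force(OWNERS)
print(fast)
print(slow)
```

Both functions return the same values, but the fast version touches each record once instead of enumerating every coalition of owners.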