
    The Design of Arbitrage-Free Data Pricing Schemes

    Motivated by a growing market that involves buying and selling data over the web, we study pricing schemes that assign value to queries issued over a database. Previous work studied pricing mechanisms that compute the price of a query by extending a data seller's explicit prices on certain queries, or investigated the properties that a pricing function should exhibit without detailing a generic construction. In this work, we present a formal framework for pricing queries over data that allows the construction of general families of pricing functions, with the main goal of avoiding arbitrage. We consider two types of pricing schemes: instance-independent schemes, where the price depends only on the structure of the query, and answer-dependent schemes, where the price also depends on the query output. Our main result is a complete characterization of the structure of pricing functions in both settings, obtained by relating it to properties of a function over a lattice. We use this characterization, together with information-theoretic methods, to construct a variety of arbitrage-free pricing functions. Finally, we discuss various tradeoffs in the design space and present techniques for efficiently computing the proposed pricing functions.
    Comment: full paper
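To make "arbitrage-free" concrete, here is a minimal sketch that is not the construction from the paper. It assumes a finite, hypothetical set of candidate databases (a support set) with weights, and prices a query by the total weight of the candidates its answer rules out. A weighted coverage function of this kind is monotone in the information revealed (a query whose answer is determined by another never costs more) and subadditive over bundles, so buying two queries separately never costs less than asking them together. The names `coverage_price` and `bundle` and the toy 3-bit databases are illustrative assumptions.

```python
from itertools import product
from typing import Callable, Hashable, Sequence, Tuple

Database = Tuple[int, ...]            # hypothetical toy database: a tuple of attribute values
Query = Callable[[Database], Hashable]


def coverage_price(query: Query, actual: Database,
                   support: Sequence[Database], weights: Sequence[float]) -> float:
    """Price = total weight of candidate databases ruled out by the query's answer."""
    answer = query(actual)
    return sum(w for candidate, w in zip(support, weights) if query(candidate) != answer)


def bundle(*queries: Query) -> Query:
    """Asking several queries at once reveals the tuple of all their answers."""
    return lambda db: tuple(q(db) for q in queries)


def q_sum(db: Database) -> int:
    return sum(db)      # "how many ones are there?"


def q_first(db: Database) -> int:
    return db[0]        # "what is the first value?"


# Assumed setup: every 3-bit database is a candidate, each with weight 1.
support = list(product(range(2), repeat=3))
weights = [1.0] * len(support)
actual = (1, 0, 1)

p_sum = coverage_price(q_sum, actual, support, weights)
p_first = coverage_price(q_first, actual, support, weights)
p_both = coverage_price(bundle(q_sum, q_first), actual, support, weights)
print(p_sum, p_first, p_both)   # 5.0 4.0 6.0
```

The bundle costs 6, which is at least the price of either part (monotonicity) and at most their sum of 9 (subadditivity), so neither splitting nor combining the queries lets a buyer pay less for the same information.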

    A Cooperative Game Approach

    Thesis (Master's), Seoul National University Graduate School, College of Engineering, Department of Industrial Engineering, February 2021. Advisor: 이덕주.
    As machine learning thrives in both academia and industry, data plays a salient role in training and validating models. Meanwhile, little work has addressed the economic valuation of data in data exchange markets. The contribution of our work is two-fold. First, we use semi-values from cooperative game theory to model the revenue distribution problem. Second, we construct a model consisting of a provider, a firm, and a market while considering the privacy and fairness of machine learning. We show that the Banzhaf value can be a reliable alternative to the Shapley value for calculating the contribution of each datum. We also formulate the firm's revenue maximization problem and present a numerical analysis for a binary classifier on classical data examples. Assuming the firm uses only high-quality data, we analyze its behavior in four scenarios that vary the data's fairness and the cost of compensating data providers for their privacy. The Banzhaf value turns out to be more sensitive to the fairness of data than the Shapley value. We analyze the maximum proportion of revenue the firm gives away to data providers, as well as the range of the amount of data the firm would acquire.
    Korean abstract (translated): As machine learning advances in both theory and real-world applications, data plays an important role in training and validating artificial intelligence models. Meanwhile, research on the economic evaluation of data in data exchange markets is at an early stage. The contribution of this thesis can be viewed from two perspectives. First, it applies the semi-value, a concept from cooperative game theory, to the problem of distributing model revenue. Second, it proposes a model composed of data providers, firms, and a market that takes into account the fairness and privacy of AI models. This study confirms that the Banzhaf value can serve as an alternative to the Shapley value when calculating the contribution of each datum. It also models the firm's revenue maximization problem and presents a numerical analysis for binary classification models using example data, which shows that the Banzhaf value is more sensitive to the fairness of data than the Shapley value. Furthermore, assuming the firm uses only high-quality data, it analyzes the firm's behavior in four scenarios that vary the fairness of the data and the cost of compensating data providers for their privacy. The fairer the data, the larger the revenue the firm guarantees to data providers, and as fixed costs decrease, the firm shares revenue with data providers through variable costs.
    Contents:
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Problem Description
      1.3 Organization of the Thesis
    Chapter 2 Literature Review
      2.1 Fair Machine Learning
      2.2 Private Machine Learning
      2.3 Data Valuation
        2.3.1 Dataset Price Estimation
        2.3.2 Equitable Price Estimation
    Chapter 3 Data Market Model
      3.1 Basic Assumptions and Model Settings
      3.2 Firm's Profit Maximizing Problem
      3.3 Data Valuation
      3.4 Binary Classification Setting
    Chapter 4 Analysis
      4.1 Semi-value Approximation
        4.1.1 Convergence Analysis
        4.1.2 Group Data Calculation
      4.2 Binary Classification
        4.2.1 Parameter Analysis
        4.2.2 Scenario Analysis
          4.2.2.1 Description
          4.2.2.2 Synthetic Data
          4.2.2.3 Shapley Value Based Valuation
          4.2.2.4 Banzhaf Value Based Valuation
          4.2.2.5 Comparative Analysis
      4.3 Data Pricing
    Chapter 5 Conclusion
    Bibliography
    Korean Abstract

    Improving Fairness for Data Valuation in Horizontal Federated Learning

    Federated learning is an emerging decentralized machine learning scheme that allows multiple data owners to collaborate while preserving data privacy. The success of federated learning depends largely on the participation of data owners. To sustain and encourage that participation, it is crucial to fairly evaluate the quality of the data each owner provides and to reward owners accordingly. The federated Shapley value, recently proposed by Wang et al. [Federated Learning, 2020], is a measure of data value under the federated learning framework that satisfies many desirable properties for data valuation. However, its design still contains sources of potential unfairness: two data owners with the same local data may not receive the same evaluation. We propose a new measure, the completed federated Shapley value, to improve the fairness of the federated Shapley value. The design relies on completing a matrix consisting of all possible contributions by different subsets of the data owners. Leveraging concepts and tools from optimization, we show that under mild conditions this matrix is approximately low-rank. Both theoretical analysis and empirical evaluation verify that the proposed measure improves fairness in many circumstances.
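The key step is completing a rounds-by-coalitions utility matrix in which each round only observes coalitions made of that round's participants. The sketch below is a stand-in, not the paper's algorithm: it fills the missing entries with a simple iterative truncated-SVD imputation (often called hard-impute), assumes the true matrix has a known low rank, and then computes Shapley values from the column sums of the completed matrix. The round scales, participant sets, and additive per-owner worth are hypothetical toy data, and `hard_impute` and `shapley_from_table` are illustrative names.

```python
from math import comb

import numpy as np


def hard_impute(observed: np.ndarray, mask: np.ndarray, rank: int, iters: int = 2000) -> np.ndarray:
    """Fill entries where mask is False with a rank-`rank` SVD approximation."""
    filled = np.where(mask, observed, 0.0)
    for _ in range(iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled = np.where(mask, observed, low_rank)  # observed entries stay fixed
    return filled


def shapley_from_table(utility: np.ndarray, n: int) -> np.ndarray:
    """Exact Shapley values from a utility vector indexed by coalition bitmask."""
    phi = np.zeros(n)
    for i in range(n):
        for coalition in range(2 ** n):
            if coalition & (1 << i):
                continue  # only coalitions that exclude owner i
            size = bin(coalition).count("1")
            weight = 1.0 / (n * comb(n - 1, size))
            phi[i] += weight * (utility[coalition | (1 << i)] - utility[coalition])
    return phi


# Hypothetical toy data: 3 owners, 4 rounds, additive per-owner worth [1, 2, 3].
n = 3
owner_worth = np.array([1.0, 2.0, 3.0])
coalition_worth = np.array([sum(owner_worth[j] for j in range(n) if (c >> j) & 1)
                            for c in range(2 ** n)])
round_scale = np.array([1.0, 0.5, 0.25, 0.125])
true_matrix = np.outer(round_scale, coalition_worth)  # rank 1 by construction

# Round t only observes coalitions that are subsets of that round's participants.
participants = [0b111, 0b011, 0b110, 0b101]
mask = np.array([[(c & ~p) == 0 for c in range(2 ** n)] for p in participants])

completed = hard_impute(np.where(mask, true_matrix, np.nan), mask, rank=1)
phi = shapley_from_table(completed.sum(axis=0), n)
print(phi)
```

Because the toy utility is additive, the exact Shapley values are the sum of the round scales (1.875) times each owner's worth, which gives a simple check on the completion. Without completion, owners who sit out a round would simply lose that round's unobserved coalitions, which is the kind of unfairness the completed measure targets.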

    Towards Query Pricing on Incomplete Data


    On Shapley Value in Data Assemblage Under Independent Utility

    In many applications, an organization may want to acquire data from many data owners. Data marketplaces allow data owners to form coalitions that produce the data assemblages data buyers need. To encourage coalitions to produce data, it is critical to allocate revenue to data owners fairly according to their contributions. Although Shapley fairness and its alternatives have been well explored in the literature to facilitate revenue allocation in data assemblage, computing the exact Shapley value for many data owners and large assembled data sets remains challenging due to the combinatorial nature of the Shapley value. In this paper, we explore the decomposability of utility in data assemblage by formulating the independent utility assumption. We argue that independent utility has many applications. Moreover, we identify interesting properties of independent utility and develop fast techniques for computing the exact Shapley value under independent utility. Our experimental results on a series of benchmark data sets show that our approach not only guarantees the exactness of the Shapley value but also speeds up computation by orders of magnitude.
    Comment: Accepted by VLDB 202
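One way to see why decomposable utility helps: suppose, as a simplifying assumption that may differ from the paper's exact formulation, that the utility of an assembled data set is a sum of per-tuple utilities, and that each tuple exists only when every owner contributing to it joins the coalition. Each tuple is then a unanimity game, and by linearity of the Shapley value its utility splits equally among its required owners, so the exact Shapley value costs one pass over the tuples instead of an enumeration of coalitions. The brute-force check enumerates all orderings and is only feasible for a few owners; the tuples and function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from typing import Dict, FrozenSet, List, Tuple

# Each assembled tuple: (owners who must all participate, utility of the tuple).
AssembledTuple = Tuple[FrozenSet[str], Fraction]


def coalition_utility(coalition: FrozenSet[str], tuples: List[AssembledTuple]) -> Fraction:
    """Sum of utilities of the tuples the coalition can produce on its own."""
    return sum((u for owners, u in tuples if owners <= coalition), Fraction(0))


def fast_shapley(tuples: List[AssembledTuple]) -> Dict[str, Fraction]:
    """One pass: split each tuple's utility equally among its required owners."""
    values: Dict[str, Fraction] = {}
    for owners, u in tuples:
        for owner in owners:
            values[owner] = values.get(owner, Fraction(0)) + u / len(owners)
    return values


def brute_force_shapley(players: List[str], tuples: List[AssembledTuple]) -> Dict[str, Fraction]:
    """Average marginal contribution over all orderings (exponential; for checking)."""
    values = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            before = coalition_utility(coalition, tuples)
            coalition = coalition | {p}
            values[p] += coalition_utility(coalition, tuples) - before
    return {p: v / len(orderings) for p, v in values.items()}


# Hypothetical assembled data produced by owners A, B and C.
tuples = [
    (frozenset({"A", "B"}), Fraction(6)),
    (frozenset({"A"}), Fraction(2)),
    (frozenset({"B", "C"}), Fraction(3)),
    (frozenset({"A", "B", "C"}), Fraction(9)),
]
print(fast_shapley(tuples))
print(brute_force_shapley(["A", "B", "C"], tuples))
```

Both calls give A = 8, B = 15/2 and C = 9/2, which sum to the total utility of 20; the fast version's cost grows with the number of tuples and their owners rather than with the number of coalitions.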