Replication-Robust Payoff-Allocation with Applications in Machine Learning Marketplaces
The ever-increasing take-up of machine learning techniques requires ever-more
application-specific training data. Manually collecting such training data is a
tedious and time-consuming process. Data marketplaces represent a compelling
alternative, providing an easy way to acquire data from potential data
providers. A key component of such marketplaces is the compensation mechanism
for data providers. Classic payoff-allocation methods such as the Shapley value
can be vulnerable to data-replication attacks, and are infeasible to compute in
the absence of efficient approximation algorithms. To address these challenges,
we present an extensive theoretical study on the vulnerabilities of game
theoretic payoff-allocation schemes to replication attacks. Our insights apply
to a wide range of payoff-allocation schemes, and enable the design of
customised replication-robust payoff-allocations. Furthermore, we present a
novel efficient sampling algorithm for approximating payoff-allocation schemes
based on marginal contributions. In our experiments, we validate the
replication-robustness of classic payoff-allocation schemes and new
payoff-allocation schemes derived from our theoretical insights. We also
demonstrate the efficiency of our proposed sampling algorithm on a wide range
of machine learning tasks.
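The abstract does not spell out the sampling algorithm; the general idea of approximating marginal-contribution-based payoffs by sampling permutations can be sketched as follows (an illustrative sketch only, not the paper's algorithm, with an invented toy utility in which one provider replicates another's data):

```python
import random

def shapley_estimate(players, utility, n_samples=2000, seed=0):
    """Monte Carlo Shapley estimate via sampled permutations of players.

    For each sampled ordering, each player's marginal contribution is
    utility(predecessors + player) - utility(predecessors).
    """
    rng = random.Random(seed)
    players = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = players[:]
        rng.shuffle(order)
        coalition, prev = set(), utility(set())
        for p in order:
            coalition.add(p)
            cur = utility(coalition)
            totals[p] += cur - prev
            prev = cur
    return {p: t / n_samples for p, t in totals.items()}

# Toy game: a coalition's utility is the number of distinct data points
# contributed by its members; provider "c" merely replicates "a"'s data.
data = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2, 3}}
utility = lambda s: len(set().union(*(data[p] for p in s))) if s else 0
phi = shapley_estimate(data, utility)
```

In this toy game "a" and "c" receive equal payoffs even though "c" contributes no new data points, which is exactly the kind of replication issue the paper studies.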
Towards a human-centric data economy
Spurred by widespread adoption of artificial intelligence and machine learning, “data” is becoming
a key production factor, comparable in importance to capital, land, or labour in an increasingly
digital economy. In spite of an ever-growing demand for third-party data in the B2B
market, firms are generally reluctant to share their information. This is due to the unique characteristics
of “data” as an economic good (a freely replicable, non-depletable asset holding a highly
combinatorial and context-specific value), which moves digital companies to hoard and protect
their “valuable” data assets, and to integrate across the whole value chain seeking to monopolise
the provision of innovative services built upon them. As a result, most of those valuable assets
still remain unexploited in corporate silos.
This situation is shaping the so-called data economy around a number of champions, and it is
hampering the benefits of a global data exchange on a large scale. Some analysts have estimated
the potential value of the data economy at US$2.5 trillion globally by 2025. Not surprisingly, unlocking
the value of data has become a central policy of the European Union, which has estimated
the size of the data economy at €827 billion for the EU27 in the same period. Within the scope of
the European Data Strategy, the European Commission is also steering initiatives aimed at identifying
cross-industry use cases involving different verticals, and at enabling sovereign
data exchanges to realise them.
Among individuals, the massive collection and exploitation of personal data by digital firms
in exchange for services, often with little or no consent, has raised a general concern about privacy
and data protection. Apart from spurring recent legislative developments in this direction,
this concern has prompted voices warning about the unsustainability of the existing digital
economy (a few digital champions, a potential negative impact on employment, growing inequality),
some of which propose paying people for their data in a sort of worldwide data labour
market as a potential solution to this dilemma [114, 115, 155].
From a technical perspective, we are far from having the required technology and algorithms
that will enable such a human-centric data economy. Even its scope is still blurry, and the question
of the value of data remains, at the least, controversial. Research works from different disciplines have
studied the data value chain, different approaches to the value of data, how to price data assets,
and novel data marketplace designs. At the same time, complex legal and ethical issues with
respect to the data economy have arisen around privacy, data protection, and ethical AI practices.

In this dissertation, we start by exploring the data value chain and how entities trade data assets
over the Internet. We carry out what is, to the best of our knowledge, the most thorough survey
of commercial data marketplaces. In this work, we have catalogued and characterised ten different
business models, including those of personal information management systems, companies born
in the wake of recent data protection regulations and aiming at empowering end users to take
control of their data. We have also identified the challenges faced by different types of entities,
and what kind of solutions and technology they are using to provide their services.
Then we present a first of its kind measurement study that sheds light on the prices of data
in the market using a novel methodology. We study how ten commercial data marketplaces categorise
and classify data assets, and which categories of data command higher prices. We also
develop classifiers for comparing data products across different marketplaces, and we study the
characteristics of the most valuable data assets and the features that specific vendors use to set
the price of their data products. Based on this information, augmented with data products offered by
33 other data providers, we develop a regression analysis revealing features that correlate with
prices of data products. As a result, we also implement the basic building blocks of a novel data
pricing tool capable of providing a hint of the market price of a new data product using
just its metadata as input. Such a tool would bring more transparency to the prices of data products in
the market, which would help in pricing data assets and in damping the price fluctuations inherent to
nascent markets.
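The regression-based pricing idea can be illustrated with a minimal sketch: fit (log) price against (log) metadata features and use the fit to hint at a price for a new product. The single feature (number of records) and the small catalogue below are hypothetical; the dissertation's actual model uses a much richer feature set:

```python
import math

def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical catalogue: (number of records, listed price in US$).
products = [(1e3, 50), (1e4, 120), (1e5, 300), (1e6, 800)]
xs = [math.log10(n) for n, _ in products]
ys = [math.log10(p) for _, p in products]
a, b = ols_fit(xs, ys)

def price_hint(n_records):
    """Market-price hint for a new data product, from its metadata alone."""
    return 10 ** (a + b * math.log10(n_records))
```

Fitting in log-log space reflects the common assumption that price scales sublinearly with volume; the slope b is then the price elasticity with respect to the chosen metadata feature.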
Next we turn to topics related to data marketplace design. In particular, we study how buyers
can select and purchase suitable data for their tasks without requiring a priori access to such
data in order to make a purchase decision, and how marketplaces can distribute payoffs for a
data transaction combining data of different sources among the corresponding providers, be they
individuals or firms. The difficulty of both problems is further exacerbated in a human-centric
data economy where buyers have to choose among data of thousands of individuals, and where
marketplaces have to distribute payoffs to thousands of people contributing personal data to a
specific transaction.
Regarding the selection process, we compare different purchase strategies depending on the
level of information available to data buyers at the time of making decisions. A first methodological
contribution of our work is proposing a data evaluation stage prior to datasets being selected
and purchased by buyers in a marketplace. We show that buyers can significantly improve the
performance of the purchasing process just by being provided with a measurement of the performance
of their models when trained by the marketplace with individual eligible datasets. We
design purchase strategies that exploit such functionality and we call the resulting algorithm Try
Before You Buy, and our work demonstrates over synthetic and real datasets that it can lead to
near-optimal data purchasing in only O(N) time instead of the exponential O(2^N) time needed
to compute the optimal purchase.

With regard to the payoff distribution problem, we focus on computing the relative value
of spatio-temporal datasets combined in marketplaces for predicting transportation demand and
travel time in metropolitan areas. Using large datasets of taxi rides from Chicago, Porto and
New York we show that the value of data is different for each individual, and cannot be approximated
by its volume. Our results reveal that even more elaborate approaches based on the
“leave-one-out” value are inaccurate. Instead, richer, well-established notions of value
from economics and game theory, such as the Shapley value, need to be employed if one wishes
to capture the complex effects of mixing different datasets on the accuracy of forecasting algorithms.
However, the Shapley value entails serious computational challenges. Its exact calculation
requires repetitively training and evaluating every combination of data sources, and hence O(N!)
or O(2^N) computational time, which is infeasible for complex models or thousands of individuals.
Moreover, our work paves the way to new methods of measuring the value of spatio-temporal
data. We identify heuristics such as entropy or similarity to the average that show a significant
correlation with the Shapley value and can therefore be used to sidestep the heavy computational
cost of Shapley approximation algorithms in this specific context.
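The cost mentioned above is visible in even the most direct implementation: an exact Shapley computation must evaluate the utility on every one of the 2^N coalitions. The following is the generic textbook formulation with a toy additive utility, not the dissertation's forecasting pipeline:

```python
from itertools import combinations
from math import factorial

def exact_shapley(players, utility):
    """Exact Shapley values; touches all 2^N coalitions, so it is only
    feasible for very small N."""
    players = list(players)
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        acc = 0.0
        for k in range(n):
            # Weight of coalitions of size k that exclude player p.
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                acc += w * (utility(set(S) | {p}) - utility(set(S)))
        phi[p] = acc
    return phi

# Toy additive utility: every data source is worth exactly 1 on its own,
# so every Shapley value comes out as 1.
sources = ["s1", "s2", "s3", "s4"]
phi = exact_shapley(sources, len)
```

With real marketplaces, each call to utility() means training and evaluating a forecasting model, which is why heuristics that correlate with the Shapley value are so attractive in this setting.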
We conclude with a number of open issues and propose further research directions that leverage
the contributions and findings of this dissertation. These include monitoring data transactions
to better measure data markets, and complementing market data with actual transaction prices
to build a more accurate data pricing tool. A human-centric data economy would also require
that the contributions of thousands of individuals to machine learning tasks are calculated daily.
For that to be feasible, we need to further optimise the efficiency of data purchasing and payoff
calculation processes in data marketplaces. In that direction, we also point to some alternatives
to repetitively training and evaluating a model when selecting data with Try Before You Buy and
when approximating the Shapley value. Finally, we discuss the challenges and potential technologies that
help with building a federation of standardised data marketplaces.
The data economy will develop fast in the upcoming years, and researchers from different
disciplines will work together to unlock the value of data and make the most of it. Perhaps
the proposal that people be paid for their data and their contribution to the data economy will finally take off,
or perhaps other proposals, such as the robot tax, will be used to balance the power
between individuals and tech firms in the digital economy. Either way, we hope our work sheds light on
the value of data, and contributes to making the price of data more transparent and, eventually, to
moving towards a human-centric data economy.

This work has been supported by IMDEA Networks Institute. Doctoral Programme in Telematic Engineering, Universidad Carlos III de Madrid. Committee: President: Georgios Smaragdakis; Secretary: Ángel Cuevas Rumín; Member: Pablo Rodríguez Rodríguez.
A Property Rights Enforcement and Pricing Model for IIoT Data Marketplaces
Master's thesis, Seoul National University Graduate School, Technology Management, Economics and Policy Program, August 2019. Jörn Altmann.

The Industrial Internet of Things (IIoT) has become a valuable data source for products and services based on advanced data analytics. However, evidence suggests that industries are suffering a significant loss of value creation from insufficient IIoT data sharing. We argue that the limited utilization of the Sensing as a Service business model is caused by the economic and technological characteristics of sensor data, and the corresponding absence of applicable digital rights management models. Therefore, we propose a combined property rights enforcement and pricing model to solve the IIoT data sharing incentive problem.

1 Introduction
1.1 Background
1.2 Problem Description
1.3 Research Objective and Question
1.4 Methodology
1.5 Contributions
1.6 Structure
2 Literature Review
2.1 Sensing as a Service
2.2 Economic Characteristics of IIoT Data
2.2.1 Property Rights of Data
2.2.2 Licensing of IIoT Data
2.3 IIoT Data Marketplaces
2.3.1 Use-cases and Value Propositions
2.3.2 Market Structures and Pricing Models
2.4 Digital Rights Management for IIoT
3 Model
3.1 Assumptions
3.2 Watermarking Technique
3.2.1 Function
3.2.2 Example
3.2.3 Robustness
3.3 Economic Reasoning
3.3.1 The Quality Gap
3.3.2 Cost of Watermarking (CoW)
3.3.3 Cost of Attacking (CoA)
4 Analytical Analysis
4.1 Equilibrium Between CoW and CoA
4.2 Determining the Optimal Quality Gap
4.3 Applicability of the Quality Gap Function
5 Conclusion
5.1 Summary
5.2 Discussion
6 Limitations and Future Research
References
Abstract (Korean)
Evolutionary Mechanism Design
The advent of large-scale distributed systems poses unique engineering challenges. In open
systems such as the internet it is not possible to prescribe the behaviour of all of the
components of the system in advance. Rather, we attempt to design infrastructure, such as
network protocols, in such a way that the overall system is robust despite the fact that
numerous arbitrary, non-certified, third-party components can connect to our system.
Economists have long understood this issue, since it is analogous to the design of the rules
governing auctions and other marketplaces, in which we attempt to achieve socially desirable
outcomes despite the impossibility of prescribing the exact behaviour of the
market participants, who may attempt to subvert the market for their own personal gain.
This field is known as 'mechanism design': the science of designing rules of a game to
achieve a specific outcome, even though each participant may be self-interested. Although
it originated in economics, mechanism design has become an important foundation of
multi-agent systems (MAS) research. In many scenarios mechanism design and auction
theory yield clear-cut results; however, there are many situations in which the underlying
assumptions of the theory are violated due to the messiness of the real world. In this thesis
I introduce an evolutionary methodology for mechanism design, which is able to incorporate
arbitrary design objectives and domain assumptions, and I validate the methodology using
empirical techniques.
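The evolutionary approach can be caricatured in a few lines: candidate values of a mechanism design parameter are evolved against a simulated population of bidders, with a fitness function encoding the design objective. The mechanism (a single-item auction with a reserve price), the bidder model and all numbers below are invented for illustration and are not the thesis's actual experiments:

```python
import random

def social_welfare(reserve, rng, n_rounds=200):
    """Toy design objective: average welfare of a single-item auction with
    the given reserve price, against bidders with uniform [0, 1] values."""
    welfare = 0.0
    for _ in range(n_rounds):
        best = max(rng.random() for _ in range(3))
        if best >= reserve:          # the item sells to the highest bidder
            welfare += best
    return welfare / n_rounds

def evolve(fitness, generations=30, pop_size=20, seed=1):
    """Simple (mu + lambda)-style evolutionary search over one parameter."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda x: fitness(x, rng), reverse=True)
        parents = ranked[: pop_size // 4]
        # Children are mutated copies of surviving parents, clipped to [0, 1].
        children = [min(1.0, max(0.0, rng.choice(parents) + rng.gauss(0, 0.05)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda x: fitness(x, rng))

best_reserve = evolve(social_welfare)
```

Since any positive reserve only destroys trades under this toy welfare objective, the search drifts towards a reserve near zero; swapping in seller revenue as the fitness would instead evolve a positive reserve, illustrating how arbitrary design objectives slot into the same loop.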
Automated Markets and Trading Agents
Computer automation has the potential, just starting to be realized, of transforming the
design and operation of markets, and the behaviors of agents trading in them. We discuss
the possibilities for automating markets, presenting a broad conceptual framework
covering resource allocation as well as enabling marketplace services such as search
and transaction execution. One of the most intriguing opportunities is provided by markets
implementing computationally sophisticated negotiation mechanisms, for example
combinatorial auctions. An important theme that emerges from the literature is the centrality
of design decisions about matching the domain of goods over which a mechanism
operates to the domain over which agents have preferences. When the match is imperfect
(as is almost inevitable), the market game induced by the mechanism is analytically
intractable, and the literature provides an incomplete characterization of rational bidding
policies. A review of the literature suggests that much of our existing knowledge
comes from computational simulations, including controlled studies of abstract market
designs (e.g., simultaneous ascending auctions), and research tournaments comparing
agent strategies in a variety of market scenarios. An empirical game-theoretic methodology
combines the advantages of simulation, agent-based modeling, and statistical and
game-theoretic analysis.
http://deepblue.lib.umich.edu/bitstream/2027.42/49510/1/ace_galleys.pd
Essays on Experimental Economics for the Environment and Economics of Privacy
In the 21st century, two main challenges for economic research are to propose effective
solutions to shape the digital transformation and mitigate human-induced climate change.
Research on digital transformation is closely linked to various privacy-related issues, which
mostly relate to the preferences and decisions of individuals. In contrast, climate change
research examines which factors impede effective cooperation among multiple individuals and
investigates how common goals, such as limiting climate change, can be achieved.
The link between economics of privacy and environmental economics is that many digital
technologies have the potential to generate positive externalities that can contribute to the
provision or maintenance of public goods. However, in many cases these digital technologies
are characterized by the fact that their use requires the disclosure of personal information. The
potential success of these technologies and institutional mechanisms therefore largely depends
on social acceptance towards these technologies and institutional mechanisms.
Each paper in this cumulative dissertation contributes to the broader question of how economic
experiments can contribute to evaluate and potentially increase the efficiency of institutions and
technologies that can provide or maintain public goods. The first paper investigates whether the
publication process of journals in the field of experimental economics can potentially be
improved. The remaining five papers focus directly or indirectly on different but related public
goods problems which are closely linked to privacy or environmental issues. Methodologically,
the six papers share the feature that they either directly apply the experimental method for their
individual research questions or use the results of experimental literature to derive hypotheses
and explain empirical outcomes in specific privacy-related contexts.
In the field of privacy, the dissertation identifies factors that influence data sharing in several
smartphone apps from key industries of the digital transformation and on employer review
platforms. In the area of environmental economics, the first paper proposes an institutional
mechanism that can increase the willingness to contribute to recycling systems, and the second
paper shows that the ability to exploit a public good can impede cooperation to mitigate climate
change.
Design and implementation of a simulator to explore cooperation in distributed environments
The lack of computational resources in personal computers, together with the ever-increasing requirements of modern software, urgently calls for a redefinition of the computing paradigms used until now. Distributed computing networks are one of the solutions currently being considered to solve the shortage of hardware resources. However, the application of these solutions is held back by a lack of knowledge about how they behave, and by the need to solve technological challenges that research and innovation have not yet worked out how to address. One of the open problems is how to generate sufficient trust among users and thereby ensure a certain level of cooperation within the network, in other words, bidirectional collaboration. This master's project aims to explore different incentive- and topology-based mechanisms for promoting cooperation, using as a tool a simulator capable of assessing these indicators over multiple network topologies, applications and strategies on distributed GRID systems. A first part is therefore focused on the study of cooperation, identifying which parameters can bring it about and which have no influence. The second objective is the creation of a tool modular and powerful enough that sufficient conclusions can be drawn from its use over different scenarios. Finally, a last part is centred on the analysis of results and the creation of a verifiable negotiation protocol for resource sharing in distributed networks.
Secure Shapley Value for Cross-Silo Federated Learning
The Shapley value (SV) is a fair and principled metric for contribution
evaluation in cross-silo federated learning (cross-silo FL), wherein
organizations, i.e., clients, collaboratively train prediction models with the
coordination of a parameter server. However, existing SV calculation methods
for FL assume that the server can access the raw FL models and public test
data. This may not be a valid assumption in practice considering the emerging
privacy attacks on FL models and the fact that test data might be clients'
private assets. Hence, we investigate the problem of secure SV calculation for
cross-silo FL. We first propose HESV, a one-server solution based solely on
homomorphic encryption (HE) for privacy protection, which has limitations in
efficiency. To overcome these limitations, we propose SecSV, an efficient
two-server protocol with the following novel features. First, SecSV utilizes a
hybrid privacy protection scheme to avoid ciphertext-ciphertext
multiplications between test data and models, which are extremely expensive
under HE. Second, an efficient secure matrix multiplication method is proposed
for SecSV. Third, SecSV strategically identifies and skips some test samples
without significantly affecting the evaluation accuracy. Our experiments
demonstrate that SecSV is 7.2-36.6 times as fast as HESV, with a limited loss
in the accuracy of calculated SVs.

Comment: Extended report for our VLDB 2023 paper.