Towards a human-centric data economy
Spurred by widespread adoption of artificial intelligence and machine learning, “data” is becoming
a key production factor, comparable in importance to capital, land, or labour in an increasingly
digital economy. In spite of an ever-growing demand for third-party data in the B2B
market, firms are generally reluctant to share their information. This is due to the unique characteristics
of “data” as an economic good (a freely replicable, non-depletable asset holding a highly
combinatorial and context-specific value), which lead digital companies to hoard and protect
their “valuable” data assets, and to integrate across the whole value chain in a bid to monopolise
the provision of innovative services built upon them. As a result, most of those valuable assets
still remain unexploited in corporate silos.
This situation is shaping the so-called data economy around a number of champions, and it is
hampering the benefits of a global data exchange on a large scale. Some analysts have estimated
the potential value of the data economy at US$2.5 trillion globally by 2025. Not surprisingly, unlocking
the value of data has become a central policy of the European Union, which also estimated
the size of the data economy at €827 billion for the EU27 in the same period. Within the scope of
the European Data Strategy, the European Commission is also steering initiatives aimed
at identifying cross-industry use cases involving different verticals, and at enabling sovereign
data exchanges to realise them.
Among individuals, the massive collection and exploitation of personal data by digital firms
in exchange for services, often with little or no consent, has raised general concern about privacy
and data protection. Apart from spurring recent legislative developments in this direction,
this concern has raised some voices warning against the unsustainability of the existing digital
economics (few digital champions, potential negative impact on employment, growing inequality),
some of which propose that people are paid for their data in a sort of worldwide data labour
market as a potential solution to this dilemma [114, 115, 155].
From a technical perspective, we are far from having the required technology and algorithms
that will enable such a human-centric data economy. Even its scope is still blurry, and the question
of the value of data remains, at the very least, controversial. Research works from different disciplines have
studied the data value chain, different approaches to the value of data, how to price data assets,
and novel data marketplace designs. At the same time, complex legal and ethical issues with
respect to the data economy have arisen around privacy, data protection, and ethical AI practices.
In this dissertation, we start by exploring the data value chain and how entities trade data assets
over the Internet. We carry out what is, to the best of our knowledge, the most thorough survey
of commercial data marketplaces. In this work, we have catalogued and characterised ten different
business models, including those of personal information management systems, companies born
in the wake of recent data protection regulations and aiming at empowering end users to take
control of their data. We have also identified the challenges faced by different types of entities,
and what kind of solutions and technology they are using to provide their services.
Then we present a first-of-its-kind measurement study that sheds light on the prices of data
in the market using a novel methodology. We study how ten commercial data marketplaces categorise
and classify data assets, and which categories of data command higher prices. We also
develop classifiers for comparing data products across different marketplaces, and we study the
characteristics of the most valuable data assets and the features that specific vendors use to set
the price of their data products. Based on this information, and adding data products offered by
33 other data providers, we develop a regression analysis to reveal the features that correlate with
the prices of data products. As a result, we also implement the basic building blocks of a novel data
pricing tool capable of providing a hint of the market price of a new data product using as inputs
just its metadata. This tool would bring more transparency to the prices of data products in
the market, helping to price data assets and to avoid the price fluctuations inherent in
nascent markets.
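As a toy illustration of the kind of regression behind such a pricing tool, the sketch below fits an ordinary-least-squares model of log-price on a few metadata features. The features, the prices, and the log transform are illustrative assumptions, not data or design details from the study:

```python
import numpy as np

# Hypothetical metadata for a handful of data products (illustrative values only):
# log10(number of records), updates per month, number of countries covered.
X = np.array([
    [4.0,  1,  1],
    [5.0,  4,  3],
    [6.0, 30, 10],
    [5.5,  4,  5],
    [6.5, 30, 20],
])
prices = np.array([100.0, 450.0, 3200.0, 800.0, 5000.0])  # made-up prices

# Ordinary least squares of log-price on the metadata features, with an intercept.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, np.log(prices), rcond=None)

def hint_price(meta):
    """Return a market-price hint for a new product from its metadata alone."""
    return float(np.exp(coef @ np.concatenate(([1.0], meta))))

print(hint_price([6.0, 12, 8]))  # rough hint for an unseen product
```

In the dissertation the model is trained on real marketplace listings; the point here is only the shape of the pipeline: metadata in, price hint out.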
Next we turn to topics related to data marketplace design. Particularly, we study how buyers
can select and purchase suitable data for their tasks without requiring a priori access to such
data in order to make a purchase decision, and how marketplaces can distribute payoffs for a
data transaction combining data of different sources among the corresponding providers, be they
individuals or firms. The difficulty of both problems is further exacerbated in a human-centric
data economy where buyers have to choose among data of thousands of individuals, and where
marketplaces have to distribute payoffs to thousands of people contributing personal data to a
specific transaction.
Regarding the selection process, we compare different purchase strategies depending on the
level of information available to data buyers at the time of making decisions. A first methodological
contribution of our work is to propose a data evaluation stage prior to datasets being selected
and purchased by buyers in a marketplace. We show that buyers can significantly improve the
outcome of the purchasing process simply by being provided with a measurement of how their
models perform when trained by the marketplace on each individual eligible dataset. We
design purchase strategies that exploit this functionality, calling the resulting algorithm Try
Before You Buy, and demonstrate on synthetic and real datasets that it can lead to
near-optimal data purchasing in only O(N) execution time instead of the exponential O(2^N)
time needed to calculate the optimal purchase.
With regards to the payoff distribution problem, we focus on computing the relative value
of spatio-temporal datasets combined in marketplaces for predicting transportation demand and
travel time in metropolitan areas. Using large datasets of taxi rides from Chicago, Porto, and
New York, we show that the value of data is different for each individual and cannot be approximated
by its volume. Our results reveal that even more sophisticated approaches, based on the
“leave-one-out” value, are inaccurate. Instead, richer and well-established notions of value
from economics and game theory, such as the Shapley value, need to be employed if one wishes
to capture the complex effects of mixing different datasets on the accuracy of forecasting algorithms.
However, the Shapley value entails serious computational challenges. Its exact calculation
requires repeatedly training and evaluating every combination of data sources, and hence O(N!)
or O(2^N) computational time, which is infeasible for complex models or thousands of individuals.
Moreover, our work paves the way for new methods of measuring the value of spatio-temporal
data. We identify heuristics, such as entropy or similarity to the average, that show a significant
correlation with the Shapley value and can therefore be used to overcome the computational
challenges posed by Shapley approximation algorithms in this specific context.
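To make the contrast between the leave-one-out value and the Shapley value concrete, the following sketch computes both exactly for three hypothetical data sources. The source names and coalition utilities are made-up accuracy numbers, not results from the taxi datasets, and the exact computation enumerates every coalition, which is where the O(2^N) cost comes from:

```python
from itertools import combinations
from math import factorial

# Toy utility v(S): forecasting accuracy when training on a coalition of data
# sources (illustrative values only; a real marketplace would retrain a model).
sources = ["alice", "bob", "carol"]
v = {frozenset(): 0.0,
     frozenset({"alice"}): 0.6, frozenset({"bob"}): 0.5, frozenset({"carol"}): 0.1,
     frozenset({"alice", "bob"}): 0.7, frozenset({"alice", "carol"}): 0.65,
     frozenset({"bob", "carol"}): 0.55,
     frozenset({"alice", "bob", "carol"}): 0.75}

def shapley(player):
    """Exact Shapley value: weighted average marginal contribution over all coalitions."""
    n, total = len(sources), 0.0
    others = [s for s in sources if s != player]
    for k in range(n):
        for coalition in combinations(others, k):
            s = frozenset(coalition)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v[s | {player}] - v[s])
    return total

def leave_one_out(player):
    """Value as the drop in utility when the player's data is removed."""
    full = frozenset(sources)
    return v[full] - v[full - {player}]

for p in sources:
    print(p, round(shapley(p), 4), round(leave_one_out(p), 4))

# Efficiency property: Shapley values sum exactly to v(grand coalition).
print(round(sum(shapley(p) for p in sources), 4))
```

In this toy example the leave-one-out values sum to 0.35 rather than the full-coalition utility of 0.75, one symptom of how they misattribute value when sources overlap.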
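The Try Before You Buy idea from the selection problem above can be sketched in the same toy style. The per-dataset scores stand in for the model-performance measurements returned by the marketplace's evaluation stage, and the dataset names and diminishing-returns utility are assumptions for illustration; the real algorithm trains the buyer's actual model:

```python
from itertools import combinations

# Toy per-dataset quality scores standing in for the accuracy a buyer's model
# achieves when the marketplace trains it on each individual eligible dataset.
scores = {"A": 0.30, "B": 0.25, "C": 0.10, "D": 0.05}

def utility(bundle):
    # Toy diminishing-returns utility of buying a set of datasets (illustrative only).
    total, gain = 0.0, 1.0
    for name in sorted(bundle, key=scores.get, reverse=True):
        total += gain * scores[name]
        gain *= 0.5
    return total

budget = 2  # buy at most two datasets

# Exhaustive optimum: evaluates every bundle within budget (exponential in general).
best = max((c for k in range(budget + 1) for c in combinations(scores, k)),
           key=utility)

# Try Before You Buy: rank datasets once by their individual evaluation, O(N).
greedy = sorted(scores, key=scores.get, reverse=True)[:budget]

print(set(best), utility(best))
print(set(greedy), utility(greedy))
```

With O(N) individual evaluations plus a sort, the greedy choice here matches the exhaustive optimum; the dissertation demonstrates near-optimality on real and synthetic data rather than a toy utility.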
We conclude with a number of open issues and propose further research directions that leverage
the contributions and findings of this dissertation. These include monitoring data transactions
to better measure data markets, and complementing market data with actual transaction prices
to build a more accurate data pricing tool. A human-centric data economy would also require
that the contributions of thousands of individuals to machine learning tasks are calculated daily.
For that to be feasible, we need to further optimise the efficiency of data purchasing and payoff
calculation processes in data marketplaces. In that direction, we also point to alternatives
to repeatedly training and evaluating a model when selecting data with Try Before You Buy and
when approximating the Shapley value. Finally, we discuss the challenges and potential technologies that
help with building a federation of standardised data marketplaces.
The data economy will develop fast in the upcoming years, and researchers from different
disciplines will work together to unlock the value of data and make the most out of it. Maybe
the proposal of getting paid for our data and our contribution to the data economy eventually takes off,
or maybe other proposals such as the robot tax are finally used to balance the power
between individuals and tech firms in the digital economy. Still, we hope our work sheds light on
the value of data, and contributes to making the price of data more transparent and, eventually, to
moving towards a human-centric data economy.
This work has been supported by IMDEA Networks Institute. Doctoral Programme in Telematic Engineering, Universidad Carlos III de Madrid. Committee chair: Georgios Smaragdakis. Secretary: Ángel Cuevas Rumín. Member: Pablo Rodríguez Rodrígue
Challenges and perspectives of hate speech research
This book is the result of a conference that could not take place. It is a collection of 26 texts that address and discuss the latest developments in international hate speech research from a wide range of disciplinary perspectives. This includes case studies from Brazil, Lebanon, Poland, Nigeria, and India, theoretical introductions to the concepts of hate speech, dangerous speech, incivility, toxicity, extreme speech, and dark participation, as well as reflections on methodological challenges such as scraping, annotation, datafication, implicity, explainability, and machine learning. As such, it provides a much-needed forum for cross-national and cross-disciplinary conversations in what is currently a very vibrant field of research
A Low-Energy Security Solution for IoT-Based Smart Farms
This work proposes a novel configuration of the Transport Layer Security (TLS) protocol,
suitable for low-energy Internet of Things (IoT) applications. The motivation behind
the redesign of TLS is energy consumption minimisation and sustainable farming, as
exemplified by an application domain of aquaponic smart farms. The work therefore considers
decentralisation of a formerly centralised security model, with a focus on reducing energy
consumption for battery powered devices. The research presents a four-part investigation
into the security solution, composed of a risk assessment, energy analysis of authentication
and data exchange functions, and finally the design and verification of a novel consensus
authorisation mechanism. The first investigation considered traditional risk-driven threat
assessment, but to include energy reduction, working towards device longevity within a
content-oriented framework. Since the aquaponics environments include limited but specific
data exchanges, a content-oriented approach produced valuable insights into security and
privacy requirements that would later be tested by implementing a variety of mechanisms
available on the ESP32.
The second and third investigations featured the energy analysis of authentication
and data exchange functions respectively, where the results of the risk assessment were
implemented to compare the re-configurations of TLS mechanisms and domain content.
Results concluded that selective confidentiality and persistent secure sessions between paired
devices enabled considerable reductions in energy consumption, and were a good
reflection of the possibilities suggested by the risk assessment.
The fourth and final investigation proposed a granular authorisation design to increase
the safety of access control that would otherwise be binary in TLS. The motivation was
for damage mitigation from inside attacks or network faults. The approach involved an
automated, hierarchy-based, decentralised network topology to reduce data duplication whilst
still providing robustness beyond the vulnerability of central governance. Formal verification
using model-checking indicated a safe design model, using four automated back-ends.
The research concludes that lower energy IoT solutions for the smart farm application
domain are possible
Machine learning as a service for high energy physics (MLaaS4HEP): a service for ML-based data analyses
With the CERN LHC program underway, there has been an acceleration of data growth in the High Energy Physics (HEP) field, and the use of Machine Learning (ML) in HEP will be critical during the HL-LHC program, when the data produced will reach the exascale. ML techniques have been used successfully in many areas of HEP; nevertheless, developing an ML project and implementing it for production use is a highly time-consuming task that requires specific skills. Complicating this scenario is the fact that HEP data is stored in the ROOT data format, which is mostly unknown outside of the HEP community.
The work presented in this thesis is focused on the development of a ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed by using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly using ROOT files of arbitrary size from local or distributed data sources. Such a solution provides HEP users non-expert in ML with a tool that allows them to apply ML techniques in their analyses in a streamlined manner.
Over the years the MLaaS4HEP framework has been developed, validated, and tested, and new features have been added. A first MLaaS solution was developed by automating the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs was developed, so that a user, after being authenticated and authorized, can submit MLaaS4HEP workflows that produce trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and meets the requirements to be added to the INFN Cloud portfolio of services
Deep neural networks in the cloud: Review, applications, challenges and research directions
Deep neural networks (DNNs) are currently being deployed as machine learning technology in a wide
range of important real-world applications. DNNs consist of a huge number of parameters that require
millions of floating-point operations (FLOPs) to be executed both in learning and prediction modes. An
effective way to meet this demand is to deploy DNNs in a cloud computing system equipped with centralized
servers and data storage sub-systems with high-speed and high-performance computing capabilities.
This paper presents an up-to-date survey on current state-of-the-art deployed DNNs for cloud computing.
Various DNN complexities associated with different architectures are presented and discussed alongside
the necessities of using cloud computing. We also present an extensive overview of different cloud
computing platforms for the deployment of DNNs and discuss them in detail. Moreover, DNN applications
already deployed in cloud computing systems are reviewed to demonstrate the advantages of using
cloud computing for DNNs. The paper emphasizes the challenges of deploying DNNs in cloud computing
systems and provides guidance on enhancing current and new deployments.
Funding: the EGIA project (KK-2022/00119) and the Consolidated Research Group MATHMODE (IT1456-22).
Platform://Democracy: Perspectives on Platform Power, Public Values and the Potential of Social Media Councils
Social media platforms have created private communication orders which they rule through terms of service and algorithmic moderation practices. As their impact on public communication and human rights has grown, so have different models to increase the role of public interests and values in the design of their rules and their practices. But who should speak for both the users and the public at large? Bodies of experts and/or selected user representatives, usually called Platform Councils or Social Media Councils (SMCs), have gained attention as a potential solution. Examples of Social Media Councils include Meta’s Oversight Board, but most platform companies have so far shied away from installing one. This survey of approaches to increasing the quality of platform decision-making and content governance, involving more than 35 researchers from four continents brought together in regional "research clinics", makes clear that trade-offs have to be carefully balanced. The larger the council, the less effective its decision-making, even if its legitimacy might be increased. While there is no one-size-fits-all approach, the project demonstrates that procedures matter, that multistakeholderism is a key concept for effective Social Media Councils, and that incorporating technical expertise and promoting inclusivity are important considerations in their design. As the Digital Services Act becomes effective in 2024, a Social Media Council for Germany’s Digital Services Coordinator (overseeing platforms) can serve as a test case and should be closely monitored. Beyond national councils, there is a strong case for a commission focused on ensuring human rights online, modeled after the Venice Commission, that can provide expertise and guidelines on policy questions related to platform governance, particularly those that affect public interests, such as the special treatment of public figures and of mass media, and algorithmic diversity.
The commission can be staffed by a diverse set of experts from selected organizations and institutions established in the platform governance field
Sacred Assets: Design for the Food System
Food is intimate to every individual and can communicate across cultures. The problem of food access is not technically a problem of a lack of food, but largely an issue of income inequality and individual mobility. In Syracuse, every zip code has at least one sector that is considered a food desert. However, the Central New York region in which Syracuse sits is a bastion of food producers, including small farmers, who are a keystone for a sustainable future. This project investigates how to expand food access as well as support small farmers by using strategic technologies. There were 50 participants in this study across three methods: a survey (n = 36), semi-structured interviews (n = 8), and user testing (n = 6). This study includes the development of a mobile application named Farm Loop, which empowers farmers to sell directly to consumers via an online retail platform or to the emergency food system for a lower price. The application's system design uses a business model that creates a community fund, supporting small farmers with additional revenue streams while expanding food access
The platformised creative worker: an ethnographic study of precarity and inequality in the London influencer industry (2017-2022)
Building on the recent proliferation of scholarly interest in the impacts of platformisation on the Cultural and Creative Industries, this thesis draws on long-term ethnographic fieldwork in the London influencer industry (2017-2022) to examine the sociocultural, technological, and commercial contours of labour for social media content creators. Within this context, I ask which creators are able to gain visibility and success, and conversely who is systematically excluded from opportunities, and why? As a digital anthropologist, it is through immersion in the everyday contexts of creators’ lives, in seeing them interact both online and offline and hearing them describe their experiences, that I seek to understand these dynamics. To this end, the project combines several ethnographic methods: online participant observation, offline participant observation, ethnographic semi-structured interviews, and autoethnography in the form of becoming a YouTuber myself. In framing these micro ethnographic insights within macro structures of power and intersecting inequalities, this work seeks to make an original contribution to the literatures on influencer cultures and the platformisation of creative industries and labour. The shift of employment patterns in the Cultural and Creative Industries away from stable structures, and the emergence of the neoliberal worker-subject (entrepreneurial, flexible, self-directed, always available to work), have been the topic of much academic scrutiny since the 1990s. This research found that the labour of content creators bears many of these hallmarks, and yet platformisation has given rise to novel formations, concerns, and challenges. This thesis makes the case that the platformised creative worker marks an intensification of the neoliberal worker-subject, with content creators facing heightened conditions of both precarity and inequality.
In their search for sustainable careers in an unstable emerging industry, creators must spread their labour thin across multiple platforms and revenue streams, all whilst obsessively scrutinising their popularity metrics, performing taxing relational labour, and navigating opaque algorithmic recommendation systems. Further, and contrary to highly celebratory discourses that position social media creation as more diverse, inclusive and meritocratic than legacy cultural industries, not only are certain creators subject to long-standing forms of discrimination, but we can identify new forms of structural inequality emerging. In the influencer industry certain identities, expressions and types of content are propelled into the spotlight whilst others are cast into the shadows of obscurity, mapping onto well-worn inequalities of race, class, gender and sexuality. This is an advertising-driven industry that makes visible the most profitable creators, those who do not disrupt the neoliberal status quo: white, straight, male, middle class, cisgendered, brand-friendly. Overall, this thesis argues that platformisation has significant implications for creative labour and contributes to ongoing debates about the future of work and the impact of technology on contemporary forms of employment