237 research outputs found

    Towards a human-centric data economy

    Spurred by widespread adoption of artificial intelligence and machine learning, “data” is becoming a key production factor, comparable in importance to capital, land, or labour in an increasingly digital economy. In spite of an ever-growing demand for third-party data in the B2B market, firms are generally reluctant to share their information. This is due to the unique characteristics of “data” as an economic good (a freely replicable, non-depletable asset holding a highly combinatorial and context-specific value), which moves digital companies to hoard and protect their “valuable” data assets, and to integrate across the whole value chain seeking to monopolise the provision of innovative services built upon them. As a result, most of those valuable assets remain unexploited in corporate silos. This situation is shaping the so-called data economy around a number of champions, and it is hampering the benefits of a global data exchange on a large scale. Some analysts have estimated the potential value of the data economy at US$2.5 trillion globally by 2025. Not surprisingly, unlocking the value of data has become a central policy of the European Union, which also estimated the size of the data economy at €827 billion for the EU27 in the same period. Within the scope of the European Data Strategy, the European Commission is also steering initiatives aimed at identifying relevant cross-industry use cases involving different verticals, and at enabling sovereign data exchanges to realise them. Among individuals, the massive collection and exploitation of personal data by digital firms in exchange for services, often with little or no consent, has raised a general concern about privacy and data protection. 
Apart from spurring recent legislative developments in this direction, this concern has raised some voices warning against the unsustainability of the existing digital economy (few digital champions, potential negative impact on employment, growing inequality), some of which propose that people be paid for their data in a sort of worldwide data labour market as a potential solution to this dilemma [114, 115, 155]. From a technical perspective, we are far from having the required technology and algorithms that will enable such a human-centric data economy. Even its scope is still blurry, and the question of the value of data is, at the very least, controversial. Research works from different disciplines have studied the data value chain, different approaches to the value of data, how to price data assets, and novel data marketplace designs. At the same time, complex legal and ethical issues with respect to the data economy have arisen around privacy, data protection, and ethical AI practices. In this dissertation, we start by exploring the data value chain and how entities trade data assets over the Internet. We carry out what is, to the best of our understanding, the most thorough survey of commercial data marketplaces. In this work, we have catalogued and characterised ten different business models, including those of personal information management systems, companies born in the wake of recent data protection regulations and aiming at empowering end users to take control of their data. We have also identified the challenges faced by different types of entities, and what kind of solutions and technology they are using to provide their services. Then we present a first-of-its-kind measurement study that sheds light on the prices of data in the market using a novel methodology. We study how ten commercial data marketplaces categorise and classify data assets, and which categories of data command higher prices. 
We also develop classifiers for comparing data products across different marketplaces, and we study the characteristics of the most valuable data assets and the features that specific vendors use to set the price of their data products. Based on this information and adding data products offered by 33 other data providers, we develop a regression analysis to reveal features that correlate with prices of data products. As a result, we also implement the basic building blocks of a novel data pricing tool capable of providing a hint of the market price of a new data product using just its metadata as input. This tool would provide more transparency on the prices of data products in the market, which will help in pricing data assets and in avoiding the inherent price fluctuation of nascent markets. Next we turn to topics related to data marketplace design. In particular, we study how buyers can select and purchase suitable data for their tasks without requiring a priori access to such data in order to make a purchase decision, and how marketplaces can distribute payoffs for a data transaction combining data of different sources among the corresponding providers, be they individuals or firms. The difficulty of both problems is further exacerbated in a human-centric data economy where buyers have to choose among data of thousands of individuals, and where marketplaces have to distribute payoffs to thousands of people contributing personal data to a specific transaction. Regarding the selection process, we compare different purchase strategies depending on the level of information available to data buyers at the time of making decisions. A first methodological contribution of our work is proposing a data evaluation stage prior to datasets being selected and purchased by buyers in a marketplace. 
We show that buyers can significantly improve the performance of the purchasing process just by being provided with a measurement of the performance of their models when trained by the marketplace with individual eligible datasets. We design purchase strategies that exploit such functionality and we call the resulting algorithm Try Before You Buy, and our work demonstrates over synthetic and real datasets that it can lead to near-optimal data purchasing with only O(N) execution time instead of the exponential O(2^N) time needed to calculate the optimal purchase. With regard to the payoff distribution problem, we focus on computing the relative value of spatio-temporal datasets combined in marketplaces for predicting transportation demand and travel time in metropolitan areas. Using large datasets of taxi rides from Chicago, Porto and New York we show that the value of data is different for each individual, and cannot be approximated by its volume. Our results reveal that even more complex approaches based on the “leave-one-out” value are inaccurate. Instead, more complex and acknowledged notions of value from economics and game theory, such as the Shapley value, need to be employed if one wishes to capture the complex effects of mixing different datasets on the accuracy of forecasting algorithms. However, the Shapley value entails serious computational challenges. Its exact calculation requires repeatedly training and evaluating every combination of data sources and hence O(N!) or O(2^N) computational time, which is infeasible for complex models or thousands of individuals. Moreover, our work paves the way to new methods of measuring the value of spatio-temporal data. We identify heuristics such as entropy or similarity to the average that show a significant correlation with the Shapley value and therefore can be used to overcome the significant computational challenges posed by Shapley approximation algorithms in this specific context. 
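As an illustration of why exact Shapley computation is exponential and why leave-one-out or volume can mislead, here is a minimal sketch on a toy problem. It is not the dissertation's code or data: the three sources, the `COVERAGE` map, and the `coverage_value` function are hypothetical stand-ins for "train and evaluate a model on this combination of datasets".

```python
from itertools import combinations
from math import factorial

# Hypothetical toy data: each source covers a set of city zones.
# "A" and "B" are duplicates of each other; "C" is small but unique.
COVERAGE = {"A": {1, 2}, "B": {1, 2}, "C": {3}}


def coverage_value(coalition):
    """Toy value function: number of distinct zones a coalition covers."""
    zones = set()
    for source in coalition:
        zones |= COVERAGE[source]
    return len(zones)


def shapley_values(players, value):
    """Exact Shapley value by enumerating every coalition (exponential in N)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of each coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def leave_one_out(players, value):
    """Value lost when a single source is removed from the full set."""
    full = frozenset(players)
    return {p: value(full) - value(full - {p}) for p in players}


players = sorted(COVERAGE)
shapley = shapley_values(players, coverage_value)
loo = leave_one_out(players, coverage_value)
volume = {p: len(COVERAGE[p]) for p in players}
```

In this toy case, leave-one-out gives both duplicated sources a value of zero, and volume ranks them above the unique source, whereas the Shapley value credits all three equally and sums to the value of the full set. Each source requires 2^(N-1) coalitions to be evaluated, which is what makes the exact calculation impractical when every evaluation means retraining a model.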
We conclude with a number of open issues and propose further research directions that leverage the contributions and findings of this dissertation. These include monitoring data transactions to better measure data markets, and complementing market data with actual transaction prices to build a more accurate data pricing tool. A human-centric data economy would also require that the contributions of thousands of individuals to machine learning tasks are calculated daily. For that to be feasible, we need to further optimise the efficiency of data purchasing and payoff calculation processes in data marketplaces. In that direction, we also point to some alternatives to repeatedly training and evaluating a model to select data based on Try Before You Buy and approximate the Shapley value. Finally, we discuss the challenges and potential technologies that help with building a federation of standardised data marketplaces. The data economy will develop fast in the upcoming years, and researchers from different disciplines will work together to unlock the value of data and make the most out of it. Maybe the proposal of getting paid for our data and our contribution to the data economy finally flies, or maybe it is other proposals such as the robot tax that are finally used to balance the power between individuals and tech firms in the digital economy. Still, we hope our work sheds light on the value of data, and contributes to making the price of data more transparent and, eventually, to moving towards a human-centric data economy. This work has been supported by IMDEA Networks Institute. Doctoral Programme in Telematic Engineering, Universidad Carlos III de Madrid. Chair: Georgios Smaragdakis. Secretary: Ángel Cuevas Rumín. Member: Pablo Rodríguez Rodrígue

    Verfassungsblatt: 2023/2

    Challenges and perspectives of hate speech research

    This book is the result of a conference that could not take place. It is a collection of 26 texts that address and discuss the latest developments in international hate speech research from a wide range of disciplinary perspectives. This includes case studies from Brazil, Lebanon, Poland, Nigeria, and India, theoretical introductions to the concepts of hate speech, dangerous speech, incivility, toxicity, extreme speech, and dark participation, as well as reflections on methodological challenges such as scraping, annotation, datafication, implicity, explainability, and machine learning. As such, it provides a much-needed forum for cross-national and cross-disciplinary conversations in what is currently a very vibrant field of research.

    A Low-Energy Security Solution for IoT-Based Smart Farms

    This work proposes a novel configuration of the Transport Layer Security (TLS) protocol, suitable for low-energy Internet of Things (IoT) applications. The motivation behind the redesign of TLS is energy consumption minimisation and sustainable farming, as exemplified by an application domain of aquaponic smart farms. The work therefore considers decentralisation of a formerly centralised security model, with a focus on reducing energy consumption for battery-powered devices. The research presents a four-part investigation into the security solution, composed of a risk assessment, energy analyses of authentication and of data exchange functions, and finally the design and verification of a novel consensus authorisation mechanism. The first investigation considered traditional risk-driven threat assessment, extended to include energy reduction and working towards device longevity within a content-oriented framework. Since the aquaponics environments include limited but specific data exchanges, a content-oriented approach produced valuable insights into security and privacy requirements that would later be tested by implementing a variety of mechanisms available on the ESP32. The second and third investigations featured the energy analysis of authentication and data exchange functions respectively, where the results of the risk assessment were implemented to compare the re-configurations of TLS mechanisms and domain content. Results showed that selective confidentiality and persistent secure sessions between paired devices enabled considerable improvements in energy consumption, and were a good reflection of the possibilities suggested by the risk assessment. The fourth and final investigation proposed a granular authorisation design to increase the safety of access control that would otherwise be binary in TLS. The motivation was damage mitigation from inside attacks or network faults. 
The approach involved an automated, hierarchy-based, decentralised network topology to reduce data duplication whilst still providing robustness beyond the vulnerability of central governance. Formal verification using model-checking indicated a safe design model, using four automated back-ends. The research concludes that lower-energy IoT solutions for the smart farm application domain are possible.

    Machine learning as a service for high energy physics (MLaaS4HEP): a service for ML-based data analyses

    With the CERN LHC program underway, there has been an acceleration of data growth in the High Energy Physics (HEP) field, and the usage of Machine Learning (ML) in HEP will be critical during the HL-LHC program, when the data produced will reach the exascale. ML techniques have been successfully used in many areas of HEP; nevertheless, the development of an ML project and its implementation for production use is a highly time-consuming task and requires specific skills. Complicating this scenario is the fact that HEP data is stored in the ROOT data format, which is mostly unknown outside of the HEP community. The work presented in this thesis is focused on the development of an ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed by using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly using ROOT files of arbitrary size from local or distributed data sources. Such a solution provides HEP users who are not ML experts with a tool that allows them to apply ML techniques in their analyses in a streamlined manner. Over the years the MLaaS4HEP framework has been developed, validated, and tested, and new features have been added. A first MLaaS solution has been developed by automating the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs has been developed, so that a user, after being authenticated and authorized, can submit MLaaS4HEP workflows producing trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and meets the requirements to be added to the INFN Cloud portfolio of services.

    Deep neural networks in the cloud: Review, applications, challenges and research directions

    Deep neural networks (DNNs) are currently being deployed as machine learning technology in a wide range of important real-world applications. DNNs consist of a huge number of parameters that require millions of floating-point operations (FLOPs) to be executed both in learning and prediction modes. A more effective method is to implement DNNs in a cloud computing system equipped with centralized servers and data storage sub-systems with high-speed and high-performance computing capabilities. This paper presents an up-to-date survey on current state-of-the-art deployed DNNs for cloud computing. Various DNN complexities associated with different architectures are presented and discussed alongside the necessities of using cloud computing. We also present an extensive overview of different cloud computing platforms for the deployment of DNNs and discuss them in detail. Moreover, DNN applications already deployed in cloud computing systems are reviewed to demonstrate the advantages of using cloud computing for DNNs. The paper emphasizes the challenges of deploying DNNs in cloud computing systems and provides guidance on enhancing current and new deployments. The EGIA project (KK-2022/00119); the Consolidated Research Group MATHMODE (IT1456-22).

    Platform://Democracy: Perspectives on Platform Power, Public Values and the Potential of Social Media Councils

    Social media platforms have created private communication orders which they rule through terms of service and algorithmic moderation practices. As their impact on public communication and human rights has grown, so has the number of models for increasing the role of public interests and values in the design of their rules and practices. But who should speak for both the users and the public at large? Bodies of experts and/or selected user representatives, usually called Platform Councils or Social Media Councils (SMCs), have gained attention as a potential solution. Examples of Social Media Councils include Meta’s Oversight Board, but most platform companies have so far shied away from installing one. This survey of approaches to increasing the quality of platform decision-making and content governance, involving more than 35 researchers from four continents brought together in regional "research clinics", makes clear that trade-offs have to be carefully balanced. The larger the council, the less effective its decision-making, even if its legitimacy might be increased. While there is no one-size-fits-all approach, the project demonstrates that procedures matter, that multistakeholderism is a key concept for effective Social Media Councils, and that incorporating technical expertise and promoting inclusivity are important considerations in their design. As the Digital Services Act becomes effective in 2024, a Social Media Council for Germany’s Digital Services Coordinator (overseeing platforms) can serve as a test case and should be closely monitored. Beyond national councils, there is a strong case for a commission focused on ensuring human rights online, which can be modeled after the Venice Commission and can provide expertise and guidelines on policy questions related to platform governance, particularly those that affect public interests like special treatment for public figures, for mass media and algorithmic diversity. 
The commission can be staffed by a diverse set of experts from selected organizations and institutions established in the platform governance field.

    Sacred Assets: Design for the Food System

    Food is intimate to every individual and can communicate across cultures. The problem of food access is not technically a problem of lack of food, but largely an issue of income inequality and individual mobility. In Syracuse, every zip code has at least one sector that is considered a food desert. However, the region of Central New York in which Syracuse sits is a bastion of food producers, including small farmers, who are a keystone for a sustainable future. This project investigates how to expand food access as well as support small farmers by using strategic technologies. There were 50 total participants in this study across three methods: survey (n = 36), semi-structured interviews (n = 8), and user testing (n = 6). This study includes the development of a mobile application named Farm Loop, which empowers farmers to sell directly to consumers via an online retail platform or to the emergency food system for a lower price. The application's system design uses a business model that creates a community fund that expands food access and pays farmers. This supports the small farmer with additional revenue streams while expanding food access.

    The platformised creative worker: an ethnographic study of precarity and inequality in the London influencer industry (2017-2022)

    Building on the recent proliferation of scholarly interest in the impacts of platformisation on the Cultural and Creative Industries, this thesis draws on long-term ethnographic fieldwork in the London influencer industry (2017-2022) to examine the sociocultural, technological, and commercial contours of labour for social media content creators. Within this context, I ask which creators are able to gain visibility and success, and conversely who is systematically excluded from opportunities, and why? As a digital anthropologist, it is through immersion in the everyday contexts of creators’ lives, in seeing them interact both online and offline and hearing them describe their experiences, that I seek to understand these dynamics. To this end, the project combines several ethnographic methods: online participant observation, offline participant observation, ethnographic semi-structured interviews, and autoethnography in the form of becoming a YouTuber myself. In framing these micro ethnographic insights within macro structures of power and intersecting inequalities, this work seeks to make an original contribution to the literatures on influencer cultures and the platformisation of creative industries and labour. Shifting patterns of employment in the Cultural and Creative Industries away from stable structures, and the emergence of the neoliberal worker-subject (entrepreneurial, flexible, self-directed, always available to work), have been the topic of much academic scrutiny since the 1990s. This research found that the labour of content creators bears many of these hallmarks, and yet platformisation has given rise to novel formations, concerns, and challenges. This thesis makes the case that the platformised creative worker marks an intensification of the neoliberal worker-subject, with content creators facing heightened conditions of both precarity and inequality. 
In their search for sustainable careers in an unstable emerging industry, creators must spread their labour thin across multiple platforms and revenue streams, all whilst obsessively scrutinising their popularity metrics, performing taxing relational labour, and navigating opaque algorithmic recommendation systems. Further, and contrary to highly celebratory discourses that position social media creation as more diverse, inclusive and meritocratic than legacy cultural industries, not only are certain creators subject to long-standing discriminations, but we can identify new forms of structural inequality emerging. In the influencer industry certain identities, expressions and types of content are propelled into the spotlight whilst others are cast into the shadows of obscurity, mapping onto well-worn inequalities of race, class, gender and sexuality. This is an advertising-driven industry that makes visible the most profitable creators, those who do not disrupt the neoliberal status quo: white, straight, male, middle class, cisgendered, brand-friendly. Overall, this thesis argues that platformisation has significant implications for creative labour and contributes to ongoing debates about the future of work and the impact of technology on contemporary forms of employment.