
    Towards a human-centric data economy

    Spurred by the widespread adoption of artificial intelligence and machine learning, “data” is becoming a key production factor, comparable in importance to capital, land, or labour in an increasingly digital economy. In spite of an ever-growing demand for third-party data in the B2B market, firms are generally reluctant to share their information. This is due to the unique characteristics of “data” as an economic good (a freely replicable, non-depletable asset holding a highly combinatorial and context-specific value), which move digital companies to hoard and protect their “valuable” data assets and to integrate across the whole value chain, seeking to monopolise the provision of innovative services built upon them. As a result, most of those valuable assets still remain unexploited in corporate silos. This situation is shaping the so-called data economy around a small number of champions and is hampering the benefits of large-scale global data exchange. Some analysts have estimated the potential value of the data economy at US$2.5 trillion globally by 2025. Not surprisingly, unlocking the value of data has become a central policy of the European Union, which estimated the size of the data economy at €827 billion for the EU27 over the same period. Within the scope of the European Data Strategy, the European Commission is also steering initiatives aimed at identifying relevant cross-industry use cases involving different verticals and at enabling sovereign data exchanges to realise them. Among individuals, the massive collection and exploitation of personal data by digital firms in exchange for services, often with little or no consent, has raised a general concern about privacy and data protection. Apart from spurring recent legislative developments in this direction, this concern has prompted warnings about the unsustainability of the existing digital economy (a few digital champions, a potential negative impact on employment, growing inequality), and some voices propose paying people for their data in a sort of worldwide data labour market as a potential solution to this dilemma [114, 115, 155]. From a technical perspective, we are far from having the technology and algorithms required to enable such a human-centric data economy. Even its scope is still blurry, and the question of the value of data is, at the least, controversial. Research works from different disciplines have studied the data value chain, different approaches to the value of data, how to price data assets, and novel data marketplace designs. At the same time, complex legal and ethical issues have arisen around the data economy with respect to privacy, data protection, and ethical AI practices.

    In this dissertation, we start by exploring the data value chain and how entities trade data assets over the Internet. We carry out what is, to the best of our knowledge, the most thorough survey of commercial data marketplaces. In this work, we have catalogued and characterised ten different business models, including those of personal information management systems, companies born in the wake of recent data protection regulations that aim to empower end users to take control of their data. We have also identified the challenges faced by different types of entities, and the kinds of solutions and technology they use to provide their services. We then present a first-of-its-kind measurement study that sheds light on the prices of data in the market using a novel methodology. We study how ten commercial data marketplaces categorise and classify data assets, and which categories of data command higher prices. We also develop classifiers for comparing data products across different marketplaces, and we study the characteristics of the most valuable data assets and the features that specific vendors use to set the price of their data products. Based on this information, and adding data products offered by 33 other data providers, we develop a regression analysis that reveals features correlating with the prices of data products. As a result, we also implement the basic building blocks of a novel data pricing tool capable of estimating the market price of a new data product using just its metadata as input. Such a tool would bring more transparency to the prices of data products in the market, helping to price data assets and to avoid the price fluctuations inherent in nascent markets.
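    The dissertation's exact feature set and regression model are not given in the abstract; as a hedged illustration of a metadata-based price estimator, the sketch below fits a regression on invented metadata fields (category, update_frequency and num_records are assumptions, not the actual features used in the thesis):

```python
# Hypothetical sketch of a metadata-based data-product price estimator.
# Feature names and the model choice are illustrative assumptions; the
# dissertation's actual regression may differ.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy catalogue of data products collected from marketplaces (metadata only).
products = pd.DataFrame({
    "category":         ["geospatial", "financial", "marketing", "financial"],
    "update_frequency": ["daily", "monthly", "weekly", "daily"],
    "num_records":      [1_000_000, 50_000, 200_000, 750_000],
    "price_usd":        [1200.0, 300.0, 450.0, 2500.0],
})

features = ["category", "update_frequency", "num_records"]
model = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"),
          ["category", "update_frequency"])],
        remainder="passthrough")),          # numeric columns pass through
    ("regress", GradientBoostingRegressor(random_state=0)),
])
model.fit(products[features], products["price_usd"])

# Estimate the market price of a new product from its metadata alone.
new_product = pd.DataFrame([{"category": "geospatial",
                             "update_frequency": "weekly",
                             "num_records": 400_000}])
print(model.predict(new_product)[0])
```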
    Next, we turn to topics related to data marketplace design. In particular, we study how buyers can select and purchase data suited to their tasks without requiring a priori access to such data in order to make a purchase decision, and how marketplaces can distribute the payoff of a data transaction that combines data from different sources among the corresponding providers, be they individuals or firms. Both problems become harder in a human-centric data economy, where buyers have to choose among the data of thousands of individuals, and where marketplaces have to distribute payoffs to the thousands of people contributing personal data to a specific transaction. Regarding the selection process, we compare different purchase strategies depending on the level of information available to data buyers at the time of making decisions. A first methodological contribution of our work is to propose a data evaluation stage before datasets are selected and purchased by buyers in a marketplace. We show that buyers can significantly improve the performance of the purchasing process simply by being provided with a measurement of the performance of their models when trained by the marketplace on each individual eligible dataset. We design purchase strategies that exploit this functionality, calling the resulting algorithm Try Before You Buy, and we demonstrate on synthetic and real datasets that it can lead to near-optimal data purchasing in only O(N) time instead of the exponential O(2^N) time needed to compute the optimal purchase.
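    The abstract does not detail the purchase strategies themselves; the following is a minimal sketch of one possible Try-Before-You-Buy-style greedy strategy, where evaluate is a hypothetical stand-in for the marketplace evaluation stage (train the buyer's model on the given datasets and return a validation score):

```python
# Minimal sketch of a Try-Before-You-Buy-style purchase strategy.
# `evaluate` stands in for the marketplace evaluation stage described in the
# abstract: it trains the buyer's model on the given datasets and returns a
# validation score. Its name and signature are assumptions for illustration.
from typing import Callable, List, Sequence


def try_before_you_buy(
    datasets: Sequence[str],
    evaluate: Callable[[List[str]], float],
    budget: int,
) -> List[str]:
    """Greedily buy datasets ranked by their individual evaluation scores."""
    # One evaluation per eligible dataset: O(N) model trainings in total,
    # versus O(2^N) if every subset had to be evaluated.
    individual_scores = {d: evaluate([d]) for d in datasets}
    ranked = sorted(datasets, key=individual_scores.get, reverse=True)

    purchased: List[str] = []
    best_score = float("-inf")
    for d in ranked[:budget]:
        score = evaluate(purchased + [d])     # check marginal benefit
        if score > best_score:                # keep only datasets that help
            purchased.append(d)
            best_score = score
    return purchased
```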
    With regard to the payoff distribution problem, we focus on computing the relative value of spatio-temporal datasets combined in marketplaces for predicting transportation demand and travel time in metropolitan areas. Using large datasets of taxi rides from Chicago, Porto and New York, we show that the value of data differs across individuals and cannot be approximated by data volume. Our results reveal that even approaches based on the “leave-one-out” value are inaccurate. Instead, richer and well-established notions of value from economics and game theory, such as the Shapley value, need to be employed if one wishes to capture the complex effects that mixing different datasets has on the accuracy of forecasting algorithms. However, the Shapley value entails serious computational challenges: its exact calculation requires repeatedly training and evaluating a model on every combination of data sources, and hence O(N!) or O(2^N) computational time, which is unfeasible for complex models or thousands of individuals. Moreover, our work paves the way to new methods of measuring the value of spatio-temporal data. We identify heuristics, such as entropy or similarity to the average, that show a significant correlation with the Shapley value and can therefore be used to overcome the significant computational challenges posed by Shapley approximation algorithms in this specific context (a rough illustration follows this abstract).

    We conclude with a number of open issues and propose further research directions that leverage the contributions and findings of this dissertation. These include monitoring data transactions to better measure data markets, and complementing market data with actual transaction prices to build a more accurate data pricing tool. A human-centric data economy would also require that the contributions of thousands of individuals to machine learning tasks be calculated daily. For that to be feasible, we need to further optimise the efficiency of the data purchasing and payoff calculation processes in data marketplaces. In that direction, we also point to alternatives to repeatedly training and evaluating a model, both for selecting data with Try Before You Buy and for approximating the Shapley value. Finally, we discuss the challenges and potential technologies that could help build a federation of standardised data marketplaces. The data economy will develop quickly in the coming years, and researchers from different disciplines will work together to unlock the value of data and make the most of it. Perhaps the proposal that people be paid for their data and their contribution to the data economy will finally take off, or perhaps other proposals, such as a robot tax, will end up balancing the power between individuals and tech firms in the digital economy. Either way, we hope our work sheds light on the value of data and contributes to making the price of data more transparent and, eventually, to moving towards a human-centric data economy.

    This work has been supported by IMDEA Networks Institute. Doctoral Programme in Telematics Engineering, Universidad Carlos III de Madrid. Committee: President: Georgios Smaragdakis; Secretary: Ángel Cuevas Rumín; Member: Pablo Rodríguez Rodríguez.
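    As a rough illustration of the computational issue and the heuristic workaround mentioned in the abstract above, the code below contrasts a permutation-sampling Shapley approximation (which still retrains a model many times) with a cheap entropy heuristic; utility and trips_per_zone are hypothetical placeholders, not the dissertation's actual pipeline:

```python
# Sketch contrasting Monte Carlo Shapley estimation with an entropy heuristic.
# `utility(sources)` is a hypothetical placeholder for "train the forecasting
# model on these data sources and return its accuracy"; the taxi data and the
# exact heuristics used in the dissertation are not reproduced here.
import math
import random
from typing import Callable, Dict, List, Sequence


def shapley_monte_carlo(
    sources: Sequence[str],
    utility: Callable[[List[str]], float],
    num_permutations: int = 200,
) -> Dict[str, float]:
    """Estimate Shapley values by sampling random permutations of sources."""
    values = {s: 0.0 for s in sources}
    for _ in range(num_permutations):
        order = list(sources)
        random.shuffle(order)
        coalition: List[str] = []
        prev = utility(coalition)
        for s in order:                       # marginal contribution of s
            coalition.append(s)
            curr = utility(coalition)
            values[s] += (curr - prev) / num_permutations
            prev = curr
    return values


def entropy_heuristic(trips_per_zone: Dict[str, int]) -> float:
    """Cheap proxy for a source's value: entropy of its spatial distribution."""
    total = sum(trips_per_zone.values())
    return -sum((n / total) * math.log2(n / total)
                for n in trips_per_zone.values() if n > 0)
```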

    Data Trading Similarity Signature: An Extended Data Trading Framework for Human and Non-Human Actors

    Fair and secure data trading is one of the most prominent challenges of the 21st century. This paper presents a second iteration of an approach to developing a data marketplace concept by checking consumer requirements. The main problem we identified is data quality and the question: would a dataset fulfil the consumer's requirements? Starting from an approach that uses a binary response set to answer whether requirements are met, we concluded that the description of consumer requirements needs to be quantitatively comparable. The novel approach presented here identifies similarities between datasets and consumer requirements. It forms a unique, fingerprint-like similarity signature for each dataset, which can be interpreted by both human and non-human actors. The approach is deduced and designed using the Design Science Research Methodology and is critically discussed at the end.
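    The abstract does not describe how the similarity signature is actually computed; purely as an illustration of a per-requirement, quantitatively comparable signature, one could imagine something like the following (the requirement dimensions and scoring rule are invented, not taken from the paper):

```python
# Illustrative sketch of a fingerprint-like "similarity signature": one
# similarity score per requirement dimension. The dimensions, scoring rule
# and example values are invented for illustration.
from typing import Dict, List

REQUIREMENT_DIMENSIONS = ["completeness", "spatial_coverage", "updates_per_week"]


def dimension_similarity(required: float, offered: float) -> float:
    """Score in [0, 1]: 1.0 when the offer meets or exceeds the requirement."""
    if required <= 0:
        return 1.0
    return min(offered / required, 1.0)


def similarity_signature(requirements: Dict[str, float],
                         dataset_profile: Dict[str, float]) -> List[float]:
    """Return one similarity score per requirement dimension."""
    return [dimension_similarity(requirements.get(dim, 0.0),
                                 dataset_profile.get(dim, 0.0))
            for dim in REQUIREMENT_DIMENSIONS]


# A consumer (or an automated agent) can compare signatures across datasets.
requirements = {"completeness": 0.95, "spatial_coverage": 0.8, "updates_per_week": 1.0}
dataset = {"completeness": 0.90, "spatial_coverage": 1.0, "updates_per_week": 0.25}
print(similarity_signature(requirements, dataset))
```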

    Modelling of Viral Disease Risk

    Covid-19 has had a significant impact on daily life since the initial outbreak of the global pandemic in late 2019. Countries have been affected to varying degrees, depending on government actions and on country characteristics such as infrastructure and demographics. Using Norway and Germany as a case study, this thesis aims to determine which factors influence the risk of infection in each country, using Bayesian modelling and a non-Bayesian machine learning approach. Specifically, the relationship between infection rates and the demographic and infrastructural characteristics of a municipality at a fixed point in time is investigated, and the effectiveness of a Bayesian model in this context is compared with that of a machine learning algorithm. In addition, temporal modelling is used to assess the usefulness of government interventions, the impact of changes in mobility behaviour and the prevalence of different strains of Covid-19 in relation to infection numbers. The results show that a spatial model is more useful than a machine learning model in this context. For Germany, the log-transformed trade tax in a municipality, the share of the vote for the right-wing AfD party and the population density are found to have a positive influence on infection numbers. For Norway, the number of immigrants in a municipality, the number of unemployed immigrants in a municipality and population density are found to have a positive association with infection rates, while the proportion of women in a municipality is negatively associated with infection rates. The temporal models identify higher workplace mobility as a factor significantly influencing the risk of infection in both Germany and Norway.
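    The thesis's concrete model specifications are not given in the abstract; the sketch below only illustrates the kind of Bayesian-versus-machine-learning comparison described there, on synthetic municipality-level covariates and ignoring the spatial and temporal structure a full model would include (all variables are placeholders):

```python
# Minimal sketch of comparing a Bayesian count model with a machine learning
# baseline on municipality-level covariates. X, y and the covariate set are
# synthetic placeholders; the thesis's actual spatial/temporal models are richer.
import arviz as az
import numpy as np
import pymc as pm
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_municipalities, n_covariates = 300, 4
X = rng.normal(size=(n_municipalities, n_covariates))    # standardised covariates
y = rng.poisson(lam=np.exp(0.5 + X @ np.array([0.3, 0.1, -0.2, 0.4])))

# Bayesian Poisson regression: coefficients come with interpretable posteriors.
with pm.Model():
    intercept = pm.Normal("intercept", 0.0, 2.0)
    beta = pm.Normal("beta", 0.0, 1.0, shape=n_covariates)
    mu = pm.math.exp(intercept + pm.math.dot(X, beta))
    pm.Poisson("cases", mu=mu, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)
print(az.summary(idata, var_names=["beta"]))

# Non-Bayesian machine learning baseline for predictive comparison.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
print(cross_val_score(rf, X, y, cv=5).mean())
```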

    An Innovative Deep Learning Based Approach for Accurate Agricultural Crop Price Prediction

    Accurate prediction of agricultural crop prices is a crucial input for decision-making by various stakeholders in agriculture: farmers, consumers, retailers, wholesalers, and the Government. These decisions have significant implications including, most importantly, the economic well-being of the farmers. In this paper, our objective is to accurately predict crop prices using historical price information, climate conditions, soil type, location, and other key determinants of crop prices. This is a technically challenging problem, which has been attempted before. In this paper, we propose an innovative deep learning based approach to achieve increased accuracy in price prediction. The proposed approach uses graph neural networks (GNNs) in conjunction with a standard convolutional neural network (CNN) model to exploit geospatial dependencies in prices. Our approach works well with noisy legacy data and achieves performance at least 20% better than the results available in the literature. We are able to predict prices up to 30 days ahead. We choose two vegetables, potato (stable price behavior) and tomato (volatile price behavior), and work with noisy public data available from Indian agricultural markets. Comment: 9 pages, 3 figures, 3 tables.
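    The paper's exact architecture is not reproduced here; the sketch below only illustrates the general idea of combining a CNN over each market's price history with graph convolutions over a market-adjacency graph (layer sizes, graph construction and the single-output forecasting head are assumptions, not the authors' model):

```python
# Illustrative sketch of a CNN + GNN hybrid for market price forecasting, in
# the spirit of the approach described above. Architecture details are assumed.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv


class PriceForecaster(nn.Module):
    def __init__(self, history_len: int = 60, hidden: int = 32):
        super().__init__()
        # 1D CNN encodes each market's recent price history.
        self.cnn = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # GCN layers propagate information between geographically linked markets.
        self.gcn1 = GCNConv(hidden, hidden)
        self.gcn2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, 1)     # one forecast per market

    def forward(self, price_history: torch.Tensor,
                edge_index: torch.Tensor) -> torch.Tensor:
        # price_history: [num_markets, history_len]
        h = self.cnn(price_history.unsqueeze(1)).squeeze(-1)   # [markets, hidden]
        h = torch.relu(self.gcn1(h, edge_index))
        h = torch.relu(self.gcn2(h, edge_index))
        return self.head(h).squeeze(-1)                        # [markets]


# Toy usage: 10 markets, 60 days of history, a small adjacency edge list.
model = PriceForecaster()
prices = torch.randn(10, 60)
edges = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])   # bidirectional pairs
print(model(prices, edges).shape)
```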

    Decentralized brokered enabled ecosystem for data marketplace in smart cities towards a data sharing economy

    Data are now indispensable, as cities consider them a commodity that can be traded to earn revenue. In urban environments, data generated by Internet of Things devices, smart meters, smart sensors, etc. can provide a new source of income for the citizens and enterprises who own them, and these data can be traded as digital assets. To support such trading, digital data marketplaces have emerged. Data marketplaces promote a data-sharing economy, which is crucial for making data available to cities that aim to develop data-driven services. However, currently existing data marketplaces are mostly inadequate owing to several issues, such as security, efficiency, and adherence to privacy regulations. Likewise, there is no consolidated understanding of how to achieve trust and fairness among data owners and data sellers when trading data. Therefore, this study presents the design of an ecosystem comprising a distributed-ledger-technology data marketplace enabled by Message Queueing Telemetry Transport (MQTT) to facilitate trust and fairness among data owners and data sellers. The designed ecosystem is powered by IOTA technology and an MQTT broker, and it supports the trading of data sources by automating trade agreements, negotiations and payment settlement between data producers/sellers and data consumers/buyers. Overall, the findings of this article discuss the issues associated with developing a decentralised data marketplace for smart cities and suggest recommendations to enhance the deployment of decentralised and distributed data marketplaces.
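    The article's concrete message formats and settlement flow are not given in the abstract; the sketch below imagines, purely for illustration, the kind of automated trade-agreement flow such an ecosystem might support, with publish_to_broker and settle_payment_on_iota as invented placeholders for the MQTT and IOTA integrations:

```python
# Hypothetical sketch of an automated trade-agreement flow in a broker-enabled
# data marketplace. `publish_to_broker` and `settle_payment_on_iota` are
# invented placeholders for the MQTT publish and IOTA settlement steps.
import json
from dataclasses import asdict, dataclass


@dataclass
class DataOffer:
    offer_id: str
    seller: str
    stream_topic: str        # topic on which the data stream is published
    price_per_day: float


@dataclass
class TradeAgreement:
    offer_id: str
    buyer: str
    days: int
    total_price: float


def publish_to_broker(topic: str, payload: dict) -> None:
    """Placeholder: in a real system this would be an MQTT publish."""
    print(f"[broker] {topic}: {json.dumps(payload)}")


def settle_payment_on_iota(agreement: TradeAgreement) -> None:
    """Placeholder: in a real system this would issue a ledger transaction."""
    print(f"[ledger] settled {agreement.total_price} for {agreement.offer_id}")


# A seller advertises a data stream; a buyer accepts and payment is settled.
offer = DataOffer("offer-42", "seller-A", "city/airquality/sensor-7", 2.5)
publish_to_broker("marketplace/offers", asdict(offer))

agreement = TradeAgreement(offer.offer_id, "buyer-B", days=30,
                           total_price=30 * offer.price_per_day)
publish_to_broker("marketplace/agreements", asdict(agreement))
settle_payment_on_iota(agreement)
```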

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development, although their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting the reuse of recurrent generalised models in the very early stages of development. As a statement of research in progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.

    Forecasting different dimensions of liquidity in the intraday electricity markets: A review

    Energy consumption increases daily across the world, and electricity is the best means humankind has found for transmitting energy, regardless of its origin. Energy transmission is crucial for ensuring the efficient and reliable distribution of electricity from power generation sources to end users. It forms the backbone of modern societies, supporting sectors such as residential, commercial, and industrial activity, and it is a fundamental enabler of well-functioning and competitive electricity markets, supporting reliable supply, market integration, price stability, and the integration of renewable energy sources. Electric energy sourced from various regions worldwide is traded daily within these electricity markets. This paper presents a review of forecasting techniques for intraday electricity market prices, volumes, and price volatility. Electricity markets operate sequentially, encompassing distinct components such as the day-ahead, intraday, and balancing markets. The intraday market is closely linked to the timely delivery of electricity, as it facilitates the trading and adjustment of electricity supply and demand on the day of delivery to ensure a balanced and reliable power grid. Accurate forecasts are essential for traders to maximise profits within intraday markets, making forecasting a critical concern in electricity market management. In this review, statistical and econometric approaches as well as various machine learning and ensemble/hybrid techniques are presented. Overall, the literature highlights the superiority of machine learning and ensemble/hybrid models over statistical models.
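    None of the reviewed models is specified in the abstract; as a generic illustration of the statistical-versus-machine-learning comparison the review covers, the sketch below fits an ARIMA model and a gradient boosting model with lagged features to a synthetic intraday price series (the series construction and lag choices are arbitrary assumptions):

```python
# Toy comparison of a statistical model (ARIMA) and an ensemble learner
# (gradient boosting on lagged prices) for intraday price forecasting.
# The synthetic series and feature choices are illustrative assumptions only.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
n = 500
prices = pd.Series(50 + np.cumsum(rng.normal(0, 1, n)))    # toy intraday prices
train, test = prices[:-48], prices[-48:]                    # hold out 48 steps

# Statistical baseline: ARIMA fitted on the training window.
arima_forecast = ARIMA(train, order=(2, 1, 2)).fit().forecast(steps=48)

# Ensemble baseline: gradient boosting on lagged prices, one step ahead,
# applied recursively over the 48-step test horizon.
lags = 24
X = np.column_stack([prices.shift(i) for i in range(1, lags + 1)])[lags:-48]
y = prices.values[lags:-48]
gb = GradientBoostingRegressor(random_state=0).fit(X, y)

history = list(train.values)
gb_forecast = []
for _ in range(48):
    x = np.array(history[-lags:][::-1]).reshape(1, -1)       # most recent lag first
    pred = gb.predict(x)[0]
    gb_forecast.append(pred)
    history.append(pred)

print("ARIMA MAE:", mean_absolute_error(test, arima_forecast))
print("GB    MAE:", mean_absolute_error(test, gb_forecast))
```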