16 research outputs found
Towards a human-centric data economy
Spurred by widespread adoption of artificial intelligence and machine learning, âdataâ is becoming
a key production factor, comparable in importance to capital, land, or labour in an increasingly
digital economy. In spite of an ever-growing demand for third-party data in the B2B
market, firms are generally reluctant to share their information. This is due to the unique characteristics
of âdataâ as an economic good (a freely replicable, non-depletable asset holding a highly
combinatorial and context-specific value), which moves digital companies to hoard and protect
their âvaluableâ data assets, and to integrate across the whole value chain seeking to monopolise
the provision of innovative services built upon them. As a result, most of those valuable assets
still remain unexploited in corporate silos nowadays.
This situation is shaping the so-called data economy around a number of champions, and it is
hampering the benefits of a global data exchange on a large scale. Some analysts have estimated
the potential value of the data economy in US$2.5 trillion globally by 2025. Not surprisingly, unlocking
the value of data has become a central policy of the European Union, which also estimated
the size of the data economy in 827C billion for the EU27 in the same period. Within the scope of
the European Data Strategy, the European Commission is also steering relevant initiatives aimed
to identify relevant cross-industry use cases involving different verticals, and to enable sovereign
data exchanges to realise them.
Among individuals, the massive collection and exploitation of personal data by digital firms
in exchange of services, often with little or no consent, has raised a general concern about privacy
and data protection. Apart from spurring recent legislative developments in this direction,
this concern has raised some voices warning against the unsustainability of the existing digital
economics (few digital champions, potential negative impact on employment, growing inequality),
some of which propose that people are paid for their data in a sort of worldwide data labour
market as a potential solution to this dilemma [114, 115, 155].
From a technical perspective, we are far from having the required technology and algorithms
that will enable such a human-centric data economy. Even its scope is still blurry, and the question
about the value of data, at least, controversial. Research works from different disciplines have
studied the data value chain, different approaches to the value of data, how to price data assets,
and novel data marketplace designs. At the same time, complex legal and ethical issues with
respect to the data economy have risen around privacy, data protection, and ethical AI practices. In this dissertation, we start by exploring the data value chain and how entities trade data assets
over the Internet. We carry out what is, to the best of our understanding, the most thorough survey
of commercial data marketplaces. In this work, we have catalogued and characterised ten different
business models, including those of personal information management systems, companies born
in the wake of recent data protection regulations and aiming at empowering end users to take
control of their data. We have also identified the challenges faced by different types of entities,
and what kind of solutions and technology they are using to provide their services.
Then we present a first of its kind measurement study that sheds light on the prices of data
in the market using a novel methodology. We study how ten commercial data marketplaces categorise
and classify data assets, and which categories of data command higher prices. We also
develop classifiers for comparing data products across different marketplaces, and we study the
characteristics of the most valuable data assets and the features that specific vendors use to set
the price of their data products. Based on this information and adding data products offered by
other 33 data providers, we develop a regression analysis for revealing features that correlate with
prices of data products. As a result, we also implement the basic building blocks of a novel data
pricing tool capable of providing a hint of the market price of a new data product using as inputs
just its metadata. This tool would provide more transparency on the prices of data products in
the market, which will help in pricing data assets and in avoiding the inherent price fluctuation of
nascent markets.
Next we turn to topics related to data marketplace design. Particularly, we study how buyers
can select and purchase suitable data for their tasks without requiring a priori access to such
data in order to make a purchase decision, and how marketplaces can distribute payoffs for a
data transaction combining data of different sources among the corresponding providers, be they
individuals or firms. The difficulty of both problems is further exacerbated in a human-centric
data economy where buyers have to choose among data of thousands of individuals, and where
marketplaces have to distribute payoffs to thousands of people contributing personal data to a
specific transaction.
Regarding the selection process, we compare different purchase strategies depending on the
level of information available to data buyers at the time of making decisions. A first methodological
contribution of our work is proposing a data evaluation stage prior to datasets being selected
and purchased by buyers in a marketplace. We show that buyers can significantly improve the
performance of the purchasing process just by being provided with a measurement of the performance
of their models when trained by the marketplace with individual eligible datasets. We
design purchase strategies that exploit such functionality and we call the resulting algorithm Try
Before You Buy, and our work demonstrates over synthetic and real datasets that it can lead to
near-optimal data purchasing with only O(N) instead of the exponential execution time - O(2N)
- needed to calculate the optimal purchase. With regards to the payoff distribution problem, we focus on computing the relative value
of spatio-temporal datasets combined in marketplaces for predicting transportation demand and
travel time in metropolitan areas. Using large datasets of taxi rides from Chicago, Porto and
New York we show that the value of data is different for each individual, and cannot be approximated
by its volume. Our results reveal that even more complex approaches based on the
âleave-one-outâ value, are inaccurate. Instead, more complex and acknowledged notions of value
from economics and game theory, such as the Shapley value, need to be employed if one wishes
to capture the complex effects of mixing different datasets on the accuracy of forecasting algorithms.
However, the Shapley value entails serious computational challenges. Its exact calculation
requires repetitively training and evaluating every combination of data sources and hence O(N!)
or O(2N) computational time, which is unfeasible for complex models or thousands of individuals.
Moreover, our work paves the way to new methods of measuring the value of spatio-temporal
data. We identify heuristics such as entropy or similarity to the average that show a significant
correlation with the Shapley value and therefore can be used to overcome the significant computational
challenges posed by Shapley approximation algorithms in this specific context.
We conclude with a number of open issues and propose further research directions that leverage
the contributions and findings of this dissertation. These include monitoring data transactions
to better measure data markets, and complementing market data with actual transaction prices
to build a more accurate data pricing tool. A human-centric data economy would also require
that the contributions of thousands of individuals to machine learning tasks are calculated daily.
For that to be feasible, we need to further optimise the efficiency of data purchasing and payoff
calculation processes in data marketplaces. In that direction, we also point to some alternatives
to repetitively training and evaluating a model to select data based on Try Before You Buy and
approximate the Shapley value. Finally, we discuss the challenges and potential technologies that
help with building a federation of standardised data marketplaces.
The data economy will develop fast in the upcoming years, and researchers from different
disciplines will work together to unlock the value of data and make the most out of it. Maybe
the proposal of getting paid for our data and our contribution to the data economy finally flies,
or maybe it is other proposals such as the robot tax that are finally used to balance the power
between individuals and tech firms in the digital economy. Still, we hope our work sheds light on
the value of data, and contributes to making the price of data more transparent and, eventually, to
moving towards a human-centric data economy.This work has been supported by IMDEA Networks InstitutePrograma de Doctorado en IngenierĂa TelemĂĄtica por la Universidad Carlos III de MadridPresidente: Georgios Smaragdakis.- Secretario: Ăngel Cuevas RumĂn.- Vocal: Pablo RodrĂguez RodrĂgue
Digital Forensics and Born-Digital Content in Cultural Heritage Collections
Digital Forensics and Born-Digital Content in Cultural Heritage Collections examines
digital forensics and its relevance for contemporary research. The applicability
of digital forensics to archivists, curators, and others working within
our cultural heritage is not necessarily intuitive. When the shared interests of
digital forensics and responsibilities associated with securing and maintaining
our cultural legacy are identifiedâpreservation, extraction, documentation,
and interpretation, as this report detailsâthe correspondence between
these fields of study becomes logical and compelling.Council on Library and Information Resource
Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion
382 p.Libro ElectrĂłnicoEach of us has been in the computing field for more than 40 years. The book is the product of a lifetime of observing and participating in the changes it has brought. Each of us has been both a teacher and a learner in the field.
This book emerged from a general education course we have taught at Harvard, but it is not a textbook. We wrote this book to share what wisdom we have with as many people as we can reach. We try to paint a big picture,
with dozens of illuminating anecdotes as the brushstrokes. We aim to entertain you at the same time as we provoke your thinking.Preface
Chapter 1 Digital Explosion
Why Is It Happening, and What Is at Stake?
The Explosion of Bits, and Everything Else
The Koans of Bits
Good and Ill, Promise and Peril
Chapter 2 Naked in the Sunlight
Privacy Lost, Privacy Abandoned
1984 Is Here, and We Like It
Footprints and Fingerprints
Why We Lost Our Privacy, or Gave It Away
Little Brother Is Watching
Big Brother, Abroad and in the U.S.
Technology Change and Lifestyle Change
Beyond Privacy
Chapter 3 Ghosts in the Machine
Secrets and Surprises of Electronic Documents
What You See Is Not What the Computer Knows
Representation, Reality, and Illusion
Hiding Information in Images
The Scary Secrets of Old Disks
Chapter 4 Needles in the Haystack
Google and Other Brokers in the Bits Bazaar
Found After Seventy Years
The Library and the Bazaar
The Fall of Hierarchy
It Matters How It Works
Who Pays, and for What?
Search Is Power
You Searched for WHAT? Tracking Searches
Regulating or Replacing the Brokers
Chapter 5 Secret Bits
How Codes Became Unbreakable
Encryption in the Hands of Terrorists, and Everyone Else
Historical Cryptography
Lessons for the Internet Age
Secrecy Changes Forever
Cryptography for Everyone
Cryptography Unsettled
Chapter 6 Balance Toppled
Who Owns the Bits?
Automated CrimesâAutomated Justice
NET Act Makes Sharing a Crime
The Peer-to-Peer Upheaval
Sharing Goes Decentralized
Authorized Use Only
Forbidden Technology
Copyright Koyaanisqatsi: Life Out of Balance
The Limits of Property
Chapter 7 You Canât Say That on the Internet
Guarding the Frontiers of Digital Expression
Do You Know Where Your Child Is on the Web Tonight?
Metaphors for Something Unlike Anything Else
Publisher or Distributor?
Neither Liberty nor Security
The Nastiest Place on Earth
The Most Participatory Form of Mass Speech
Protecting Good Samaritansâand a Few Bad Ones
Laws of Unintended Consequences
Can the Internet Be Like a Magazine Store?
Let Your Fingers Do the Stalking
Like an Annoying Telephone Call?
Digital Protection, Digital Censorshipâand Self-Censorship
Chapter 8 Bits in the Air
Old Metaphors, New Technologies, and Free Speech
Censoring the President
How Broadcasting Became Regulated
The Path to Spectrum Deregulation
What Does the Future Hold for Radio?
Conclusion After the Explosion
Bits Lighting Up the World
A Few Bits in Conclusion
Appendix The Internet as System and Spirit
The Internet as a Communication System
The Internet Spirit
Endnotes
Inde
An examination of the Asus WL-HDD 2.5 as a nepenthes malware collector
The Linksys WRT54g has been used as a host for network forensics tools for instance Snort for a long period of time. Whilst large corporations are already utilising network forensic tools, this paper demonstrates that it is quite feasible for a non-security specialist to track and capture malicious network traffic. This paper introduces the Asus Wireless Hard disk as a replacement for the popular Linksys WRT54g. Firstly, the Linksys router will be introduced detailing some of the research that was undertaken on the device over the years amongst the security community. It then briefly discusses malicious software and the impact this may have for a home user. The paper then outlines the trivial steps in setting up Nepenthes 0.1.7 (a malware collector) for the Asus WL-HDD 2.5 according to the Nepenthes and tests the feasibility of running the malware collector on the selected device. The paper then concludes on discussing the limitations of the device when attempting to execute Nepenthes
An Empirical Analysis to Control Product Counterfeiting in the Automotive Industry\u27s Supply Chains in Pakistan
The counterfeits pose significant health and safety threat to consumers. The quality image of firms is vulnerable to the damage caused by the expanding flow of counterfeit products in todayâs global supply chains. The counterfeiting markets are swelling due to globalization and customersâ willingness to buy counterfeits, fueling illicit activities to explode further. Buyers look for the original parts are deceived by the false (deceptive) signalsâ communication. The counterfeiting market has become a multi-billion industry but lacks detailed insights into the supply side of counterfeiting (deceptive side). The study aims to investigate and assess the relationship between the anti-counterfeiting strategies and improvement in the firmâs supply performance within the internal and external supply chain quality management context in the auto-parts industryâs supply chains in Pakistan
Secure use of the Internet by business
This study focuses on electronic data security issues and their applicability to SMEs.Prior to this project, no frame of reference had been identified or defined for:⢠The electronic data and Internet security needs of SMEs⢠The critical success factors for implementing and using a secureelectronic data and authentication solutionUsing a source of both primary and secondary research data, firstly, a trusted thirdparty infrastructure based on public key encryption and digital certificate technologywas designed and developed. This provided trust, integrity, confidentiality and nonrepudiation,all of which are essential components for secure static storage or Internettransmission of electronic data.The second stage was the implementation of this infrastructure in SMEs. The casestudies revealed a reluctance to implement and use the designed infrastructure bothduring and after the pilot implementation period. Further primary research wasundertaken to identify and explain the reluctance of SMEs to participate in pilotingthis Internet based technology.As a result of this research project, there are four major contributions to knowledge.These are,⢠A time series survey of SME Internet usage and attitudes in the GreaterManchester region. The initial stage of the research found that at the start ofthis project (1996/7), only one in three SMEs were using the Internet and thestage of usage was extremely basic (chapter 5.2.1). Towards the end of theproject (1998/9), Internet usage by SMEs had doubled and had become moresophisticated (chapter 7.2). Awareness of security needs had also risen, butwas still not a part of the overall network infrastructure of the majority ofsmall and medium sized organisations.⢠A framework for the analysis of the potential success or failure of theimplementation of a security solution in particular and new technology projectmore generally (chapter 9).⢠A framework that can act as broad guide for SMEs in the development of theirsecurity network infrastructures.⢠The use of organic methodology (chapter 3.3) to deal with the fast movingand changing environment of IT related research projects.A "Best Practice" guide has been developed based on these two models to help SMEsin the implementation of a data security solution in their own organisations.As well as raised awareness of the issues, the success factors also include reengineeringexisting business processes, changing traditional business thinking andcreating a level of commitment to the implementation of technology that will enableSMEs to thrive in the new markets of the 21st century