Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers
Machine Learning (ML) algorithms are used to train computers to perform a
variety of complex tasks and improve with experience. Computers learn how to
recognize patterns, make unintended decisions, or react to a dynamic
environment. Certain trained machines may be more effective than others because
they are based on more suitable ML algorithms or because they were trained
through superior training sets. Although ML algorithms are known and publicly
released, training sets may not be reasonably ascertainable and, indeed, may be
guarded as trade secrets. While much research has been performed about the
privacy of the elements of training sets, in this paper we focus our attention
on ML classifiers and on the statistical information that can be unconsciously
or maliciously revealed from them. We show that it is possible to infer
unexpected but useful information from ML classifiers. In particular, we build
a novel meta-classifier and train it to hack other classifiers, obtaining
meaningful information about their training sets. This kind of information
leakage can be exploited, for example, by a vendor to build more effective
classifiers or to simply acquire trade secrets from a competitor's apparatus,
potentially violating its intellectual property rights.
Preserving Social Media: the Problem of Access
As the applications and services made possible through Web 2.0 continue to proliferate and influence the way individuals exchange information, the landscape of social science research, as well as research in the humanities and the arts, has the potential to change dramatically and to be enriched by a wealth of new, user-generated data. In response to this phenomenon, the UK Data Service have commissioned the Digital Preservation Coalition to undertake a 12-month study into the preservation of social media as part of the ‘Big Data Network’ programme funded by the Economic and Social Research Council (ESRC). The larger study focuses on the potential uses and accompanying challenges of data generated by social networking applications.
This paper, ‘Preserving Social Media: the Problem of Access’, comprises an excerpt of that longer study, allowing the authors a space to explore in closer detail the issue of making social media archives accessible to researchers and students now and in the future. To do this, the paper addresses use cases that demonstrate the potential value of social media to academic social science. Furthermore, it examines how researchers and collecting institutions acquire and preserve social media data within a context of curatorial and legislative restrictions that may prove an even greater obstacle to access than any technical restrictions. Based on analysis of these obstacles, it will first examine existing methods of curating and preserving social media archives and, second, make some recommendations for how collecting institutions might approach the long-term preservation of social media in a way that protects the individuals represented in the data and complies with the conditions of third party platforms. With the understanding that web-based communication technologies will continue to evolve, this paper will focus on the overarching properties of social media, analysing and comparing current methods of curation and preservation that provide sustainable solutions.
Exploring responsible applications of Synthetic Data to advance Online Safety Research and Development
The use of synthetic data provides an opportunity to accelerate online safety
research and development efforts while showing potential for bias mitigation,
facilitating data storage and sharing, preserving privacy and reducing exposure
to harmful content. However, the responsible use of synthetic data requires
caution regarding anticipated risks and challenges. This short report explores
the potential applications of synthetic data to the domain of online safety,
and addresses the ethical challenges that effective use of the technology may
present.