
    Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers

    Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed about the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights.

    Preserving Social Media: the Problem of Access

    As the applications and services made possible through Web 2.0 continue to proliferate and influence the way individuals exchange information, the landscape of social science research, as well as research in the humanities and the arts, has the potential to change dramatically and to be enriched by a wealth of new, user-generated data. In response to this phenomenon, the UK Data Service have commissioned the Digital Preservation Coalition to undertake a 12-month study into the preservation of social media as part of the ‘Big Data Network’ programme funded by the Economic and Social Research Council (ESRC). The larger study focuses on the potential uses and accompanying challenges of data generated by social networking applications. This paper, ‘Preserving Social Media: the Problem of Access’, comprises an excerpt of that longer study, allowing the authors a space to explore in closer detail the issue of making social media archives accessible to researchers and students now and in the future. To do this, the paper addresses use cases that demonstrate the potential value of social media to academic social science. Furthermore, it examines how researchers and collecting institutions acquire and preserve social media data within a context of curatorial and legislative restrictions that may prove an even greater obstacle to access than any technical restrictions. Based on analysis of these obstacles, it will first examine existing methods of curating and preserving social media archives and, second, make some recommendations for how collecting institutions might approach the long-term preservation of social media in a way that protects the individuals represented in the data and complies with the conditions of third party platforms.
    With the understanding that web-based communication technologies will continue to evolve, this paper will focus on the overarching properties of social media, analysing and comparing current methods of curation and preservation that provide sustainable solutions.

    Exploring responsible applications of Synthetic Data to advance Online Safety Research and Development

    The use of synthetic data provides an opportunity to accelerate online safety research and development efforts while showing potential for bias mitigation, facilitating data storage and sharing, preserving privacy and reducing exposure to harmful content. However, the responsible use of synthetic data requires caution regarding anticipated risks and challenges. This short report explores the potential applications of synthetic data to the domain of online safety, and addresses the ethical challenges that effective use of the technology may present.