
    Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers

    Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or because they were trained through superior training sets. Although ML algorithms are known and publicly released, training sets may not be reasonably ascertainable and, indeed, may be guarded as trade secrets. While much research has been performed about the privacy of the elements of training sets, in this paper we focus our attention on ML classifiers and on the statistical information that can be unconsciously or maliciously revealed from them. We show that it is possible to infer unexpected but useful information from ML classifiers. In particular, we build a novel meta-classifier and train it to hack other classifiers, obtaining meaningful information about their training sets. This kind of information leakage can be exploited, for example, by a vendor to build more effective classifiers or to simply acquire trade secrets from a competitor's apparatus, potentially violating its intellectual property rights.
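    To make the attack idea concrete, here is a minimal property-inference sketch in Python (an illustration of the general approach only, not the authors' construction; the synthetic data, the logistic-regression shadow models, and the "property" being inferred are assumptions made for the example): several shadow classifiers are trained on data that does or does not exhibit a property, and a meta-classifier learns to read that property off the shadow models' parameters.

    # Illustrative property-inference sketch (not the paper's exact construction).
    # Shadow logistic-regression models are trained on synthetic data with or
    # without a distributional "property"; a meta-classifier then predicts that
    # property from each shadow model's weight vector.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def train_shadow(has_property):
        # Property = positive class is shifted further from the origin.
        shift = 3.0 if has_property else 1.0
        X0 = rng.normal(0.0, 1.0, size=(200, 5))
        X1 = rng.normal(shift, 1.0, size=(200, 5))
        X = np.vstack([X0, X1])
        y = np.array([0] * 200 + [1] * 200)
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        return np.concatenate([clf.coef_.ravel(), clf.intercept_])  # model "fingerprint"

    # Build a meta-training set from many shadow classifiers.
    props = rng.integers(0, 2, size=60)
    meta_X = np.array([train_shadow(bool(p)) for p in props])
    meta_clf = LogisticRegression(max_iter=1000).fit(meta_X, props)

    # "Attack" a fresh target classifier whose training-set property is unknown to us.
    target = train_shadow(has_property=True)
    print("inferred property:", bool(meta_clf.predict(target.reshape(1, -1))[0]))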

    PlaceAvoider: Steering First-Person Cameras away from Sensitive Spaces

    Cameras are now commonplace in our social and computing landscapes and embedded into consumer devices like smartphones and tablets. A new generation of wearable devices (such as Google Glass) will soon make ‘first-person’ cameras nearly ubiquitous, capturing vast amounts of imagery without deliberate human action. ‘Lifelogging’ devices and applications will record and share images from people’s daily lives with their social networks. These devices that automatically capture images in the background raise serious privacy concerns, since they are likely to capture deeply private information. Users of these devices need ways to identify and prevent the sharing of sensitive images. As a first step, we introduce PlaceAvoider, a technique for owners of first-person cameras to ‘blacklist’ sensitive spaces (like bathrooms and bedrooms). PlaceAvoider recognizes images captured in these spaces and flags them for review before the images are made available to applications. PlaceAvoider performs novel image analysis using both fine-grained image features (like specific objects) and coarse-grained, scene-level features (like colors and textures) to classify where a photo was taken. PlaceAvoider combines these features in a probabilistic framework that jointly labels streams of images in order to improve accuracy. We test the technique on five realistic first-person image datasets and show it is robust to blurriness, motion, and occlusion.
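    The joint labeling step can be illustrated with a small sketch (a toy example, not PlaceAvoider's actual probabilistic model; the per-image scores and transition probabilities are invented): per-image P(sensitive) scores from any classifier are smoothed over the photo stream with a two-state Viterbi pass, so an isolated misclassification is corrected by its temporal context.

    # Toy joint labeling of an image stream: per-image P(sensitive) scores are
    # smoothed with a 2-state Viterbi pass that assumes consecutive photos tend
    # to come from the same room. Scores and transition matrix are made up.
    import numpy as np

    def viterbi_smooth(p_sensitive, stay=0.9):
        # States: 0 = safe, 1 = sensitive. Emission prob. taken from the classifier.
        trans = np.log(np.array([[stay, 1 - stay], [1 - stay, stay]]))
        emit = np.log(np.stack([1 - p_sensitive, p_sensitive], axis=1))
        n = len(p_sensitive)
        score = emit[0] + np.log([0.5, 0.5])
        back = np.zeros((n, 2), dtype=int)
        for t in range(1, n):
            cand = score[:, None] + trans        # cand[i, j] = best score ending in j via i
            back[t] = cand.argmax(axis=0)
            score = cand.max(axis=0) + emit[t]
        path = [int(score.argmax())]
        for t in range(n - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return path[::-1]

    # One noisy spike (0.7) inside a run of clearly safe images gets smoothed away.
    scores = np.array([0.1, 0.2, 0.7, 0.1, 0.9, 0.95, 0.85])
    print(viterbi_smooth(scores))   # [0, 0, 0, 0, 1, 1, 1]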

    Privacy-preserving comparison of variable-length data with application to biometric template protection

    The establishment of cloud computing and big data in a wide variety of daily applications has raised some privacy concerns due to the sensitive nature of some of the processed data. This has promoted the need to develop data protection techniques, where the storage and all operations are carried out without disclosing any information. Following this trend, this paper presents a new approach to efficiently compare variable-length data in the encrypted domain using homomorphic encryption, where only encrypted data is stored or exchanged. The new variable-length-based algorithm is fused with existing fixed-length techniques in order to obtain increased comparison accuracy. To assess the soundness of the proposed approach, we evaluate its performance on a particular application: a multi-algorithm biometric template protection system based on dynamic signatures that complies with the requirements described in the ISO/IEC 24745 standard on biometric information protection. Experiments have been carried out on a publicly available database and a free implementation of the Paillier cryptosystem to ensure reproducibility and comparability to other schemes. This work was supported in part by the German Federal Ministry of Education and Research (BMBF); in part by the Hessen State Ministry for Higher Education, Research, and the Arts (HMWK) within the Center for Research in Security and Privacy (CRISP); in part by the Spanish Ministerio de Economia y Competitividad / Fondo Europeo de Desarrollo Regional through the CogniMetrics Project under Grant TEC2015-70627-R; and in part by Cecaban
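    As background for the encrypted-domain comparison, the following toy Python sketch shows the additive homomorphism of the Paillier cryptosystem that such schemes rely on (a textbook implementation with deliberately tiny, insecure parameters; it is not the paper's protocol nor the implementation the authors used): the product of two ciphertexts decrypts to the sum of the plaintexts, and a ciphertext raised to a constant decrypts to a scaled plaintext.

    # Minimal textbook Paillier sketch (illustrative only, tiny primes, not secure).
    import random
    from math import gcd

    def lcm(a, b):
        return a * b // gcd(a, b)

    def keygen(p=293, q=433):           # toy primes; real keys use >= 2048-bit n
        n = p * q
        lam = lcm(p - 1, q - 1)
        g = n + 1                       # standard simplification
        mu = pow(lam, -1, n)            # since L(g^lam mod n^2) = lam mod n
        return (n, g), (lam, mu)

    def encrypt(pub, m):
        n, g = pub
        r = random.randrange(1, n)
        while gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

    def decrypt(pub, priv, c):
        n, _ = pub
        lam, mu = priv
        x = pow(c, lam, n * n)
        return ((x - 1) // n) * mu % n

    pub, priv = keygen()
    c1, c2 = encrypt(pub, 17), encrypt(pub, 25)
    # Additive homomorphism: E(a) * E(b) mod n^2 decrypts to a + b.
    assert decrypt(pub, priv, (c1 * c2) % (pub[0] ** 2)) == 42
    # Scalar multiplication: E(a)^k decrypts to k * a.
    assert decrypt(pub, priv, pow(c1, 3, pub[0] ** 2)) == 51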

    Abstract Hidden Markov Models: a monadic account of quantitative information flow

    Hidden Markov Models, HMM's, are mathematical models of Markov processes with state that is hidden, but from which information can leak. They are typically represented as 3-way joint-probability distributions. We use HMM's as denotations of probabilistic hidden-state sequential programs: for that, we recast them as `abstract' HMM's, computations in the Giry monad $\mathbb{D}$, and we equip them with a partial order of increasing security. However, to encode the monadic type with hiding over some state $\mathcal{X}$ we use $\mathbb{D}\mathcal{X}\to\mathbb{D}^2\mathcal{X}$ rather than the conventional $\mathcal{X}\to\mathbb{D}\mathcal{X}$ that suffices for Markov models whose state is not hidden. We illustrate the $\mathbb{D}\mathcal{X}\to\mathbb{D}^2\mathcal{X}$ construction with a small Haskell prototype. We then present uncertainty measures as a generalisation of the extant diversity of probabilistic entropies, with characteristic analytic properties for them, and show how the new entropies interact with the order of increasing security. Furthermore, we give a `backwards' uncertainty-transformer semantics for HMM's that is dual to the `forwards' abstract HMM's - it is an analogue of the duality between forwards, relational semantics and backwards, predicate-transformer semantics for imperative programs with demonic choice. Finally, we argue that, from this new denotational-semantic viewpoint, one can see that the Dalenius desideratum for statistical databases is actually an issue in compositionality. We propose a means for taking it into account.
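    The $\mathbb{D}\mathcal{X}\to\mathbb{D}^2\mathcal{X}$ step can also be illustrated outside Haskell; the toy Python sketch below (an independent illustration, not the paper's prototype) pushes a prior distribution over a hidden state through an observation channel and returns the resulting hyper-distribution, i.e. a distribution over posterior distributions, then compares expected Shannon entropy before and after the observation.

    # Toy "abstract channel": a prior over hidden states plus an observation
    # channel yields a hyper-distribution (outer distribution over posterior
    # inner distributions), i.e. a map D(X) -> D(D(X)).
    from collections import defaultdict
    import math

    def push(prior, channel):
        """prior: {x: p}; channel: {x: {y: p(y|x)}} -> [(p(y), posterior_given_y)]."""
        joint = defaultdict(dict)
        for x, px in prior.items():
            for y, pyx in channel[x].items():
                joint[y][x] = joint[y].get(x, 0.0) + px * pyx
        hyper = []
        for y, col in joint.items():
            py = sum(col.values())
            hyper.append((py, {x: p / py for x, p in col.items()}))
        return hyper

    def shannon(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    prior = {"x0": 0.5, "x1": 0.5}
    channel = {"x0": {"a": 0.8, "b": 0.2},      # the observation leaks a little about x
               "x1": {"a": 0.3, "b": 0.7}}
    hyper = push(prior, channel)
    posterior_entropy = sum(py * shannon(post) for py, post in hyper)
    print(shannon(prior), "->", posterior_entropy)   # expected entropy only decreases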

    Radar and RGB-depth sensors for fall detection: a review

    This paper reviews recent works in the literature on the use of systems based on radar and RGB-Depth (RGB-D) sensors for fall detection, and discusses outstanding research challenges and trends related to this research field. Systems to reliably detect fall events and promptly alert carers and first responders have gained significant interest in the past few years in order to address the societal issue of an increasing number of elderly people living alone, with the associated risk of them falling and the consequences in terms of health treatments, reduced well-being, and costs. The interest in radar and RGB-D sensors is related to their capability to enable contactless and non-intrusive monitoring, which is an advantage for practical deployment and users’ acceptance and compliance, compared with other sensor technologies, such as video-cameras, or wearables. Furthermore, the possibility of combining and fusing information from these heterogeneous types of sensors is expected to improve the overall performance of practical fall detection systems. Researchers from different fields can benefit from multidisciplinary knowledge and awareness of the latest developments in radar and RGB-D sensors that this paper discusses.

    Location Privacy in Usage-Based Automotive Insurance: Attacks and Countermeasures

    Usage-based insurance (UBI) is regarded as a promising way to provide accurate automotive insurance rates by analyzing the driving behaviors (e.g., speed, mileage, and harsh braking/accelerating) of drivers. The best practice that has been adopted by many insurance programs to protect users' location privacy is the use of driving speed rather than GPS data. However, in this paper, we challenge this approach by presenting a novel speed-based location trajectory inference framework. The basic strategy of the proposed inference framework is motivated by the following observations. In practice, many environmental factors, such as real-time traffic and traffic regulations, can influence the driving speed. These factors provide side-channel information about the driving route, which can be exploited to infer the vehicle's trace. We implement our discovered attack on a public data set in New Jersey. The experimental results show that the attacker has a nearly 60% probability of obtaining the real route if he chooses the top 10 candidate routes. To thwart the proposed attack, we design a privacy-preserving scoring and data audition framework that enhances drivers' control on location privacy without affecting the utility of UBI. Our defense framework can also detect users' dishonest behavior (e.g., modification of speed data) via a probabilistic audition scheme. Extensive experimental results validate the effectiveness of the defense framework.
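    A toy sketch of why speed traces leak routes (the candidate routes, thresholds, and scoring rule are invented for the example; the paper's inference framework is considerably more sophisticated): the trip length recovered by integrating speed and the number of full stops in the trace are matched against each candidate route's length and stop count.

    # Toy side-channel inference from a speed trace: integrate speed to get trip
    # length, count full stops, and rank invented candidate routes by how well
    # their length and stop count match the trace.
    import numpy as np

    def trace_features(speed_mps, dt=1.0):
        distance = float(np.sum(speed_mps) * dt)             # metres driven
        stopped = speed_mps < 0.5
        stops = int(np.sum(stopped[1:] & ~stopped[:-1]))      # moving -> stopped events
        return distance, stops

    # Hypothetical candidate routes between the same two endpoints.
    routes = {"route_A": {"length_m": 2400, "stops": 2},
              "route_B": {"length_m": 3100, "stops": 5},
              "route_C": {"length_m": 2500, "stops": 4}}

    def rank(speed_mps):
        d, s = trace_features(speed_mps)
        score = {name: abs(d - r["length_m"]) / 1000 + abs(s - r["stops"])
                 for name, r in routes.items()}
        return sorted(score, key=score.get)

    # Synthetic trace: ~2450 m driven with 4 stops -> route_C ranks first.
    speed = np.concatenate([np.full(60, 10.0), np.zeros(5)] * 4 + [np.full(5, 10.0)])
    print(rank(speed))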

    ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale

    Incremental learning is one paradigm to enable model building and updating at scale with streaming data. For end-to-end automatic speech recognition (ASR) tasks, the absence of human-annotated labels along with the need for privacy-preserving policies for model building makes it a daunting challenge. Motivated by these challenges, in this paper we use a cloud-based framework for production systems to demonstrate insights from privacy-preserving incremental learning for automatic speech recognition (ILASR). By privacy-preserving, we mean the use of ephemeral data that are not human annotated. This system is a step forward for production-level ASR models for incremental/continual learning that offers a near real-time test-bed for experimentation in the cloud for end-to-end ASR, while adhering to privacy-preserving policies. We show that the proposed system can improve the production models significantly (3%) over a new time period of six months even in the absence of human-annotated labels, with varying levels of weak supervision and large batch sizes in incremental learning. This improvement is 20% over test sets with new words and phrases in the new time period. We demonstrate the effectiveness of model building in a privacy-preserving incremental fashion for ASR while further exploring the utility of having an effective teacher model and use of large batch sizes.
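    The teacher/pseudo-label loop that such incremental learning relies on can be sketched generically (this is not the authors' production pipeline; the toy tabular data and scikit-learn models are stand-ins chosen for the example): a frozen teacher labels each incoming unlabeled batch, and a student model is updated incrementally on those pseudo-labels.

    # Generic pseudo-label incremental-learning loop: a frozen "teacher" labels
    # each unlabeled streaming batch and the "student" is updated with partial_fit.
    # Toy tabular data stands in for acoustic features; nothing here is ASR-specific.
    import numpy as np
    from sklearn.linear_model import LogisticRegression, SGDClassifier

    rng = np.random.default_rng(0)

    def make_batch(n=256):
        y = rng.integers(0, 2, size=n)
        X = rng.normal(0, 1, size=(n, 10)) + y[:, None] * 1.5
        return X, y

    # Teacher: trained once on a small human-labelled seed set, then frozen.
    X_seed, y_seed = make_batch(512)
    teacher = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)

    student = SGDClassifier(random_state=0)
    classes = np.array([0, 1])

    for step in range(20):                      # simulated stream of ephemeral batches
        X_batch, y_true = make_batch()
        pseudo = teacher.predict(X_batch)       # weak supervision, no human labels
        student.partial_fit(X_batch, pseudo, classes=classes)
        if step % 5 == 0:
            print(f"step {step:2d}  student acc vs. hidden truth: "
                  f"{(student.predict(X_batch) == y_true).mean():.2f}")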