    In-the-wild residual data research and privacy

    Get PDF
    As the world becomes increasingly dependent on technology, researchers endeavor to understand how technology is used, the impact it has on everyday life, and the life-cycle and span of digital information. In doing so, researchers are increasingly gathering 'real-world' or 'in-the-wild' residual data, obtained from a variety of sources without the explicit consent of the original owners. This data gathering raises significant concerns regarding privacy, ethics and legislation, as well as practical considerations concerning investigator training, data storage, overall security and disposal. This paper surveys recent studies of residual data gathered in the wild and analyses the challenges that were faced. Taking these insights, the paper presents a compendium of practices for addressing the issues that arise in in-the-wild residual data research. The practices presented in this paper can be used to critique current projects and assess the feasibility of proposed future research.

    In The Wild Residual Data Research and Privacy

    Get PDF
    As the world becomes increasingly dependent on technology, researchers in both industry and academia endeavor to understand how technology is used, the impact it has on everyday life, the artifact life-cycle and the overall integration of digital information. In doing so, researchers are increasingly gathering ‘real-world’ or ‘in-the-wild’ residual data, obtained from a variety of sources, without the explicit consent of the original owners. This data gathering raises significant concerns regarding privacy, ethics and legislation, as well as practical considerations concerning investigator training, data storage, overall security and data disposal. This research surveys recent studies of residual data gathered in-the-wild and analyzes the challenges that were confronted. Amalgamating these insights, the research presents a compendium of practices for addressing the issues that can arise in-the-wild when conducting residual data research. The practices identified in this research can be used to critique current projects and assess the feasibility of proposed future research.

    CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks

    Full text link
    The unprecedented increase in the usage of computer vision technology in society goes hand in hand with an increased concern in data privacy. In many real-world scenarios like people tracking or action recognition, it is important to be able to process the data while taking careful consideration in protecting people's identity. We propose and develop CIAGAN, a model for image and video anonymization based on conditional generative adversarial networks. Our model is able to remove the identifying characteristics of faces and bodies while producing high-quality images and videos that can be used for any computer vision task, such as detection or tracking. Unlike previous methods, we have full control over the de-identification (anonymization) procedure, ensuring both anonymization as well as diversity. We compare our method to several baselines and achieve state-of-the-art results. Comment: CVPR 2020
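    For readers unfamiliar with conditional GANs, the sketch below illustrates the general idea the abstract describes: a generator that receives both the image to be anonymized and a chosen identity condition, so the anonymization is controlled rather than random. It is a minimal illustration only; the class name, layer sizes, and masking scheme are assumptions, not the CIAGAN architecture.

```python
import torch
import torch.nn as nn

class IdentityConditionedGenerator(nn.Module):
    """Toy conditional generator: output is steered toward a chosen surrogate identity."""
    def __init__(self, num_identities: int, img_channels: int = 3, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_identities, embed_dim)  # identity condition
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + embed_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, img_channels, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, masked_img: torch.Tensor, target_id: torch.Tensor) -> torch.Tensor:
        b, _, h, w = masked_img.shape
        # Broadcast the identity embedding across all spatial locations,
        # then concatenate it with the (masked) input image as extra channels.
        cond = self.embed(target_id).view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.net(torch.cat([masked_img, cond], dim=1))
```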

    Machine Learning Models that Remember Too Much

    Full text link
    Machine learning (ML) is becoming a commodity. Numerous ML frameworks and services are available to data holders who are not ML experts but want to train predictive models on their data. It is important that ML models trained on sensitive inputs (e.g., personal images or documents) not leak too much information about the training data. We consider a malicious ML provider who supplies model-training code to the data holder, does not observe the training, but then obtains white- or black-box access to the resulting model. In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model, yet the model is as accurate and predictive as a conventionally trained model. We then explain how the adversary can extract memorized information from the model. We evaluate our techniques on standard ML tasks for image classification (CIFAR10), face recognition (LFW and FaceScrub), and text analysis (20 Newsgroups and IMDB). In all cases, we show how our algorithms create models that have high predictive power yet allow accurate extraction of subsets of their training data.
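    The toy loss term below sketches how malicious training code could "memorize" data in the spirit of this abstract: a penalty encourages a slice of the model's weights to correlate with secret bits derived from the training set, while the ordinary task loss keeps accuracy intact. The function names, weighting constant, and encoding are illustrative assumptions, not the paper's algorithms.

```python
import torch

def malicious_loss(task_loss: torch.Tensor,
                   params: torch.Tensor,
                   secret: torch.Tensor,
                   lam: float = 1.0) -> torch.Tensor:
    """task_loss: ordinary training loss (scalar);
    params: a flattened slice of the model's weights;
    secret: same-length tensor encoding data the attacker wants memorized."""
    p = params - params.mean()
    s = secret - secret.mean()
    corr = (p * s).sum() / (p.norm() * s.norm() + 1e-8)
    # Maximizing this correlation lets a provider with white-box access later
    # read an approximation of `secret` back out of the trained weights,
    # while `task_loss` keeps the model's normal accuracy largely intact.
    return task_loss + lam * (1.0 - corr)
```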

    Privacy Preserving Internet Browsers: Forensic Analysis of Browzar

    Full text link
    With the advance of technology, Criminal Justice agencies are being confronted with an increased need to investigate crimes perpetrated partially or entirely over the Internet. These types of crime are known as cybercrimes. In order to conceal illegal online activity, criminals often use private browsing features or browsers designed to provide total browsing privacy. The use of private browsing is a common challenge faced in, for example, child exploitation investigations, which usually originate on the Internet. Although private browsing features are not designed specifically for criminal activity, they have become a valuable tool for criminals looking to conceal their online activity. As such, Technological Crime units often focus their forensic analysis on thoroughly examining the web history on a computer. Private browsing features and browsers often require a more in-depth, post-mortem analysis. This often requires the use of multiple tools, as well as different forensic approaches, to uncover incriminating evidence. This evidence may be required in a court of law, where analysts are often challenged both on their findings and on the tools and approaches used to recover evidence. However, there is little research evaluating how well private browsing actually preserves privacy, or on the forensic acquisition and analysis of privacy-preserving internet browsers. Therefore, in this chapter, we first review the private modes of popular internet browsers. Next, we describe the forensic acquisition and analysis of Browzar, a privacy-preserving internet browser, and compare it with other popular internet browsers.
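    As a rough illustration of the kind of post-mortem step discussed above, the snippet below scans a raw disk or memory image for URL-like byte strings that a "private" browser may nonetheless have left behind. Real forensic acquisition and analysis is far more involved and tool-assisted; the pattern and function here are illustrative assumptions only.

```python
import re

# Runs of printable ASCII starting with http(s):// are treated as candidate URLs.
URL_PATTERN = re.compile(rb"https?://[\x21-\x7e]{4,200}")

def carve_urls(image_path: str) -> list[str]:
    """Return unique URL-like strings found anywhere in a raw image file."""
    with open(image_path, "rb") as f:
        data = f.read()
    return sorted({m.group().decode("ascii", "replace")
                   for m in URL_PATTERN.finditer(data)})
```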

    Emotions in context: examining pervasive affective sensing systems, applications, and analyses

    Get PDF
    Pervasive sensing has opened up new opportunities for measuring our feelings and understanding our behavior by monitoring our affective states while mobile. This review paper surveys pervasive affect sensing by examining and considering three major elements of affective pervasive systems, namely "sensing", "analysis", and "application". Sensing investigates the different sensing modalities that are used in existing real-time affective applications; Analysis explores different approaches to emotion recognition and visualization based on different types of collected data; and Application investigates different leading areas of affective applications. For each of the three aspects, the paper includes an extensive survey of the literature and finally outlines some of the challenges and future research opportunities of affective sensing in the context of pervasive computing.

    Co-training for Demographic Classification Using Deep Learning from Label Proportions

    Full text link
    Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success is typically dependent on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light, or distant, supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportions (LLP) setting, in which the training data consist of bags of unlabeled instances with associated label distributions for each bag. We introduce a new regularization layer, Batch Averager, that can be appended to the last layer of any deep neural network to convert it from supervised learning to LLP. This layer can be implemented readily with existing deep learning packages. To further support domains in which the data consist of two conditionally independent feature views (e.g. image and text), we propose a co-training algorithm that iteratively generates pseudo bags and refits the deep LLP model to improve classification accuracy. We demonstrate our models on demographic attribute classification (gender and race/ethnicity), which has many applications in social media analysis, public health, and marketing. We conduct experiments to predict demographics of Twitter users based on their tweets and profile image, without requiring any user-level annotations for training. We find that the deep LLP approach outperforms baselines for both text and image features separately. Additionally, we find that the co-training algorithm improves image and text classification by 4% and 8% absolute F1, respectively. Finally, an ensemble of text and image classifiers further improves the absolute F1 measure by 4% on average.
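    To make the LLP setting concrete, the sketch below shows a bag-level loss in the spirit of the "Batch Averager" idea: per-instance predictions within a bag are averaged, and the averaged distribution is matched to the bag's known label proportions. This is an illustrative reading of the abstract, with assumed names and a KL-divergence objective, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def bag_proportion_loss(instance_logits: torch.Tensor,
                        bag_proportions: torch.Tensor) -> torch.Tensor:
    """instance_logits: (bag_size, num_classes) network outputs for one bag;
    bag_proportions: (num_classes,) known label distribution of that bag."""
    instance_probs = F.softmax(instance_logits, dim=-1)
    bag_estimate = instance_probs.mean(dim=0)  # the "batch average" over the bag
    # Match the averaged prediction to the bag's known label proportions;
    # no instance-level labels are needed anywhere.
    return F.kl_div((bag_estimate + 1e-8).log(), bag_proportions, reduction="sum")
```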

    Privacy Risks of Securing Machine Learning Models against Adversarial Examples

    Full text link
    The arms race between attacks and defenses for machine learning models has come to a forefront in recent years, in both the security community and the privacy community. However, one big limitation of previous research is that the security domain and the privacy domain have typically been considered separately. It is thus unclear whether the defense methods in one domain will have any unexpected impact on the other domain. In this paper, we take a step towards resolving this limitation by combining the two domains. In particular, we measure the success of membership inference attacks against six state-of-the-art defense methods that mitigate the risk of adversarial examples (i.e., evasion attacks). Membership inference attacks determine whether or not an individual data record has been part of a model's training set. The accuracy of such attacks reflects the information leakage of training algorithms about individual members of the training set. Adversarial defense methods against adversarial examples influence the model's decision boundaries such that model predictions remain unchanged for a small area around each input. However, this objective is optimized on training data. Thus, individual data records in the training set have a significant influence on robust models. This makes the models more vulnerable to inference attacks. To perform the membership inference attacks, we leverage the existing inference methods that exploit model predictions. We also propose two new inference methods that exploit structural properties of robust models on adversarially perturbed data. Our experimental evaluation demonstrates that compared with the natural training (undefended) approach, adversarial defense methods can indeed increase the target model's risk against membership inference attacks. Comment: ACM CCS 2019, code is available at https://github.com/inspire-group/privacy-vs-robustnes
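    For context, the simplest prediction-based membership inference test alluded to in the abstract can be sketched as follows: a record is guessed to be a training-set member when the model's confidence in its true label exceeds a threshold. The function name and threshold are illustrative assumptions; the paper's structural attacks on robust models are more involved.

```python
import numpy as np

def confidence_membership_guess(probs: np.ndarray,
                                true_labels: np.ndarray,
                                threshold: float = 0.9) -> np.ndarray:
    """probs: (n, num_classes) predicted probabilities from the target model;
    true_labels: (n,) integer class labels; returns boolean membership guesses."""
    confidence_in_true_class = probs[np.arange(len(true_labels)), true_labels]
    # Records the model is unusually confident about are guessed to be
    # training members; sweeping the threshold trades precision for recall.
    return confidence_in_true_class >= threshold
```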