333 research outputs found

    A review of privacy-preserving human and human activity recognition

    Get PDF

    How Does a Deep Learning Model Architecture Impact Its Privacy? A Comprehensive Study of Privacy Attacks on CNNs and Transformers

    Full text link
    As a booming research area in the past decade, deep learning technologies have been driven by big data collected and processed on an unprecedented scale. However, privacy concerns arise due to the potential leakage of sensitive information from the training data. Recent research has revealed that deep learning models are vulnerable to various privacy attacks, including membership inference attacks, attribute inference attacks, and gradient inversion attacks. Notably, the efficacy of these attacks varies from model to model. In this paper, we answer a fundamental question: Does model architecture affect model privacy? By investigating representative model architectures from CNNs to Transformers, we demonstrate that Transformers generally exhibit higher vulnerability to privacy attacks than CNNs. Additionally, we identify the micro design of activation layers, stem layers, and LN layers as major factors contributing to the resilience of CNNs against privacy attacks, while the presence of attention modules is another main factor that exacerbates the privacy vulnerability of Transformers. Our discovery reveals valuable insights for defending deep learning models against privacy attacks and inspires the research community to develop privacy-friendly model architectures. Comment: Under review
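
    As a rough illustration of the membership inference attacks evaluated in work like this, the sketch below thresholds per-sample loss: training members tend to have lower loss than non-members, and the gap between true-positive and false-positive rates gives a crude per-architecture vulnerability score. The model and data loaders are assumed placeholders; this is not the paper's exact attack.

    # Minimal loss-threshold membership inference sketch (illustrative only,
    # not the paper's exact attack). `model`, `member_loader`, and
    # `nonmember_loader` are assumed placeholders.
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def per_sample_losses(model, loader, device="cpu"):
        model.eval()
        losses = []
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            losses.append(F.cross_entropy(model(x), y, reduction="none").cpu())
        return torch.cat(losses)

    def membership_advantage(model, member_loader, nonmember_loader, threshold):
        # Predict "member" when the per-sample loss falls below the threshold;
        # the TPR-FPR gap is a crude vulnerability score for the architecture.
        member_loss = per_sample_losses(model, member_loader)
        nonmember_loss = per_sample_losses(model, nonmember_loader)
        tpr = (member_loss < threshold).float().mean()
        fpr = (nonmember_loss < threshold).float().mean()
        return (tpr - fpr).item()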

    Privacy risk assessment of emerging machine learning paradigms

    Get PDF
    Machine learning (ML) has progressed tremendously, and data is the key factor driving this development. However, there are two main challenges in collecting data and handling it with ML models. First, the acquisition of high-quality labeled data can be difficult and expensive due to the need for extensive human annotation. Second, graphs have been leveraged to model complex relationships between entities, e.g., social networks or molecule structures; however, conventional ML models may not handle graph data effectively due to the non-linear and complex nature of the relationships between nodes. To address these challenges, recent developments in semi-supervised learning and self-supervised learning have been introduced to leverage unlabeled data for ML tasks. In addition, a new family of ML models known as graph neural networks (GNNs) has been proposed to tackle the challenges associated with graph data. Despite being powerful, the potential privacy risk stemming from these paradigms should also be taken into account. In this dissertation, we perform a privacy risk assessment of these emerging machine learning paradigms. Firstly, we investigate the membership privacy leakage stemming from semi-supervised learning. Concretely, we propose the first data augmentation-based membership inference attack tailored to the training paradigm of semi-supervised learning methods. Secondly, we quantify the privacy leakage of self-supervised learning through the lens of membership inference attacks and attribute inference attacks. Thirdly, we study the privacy implications of training GNNs on graphs. In particular, we propose the first attack to steal a graph from the outputs of a GNN model trained on that graph. Finally, we also explore potential defense mechanisms to mitigate these attacks.
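
    The core intuition behind the first attack can be sketched in a few lines, as a simplification rather than the dissertation's exact procedure: samples that were part of the training set tend to be classified correctly and consistently across random augmentations, so an augmentation-consistency score serves as a membership signal. `model` and `augment` are hypothetical placeholders.

    # Simplified augmentation-consistency membership signal (illustrative sketch,
    # not the dissertation's exact attack). `model` and `augment` are assumed
    # placeholders; `augment(x)` returns a randomly augmented copy of the input.
    import torch

    @torch.no_grad()
    def augmentation_score(model, x, y, augment, n_views=8):
        # x: a single input tensor, y: its integer label.
        model.eval()
        correct = 0
        for _ in range(n_views):
            logits = model(augment(x).unsqueeze(0))
            correct += int(logits.argmax(dim=1).item() == y)
        # Training members tend to stay correctly classified under augmentation;
        # a high score suggests membership.
        return correct / n_views

    # Example decision rule: predict "member" when the score exceeds, say, 0.9.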

    Fairness Properties of Face Recognition and Obfuscation Systems

    Full text link
    The proliferation of automated face recognition in the commercial and government sectors has caused significant privacy concerns for individuals. One approach to addressing these concerns is to employ evasion attacks against the metric embedding networks powering face recognition systems: face obfuscation systems generate imperceptibly perturbed images that cause face recognition systems to misidentify the user. Perturbed faces are generated on metric embedding networks, which are known to be unfair in the context of face recognition. A question of demographic fairness naturally follows: are there demographic disparities in the performance of face obfuscation systems? We answer this question with an analytical and empirical exploration of recent face obfuscation systems. Metric embedding networks are found to be demographically aware: face embeddings are clustered by demographic. We show how this clustering behavior leads to reduced face obfuscation utility for faces in minority groups. An intuitive analytical model yields insight into these phenomena.
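
    A minimal sketch of the kind of embedding-space evasion attack described here: gradient steps push a face's embedding away from its original embedding while keeping the perturbation small, and the resulting embedding distance is a crude per-face measure of obfuscation utility. `embed_net` is an assumed placeholder for a metric embedding network; this is not any particular obfuscation system.

    # Rough sketch of an embedding-space evasion (obfuscation) attack;
    # `embed_net` is an assumed metric embedding network, not a specific system.
    import torch

    def obfuscate(embed_net, face, eps=0.03, steps=40, lr=0.005):
        # face: a single image tensor; eps bounds the perturbation so it stays
        # (roughly) imperceptible.
        embed_net.eval()
        with torch.no_grad():
            original = embed_net(face.unsqueeze(0))    # embedding of the clean face
        delta = torch.zeros_like(face, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            emb = embed_net((face + delta).unsqueeze(0))
            loss = -torch.norm(emb - original, p=2)    # push the embedding away
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)
        perturbed = (face + delta).detach()
        with torch.no_grad():
            distance = torch.norm(embed_net(perturbed.unsqueeze(0)) - original).item()
        return perturbed, distance  # distance: crude per-face obfuscation utility

    Averaging the returned distance within each demographic group gives one simple way to compare obfuscation utility across groups.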

    The Interaction of Learning Speed and Memory Interference: When Fast is Bad

    Get PDF
    Research on individual differences in speed of learning has suggested that forgetting rates could differ for fast and slow learners. Studies have shown either no difference or slower forgetting over time for fast learners. The present study extends this area of research by investigating the possibility that fast and slow learning are differentially vulnerable to interference. Based on neural network models and the encoding variability hypothesis, two novel hypotheses were built and tested in two experiments using a paired-associates task. The hypotheses suggested that fast learning would be more prone to interference when similarity of the learning material is high. Hence, an interaction of learning speed and interference (i.e., similarity) was predicted. Experiment 1 (N = 22) compared retention of Chinese characters for fast and slow learning (both subject and item-specific speed) by manipulating similarity (high vs. low) of the characters learned. Results of Experiment 1 were inconclusive. Experiment 2 (N = 21) had the same basic design as Experiment 1 but included a number of procedural improvements. Interactions in the predicted direction were found both when comparing learning speed between subjects and for item-specific speed; however, only the interaction of between-subjects learning speed and similarity was significant. A joint analysis, including data from both experiments, yielded significant interactions for both subject speed and item-specific speed, indicating that the lack of a significant interaction of item-specific speed and similarity in Experiment 2 was probably due to the low sample size. The findings are discussed in relation to previous research on individual differences in learning speed and forgetting.

    Survey: Leakage and Privacy at Inference Time

    Get PDF
    Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance, as commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage, which is natural to ML models; potential malevolent leakage, which is caused by privacy attacks; and currently available defence mechanisms. We focus on inference-time leakage, as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy across involuntary and malevolent leakage and the available defences, followed by currently available assessment metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research.
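
    As one concrete instance of the malevolent, inference-time leakage such a survey covers, an attribute inference attack can be sketched as an auxiliary classifier that maps a target model's output posteriors to a sensitive attribute. All names below are hypothetical placeholders; the sketch is a generic illustration, not the survey's own formulation.

    # Bare-bones attribute inference sketch: learn to predict a sensitive
    # attribute from a target model's output posteriors. Inputs are assumed
    # placeholders; generic illustration only.
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    def attribute_inference(post_aux, attr_aux, post_victim, attr_victim):
        # post_*: (n_samples, n_classes) posteriors returned by the target model.
        # attr_*: (n_samples,) sensitive-attribute labels; the auxiliary labels
        #         train the attack, the victim labels only score it.
        attack = LogisticRegression(max_iter=1000)
        attack.fit(post_aux, attr_aux)
        preds = attack.predict(post_victim)
        return accuracy_score(attr_victim, preds)  # accuracy above chance indicates leakage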

    Generalizability of Predictive Performance Optimizer Predictions across Learning Task Type

    Get PDF
    The purpose of my study is to understand the relationship between learning and forgetting rates estimated by a cognitive model at the level of the individual and overall task performance across similar learning tasks. Cognitive computational models are formal representations of theories that enable better understanding and prediction of dynamic human behavior in complex environments (Adner, Polos, Ryall, & Sorenson, 2009). The Predictive Performance Optimizer (PPO) is a cognitive model and training aid based in learning theory that tracks quantitative performance data and makes predictions for future performance. It does so by estimating learning and decay rates for specific tasks and trainees. In this study, I used three learning tasks to assess individual performance and the model's potential to generalize parameters and retention interval predictions at the level of the individual and across similar-type tasks. The similar-type tasks were memory recall tasks, and the different-type task was a spatial learning task. I hypothesized that the raw performance scores, PPO optimized parameter estimates, and PPO predictions for each individual would be similar for the two learning tasks of the same type and different for the different-type learning task. Fifty-eight participants completed four training sessions, each consisting of the three tasks. I used the PPO to assess performance on task, knowledge acquisition, learning, forgetting, and retention over time. Additionally, I tested PPO generalizability by assessing fit when PPO optimized parameters for one task were applied to another. Results showed similarities in performance, PPO optimization trends, and predicted performance trends across similar task types, and differences for the different-type task. As hypothesized, the results for PPO parameter generalizability and overall performance predictions were less distinct. Outcomes of this study suggest potential differences in learning and retention based on task-type designation and potential generalizability of the PPO by accounting for these differences. This decreases the requirement for individual performance data on a specific task to determine training optimization scheduling.
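
    A toy power-law learning-with-decay model gives a feel for the kind of learning and decay rates the PPO estimates per trainee and task; the functional form and parameter names below are illustrative assumptions, not the PPO's actual equations.

    # Toy power-law learning-with-decay model (illustrative assumption,
    # NOT the PPO's actual equations).
    def predicted_performance(n_sessions, days_since_training, scale=1.0,
                              learning_rate=0.4, decay_rate=0.2):
        # Performance grows as a power of accumulated practice and decays as a
        # power of time elapsed since training; the three free parameters would
        # be fit per trainee and per task.
        return scale * (n_sessions ** learning_rate) / (days_since_training ** decay_rate)

    # Example: predicted retention after four training sessions and a one-week gap.
    print(predicted_performance(n_sessions=4, days_since_training=7.0))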

    Are Two Heads the Same as One? Identifying Disparate Treatment in Fair Neural Networks

    Full text link
    We show that deep neural networks that satisfy demographic parity do so through a form of race or gender awareness, and that the more we force a network to be fair, the more accurately we can recover race or gender from the internal state of the network. Based on this observation, we propose a simple two-stage solution for enforcing fairness. First, we train a two-headed network to predict the protected attribute (such as race or gender) alongside the original task, and second, we enforce demographic parity by taking a weighted sum of the heads. In the end, this approach creates a single-headed network with the same backbone architecture as the original network. Our approach has near-identical performance to existing regularization-based or preprocessing methods, but has greater stability and higher accuracy where near-exact demographic parity is required. To cement the relationship between these two approaches, we show that an unfair and optimally accurate classifier can be recovered by taking a weighted sum of a fair classifier and a classifier predicting the protected attribute. We use this to argue that both the fairness approaches and our explicit formulation demonstrate disparate treatment and that, consequently, they are likely to be unlawful in a wide range of scenarios under US law.
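
    A minimal sketch of the two-headed construction and the head-merging step described above; the module layout, the mixing weight alpha, and the assumption that both heads share one output dimension (e.g., a binary task with a binary protected attribute) are illustrative choices, not the paper's exact configuration.

    # Minimal two-headed network plus weighted-sum head merging (illustrative
    # sketch; layout and weights are assumptions, not the paper's configuration).
    import torch
    import torch.nn as nn

    class TwoHeaded(nn.Module):
        def __init__(self, backbone, feat_dim, n_out):
            super().__init__()
            self.backbone = backbone                     # shared feature extractor
            self.task_head = nn.Linear(feat_dim, n_out)  # original task logits
            self.attr_head = nn.Linear(feat_dim, n_out)  # protected-attribute logits

        def forward(self, x):
            z = self.backbone(x)
            return self.task_head(z), self.attr_head(z)

    def collapse_heads(model, alpha):
        # Fold the weighted sum of the two heads into one linear layer, giving a
        # single-headed network with the same backbone as the original.
        fused = nn.Linear(model.task_head.in_features, model.task_head.out_features)
        with torch.no_grad():
            fused.weight.copy_(model.task_head.weight - alpha * model.attr_head.weight)
            fused.bias.copy_(model.task_head.bias - alpha * model.attr_head.bias)
        return nn.Sequential(model.backbone, fused)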