
    When Others Impinge upon Your Privacy: Interdependent Risks and Protection in a Connected World

    Privacy is defined as the right to control, edit, manage, and delete information about oneself and to decide when, how, and to what extent this information is communicated to others. Ideally, then, every person should be empowered to manage and protect his own data, individually and independently of others. This assumption, however, barely holds in practice, because people are by nature biologically and socially interconnected. An individual's identity is essentially determined at the biological and social levels. First, a person is biologically determined by his DNA, the genes that encode his physical characteristics. Second, human beings are social animals, with a strong need to create ties and interact with their peers. Interdependence is present at both levels: at the biological level it stems from genetic inheritance, and at the social level it emerges from social ties. In this thesis, we investigate whether, in today's highly connected world, individual privacy is in fact achievable, or whether it is almost impossible due to the inherent interdependence between people. First, we study interdependent privacy risks at the social level, focusing on online social networks (OSNs), the digital counterpart of our social lives. We show that, even if an OSN user carefully tunes his privacy settings so as not to appear in any search directory, an adversary can still find him by using publicly visible attributes of other OSN users. We demonstrate that, in OSNs where privacy settings are not aligned between users and where some users reveal even a limited set of attributes, it is almost impossible for a specific user to hide in the crowd. Our navigation attack complements existing work on inference attacks in OSNs by showing how targeted profiles can be found efficiently, which is a necessary precondition for any targeted attack. Our attack also demonstrates the threat to OSN-membership privacy. Second, we investigate upcoming interdependent privacy risks at the biological level. More precisely, due to the recent drop in the cost of genome sequencing, an increasing number of people are having their genomes sequenced and are sharing them online and/or with third parties for various purposes. However, familial genetic dependencies induce indirect genomic privacy risks for the relatives of the individuals who share their genomes. We propose a probabilistic framework that relies upon graphical models and Bayesian inference in order to formally quantify genomic privacy risks. Then, we study the interplay between rational family members with potentially conflicting interests regarding the storage security and disclosure of their genomic data. We consider both purely selfish and altruistic behaviors, and we make use of multi-agent influence diagrams to efficiently derive equilibria in the general case where more than two relatives interact with each other. We also propose an obfuscation mechanism that reconciles utility with privacy in genomics, in the setting where all family members are cooperative and care about each other's privacy. Third, we study privacy-enhancing systems, such as anonymity networks, in which users do not damage other users' privacy but are in fact needed to protect it. In this context, we show how incentives based on virtual currency can be used, and their amount optimized, to foster cooperation between users and eventually improve everyone's privacy. [...]
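    To give a flavour of the kind of reasoning such a graphical-model framework enables, the sketch below (an illustrative toy, not the thesis's actual framework) computes the posterior distribution of a parent's genotype at a single biallelic SNP after observing a child's genotype, combining a Hardy-Weinberg prior with Mendelian transmission probabilities; the minor-allele frequency is an assumed example value.

```python
# Toy illustration (not the thesis's framework): infer a parent's genotype at
# one biallelic SNP from a child's observed genotype, using a Hardy-Weinberg
# prior and Mendelian transmission. Genotypes are minor-allele counts (0/1/2).
from itertools import product

MAF = 0.3  # assumed minor-allele frequency (hypothetical value)

def hw_prior(g):
    """Hardy-Weinberg prior over genotypes 0, 1, 2."""
    p, q = MAF, 1 - MAF
    return {0: q * q, 1: 2 * p * q, 2: p * p}[g]

def transmit(g):
    """Probability that a parent with genotype g passes on the minor allele."""
    return {0: 0.0, 1: 0.5, 2: 1.0}[g]

def child_given_parents(c, f, m):
    """P(child = c | father = f, mother = m) under Mendelian inheritance."""
    pf, pm = transmit(f), transmit(m)
    return {0: (1 - pf) * (1 - pm),
            1: pf * (1 - pm) + (1 - pf) * pm,
            2: pf * pm}[c]

def posterior_father(child_g):
    """P(father = f | child = c), marginalising over the unobserved mother."""
    joint = {f: 0.0 for f in range(3)}
    for f, m in product(range(3), repeat=2):
        joint[f] += hw_prior(f) * hw_prior(m) * child_given_parents(child_g, f, m)
    z = sum(joint.values())
    return {f: v / z for f, v in joint.items()}

# Observing a homozygous child sharply shifts the belief about the parent:
# a small-scale view of the interdependent genomic-privacy risk studied above.
print(posterior_father(child_g=2))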

    Data-Driven and Game-Theoretic Approaches for Privacy

    In the past few decades, there has been a remarkable shift in the boundary between public and private information. The application of information technology and electronic communications allows service providers (businesses) to collect large amounts of data. However, this "data collection" process can put the privacy of users at risk and lead to user reluctance in accepting services or sharing data. This dissertation first investigates privacy-sensitive interactions between consumers and retailers/service providers under different scenarios, and then focuses on a unified framework for information-theoretic privacy and for privacy mechanisms that can be learned directly from data. Existing approaches such as differential privacy or information-theoretic privacy try to quantify privacy risk but do not capture the subjective experience and heterogeneous expression of privacy sensitivity. The first part of this dissertation introduces models to study consumer-retailer interaction problems and to better understand how retailers/service providers can balance their revenue objectives while being sensitive to user privacy concerns. Three scenarios are considered: (i) consumer-retailer interaction via personalized advertisements; (ii) incentive mechanisms that electrical utility providers need to offer to privacy-sensitive consumers with alternative energy sources; and (iii) the market viability of offering privacy-guaranteed free online services. We use game-theoretic models to capture the behaviors of both consumers and retailers, and provide insights for retailers to maximize their profits when interacting with privacy-sensitive consumers. Preserving the utility of published datasets while simultaneously providing provable privacy guarantees is a well-known challenge. In the second part, a novel context-aware privacy framework called generative adversarial privacy (GAP) is introduced. Inspired by recent advancements in generative adversarial networks, GAP allows the data holder to learn the privatization mechanism directly from the data. Under GAP, finding the optimal privacy mechanism is formulated as a constrained minimax game between a privatizer and an adversary. For appropriately chosen adversarial loss functions, GAP provides privacy guarantees against strong information-theoretic adversaries. Both synthetic and real-world datasets are used to show that GAP can greatly reduce the adversary's capability of inferring private information at a small cost of distorting the data.
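    As a rough sketch of the minimax formulation described above (illustrative only; the network sizes, losses, penalty weight, and synthetic data below are assumptions, not the dissertation's actual GAP setup), the following alternates updates between a privatizer and an adversary, approximating the distortion constraint with a penalty term.

```python
# Illustrative GAP-style minimax loop (not the dissertation's exact setup):
# a privatizer perturbs the data under a distortion penalty while an adversary
# tries to infer a binary private attribute from the released data.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 2048, 8
s = torch.randint(0, 2, (n, 1)).float()      # private attribute to hide
x = torch.randn(n, d) + 1.5 * s              # public data correlated with s

privatizer = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, d))
adversary = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
opt_p = torch.optim.Adam(privatizer.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
distortion_weight = 1.0                      # penalty standing in for the constraint

for step in range(2000):
    # Adversary step: maximise inference accuracy on the released data.
    x_hat = privatizer(x).detach()
    loss_a = bce(adversary(x_hat), s)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()

    # Privatizer step: fool the adversary while keeping distortion small.
    x_hat = privatizer(x)
    loss_p = -bce(adversary(x_hat), s) + distortion_weight * ((x_hat - x) ** 2).mean()
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
```

    In the constrained game, the distortion term would be held below a budget rather than simply penalised; the fixed weight here only mimics that constraint for the sake of a compact example.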

    Artificial Intelligence Security (인공지능 보안)

    Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, College of Natural Sciences, February 2021. Advisor: Sungroh Yoon.

    With the development of machine learning (ML), expectations for artificial intelligence (AI) technologies have increased daily. In particular, deep neural networks have demonstrated outstanding performance in many fields. However, if a deep-learning (DL) model produces mispredictions or misclassifications, it can cause serious problems, especially when those errors are induced by malicious external influence. This dissertation discusses DL security and privacy issues and proposes methodologies for security and privacy attacks and defenses. First, we review security attacks and defenses from two aspects: evasion attacks, which use adversarial examples to disrupt classification, and poisoning attacks, which corrupt training by tampering with the training data. Next, we review attacks on privacy that can exploit exposed training data, as well as defenses including differential privacy and encryption. For adversarial DL, we study the problem of finding adversarial examples against ML-based portable document format (PDF) malware classifiers. We believe this problem is more challenging than attacking ML models for image processing, owing to the highly complex data structure of PDFs compared with traditional image datasets, and to the requirement that an infected PDF must exhibit malicious behavior without being detected. We propose an attack using generative adversarial networks that effectively generates evasive PDFs with a variational autoencoder robust against adversarial examples. For privacy in DL, we study the problem of preventing sensitive data from being misused and propose a privacy-preserving framework for deep neural networks. Our methods are based on generative models that preserve the privacy of sensitive data while maintaining high prediction performance. Finally, we study the security aspect of biological domains: detecting maliciousness in deoxyribonucleic acid (DNA) sequences and using watermarks to protect intellectual property. In summary, the proposed DL models for security and privacy embrace a diversity of research by attempting actual attacks and defenses in various fields.

    Korean abstract (translated): Using AI models requires collecting data about individuals; conversely, if an individual's sensitive data is leaked, privacy may be violated. The field concerned with preventing the data collected for AI models from being exposed, or with applying security techniques such as anonymization and encryption to AI models, can be classified as Private AI. Threats to the AI model itself, such as the loss of intellectual property when a model is exposed, or the malfunction of an AI system caused by malicious training data, can be classified as Secure AI. This dissertation demonstrates weaknesses of neural networks through attacks on the training data. Whereas most existing work on adversarial examples targets images, we extend the research to more complex, heterogeneous PDF data and propose a generative model that produces attack samples. Next, we propose a DNA steganalysis defense model that detects samples exhibiting abnormal patterns. Finally, we propose generative-model-based anonymization techniques to protect personal information. In summary, this dissertation proposes a series of machine-learning-based methodologies covering attack and defense algorithms that use AI models and addressing the privacy issues that arise when neural networks are used.
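    For readers unfamiliar with evasion attacks, the snippet below shows one classic, generic technique, the fast gradient sign method (FGSM); it is not the dissertation's GAN/VAE-based PDF attack, and the toy model and data are placeholders (with an untrained model the label flip is not guaranteed; the point is the gradient-sign perturbation).

```python
# Generic evasion-attack illustration (FGSM), not the dissertation's PDF attack:
# nudge an input along the sign of the loss gradient so that a classifier is
# pushed toward misclassification.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # assume pre-trained
x = torch.randn(1, 20)            # toy input standing in for extracted PDF features
y = torch.tensor([1])             # its true label

loss_fn = nn.CrossEntropyLoss()
x_adv = x.clone().requires_grad_(True)
loss_fn(model(x_adv), y).backward()

epsilon = 0.1                     # hypothetical perturbation budget
x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()
print(model(x).argmax(1), model(x_adv).argmax(1))   # prediction before vs. after
```

    Note that a gradient perturbation in feature space does not by itself yield a working malicious PDF; the dissertation's requirement that the evasive file still exhibit its malicious behavior is one reason it turns to generative models instead.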

    Immersive analytics for oncology patient cohorts

    This thesis proposes a novel interactive immersive analytics tool and methods to interrogate a cancer patient cohort in an immersive virtual environment, namely Virtual Reality to Observe Oncology data Models (VROOM). The overall objective is to develop an immersive analytics platform comprising a data analytics pipeline from raw gene expression data to immersive visualisation on virtual and augmented reality platforms, utilising a game engine; Unity3D has been used to implement the visualisation. The work in this thesis could provide oncologists and clinicians with an interactive visualisation and visual analytics platform that helps them drive their analysis of treatment efficacy and achieve the goal of evidence-based personalised medicine. The thesis integrates the latest discoveries and developments in cancer patients' prognoses, immersive technologies, machine learning, decision support systems and interactive visualisation to form an immersive analytics platform for complex genomic data. The experimental paradigm followed in this thesis is the analysis of transcriptomics in cancer samples: gene expression data are investigated to determine the biological similarity revealed by patients' tumour transcriptomic profiles, which indicate the genes active in different patients. In summary, the thesis contributes: i) a novel immersive analytics platform for interrogating patient cohort data in a similarity space, where the similarity space is based on the patients' biological and genomic similarity; ii) an effective immersive environment optimisation design based on usability studies of exocentric and egocentric visualisation and on audio and sound design optimisation; iii) an integration of trusted and familiar 2D biomedical visual analytics methods into the immersive environment; iv) a novel use of game theory as the decision-making engine to support the analytics process, and an application of optimal transport theory to missing-data imputation that preserves the data distribution; and v) case studies that showcase the real-world application of the visualisation and its effectiveness.
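    As a hedged illustration of how such a genomic similarity space might be built (this is not VROOM's documented pipeline; the toy expression matrix, the correlation distance, and the MDS embedding are assumptions for the example), the sketch below turns gene-expression profiles into 3D coordinates that a game engine such as Unity3D could place in a virtual scene.

```python
# Hypothetical similarity-space step (not VROOM's actual pipeline): compare
# patients by the correlation of their gene-expression profiles and embed the
# resulting distances in 3D for immersive display.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
expression = rng.lognormal(size=(50, 2000))   # 50 patients x 2000 genes (toy data)

corr = np.corrcoef(expression)                # patient-by-patient Pearson correlation
distance = 1.0 - corr                         # similarity turned into a distance

coords = MDS(n_components=3, dissimilarity="precomputed",
             random_state=0).fit_transform(distance)
print(coords.shape)                           # (50, 3) positions to render in VR
```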

    Data synthesis and adversarial networks: A review and meta-analysis in cancer imaging

    Despite technological and medical advances, the detection, interpretation, and treatment of cancer based on imaging data continue to pose significant challenges. These include inter-observer variability, class imbalance, dataset shifts, inter- and intra-tumour heterogeneity, malignancy determination, and treatment effect uncertainty. Given the recent advancements in image synthesis, Generative Adversarial Networks (GANs), and adversarial training, we assess the potential of these technologies to address a number of key challenges of cancer imaging. We categorise these challenges into (a) data scarcity and imbalance, (b) data access and privacy, (c) data annotation and segmentation, (d) cancer detection and diagnosis, and (e) tumour profiling, treatment planning and monitoring. Based on our analysis of 164 publications that apply adversarial training techniques in the context of cancer imaging, we highlight multiple underexplored solutions with research potential. We further contribute the Synthesis Study Trustworthiness Test (SynTRUST), a meta-analysis framework for assessing the validation rigour of medical image synthesis studies. SynTRUST is based on 26 concrete measures of thoroughness, reproducibility, usefulness, scalability, and tenability. Based on SynTRUST, we analyse 16 of the most promising cancer imaging challenge solutions and observe high validation rigour in general, while also identifying several desirable improvements. With this work, we strive to bridge the gap between the needs of the clinical cancer imaging community and the current and prospective research on data synthesis and adversarial networks in the artificial intelligence community.

    The 4th Conference of PhD Students in Computer Science


    Applications

    Volume 3 describes how resource-aware machine learning methods and techniques are used to solve real-world problems successfully. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production, and milling, for quality control during manufacturing processes; and in traffic and logistics, for smart cities and for mobile communications.