137 research outputs found

    Data Hiding with Deep Learning: A Survey Unifying Digital Watermarking and Steganography

    Full text link
    Data hiding is the process of embedding information into a noise-tolerant signal such as a piece of audio, video, or image. Digital watermarking is a form of data hiding where identifying data is robustly embedded so that it can resist tampering and be used to identify the original owners of the media. Steganography, another form of data hiding, embeds data for the purpose of secure and secret communication. This survey summarises recent developments in deep learning techniques for data hiding for the purposes of watermarking and steganography, categorising them based on model architectures and noise injection methods. The objective functions, evaluation metrics, and datasets used for training these data hiding models are comprehensively summarised. Finally, we propose and discuss possible future directions for research into deep data hiding techniques

    ์ธ๊ณต์ง€๋Šฅ ๋ณด์•ˆ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ž์—ฐ๊ณผํ•™๋Œ€ํ•™ ํ˜‘๋™๊ณผ์ • ์ƒ๋ฌผ์ •๋ณดํ•™์ „๊ณต, 2021. 2. ์œค์„ฑ๋กœ.With the development of machine learning (ML), expectations for artificial intelligence (AI) technologies have increased daily. In particular, deep neural networks have demonstrated outstanding performance in many fields. However, if a deep-learning (DL) model causes mispredictions or misclassifications, it can cause difficulty, owing to malicious external influences. This dissertation discusses DL security and privacy issues and proposes methodologies for security and privacy attacks. First, we reviewed security attacks and defenses from two aspects. Evasion attacks use adversarial examples to disrupt the classification process, and poisoning attacks compromise training by compromising the training data. Next, we reviewed attacks on privacy that can exploit exposed training data and defenses, including differential privacy and encryption. For adversarial DL, we study the problem of finding adversarial examples against ML-based portable document format (PDF) malware classifiers. We believe that our problem is more challenging than those against ML models for image processing, owing to the highly complex data structure of PDFs, compared with traditional image datasets, and the requirement that the infected PDF should exhibit malicious behavior without being detected. We propose an attack using generative adversarial networks that effectively generates evasive PDFs using a variational autoencoder robust against adversarial examples. For privacy in DL, we study the problem of avoiding sensitive data being misused and propose a privacy-preserving framework for deep neural networks. Our methods are based on generative models that preserve the privacy of sensitive data while maintaining a high prediction performance. Finally, we study the security aspect in biological domains to detect maliciousness in deoxyribonucleic acid sequences and watermarks to protect intellectual properties. In summary, the proposed DL models for security and privacy embrace a diversity of research by attempting actual attacks and defenses in various fields.์ธ๊ณต์ง€๋Šฅ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๊ฐœ์ธ๋ณ„ ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘์ด ํ•„์ˆ˜์ ์ด๋‹ค. ๋ฐ˜๋ฉด ๊ฐœ์ธ์˜ ๋ฏผ๊ฐํ•œ ๋ฐ์ดํ„ฐ๊ฐ€ ์œ ์ถœ๋˜๋Š” ๊ฒฝ์šฐ์—๋Š” ํ”„๋ผ์ด๋ฒ„์‹œ ์นจํ•ด์˜ ์†Œ์ง€๊ฐ€ ์žˆ๋‹ค. ์ธ๊ณต์ง€๋Šฅ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋Š”๋ฐ ์ˆ˜์ง‘๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ์™ธ๋ถ€์— ์œ ์ถœ๋˜์ง€ ์•Š๋„๋ก ํ•˜๊ฑฐ๋‚˜, ์ต๋ช…ํ™”, ๋ถ€ํ˜ธํ™” ๋“ฑ์˜ ๋ณด์•ˆ ๊ธฐ๋ฒ•์„ ์ธ๊ณต์ง€๋Šฅ ๋ชจ๋ธ์— ์ ์šฉํ•˜๋Š” ๋ถ„์•ผ๋ฅผ Private AI๋กœ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋‹ค. ๋˜ํ•œ ์ธ๊ณต์ง€๋Šฅ ๋ชจ๋ธ์ด ๋…ธ์ถœ๋  ๊ฒฝ์šฐ ์ง€์  ์†Œ์œ ๊ถŒ์ด ๋ฌด๋ ฅํ™”๋  ์ˆ˜ ์žˆ๋Š” ๋ฌธ์ œ์ ๊ณผ, ์•…์˜์ ์ธ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•˜์—ฌ ์ธ๊ณต์ง€๋Šฅ ์‹œ์Šคํ…œ์„ ์˜ค์ž‘๋™ํ•  ์ˆ˜ ์žˆ๊ณ  ์ด๋Ÿฌํ•œ ์ธ๊ณต์ง€๋Šฅ ๋ชจ๋ธ ์ž์ฒด์— ๋Œ€ํ•œ ์œ„ํ˜‘์€ Secure AI๋กœ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํ•™์Šต ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ๊ณต๊ฒฉ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์‹ ๊ฒฝ๋ง์˜ ๊ฒฐ์† ์‚ฌ๋ก€๋ฅผ ๋ณด์—ฌ์ค€๋‹ค. ๊ธฐ์กด์˜ AEs ์—ฐ๊ตฌ๋“ค์€ ์ด๋ฏธ์ง€๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋งŽ์€ ์—ฐ๊ตฌ๊ฐ€ ์ง„ํ–‰๋˜์—ˆ๋‹ค. ๋ณด๋‹ค ๋ณต์žกํ•œ heterogenousํ•œ PDF ๋ฐ์ดํ„ฐ๋กœ ์—ฐ๊ตฌ๋ฅผ ํ™•์žฅํ•˜์—ฌ generative ๊ธฐ๋ฐ˜์˜ ๋ชจ๋ธ์„ ์ œ์•ˆํ•˜์—ฌ ๊ณต๊ฒฉ ์ƒ˜ํ”Œ์„ ์ƒ์„ฑํ•˜์˜€๋‹ค. ๋‹ค์Œ์œผ๋กœ ์ด์ƒ ํŒจํ„ด์„ ๋ณด์ด๋Š” ์ƒ˜ํ”Œ์„ ๊ฒ€์ถœํ•  ์ˆ˜ ์žˆ๋Š” DNA steganalysis ๋ฐฉ์–ด ๋ชจ๋ธ์„ ์ œ์•ˆํ•œ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ๊ฐœ์ธ ์ •๋ณด ๋ณดํ˜ธ๋ฅผ ์œ„ํ•ด generative ๋ชจ๋ธ ๊ธฐ๋ฐ˜์˜ ์ต๋ช…ํ™” ๊ธฐ๋ฒ•๋“ค์„ ์ œ์•ˆํ•œ๋‹ค. ์š”์•ฝํ•˜๋ฉด ๋ณธ ๋…ผ๋ฌธ์€ ์ธ๊ณต์ง€๋Šฅ ๋ชจ๋ธ์„ ํ™œ์šฉํ•œ ๊ณต๊ฒฉ ๋ฐ ๋ฐฉ์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ์‹ ๊ฒฝ๋ง์„ ํ™œ์šฉํ•˜๋Š”๋ฐ ๋ฐœ์ƒ๋˜๋Š” ํ”„๋ผ์ด๋ฒ„์‹œ ์ด์Šˆ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ธฐ๊ณ„ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๊ธฐ๋ฐ˜ํ•œ ์ผ๋ จ์˜ ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค.Abstract i List of Figures vi List of Tables xiii 1 Introduction 1 2 Background 6 2.1 Deep Learning: a brief overview . . . . . . . . . . . . . . . . . . . 6 2.2 Security Attacks on Deep Learning Models . . . . . . . . . . . . . 10 2.2.1 Evasion Attacks . . . . . . . . . . . . . . . . . . . . . . . 12 2.2.2 Poisoning Attack . . . . . . . . . . . . . . . . . . . . . . . 20 2.3 Defense Techniques Against Deep Learning Models . . . . . . . . . 26 2.3.1 Defense Techniques against Evasion Attacks . . . . . . . . 27 2.3.2 Defense against Poisoning Attacks . . . . . . . . . . . . . . 36 2.4 Privacy issues on Deep Learning Models . . . . . . . . . . . . . . . 38 2.4.1 Attacks on Privacy . . . . . . . . . . . . . . . . . . . . . . 39 2.4.2 Defenses Against Attacks on Privacy . . . . . . . . . . . . 40 3 Attacks on Deep Learning Models 47 3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.1.1 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.1.2 Portable Document Format (PDF) . . . . . . . . . . . . . . 55 3.1.3 PDF Malware Classifiers . . . . . . . . . . . . . . . . . . . 57 3.1.4 Evasion Attacks . . . . . . . . . . . . . . . . . . . . . . . 58 3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.2.1 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . 60 3.2.2 Feature Selection Process . . . . . . . . . . . . . . . . . . 61 3.2.3 Seed Selection for Mutation . . . . . . . . . . . . . . . . . 62 3.2.4 Evading Model . . . . . . . . . . . . . . . . . . . . . . . . 63 3.2.5 Model architecture . . . . . . . . . . . . . . . . . . . . . . 67 3.2.6 PDF Repacking and Verification . . . . . . . . . . . . . . . 67 3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 3.3.1 Datasets and Model Training . . . . . . . . . . . . . . . . . 68 3.3.2 Target Classifiers . . . . . . . . . . . . . . . . . . . . . . . 71 3.3.3 CVEs for Various Types of PDF Malware . . . . . . . . . . 72 3.3.4 Malicious Signature . . . . . . . . . . . . . . . . . . . . . 72 3.3.5 AntiVirus Engines (VirusTotal) . . . . . . . . . . . . . . . 76 3.3.6 Feature Mutation Result for Contagio . . . . . . . . . . . . 76 3.3.7 Feature Mutation Result for CVEs . . . . . . . . . . . . . . 78 3.3.8 Malicious Signature Verification . . . . . . . . . . . . . . . 78 3.3.9 Evasion Speed . . . . . . . . . . . . . . . . . . . . . . . . 80 3.3.10 AntiVirus Engines (VirusTotal) Result . . . . . . . . . . . . 82 3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4 Defense on Deep Learning Models 88 4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.1.1 Message-Hiding Regions . . . . . . . . . . . . . . . . . . . 91 4.1.2 DNA Steganography . . . . . . . . . . . . . . . . . . . . . 92 4.1.3 Example of Message Hiding . . . . . . . . . . . . . . . . . 94 4.1.4 DNA Steganalysis . . . . . . . . . . . . . . . . . . . . . . 95 4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 4.2.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.2.2 Proposed Model Architecture . . . . . . . . . . . . . . . . 103 4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 4.3.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . 105 4.3.2 Environment . . . . . . . . . . . . . . . . . . . . . . . . . 106 4.3.3 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 4.3.4 Model Training . . . . . . . . . . . . . . . . . . . . . . . . 107 4.3.5 Message Hiding Procedure . . . . . . . . . . . . . . . . . . 108 4.3.6 Evaluation Procedure . . . . . . . . . . . . . . . . . . . . . 109 4.3.7 Performance Comparison . . . . . . . . . . . . . . . . . . . 109 4.3.8 Analyzing Malicious Code in DNA Sequences . . . . . . . 112 4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 5 Privacy: Generative Models for Anonymizing Private Data 115 5.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5.1.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5.1.2 Anonymization using GANs . . . . . . . . . . . . . . . . . 119 5.1.3 Security Principle of Anonymized GANs . . . . . . . . . . 123 5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 5.2.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 5.2.2 Target Classifiers . . . . . . . . . . . . . . . . . . . . . . . 126 5.2.3 Model Training . . . . . . . . . . . . . . . . . . . . . . . . 126 5.2.4 Evaluation Process . . . . . . . . . . . . . . . . . . . . . . 126 5.2.5 Comparison to Differential Privacy . . . . . . . . . . . . . 128 5.2.6 Performance Comparison . . . . . . . . . . . . . . . . . . . 128 5.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 6 Privacy: Privacy-preserving Inference for Deep Learning Models 132 6.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 6.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 135 6.1.2 Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 6.1.3 Deep Private Generation Framework . . . . . . . . . . . . . 137 6.1.4 Security Principle . . . . . . . . . . . . . . . . . . . . . . . 141 6.1.5 Threat to the Classifier . . . . . . . . . . . . . . . . . . . . 143 6.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 6.2.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 6.2.2 Experimental Process . . . . . . . . . . . . . . . . . . . . . 146 6.2.3 Target Classifiers . . . . . . . . . . . . . . . . . . . . . . . 147 6.2.4 Model Training . . . . . . . . . . . . . . . . . . . . . . . . 147 6.2.5 Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . 149 6.2.6 Performance Comparison . . . . . . . . . . . . . . . . . . . 150 6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 7 Conclusion 153 7.0.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . 154 7.0.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 155 Bibliography 157 Abstract in Korean 195Docto

    Convolutional Neural Networks for Image Steganalysis in the Spatial Domain

    Get PDF
    Esta tesis doctoral muestra los resultados obtenidos al aplicar Redes Neuronales Convolucionales (CNNs) para el estegoanรกlisis de imรกgenes digitales en el dominio espacial. La esteganografรญa consiste en ocultar mensajes dentro de un objeto conocido como portador para establecer un canal de comunicaciรณn encubierto para que el acto de comunicaciรณn pase desapercibido para los observadores que tienen acceso a ese canal. Steganalysis se dedica a detectar mensajes ocultos mediante esteganografรญa; estos mensajes pueden estar implรญcitos en diferentes tipos de medios, como imรกgenes digitales, archivos de video, archivos de audio o texto sin formato. Desde 2014, los investigadores se han interesado especialmente en aplicar tรฉcnicas de Deep Learning (DL) para lograr resultados que superen los mรฉtodos tradicionales de Machine Learning (ML).Is doctoral thesis shows the results obtained by applying Convolutional Neural Networks (CNNs) for the steganalysis of digital images in the spatial domain. Steganography consists of hiding messages inside an object known as a carrier to establish a covert communication channel so that the act of communication goes unnoticed by observers who have access to that channel. Steganalysis is dedicated to detecting hidden messages using steganography; these messages can be implicit in di.erent types of media, such as digital images, video โ‚ฌles, audio โ‚ฌles, or plain text. Since 2014 researchers have taken a particular interest in applying Deep Learning (DL) techniques to achieving results that surpass traditional Machine Learning (ML) methods

    Towards private and robust machine learning for information security

    Get PDF
    Many problems in information security are pattern recognition problems. For example, determining if a digital communication can be trusted amounts to certifying that the communication does not carry malicious or secret content, which can be distilled into the problem of recognising the difference between benign and malicious content. At a high level, machine learning is the study of how patterns are formed within data, and how learning these patterns generalises beyond the potentially limited data pool at a practitionerโ€™s disposal, and so has become a powerful tool in information security. In this work, we study the benefits machine learning can bring to two problems in information security. Firstly, we show that machine learning can be used to detect which websites are visited by an internet user over an encrypted connection. By analysing timing and packet size information of encrypted network traffic, we train a machine learning model that predicts the target website given a stream of encrypted network traffic, even if browsing is performed over an anonymous communication network. Secondly, in addition to studying how machine learning can be used to design attacks, we study how it can be used to solve the problem of hiding information within a cover medium, such as an image or an audio recording, which is commonly referred to as steganography. How well an algorithm can hide information within a cover medium amounts to how well the algorithm models and exploits areas of redundancy. This can again be reduced to a pattern recognition problem, and so we apply machine learning to design a steganographic algorithm that efficiently hides a secret message with an image. Following this, we proceed with discussions surrounding why machine learning is not a panacea for information security, and can be an attack vector in and of itself. We show that machine learning can leak private and sensitive information about the data it used to learn, and how malicious actors can exploit vulnerabilities in these learning algorithms to compel them to exhibit adversarial behaviours. Finally, we examine the problem of the disconnect between image recognition systems learned by humans and by machine learning models. While human classification of an image is relatively robust to noise, machine learning models do not possess this property. We show how an attacker can cause targeted misclassifications against an entire data distribution by exploiting this property, and go onto introduce a mitigation that ameliorates this undesirable trait of machine learning

    Human-Centric Deep Generative Models: The Blessing and The Curse

    Get PDF
    Over the past years, deep neural networks have achieved significant progress in a wide range of real-world applications. In particular, my research puts a focused lens in deep generative models, a neural network solution that proves effective in visual (re)creation. But is generative modeling a niche topic that should be researched on its own? My answer is critically no. In the thesis, I present the two sides of deep generative models, their blessing and their curse to human beings. Regarding what can deep generative models do for us, I demonstrate the improvement in performance and steerability of visual (re)creation. Regarding what can we do for deep generative models, my answer is to mitigate the security concerns of DeepFakes and improve minority inclusion of deep generative models. For the performance of deep generative models, I probe on applying attention modules and dual contrastive loss to generative adversarial networks (GANs), which pushes photorealistic image generation to a new state of the art. For the steerability, I introduce Texture Mixer, a simple yet effective approach to achieve steerable texture synthesis and blending. For the security, my research spans over a series of GAN fingerprinting solutions that enable the detection and attribution of GAN-generated image misuse. For the inclusion, I investigate the biased misbehavior of generative models and present my solution in enhancing the minority inclusion of GAN models over underrepresented image attributes. All in all, I propose to project actionable insights to the applications of deep generative models, and finally contribute to human-generator interaction

    Lost in the Digital Wild: Hiding Information in Digital Activities

    Get PDF
    This paper presents a new general framework of information hiding, in which the hidden information is embedded into a collection of activities conducted by selected human and computer entities (e.g., a number of online accounts of one or more online social networks) in a selected digital world. Different from other traditional schemes, where the hidden information is embedded into one or more selected or generated cover objects, in the new framework the hidden information is embedded in the fact that some particular digital activities with some particular attributes took place in some particular ways in the receiver-observable digital world. In the new framework the concept of "cover" almost disappears, or one can say that now the whole digital world selected becomes the cover. The new framework can find applications in both security (e.g., steganography) and non-security domains (e.g., gaming). For security applications we expect that the new framework calls for completely new steganalysis techniques, which are likely more complicated, less effective and less efficient than existing ones due to the need to monitor and analyze the whole digital world constantly and in real time. A proof-of-concept system was developed as a mobile app based on Twitter activities to demonstrate the information hiding framework works. We are developing a more hybrid system involving several online social networks
    • โ€ฆ
    corecore