Artificial Intelligence Security (인공지능 보안)

Author: 배호
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2021
Field of study

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2021. 2. 윤성로.

With the development of machine learning (ML), expectations for artificial intelligence (AI) technologies have risen steadily. In particular, deep neural networks have demonstrated outstanding performance in many fields. However, a deep-learning (DL) model that mispredicts or misclassifies because of malicious external influences can cause serious difficulties. This dissertation discusses DL security and privacy issues and proposes methodologies for security and privacy attacks. First, we review security attacks and defenses from two aspects: evasion attacks, which use adversarial examples to disrupt the classification process, and poisoning attacks, which corrupt training by compromising the training data. Next, we review attacks on privacy that can exploit exposed training data, along with defenses including differential privacy and encryption. For adversarial DL, we study the problem of finding adversarial examples against ML-based portable document format (PDF) malware classifiers. We believe that this problem is more challenging than attacking ML models for image processing, owing to the highly complex data structure of PDFs compared with traditional image datasets and the requirement that the infected PDF exhibit malicious behavior without being detected. We propose an attack using generative adversarial networks that effectively generates evasive PDFs using a variational autoencoder robust against adversarial examples. For privacy in DL, we study the problem of preventing the misuse of sensitive data and propose a privacy-preserving framework for deep neural networks. Our methods are based on generative models that preserve the privacy of sensitive data while maintaining high prediction performance. Finally, we study security in biological domains, detecting maliciousness in deoxyribonucleic acid (DNA) sequences and using watermarks to protect intellectual property.
In summary, the proposed DL models for security and privacy embrace a diversity of research by attempting actual attacks and defenses in various fields.

Using AI models requires collecting data from individuals. However, if an individual's sensitive data is leaked, privacy may be violated. The field that keeps data collected for AI models from leaking externally, or that applies security techniques such as anonymization and encoding to AI models, can be classified as Private AI. In addition, exposure of an AI model can nullify intellectual property rights, and malicious training data can cause an AI system to malfunction; such threats to the AI model itself can be classified as Secure AI. This dissertation demonstrates failure cases of neural networks based on attacks on training data. Much of the existing research on adversarial examples (AEs) has been based on images. We extend this research to more complex, heterogeneous PDF data and propose a generative model that produces attack samples. Next, we propose a DNA steganalysis defense model that can detect samples exhibiting anomalous patterns. Finally, we propose anonymization techniques based on generative models to protect personal information. In summary, this dissertation proposes attack and defense algorithms that use AI models, together with a series of machine-learning-based methodologies that address the privacy issues arising when neural networks are used.

Contents
Abstract; List of Figures; List of Tables
1 Introduction
2 Background
  2.1 Deep Learning: a brief overview
  2.2 Security Attacks on Deep Learning Models: 2.2.1 Evasion Attacks; 2.2.2 Poisoning Attack
  2.3 Defense Techniques Against Deep Learning Models: 2.3.1 Defense Techniques against Evasion Attacks; 2.3.2 Defense against Poisoning Attacks
  2.4 Privacy issues on Deep Learning Models: 2.4.1 Attacks on Privacy; 2.4.2 Defenses Against Attacks on Privacy
3 Attacks on Deep Learning Models
  3.1 Background: 3.1.1 Threat Model; 3.1.2 Portable Document Format (PDF); 3.1.3 PDF Malware Classifiers; 3.1.4 Evasion Attacks
  3.2 Methods: 3.2.1 Feature Extraction; 3.2.2 Feature Selection Process; 3.2.3 Seed Selection for Mutation; 3.2.4 Evading Model; 3.2.5 Model architecture; 3.2.6 PDF Repacking and Verification
  3.3 Results: 3.3.1 Datasets and Model Training; 3.3.2 Target Classifiers; 3.3.3 CVEs for Various Types of PDF Malware; 3.3.4 Malicious Signature; 3.3.5 AntiVirus Engines (VirusTotal); 3.3.6 Feature Mutation Result for Contagio; 3.3.7 Feature Mutation Result for CVEs; 3.3.8 Malicious Signature Verification; 3.3.9 Evasion Speed; 3.3.10 AntiVirus Engines (VirusTotal) Result
  3.4 Discussion
4 Defense on Deep Learning Models
  4.1 Background: 4.1.1 Message-Hiding Regions; 4.1.2 DNA Steganography; 4.1.3 Example of Message Hiding; 4.1.4 DNA Steganalysis
  4.2 Methods: 4.2.1 Notations; 4.2.2 Proposed Model Architecture
  4.3 Results: 4.3.1 Experiment Setup; 4.3.2 Environment; 4.3.3 Dataset; 4.3.4 Model Training; 4.3.5 Message Hiding Procedure; 4.3.6 Evaluation Procedure; 4.3.7 Performance Comparison; 4.3.8 Analyzing Malicious Code in DNA Sequences
  4.4 Discussion
5 Privacy: Generative Models for Anonymizing Private Data
  5.1 Methods: 5.1.1 Notations; 5.1.2 Anonymization using GANs; 5.1.3 Security Principle of Anonymized GANs
  5.2 Results: 5.2.1 Datasets; 5.2.2 Target Classifiers; 5.2.3 Model Training; 5.2.4 Evaluation Process; 5.2.5 Comparison to Differential Privacy; 5.2.6 Performance Comparison
  5.3 Discussion
6 Privacy: Privacy-preserving Inference for Deep Learning Models
  6.1 Methods: 6.1.1 Motivation; 6.1.2 Scenario; 6.1.3 Deep Private Generation Framework; 6.1.4 Security Principle; 6.1.5 Threat to the Classifier
  6.2 Results: 6.2.1 Datasets; 6.2.2 Experimental Process; 6.2.3 Target Classifiers; 6.2.4 Model Training; 6.2.5 Model Evaluation; 6.2.6 Performance Comparison
  6.3 Discussion
7 Conclusion: 7.0.1 Limitations; 7.0.2 Future Work
Bibliography
Abstract in Korean

SNU Open Repository and Archive

Transfer learning effects on image steganalysis with pre-trained deep residual neural network model

Author: Özcan Selim
Publication venue
Publication date: 10/09/2019
Field of study

Steganalysis studies techniques for revealing embedded messages hidden in a digital medium, in most cases in images. Research and development activity in Image Steganalysis has gained traction in recent years. Although machine learning techniques have been used for many years, Deep Learning is a new paradigm for the Image Steganalysis domain. The success of the deep learning process depends on training the model for a sufficient amount of time and with a high-quality, diverse, and large-scale dataset. When the training dataset is lacking in quality, variety, or quantity, Transfer Learning emerges as an effective solution among Deep Learning methods. In Transfer Learning, an untrained model benefits from a previously trained model and its dataset. A base function is defined to transfer the parameters from the trained model to the untrained model, which is expected to increase the success of the deep learning model on Image Steganalysis. In this work, we compare the results of two series of models, trained with and without the Transfer Learning method. The experimental AdamW optimization method is selected as the optimizer for model training. Comparisons of training, testing, evaluation, and F1 scores are based on models trained with different steganography payload values, ranging from easy to hard to detect. We investigated the best possible ways of increasing the success rate and decreasing the error rate in detecting stego images and cover images separately. Results showed that the model with transfer learning applied is more successful at detecting stego images on every payload dataset than the normally trained model.

Contents
Declaration of Authorship; Abstract; Öz; Acknowledgments; List of Figures; List of Tables; Abbreviations
1 Introduction
  1.1 Issue of Secrecy
  1.2 Steganography: 1.2.1 History of Steganography; 1.2.2 A Very Basic Steganographic Method: Least Significant Bit (LSB); 1.2.3 An Least Significant Bit (LSB) Example
  1.3 Steganalysis
  1.4 Deep Learning: 1.4.1 Convolutional Neural Networks; 1.4.2 Residual Neural Networks; 1.4.3 Transfer Learning
  1.5 Contributions
  1.6 Outline
2 Related Work
3 Proposed Method: Steganalysis via Transfer Learning
  3.1 Background
  3.2 Transfer Learning Applied Model
  3.3 Normal Trained Model
4 Evaluation
  4.1 Research Questions
  4.2 Experimental Setup: 4.2.1 The Dataset; 4.2.2 Test Environment; 4.2.3 Discussions
  4.3 Performance Evaluation
5 Results
  5.1 HUGO Test Results
  5.2 WOW Test Results
  5.3 Result Comparisons: 5.3.1 Train Comparisons; 5.3.2 Train Validation Comparisons; 5.3.3 Evaluation Comparisons; 5.3.4 Prediction Comparisons; 5.3.5 Precision Comparisons; 5.3.6 Recall Comparisons; 5.3.7 F1-Score Comparisons; 5.3.8 Related Work Comparisons
6 Conclusion
A WOW Training Validation Results
B Source Codes
Bibliography

Istanbul Sehir University Repository

System Steganalysis: Implementation Vulnerabilities and Side-Channel Attacks Against Digital Steganography Systems

Author: Sloan Thomas
Publication venue
Publication date: 01/08/2018
Field of study

Steganography is the process of hiding information in plain sight; it is a technology that can be used to hide data and facilitate secret communications. Steganography is commonly seen in the digital domain, where the pervasive nature of media content (image, audio, video) provides an ideal avenue for hiding secret information. In recent years, video steganography has been shown to be a highly suitable alternative to image and audio steganography due to its potential advantages (capacity, flexibility, popularity). Increased interest in video steganography research has led to the development of video stego-systems that are now available to the public. Many of these stego-systems have not yet been subjected to analysis or evaluation, and their capabilities for performing secure, practical, and effective video steganography are unknown. This thesis presents a comprehensive analysis of the state of the art in practical video steganography. Video-based stego-systems are identified and examined using steganalytic techniques (system steganalysis) to determine the security practices of relevant stego-systems. The research is conducted through a series of case studies that aim to provide novel insights into the field of steganalysis and its capabilities against practical video steganography. The results demonstrate the impact of system attacks on the practical state of the art in video steganography. This research shows that video-based stego-systems are highly vulnerable and fail to follow many of the well-understood security practices in the field. Consequently, each stego-system can be detected confidently and with a high rate of accuracy. Current work in practical video steganography thus fails to address key principles and best practices in the field. Continued efforts to address this will provide safe and secure steganographic technologies.

Kent Academic Repository

Optimization of medical image steganography using n-decomposition genetic algorithm

Author: Al-Sarayefi Bushra Abdullah Shtayt
Publication venue
Publication date: 01/01/2023
Field of study

Protecting patients' confidential information is a critical concern in medical image steganography. The Least Significant Bits (LSB) technique has been widely used for secure communication. However, it is susceptible to imperceptibility and security risks due to the direct manipulation of pixels, and ASCII patterns present limitations. Consequently, sensitive medical information is subject to loss or alteration. Despite attempts to optimize LSB, these issues persist because (1) the formulation of the optimization suffers from non-valid implicit constraints, causing inflexibility in reaching optimal embedding; (2) the search process lacks convergence, since the message length significantly affects the size of the solution space; and (3) application customizability is limited, as different data require more flexibility in controlling the embedding process. To overcome these limitations, this study proposes a technique known as an n-decomposition genetic algorithm. This algorithm uses a variable-length search to identify the best location to embed the secret message, incorporating constraints to avoid local-minimum traps. The methodology consists of five main phases: (1) initial investigation, (2) formulating an embedding scheme, (3) constructing a decomposition scheme, (4) integrating the schemes' design into the proposed technique, and (5) evaluating the proposed technique's performance based on parameters using medical datasets from kaggle.com. The proposed technique showed resistance to statistical analysis, evaluated using Reversible Statistical (RS) analysis and histograms. It also demonstrated superior imperceptibility and security on the Chest and Retina datasets, with MSE values of 0.0557 and 0.0550 and PSNR values of 60.6696 and 60.7287, respectively. On the Brain dataset, however, the benchmark outperforms the proposed technique, owing to the homogeneous nature of the images and their extensive black background.
This research has contributed to genetic-based decomposition in medical image steganography and provides a technique that offers improved security without compromising efficiency and convergence. However, further validation is required to determine its effectiveness in real-world applications.
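To make the LSB and MSE/PSNR terminology in this abstract concrete, the following minimal Python sketch embeds bits in the least significant bit of each pixel and computes the two quality measures. It is an illustration only, not the thesis's n-decomposition genetic algorithm: the function names and the toy pixel values are hypothetical, and the peak value of 255 assumes 8-bit images, which the abstract does not state. Under that assumption, plugging the reported MSE values into the standard PSNR formula gives results within a few hundredths of the reported PSNR values.

```python
import math

def embed_lsb(pixels, bits):
    """Write each message bit into the least significant bit of one pixel."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego

def extract_lsb(pixels, n_bits):
    """Read the message back from the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]

def mse(cover, stego):
    """Mean squared error between cover and stego pixel values."""
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)

def psnr_from_mse(err, peak=255):
    """PSNR in dB; peak=255 is an assumption (8-bit images)."""
    return 10 * math.log10(peak ** 2 / err)

# Hypothetical toy data: eight grey-level pixels and an eight-bit message.
cover = [52, 55, 61, 66, 70, 61, 64, 73]
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_lsb(cover, bits)

print(extract_lsb(stego, len(bits)) == bits)   # message survives the round trip
print(mse(cover, stego))                        # each pixel changes by at most 1
print(round(psnr_from_mse(0.0557), 2), round(psnr_from_mse(0.0550), 2))
```

Because plain LSB embedding changes every carrier pixel by at most one grey level, the MSE stays below 1, which is why PSNR values around 60 dB are plausible for this family of methods.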

Universiti Utara Malaysia: UUM eTheses

Introductory Computer Forensics

Author: Xiaodong Lin
Publication venue: Springer Nature
Publication date: 27/04/2020
Field of study

INTERPOL (International Police) has built cybercrime programs to keep up with emerging cyber threats, and aims to coordinate and assist international operations for fighting crimes involving computers. Although significant international efforts are being made in dealing with cybercrime and cyber-terrorism, finding effective, cooperative, and collaborative ways to deal with complicated cases that span multiple jurisdictions has proven difficult in practice.

Open Library

Information Analysis for Steganography and Steganalysis in 3D Polygonal Meshes

Author: YANG YING
Publication venue
Publication date: 01/01/2013
Field of study

Information hiding, which embeds a watermark/message in a cover signal, has recently found extensive applications in, for example, copyright protection, content authentication and covert communication. It has been widely considered an appealing technology to complement conventional cryptographic processes in the field of multimedia security by embedding information into the signal being protected. Generally, information hiding can be classified into two categories: steganography and watermarking. While steganography attempts to embed as much information as possible into a cover signal, watermarking emphasizes the robustness of the embedded information at the expense of embedding capacity. In contrast to information hiding, steganalysis aims at detecting whether a given medium contains a hidden message and, if possible, recovering that hidden message. It can be used to measure the security performance of information hiding techniques, meaning that a steganalysis-resistant steganographic/watermarking method should be imperceptible not only to Human Vision Systems (HVS), but also to intelligent analysis. As yet, 3D information hiding and steganalysis have received relatively little attention compared to image information hiding, despite the proliferation of 3D computer graphics models, which are fairly promising information carriers. This thesis focuses on this relatively neglected research area and has the following primary objectives: 1) to investigate the trade-off between embedding capacity and distortion by considering the correlation between spatial and normal/curvature noise in triangle meshes; 2) to design satisfactory 3D steganographic algorithms, taking into account this trade-off; 3) to design robust 3D watermarking algorithms; 4) to propose a steganalysis framework for detecting the existence of hidden information in 3D models and introduce a universal 3D steganalytic method under this framework.

The thesis is organized as follows. Chapter 1 describes in detail the background relating to information hiding and steganalysis, as well as the research problems this thesis studies. Chapter 2 surveys previous information hiding techniques for digital images, 3D models and other media, and also image steganalysis algorithms. Motivated by the observation that knowledge of the spatial accuracy of the mesh vertices does not easily translate into information about the accuracy of other visually important mesh attributes such as normals, Chapters 3 and 4 investigate the impact of modifying vertex coordinates of 3D triangle models on the mesh normals. Chapter 3 presents the results of an empirical investigation, whereas Chapter 4 presents the results of a theoretical study. Based on these results, a high-capacity 3D steganographic algorithm capable of controlling embedding distortion is also presented in Chapter 4. In addition to normal information, several mesh interrogation, processing and rendering algorithms make direct or indirect use of curvature information. Motivated by this, Chapter 5 studies the relation between Discrete Gaussian Curvature (DGC) degradation and vertex coordinate modifications. Chapter 6 proposes a robust watermarking algorithm for 3D polygonal models, based on modifying the histogram of the distances from the model vertices to a point in 3D space. That point is determined by applying Principal Component Analysis (PCA) to the cover model. The use of PCA makes the watermarking method robust against common 3D operations, such as rotation, translation and vertex reordering. In addition, Chapter 6 develops a 3D-specific steganalytic algorithm to detect the existence of hidden messages embedded by one well-known watermarking method.
By contrast, the focus of Chapter 7 is on developing a 3D watermarking algorithm that is resistant to mesh editing or deformation attacks that change the global shape of the mesh. By adopting a framework that has been successfully developed for image steganalysis, Chapter 8 designs a 3D steganalysis method to detect the existence of messages hidden in 3D models with existing steganographic and watermarking algorithms. The efficiency of this steganalytic algorithm has been evaluated on five state-of-the-art 3D watermarking/steganographic methods. Moreover, being universal, this steganalytic algorithm can be used as a benchmark for measuring the anti-steganalysis performance of other existing and, most importantly, future watermarking/steganographic algorithms. Chapter 9 concludes this thesis and suggests some potential directions for future work.
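The robustness claim for the Chapter 6 watermark rests on a geometric fact: distances from the vertices to a reference point that moves with the model do not change under rotation, translation or vertex reordering. The Python sketch below checks this on a toy vertex set. It is a hedged illustration, not the thesis's algorithm: it uses the vertex centroid as a stand-in for the PCA-derived point, and all function names and coordinates are hypothetical.

```python
import math
import random

def centroid(vertices):
    """Mean of the vertex positions (stand-in for the PCA-derived point)."""
    n = len(vertices)
    return tuple(sum(v[i] for v in vertices) / n for i in range(3))

def sorted_distances(vertices, point):
    """Vertex-to-point distances, sorted so vertex order does not matter."""
    return sorted(math.dist(v, point) for v in vertices)

def rotate_z(v, angle):
    """Rotate a vertex about the z axis by the given angle in radians."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(v, offset):
    """Shift a vertex by a fixed offset."""
    return tuple(a + b for a, b in zip(v, offset))

# Hypothetical toy "mesh": five vertices, no connectivity needed here.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
            (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
before = sorted_distances(vertices, centroid(vertices))

# Apply a rotation, a translation and a vertex reordering.
moved = [translate(rotate_z(v, 0.7), (5.0, -2.0, 1.5)) for v in vertices]
random.Random(0).shuffle(moved)
after = sorted_distances(moved, centroid(moved))

unchanged = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
print(unchanged)
```

Any histogram built from these distances is therefore unchanged by the three operations, which is the property a histogram-based watermark can rely on.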

Durham e-Theses

New watermarking methods for digital images.

Author
Publication venue: University of Northern British Columbia
Publication date: 01/01/2013
Field of study

The phenomenal spread of the Internet places an enormous demand on content ownership validation. In this thesis, four new image watermarking methods are presented. One method is based on the discrete wavelet transform (DWT) only, while the rest are based on a combination of DWT and singular value decomposition (SVD). The main goal of this thesis is to arrive at a new blind watermarking method. Method IV presents such a watermark using QR codes. The use of QR codes in watermarking is novel. This choice is based on the fact that QR codes have an error self-correction capability of 5% or higher, which suits the nature of digital image processing. Results show that the proposed methods introduce minimal distortion to the watermarked images compared to other methods and are robust against JPEG compression, resizing and other attacks. Moreover, watermarking method II provides a solution to the detection of false watermarks reported in the literature. Finally, method IV presents a new QR-code-guided watermarking approach that can also be used for steganography. --Leaf ii.
The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b183575

Arca British Columbia's network of post-secondary digital repositories

Method for Effective PDF Files Manipulation Detection

Author: Fernández Bascuñana Gema
Publication venue
Publication date: 01/01/2017
Field of study

The aim of this master's thesis is to ease the process of detecting manipulations in PDF files by examining their source code before resorting to other methods such as image processing or text-line examination. It is intended as a preliminary step that can save examiners considerable time and provide them with more proof of manipulation in digital file evidence. The result is a solid and effective method for investigating and analysing PDF files to determine their integrity. To achieve this goal, the anatomy of PDF files is studied first, in order to become familiar with the structure and composition of the file format.
Afterwards, a series of manipulations performed directly on the file source code explores its hidden aspects and vulnerabilities, and thereby helps to set the foundations for the method. Finally, a study of the most common types of file manipulation leads to a set of layouts against which the files under investigation are compared to test their veracity, complemented by a search for specialised open source tools to accomplish this task. A set of validation experiments completes the work, evaluating the obtained results and stating future lines of work in this field.
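As one example of what a source-level check can look like, the Python sketch below counts end-of-file markers in the raw bytes of a PDF. The PDF format allows incremental updates, in which changes are appended after the original content and closed with a further `%%EOF` marker, so more than one marker suggests the file was modified after it was first written. This is an assumed illustration, not the method developed in the thesis: the function names and the toy byte strings (which are not complete, valid PDFs) are hypothetical, and linearized PDFs can also contain more than one marker, so a positive result only indicates that a closer look is warranted.

```python
def count_eof_markers(data: bytes) -> int:
    """Count the %%EOF markers in the raw bytes of a PDF file."""
    return data.count(b"%%EOF")

def looks_incrementally_updated(data: bytes) -> bool:
    """More than one %%EOF marker hints at appended (incremental) changes."""
    return count_eof_markers(data) > 1

# Hypothetical toy byte strings standing in for file contents.
original = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
            b"trailer\n<< /Root 1 0 R >>\n%%EOF\n")
update = (b"1 0 obj\n<< /Type /Catalog /Lang (en) >>\nendobj\n"
          b"trailer\n<< /Root 1 0 R /Prev 0 >>\n%%EOF\n")

print(looks_incrementally_updated(original))           # single revision
print(looks_incrementally_updated(original + update))  # appended revision
```

In practice the bytes would come from reading the file in binary mode, for example with `open(path, "rb").read()`, where `path` is whatever file is under examination.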

DSpace at Tartu University Library