
    Deep learning for printed document source identification

    Because of the rapid development of information technology and the widespread use of the internet, information is easily obtained in digital form. Digital content can be freely printed into documents because printers are convenient and accessible. On the other hand, printed documents can be illegally manipulated for criminal purposes such as forged documents, counterfeit currency, and copyright infringement. Developing efficient and accurate security testing tools to identify the source of printed documents is therefore an important task at present. Forensic systems based on statistical methods and support vector machine (SVM) technology can already identify the source printer of text and image documents. Such approaches fall into the category of shallow machine learning, with human interaction during the feature extraction, feature selection, and data preprocessing stages. In this paper, a deep learning system that solves the complex image classification problem is developed using Convolutional Neural Networks (CNNs), which can learn features automatically. Systematic experiments were conducted on both systems. For microscopic documents, the feature-based SVM system outperforms the deep learning system by a small margin. For scanned documents, both systems achieve equally good results with high accuracy. Both systems should continue to be evaluated and compared to determine how best to use them in general.
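The shallow baseline in this comparison is an SVM trained on hand-crafted features. As a rough illustration only, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method in pure Python. The two-dimensional feature vectors and the two printer labels are made up, and the paper's actual features, kernel, and training setup are not described in this abstract.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Pegasos: stochastic sub-gradient descent on the regularized hinge loss.

    Labels must be +1 or -1. A constant feature of 1.0 is appended so the last
    weight acts as the bias (it is regularized too, a common simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularizer
            if margin < 1.0:  # hinge loss active: move toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w[:-1], w[-1]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Made-up 2-D texture features for two hypothetical printers (+1 and -1).
X_toy = [[0.20, 0.80], [0.25, 0.75], [0.15, 0.85],
         [0.80, 0.20], [0.75, 0.30], [0.85, 0.15]]
y_toy = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X_toy, y_toy)
print([predict(w, b, x) for x in X_toy])
```

A CNN replaces the hand-crafted feature vectors with filters learned from the image itself, which is the distinction the abstract draws between the two systems.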

    Attacking and Defending Printer Source Attribution Classifiers in the Physical Domain

    The security of machine learning classifiers has received increasing attention in recent years. In forensic applications, guaranteeing the security of the tools investigators rely on is crucial, since the gathered evidence may be used to decide on the innocence or guilt of a suspect. Several adversarial attacks have been proposed to assess such security, and a few works have focused on transferring these attacks from the digital to the physical domain. In this work, we focus on physical-domain attacks against source attribution of printed documents. We first show how a simple reprinting attack may be sufficient to fool a model trained on images that were printed and scanned only once. Then, we propose a hardened version of the classifier, trained on the reprinted attacked images. Finally, we attack the hardened classifier with several attacks, including a new attack based on the Expectation Over Transformation approach, which finds the adversarial perturbations by simulating the physical transformations that occur when the image attacked in the digital domain is printed again. The results demonstrate a good capability of the hardened classifier to resist attacks carried out in the physical domain.
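Expectation Over Transformation (EOT) makes a perturbation robust by optimizing the loss averaged over a distribution of transformations instead of for a single input. The minimal sketch below assumes a toy linear logistic classifier and a hypothetical print-and-scan model x -> a * x + c (random contrast gain a and brightness offset c per copy). The paper attacks a real attribution classifier with a simulation of actual printing, which this toy does not reproduce.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical print-and-scan distortion: x -> a * x + c, applied per pixel.
Transform = Tuple[float, float]


def sample_transforms(n: int, seed: int = 0) -> List[Transform]:
    """Draw n random (contrast gain, brightness offset) pairs."""
    rng = random.Random(seed)
    return [(rng.uniform(0.8, 1.2), rng.uniform(-0.1, 0.1)) for _ in range(n)]


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Logit of a toy linear classifier; positive means class +1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def apply_transform(t: Transform, x: Sequence[float]) -> List[float]:
    a, c = t
    return [a * xi + c for xi in x]


def eot_gradient(w, b, x, y, transforms) -> List[float]:
    """Average over transforms of d/dx log(1 + exp(-y * score(T(x))))."""
    grad = [0.0] * len(x)
    for t in transforms:
        margin = y * score(w, b, apply_transform(t, x))
        # dL/dmargin = -1 / (1 + e^margin); dmargin/dx = y * a * w (chain rule).
        coeff = -y * t[0] / (1.0 + math.exp(margin))
        grad = [g + coeff * wi for g, wi in zip(grad, w)]
    return [g / len(transforms) for g in grad]


def eot_attack(w, b, x, y, transforms, eps=0.1, step=0.02, iters=10) -> List[float]:
    """Signed-gradient ascent on the expected loss, kept in an L-inf ball of radius eps."""
    x_adv = list(x)
    for _ in range(iters):
        g = eot_gradient(w, b, x_adv, y, transforms)
        x_adv = [xa + step * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Project onto [x - eps, x + eps] and the valid pixel range [0, 1].
        x_adv = [min(max(xa, xo - eps, 0.0), xo + eps, 1.0) for xa, xo in zip(x_adv, x)]
    return x_adv


def expected_margin(w, b, x, y, transforms) -> float:
    """Mean signed margin over the transforms; lower means closer to misclassification."""
    return sum(y * score(w, b, apply_transform(t, x)) for t in transforms) / len(transforms)


# Toy usage: made-up weights and a 4-pixel "image" of true class +1.
w_toy, b_toy = [1.0, -2.0, 0.5, 1.5], -0.2
x_toy, y_toy = [0.6, 0.2, 0.5, 0.7], 1
ts = sample_transforms(32)
x_adv = eot_attack(w_toy, b_toy, x_toy, y_toy, ts)
print(expected_margin(w_toy, b_toy, x_toy, y_toy, ts),
      expected_margin(w_toy, b_toy, x_adv, y_toy, ts))
```

Averaging the gradient over many sampled transforms is what distinguishes EOT from a plain single-input attack; a hardened classifier, as in the paper, is trained so that such averaged perturbations no longer flip its decision.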

    Digital Forensics for Skulls Classification in Physical Anthropology Collection Management

    The size, shape, and physical characteristics of human skulls differ from one individual to another. In physical anthropology, accurate management of skull collections is essential for storing and maintaining them cost-effectively. For example, inaccurate labeling of skulls, or attaching printed labels to them, can affect the authenticity of the collection. Given the problems associated with manual skull identification, we propose an automatic human skull classification approach that uses support vector machines and various feature extraction methods, such as gray-level co-occurrence matrix (GLCM) features, Gabor features, fractal features, the discrete wavelet transform, and combinations of these features. Each underlying facial bone exhibits unique characteristics that are essential to the physical structure of the face and can be exploited for identification. We therefore developed an automatic recognition method that classifies human skulls for more consistent identification than traditional classification approaches. With the proposed approach, we achieved accuracies of 92.3–99.5% in classifying human skulls with mandibles and 91.4–99.9% in classifying human skulls without mandibles. Our study is a step toward building an effective automatic human skull identification system, with a classification process that achieves satisfactory performance on a limited dataset of skull images.
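Of the feature families listed, the GLCM is the simplest to illustrate. The sketch below builds a normalized, symmetric GLCM for one pixel offset from an image already quantized to a few gray levels, then derives three common texture statistics. The offset, the number of gray levels, and the choice of statistics are assumptions, since the abstract does not specify them. Vectors like these would then be passed to the SVM.

```python
from typing import Dict, List, Sequence


def glcm(image: Sequence[Sequence[int]], levels: int,
         dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized symmetric GLCM for the pixel offset (dx, dy).

    `image` must already be quantized to integers in range(levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # count the pair in both directions
                counts[j][i] += 1  # so the matrix is symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_features(p: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Three common Haralick-style statistics of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),  # angular second moment
    }


# Toy 3x3 patch quantized to 3 gray levels.
patch = [[0, 0, 1], [0, 0, 1], [2, 2, 2]]
p = glcm(patch, levels=3)
print(texture_features(p))
```

In practice several offsets and angles are usually combined into one feature vector, and the other families (Gabor, fractal, wavelet) would be concatenated with it.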

    Printed document integrity verification using barcode

    Printed documents are still relevant in daily life, and the information they contain must be protected from threats and attacks such as forgery, falsification, or unauthorized modification. Such threats cause a document to lose its integrity and authenticity. Several techniques have been proposed and used to ensure the authenticity and originality of printed documents, but some are unsuitable for public use because they are complex, expensive, or depend on special materials that are hard to obtain. This paper discusses several techniques for printed document security, such as watermarking and barcodes, as well as the usability of two-dimensional barcodes for document authentication and for data compression within the barcode. A simple and efficient conceptual solution is proposed that secures the integrity of a document and the authenticity of its sender by using a two-dimensional barcode to carry integrity and authenticity information in the document. The information stored in the barcode contains a digital signature, which provides sender authenticity, and a hash value, which ensures the integrity of the printed document.
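The proposed payload (a hash for integrity plus a signature for authenticity) can be sketched with the Python standard library. Because the standard library has no asymmetric signatures, this sketch swaps in HMAC-SHA256 as a stand-in for the digital signature; unlike a real signature, HMAC requires the verifier to hold the same secret key. The JSON/Base64 payload layout, the sender field, and the key are hypothetical, and rendering the string as a 2D barcode or recovering the text from paper (e.g., by OCR) is out of scope.

```python
import base64
import hashlib
import hmac
import json


def make_barcode_payload(document_text: str, key: bytes, sender: str) -> str:
    """Build the string a 2D barcode could carry: sender, document hash, and a tag."""
    digest = hashlib.sha256(document_text.encode("utf-8")).hexdigest()
    # Stand-in for a digital signature: HMAC-SHA256 over sender + hash.
    message = f"{sender}|{digest}".encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    payload = {"sender": sender, "sha256": digest, "tag": tag}
    return base64.b64encode(json.dumps(payload, sort_keys=True).encode("utf-8")).decode("ascii")


def verify_document(document_text: str, payload_b64: str, key: bytes) -> bool:
    """Check integrity (hash matches the text) and authenticity (tag matches)."""
    payload = json.loads(base64.b64decode(payload_b64))
    digest = hashlib.sha256(document_text.encode("utf-8")).hexdigest()
    if not hmac.compare_digest(digest, payload["sha256"]):
        return False  # integrity check failed: the text was modified
    message = f"{payload['sender']}|{payload['sha256']}".encode("utf-8")
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, payload["tag"])  # authenticity check


key = b"hypothetical-shared-secret"
text = "Certificate No. 123: Jane Doe passed the exam."
payload = make_barcode_payload(text, key, sender="Registrar")
print(verify_document(text, payload, key))  # -> True
print(verify_document(text.replace("passed", "failed"), payload, key))  # -> False
```

A deployment matching the paper's description would replace the HMAC with an asymmetric signature (for example Ed25519 from a cryptography library), so anyone holding the sender's public key can verify the document without being able to forge it.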

    A supervised method for finding discriminant variables in the analysis of complex problems: case studies in Android security and source printer attribution

    Advisors: Ricardo Dahab, Anderson de Rezende Rocha. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Solving problems in which many components act and interact simultaneously requires models that traditional analytic methods cannot always handle. Although machine learning algorithms can often predict the outcome with excellent accuracy, interpreting the phenomenon requires understanding which variables matter most and in what proportion they contribute to the result.
This dissertation presents an applied method in which the discriminant variables are identified through an iterative ranking process. In each iteration, a classifier is trained and validated, the variables that contribute least to the result are discarded, and the impact of this reduction on the classification metrics is evaluated. Classification uses the Random Forest algorithm, and its feature importance property drives the discarding decision. The method was validated on two studies of complex systems of different natures, which gave rise to the articles presented here. The first article analyzes the relations between malware and the operating system resources they request within an ecosystem of Android applications. For this study, data from 4,570 applications (2,150 malware, 2,420 benign) were captured and structured according to an ontology defined in the article (OntoPermEco). The complex model produced a graph of about 55,000 nodes and 120,000 edges, which was transformed with the Bag of Graphs technique into a feature vector of 8,950 elements for each application. Using only the data available in the application's manifest, the model reached 88% accuracy and 91% precision in predicting whether an application is malicious, and the proposed method identified 24 features relevant to classifying and identifying malware families, corresponding to only 70 nodes of the entire ecosystem graph. The second article identifies the regions of a printed document that contain information relevant to attributing the laser printer that printed it. Feature vectors were obtained by applying the Convolutional Texture Gradient Filter (CTGF) texture descriptor to 600 DPI scans of 1,200 documents printed on ten printers, and source printer attribution reached an average accuracy of 95.6% and an average precision of 93.9%.
After the source printer was assigned to each document, eight of the ten printers allowed the identification of discriminant variables uniquely associated with each of them, making it possible to visualize the regions of interest for expert analysis in the document image. In both articles, the work achieved the objective of reducing a complex system to an interpretable, streamlined model, demonstrating the effectiveness of the proposed method in two problems from different areas (application security and digital forensics) with complex models and entirely different representation structures.
Degree: Master's (Mestrado) in Computer Science (Ciência da Computação).
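The iterative ranking-by-elimination loop described in this abstract can be sketched generically. In this sketch, the hypothetical `scorer` callable trains and validates a classifier on a feature subset and returns its score together with per-feature importances. The 20% drop fraction and the tolerance used to pick the final subset are assumptions, not values taken from the dissertation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A scorer trains/validates a classifier on the given features and returns
# (validation score, importance of each feature).
Scorer = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def iterative_elimination(features: Sequence[str], scorer: Scorer,
                          drop_fraction: float = 0.2,
                          min_features: int = 1) -> List[Tuple[List[str], float]]:
    """Repeatedly drop the least important features, recording the score at each stage."""
    current = list(features)
    history: List[Tuple[List[str], float]] = []
    while True:
        score, importance = scorer(current)
        history.append((list(current), score))
        if len(current) <= min_features:
            break
        n_drop = max(1, int(len(current) * drop_fraction))
        n_drop = min(n_drop, len(current) - min_features)
        ranked = sorted(current, key=lambda f: importance[f])  # least important first
        dropped = set(ranked[:n_drop])
        current = [f for f in current if f not in dropped]
    return history


def smallest_within(history, tolerance: float = 0.01):
    """Pick the smallest feature set whose score is within `tolerance` of the best."""
    best = max(score for _, score in history)
    candidates = [(feats, s) for feats, s in history if s >= best - tolerance]
    return min(candidates, key=lambda item: len(item[0]))


# Hypothetical toy scorer: only f2, f5 and f7 carry signal.
WEIGHTS = {"f2": 0.5, "f5": 0.1, "f7": 0.3}


def toy_scorer(features):
    score = sum(WEIGHTS.get(f, 0.0) for f in features)
    return score, {f: WEIGHTS.get(f, 0.0) for f in features}


history = iterative_elimination([f"f{i}" for i in range(10)], toy_scorer)
for feats, s in history:
    print(len(feats), round(s, 3))
print(smallest_within(history)[0])
```

With scikit-learn, a real scorer could fit a `RandomForestClassifier` on the selected columns and return a validation metric together with the model's `feature_importances_`; the toy scorer here simply hard-codes which features are informative.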