    Bytewise Approximate Matching: The Good, The Bad, and The Unknown

    Hash functions are established and well known in digital forensics, where they are commonly used to prove integrity and to identify files (i.e., hash all files on a seized device and compare the fingerprints against a reference database). With respect to the latter use, however, an active adversary can easily defeat this approach, because traditional hashes are designed to be sensitive to any alteration of the input: the output changes significantly if even a single bit is flipped. Researchers therefore developed approximate matching, a relatively new and less prominent area conceived as a more robust counterpart to traditional hashing. Since its conception, the community has constructed numerous algorithms, extensions, and additional applications for this technology, and is still working on novel concepts to improve the status quo. In this survey article, we conduct a high-level review of the existing literature from a non-technical perspective and summarize the existing body of knowledge in approximate matching, with a special focus on bytewise algorithms. Our contribution allows researchers and practitioners to gain an overview of the state of the art of approximate matching, so that they may understand the capabilities and challenges of the field. In short, we present the terminology, use cases, classification, requirements, testing methods, algorithms, applications, and a list of primary and secondary literature.
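The single-bit sensitivity contrasted here can be sketched in a few lines of Python using only the standard library; the byte-level score below is a deliberately naive stand-in for a real approximate matching algorithm, not one of the surveyed schemes:

```python
import hashlib

def flip_one_bit(data: bytes, bit_index: int = 0) -> bytes:
    """Return a copy of `data` with a single bit flipped."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

original = b"Evidence file contents: report_2023.pdf ..."
altered = flip_one_bit(original)

# Cryptographic hashes: one flipped bit yields a completely different digest,
# so exact fingerprint lookup fails.
h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(altered).hexdigest()
print(h1 == h2)  # False

# A crude byte-level similarity (fraction of identical bytes) still reports
# the two inputs as near-identical -- the intuition behind approximate matching.
similarity = sum(a == b for a, b in zip(original, altered)) / len(original)
print(f"{similarity:.2%}")
```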

    Effectiveness of Similarity Digest Algorithms for Binary Code Similarity in Memory Forensic Analysis

    Nowadays, any organization connected to the Internet is susceptible to cybersecurity incidents and must therefore have an incident response plan. This plan helps to prevent, detect, prioritize, and manage cybersecurity incidents. One of the steps in managing these incidents is the eradication phase, which neutralizes the persistence of the attacks, assesses their scope, and identifies the degree of compromise. A key point of this phase is identifying, through triage, the information that is relevant to the incident. This is usually done by comparing the available elements against known information, thereby focusing on those elements that are relevant to the investigation (called evidence). This goal can be pursued through two sources of information. On the one hand, by analyzing persistent data, such as data on hard disks or USB devices. On the other hand, by analyzing volatile data, such as the contents of RAM. Unlike the analysis of persistent data, the analysis of volatile data makes it possible to determine the scope of some types of attack that do not store their code on persistent devices, or whose executable files on disk are encrypted and reveal their code only while it is in memory and being executed. Cryptographic hashes, commonly used to identify evidence in persistent data, have a limitation when identifying evidence in memory: the evidence will never be identical, because execution constantly modifies the contents of memory. Moreover, it is impossible to acquire memory more than once with every program at the same point of execution. Hashes are therefore an invalid identification method for memory triage.
    As a solution to this problem, this thesis proposes the use of similarity digest algorithms, which measure the similarity between two inputs in an approximate way. The main contributions of this thesis are threefold. First, a study of the problem domain is carried out, evaluating memory management and the modification of memory during execution. Next, similarity digest algorithms are studied, developing a classification of their phases and of the attacks against these algorithms, and correlating the characteristics of the first classification with the identified attacks. Finally, two methods for preprocessing the contents of memory dumps are proposed to improve the identification of elements of interest for analysis. In conclusion, this thesis shows that the modification of scattered bytes negatively affects similarity computations between memory evidence. This modification is produced mainly by the operating system's memory manager. It is also shown that the proposed techniques for preprocessing the contents of memory dumps improve the process of identifying evidence in memory.
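The conclusion that scattered byte modifications depress similarity scores can be illustrated with a toy chunk-hash digest (an illustrative sketch only; real similarity digest algorithms such as ssdeep or sdhash are far more elaborate, and the "memory manager" scenario below is a simplification):

```python
import hashlib
import random

CHUNK = 64  # bytes per chunk; real similarity digests use smarter chunking

def digest_chunks(data: bytes) -> set:
    """Toy similarity digest: the set of hashes of fixed-size chunks."""
    return {hashlib.sha1(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)}

def score(a: bytes, b: bytes) -> float:
    """Jaccard similarity of the two chunk-hash sets, scaled to 0..100."""
    da, db = digest_chunks(a), digest_chunks(b)
    return 100 * len(da & db) / len(da | db)

random.seed(1)
dump = bytes(random.randrange(256) for _ in range(4096))  # stand-in for a memory region
n_chunks = len(dump) // CHUNK  # 64 chunks

# Scattered modification: one byte flipped in every chunk (roughly what an OS
# memory manager can cause across a region) ruins every chunk hash.
scattered = bytearray(dump)
for i in range(0, len(dump), CHUNK):
    scattered[i] ^= 0xFF
scattered_score = score(dump, bytes(scattered))

# The same number of changed bytes packed into one region leaves most
# chunks intact, so the similarity score stays high.
clustered = bytearray(dump)
for i in range(n_chunks):
    clustered[i] ^= 0xFF
clustered_score = score(dump, bytes(clustered))

print(scattered_score, clustered_score)  # scattered ~0, clustered >90
```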

    Security Analysis of MVhash-B Similarity Hashing

    In the era of big data, the volume of digital data is increasing rapidly, creating new challenges for investigators, who must examine it in a reasonable amount of time. A major requirement of modern forensic investigation is the ability to filter correlated data automatically, thereby reducing and focusing the manual effort of the investigator. Approximate matching is a technique to find “closeness” between two digital artifacts. mvHash-B is a well-known approximate matching scheme used for finding similarity between two digital objects; it produces a ‘score of similarity’ on a scale of 0 to 100. However, no security analysis of mvHash-B is available in the literature. In this work, we perform the first academic security analysis of this algorithm and show that it is possible for an attacker to “fool” it by driving the similarity score close to zero even when the objects are very similar. By similarity of the objects, we mean semantic similarity for text and visual match for images. The designers of mvHash-B claimed that the scheme is secure against ‘active manipulation’; we contest this claim. We propose an algorithm that starts with a given document and produces another one of the same size, without influencing its semantic or visual meaning (for text and image files, respectively), but which has a low similarity score as measured by mvHash-B. In our experiments, we show that the similarity score can be reduced from 100 to less than 6 for text and image documents. We performed experiments with 50 text files and 200 images; the average similarity score between the original file and the file produced by our algorithm was 4 for text files and 6 for image files. In fact, if the original file is small, the similarity score between the two files was almost always close to 0. To improve the security of mvHash-B against active adversaries, we propose a modification to the scheme and show that it prevents the attack described in this work.
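A toy digest over fixed-size windows (explicitly not mvHash-B, whose construction is different) shows how a semantics-preserving one-byte change can collapse a similarity score, which is the general shape of such an attack:

```python
import hashlib

def toy_digest(text: str, window: int = 32) -> set:
    """Toy similarity digest over fixed-size windows (NOT mvHash-B itself)."""
    data = text.encode()
    return {hashlib.md5(data[i:i + window]).digest()
            for i in range(0, len(data), window)}

def toy_score(a: str, b: str) -> int:
    """Jaccard similarity of the window-hash sets, scaled to 0..100."""
    da, db = toy_digest(a), toy_digest(b)
    return round(100 * len(da & db) / len(da | db))

doc = " ".join(f"token{i}" for i in range(100))

# "Attack": double a single space near the start. A human reader sees the
# same text, but the one-byte insertion shifts every later window.
attacked = doc.replace(" ", "  ", 1)

identical = toy_score(doc, doc)         # 100
manipulated = toy_score(doc, attacked)  # collapses toward 0
print(identical, manipulated)
```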

    Forensic Malware Analysis: The Value of Fuzzy Hashing Algorithms in Identifying Similarities

    This research examines the effectiveness and efficiency of fuzzy hashing algorithms in identifying similarities in malware analysis. More precisely, it presents the benefit of using fuzzy hashing algorithms, such as ssdeep, sdhash, mvHash and mrsh-v2, for identifying similarities in the malware domain. The obtained results are compared with the traditional and most common cryptographic hashes, such as MD5, SHA-1 and SHA-256. Furthermore, it highlights the pros and cons of fuzzy and cryptographic hashing, as well as their adoption in real-world applications.

    Professor Frank Breitinger's Full Bibliography


    Um estudo sobre pareamento aproximado para busca por similaridade : técnicas, limitações e melhorias para investigações forenses digitais

    Advisor: Marco Aurélio Amaral Henriques. Thesis (doctoral) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: Digital forensics is a branch of Computer Science aiming to investigate and analyze electronic devices in the search for crime evidence. With the rapid increase in data storage capacity, automated procedures are required to handle the massive volume of data available nowadays, especially in forensic investigations, in which time is a scarce resource. One possible approach to make the process more efficient is the Known File Filter (KFF) technique, where a list of objects of interest is used to reduce/separate data for analysis. Holding a database of hashes of such objects, the examiner performs lookups for matches against the target device under investigation. However, due to limitations of cryptographic hash functions (their inability to detect similar objects), new methods have been designed based on Approximate Matching (AM). These functions are suitable candidates for this process because of their ability to identify similarity (at the byte level) in a very efficient way, by creating and comparing compact representations of objects (known as digests). In this work, we present Approximate Matching functions. We show some of the best-known AM tools and present Similarity Digest Search Strategies (SDSS), capable of performing similarity search (using AM) more efficiently, especially when dealing with large data sets. We perform a detailed analysis of current SDSS approaches and, given that current strategies only work with a few particular AM tools, we propose a new strategy based on a different tool that has good characteristics for forensic investigations. Furthermore, we address some limitations of current AM tools regarding the similarity detection process, where many matches reported as similar are in fact false positives; the tools are usually misled by common blocks (pieces of data common to many different objects). By removing such blocks from AM digests, we obtain significant improvements in the detection of similar data. We also present a detailed theoretical analysis of the detection capabilities of the sdhash AM tool and propose improvements to its comparison function, where the improved version provides a more precise similarity measure (score). Lastly, new applications of AM are presented and analyzed: one for fast file identification based on data samples and another for efficient fingerprint identification. We hope that practitioners in the forensics field and other related areas will benefit from our studies on AM when solving their problems.
    Doutorado - Engenharia de Computação - Doutor em Engenharia Elétrica - 23038.007604/2014-69 - CAPE
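The KFF technique amounts to exact set membership on digests; a minimal sketch (hypothetical file names and database contents) also shows why near-duplicates slip through, which is what motivates approximate matching:

```python
import hashlib

# Hypothetical reference database: SHA-256 digests of known files of interest.
known_digests = {
    hashlib.sha256(b"known malicious payload v1").hexdigest(),
    hashlib.sha256(b"leaked document draft").hexdigest(),
}

def kff_filter(files):
    """Return the names of files whose exact SHA-256 digest is in the database."""
    return [name for name, data in files.items()
            if hashlib.sha256(data).hexdigest() in known_digests]

# Hypothetical seized-device contents.
seized = {
    "a.bin": b"known malicious payload v1",  # exact copy -> matched
    "b.bin": b"known malicious payload v2",  # near-duplicate -> missed by KFF
    "c.txt": b"unrelated content",
}
print(kff_filter(seized))  # ['a.bin'] -- the variant slips through
```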

    Applied Metaheuristic Computing

    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, and facility layout planning, among others. This is partly because classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality that impairs traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications that drive the advances of AMC.
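The idea of a guiding layer steering a low-level heuristic past local optima can be sketched with greedy hill climbing plus a simple multi-start schedule (multi-start stands in here for fuller metaheuristics such as simulated annealing or tabu search; the landscape values are arbitrary):

```python
# A 1-D objective with several local optima (values are arbitrary).
landscape = [0, 2, 4, 3, 1, 5, 8, 6, 2, 9, 12, 7, 3, 6, 15, 11, 4, 2, 1, 0]

def hill_climb(start):
    """Low-level greedy heuristic: move to the better neighbour until stuck."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n < len(landscape)]
        best = max(neighbours, key=landscape.__getitem__)
        if landscape[best] <= landscape[x]:
            return x
        x = best

# The bare heuristic from the left end stalls on the first local peak.
single = landscape[hill_climb(0)]

# A guiding layer (here: a multi-start schedule) reuses the same heuristic
# from several starting points and keeps the best result found.
best = max(landscape[hill_climb(s)] for s in range(0, len(landscape), 3))
print(single, best)  # 4 15
```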

    Actas de las VI Jornadas Nacionales (JNIC2021 LIVE)

    These conferences have become a meeting forum for the most relevant actors in the field of cybersecurity in Spain. They not only present some of the leading scientific work in the various areas of cybersecurity, but also pay special attention to training and educational innovation in cybersecurity, as well as to the connection with industry through technology transfer proposals. So much so that this year the Technology Transfer Programme presents some changes to its operation and development, designed to improve it and make it more valuable for the whole cybersecurity research community.

    Building information modeling – A game changer for interoperability and a chance for digital preservation of architectural data?

    Digital data associated with the architectural design-and-construction process is an essential resource alongside (and even beyond) the lifecycle of the construction object it describes. Despite this, digital architectural data remains largely neglected in digital preservation research, and vice versa: digital preservation has so far been neglected in the design-and-construction process. In the last five years, Building Information Modeling (BIM) has seen growing adoption in the architecture and construction domains, marking a large step towards much-needed interoperability. The open standard IFC (Industry Foundation Classes) is one way in which data is exchanged in BIM processes. This paper presents a first digital-preservation-based look at BIM processes, highlighting the history and adoption of the methods as well as the open file format standard IFC as one way to store and preserve BIM data.