    Bytewise Approximate Matching: The Good, The Bad, and The Unknown

    Hash functions are established and well known in digital forensics, where they are commonly used to prove integrity and to identify files (i.e., hash all files on a seized device and compare the fingerprints against a reference database). With respect to the latter use, however, an active adversary can easily defeat this approach, because traditional hashes are designed to be sensitive to any alteration of the input: the output changes significantly if even a single bit is flipped. Researchers therefore developed approximate matching, a relatively new and less prominent area conceived as a more robust counterpart to traditional hashing. Since its conception, the community has constructed numerous algorithms, extensions, and additional applications for this technology, and is still working on novel concepts to improve the status quo. In this survey article, we conduct a high-level review of the existing literature from a non-technical perspective and summarize the existing body of knowledge in approximate matching, with a special focus on bytewise algorithms. Our contribution allows researchers and practitioners to gain an overview of the state of the art of approximate matching, so that they may understand the capabilities and challenges of the field. In short, we present the terminology, use cases, classification, requirements, testing methods, algorithms, applications, and a list of primary and secondary literature.
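The single-bit sensitivity contrasted here can be sketched in a few lines of Python using only the standard library; the byte-level score below is a deliberately naive stand-in for a real approximate matching algorithm, not one of the surveyed schemes:

```python
import hashlib

def flip_one_bit(data: bytes, bit_index: int = 0) -> bytes:
    """Return a copy of `data` with a single bit flipped."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

original = b"Evidence file contents: report_2023.pdf ..."
altered = flip_one_bit(original)

# Cryptographic hashes: one flipped bit yields a completely different digest,
# so exact fingerprint lookup fails.
h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(altered).hexdigest()
print(h1 == h2)  # False

# A crude byte-level similarity (fraction of identical bytes) still reports
# the two inputs as near-identical -- the intuition behind approximate matching.
similarity = sum(a == b for a, b in zip(original, altered)) / len(original)
print(f"{similarity:.2%}")
```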

    Effectiveness of Similarity Digest Algorithms for Binary Code Similarity in Memory Forensic Analysis

    Nowadays, any organization connected to the Internet is susceptible to cybersecurity incidents and must therefore have an incident response plan. This plan helps to prevent, detect, prioritize, and manage cybersecurity incidents. One of the steps in managing these incidents is the eradication phase, which neutralizes the persistence of the attacks, assesses their scope, and identifies the degree of compromise. A key point of this phase is identifying, through triage, the information that is relevant to the incident. This is usually done by comparing the available elements against known information, thereby focusing on those elements that are relevant to the investigation (called evidence). This goal can be pursued through two sources of information. On the one hand, by analyzing persistent data, such as data on hard disks or USB devices. On the other hand, by analyzing volatile data, such as the contents of RAM. Unlike the analysis of persistent data, the analysis of volatile data makes it possible to determine the scope of some types of attack that do not store their code on persistent devices, or whose executable files on disk are encrypted and reveal their code only while it is in memory and being executed. Cryptographic hashes, commonly used to identify evidence in persistent data, have a limitation when identifying evidence in memory: the evidence will never be identical, because execution constantly modifies the contents of memory. Moreover, it is impossible to acquire memory more than once with every program at the same point of execution. Hashes are therefore an invalid identification method for memory triage.
    As a solution to this problem, this thesis proposes the use of similarity digest algorithms, which measure the similarity between two inputs in an approximate way. The main contributions of this thesis are threefold. First, a study of the problem domain is carried out, evaluating memory management and the modification of memory during execution. Next, similarity digest algorithms are studied, developing a classification of their phases and of the attacks against these algorithms, and correlating the characteristics of the first classification with the identified attacks. Finally, two methods for preprocessing the contents of memory dumps are proposed to improve the identification of elements of interest for analysis. In conclusion, this thesis shows that the modification of scattered bytes negatively affects similarity computations between memory evidence. This modification is produced mainly by the operating system's memory manager. It is also shown that the proposed techniques for preprocessing the contents of memory dumps improve the process of identifying evidence in memory.
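The conclusion that scattered byte modifications depress similarity scores can be illustrated with a toy chunk-hash digest (an illustrative sketch only; real similarity digest algorithms such as ssdeep or sdhash are far more elaborate, and the "memory manager" scenario below is a simplification):

```python
import hashlib
import random

CHUNK = 64  # bytes per chunk; real similarity digests use smarter chunking

def digest_chunks(data: bytes) -> set:
    """Toy similarity digest: the set of hashes of fixed-size chunks."""
    return {hashlib.sha1(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)}

def score(a: bytes, b: bytes) -> float:
    """Jaccard similarity of the two chunk-hash sets, scaled to 0..100."""
    da, db = digest_chunks(a), digest_chunks(b)
    return 100 * len(da & db) / len(da | db)

random.seed(1)
dump = bytes(random.randrange(256) for _ in range(4096))  # stand-in for a memory region
n_chunks = len(dump) // CHUNK  # 64 chunks

# Scattered modification: one byte flipped in every chunk (roughly what an OS
# memory manager can cause across a region) ruins every chunk hash.
scattered = bytearray(dump)
for i in range(0, len(dump), CHUNK):
    scattered[i] ^= 0xFF
scattered_score = score(dump, bytes(scattered))

# The same number of changed bytes packed into one region leaves most
# chunks intact, so the similarity score stays high.
clustered = bytearray(dump)
for i in range(n_chunks):
    clustered[i] ^= 0xFF
clustered_score = score(dump, bytes(clustered))

print(scattered_score, clustered_score)  # scattered ~0, clustered >90
```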

    Security Analysis of MVhash-B Similarity Hashing

    In the era of big data, the volume of digital data is increasing rapidly, creating new challenges for investigators, who must examine it in a reasonable amount of time. A major requirement of modern forensic investigation is the ability to filter correlated data automatically, thereby reducing and focusing the manual effort of the investigator. Approximate matching is a technique to find “closeness” between two digital artifacts. mvHash-B is a well-known approximate matching scheme used for finding similarity between two digital objects; it produces a ‘score of similarity’ on a scale of 0 to 100. However, no security analysis of mvHash-B is available in the literature. In this work, we perform the first academic security analysis of this algorithm and show that it is possible for an attacker to “fool” it by driving the similarity score close to zero even when the objects are very similar. By similarity of the objects, we mean semantic similarity for text and visual match for images. The designers of mvHash-B claimed that the scheme is secure against ‘active manipulation’; we contest this claim. We propose an algorithm that starts with a given document and produces another one of the same size, without influencing its semantic or visual meaning (for text and image files, respectively), but which has a low similarity score as measured by mvHash-B. In our experiments, we show that the similarity score can be reduced from 100 to less than 6 for text and image documents. We performed experiments with 50 text files and 200 images; the average similarity score between the original file and the file produced by our algorithm was 4 for text files and 6 for image files. In fact, if the original file is small, the similarity score between the two files was almost always close to 0. To improve the security of mvHash-B against active adversaries, we propose a modification to the scheme and show that it prevents the attack described in this work.
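A toy digest over fixed-size windows (explicitly not mvHash-B, whose construction is different) shows how a semantics-preserving one-byte change can collapse a similarity score, which is the general shape of such an attack:

```python
import hashlib

def toy_digest(text: str, window: int = 32) -> set:
    """Toy similarity digest over fixed-size windows (NOT mvHash-B itself)."""
    data = text.encode()
    return {hashlib.md5(data[i:i + window]).digest()
            for i in range(0, len(data), window)}

def toy_score(a: str, b: str) -> int:
    """Jaccard similarity of the window-hash sets, scaled to 0..100."""
    da, db = toy_digest(a), toy_digest(b)
    return round(100 * len(da & db) / len(da | db))

doc = " ".join(f"token{i}" for i in range(100))

# "Attack": double a single space near the start. A human reader sees the
# same text, but the one-byte insertion shifts every later window.
attacked = doc.replace(" ", "  ", 1)

identical = toy_score(doc, doc)         # 100
manipulated = toy_score(doc, attacked)  # collapses toward 0
print(identical, manipulated)
```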

    Forensic Malware Analysis: The Value of Fuzzy Hashing Algorithms in Identifying Similarities

    This research examines the effectiveness and efficiency of fuzzy hashing algorithms in identifying similarities in malware analysis. More precisely, it presents the benefit of using fuzzy hashing algorithms, such as ssdeep, sdhash, mvHash and mrsh-v2, for identifying similarities in the malware domain. The obtained results are compared with the traditional and most common cryptographic hashes, such as MD5, SHA-1 and SHA-256. Furthermore, it highlights the pros and cons of fuzzy and cryptographic hashing, as well as their adoption in real-world applications.

    Professor Frank Breitinger's Full Bibliography


    Um estudo sobre pareamento aproximado para busca por similaridade : técnicas, limitações e melhorias para investigações forenses digitais

    Advisor: Marco Aurélio Amaral Henriques. Thesis (doctoral) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: Digital forensics is a branch of Computer Science aiming to investigate and analyze electronic devices in the search for crime evidence. With the rapid increase in data storage capacity, automated procedures are required to handle the massive volume of data available nowadays, especially in forensic investigations, in which time is a scarce resource. One possible approach to make the process more efficient is the Known File Filter (KFF) technique, where a list of objects of interest is used to reduce/separate data for analysis. Holding a database of hashes of such objects, the examiner performs lookups for matches against the target device under investigation. However, due to limitations of cryptographic hash functions (their inability to detect similar objects), new methods have been designed based on Approximate Matching (AM). These functions are suitable candidates for this process because of their ability to identify similarity (at the byte level) in a very efficient way, by creating and comparing compact representations of objects (known as digests). In this work, we present Approximate Matching functions. We show some of the best-known AM tools and present Similarity Digest Search Strategies (SDSS), capable of performing similarity search (using AM) more efficiently, especially when dealing with large data sets. We perform a detailed analysis of current SDSS approaches and, given that current strategies only work with a few particular AM tools, we propose a new strategy based on a different tool that has good characteristics for forensic investigations. Furthermore, we address some limitations of current AM tools regarding the similarity detection process, where many matches reported as similar are in fact false positives; the tools are usually misled by common blocks (pieces of data common to many different objects). By removing such blocks from AM digests, we obtain significant improvements in the detection of similar data. We also present a detailed theoretical analysis of the detection capabilities of the sdhash AM tool and propose improvements to its comparison function, where the improved version provides a more precise similarity measure (score). Lastly, new applications of AM are presented and analyzed: one for fast file identification based on data samples and another for efficient fingerprint identification. We hope that practitioners in the forensics field and other related areas will benefit from our studies on AM when solving their problems.
    Doutorado - Engenharia de Computação - Doutor em Engenharia Elétrica - 23038.007604/2014-69 - CAPE
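The KFF technique amounts to exact set membership on digests; a minimal sketch (hypothetical file names and database contents) also shows why near-duplicates slip through, which is what motivates approximate matching:

```python
import hashlib

# Hypothetical reference database: SHA-256 digests of known files of interest.
known_digests = {
    hashlib.sha256(b"known malicious payload v1").hexdigest(),
    hashlib.sha256(b"leaked document draft").hexdigest(),
}

def kff_filter(files):
    """Return the names of files whose exact SHA-256 digest is in the database."""
    return [name for name, data in files.items()
            if hashlib.sha256(data).hexdigest() in known_digests]

# Hypothetical seized-device contents.
seized = {
    "a.bin": b"known malicious payload v1",  # exact copy -> matched
    "b.bin": b"known malicious payload v2",  # near-duplicate -> missed by KFF
    "c.txt": b"unrelated content",
}
print(kff_filter(seized))  # ['a.bin'] -- the variant slips through
```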

    Applied Metaheuristic Computing

    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, and facility layout planning, among others. This is partly because classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality that impairs traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications that drive the advances of AMC.
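The idea of a guiding layer steering a low-level heuristic past local optima can be sketched with greedy hill climbing plus a simple multi-start schedule (multi-start stands in here for fuller metaheuristics such as simulated annealing or tabu search; the landscape values are arbitrary):

```python
# A 1-D objective with several local optima (values are arbitrary).
landscape = [0, 2, 4, 3, 1, 5, 8, 6, 2, 9, 12, 7, 3, 6, 15, 11, 4, 2, 1, 0]

def hill_climb(start):
    """Low-level greedy heuristic: move to the better neighbour until stuck."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n < len(landscape)]
        best = max(neighbours, key=landscape.__getitem__)
        if landscape[best] <= landscape[x]:
            return x
        x = best

# The bare heuristic from the left end stalls on the first local peak.
single = landscape[hill_climb(0)]

# A guiding layer (here: a multi-start schedule) reuses the same heuristic
# from several starting points and keeps the best result found.
best = max(landscape[hill_climb(s)] for s in range(0, len(landscape), 3))
print(single, best)  # 4 15
```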

    Actas de las VI Jornadas Nacionales (JNIC2021 LIVE)

    These conferences have become a meeting forum for the most relevant actors in the field of cybersecurity in Spain. They not only present some of the leading scientific work in the various areas of cybersecurity, but also pay special attention to training and educational innovation in cybersecurity, as well as to the connection with industry through technology transfer proposals. So much so that this year the Technology Transfer Programme presents some changes to its operation and development, designed to improve it and make it more valuable for the whole cybersecurity research community.

    Building information modeling – A game changer for interoperability and a chance for digital preservation of architectural data?

    Digital data associated with the architectural design-and-construction process is an essential resource alongside (and even beyond) the lifecycle of the construction object it describes. Despite this, digital architectural data remains largely neglected in digital preservation research, and vice versa: digital preservation has so far been neglected in the design-and-construction process. In the last five years, Building Information Modeling (BIM) has seen growing adoption in the architecture and construction domains, marking a large step towards much-needed interoperability. The open standard IFC (Industry Foundation Classes) is one way in which data is exchanged in BIM processes. This paper presents a first digital-preservation-based look at BIM processes, highlighting the history and adoption of the methods as well as the open file format standard IFC as one way to store and preserve BIM data.