4 research outputs found

    FRASHER – A framework for automated evaluation of similarity hashing

    A challenge for digital forensic investigations is dealing with the large amounts of data that need to be processed. Approximate matching (AM), a.k.a. similarity hashing or fuzzy hashing, plays a pivotal role in addressing this challenge. Many algorithms have been proposed over the years, such as ssdeep, sdhash, MRSH-v2, and TLSH, which can be used for similarity assessment, clustering of different artifacts, or finding fragments and embedded objects. To assess the differences between these implementations (e.g., in terms of runtime efficiency, fragment detection, or resistance against obfuscation attacks), a testing framework is indispensable, and such a framework is the core of this article. The proposed framework is called FRASHER (referring to its predecessor FRASH from 2013) and provides an up-to-date view on the problem of evaluating AM algorithms with respect to both conceptual and practical aspects. We present and discuss relevant test case scenarios, and we release and demonstrate our framework, which allows a comprehensive evaluation of AM algorithms. Compared to its predecessor, we adapt it to a modern environment, providing better modularity and usability as well as more thorough test cases.
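    As an illustration of the kind of test case such a framework automates, the sketch below hashes a file and a contiguous fragment of it with ssdeep and reports the similarity score. It assumes the python-ssdeep bindings are installed; the fragment size and file name are illustrative only, not FRASHER's actual test parameters.

    ```python
    # Minimal sketch of a fragment-detection test case, assuming the
    # python-ssdeep bindings (pip install ssdeep). FRASHER itself is not used here.
    import os
    import ssdeep

    def fragment_detection_score(path: str, fraction: float = 0.25) -> int:
        """Hash a file and a random contiguous fragment of it, then return
        the ssdeep similarity score (0-100) between the two digests."""
        data = open(path, "rb").read()
        frag_len = max(1, int(len(data) * fraction))
        start = os.urandom(1)[0] % max(1, len(data) - frag_len)
        fragment = data[start:start + frag_len]
        return ssdeep.compare(ssdeep.hash(data), ssdeep.hash(fragment))

    if __name__ == "__main__":
        # A score of 0 means the fragment could not be linked back to its source.
        print("ssdeep fragment score:", fragment_detection_score("sample.bin"))
    ```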

    ChatGPT for Digital Forensic Investigation: The Good, The Bad, and The Unknown

    The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the impact and potential impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4. A series of experiments is conducted to assess its capability across several digital forensic use cases, including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. Across these topics, its strengths and risks are outlined and a number of general conclusions are drawn. Overall, this paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present, since the evidence would need to be uploaded to the service, or require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, it could act as a useful supporting tool in some circumstances.
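    A minimal sketch of one such experiment (the artefact-understanding use case) is shown below, assuming the official openai Python client and an OPENAI_API_KEY in the environment. The model name, prompt, and registry key are illustrative, and, in line with the paper's caveat, no real evidence should be sent to the service.

    ```python
    # Illustrative artefact-understanding prompt, assuming the openai Python
    # client (pip install openai). The registry path below is a synthetic example;
    # real case evidence should not be uploaded to an external service.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    artefact = r"HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run"

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are assisting a digital forensic examiner."},
            {"role": "user",
             "content": f"Explain the forensic relevance of this registry key: {artefact}"},
        ],
    )
    print(response.choices[0].message.content)
    ```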

    Prioritisation in Digital Forensics: A Case Study of Abu Dhabi Police

    The main goal of this research is to investigate the prioritisation process in digital forensics departments of law enforcement organisations. The research is motivated by the fact that case prioritisation plays a crucial role in achieving efficient operations in digital forensics departments. Recent years have witnessed the widespread use of digital devices in every aspect of human life around the globe, and one of these aspects is crime. These devices have become an essential part of nearly every investigation handled by police, because of their ability to store huge amounts of data that investigators can use to solve the cases under consideration. Thus, involving digital forensics departments, though they are often over-burdened and under-resourced, is becoming compulsory for successful investigations. Increasing the effectiveness of these departments requires improving their processes, including case prioritisation. Existing literature focuses on prioritisation within the context of crime scene triage: the research problem addressed there is prioritising the digital devices found at a crime scene in a way that leads to successful digital forensics. The research problem in this thesis, on the other hand, is the prioritisation of cases rather than of the digital devices belonging to a specific case. Normally, digital forensics cases are prioritised based on several factors, among which the influence of the officers handling the case plays one of the most important roles. Therefore, this research investigates how the perceptions of different individuals in a law enforcement organisation may affect case prioritisation for the digital forensics department. To address this prioritisation problem, the research proposes the use of maturity models and machine learning. A questionnaire was developed and distributed among officers in Abu Dhabi Police to measure their perception of digital forensics. The responses were divided into two sets: the first contains the responses of subjects who are experts in DF, while the second contains those of the remaining subjects. Responses in the first set were averaged to produce a benchmark of the optimal questionnaire answers, and a reliability measure is proposed to summarise each subject's perception relative to that benchmark. The reliability data were then used in machine learning models so that the process can be automated. The results of the data analysis confirm the severity of the problem and show that the proposed prioritisation process can be an effective solution, as demonstrated in this thesis.
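    The abstract does not spell out the reliability measure or the model, so the sketch below makes simple assumptions: the reliability score is the negated mean absolute deviation of a subject's answers from the averaged expert benchmark, and a logistic-regression classifier stands in for the machine-learning step; all data and labels are synthetic.

    ```python
    # Sketch of the questionnaire pipeline described above, under assumed choices
    # for the reliability measure and classifier. All data below is synthetic.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Rows = subjects, columns = questionnaire items (Likert scores 1-5).
    expert_answers = rng.integers(1, 6, size=(10, 20))   # DF experts
    other_answers = rng.integers(1, 6, size=(80, 20))    # remaining officers

    # Benchmark: averaged expert responses per question.
    benchmark = expert_answers.mean(axis=0)

    # Reliability: how closely each subject's answers track the benchmark.
    reliability = -np.abs(other_answers - benchmark).mean(axis=1)

    # Hypothetical label: did the officer's case priorities match the
    # DF department's own ranking (1) or not (0)?
    priority_match = rng.integers(0, 2, size=other_answers.shape[0])

    model = LogisticRegression().fit(reliability.reshape(-1, 1), priority_match)
    print("Match probability for an average officer:",
          model.predict_proba([[reliability.mean()]])[0, 1])
    ```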

    Um estudo sobre pareamento aproximado para busca por similaridade: técnicas, limitações e melhorias para investigações forenses digitais (A study on approximate matching for similarity search: techniques, limitations, and improvements for digital forensic investigations)

    Advisor: Marco Aurélio Amaral Henriques. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Doutorado em Engenharia de Computação; Doutor em Engenharia Elétrica; 23038.007604/2014-69; CAPE.
    Abstract: Digital forensics is a branch of Computer Science aiming at investigating and analyzing electronic devices in the search for crime evidence. With the rapid increase in data storage capacity, automated procedures are required to handle the massive volume of data available nowadays, especially in forensic investigations, in which time is a scarce resource. One possible approach to make the process more efficient is the Known File Filter (KFF) technique, where a list of objects of interest is used to reduce or separate the data for analysis. Holding a database of hashes of such objects, the examiner performs lookups for matches against the target device under investigation. However, due to a limitation of cryptographic hash functions (their inability to detect similar objects), new methods have been designed based on Approximate Matching (AM). AM functions are suitable candidates for this process because of their ability to identify similarity at the bytewise level in a very efficient way, by creating and comparing compact representations of objects (a.k.a. digests). In this work, we present Approximate Matching functions, describe some of the best-known AM tools, and present Similarity Digest Search Strategies (SDSS), which perform similarity search (using AM) more efficiently, especially when dealing with large data sets. We perform a detailed analysis of current SDSS approaches and, given that current strategies only work with a few particular AM tools, we propose a new strategy based on a different tool that has good characteristics for forensic investigations. Furthermore, we address some limitations of current AM tools regarding the similarity detection process, where many matches flagged as similar are in fact false positives; the tools are usually misled by common blocks (pieces of data shared by many different objects). By removing such blocks from AM digests, we obtain significant improvements in the detection of similar data. We also present a detailed theoretical analysis of the detection capabilities of the sdhash AM tool and propose improvements to its comparison function, where our improved version yields a more precise similarity measure (score). Lastly, two new applications of AM are presented and analyzed: one for fast file identification based on data samples, and another for efficient fingerprint identification. We hope that practitioners in the forensics field and other related areas will benefit from our studies on AM when solving their problems.
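    A minimal sketch of the common-block idea referenced above: objects are split into fixed-size chunks, chunks whose hash appears in several distinct files are marked as common, and those chunks are dropped before a similarity digest is built. The chunk size, threshold, and use of MD5 as the chunk identifier are assumptions made for illustration, not the parameters used in the thesis.

    ```python
    # Sketch of common-block filtering: data blocks that recur across many
    # different files are removed before similarity digests are computed, so
    # shared boilerplate no longer inflates match scores. Chunk size, threshold,
    # and MD5 as the block identifier are illustrative assumptions.
    import hashlib
    from collections import defaultdict

    CHUNK = 64        # bytes per block (assumed)
    THRESHOLD = 3     # blocks seen in this many distinct files count as common

    def chunks(data: bytes) -> list[bytes]:
        return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

    def common_blocks(corpus: dict[str, bytes]) -> set[str]:
        seen_in = defaultdict(set)
        for name, data in corpus.items():
            for block in chunks(data):
                seen_in[hashlib.md5(block).hexdigest()].add(name)
        return {h for h, names in seen_in.items() if len(names) >= THRESHOLD}

    def filtered(data: bytes, common: set[str]) -> bytes:
        return b"".join(b for b in chunks(data)
                        if hashlib.md5(b).hexdigest() not in common)

    # The filtered bytes would then be fed to an AM tool such as sdhash or
    # ssdeep to produce and compare digests.
    ```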