8 research outputs found

    EviPlant: An efficient digital forensic challenge creation, manipulation and distribution solution

    Education and training in digital forensics require a variety of suitable challenge corpora containing realistic features, including regular wear-and-tear, background noise, and the actual digital traces to be discovered during an investigation. Typically, creating these challenges demands arduous effort on the part of the educator to ensure their viability. Once created, the challenge image needs to be stored and distributed to a class for practical training. This storage and distribution step requires significant time and resources, and may not even be possible in an online/distance-learning scenario due to the data sizes involved. This paper introduces a more capable methodology and system as an alternative to current approaches. EviPlant is a system designed for the efficient creation, manipulation, storage and distribution of challenges for digital forensics education and training. The system relies on the initial distribution of base disk images, i.e., images containing solely base operating systems. To create challenges for students, educators can boot the base system, emulate the desired activity, and perform a "diffing" of the resultant image against the base image. This diffing process extracts the modified artefacts and associated metadata and stores them in an "evidence package". Evidence packages can be created for different personae, different wear-and-tear, different emulated crimes, etc., and multiple evidence packages can be distributed to students and integrated into the base images. A number of additional applications of EviPlant are also discussed, including digital forensic challenge creation for tool testing and validation, proficiency testing, and malware analysis. Comment: Digital Forensic Research Workshop Europe 201

    Improving trace synthesis by utilizing computer vision for user action emulation

    Forensic analyses are performed by skilled forensic practitioners who require reliable, state-of-the-art tooling and ongoing training. To provide both, education and academia rely on realistic training datasets. Those datasets are crucial to teaching investigators, validating forensic tools, advancing algorithms, and pursuing research. At the same time, the forensic community faces a shortage of realistic datasets, mainly for ethical and legal reasons. To overcome this challenge, prior work introduced several frameworks aiming to create unproblematic replications of real evidence. Those frameworks generate synthetic datasets by populating disk images with traces of emulated user behavior. However, there is general consensus that existing frameworks have drawbacks concerning the quality of generated datasets, particularly due to the incorporation of unrealistic traces in GUI-based environments. Reviewing the implementation details of common frameworks, we found that current solutions fall short of realistic trace synthesis, reducing the quality and usefulness of synthesized datasets. By leveraging computer vision, this paper introduces a novel approach aiming to enhance the quality of synthetic datasets. We propose an architecture and provide an open-source implementation utilizing a hypervisor to create Human Interface Device (HID) input, controlled by computer vision algorithms to imitate human-like user actions. In this way, we provide external GUI automation capabilities that enable more realistic trace synthesis than existing solutions and open up applicability to a wide range of GUI-based operating systems. In contrast to previous research results, our approach is independent of software running in the virtual machines, further improving the quality of generated datasets by omitting automation artifacts. Our experiments indicate that using external GUI automation for user action emulation results in a greater amount and a more widespread distribution of traces. Therefore, our approach may improve the quality of datasets in this field.
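The core idea of vision-driven HID input can be sketched as follows. This is a simplified illustration, not the paper's implementation: the matcher is a naive sum-of-squared-differences template search over a grayscale screenshot, and `hid_click` stands in for whatever interface the hypervisor actually exposes for injecting virtual mouse events.

```python
import numpy as np

def locate_template(screen, template):
    """Exhaustive template match: return the (row, col) offset where the
    sum of squared differences between template and screen is smallest."""
    sh, sw = screen.shape
    th, tw = template.shape
    best_score, best_pos = float("inf"), (0, 0)
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            diff = screen[r:r + th, c:c + tw].astype(float) - template
            score = float((diff * diff).sum())
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

def click_target(screen, template, hid_click):
    """Find a UI element by appearance and 'click' its centre via an
    injected HID callback (the hypervisor's virtual mouse in this setting)."""
    r, c = locate_template(screen, template)
    th, tw = template.shape
    hid_click(c + tw // 2, r + th // 2)  # x = column, y = row
```

Because the match runs on screenshots taken from outside the guest, no automation agent needs to be installed in the virtual machine, which is exactly what keeps automation artifacts out of the generated image.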

    TraceGen: user activity emulation for digital forensic test image generation

    Digital forensic test images are commonly used across a variety of digital forensic use cases including education and training, tool testing and validation, proficiency testing, malware analysis, and research and development. Using real digital evidence for these purposes is often not viable or permissible, especially when factoring in the ethical and, in some cases, legal considerations of working with individuals' personal data. Furthermore, when using real data it is not usually known what actions were performed when, i.e., what the 'ground truth' was. The creation of synthetic digital forensic test images typically involves an arduous, time-consuming process of manually performing a list of actions, or following a 'story', to generate artefacts in a subsequently imaged disk. Besides the manual effort and time needed to execute the relevant actions in the scenario, there is often little room to build a realistic volume of non-pertinent wear-and-tear or 'background noise' on the suspect device, meaning the resulting disk images are inherently limited and, to a certain extent, simplistic. This work presents the TraceGen framework, an automated system focused on the emulation of user actions to create realistic and comprehensive artefacts in an auditable and reproducible manner. The framework consists of a series of actions contained within scripts that are executed both externally and internally to a target virtual machine. These actions use existing automation APIs to emulate a real user's behaviour on a Windows system to generate realistic and comprehensive artefacts. Actions can be quickly scripted together to form complex stories or to emulate wear-and-tear on the test image. In addition to the development of the framework, an evaluation is performed of its ability to produce background artefacts at scale, and of the realism of those artefacts compared with their human-generated counterparts.
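The story-scripting idea can be sketched as a small runner that executes named actions and records an audit log, which is what makes a generated image documentable and reproducible. This is a hypothetical illustration, not TraceGen's actual API; the class and action names are assumptions.

```python
import time

class StoryRunner:
    """Minimal sketch of scripted user-action emulation: each step of a
    'story' names a registered action, and every execution is appended to
    an audit log so the resulting test image can be reproduced later."""

    def __init__(self):
        self.handlers = {}
        self.audit_log = []

    def register(self, name, handler):
        """Register a callable that performs one emulated user action."""
        self.handlers[name] = handler

    def run(self, story):
        """Execute the story (a list of steps) and return the audit log."""
        for step in story:
            args = step.get("args", {})
            self.handlers[step["action"]](**args)
            self.audit_log.append(
                {"action": step["action"], "args": args, "timestamp": time.time()}
            )
        return self.audit_log
```

In this style, a long list of mundane steps (opening a browser, visiting pages, creating files) can be replayed at scale to build up the background noise that manual image creation rarely has time for.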

    ChatGPT for digital forensic investigation: The good, the bad, and the unknown

    The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the current and potential impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4. A series of experiments are conducted to assess its capability across several digital forensic use cases including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. Across these topics, its strengths and risks are outlined and a number of general conclusions are drawn. Overall, this paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present, since the evidence would need to be uploaded to the service, or they require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, it could act as a useful supporting tool in some circumstances.
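One of the lower-risk uses the paper examines, artefact understanding, can be supported without uploading evidence by sending an LLM only analyst-curated, non-sensitive observations. The helper below is a hypothetical illustration of such prompt construction; the function name and wording are assumptions, and no actual LLM call is made.

```python
def build_artefact_prompt(artefact_name, source_path, observations):
    """Compose a prompt asking an LLM to explain a forensic artefact.
    Only analyst-written text is included: as the paper cautions,
    uploading actual case evidence to a hosted service is usually
    not permissible."""
    lines = [
        "You are assisting a digital forensic examiner.",
        f"Artefact: {artefact_name}",
        f"Typical location: {source_path}",
        "Observations:",
    ]
    lines += [f"- {obs}" for obs in observations]
    lines.append(
        "Explain what user or system activity could produce these "
        "observations, and list any assumptions that must be verified."
    )
    return "\n".join(lines)
```

The closing instruction to list assumptions reflects the paper's main caveat: the examiner, not the model, remains responsible for catching incorrect assumptions, inaccuracies, and mistakes in the response.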

    Sharing datasets for digital forensics: A novel taxonomy and legal concerns

    During the last few years, there have been numerous changes concerning datasets for digital forensics, such as the development of data generation frameworks and the newly released CFReDS website by NIST. In addition, it is becoming mandatory (e.g., required by funding agencies) to share datasets and publish them in a manner that allows them to be found and processed. The core of this article is a novel taxonomy that should be used to structure the data commonly used in the domain, complementing existing methods. Based on the taxonomy, we argue that it is not always necessary to release a dataset, e.g., in the case of random data. In addition, we address the legal aspects of sharing data. Lastly, as a minor contribution, we provide a separation of the terms structured, semi-structured, and unstructured data, for which there is currently no consensus in the community.

    The Advanced Framework for Evaluating Remote Agents (AFERA): A Framework for Digital Forensic Practitioners

    Digital forensics experts need a dependable method for evaluating evidence-gathering tools. Limited research and resources challenge this process, and the lack of multi-endpoint data validation hinders reliability in distributed digital forensics. A framework was designed to evaluate distributed agent-based forensic tools while enabling practitioners to self-evaluate and demonstrate evidence reliability as required by the courts. Grounded in Design Science, the framework features guidelines, data, criteria, and checklists. Expert review enhances its quality and practicality.

    Educating the effective digital forensics practitioner: academic, professional, graduate and student perspectives

    Over the years, digital forensics has become an important and sought-after profession, and its gateway of training and education has developed vastly over the past decade. Many UK higher education (HE) institutions now deliver courses that prepare students for careers in digital forensics and, more recently, cyber security. Skills shortages and external influences within the field of cyber security, and its relationship as a discipline with digital forensics, have shifted the dynamic of UK higher education provision. As a result, the route to becoming a digital forensic practitioner, be it in law enforcement or business, has transformed from on-the-job training to university-educated, trained analysts. This thesis examined courses within HE and discovered that their delivery often overlooked areas such as mobile forensics, live data forensics, and Linux and Mac knowledge. This research also considered current standards available across HE to understand whether educational programmes are delivering what is documented as relevant curriculum. Cyber security was found to be the central focus of these standards, with digital forensics merely included within them, adding further to the debate over the lack of a distinctive identity for digital forensics as its own discipline. Few standards demonstrated how the topics, knowledge, skills and competences they draw on were identified as relevant and effective for producing digital forensic practitioners. Additionally, this thesis analyses and discusses results from 201 participants across five stakeholder groups: graduates, professionals, academics, students and the public. These groups were selected because they are underdeveloped in existing literature and because of the crucial role they play in the cycle of producing effective practitioners.
    Analysis of stakeholder views, experiences and thoughts surrounding education and training offers unique insight, theoretical underpinnings and original contributions not seen in existing literature. Examples include the challenges, costs and initial issues employers and supervising practitioners face when introducing graduates to employment, and the lack of awareness and contextualisation on the part of students and graduates regarding how the knowledge and skills acquired on a course apply on the job, which often leads to perceptions of a lack of fundamental knowledge and skills. This is evidenced throughout the thesis: graduates reflect on their education in light of their new on-the-job experiences and practices; professionals on their job experiences and requirements; academics on their educational practices and challenges; students on their initial expectations and views; and the public on their general understanding. This research uniquely captures these perspectives, bolstering the development of digital forensics as an academic discipline, along with the importance these diverse views play in the overall approach to delivering skilled practitioners.
    While the main contribution to knowledge within this thesis is its narrative focusing on the education of effective digital forensic practitioners and its major stakeholders, the thesis also makes additional contributions both academically and professionally, including the discussion, analysis and reflection of:
    - improvements to education and digital forensics topics for research and curriculum development;
    - where course offerings can be improved for institutions offering digital forensic degree programmes;
    - the need for further collaboration between industry and academia to provide students and graduates with a greater understanding of the real-life role of a digital forensic practitioner and the expectations of employment;
    - the continuous and unique challenges digital forensics faces within both academia and industry, and the need for improved facilities and tool development to curate and share problem- and scenario-based learning studies.

    The Forensic Image Generator Generator (Forensig2)
