161 research outputs found

    Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts

    The ever-increasing volume of data in digital forensic investigations is one of the most discussed challenges in the field. Usually, most of the file artefacts on seized devices are not pertinent to the investigation. Manually retrieving suspicious files relevant to the investigation is akin to finding a needle in a haystack. In this paper, a methodology for the automatic prioritisation of suspicious file artefacts (i.e., file artefacts that are pertinent to the investigation) is proposed to reduce the manual analysis effort required. This methodology is designed to work in a human-in-the-loop fashion. In other words, it predicts/recommends that an artefact is likely to be suspicious, rather than giving the final analysis result. A supervised machine learning approach is employed, which leverages the recorded results of previously processed cases. The processes of feature extraction, dataset generation, training, and evaluation are presented in this paper. In addition, a toolkit for data extraction from disk images is outlined, which enables this method to be integrated with the conventional investigation process and work in an automated fashion.
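
    As a hedged illustration of the supervised approach described above, the sketch below trains a classifier on metadata features from previously labelled artefacts and ranks new files by predicted suspicion. The feature set, the RandomForest choice, and all field names are assumptions for the sketch, not the paper's exact pipeline.

        # Minimal sketch, assuming scikit-learn and toy metadata records;
        # features and classifier are illustrative, not the paper's method.
        from sklearn.ensemble import RandomForestClassifier

        def extract_features(a):
            # Numeric features derived from one artefact's metadata.
            return [a["size"],
                    a["mtime"] - a["ctime"],      # modified-vs-created gap
                    a["path"].count("/"),         # directory depth
                    int(a["path"].lower().endswith((".jpg", ".png")))]

        # Toy labelled records standing in for previously processed cases.
        past = [{"path": "/u/docs/a.jpg", "size": 9000, "ctime": 0, "mtime": 50, "label": 1},
                {"path": "/win/sys/b.dll", "size": 400, "ctime": 0, "mtime": 0, "label": 0}]
        model = RandomForestClassifier(n_estimators=50, random_state=0)
        model.fit([extract_features(a) for a in past], [a["label"] for a in past])

        # Human-in-the-loop: rank new artefacts by predicted suspicion so the
        # examiner reviews the top of the list, not the whole haystack.
        new = [{"path": "/u/docs/c.png", "size": 8500, "ctime": 0, "mtime": 40}]
        scores = model.predict_proba([extract_features(a) for a in new])[:, 1]
        for s, a in sorted(zip(scores, new), key=lambda t: -t[0]):
            print(f"{s:.2f}  {a['path']}")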

    An Automated Approach for Digital Forensic Analysis of Heterogeneous Big Data

    The major challenges with big data examination and analysis are volume, complex interdependence across content, and heterogeneity. The examination and analysis phases are considered essential to a digital forensics process. However, traditional techniques for forensic investigation use one or more forensic tools to examine and analyse each resource. In addition, when multiple resources are included in one case, there is an inability to cross-correlate findings, which often leads to inefficiencies in processing and identifying evidence. Furthermore, most current forensic tools cannot cope with large volumes of data. This paper develops a novel framework for digital forensic analysis of heterogeneous big data. The framework focuses on three core issues: data volume, heterogeneous data, and the investigator's cognitive load in understanding the relationships between artefacts. The proposed approach uses metadata to address the data volume problem, semantic web ontologies to address the heterogeneity of data sources, and artificial intelligence models to support the automated identification and correlation of artefacts, thereby reducing the burden placed upon the investigator to understand the nature and relationship of the artefacts.

    Automating the harmonisation of heterogeneous data in digital forensics

    Digital forensics has become an increasingly important tool in the fight against cyber and computer-assisted crime. However, with an increasing range of technologies at people's disposal, investigators find themselves having to process and analyse many systems (e.g. PC, laptop, tablet, smartphone) in a single case. Unfortunately, current tools operate in an isolated manner, investigating systems and applications on an individual basis. The heterogeneity of the evidence places time constraints and additional cognitive loads upon the investigator. Examples of heterogeneity include applications such as messaging (e.g. iMessenger, Viber, Snapchat and WhatsApp), web browsers (e.g. Firefox and Chrome) and file systems (e.g. NTFS, FAT, and HFS). Being able to analyse and investigate evidence from across devices and applications based upon categories would enable investigators to query all data at once. This paper proposes a novel algorithm for merging datasets through a 'characterisation and harmonisation' process. The characterisation process analyses the nature of the metadata, and the harmonisation process merges the data. A series of experiments using real-life forensic datasets is conducted to evaluate the algorithm across five different categories of datasets (i.e. messaging, graphical files, file system, Internet history, and emails), each containing data from different applications across different devices (a total of 22 disparate datasets). The results showed that the algorithm is able to merge all fields successfully, with the exception of some binary-based data found within the messaging datasets (contained within Viber and SMS). The error occurred due to a lack of information for the characterisation process to make a useful determination. However, upon further analysis it was found that the error had a minimal impact on the subsequent merged data.
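
    As a rough sketch of the 'characterisation and harmonisation' idea, assuming simple dict records and invented per-application field names, the code below maps each source's metadata fields onto one shared schema and then merges the rows; it illustrates the concept, not the paper's algorithm.

        # Hypothetical field mappings; the real characterisation step infers
        # these from the nature of the metadata rather than a fixed table.
        FIELD_MAP = {
            "whatsapp": {"msg_text": "body", "ts": "timestamp", "from": "sender"},
            "viber":    {"text": "body", "date": "timestamp", "author": "sender"},
        }

        def characterise(source, record):
            # Relabel each source field with its harmonised category name.
            return {FIELD_MAP[source].get(k, k): v for k, v in record.items()}

        def harmonise(datasets):
            # Merge characterised records from all sources into one dataset.
            merged = []
            for source, records in datasets.items():
                for r in records:
                    row = characterise(source, r)
                    row["source"] = source   # keep provenance for the examiner
                    merged.append(row)
            return merged

        merged = harmonise({
            "whatsapp": [{"msg_text": "hi", "ts": 1700000000, "from": "alice"}],
            "viber":    [{"text": "yo", "date": 1700000300, "author": "bob"}],
        })
        print(merged)   # every row now answers to body / timestamp / sender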

    Single-Snapshot File System Analysis

    Metadata snapshots are a common method for gaining insight into filesystems due to their small size and relative ease of acquisition. Since they are static, most researchers have used them for relatively simple analyses such as file size distributions and age of files. We hypothesize that it is possible to gain much richer insights into file system and user behavior by clustering features in metadata snapshots and comparing the entropy within clusters to the entropy within natural partitions such as directory hierarchies. We discuss several different methods for gaining deeper insights into metadata snapshots, and show a small proof of concept using data from Los Alamos National Laboratories. In our initial work, we see evidence that it is possible to identify user locality information, traditionally the purview of dynamic traces, using a single static snapshot.
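
    A small sketch of the entropy comparison the abstract hypothesises, assuming scikit-learn's KMeans, a toy snapshot, and 'owner' as the attribute of interest (all assumptions): Shannon entropy of the attribute is computed inside learned clusters and inside the natural directory partition.

        import math
        from collections import Counter
        from sklearn.cluster import KMeans

        # Toy metadata snapshot rows: (path, size, owner).
        files = [("/home/a/x.c", 120, "a"), ("/home/a/y.c", 130, "a"),
                 ("/home/b/z.log", 9000, "b"), ("/home/b/w.log", 8800, "a")]

        def entropy(labels):
            # Shannon entropy (bits) of a list of categorical labels.
            n = len(labels)
            return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

        # Natural partition: parent directory.
        by_dir = {}
        for path, size, owner in files:
            by_dir.setdefault(path.rsplit("/", 2)[1], []).append(owner)

        # Learned partition: cluster on a numeric feature (here, file size).
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit([[s] for _, s, _ in files])
        by_cluster = {}
        for (_, _, owner), c in zip(files, km.labels_):
            by_cluster.setdefault(c, []).append(owner)

        for name, parts in [("directory", by_dir), ("cluster", by_cluster)]:
            print(name, [round(entropy(v), 2) for v in parts.values()])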

    Evidence identification in heterogeneous data using clustering

    Digital forensics faces several challenges in examining and analyzing data due to an increasing range of technologies at people's disposal. Investigators find themselves having to process and analyze many systems manually (e.g. PC, laptop, smartphone) in a single case. Unfortunately, current tools such as FTK and EnCase have a limited ability to automate the finding of evidence. As a result, a heavy burden is placed on the investigator to both find and analyze evidential artifacts in a heterogeneous environment. This paper proposes a clustering approach based on the Fuzzy C-Means (FCM) and K-means algorithms to identify the evidential files and isolate the non-related files based on their metadata. A series of experiments using heterogeneous real-life forensic cases is conducted to evaluate the approach. Within each case, various types of metadata categories were created based on file systems and applications. The results showed that clustering based on file systems gave the best results, grouping the evidential artifacts within only five clusters. The proportion of evidence across the five clusters was 100% using small configurations of both FCM and K-means, with less than 16% of the non-evidential artifacts included across all cases, meaning that 84% of the benign files no longer had to be analyzed. In terms of the applications, the proportion of evidence was more than 97%, but the proportion of benign files was also relatively high with small configurations. However, with a large configuration, the proportion of benign files became very low, at less than 10%. Successfully prioritizing large proportions of evidence and reducing the volume of benign files to be analyzed reduces the time taken and the cognitive load upon the investigator.
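
    As a hedged sketch of the evaluation described above (with synthetic feature values, and plain K-means standing in for both algorithms), the code below clusters files on their metadata and measures the share of each cluster that is evidential, so that benign-heavy clusters can be set aside.

        from collections import defaultdict
        from sklearn.cluster import KMeans

        # Synthetic rows: (size_bytes, last_modified_epoch, is_evidential).
        rows = [(500, 100, 0), (520, 110, 0), (9000, 5000, 1),
                (9100, 5100, 1), (9050, 5050, 1), (480, 90, 0)]

        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(
            [[size, mtime] for size, mtime, _ in rows])

        stats = defaultdict(lambda: [0, 0])      # cluster -> [evidential, total]
        for (_, _, ev), c in zip(rows, km.labels_):
            stats[c][0] += ev
            stats[c][1] += 1

        for c, (ev, total) in sorted(stats.items()):
            # An examiner would prioritise the high-percentage clusters.
            print(f"cluster {c}: {ev}/{total} evidential ({100 * ev / total:.0f}%)")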

    Automated Identification of Digital Evidence across Heterogeneous Data Resources

    Digital forensics has become an increasingly important tool in the fight against cyber and computer-assisted crime. However, with an increasing range of technologies at people's disposal, investigators find themselves having to process and analyse many systems with large volumes of data (e.g., PCs, laptops, tablets, and smartphones) within a single case. Unfortunately, current digital forensic tools operate in an isolated manner, investigating systems and applications individually. The heterogeneity and volume of evidence place time constraints and a significant burden on investigators. Examples of heterogeneity include applications such as messaging (e.g., iMessenger, Viber, Snapchat, and WhatsApp), web browsers (e.g., Firefox and Google Chrome), and file systems (e.g., NTFS, FAT, and HFS). Being able to analyse and investigate evidence from across devices and applications in a universal and harmonised fashion would enable investigators to query all data at once. In addition, successfully prioritising evidence and reducing the volume of data to be analysed reduces the time taken and the cognitive load on the investigator. This thesis focuses on the examination and analysis phases of the digital investigation process. It explores the feasibility of dealing with big and heterogeneous data sources in order to correlate the evidence from across these evidential sources in an automated way. Therefore, a novel approach was developed to solve the heterogeneity issues of big data using three algorithms: harmonisation, clustering, and automated identification of evidence (AIE). The harmonisation algorithm seeks to provide an automated framework to merge similar datasets by characterising similar metadata categories and then harmonising them into a single dataset. This algorithm overcomes heterogeneity issues and makes examination and analysis easier by allowing the evidential artefacts across devices and applications to be analysed and investigated by category, so that all data can be queried at once. Based on the merged datasets, the clustering algorithm is used to identify the evidential files and isolate the non-related files based on their metadata. Afterwards, the AIE algorithm tries to identify the cluster holding the largest number of evidential artefacts through searches based on two methods: criminal profiling activities and information from the criminals themselves. Then, the related clusters are identified through timeline analysis and a search for artefacts associated with the files within the first cluster. A series of experiments using real-life forensic datasets was conducted to evaluate the algorithms across five different categories of datasets (i.e., messaging, graphical files, file system, internet history, and emails), each containing data from different applications across different devices. The results of the characterisation and harmonisation process show that the algorithm can merge all fields successfully, with the exception of some binary-based data found within the messaging datasets (contained within Viber and SMS). The error occurred because of a lack of information for the characterisation process to make a useful determination. However, on further analysis, it was found that the error had a minimal impact on the subsequent merged data. The results of the clustering process and the AIE algorithm showed that the two algorithms can work together to identify more than 92% of the evidential files.
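
    As a loose sketch of the AIE algorithm's first step, the snippet below scores each cluster by hits against a hypothetical set of profile keywords and selects the cluster with the most hits as the starting point; the keyword list, record fields, and scoring rule are all assumptions for illustration.

        # Hypothetical keywords drawn from criminal profiling / suspect info.
        profile_keywords = {"invoice", "btc", "passport"}

        clusters = {
            0: [{"path": "/u/a/notes.txt"}, {"path": "/u/a/btc_wallet.dat"}],
            1: [{"path": "/win/sys/drv.sys"}],
        }

        def score(cluster):
            # Count files whose path matches any profile keyword.
            return sum(any(k in f["path"].lower() for k in profile_keywords)
                       for f in cluster)

        best = max(clusters, key=lambda c: score(clusters[c]))
        print("start analysis from cluster", best)
        # Related clusters would then be found via timeline analysis and a
        # search for artefacts associated with the files in this cluster.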

    Automated Artefact Relevancy Determination from Artefact Metadata and Associated Timeline Events

    The 2020 IEEE International Conference on Cyber Security and Protection of Digital Services (Cyber Security 2020), Dublin City University, Ireland (held online due to the coronavirus outbreak, 15-17 June 2020). Case-hindering, multi-year digital forensic evidence backlogs have become commonplace in law enforcement agencies throughout the world. This is due to an ever-growing number of cases requiring digital forensic investigation, coupled with the growing volume of data to be processed per case. Leveraging previously processed digital forensic cases and their component artefact relevancy classifications presents an opportunity for training automated, artificial intelligence-based evidence processing systems. These can significantly aid investigators in the discovery and prioritisation of evidence. This paper presents one approach to file artefact relevancy determination, building on the growing trend towards a centralised, Digital Forensics as a Service (DFaaS) paradigm. This approach enables the use of previously encountered pertinent files to classify newly discovered files in an investigation. Trained models can aid in the detection of these files during the acquisition stage, i.e., during their upload to a DFaaS system. The technique generates a relevancy score for file similarity using each artefact's filesystem metadata and associated timeline events. The approach presented is validated against three experimental usage scenarios.
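
    A minimal sketch of one way such a relevancy score could be computed, assuming each artefact is reduced to a small metadata vector (file size, path depth, count of associated timeline events) and compared against previously classified pertinent files by cosine similarity; the feature set and the use of cosine similarity are assumptions, not the paper's stated method.

        import math

        def vec(a):
            # Illustrative features from filesystem metadata and timeline data.
            return [a["size"], a["depth"], a["timeline_events"]]

        def cosine(u, v):
            dot = sum(x * y for x, y in zip(u, v))
            return dot / (math.hypot(*u) * math.hypot(*v))

        known_pertinent = [{"size": 8200, "depth": 5, "timeline_events": 14},
                           {"size": 1.1e6, "depth": 2, "timeline_events": 3}]
        new_file = {"size": 7900, "depth": 5, "timeline_events": 12}

        # Score a newly uploaded file by its best match among pertinent files,
        # so it can be flagged at acquisition time on a DFaaS system.
        relevancy = max(cosine(vec(new_file), vec(k)) for k in known_pertinent)
        print(f"relevancy score: {relevancy:.3f}")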

    Auditing database systems through forensic analysis

    The majority of sensitive and personal data is stored in a number of different Database Management Systems (DBMSes). For example, Oracle is frequently used to store corporate data, MySQL serves as the back-end storage for many webstores, and SQLite stores personal data such as SMS messages or browser bookmarks. Consequently, the pervasive use of DBMSes has led to an increase in the rate at which they are exploited in cybercrimes. After a cybercrime occurs, investigators need forensic tools and methods to recreate a timeline of events and determine the extent of the security breach. When a breach involves a compromised system, these tools must make few assumptions about the system (e.g., corrupt storage, poorly configured logging, data tampering). Since DBMSes manage storage independently of the operating system, they require their own set of forensic tools. This dissertation presents 1) our database-agnostic forensic methods to examine DBMS contents from any evidence source (e.g., disk images or RAM snapshots) without using a live system and 2) applications of our forensic analysis methods to secure data. The foundation of this analysis is page carving, our novel database forensic method, which we implemented as the tool DBCarver. We demonstrate that DBCarver is capable of reconstructing DBMS contents, including metadata and deleted data, from various types of digital evidence. Since DBMS storage is managed independently of the operating system, DBCarver can also be used for new methods of securely deleting data (i.e., data sanitization). In the event of suspected log tampering or direct modification to DBMS storage, DBCarver can be used to verify log integrity and discover storage inconsistencies.
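
    As a rough illustration of the page-carving idea (not DBCarver itself), the sketch below scans a raw evidence image in page-sized steps and flags pages whose leading bytes match a known DBMS page signature; the page size and signature bytes are invented for the example.

        PAGE_SIZE = 4096
        SIGNATURE = b"\x53\x51\x4c"   # hypothetical page-header marker

        def carve_pages(image_path):
            # Return offsets of candidate DBMS pages, including pages whose
            # parent file has been deleted from the operating system's view.
            hits = []
            with open(image_path, "rb") as img:
                offset = 0
                while page := img.read(PAGE_SIZE):
                    if page[:len(SIGNATURE)] == SIGNATURE:
                        hits.append(offset)
                    offset += PAGE_SIZE
            return hits

        # Usage: hits = carve_pages("evidence.img")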

    Facilitating forensic examinations of multi-user computer environments through session-to-session analysis of internet history

    This paper proposes a new approach to the forensic investigation of Internet history artefacts by aggregating the history from a recovered device into sessions and comparing those sessions to other sessions to determine whether they are one-time events or form a repetitive or habitual pattern. We describe two approaches for performing the session aggregation: fixed-length sessions and variable-length sessions. We also describe an approach for identifying repetitive pattern-of-life behaviour and show how such patterns can be extracted and represented as binary strings. Using the Jaccard similarity coefficient, a session-to-session comparison can be performed, and the sessions can be analysed to determine to what extent a particular session is similar to any other session in the Internet history, and thus is highly likely to correspond to the same user. Experiments have been conducted using two sets of test data where multiple users have access to the same computer. By identifying patterns of Internet usage that are unique to each user, our approach exhibits a high success rate in attributing particular sessions of the Internet history to the correct user. This can provide considerable help to a forensic investigator trying to establish which user was using the computer when a web-related crime was committed.
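
    Since the abstract states that sessions are represented as binary strings and compared with the Jaccard similarity coefficient, a direct sketch follows; the bit layout (one bit per behaviour/site slot) is an assumption for the example.

        def jaccard(a, b):
            # Jaccard coefficient over the set bits of two binary strings.
            on_a = {i for i, c in enumerate(a) if c == "1"}
            on_b = {i for i, c in enumerate(b) if c == "1"}
            return len(on_a & on_b) / len(on_a | on_b) if on_a | on_b else 0.0

        session_x = "1101000010"   # e.g. news + webmail + forum slots set
        session_y = "1100000011"
        print(f"{jaccard(session_x, session_y):.2f}")   # high -> likely same user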

    A Holistic Methodology for Profiling Ransomware Through Endpoint Detection

    Computer security incident response is a critical capability in light of the growing threat of malware infecting endpoint systems today. Ransomware is one type of malware that is causing increasing harm to organizations. Ransomware infects an endpoint system and encrypts files until a ransom is paid. Ransomware can have a negative impact on an organization's daily functions if critical business files are encrypted and are not backed up properly. Many tools exist that claim to detect and respond to malware. Organizations and small businesses are often short-staffed and lack the technical expertise to properly configure security tools. One such endpoint detection tool is Sysmon, which logs critical events to the Windows event log and is free to download on the Internet. The details contained in Sysmon events can be extremely helpful during an incident response. The author of Sysmon states that the Sysmon configuration needs to be iteratively assessed to determine which Sysmon events are most effective. Unfortunately, an organization may not have the time, knowledge, or infrastructure to properly configure and analyze Sysmon events. If configured incorrectly, the organization may have a false sense of security or lack the logs necessary to respond quickly and accurately during a malware incident. This research seeks to answer the question "What methodology can an organization follow to determine which Sysmon events should be analyzed to identify ransomware in a Windows environment?" The answer to this question helps organizations make informed decisions about how to configure Sysmon and analyze Sysmon logs. This study uses design science research methods to create three artifacts: a method, an instantiation, and a tool. The artifacts are used to analyze Sysmon logs against a ransomware dataset consisting of publicly available samples from three ransomware families that were major threats in 2017 according to Symantec. The artifacts are built using software that is free to download on the Internet. Step-by-step instructions, source code, and configuration files are provided so that other researchers can replicate and expand on the results. The end goal provides concrete results that organizations can apply directly to their environment to begin leveraging the benefits of Sysmon and to understand the analytics needed to identify suspicious activity during an incident response.
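
    As a hedged sketch of the kind of Sysmon analytic such a methodology might surface, the code below flags bursts of Sysmon Event ID 11 (FileCreate) events that write files with ransom-style extensions; the events are assumed to be pre-parsed into dicts, and the extensions and thresholds are illustrative, not prescriptive.

        from collections import deque

        SUSPECT_EXT = (".locky", ".wcry", ".encrypted")   # illustrative only
        WINDOW_S, THRESHOLD = 10, 50    # >50 suspicious creates within 10 s

        def detect(events):
            # 'events' are pre-parsed Sysmon records sorted by 'time' (epoch s).
            recent = deque()
            for e in events:
                if e["EventID"] == 11 and \
                        e["TargetFilename"].lower().endswith(SUSPECT_EXT):
                    recent.append(e["time"])
                    while recent and e["time"] - recent[0] > WINDOW_S:
                        recent.popleft()
                    if len(recent) >= THRESHOLD:
                        yield e["time"], e["Image"]   # burst time, writing process

        for t, proc in detect([]):      # feed parsed Sysmon events here
            print(f"possible ransomware activity at {t}: {proc}")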