7 research outputs found

    An Automated Approach for Digital Forensic Analysis of Heterogeneous Big Data

    The major challenges with big data examination and analysis are volume, complex interdependence across content, and heterogeneity. The examination and analysis phases are considered essential to a digital forensics process. However, traditional techniques for forensic investigation use one or more forensic tools to examine and analyse each resource. In addition, when multiple resources are included in one case, there is an inability to cross-correlate findings, which often leads to inefficiencies in processing and identifying evidence. Furthermore, most current forensic tools cannot cope with large volumes of data. This paper develops a novel framework for digital forensic analysis of heterogeneous big data. The framework focuses upon three core issues: data volume, data heterogeneity, and the investigator's cognitive load in understanding the relationships between artefacts. The proposed approach uses metadata to address the data volume problem, semantic web ontologies to harmonise heterogeneous data sources, and artificial intelligence models to support the automated identification and correlation of artefacts, reducing the burden placed upon the investigator to understand their nature and relationships.
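The metadata-based correlation idea described in this abstract can be sketched roughly as follows. This is a minimal illustration only; the record fields, identifiers, and exact-match rule are assumptions, not the paper's actual framework:

```python
from itertools import product

def correlate(artefacts_a, artefacts_b, keys):
    """Pair artefacts from two evidence sources that share values on the
    given metadata keys (a naive exact-match correlation, for illustration)."""
    matches = []
    for a, b in product(artefacts_a, artefacts_b):
        if all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys):
            matches.append((a["id"], b["id"]))
    return matches

# Hypothetical artefact metadata from two devices in the same case.
pc_files = [{"id": "doc1", "owner": "alice", "date": "2020-01-05"}]
phone_msgs = [{"id": "sms9", "owner": "alice", "date": "2020-01-05"},
              {"id": "sms3", "owner": "bob", "date": "2020-01-07"}]
print(correlate(pc_files, phone_msgs, ["owner", "date"]))  # [('doc1', 'sms9')]
```

A real system would match on normalized or fuzzy values (e.g., timestamp windows) rather than exact equality.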

    Effects of the Factory Reset on Mobile Devices

    Mobile devices usually provide a “factory-reset” tool to erase user-specific data from the main secondary storage. In the first systematic evaluation of the effectiveness of factory resets, 9 Apple iPhones, 10 Android devices, and 2 BlackBerry devices were tested. Tests used the Cellebrite UME-36 Pro with the UFED Physical Analyzer, the Bulk Extractor open-source tool, and our own programs for extracting metadata, classifying file paths, and comparing them between images. Two phones were subjected to more detailed analysis. Results showed that many kinds of data were removed by the resets, but much user-specific configuration data was left. Android devices did poorly at removing user documents and media, and surprising user data was occasionally left on all devices, including photo images, audio, documents, phone numbers, email addresses, geolocation data, configuration data, and keys. We conclude that reset devices can still provide some useful information to a forensic investigation.
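The path-comparison step described above can be sketched as a simple set intersection between pre- and post-reset image inventories. This is an assumed simplification, not the authors' tooling; the paths shown are invented:

```python
def residual_paths(before, after):
    """Paths present both before and after a factory reset --
    candidates for residual (unerased) data."""
    return sorted(set(before) & set(after))

# Hypothetical file-path inventories extracted from two images of one device.
before = {"/data/user/photo1.jpg", "/data/user/notes.txt", "/system/app/core.apk"}
after  = {"/data/user/photo1.jpg", "/system/app/core.apk"}

# System files survive a reset by design; surviving user paths are the
# interesting residue for an investigator.
print([p for p in residual_paths(before, after) if p.startswith("/data/user")])
```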

    Multi-Stakeholder Case Prioritization in Digital Investigations

    This work examines the problem of case prioritization in digital investigations for better utilization of limited criminal investigation resources. Current methods of case prioritization, as well as prioritization methods observed in digital forensic investigation laboratories, are examined. A multi-stakeholder approach to case prioritization is then proposed that may help reduce reputational risk to digital forensic laboratories while improving resource allocation. A survey showing differing opinions of investigation priority between law enforcement and the public is presented and used in the development of a prioritization model. Finally, an example case is given to demonstrate the practicality of the proposed method.
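A multi-stakeholder prioritization model of the kind described can be sketched as a weighted combination of per-stakeholder ratings. The stakeholder names, weights, and 0-10 scale below are assumptions for illustration, not the paper's model:

```python
def case_score(ratings, weights):
    """Weighted average of per-stakeholder priority ratings
    (0-10 scale assumed); higher scores are worked first."""
    total_w = sum(weights.values())
    return sum(ratings[s] * w for s, w in weights.items()) / total_w

# Hypothetical stakeholder weights and one case's ratings.
weights = {"law_enforcement": 0.5, "public": 0.3, "lab": 0.2}
ratings = {"law_enforcement": 9, "public": 4, "lab": 6}
print(round(case_score(ratings, weights), 2))  # 6.9
```

Cases in a backlog would then be sorted by this score to allocate examiner time.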

    Evidence identification in heterogeneous data using clustering

    Digital forensics faces several challenges in examining and analyzing data due to the increasing range of technologies at people's disposal. Investigators find themselves having to process and analyze many systems (e.g., PC, laptop, smartphone) manually in a single case. Unfortunately, current tools such as FTK and EnCase have a limited ability to automate the finding of evidence. As a result, a heavy burden is placed on the investigator to both find and analyze evidential artifacts in a heterogeneous environment. This paper proposes a clustering approach based on the Fuzzy C-Means (FCM) and K-means algorithms to identify evidential files and isolate non-related files based on their metadata. A series of experiments using heterogeneous real-life forensic cases is conducted to evaluate the approach. Within each case, various types of metadata categories were created based on file systems and applications. The results showed that clustering based on file systems gave the best results, grouping the evidential artifacts within only five clusters. The proportion of evidence across the five clusters was 100% using small configurations of both FCM and K-means, with less than 16% of non-evidential artifacts across all cases -- representing a reduction in having to analyze 84% of the benign files. In terms of applications, the proportion of evidence was more than 97%, but the proportion of benign files was also relatively high with small configurations. However, with a large configuration, the proportion of benign files became very low (less than 10%). Successfully prioritizing large proportions of evidence and reducing the volume of benign files to be analyzed reduces the time taken and the cognitive load upon the investigator.
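The clustering step can be illustrated with a plain K-means pass over file-metadata feature vectors. This sketch is not the paper's implementation: the features are invented, and FCM would differ only in replacing the hard nearest-centre assignment with soft memberships:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means; FCM (soft memberships) would replace
    the hard argmin assignment step below."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute centres as cluster means.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy metadata vectors: [file_size_kb, days_since_modified] -- illustrative only.
X = np.array([[5.0, 1.0], [6.0, 2.0], [500.0, 300.0], [520.0, 310.0]])
labels = kmeans(X, 2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```

In the paper's setting, clusters dominated by evidential files would be prioritized for examination and the rest deferred.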

    Constructing and classifying email networks from raw forensic images

    Email addresses extracted from secondary storage devices are important to a forensic analyst when conducting an investigation. They can provide insight into the user's social network and help identify other potential persons of interest. However, a large portion of the email addresses from any given device are artifacts of installed software and are of no interest to the analyst. We propose a method for discovering relevant email addresses by creating graphs consisting of extracted email addresses along with their byte-offset locations in storage. We compute certain global attributes of these graphs to construct feature vectors, which we use to classify graphs into useful and not-useful categories. This process filters out the majority of uninteresting email addresses. We show that, using network topological measures on the dataset tested, Naïve Bayes and SVM successfully identified 100% and 95.5%, respectively, of all graphs that contained useful email addresses, both with areas under the curve above 0.97 and F1 scores of 0.80 and 0.90 for Naïve Bayes and SVM, respectively. Our results show that using network science metrics as attributes to classify graphs of email addresses based on the graph's topology could be an effective and efficient tool for automatically delivering evidence to an analyst. Approved for public release; distribution is unlimited.
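The graph-construction and feature-extraction steps can be sketched as below. The byte-offset linking window and the particular global attributes are assumptions; the paper's own construction and feature set may differ:

```python
from itertools import combinations

def graph_features(hits, window=512):
    """Link addresses whose byte offsets fall within `window` of each other,
    then return global graph attributes usable as a classifier feature vector."""
    nodes = {addr for addr, _ in hits}
    edges = set()
    for (a, oa), (b, ob) in combinations(hits, 2):
        if a != b and abs(oa - ob) <= window:
            edges.add(frozenset((a, b)))   # undirected, deduplicated
    n, m = len(nodes), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": m, "density": density}

# Hypothetical (address, byte-offset) hits carved from a storage image.
hits = [("alice@x.org", 100), ("bob@x.org", 300), ("vendor@sw.com", 90000)]
print(graph_features(hits))
```

The resulting feature vectors would then be fed to a classifier such as Naïve Bayes or an SVM to label whole graphs as useful or not.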

    Automated Identification of Digital Evidence across Heterogeneous Data Resources

    Digital forensics has become an increasingly important tool in the fight against cyber and computer-assisted crime. However, with an increasing range of technologies at people’s disposal, investigators find themselves having to process and analyse many systems with large volumes of data (e.g., PCs, laptops, tablets, and smartphones) within a single case. Unfortunately, current digital forensic tools operate in an isolated manner, investigating systems and applications individually. The heterogeneity and volume of evidence place time constraints and a significant burden on investigators. Examples of heterogeneity include applications such as messaging (e.g., iMessenger, Viber, Snapchat, and WhatsApp), web browsers (e.g., Firefox and Google Chrome), and file systems (e.g., NTFS, FAT, and HFS). Being able to analyse and investigate evidence from across devices and applications in a universal and harmonized fashion would enable investigators to query all data at once. In addition, successfully prioritizing evidence and reducing the volume of data to be analysed reduces the time taken and cognitive load on the investigator. This thesis focuses on the examination and analysis phases of the digital investigation process. It explores the feasibility of dealing with big and heterogeneous data sources in order to correlate the evidence from across these evidential sources in an automated way. Therefore, a novel approach was developed to solve the heterogeneity issues of big data using three developed algorithms. The three algorithms include the harmonising, clustering, and automated identification of evidence (AIE) algorithms. The harmonisation algorithm seeks to provide an automated framework to merge similar datasets by characterising similar metadata categories and then harmonising them in a single dataset. 
This algorithm overcomes heterogeneity issues and makes examination and analysis easier by investigating the evidential artefacts across devices and applications based on the harmonised categories, allowing all data to be queried at once. Based on the merged datasets, the clustering algorithm is used to identify the evidential files and isolate the non-related files based on their metadata. Afterwards, the AIE algorithm tries to identify the cluster holding the largest number of evidential artefacts by searching based on two methods: criminal profiling activities and information from the criminals themselves. The related clusters are then identified through timeline analysis and a search for associated artefacts of the files within the first cluster. A series of experiments using real-life forensic datasets was conducted to evaluate the algorithms across five different categories of datasets (i.e., messaging, graphical files, file system, internet history, and emails), each containing data from different applications across different devices. The results of the characterisation and harmonisation process show that the algorithm can merge all fields successfully, with the exception of some binary-based data found within the messaging datasets (contained within Viber and SMS). The error occurred because of a lack of information for the characterisation process to make a useful determination. However, on further analysis, it was found that the error had a minimal impact on the subsequent merged data. The results of the clustering process and AIE algorithm showed the two algorithms can collaborate and identify more than 92% of evidential files.
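The harmonisation idea of merging source-specific metadata fields into shared categories can be sketched as an alias-mapping pass. The canonical category names and field aliases below are invented for illustration; the thesis's characterisation step infers such mappings rather than hard-coding them:

```python
# Assumed canonical categories and per-source field aliases -- illustrative only.
ALIASES = {"timestamp": {"date", "time_sent", "mtime"},
           "sender":    {"from", "author", "owner"}}

def harmonise(record):
    """Rename a source-specific record's fields into the shared metadata schema."""
    out = {}
    for field, value in record.items():
        canonical = next((c for c, names in ALIASES.items() if field in names), field)
        out[canonical] = value
    return out

# Records from two heterogeneous sources merged into one queryable dataset.
merged = [harmonise(r) for r in (
    {"time_sent": "2020-01-05T10:00", "from": "alice"},   # messaging app
    {"mtime": "2020-01-05T10:02", "owner": "alice"},      # file system
)]
print(merged)
```

Once every source shares the same keys, a single query (e.g., all artefacts by "alice" within a time window) spans all devices at once.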