2,428 research outputs found

    Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

    Malware analysis and detection techniques have evolved over the last decade in response to the development of malware techniques that evade network-based and host-based security protections. The fast growth in the variety and number of malware species has made it very difficult for forensic investigators to provide a timely response. Machine Learning (ML) aided malware analysis has therefore become a necessity for automating different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threat Intelligence (CTI), in place of resource-consuming dynamic malware analysis, which has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of machine learning methods for classifying static characteristics of 32-bit malicious Portable Executable (PE32) Windows files, and we develop a taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be used to extract and analyse a variety of static characteristics of PE binaries, and we evaluate the accuracy and practical generalization of these techniques. Finally, we present the results of an experimental study of all the methods on common data to demonstrate their accuracy and complexity. This paper may serve as a stepping stone for future researchers in the cross-disciplinary field of machine learning aided malware forensics. Comment: 37 pages

    Quantifying the need for supervised machine learning in conducting live forensic analysis of emergent configurations (ECO) in IoT environments

    © 2020 The Author(s). Machine learning has been shown to be a promising approach for mining large datasets, such as those comprising data from a broad range of Internet of Things devices, across complex environment(s) to solve different problems. This paper surveys existing literature on the potential of using supervised classical machine learning techniques, such as K-Nearest Neighbour, Support Vector Machines, Naive Bayes and Random Forest algorithms, to perform live digital forensics for different IoT configurations. The paper also discusses a number of challenges associated with the use of machine learning techniques.

    A Scaling Robust Copy-Paste Tampering Detection for Digital Image Forensics

    It is crucial in image forensics to prove the authenticity of digital images. Because sophisticated image editing software is widely available, anyone can manipulate images easily. Various types of digital image manipulation or tampering are possible, such as image compositing, splicing and copy-paste. In this paper, we propose a passive, scaling-robust algorithm for the detection of copy-paste tampering. Sometimes the copied region of an image is scaled before being pasted to another location in the image. In such cases, a normal copy-paste detection algorithm fails to detect the forgery. We have implemented and used an improved, customized Normalized Cross Correlation to detect highly correlated areas between the image and the image blocks, thereby detecting the tampered regions of an image. The experimental results demonstrate that the proposed approach can be used effectively to detect copy-paste forgeries accurately and is robust to scaling.
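
    The abstract above relies on Normalized Cross Correlation (NCC) to find highly correlated image blocks. As a rough illustration only, the sketch below computes the textbook zero-mean NCC between two equal-sized grayscale blocks given as nested lists. It is not the paper's "improved customized" variant and does not handle geometric scaling; the function name `ncc` and the choice to return 0.0 for flat blocks are assumptions made for this example.

```python
import math


def ncc(block_a, block_b):
    """Zero-mean normalized cross-correlation of two equal-sized 2-D blocks."""
    # Flatten the 2-D blocks (lists of rows) into 1-D lists of pixel values.
    a = [v for row in block_a for v in row]
    b = [v for row in block_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("blocks must be non-empty and the same size")

    # Subtract each block's mean so that brightness offsets do not matter.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]

    # Normalize by the product of the blocks' deviations from their means.
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # assumption: a flat block has undefined NCC, treated as 0
    return sum(x * y for x, y in zip(da, db)) / denom
```

    A value close to 1 indicates that two blocks have nearly the same intensity pattern, which is the kind of signal a copy-paste detector would threshold on.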

    Data Mining Methods Applied to a Digital Forensics Task for Supervised Machine Learning

    Digital forensics research includes several stages. Once the data has been collected, the final goal is to obtain a model that predicts the output for unseen data. We focus on supervised machine learning techniques. This chapter performs an experimental study on a forensics data task for multi-class classification, covering several types of methods: decision trees, Bayesian classifiers, rule-based methods, artificial neural networks and nearest-neighbour methods. The classifiers have been evaluated with two performance measures: accuracy and Cohen’s kappa. The experimental design is a 4-fold cross validation with thirty repetitions for non-deterministic algorithms in order to obtain reliable results, averaging the results from 120 runs. A statistical analysis has been conducted to compare each pair of algorithms by means of t-tests using both the accuracy and Cohen’s kappa metrics.
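
    The evaluation protocol in the abstract above (accuracy and Cohen's kappa, 4-fold cross validation repeated thirty times for 120 runs) can be sketched as follows. This is a minimal standard-library illustration, not the chapter's code: the function names, the shuffling strategy, the seed, and the convention of returning 0.0 when kappa is undefined are all assumptions.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Expected agreement if predictions were independent of the labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # assumption: kappa is undefined here, reported as 0
    return (p_o - p_e) / (1.0 - p_e)


def repeated_kfold_splits(n_samples, k=4, repeats=30, seed=0):
    """Yield (train, test) index lists: k folds for each of `repeats` shuffles."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]  # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test
```

    With the default `k=4` and `repeats=30`, the generator yields 120 train/test splits, matching the number of runs averaged in the study.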

    MemTri: A Memory Forensics Triage Tool using Bayesian Network and Volatility

    This work explores the development of MemTri, a memory forensics triage tool that can assess the likelihood of criminal activity in a memory image, based on evidence data artefacts generated by several applications. Fictitious illegal suspect activity scenarios were performed on virtual machines to generate 60 test memory images for input into MemTri. Four categories of applications (i.e. Internet Browsers, Instant Messengers, FTP Client and Document Processors) are examined for data artefacts located through the use of regular expressions. These identified data artefacts are then analysed using a Bayesian Network to assess the likelihood that a seized memory image contains evidence of illegal firearms trading activity. MemTri's normal mode of operation achieved a high artefact identification accuracy of 95.7% when the applications' processes were running. However, this fell significantly to 60% once the applications' processes were terminated. To explore improving MemTri's accuracy, a second mode was developed, which achieved more stable results of around 80% accuracy, even after the applications' processes were terminated.
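
    To give a feel for how artefact evidence can update the likelihood of a hypothesis, the sketch below applies Bayes' rule to the simplest possible Bayesian network: one binary hypothesis node with conditionally independent artefact nodes. The abstract does not give MemTri's actual network structure or probabilities, so the function name `posterior`, the artefact names and every number used with it are hypothetical.

```python
def posterior(prior, likelihoods, observed):
    """P(hypothesis | evidence) for a binary hypothesis.

    likelihoods: artefact name -> (P(found | H), P(found | not H))
    observed:    artefact name -> True if found, False if searched and absent
    Artefacts are assumed conditionally independent given the hypothesis.
    """
    p_h = prior
    p_not_h = 1.0 - prior
    for name, found in observed.items():
        p_given_h, p_given_not_h = likelihoods[name]
        if found:
            p_h *= p_given_h
            p_not_h *= p_given_not_h
        else:
            p_h *= 1.0 - p_given_h
            p_not_h *= 1.0 - p_given_not_h
    total = p_h + p_not_h
    if total == 0:
        raise ValueError("evidence is impossible under both hypotheses")
    return p_h / total


# Hypothetical numbers, for illustration only.
example_likelihoods = {"browser": (0.8, 0.2), "ftp": (0.6, 0.3)}
example_posterior = posterior(0.5, example_likelihoods, {"browser": True})
```

    Each additional artefact that is more probable under the hypothesis than under its negation raises the posterior; an artefact that is searched for and not found lowers it.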

    Automatic Labelling and Document Clustering for Forensic Analysis

    In computer forensic analysis, retrieved data is in unstructured text, which is difficult for computer examiners to analyse. In the proposed approach, the forensic analysis is done systematically: the retrieved unstructured data is given structure by using high-quality, well-known algorithms and an automatic cluster labelling method. Indexing is performed on txt, doc and pdf files, and the number of clusters is estimated automatically, with labels assigned to the clusters automatically. The proposed approach uses the DBSCAN algorithm and the K-means algorithm, which make it easy to retrieve the most relevant information for forensic analysis; automated methods of analysis are also of great interest. In particular, algorithms for clustering documents can facilitate the discovery of new and useful knowledge from the documents under analysis. Two methods are used for document clustering for forensic analysis. The first method uses a χ² test of significance to detect different word usage across categories in the hierarchy, which is well suited for testing dependencies when count data is available. The second method selects words that both occur frequently in a cluster and effectively discriminate the given cluster from the other clusters. Finally, we also present and discuss several practical results that can be useful for researchers of forensic analysis.
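
    The first labelling method in the abstract above tests whether a word's usage differs across categories using a chi-squared test on count data. A minimal sketch, assuming the comparison is simplified to one cluster versus all other documents (a 2x2 contingency table), is shown below; the function name and argument layout are illustrative and not taken from the paper.

```python
def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 table of document counts.

    a: documents in the cluster that contain the word
    b: documents in the cluster that do not contain the word
    c: documents outside the cluster that contain the word
    d: documents outside the cluster that do not contain the word
    """
    n = a + b + c + d
    # Product of the row totals and column totals of the table.
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # assumption: an empty row or column gives no evidence
    return n * (a * d - b * c) ** 2 / denom
```

    A larger statistic suggests the word's occurrence depends more strongly on cluster membership, which makes it a better candidate label for that cluster.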

    Hybrid feature selection technique for intrusion detection system

    The problems of high dimensionality have made feature selection one of the most important criteria in determining the efficiency of intrusion detection systems. In this study we have selected a hybrid feature selection model that potentially combines the strengths of both the filter and the wrapper selection procedures. The hybrid solution is expected to effectively select the optimal set of features for detecting intrusion. The proposed hybrid model was carried out using correlation feature selection (CFS) together with three different search techniques: best-first, greedy stepwise and genetic algorithm. The wrapper-based subset evaluation uses a random forest (RF) classifier to evaluate each of the features that were first selected by the filter method. The reduced feature sets for both the KDD99 and DARPA 1999 datasets were tested using the RF algorithm with ten-fold cross-validation in a supervised environment. The experimental results show that the hybrid feature selection produced a satisfactory outcome.

    Using Visual Capabilities to Improve Efficiency in Computer Forensic Analysis

    Computer forensics is the preservation, analysis, and interpretation of computer data. Computer forensics is dependent on the availability of software tools and applications. Such tools are critical components in law enforcement investigations. Due to the diversity of cyber crime and cyber assisted crime, advanced software tools are essential apparatus for typical law enforcement investigators, national security analysts, corporate emergency response teams, civil lawyers, risk management personnel, etc. Typical tools available to investigators are text-based, which are sorely inadequate given the volume of data needing analysis in today’s environment. Many modern tools essentially provide simple GUIs to simplify access to typical text-based commands, but the capabilities are essentially the same. For simplicity we continue to refer to these as text-based and command-based, in contrast to the visualization tools and associated direct manipulation interfaces we are attempting to develop. Reading such large volumes of textual information is extremely time-consuming in contrast with interpreting images, through which the user can take in large amounts of information simultaneously. Forensic analysts have a growing need for new capabilities to aid in locating files holding evidence of criminal activity. Such capabilities must improve both the efficiency of the analysis process and the identification of additional hidden files. This paper discusses visualization research that represents file characteristics more perceptually and intuitively. Additionally, we integrate interaction capabilities for more complete exploration, significantly improving analysis efficiency. Finally, we discuss the results of an applied user study designed specifically to measure the efficacy of the developed visualization capabilities in the analysis of computer forensic related data.