
    Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

    Malware analysis and detection techniques have evolved over the last decade in response to the development of malware techniques that evade network-based and host-based security protections. The rapid growth in the variety and number of malware species has made it very difficult for forensic investigators to provide a timely response. Machine Learning (ML) aided malware analysis has therefore become necessary to automate different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threat Intelligence (CTI), rather than resource-consuming dynamic malware analysis, which has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classifying static characteristics of 32-bit malicious Portable Executable (PE32) Windows files and develop a taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be used to extract and analyse a variety of static characteristics of PE binaries, and we evaluate the accuracy and practical generalization of these techniques. Finally, the results of an experimental study of all the methods on common data are given to demonstrate their accuracy and complexity. This paper may serve as a stepping stone for future researchers in the cross-disciplinary field of machine learning aided malware forensics. Comment: 37 pages

    Methodology and automated metadata extraction from multiple volume shadow copies

    Modern day digital forensics investigations rely on timelines as a principal method for normalizing and chronologically categorizing artifacts recovered from computer systems. Timelines provide investigators with a chronological representation of digital evidence so they can depict altered and unaltered digital forensics data in-context to drive conclusions about system events and/or user activities. While investigators rely on many system artifacts such as file system time/date stamps, operating system artifacts, program artifacts, logs, and/or registry artifacts as input for deriving chronological representations, using only the available or most recent version of the artifacts may provide a limited picture of historical changes on a system. For instance, if previous versions of artifacts and/or previous artifact metadata changes are overwritten and/or are not retained on a system, analysis of current versions of artifacts and artifact metadata, such as time/date stamps and operating system/program/registry artifacts, may provide only a limited picture of activities for the system. Recently, the Microsoft Windows Operating System implemented a backup mechanism that is capable of retaining multiple versions of data storage units for a system, effectively providing a highly-detailed record of system changes. This backup mechanism, the Windows Volume Shadow Copy Service (VSS), exists as a service of modern Microsoft Windows Operating Systems and allows data backups to be performed while applications on a system continue to write to the system's live volume(s). This allows a running system to preserve the system's state to backup media at any given point while the system continues to change in real-time.
After multiple VSS backups are recorded, digital investigators now have the ability to incorporate multiple versions of a system's artifacts into a chronological representation, which provides a more comprehensive picture of the system's historical changes. In order to effectively incorporate VSS backup, or Volume Shadow Copy (VSC), data into a chronological representation, the data must be accessed and extracted in a consistent, repeatable, and, if possible, automated manner. Previous efforts have produced a variety of manual and semi-automated methods for accessing and extracting VSC data in a repeatable manner. These methods are time consuming and often require significant storage resources if dealing with multiple VSCs. The product of this research effort is the advancement of the methodology to automate accessing and extracting directory-tree and file attribute metadata from multiple VSCs of the Windows 7 Operating System. The approach extracts metadata from multiple VSCs and combines it as one conglomerate data set. By capturing the historical changes recorded within VSC metadata, this approach enhances timeline generation. Additionally, it supports other projects which could use the metadata to visualize change-over-time by depicting how the individual metadata and the conglomerate data set changed (or remained unchanged) throughout an arbitrary snapshot of time.
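
The idea of combining metadata from several VSCs into one conglomerate data set can be illustrated with a minimal sketch. It is not the authors' tool: the snapshot labels, the `(size, mtime)` attribute tuple, and the function names `combine_snapshots` and `changed_paths` are all hypothetical, and the metadata is assumed to have already been extracted into plain dictionaries.

```python
from collections import defaultdict


def combine_snapshots(snapshots):
    """Merge per-snapshot file metadata into one history keyed by path.

    `snapshots` maps a snapshot label to {path: (size, mtime)}.
    This layout is an assumption made for illustration only.
    """
    history = defaultdict(list)
    # Visit snapshots in label order so each path's versions are chronological
    # as long as the labels sort chronologically.
    for label in sorted(snapshots):
        for path, attrs in snapshots[label].items():
            history[path].append((label, attrs))
    return dict(history)


def changed_paths(history):
    """Return the sorted paths whose attributes differ between snapshots."""
    return sorted(
        path
        for path, versions in history.items()
        if len({attrs for _, attrs in versions}) > 1
    )


if __name__ == "__main__":
    # Hypothetical metadata from two shadow copies.
    snaps = {
        "vsc1": {"a.txt": (10, 100), "b.txt": (5, 50)},
        "vsc2": {"a.txt": (12, 200), "b.txt": (5, 50), "c.txt": (1, 300)},
    }
    merged = combine_snapshots(snaps)
    print(changed_paths(merged))
```

Keeping every version of a path side by side, rather than only the latest one, is what lets a timeline show when an attribute changed and when it stayed the same.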

    A framework for the forensic investigation of unstructured email relationship data

    Our continued reliance on email communications ensures that it remains a major source of evidence during a digital investigation. Emails comprise both structured and unstructured data. Structured data provides qualitative information to the forensics examiner and is typically viewed through existing tools. Unstructured data is more complex as it comprises information associated with social networks, such as relationships within the network, identification of key actors and power relations, and there are currently no standardised tools for its forensic analysis. Moreover, email investigations may involve many hundreds of actors and thousands of messages. This paper posits a framework for the forensic investigation of email data. In particular, it focuses on the triage and analysis of unstructured data to identify key actors and relationships within an email network. This paper demonstrates the applicability of the approach by applying relevant stages of the framework to the Enron email corpus. The paper illustrates the advantage of triaging this data to identify (and discount) actors and potential sources of further evidence. It then applies social network analysis techniques to key actors within the data set. This paper posits that visualisation of unstructured data can greatly aid the examiner in their analysis of evidence discovered during an investigation
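
As a rough illustration of how key actors might be identified in an email network, the sketch below ranks addresses by degree centrality. The abstract does not say which social network analysis measures the paper uses, so degree centrality is an assumption chosen for simplicity, and the names `degree_centrality` and `key_actors` and the `(sender, recipient)` input format are hypothetical.

```python
def degree_centrality(messages):
    """Compute degree centrality from (sender, recipient) pairs.

    The network is treated as undirected: two actors are linked if at
    least one message passed between them in either direction.
    """
    neighbours = {}
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore self-addressed mail
        neighbours.setdefault(sender, set()).add(recipient)
        neighbours.setdefault(recipient, set()).add(sender)
    n = len(neighbours)
    if n < 2:
        return {actor: 0.0 for actor in neighbours}
    # Normalise by the maximum possible number of contacts (n - 1).
    return {actor: len(peers) / (n - 1) for actor, peers in neighbours.items()}


def key_actors(messages, top=3):
    """Return the `top` actors by centrality, ties broken alphabetically."""
    scores = degree_centrality(messages)
    return sorted(scores, key=lambda actor: (-scores[actor], actor))[:top]


if __name__ == "__main__":
    # Hypothetical message log.
    log = [
        ("alice", "bob"),
        ("alice", "carol"),
        ("alice", "dave"),
        ("bob", "carol"),
    ]
    print(key_actors(log, top=2))
```

A ranking like this could support triage by pointing the examiner at the most connected actors first, while low-scoring actors can be reviewed later or discounted.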

    On the Identification of Information Extracted from Windows Physical Memory

    Forensic investigation of the physical memory of computer systems is gaining the attention of experts in the digital forensics community. Forensic investigators find it helpful to seize and capture data from the physical memory and perform post-incident analysis when identifying potential evidence. However, few investigations have identified the quantity and quality of information that can be recovered from the computer system memory (RAM) alone while the application is still running. In this paper, we present the results of investigations carried out to identify relevant information extracted from the physical memory of computer systems running Windows XP. We found fragments of partial evidence in allocated memory segments. This evidence was dispersed across the physical memory that had been allocated to the application. Identifying this information is useful to forensic investigators, as the approach can uncover what a user is doing in the application, which can be used as evidence

    Extraction of User Activity through Comparison of Windows Restore Points

    The extraction of past user activity is one of the main goals in the analysis of digital evidence. In this paper we present a methodology for extracting this activity by comparing multiple Restore Points found in the Windows XP operating system. We concentrate on comparing the copies of the registry hives found within these points. The registry copies represent a snapshot in time of the state of the system. Differences between them can reveal user activity from one instant to another. This approach is implemented and presented as a tool that is able to compare any set of offline hive files and present the results to the user. Investigative techniques are presented to use the software as efficiently as possible. The techniques range from general analysis, in which areas of high user activity are pinpointed, to specific techniques, where user activity relating to specific files and file types is found
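
The comparison of registry snapshots described above can be sketched in a few lines. This is not the tool from the paper: parsing real hive files is out of scope here, so each snapshot is assumed to be already loaded into a dictionary mapping a key path to its values, and `diff_hives` is a hypothetical name.

```python
def diff_hives(older, newer):
    """Compare two registry snapshots given as {key_path: {value_name: data}}.

    Returns the key paths that were added, removed, or modified between
    the older and the newer snapshot.
    """
    old_keys, new_keys = set(older), set(newer)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "modified": sorted(
            key for key in old_keys & new_keys if older[key] != newer[key]
        ),
    }


if __name__ == "__main__":
    # Hypothetical contents of the same hive at two restore points.
    rp1 = {
        r"Software\App": {"Version": "1.0"},
        r"Software\Old": {"Flag": 1},
    }
    rp2 = {
        r"Software\App": {"Version": "1.1"},
        r"Software\New": {"Path": r"C:\docs"},
    }
    print(diff_hives(rp1, rp2))
```

Running the same comparison over each consecutive pair of restore points would give a per-interval list of changes, which is one way to see which areas of the registry were most active between two instants.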