58 research outputs found

    Data Fingerprinting with Similarity Digests

    Full text link

    An Empirical Comparison of Widely Adopted Hash Functions in Digital Forensics: Does the Programming Language and Operating System Make a Difference?

    Get PDF
    Hash functions are widespread in computer sciences and have a wide range of applications such as ensuring integrity in cryptographic protocols, structuring database entries (hash tables) or identifying known files in forensic investigations. Besides their cryptographic requirements, a fundamental property of hash functions is efficient and easy computation which is especially important in digital forensics due to the large amount of data that needs to be processed when working on cases. In this paper, we correlate the runtime efficiency of common hashing algorithms (MD5, SHA-family) and their implementation. Our empirical comparison focuses on C-OpenSSL, Python, Ruby, Java on Windows and Linux and C♯ and WinCrypto API on Windows. The purpose of this paper is to recommend appropriate programming languages and libraries for coding tools that include intensive hashing processes. In each programming language, we compute the MD5, SHA-1, SHA-256 and SHA-512 digest on datasets from 2 MB to 1 GB. For each language, algorithm and data, we perform multiple runs and compute the average elapsed time. In our experiment, we observed that OpenSSL and languages utilizing OpenSSL (Python and Ruby) perform better across all the hashing algorithms and data sizes on Windows and Linux. However, on Windows, performance of Java (Oracle JDK) and C WinCrypto is comparable to OpenSSL and better for SHA-512. Keywords: Digital forensics, hashing, micro benchmarking, security, tool buildin

    An Empirical Comparison of Widely Adopted Hash Functions in Digital Forensics: Does the Programming Language and Operating System Make a Difference?

    Get PDF
    Hash functions are widespread in computer sciences and have a wide range of applications such as ensuring integrity in cryptographic protocols, structuring database entries (hash tables) or identifying known files in forensic investigations. Besides their cryptographic requirements, a fundamental property of hash functions is efficient and easy computation which is especially important in digital forensics due to the large amount of data that needs to be processed when working on cases. In this paper, we correlate the runtime efficiency of common hashing algorithms (MD5, SHA-family) and their implementation. Our empirical comparison focuses on C-OpenSSL, Python, Ruby, Java on Windows and Linux and C and WinCrypto API on Windows. The purpose of this paper is to recommend appropriate programming languages and libraries for coding tools that include intensive hashing processes. In each programming language, we compute the MD5, SHA-1, SHA-256 and SHA-512 digest on datasets from 2MB to 1 GB. For each language, algorithm and data, we perform multiple runs and compute the average elapsed time. In our experiment, we observed that OpenSSL and languages utilizing OpenSSL (Python and Ruby) perform better across all the hashing algorithms and data sizes on Windows and Linux. However, on Windows, performance of Java (Oracle JDK) and C WinCrypto is comparable to OpenSSL and better for SHA-512

    Towards a standardised strategy to collect and distribute application software artifacts

    Get PDF
    Reference sets contain known content that are used to identify relevant or filter irrelevant content. Application profiles are a type of reference set that contain digital artifacts associated with application software. An application profile can be compared against a target data set to identify relevant evidence of application usage in a variety of investigation scenarios. The research objective is to design and implement a standardised strategy to collect and distribute application software artifacts using application profiles. An advanced technique for creating application profiles was designed using a formalised differential analysis strategy. The design was implemented in a live differential forensic analysis tool, LiveDiff, to automate and simplify data collection. A storage mechanism was designed based on a previously standardised forensic data abstraction. The design was implemented in a new data abstraction, Application Profile XML (APXML), to provide storage, distribution and automated processing of collected artifacts

    Fuzzy-import hashing:A malware analysis approach

    Get PDF
    Malware has remained a consistent threat since its emergence, growing into a plethora of types and in large numbers. In recent years, numerous new malware variants have enabled the identification of new attack surfaces and vectors, and have become a major challenge to security experts, driving the enhancement and development of new malware analysis techniques to contain the contagion. One of the preliminary steps of malware analysis is to remove the abundance of counterfeit malware samples from the large collection of suspicious samples. This process assists in the management of man and machine resources effectively in the analysis of both unknown and likely malware samples. Hashing techniques are one of the fastest and efficient techniques for performing this preliminary analysis such as fuzzy hashing and import hashing. However, both hashing methods have their limitations and they may not be effective on their own, instead the combination of two distinctive methods may assist in improving the detection accuracy and overall performance of the analysis. This paper proposes a Fuzzy-Import hashing technique which is the combination of fuzzy hashing and import hashing to improve the detection accuracy and overall performance of malware analysis. This proposed Fuzzy-Import hashing offers several benefits which are demonstrated through the experimentation performed on the collected malware samples and compared against stand-alone techniques of fuzzy hashing and import hashing

    Augmented YARA Rules Fused with Fuzzy Hashing in Ransomware Triaging

    Get PDF
    Triaging is an initial stage of malware analysis to assess whether a sample is malware or not and the degree of similarity it holds with known malware. It can be applied to any malware category such as ransomware, which is a type of malware that blocks access to a system or data, usually by encrypting it. It has become the main modus operandi for cybercriminals to extort monies from victims due to the growth of cryptocurrencies. Consequently, it severely affects all types of users whether they be from corporates or ordinary home users. Ransomware can be prevented in several different ways, however, the simple and initial step in prevention is its triaging without execution. Several triaging methods are in use such as fuzzy hashing, import hashing and YARA rules, amongst all, YARA rules are one of the most popular and widely used methods. Nonetheless, its success or failure is dependent on the quality of rules employed for malware triaging. This paper performs ransomware triaging using fuzzy hashing, import hashing and YARA rules and demonstrates how YARA rules can be improved using fuzzy hashing to obtain relatively better triaging results. Subsequently, it proposes the augmented YARA rules fused with fuzzy hashing to obtain improved triaging results and performance efficiency in comparison to all three triaging methods individually. Finally, the paper demonstrates how the use of the fused YARA rules can improve triaging results irrespective of the type of malware
    • …
    corecore