246 research outputs found

    Image similarity using dynamic time warping of fractal features

    Hashing algorithms such as MD and SHA variants have been used for years by forensic investigators to look for known artefacts of interest, such as malicious files. However, such hashing algorithms are not effective for similarity detection, because their hashes change with the slightest alteration to a file. Fuzzy hashing overcame this limitation to a certain extent by providing a close-enough measure for slight modifications. Image forensics is an essential part of any digital crime investigation, especially in cases involving child pornography, yet such hashing algorithms can be thwarted easily by operations as simple as saving the original file in a different image format. This paper introduces a novel technique for measuring image similarity using Dynamic Time Warping (DTW) of fractal features taken from the frequency domain. DTW has traditionally been used successfully for speech recognition. Our experiments have shown that it is also effective for measuring image similarity while tolerating minor modifications, a capability that current state-of-the-art tools lack.
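
    The paper's own code is not reproduced here; as a rough illustration of the core primitive only, the sketch below computes a classic dynamic time warping distance between two 1-D feature sequences. The fractal feature extraction from the frequency domain is not shown, and the sequences are placeholders.

```python
# Minimal sketch of classic dynamic time warping (DTW) between two
# 1-D feature sequences (e.g., per-image fractal features); the
# feature extraction step itself is not part of this sketch.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])             # local distance
            cost[i][j] = d + min(cost[i - 1][j],     # insertion
                                 cost[i][j - 1],     # deletion
                                 cost[i - 1][j - 1]) # match
    return cost[n][m]

# Nearly identical sequences score far lower than unrelated ones.
print(dtw_distance([1.0, 2.0, 3.0, 2.0], [1.0, 2.1, 3.0, 2.0]))  # small
print(dtw_distance([1.0, 2.0, 3.0, 2.0], [9.0, 7.0, 8.0, 9.0]))  # large
```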

    Bytewise Approximate Matching: The Good, The Bad, and The Unknown

    Hash functions are established and well-known in digital forensics, where they are commonly used for proving integrity and for file identification (i.e., hash all files on a seized device and compare the fingerprints against a reference database). However, with respect to the latter operation, an active adversary can easily overcome this approach because traditional hashes are designed to be sensitive to altering an input; the output will change significantly if a single bit is flipped. Therefore, researchers developed approximate matching, which is a rather new, less prominent area but was conceived as a more robust counterpart to traditional hashing. Since the conception of approximate matching, the community has constructed numerous algorithms, extensions, and additional applications for this technology, and is still working on novel concepts to improve the status quo. In this survey article, we conduct a high-level review of the existing literature from a non-technical perspective and summarize the existing body of knowledge in approximate matching, with a special focus on bytewise algorithms. Our contribution allows researchers and practitioners to receive an overview of the state of the art of approximate matching so that they may understand the capabilities and challenges of the field. In short, we present the terminology, use cases, classification, requirements, testing methods, algorithms, applications, and a list of primary and secondary literature.
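
    As a concrete illustration of why traditional hashes are unsuitable for similarity matching, the snippet below uses only Python's standard hashlib (none of the surveyed approximate matching tools) to show that flipping a single input bit produces a completely different SHA-256 digest.

```python
import hashlib

data = bytearray(b"Approximate matching survey example payload")
h1 = hashlib.sha256(data).hexdigest()

data[0] ^= 0x01                      # flip a single bit of the input
h2 = hashlib.sha256(data).hexdigest()

print(h1)
print(h2)
print("identical digests:", h1 == h2)  # False: one flipped bit changes the whole digest
```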

    Towards Syntactic Approximate Matching - A Pre-Processing Experiment

    Over the past few years, the popularity of approximate matching algorithms (a.k.a. fuzzy hashing) has increased. Especially within the area of bytewise approximate matching, several algorithms were published, tested, and improved. It has been shown that these algorithms are powerful; however, they are sometimes too precise for real-world investigations. That is, even very small commonalities (e.g., in the header of a file) can cause a match. While this is a desired property, it may also lead to unwanted results. In this paper we show that simple pre-processing can significantly influence the outcome. Although our test set is based on text-based file types (because they are easy to process), this technique can be used for other, well-documented types as well. Our results show that it can be beneficial to focus on the content of files only (depending on the use case). While for this experiment we utilized text files, we additionally present a small, self-created dataset that can be used in the future for evaluating approximate matching algorithms, since it is labeled (we know which files are similar and how).
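
    The paper's exact pre-processing pipeline is not reproduced here. Purely as a hedged sketch of the general idea, the snippet below strips markup and normalizes whitespace in a text-based file so that only its content is handed to a bytewise approximate matching algorithm; the matcher itself is left abstract, and the regular expressions are illustrative guesses, not the authors' rules.

```python
import re

def preprocess_text(raw: bytes) -> bytes:
    """Rough content-only normalization before approximate matching.

    An illustrative guess at the kind of pre-processing described in the
    paper: drop HTML/XML-style tags and collapse whitespace so that
    headers and formatting no longer dominate the comparison.
    """
    text = raw.decode("utf-8", errors="ignore")
    text = re.sub(r"<[^>]+>", " ", text)   # remove markup tags
    text = re.sub(r"\s+", " ", text)       # collapse whitespace
    return text.strip().lower().encode("utf-8")

# The normalized bytes would then be fed to a bytewise matcher
# (e.g., a CTPH/ssdeep-style tool) instead of the raw file.
```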

    Scalable Graph Building from Text Data

    In this paper we propose NNCTPH, a new MapReduce algorithm that is able to build an approximate k-NN graph from large text datasets. The algorithm uses a modified version of Context Triggered Piecewise Hashing to bin the input data into buckets, and uses an exhaustive search inside the buckets to build the graph. It also uses multiple stages to join the different unconnected subgraphs. We experimentally test the algorithm on different datasets consisting of the subjects of spam emails. Although the algorithm is still at an early development stage, it already proves to be four times faster than a MapReduce implementation of NN-Descent, for the same quality of produced graph.
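
    NNCTPH itself is a MapReduce algorithm; the sequential sketch below only illustrates the underlying idea of binning items into buckets and running an exhaustive neighbour search inside each bucket. The bucketing function and similarity measure are crude placeholders for the modified Context Triggered Piecewise Hashing used by the authors, not their implementation.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def bucket_key(text: str, prefix_len: int = 2) -> str:
    # Placeholder for the modified CTPH binning used in NNCTPH:
    # here we simply bin by a crude lowercase prefix of the text.
    return text[:prefix_len].lower()

def similarity(a: str, b: str) -> float:
    # Placeholder similarity; the paper compares CTPH-style signatures.
    return SequenceMatcher(None, a, b).ratio()

def approximate_knn_graph(items, k=2):
    buckets = defaultdict(list)
    for idx, text in enumerate(items):
        buckets[bucket_key(text)].append(idx)    # "map" phase: bin into buckets
    graph = {}
    for members in buckets.values():             # "reduce" phase: exhaustive search per bucket
        for i in members:
            scored = [(similarity(items[i], items[j]), j) for j in members if j != i]
            graph[i] = [j for _, j in sorted(scored, reverse=True)[:k]]
    return graph

subjects = ["cheap meds now", "cheap meds today", "meeting agenda", "meeting notes"]
print(approximate_knn_graph(subjects))
```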

    Enhancing the Performance of Intrusion Detection System by Minimizing the False Alarm Detection Using Fuzzy Logic

    With the growth of information technology and the evolution of the computing world, systems hold important information and files that must be secured against the various types of attacks that corrupt and distort them. Thus, many algorithms have emerged to increase the level of security and to detect all types of such attacks. Algorithms such as Message Digest algorithm 5 (MD5) and Secure Hash Algorithm 1 (SHA-1) detect whether a file has been attacked, corrupted, or distorted. In addition, there should be algorithms that measure the extent of the damage a file has suffered, in order to determine whether the file can still be used after it has been affected by such attacks. To be clear, MD5 and SHA-1 consider a file corrupt as soon as it is attacked, regardless of the rate of change. Therefore, the aim of this paper is to use an algorithm that allows a certain, user-defined rate of change: the ssdeep algorithm. It reports the rate of change, which is weighed against the importance of each file, and each rate of change determines whether we can still make use of the file. I created four folders, each containing multiple files with a predefined minimum allowed similarity. A graphical user interface was then created to utilize the ssdeep algorithm and to permit the user to define the allowed similarity for each folder or file, depending on its importance. After applying the algorithm, the results show the benefit of such an algorithm for making use of attacked or modified files. Keywords: Intrusion Detection System, false alarm, fuzzy logic, computer security
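
    A minimal sketch of the described threshold check, assuming the python-ssdeep binding (ssdeep.hash and ssdeep.compare, the latter returning a 0-100 match score). The per-folder threshold values and folder names are illustrative assumptions; the paper's GUI is not reproduced.

```python
import ssdeep  # python-ssdeep binding; assumed available

# Per-folder minimum allowed similarity, in the spirit of the paper's
# four folders with different importance levels (values are illustrative).
FOLDER_THRESHOLDS = {"critical": 90, "important": 75, "normal": 50, "low": 25}

def still_usable(original: bytes, modified: bytes, folder: str) -> bool:
    """Return True if the modified file is still similar enough to the
    original for its folder's threshold, per ssdeep's 0-100 match score."""
    score = ssdeep.compare(ssdeep.hash(original), ssdeep.hash(modified))
    return score >= FOLDER_THRESHOLDS[folder]
```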

    A Fuzzy Hashing Approach Based on Random Sequences and Hamming Distance

    Hash functions are well-known methods in computer science to map arbitrarily large input to bit strings of a fixed length that serve as unique input identifiers/fingerprints. A key property of cryptographic hash functions is that even if only one bit of the input is changed, the output behaves pseudo-randomly, and therefore similar files cannot be identified. However, in the area of computer forensics it is also necessary to find similar files (e.g., different versions of a file), wherefore we need a similarity-preserving hash function, also called a fuzzy hash function. In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea of 'rebuilding' an input as well as possible using a fixed set of randomly chosen byte sequences, called building blocks, of byte length l (e.g., l = 128). The procedure is as follows: slide through the input byte by byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with a Hamming distance smaller than a certain threshold contributes to the file's bbHash. We discuss the (dis)advantages of our bbHash compared to other fuzzy hashing approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures. Keywords: Fuzzy hashing, similarity preserving hash function, similarity digests, Hamming distance, computer forensics
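
    Based only on the description above, here is a rough sketch of the bbHash idea: a fixed set of random byte sequences serves as building blocks, and every sliding window whose Hamming distance to some block falls below a threshold contributes that block's index to the digest. Block length, block count, threshold, and digest encoding are guesses for illustration, not the authors' reference implementation.

```python
import os

BLOCK_LEN = 16        # the paper suggests e.g. l = 128; shortened here for illustration
THRESHOLD = 40        # maximum Hamming distance (in bits) to count as a hit (assumed)
NUM_BLOCKS = 8
# In practice the building blocks would be generated once and shared by
# all parties, not regenerated per run as done here for the sketch.
BUILDING_BLOCKS = [os.urandom(BLOCK_LEN) for _ in range(NUM_BLOCKS)]

def hamming(a: bytes, b: bytes) -> int:
    # Bitwise Hamming distance between two equal-length byte strings.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def bb_hash(data: bytes) -> list:
    digest = []
    for offset in range(len(data) - BLOCK_LEN + 1):   # slide byte by byte
        window = data[offset:offset + BLOCK_LEN]
        for idx, block in enumerate(BUILDING_BLOCKS):
            if hamming(window, block) < THRESHOLD:    # close enough: block contributes
                digest.append(idx)
    return digest
```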

    A Case-Based Reasoning Method for Locating Evidence During Digital Forensic Device Triage

    The role of triage in digital forensics is disputed, with some practitioners questioning its reliability for identifying evidential data. Although successfully implemented in the field of medicine, triage has not established itself to the same degree in digital forensics. This article presents a novel approach to triage for digital forensics. Case-Based Reasoning Forensic Triager (CBR-FT) is a method for collecting and reusing past digital forensic investigation information in order to highlight likely evidential areas on a suspect operating system, thereby helping an investigator decide where to search for evidence. The CBR-FT framework is discussed and the results of twenty test triage examinations are presented. CBR-FT has been shown to be a more effective method of triage when compared to a practitioner using a leading commercial application.
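
    CBR-FT's actual case representation and retrieval are not detailed in this abstract. Purely as a hedged sketch of the general case-based idea, the snippet below ranks file-system locations by how often they held evidence in past cases of a similar type; the case base, case types, and paths are hypothetical.

```python
from collections import Counter

# Hypothetical past-case knowledge base: (case type, locations where
# evidence was actually found). Paths are illustrative, not from the paper.
PAST_CASES = [
    ("fraud", ["/Users/x/Documents", "/Users/x/AppData/Roaming/Browser"]),
    ("fraud", ["/Users/y/Documents", "/Users/y/Downloads"]),
    ("malware", ["/Windows/Temp", "/Users/z/AppData/Local/Temp"]),
]

def rank_locations(case_type: str, top_n: int = 3):
    """Rank candidate search locations by how often comparable past
    cases yielded evidence there (a crude stand-in for CBR retrieval)."""
    counts = Counter()
    for past_type, locations in PAST_CASES:
        if past_type == case_type:
            counts.update(loc.split("/")[-1] for loc in locations)  # generalize by folder name
    return counts.most_common(top_n)

print(rank_locations("fraud"))  # e.g. [('Documents', 2), ('Browser', 1), ('Downloads', 1)]
```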