79 research outputs found

    Similarity Hashing Based on Levenshtein Distances

    Part 2: Forensic Techniques. It is increasingly common in forensic investigations to use automated pre-processing techniques to reduce the massive volumes of data that are encountered. This is typically accomplished by comparing fingerprints (typically cryptographic hashes) of files against existing databases. In addition to finding exact matches of cryptographic hashes, it is necessary to find approximate matches corresponding to similar files, such as different versions of a given file. This paper presents a new stand-alone similarity hashing approach called saHash, which has a modular design and operates in linear time. saHash is almost as fast as SHA-1 and more efficient than other approaches for approximate matching. The similarity hashing algorithm uses four sub-hash functions, each producing its own hash value. The four sub-hashes are concatenated to produce the final hash value. This modularity enables sub-hash functions to be added or removed, e.g., if an exploit for a sub-hash function is discovered. Given the hash values of two byte sequences, saHash returns a lower bound on the number of Levenshtein operations between the two byte sequences as their similarity score. The robustness of saHash is verified by comparing it with other approximate matching approaches such as sdhash.
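
The idea of deriving a lower bound on the Levenshtein distance from compact per-file summaries can be illustrated with a small sketch. This is not the saHash algorithm and does not reproduce its four sub-hash functions, which the abstract does not specify. It assumes a byte-frequency histogram as the summary, and the function names `levenshtein` and `histogram_lower_bound` are chosen here for illustration only. Each edit operation changes the histogram surplus or deficit by at most one, so the larger of the two can never exceed the true edit distance.

```python
from collections import Counter


def levenshtein(a: bytes, b: bytes) -> int:
    """Exact Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def histogram_lower_bound(a: bytes, b: bytes) -> int:
    """Lower bound on the Levenshtein distance from byte counts alone.

    Illustrative assumption: the summary is a byte-frequency histogram.
    Counter subtraction keeps only positive counts, so `surplus` counts
    bytes that a has in excess and `deficit` counts bytes that a lacks.
    """
    ca, cb = Counter(a), Counter(b)
    surplus = sum((ca - cb).values())
    deficit = sum((cb - ca).values())
    return max(surplus, deficit)


if __name__ == "__main__":
    a, b = b"kitten", b"sitting"
    print(histogram_lower_bound(a, b), "<=", levenshtein(a, b))
```

The bound needs only the two histograms, so it can be evaluated in time independent of the file lengths once the summaries are stored, whereas the exact distance takes quadratic time.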

    Using approximate matching to reduce the volume of digital data

    Digital forensic investigators frequently have to search for relevant files in massive digital corpora – a task often compared to finding a needle in a haystack. To address this challenge, investigators typically apply cryptographic hash functions to identify known files. However, cryptographic hashing only allows the detection of files that exactly match the known file hash values or fingerprints. This paper demonstrates the benefits of using approximate matching to locate relevant files. The experiments described in this paper used three test images of Windows XP, Windows 7 and Ubuntu 12.04 systems to evaluate fingerprint-based comparisons. The results reveal that approximate matching can improve file identification – in one case, increasing the identification rate from 1.82% to 23.76%.

    Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2

    Hash functions are a widespread class of functions in computer science and are used in several applications, e.g. in computer forensics to identify known files. One basic property of cryptographic hash functions is the avalanche effect, which causes a significantly different output if an input is changed slightly. As some applications also need to identify similar files (e.g. spam/virus detection), this raised the need for Similarity Preserving Hashing. In recent years, several approaches have emerged, all with different names, properties, strengths and weaknesses, which is due to a missing definition. Based on the properties and use cases of traditional hash functions, this paper discusses a uniform naming and properties, which is a first step towards a suitable definition of Similarity Preserving Hashing. Additionally, we extend the algorithm MRSH for Similarity Preserving Hashing to its successor MRSH-v2, which has three special features. First, it fulfills all our proposed defining properties; second, it outperforms existing approaches, especially with respect to run time performance; and third, it has two detection modes. The regular mode of MRSH-v2 is used to identify similar files, whereas the f-mode is optimal for fragment detection, i.e. to identify similar parts of a file.

    Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees

    Bytewise approximate matching algorithms have in recent years shown significant promise in detecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of known-illegal files (e.g. a collection of child abuse material) and wishes to find whether copies of these are stored on the seized device. Approximate matching addresses shortcomings in traditional hashing, which can only find identical files, by also being able to deal with cases of merged files, embedded files, partial files, or if a file has been changed in any way. Most approximate matching algorithms work by comparing pairs of files, which is not a scalable approach when faced with large corpora. This paper demonstrates the effectiveness of using a Hierarchical Bloom Filter Tree (HBFT) data structure to reduce the running time of collection-against-collection matching, with a specific focus on the MRSH-v2 algorithm. Three experiments are discussed, which explore the effects of different configurations of HBFTs. The proposed approach dramatically reduces the number of pairwise comparisons required, and demonstrates substantial speed gains, while maintaining effectiveness.
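
The general shape of a tree of Bloom filters that prunes pairwise comparisons can be sketched as follows. This is a simplified illustration, not the paper's HBFT implementation or the MRSH-v2 feature extraction: it assumes fixed-size 8-byte chunks as features (so it is sensitive to alignment), a binary split of the file set, and SHA-256-derived bit positions. The names `BloomFilter`, `Node`, `chunks` and `min_hits` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits: int = 4096, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0

    def _positions(self, item: bytes):
        # Derive k bit positions from disjoint 4-byte slices of one digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.size

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


def chunks(data: bytes, size: int = 8):
    """Assumed feature extraction: fixed-size, non-overlapping chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Node:
    """Each node's filter holds the features of every file in its subtree."""

    def __init__(self, files: dict):
        self.names = sorted(files)
        self.filter = BloomFilter()
        for name in self.names:
            for c in chunks(files[name]):
                self.filter.add(c)
        self.children = []
        if len(self.names) > 1:
            mid = len(self.names) // 2
            for part in (self.names[:mid], self.names[mid:]):
                self.children.append(Node({n: files[n] for n in part}))

    def search(self, query: bytes, min_hits: int = 1):
        """Return candidate file names; subtrees with too few hits are skipped."""
        hits = sum(c in self.filter for c in chunks(query))
        if hits < min_hits:
            return []
        if not self.children:
            return list(self.names)
        found = []
        for child in self.children:
            found.extend(child.search(query, min_hits))
        return found


if __name__ == "__main__":
    corpus = {
        "a.bin": b"AAAAAAAABBBBBBBB",
        "b.bin": b"CCCCCCCCDDDDDDDD",
        "c.bin": b"EEEEEEEEFFFFFFFF",
    }
    tree = Node(corpus)
    print(tree.search(b"CCCCCCCC"))
```

Because Bloom filters can report false positives but not false negatives, the names returned at the leaves are only candidates; in a real pipeline they would still be confirmed with a pairwise approximate-matching comparison.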

    An Augmented Reality Periscope for Submarines with Extended Visual Classification

    Submarines are considered extremely strategic for any navy due to their stealth capability. Periscopes are crucial sensors for these vessels, and emerging to the surface or periscope depth is required to identify visual contacts through this device. This maneuver involves many procedures and usually has to be fast and agile to avoid exposure. This paper presents and implements a novel architecture for real submarine periscopes developed for future Brazilian naval fleet operations. Our system consists of a probe that is connected to the craft and carries a 360-degree camera. We capture the images and project them inside the vessel using traditional VR/XR devices. We also propose and implement an efficient computer vision-based MR technique to estimate and display detected vessels effectively and precisely. The vessel detection model is trained using synthetic images, so we built and made available a dataset composed of 99,000 images. Finally, we also estimate distances of the classified elements, showing all the information in an AR-based interface. Although the probe is connected by wire, it allows the vessel to remain at depth, reducing its exposure and introducing a new way of conducting submarine maneuvers and operations. We validate our proposal through a user experience experiment with 19 experts in periscope operations.

    Functional Vs Object-Oriented Distributed Languages
