479 research outputs found

    Data Fingerprinting -- Identifying Files and Tables with Hashing Schemes

    Master's thesis in Computer science. INTRODUCTION: Hash functions are nothing new, and they are not limited to cryptographic purposes. One important field is data fingerprinting. Here, the purpose is to generate a digest which serves as a fingerprint (or a license plate) that uniquely identifies a file. More recently, fuzzy fingerprinting schemes, which scrap the avalanche effect in favour of detecting local changes, have come into the spotlight. The main purpose of this project is to find ways to classify text tables, and to discover where potential changes or inconsistencies have happened. METHODS: Large parts of this report can be considered applied discrete mathematics; finite fields and combinatorics have played an important part. Rabin’s fingerprinting scheme was tested extensively and compared against existing cryptographic algorithms, CRC and FNV. Moreover, a self-designed fuzzy hashing algorithm with the preliminary name No-Frills Hash (NFHash) has been created and tested against Nilsimsa and Spamsum. NFHash is based on Mersenne primes, and uses a sliding window to create a fuzzy hash. Furthermore, the usefulness of lookup tables (with partial seeds) was also explored. The fuzzy hashing algorithm has also been combined with a k-NN classifier to get an overview of its ability to classify files. In addition to NFHash, Bloom filters combined with Merkle Trees have been the most important part of this report. This combination allows a user to see where a change was made, despite the fact that hash functions are one-way. Large parts of this project have dealt with the study of other open-source libraries and applications, such as Cassandra and SSDeep, as well as how Bitcoin works. Optimizations have played a crucial role as well; different approaches to a problem might lead to the same solution, but resource consumption can be very different.
RESULTS: The results have shown that the Merkle Tree-based approach can track changes to a table very quickly and efficiently, because it is conservative with CPU resources. Moreover, the self-designed algorithm NFHash also does well in terms of file classification when it is coupled with a k-NN classifier. CONCLUSION: Hash functions refer to a very diverse set of algorithms, and not just algorithms that serve a limited purpose. Fuzzy fingerprinting schemes can still be considered to be in their infancy, but a lot has happened in the last ten years. This project has introduced two new ways to create hashes that can be compared across similar, yet not necessarily identical, files, or used to detect whether (and to what extent) a file was changed. Note that the algorithms presented here should be considered prototypes, and might still need some large-scale testing to sort out potential flaws.
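
    The abstract above says a Merkle Tree lets a user see where a table changed even though hash functions are one-way. The following is a minimal sketch of that general idea, not the thesis's implementation: it omits the Bloom filters, uses SHA-256 as an assumed hash, assumes both versions of the table have the same number of rows, and the names build_merkle and changed_rows are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an assumption, any collision-resistant hash works."""
    return hashlib.sha256(data).digest()


def build_merkle(rows):
    """Return the tree as a list of levels: leaves first, root last."""
    level = [_h(row.encode("utf-8")) for row in rows]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Odd-sized level: pair the last node with a copy of itself.
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def changed_rows(a, b):
    """Indices of rows that differ, found by descending only into
    subtrees whose hashes do not match."""
    if len(a[0]) != len(b[0]):
        raise ValueError("this sketch assumes equal row counts")
    if not a[0]:
        return []
    depth = len(a) - 1
    if a[depth][0] == b[depth][0]:
        return []  # identical roots: nothing changed
    suspects = [0]  # node indices at the current level, starting at the root
    for lvl in range(depth, 0, -1):
        below_a, below_b = a[lvl - 1], b[lvl - 1]
        nxt = []
        for idx in suspects:
            for child in (2 * idx, 2 * idx + 1):
                # Skip the padded duplicate that only exists during hashing.
                if child < len(below_a) and below_a[child] != below_b[child]:
                    nxt.append(child)
        suspects = nxt
    return suspects


old = ["id,name", "1,alice", "2,bob", "3,carol", "4,dave"]
new = ["id,name", "1,alice", "2,bobby", "3,carol", "4,david"]
print(changed_rows(build_merkle(old), build_merkle(new)))  # [2, 4]
```

    Comparing the two roots answers "did anything change?" with a single comparison, and the descent visits only mismatching subtrees, so locating a handful of edited rows in a large table touches a small fraction of the nodes.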

    Fingerprint Database Privacy Guard: an Open-source System that Secures Fingerprints with Locality Sensitive Hashing Algorithms

    Fingerprint identification is one of the most accurate means of identification, yet it is not widely used in public facilities because of security concerns. Moreover, fingerprint systems are out of reach for small-budget businesses because of their high cost. Therefore, this study created an open-source solution to secure fingerprint samples in the database while using low-cost hardware components. Locality Sensitive Hashing Algorithms such as ORB and Image hash were compared in this study as potential alternatives to SURF. To test the design, fifteen samples were collected and stored in a database without verifying the quality of the samples. Then, thirteen other samples were read from the sensor and forty-five permutations were created from the first fifteen samples. The results showed that a low-cost system can secure fingerprint samples in a database using open-source technologies, but the identification process needs some improvement. Also, the study showed that image hash is a good alternative to SURF when the sensor readings are forced to one position.

    Using Fuzzy Matching of Queries to optimize Database workloads

    Directed Acyclic Graphs (DAGs) are commonly used in databases and Big Data computational engines like Apache Spark to represent the execution plan of queries. We refer to such graphs as Query Directed Acyclic Graphs (QDAGs). This paper uses similarity hashing to arrive at a fingerprint for a QDAG that embodies the compute requirements of the query. The fingerprint thus obtained can be used to predict the runtime behaviour of a query based on previously executed queries with similar QDAGs. We discuss two approaches to arrive at a fingerprint, their pros and cons, and how aspects of both approaches can be combined to improve the predictions. Using a hybrid approach, we demonstrate that we are able to predict the runtime behaviour of a QDAG with more than 80% accuracy. Comment: 9 pages, 5 figures
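
    The abstract does not say which similarity hash the paper uses or how a QDAG is turned into features. As one possible illustration only, the sketch below applies SimHash to the edges of a toy plan and compares fingerprints by Hamming distance. The edge encoding, the 64-bit width, the BLAKE2b token hash, and the names plan_tokens, simhash and hamming are all assumptions.

```python
import hashlib


def plan_tokens(edges):
    """Turn (source_operator, target_operator) edges into string tokens.
    This edge-based encoding is an assumption, not the paper's method."""
    return [f"{src}->{dst}" for src, dst in edges]


def simhash(tokens, bits=64):
    """Classic SimHash: each token votes +1/-1 on every bit position,
    and the fingerprint keeps the bits with a positive total."""
    counts = [0] * bits
    for tok in tokens:
        digest = hashlib.blake2b(tok.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Two toy plans that share most of their edges (operator names are made up).
plan_a = [("Scan", "Filter"), ("Filter", "Join"), ("Scan2", "Join"), ("Join", "Aggregate")]
plan_b = [("Scan", "Filter"), ("Filter", "Join"), ("Scan2", "Join"), ("Join", "Sort")]

fp_a = simhash(plan_tokens(plan_a))
fp_b = simhash(plan_tokens(plan_b))
print(hamming(fp_a, fp_a), hamming(fp_a, fp_b))
```

    A smaller Hamming distance suggests more shared structure, so a past query with a nearby fingerprint could serve as the basis for a runtime estimate.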

    Options for Securing RTP Sessions

    The Real-time Transport Protocol (RTP) is used in a large number of different application domains and environments. This heterogeneity implies that different security mechanisms are needed to provide services such as confidentiality, integrity, and source authentication of RTP and RTP Control Protocol (RTCP) packets suitable for the various environments. The range of solutions makes it difficult for RTP-based application developers to pick the most suitable mechanism. This document provides an overview of a number of security solutions for RTP and gives guidance for developers on how to choose the appropriate security mechanism.

    Fast Filtering of Known PNG Files Using Early File Features

    A common task in digital forensics investigations is to identify known contraband images. This is typically achieved by calculating a cryptographic digest for each image on a given medium, using a hashing algorithm such as SHA256, and comparing the individual digests with a database of known contraband. However, the large capacities of modern storage media, and increased time pressure on forensics examiners, necessitate that more efficient processing mechanisms be developed. This work describes a technique for creating signatures for images in the PNG format that requires only a tiny fraction of the file to effectively distinguish between a large number of images. Because the signatures are highly distinct and compact, such analysis lays the foundation for future work in fast forensics filtering using subsets of evidential data.
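
    The abstract contrasts hashing every whole file with building a signature from only a small early part of it. The sketch below illustrates that contrast under stated assumptions: the paper's actual "early file features" are not given here, so hashing the first n bytes stands in for them, and the 512-byte default and the names full_digest, prefix_signature and filter_known are hypothetical.

```python
import hashlib

# The fixed 8-byte signature that starts every PNG file.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def full_digest(data: bytes) -> str:
    """Conventional approach: SHA-256 over the entire file contents."""
    return hashlib.sha256(data).hexdigest()


def prefix_signature(data: bytes, n: int = 512) -> str:
    """Assumed stand-in for early file features: hash only the first n bytes."""
    return hashlib.sha256(data[:n]).hexdigest()


def filter_known(files, known_signatures, n: int = 512):
    """Return the names of PNG files whose prefix signature is in the known set.
    `files` maps a name to the file's bytes."""
    return [
        name
        for name, data in files.items()
        if data.startswith(PNG_MAGIC) and prefix_signature(data, n) in known_signatures
    ]


# Synthetic byte strings stand in for real images.
files = {
    "a.png": PNG_MAGIC + b"A" * 600,
    "b.png": PNG_MAGIC + b"B" * 600,
    "notes.txt": b"hello",
}
known = {prefix_signature(files["a.png"])}
print(filter_known(files, known))  # ['a.png']
```

    Two files that share their first n bytes but differ later get the same prefix signature, so a match from a filter like this is a candidate that would still need confirmation with a full digest.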