246 research outputs found

    Image similarity using dynamic time warping of fractal features

    Hashing algorithms such as MD and SHA variants have been used for years by forensic investigators to look for known artefacts of interest, such as malicious files. However, such hashing algorithms are not effective for similarity detection, because their hashes change with the slightest alteration to a file. Fuzzy hashing overcame this limitation to a certain extent by providing a close-enough measure for slight modifications. Image forensics is an essential part of any digital crime investigation, especially in cases involving child pornography, yet such hashing algorithms can be thwarted easily by operations as simple as saving the original file in a different image format. This paper introduces a novel technique for measuring image similarity using Dynamic Time Warping (DTW) of fractal features taken from the frequency domain. DTW has traditionally been used successfully for speech recognition. Our experiments have shown that it is also effective for measuring image similarity while tolerating minor modifications, a capability that current state-of-the-art tools lack.
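
    The paper's own code is not reproduced here; as a rough illustration of the core primitive only, the sketch below computes a classic dynamic time warping distance between two 1-D feature sequences. The fractal feature extraction from the frequency domain is not shown, and the sequences are placeholders.

```python
# Minimal sketch of classic dynamic time warping (DTW) between two
# 1-D feature sequences (e.g., per-image fractal features); the
# feature extraction step itself is not part of this sketch.
def dtw_distance(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])             # local distance
            cost[i][j] = d + min(cost[i - 1][j],     # insertion
                                 cost[i][j - 1],     # deletion
                                 cost[i - 1][j - 1]) # match
    return cost[n][m]

# Nearly identical sequences score far lower than unrelated ones.
print(dtw_distance([1.0, 2.0, 3.0, 2.0], [1.0, 2.1, 3.0, 2.0]))  # small
print(dtw_distance([1.0, 2.0, 3.0, 2.0], [9.0, 7.0, 8.0, 9.0]))  # large
```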

    Bytewise Approximate Matching: The Good, The Bad, and The Unknown

    Hash functions are established and well-known in digital forensics, where they are commonly used for proving integrity and for file identification (i.e., hash all files on a seized device and compare the fingerprints against a reference database). However, with respect to the latter operation, an active adversary can easily overcome this approach because traditional hashes are designed to be sensitive to altering an input; the output will change significantly if a single bit is flipped. Therefore, researchers developed approximate matching, which is a rather new, less prominent area but was conceived as a more robust counterpart to traditional hashing. Since the conception of approximate matching, the community has constructed numerous algorithms, extensions, and additional applications for this technology, and is still working on novel concepts to improve the status quo. In this survey article, we conduct a high-level review of the existing literature from a non-technical perspective and summarize the existing body of knowledge in approximate matching, with a special focus on bytewise algorithms. Our contribution allows researchers and practitioners to receive an overview of the state of the art of approximate matching so that they may understand the capabilities and challenges of the field. In short, we present the terminology, use cases, classification, requirements, testing methods, algorithms, applications, and a list of primary and secondary literature.
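
    As a concrete illustration of why traditional hashes are unsuitable for similarity matching, the snippet below uses only Python's standard hashlib (none of the surveyed approximate matching tools) to show that flipping a single input bit produces a completely different SHA-256 digest.

```python
import hashlib

data = bytearray(b"Approximate matching survey example payload")
h1 = hashlib.sha256(data).hexdigest()

data[0] ^= 0x01                      # flip a single bit of the input
h2 = hashlib.sha256(data).hexdigest()

print(h1)
print(h2)
print("identical digests:", h1 == h2)  # False: one flipped bit changes the whole digest
```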

    Towards Syntactic Approximate Matching - A Pre-Processing Experiment

    Over the past few years, the popularity of approximate matching algorithms (a.k.a. fuzzy hashing) has increased. Especially within the area of bytewise approximate matching, several algorithms were published, tested, and improved. It has been shown that these algorithms are powerful; however, they are sometimes too precise for real-world investigations. That is, even very small commonalities (e.g., in the header of a file) can cause a match. While this is a desired property, it may also lead to unwanted results. In this paper we show that simple pre-processing can significantly influence the outcome. Although our test set is based on text-based file types (because they are easy to process), this technique can be used for other, well-documented types as well. Our results show that it can be beneficial to focus on the content of files only (depending on the use case). While for this experiment we utilized text files, we additionally present a small, self-created dataset that can be used in the future for evaluating approximate matching algorithms, since it is labeled (we know which files are similar and how).
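
    The paper's exact pre-processing pipeline is not reproduced here. Purely as a hedged sketch of the general idea, the snippet below strips markup and normalizes whitespace in a text-based file so that only its content is handed to a bytewise approximate matching algorithm; the matcher itself is left abstract, and the regular expressions are illustrative guesses, not the authors' rules.

```python
import re

def preprocess_text(raw: bytes) -> bytes:
    """Rough content-only normalization before approximate matching.

    An illustrative guess at the kind of pre-processing described in the
    paper: drop HTML/XML-style tags and collapse whitespace so that
    headers and formatting no longer dominate the comparison.
    """
    text = raw.decode("utf-8", errors="ignore")
    text = re.sub(r"<[^>]+>", " ", text)   # remove markup tags
    text = re.sub(r"\s+", " ", text)       # collapse whitespace
    return text.strip().lower().encode("utf-8")

# The normalized bytes would then be fed to a bytewise matcher
# (e.g., a CTPH/ssdeep-style tool) instead of the raw file.
```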

    Scalable Graph Building from Text Data

    In this paper we propose NNCTPH, a new MapReduce algorithm that is able to build an approximate k-NN graph from large text datasets. The algorithm uses a modified version of Context Triggered Piecewise Hashing to bin the input data into buckets, and uses an exhaustive search inside the buckets to build the graph. It also uses multiple stages to join the different unconnected subgraphs. We experimentally test the algorithm on different datasets consisting of the subjects of spam emails. Although the algorithm is still at an early development stage, it already proves to be four times faster than a MapReduce implementation of NN-Descent, for the same quality of produced graph.
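
    NNCTPH itself is a MapReduce algorithm; the sequential sketch below only illustrates the underlying idea of binning items into buckets and running an exhaustive neighbour search inside each bucket. The bucketing function and similarity measure are crude placeholders for the modified Context Triggered Piecewise Hashing used by the authors, not their implementation.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def bucket_key(text: str, prefix_len: int = 2) -> str:
    # Placeholder for the modified CTPH binning used in NNCTPH:
    # here we simply bin by a crude lowercase prefix of the text.
    return text[:prefix_len].lower()

def similarity(a: str, b: str) -> float:
    # Placeholder similarity; the paper compares CTPH-style signatures.
    return SequenceMatcher(None, a, b).ratio()

def approximate_knn_graph(items, k=2):
    buckets = defaultdict(list)
    for idx, text in enumerate(items):
        buckets[bucket_key(text)].append(idx)    # "map" phase: bin into buckets
    graph = {}
    for members in buckets.values():             # "reduce" phase: exhaustive search per bucket
        for i in members:
            scored = [(similarity(items[i], items[j]), j) for j in members if j != i]
            graph[i] = [j for _, j in sorted(scored, reverse=True)[:k]]
    return graph

subjects = ["cheap meds now", "cheap meds today", "meeting agenda", "meeting notes"]
print(approximate_knn_graph(subjects))
```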

    Enhancing the Performance of Intrusion Detection System by Minimizing the False Alarm Detection Using Fuzzy Logic

    With the growth of information technology and the evolution of the computing world, systems hold important information and files that must be secured against the various types of attacks that corrupt and distort them. Thus, many algorithms have emerged to increase the level of security and to detect all types of such attacks. Algorithms such as Message Digest algorithm 5 (MD5) and Secure Hash Algorithm 1 (SHA-1) detect whether a file has been attacked, corrupted, or distorted. In addition, there should be algorithms that measure the extent of the damage a file has suffered, in order to determine whether the file can still be used after it has been affected by such attacks. To be clear, MD5 and SHA-1 consider a file corrupt as soon as it is attacked, regardless of the rate of change. Therefore, the aim of this paper is to use an algorithm that allows a certain, user-defined rate of change: the ssdeep algorithm. It reports the rate of change, which is weighed against the importance of each file, and each rate of change determines whether we can still make use of the file. I created four folders, each containing multiple files with a predefined minimum allowed similarity. A graphical user interface was then created to utilize the ssdeep algorithm and to permit the user to define the allowed similarity for each folder or file, depending on its importance. After applying the algorithm, the results show the benefit of such an algorithm for making use of attacked or modified files. Keywords: Intrusion Detection System, false alarm, fuzzy logic, computer security
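
    A minimal sketch of the described threshold check, assuming the python-ssdeep binding (ssdeep.hash and ssdeep.compare, the latter returning a 0-100 match score). The per-folder threshold values and folder names are illustrative assumptions; the paper's GUI is not reproduced.

```python
import ssdeep  # python-ssdeep binding; assumed available

# Per-folder minimum allowed similarity, in the spirit of the paper's
# four folders with different importance levels (values are illustrative).
FOLDER_THRESHOLDS = {"critical": 90, "important": 75, "normal": 50, "low": 25}

def still_usable(original: bytes, modified: bytes, folder: str) -> bool:
    """Return True if the modified file is still similar enough to the
    original for its folder's threshold, per ssdeep's 0-100 match score."""
    score = ssdeep.compare(ssdeep.hash(original), ssdeep.hash(modified))
    return score >= FOLDER_THRESHOLDS[folder]
```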

    A Fuzzy Hashing Approach Based on Random Sequences and Hamming Distance

    Hash functions are well-known methods in computer science to map arbitrarily large input to bit strings of a fixed length that serve as unique input identifiers/fingerprints. A key property of cryptographic hash functions is that even if only one bit of the input is changed, the output behaves pseudo-randomly, and therefore similar files cannot be identified. However, in the area of computer forensics it is also necessary to find similar files (e.g., different versions of a file), wherefore we need a similarity-preserving hash function, also called a fuzzy hash function. In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea of 'rebuilding' an input as well as possible using a fixed set of randomly chosen byte sequences, called building blocks, of byte length l (e.g., l = 128). The procedure is as follows: slide through the input byte by byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with a Hamming distance smaller than a certain threshold contributes to the file's bbHash. We discuss the (dis)advantages of our bbHash compared to other fuzzy hashing approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures. Keywords: Fuzzy hashing, similarity preserving hash function, similarity digests, Hamming distance, computer forensics
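
    Based only on the description above, here is a rough sketch of the bbHash idea: a fixed set of random byte sequences serves as building blocks, and every sliding window whose Hamming distance to some block falls below a threshold contributes that block's index to the digest. Block length, block count, threshold, and digest encoding are guesses for illustration, not the authors' reference implementation.

```python
import os

BLOCK_LEN = 16        # the paper suggests e.g. l = 128; shortened here for illustration
THRESHOLD = 40        # maximum Hamming distance (in bits) to count as a hit (assumed)
NUM_BLOCKS = 8
# In practice the building blocks would be generated once and shared by
# all parties, not regenerated per run as done here for the sketch.
BUILDING_BLOCKS = [os.urandom(BLOCK_LEN) for _ in range(NUM_BLOCKS)]

def hamming(a: bytes, b: bytes) -> int:
    # Bitwise Hamming distance between two equal-length byte strings.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def bb_hash(data: bytes) -> list:
    digest = []
    for offset in range(len(data) - BLOCK_LEN + 1):   # slide byte by byte
        window = data[offset:offset + BLOCK_LEN]
        for idx, block in enumerate(BUILDING_BLOCKS):
            if hamming(window, block) < THRESHOLD:    # close enough: block contributes
                digest.append(idx)
    return digest
```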

    A Case-Based Reasoning Method for Locating Evidence During Digital Forensic Device Triage

    The role of triage in digital forensics is disputed, with some practitioners questioning its reliability for identifying evidential data. Although successfully implemented in the field of medicine, triage has not established itself to the same degree in digital forensics. This article presents a novel approach to triage for digital forensics. Case-Based Reasoning Forensic Triager (CBR-FT) is a method for collecting and reusing past digital forensic investigation information in order to highlight likely evidential areas on a suspect operating system, thereby helping an investigator decide where to search for evidence. The CBR-FT framework is discussed and the results of twenty test triage examinations are presented. CBR-FT has been shown to be a more effective method of triage when compared to a practitioner using a leading commercial application.
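
    CBR-FT's actual case representation and retrieval are not detailed in this abstract. Purely as a hedged sketch of the general case-based idea, the snippet below ranks file-system locations by how often they held evidence in past cases of a similar type; the case base, case types, and paths are hypothetical.

```python
from collections import Counter

# Hypothetical past-case knowledge base: (case type, locations where
# evidence was actually found). Paths are illustrative, not from the paper.
PAST_CASES = [
    ("fraud", ["/Users/x/Documents", "/Users/x/AppData/Roaming/Browser"]),
    ("fraud", ["/Users/y/Documents", "/Users/y/Downloads"]),
    ("malware", ["/Windows/Temp", "/Users/z/AppData/Local/Temp"]),
]

def rank_locations(case_type: str, top_n: int = 3):
    """Rank candidate search locations by how often comparable past
    cases yielded evidence there (a crude stand-in for CBR retrieval)."""
    counts = Counter()
    for past_type, locations in PAST_CASES:
        if past_type == case_type:
            counts.update(loc.split("/")[-1] for loc in locations)  # generalize by folder name
    return counts.most_common(top_n)

print(rank_locations("fraud"))  # e.g. [('Documents', 2), ('Browser', 1), ('Downloads', 1)]
```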