1,174 research outputs found

Fast Exact Search in Hamming Space with Multi-Index Hashing

Author: Fleet David J.
Norouzi Mohammad
Punjani Ali
Publication date: 24/04/2014

There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, because doing so was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.

arXiv.org e-Print Archive

CiteSeerX
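
The substring-indexing idea in the abstract above can be illustrated with a minimal sketch. It handles the simpler r-neighbor query (all codes within Hamming distance r) rather than the full k-nearest-neighbor search, and it relies on the pigeonhole argument that two codes differing in at most r bits must have at least one of m disjoint substrings differing in at most floor(r/m) bits. The class name `MultiIndexHash`, its parameters, and the dictionary-based tables are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded codes."""
    return bin(a ^ b).count("1")


class MultiIndexHash:
    """Hypothetical sketch: one hash table per disjoint substring of each code."""

    def __init__(self, codes, bits=64, m=4):
        assert bits % m == 0, "this sketch assumes equal-length substrings"
        self.codes = list(codes)
        self.m = m
        self.sub_bits = bits // m
        self.mask = (1 << self.sub_bits) - 1
        self.tables = [defaultdict(list) for _ in range(m)]
        for idx, code in enumerate(self.codes):
            for j in range(m):
                self.tables[j][self._sub(code, j)].append(idx)

    def _sub(self, code, j):
        """Extract the j-th substring (counted from the low-order end)."""
        return (code >> (j * self.sub_bits)) & self.mask

    def _neighbors(self, sub, radius):
        """Yield every substring value within `radius` bit flips of `sub`."""
        for d in range(radius + 1):
            for positions in combinations(range(self.sub_bits), d):
                flipped = sub
                for p in positions:
                    flipped ^= 1 << p
                yield flipped

    def search(self, query, r):
        """Return sorted indices of all codes within Hamming distance r."""
        sub_radius = r // self.m  # pigeonhole bound on the closest substring
        candidates = set()
        for j in range(self.m):
            q_sub = self._sub(query, j)
            for key in self._neighbors(q_sub, sub_radius):
                candidates.update(self.tables[j].get(key, ()))
        # Candidates are a superset of the answer; verify on the full codes.
        return sorted(i for i in candidates
                      if hamming(self.codes[i], query) <= r)
```

Because each table is probed only within the reduced radius, the number of buckets visited grows far more slowly than it would when enumerating every full-length code within distance r. The final verification step keeps the result exact.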

Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries

Author: A Poyias
D Arroyuelo
D Lemire
G Marsaglia
GH Gonnet
H Bannai
H Luan
J Fischer
J Jansson
J Kärkkäinen
J Ziv
JA Feldman
JG Cleary
K Chung
L Carter
P Tchebychev
RM Karp
RM Robinson
TA Welch
Y Nakashima
Publication date: 09/06/2017

We present the first thorough practical study of the Lempel-Ziv-78 and the Lempel-Ziv-Welch computation based on trie data structures. With a careful selection of trie representations, we can beat well-tuned popular trie data structures like Judy, m-Bonsai or Cedar.

arXiv.org e-Print Archive

Crossref
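
To make the role of the trie in the entry above concrete, here is a minimal sketch of LZ78 factorization. Each factor is the longest phrase already in the trie extended by one new character, so the trie gains exactly one node per factor. The dictionary-per-node representation and the function names are assumptions chosen for brevity; it is not one of the trie representations evaluated in the paper.

```python
def lz78_factorize(text: str):
    """Return LZ78 factors as (parent_factor_index, next_char) pairs.

    Index 0 stands for the empty phrase (the trie root); factor k
    corresponds to trie node k.
    """
    children = [{}]  # children[node] maps a character to a child node id
    factors = []
    node = 0
    for ch in text:
        nxt = children[node].get(ch)
        if nxt is not None:
            node = nxt  # keep walking down the current phrase
        else:
            children[node][ch] = len(children)  # new node = new factor
            children.append({})
            factors.append((node, ch))
            node = 0  # restart at the root for the next factor
    if node != 0:
        # The text ended inside a phrase that is already in the trie.
        factors.append((node, ""))
    return factors


def lz78_decode(factors):
    """Rebuild the text from the factor list produced above."""
    phrases = [""]
    out = []
    for parent, ch in factors:
        phrase = phrases[parent] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)
```

The inner step, looking up a child by character and inserting it if absent, is the operation whose cost depends on the trie representation.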

A New Edit Distance for Fuzzy Hashing Applications

Author: Gayoso Martínez Víctor
Hernández Encinas Luis
Hernández Álvarez Fernando
Sánchez Ávila Carmen
Publication date: 27/07/2015

7 pages, 5 tables, 2 algorithms. Paper presented at the 2015 World Congress in Computer Science, Computer Engineering, and Applied Computing (WORLDCOMP'15), the 2015 International Conference on Security and Management (SAM'15), Las Vegas, USA, July 27-30. Similarity preserving hashing applications, also known as fuzzy hashing functions, help to analyse the content of digital devices by performing a resemblance comparison between different files. In practice, the similarity matching procedure is a two-step process, where first a signature associated with the files under comparison is generated, and then a comparison of the signatures themselves is performed. Even though ssdeep is the best-known application in this field, the edit distance algorithm that ssdeep uses for performing the signature comparison is not well-suited for certain scenarios. In this contribution we present a new edit distance algorithm that better reflects the similarity of two strings, and that can be used by fuzzy hashing applications in order to improve their results. This work has been partially supported by Comunidad de Madrid (Spain) under the project S2013/ICE-3095-CM (CIBERDINE) and by Ministerio de Economía y Competitividad (Spain) under the grant TIN2014-55325-C2-1-R (ProCriCiS). Peer reviewed.

Digital.CSIC
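
For readers unfamiliar with the signature-comparison step described in the entry above, the following sketch shows the classic Levenshtein edit distance between two strings. It is offered only as a baseline illustration. It is neither the new distance proposed in the paper nor the exact variant used by ssdeep, neither of which is specified in the abstract.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]
```

A fuzzy hashing tool would typically convert such a distance into a similarity score over two signatures. That scoring step is tool-specific and is left out here.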

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Author: A. Amir
E.W. Myers
G. Navarro
G.M. Landau
J. Kärkkäinen
J. Ziv
K. Thompson
M. Dietzfelbinger
M. Farach
P. Sellers
R. Cole
T.A. Welch
V. Mäkinen
Publication date: 01/01/2007

We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck.

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Southern Denmark Research Output

Online Research Database In Technology

Slender PUF Protocol: A lightweight, robust, and secure authentication by substring matching

Author: Devadas Srinivas
Koushanfar Farinaz
Majzoobi Mehrdad
Rostami Masoud
Wallach Dan S.
Publication venue: IEEE
Publication date: 01/01/2012

We introduce Slender PUF protocol, an efficient and secure method to authenticate the responses generated from a Strong Physical Unclonable Function (PUF). The new method is lightweight, and suitable for energy constrained platforms such as ultra-low power embedded systems for use in identification and authentication applications. The proposed protocol does not follow the classic paradigm of exposing the full PUF responses (or a transformation of the full string of responses) on the communication channel. Instead, random subsets of the responses are revealed and sent for authentication. The response patterns are used for authenticating the prover device with a very high probability. We perform a thorough analysis of the method's resiliency to various attacks, which guides adjustment of our protocol parameters for an efficient and secure implementation. We demonstrate that Slender PUF protocol, if carefully designed, will be resilient against all known machine learning attacks. In addition, it has the great advantage of an inbuilt PUF error tolerance. Thus, Slender PUF protocol is lightweight and does not require costly additional error correction, fuzzy extractors, and hash modules suggested in most previously known PUF-based robust authentication techniques. The low overhead and practicality of the protocol are confirmed by a set of hardware implementations and evaluations.

DSpace at Rice University
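
The substring-matching idea named in the title above can be sketched roughly as follows. The sketch assumes the prover reveals a randomly positioned substring of its response bits, and that the verifier, holding its own predicted response, accepts if the substring aligns somewhere with at most a threshold number of bit errors. The circular wrap-around, the function names, and the parameters are assumptions for illustration only; the abstract does not give the protocol's actual steps or parameters.

```python
import random


def reveal_substring(response, length, rng):
    """Prover side (hypothetical): reveal `length` bits from a secret offset."""
    start = rng.randrange(len(response))
    n = len(response)
    return [response[(start + k) % n] for k in range(length)]


def verify(expected, substring, max_errors):
    """Verifier side (hypothetical): accept if some alignment is close enough.

    `expected` is the verifier's predicted full response. The offset is
    never transmitted, so every circular alignment is tried.
    """
    n = len(expected)
    for start in range(n):
        errors = sum(expected[(start + k) % n] != bit
                     for k, bit in enumerate(substring))
        if errors <= max_errors:
            return True
    return False
```

Tolerating a few mismatches at the best alignment is what lets noisy responses pass in this sketch without a separate error-correction step. The threshold trades false rejections against false acceptances.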

Automated Evaluation of Approximate Matching Algorithms on Real Data

Author: Breitinger Frank
Roussev Vassil
Publication venue: Digital Commons @ New Haven
Publication date: 01/01/2014

Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners are looking for fast methods to screen and analyze the increasing amounts of data in forensic investigations. The essential idea is to complement the use of cryptographic hash functions to detect data objects with bytewise identical representation with the capability to find objects with bytewise similar representations. Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching ones are still in their early development stages and evaluation methodology is still evolving. Broadly, prior approaches have used either a human in the loop to manually evaluate the goodness of similarity matches on real world data, or controlled (pseudo-random) data to perform automated evaluation. This work's contribution is to introduce automated approximate matching evaluation on real data by relating approximate matching results to the longest common substring (LCS). Specifically, we introduce a computationally efficient LCS approximation and use it to obtain ground truth on the t5 set. Using the results, we evaluate three existing approximate matching schemes relative to LCS and analyze their performance.

Elsevier - Publisher Connector

Crossref

Digital Commons @ New Haven
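
As a point of reference for the longest common substring (LCS) measure used in the entry above, here is a minimal exact dynamic-programming sketch over byte strings. It runs in time proportional to the product of the two input lengths, which is one reason an approximation is attractive on large files. It is not the efficient approximation introduced in the paper, and the function name is an assumption.

```python
def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Return one longest contiguous byte run shared by `a` and `b`."""
    best_len = 0
    best_end = 0  # end position (exclusive) of the best run inside `a`
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur
    return a[best_end - best_len:best_end]
```

The length of the returned run, relative to the sizes of the two objects, is the kind of ground-truth signal against which a similarity score could be compared.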