1,174 research outputs found
Fast Exact Search in Hamming Space with Multi-Index Hashing
There is growing interest in representing image data and feature descriptors
using compact binary codes for fast near neighbor search. Although binary codes
are motivated by their use as direct indices (addresses) into a hash table,
codes longer than 32 bits are not being used as such, as it was thought to be
ineffective. We introduce a rigorous way to build multiple hash tables on
binary code substrings that enables exact k-nearest neighbor search in Hamming
space. The approach is storage efficient and straightforward to implement.
Theoretical analysis shows that the algorithm exhibits sub-linear run-time
behavior for uniformly distributed codes. Empirical results show dramatic
speedups over a linear scan baseline for datasets of up to one billion codes of
64, 128, or 256 bits
Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries
We present the first thorough practical study of the Lempel-Ziv-78 and the
Lempel-Ziv-Welch computation based on trie data structures. With a careful
selection of trie representations we can beat well-tuned popular trie data
structures like Judy, m-Bonsai or Cedar
A New Edit Distance for Fuzzy Hashing Applications
7 páginas, 5 tablas, 2 algoritmos. Comunicación presentada en: The 2015 World Congress in Computer Science, Computer Engineering, and Applied Computing (WORLDCOMP'15). The 2015 International Conference on Security and Management (SAM'15), Las Vegas, USA, July 27 - 30Similarity preserving hashing applications, also
known as fuzzy hashing functions, help to analyse the content
of digital devices by performing a resemblance comparison
between different files. In practice, the similarity matching
procedure is a two-step process, where first a signature
associated to the files under comparison is generated, and
then a comparison of the signatures themselves is performed.
Even though
ssdeep
is the best-known application in
this field, the edit distance algorithm that
ssdeep
uses for
performing the signature comparison is not well-suited for
certain scenarios. In this contribution we present a new edit
distance algorithm that better reflects the similarity of two
strings, and that can be used by fuzzy hashing applications
in order to improve their results.This work has been partially supported by Comunidad
de Madrid (Spain) under the project S2013/ICE-3095-CM
(CIBERDINE) and by Ministerio de Economía y Com-
petitividad (Spain) under the grant TIN2014-55325-C2-1-R
(ProCriCiS).Peer reviewe
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problem for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck
Slender PUF Protocol: A lightweight, robust, and secure authentication by substring matching
We introduce Slender PUF protocol, an efficient
and secure method to authenticate the responses
generated from a Strong Physical Unclonable Function
(PUF). The new method is lightweight, and suitable for
energy constrained platforms such as ultra-low power embedded
systems for use in identification and authentication
applications. The proposed protocol does not follow the
classic paradigm of exposing the full PUF responses (or
a transformation of the full string of responses) on the
communication channel. Instead, random subsets of the
responses are revealed and sent for authentication. The
response patterns are used for authenticating the prover
device with a very high probability.We perform a thorough
analysis of the method’s resiliency to various attacks
which guides adjustment of our protocol parameters for
an efficient and secure implementation. We demonstrate
that Slender PUF protocol, if carefully designed, will be
resilient against all known machine learning attacks. In
addition, it has the great advantage of an inbuilt PUF error
tolerance. Thus, Slender PUF protocol is lightweight and
does not require costly additional error correction, fuzzy
extractors, and hash modules suggested in most previously
known PUF-based robust authentication techniques. The
low overhead and practicality of the protocol are confirmed
by a set of hardware implementation and evaluations
Automated Evaluation of Approximate Matching Algorithms on Real Data
Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners are looking for fast methods to screen and analyze the increasing amounts of data in forensic investigations. The essential idea is to complement the use of cryptographic hash functions to detect data objects with bytewise identical representation with the capability to find objects with bytewise similarrepresentations. Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching ones are still in their early development stages and evaluation methodology is still evolving. Broadly, prior approaches have used either a human in the loop to manually evaluate the goodness of similarity matches on real world data, or controlled (pseudo-random) data to perform automated evaluation. This work\u27s contribution is to introduce automated approximate matching evaluation on real data by relating approximate matching results to the longest common substring (LCS). Specifically, we introduce a computationally efficient LCS approximation and use it to obtain ground truth on the t5 set. Using the results, we evaluate three existing approximate matching schemes relative to LCS and analyze their performance
- …