6,515 research outputs found
Recommended from our members
GPERF : a perfect hash function generator
gperf is a widely available perfect hash function generator written in C++. It automates a common system software operation: keyword recognition. gperf translates an n element user-specified keyword list keyfile into source code containing a k element lookup table and a pair of functions, phash and in_word_set. phash uniquely maps keywords in keyfile onto the range 0 .. k - 1, where k >/= n. If k = n, then phash is considered a minimal perfect hash function. in_word_set uses phash to determine whether a particular string of characters str occurs in the keyfile, using at most one string comparison.This paper describes the user-interface, options, features, algorithm design and implementation strategies incorporated in gperf. It also presents the results from an empirical comparison between gperf-generated recognizers and other popular techniques for reserved word lookup
These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure
K-mer abundance analysis is widely used for many purposes in nucleotide
sequence analysis, including data preprocessing for de novo assembly, repeat
detection, and sequencing coverage estimation. We present the khmer software
package for fast and memory efficient online counting of k-mers in sequencing
data sets. Unlike previous methods based on data structures such as hash
tables, suffix arrays, and trie structures, khmer relies entirely on a simple
probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits
online updating and retrieval of k-mer counts in memory which is necessary to
support online k-mer analysis algorithms. On sparse data sets this data
structure is considerably more memory efficient than any exact data structure.
In exchange, the use of a Count-Min Sketch introduces a systematic overcount
for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we
analyze the speed, the memory usage, and the miscount rate of khmer for
generating k-mer frequency distributions and retrieving k-mer counts for
individual k-mers. We also compare the performance of khmer to several other
k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC,
Turtle and KAnalyze. Finally, we examine the effectiveness of profiling
sequencing error, k-mer abundance trimming, and digital normalization of reads
in the context of high khmer false positive rates. khmer is implemented in C++
wrapped in a Python interface, offers a tested and robust API, and is freely
available under the BSD license at github.com/ged-lab/khmer
Faster tuple lattice sieving using spherical locality-sensitive filters
To overcome the large memory requirement of classical lattice sieving
algorithms for solving hard lattice problems, Bai-Laarhoven-Stehl\'{e} [ANTS
2016] studied tuple lattice sieving, where tuples instead of pairs of lattice
vectors are combined to form shorter vectors. Herold-Kirshanova [PKC 2017]
recently improved upon their results for arbitrary tuple sizes, for example
showing that a triple sieve can solve the shortest vector problem (SVP) in
dimension in time , using a technique similar to
locality-sensitive hashing for finding nearest neighbors.
In this work, we generalize the spherical locality-sensitive filters of
Becker-Ducas-Gama-Laarhoven [SODA 2016] to obtain space-time tradeoffs for near
neighbor searching on dense data sets, and we apply these techniques to tuple
lattice sieving to obtain even better time complexities. For instance, our
triple sieve heuristically solves SVP in time . For
practical sieves based on Micciancio-Voulgaris' GaussSieve [SODA 2010], this
shows that a triple sieve uses less space and less time than the current best
near-linear space double sieve.Comment: 12 pages + references, 2 figures. Subsumed/merged into Cryptology
ePrint Archive 2017/228, available at https://ia.cr/2017/122
Asymptotic Analysis of Plausible Tree Hash Modes for SHA-3
Discussions about the choice of a tree hash mode of operation for a
standardization have recently been undertaken. It appears that a single tree
mode cannot address adequately all possible uses and specifications of a
system. In this paper, we review the tree modes which have been proposed, we
discuss their problems and propose remedies. We make the reasonable assumption
that communicating systems have different specifications and that software
applications are of different types (securing stored content or live-streamed
content). Finally, we propose new modes of operation that address the resource
usage problem for the three most representative categories of devices and we
analyse their asymptotic behavior
Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries
We present the first thorough practical study of the Lempel-Ziv-78 and the
Lempel-Ziv-Welch computation based on trie data structures. With a careful
selection of trie representations we can beat well-tuned popular trie data
structures like Judy, m-Bonsai or Cedar
Lessons learned from the design of a mobile multimedia system in the Moby Dick project
Recent advances in wireless networking technology and the exponential development of semiconductor technology have engendered a new paradigm of computing, called personal mobile computing or ubiquitous computing. This offers a vision of the future with a much richer and more exciting set of architecture research challenges than extrapolations of the current desktop architectures. In particular, these devices will have limited battery resources, will handle diverse data types, and will operate in environments that are insecure, dynamic and which vary significantly in time and location. The research performed in the MOBY DICK project is about designing such a mobile multimedia system. This paper discusses the approach made in the MOBY DICK project to solve some of these problems, discusses its contributions, and accesses what was learned from the project
Evaluation of network coding techniques for a sniper detection application
This paper experimentally studies the reliability and delay of flooding based multicast protocols for a sniper detection application. In particular using an emulator it studies under which conditions protocols based on network coding deliver performance improvements compared to classic flooding. It then presents an implementation of such protocols on mobile phones
- …