Search CORE

6,515 research outputs found

Recommended from our members

GPERF : a perfect hash function generator

Author: Schmidt Douglas C.
Suda Tatsuya
Publication venue: eScholarship, University of California
Publication date: 01/01/1992
Field of study

gperf is a widely available perfect hash function generator written in C++. It automates a common system software operation: keyword recognition. gperf translates an n element user-specified keyword list keyfile into source code containing a k element lookup table and a pair of functions, phash and in_word_set. phash uniquely maps keywords in keyfile onto the range 0 .. k - 1, where k >/= n. If k = n, then phash is considered a minimal perfect hash function. in_word_set uses phash to determine whether a particular string of characters str occurs in the keyfile, using at most one string comparison.This paper describes the user-interface, options, features, algorithm design and implementation strategies incorporated in gperf. It also presents the results from an empirical comparison between gperf-generated recognizers and other popular techniques for reserved word lookup

eScholarship - University of California

These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure

Author: Brown C. Titus
Canino-Koning Rosangela
Howe Adina Chuang
Pell Jason
Zhang Qingpeng
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 14/07/2014
Field of study

K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory which is necessary to support online k-mer analysis algorithms. On sparse data sets this data structure is considerably more memory efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Faster tuple lattice sieving using spherical locality-sensitive filters

Author: Laarhoven Thijs
Publication venue
Publication date: 08/05/2017
Field of study

To overcome the large memory requirement of classical lattice sieving algorithms for solving hard lattice problems, Bai-Laarhoven-Stehl\'{e} [ANTS 2016] studied tuple lattice sieving, where tuples instead of pairs of lattice vectors are combined to form shorter vectors. Herold-Kirshanova [PKC 2017] recently improved upon their results for arbitrary tuple sizes, for example showing that a triple sieve can solve the shortest vector problem (SVP) in dimension

d

in time

2^{0.3717d + o(d)}

, using a technique similar to locality-sensitive hashing for finding nearest neighbors. In this work, we generalize the spherical locality-sensitive filters of Becker-Ducas-Gama-Laarhoven [SODA 2016] to obtain space-time tradeoffs for near neighbor searching on dense data sets, and we apply these techniques to tuple lattice sieving to obtain even better time complexities. For instance, our triple sieve heuristically solves SVP in time

2^{0.3588d + o(d)}

. For practical sieves based on Micciancio-Voulgaris' GaussSieve [SODA 2010], this shows that a triple sieve uses less space and less time than the current best near-linear space double sieve.Comment: 12 pages + references, 2 figures. Subsumed/merged into Cryptology ePrint Archive 2017/228, available at https://ia.cr/2017/122

arXiv.org e-Print Archive

Pure OAI Repository

Asymptotic Analysis of Plausible Tree Hash Modes for SHA-3

Author: Atighehchi Kevin
Bonnecaze Alexis
Publication venue
Publication date: 01/01/2017
Field of study

Discussions about the choice of a tree hash mode of operation for a standardization have recently been undertaken. It appears that a single tree mode cannot address adequately all possible uses and specifications of a system. In this paper, we review the tree modes which have been proposed, we discuss their problems and propose remedies. We make the reasonable assumption that communicating systems have different specifications and that software applications are of different types (securing stored content or live-streamed content). Finally, we propose new modes of operation that address the resource usage problem for the three most representative categories of devices and we analyse their asymptotic behavior

arXiv.org e-Print Archive

HAL - Normandie Université

HAL AMU

Directory of Open Access Journals

Ruhr-Universität Bochum (RUB): Open Journal Systems

Cryptology ePrint Archive

Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries

Author: A Poyias
D Arroyuelo
D Lemire
D Lemire
D Lemire
G Marsaglia
GH Gonnet
H Bannai
H Luan
J Fischer
J Fischer
J Jansson
J Kärkkäinen
J Ziv
J Ziv
JA Feldman
JG Cleary
K Chung
L Carter
P Tchebychev
RM Karp
RM Robinson
TA Welch
Y Nakashima
Publication venue
Publication date: 09/06/2017
Field of study

We present the first thorough practical study of the Lempel-Ziv-78 and the Lempel-Ziv-Welch computation based on trie data structures. With a careful selection of trie representations we can beat well-tuned popular trie data structures like Judy, m-Bonsai or Cedar

arXiv.org e-Print Archive

Crossref

Lessons learned from the design of a mobile multimedia system in the Moby Dick project

Author: Havinga Paul J.M.
Smit Gerard J.M.
Publication venue: Springer
Publication date: 01/01/2000
Field of study

Recent advances in wireless networking technology and the exponential development of semiconductor technology have engendered a new paradigm of computing, called personal mobile computing or ubiquitous computing. This offers a vision of the future with a much richer and more exciting set of architecture research challenges than extrapolations of the current desktop architectures. In particular, these devices will have limited battery resources, will handle diverse data types, and will operate in environments that are insecure, dynamic and which vary significantly in time and location. The research performed in the MOBY DICK project is about designing such a mobile multimedia system. This paper discusses the approach made in the MOBY DICK project to solve some of these problems, discusses its contributions, and accesses what was learned from the project

University of Twente Research Information

Evaluation of network coding techniques for a sniper detection application

Author: Argyraki Katerina J
Fragouli Christina
Karaağaç Abdulkadir
Keller Lorenzo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

This paper experimentally studies the reliability and delay of flooding based multicast protocols for a sniper detection application. In particular using an emulator it studies under which conditions protocols based on network coding deliver performance improvements compared to classic flooding. It then presents an implementation of such protocols on mobile phones

Crossref

Ghent University Academic Bibliography