2 research outputs found

Nucleotide String Indexing using Range Matching

Author: Rashelbach Alon
Rottenstreich Ori
Silberstein Mark
Publication date: 06/08/2023

The two most common data structures for genome indexing, FM-indices and hash tables, exhibit a fundamental trade-off between memory footprint and performance. We present Ranger, a new indexing technique for nucleotide sequences that is both memory efficient and fast. We observe that nucleotide sequences can be represented as integer ranges and leverage a range-matching algorithm based on neural networks to perform the lookup. We prototype Ranger in software and integrate it into the popular Minimap2 tool. Ranger achieves end-to-end performance almost identical to that of the original Minimap2, while occupying 1.7× and 1.2× less memory for short and long reads, respectively. With a limited memory capacity, Ranger achieves up to 4.3× speedup for short reads compared to FM-Index, and up to 4.2× and 1.8× speedups for short and long reads, respectively, compared to hash tables. Ranger opens up new opportunities in the context of hardware acceleration by reducing the memory footprint of long-seed indexes used in state-of-the-art alignment accelerators by up to 23×, which results in 3× faster alignment and negligible accuracy degradation. Moreover, its worst-case memory bandwidth and latency can be bounded in advance without the need to inflate DRAM capacity.
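The abstract's observation that nucleotide sequences can be represented as integer ranges can be illustrated with a minimal sketch. It assumes a 2-bit-per-base encoding and a fixed seed length `k`; the names `BASE_CODE`, `encode`, and `prefix_range` are hypothetical, and the abstract does not state which encoding Ranger actually uses.

```python
# Assumed 2-bit encoding (A, C, G, T); not taken from the paper.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Pack a nucleotide string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_CODE[base]
    return value


def prefix_range(prefix, k):
    """Inclusive integer range covering every k-mer that starts with `prefix`."""
    free_bits = 2 * (k - len(prefix))
    lo = encode(prefix) << free_bits       # prefix followed by all A's
    hi = lo | ((1 << free_bits) - 1)       # prefix followed by all T's
    return lo, hi


lo, hi = prefix_range("AC", 4)
print(lo, hi, lo <= encode("ACGT") <= hi)
```

Under this encoding, every sequence sharing a prefix falls into one contiguous interval, so a lookup for a seed reduces to asking which stored interval contains its integer code.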

arXiv.org e-Print Archive

A Computational Approach to Packet Classification

Author: CAIDA.
Firestone Daniel
Kingma Diederik P
Labs Habana
Mao Hongzi
Rashelbach Alon
Valadarsky Asaf
Publication venue: Association for Computing Machinery (ACM)
Publication date: 13/07/2020

Multi-field packet classification is a crucial component in modern software-defined data center networks. To achieve high throughput and low latency, state-of-the-art algorithms strive to fit the rule lookup data structures into on-die caches; however, they do not scale well with the number of rules. We present a novel approach, NuevoMatch, which improves the memory scaling of existing methods. A new data structure, Range Query Recursive Model Index (RQ-RMI), is the key component that enables NuevoMatch to replace most of the accesses to main memory with model inference computations. We describe an efficient training algorithm that guarantees the correctness of the RQ-RMI-based classification. The use of RQ-RMI allows the rules to be compressed into model weights that fit into the hardware cache. Further, it takes advantage of the growing support for fast neural network processing in modern CPUs, such as wide vector instructions, achieving a rate of tens of nanoseconds per lookup. Our evaluation using 500K multi-field rules from the standard ClassBench benchmark shows a geometric mean compression factor of 4.9x, 8x, and 82x, and average performance improvement of 2.4x, 2.6x, and 1.6x in throughput compared to CutSplit, NeuroCuts, and TupleMerge, all state-of-the-art algorithms.

Comment: To appear in SIGCOMM 202
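The idea of replacing memory accesses with model inference, while still guaranteeing a correct answer, can be sketched in a few lines. This is not RQ-RMI itself: a single least-squares line stands in for the neural submodels, the class name `LearnedRangeIndex` is hypothetical, and the input is assumed to be a non-empty list of non-overlapping inclusive integer ranges. The model predicts a position, and a precomputed worst-case error bounds the window that must be searched.

```python
import bisect


class LearnedRangeIndex:
    """Toy learned index over non-overlapping inclusive (lo, hi) ranges."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.starts = [lo for lo, _ in self.ranges]
        n = len(self.starts)
        # Least-squares line mapping a range start to its sorted position.
        mean_x = sum(self.starts) / n
        mean_y = (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in self.starts)
        cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(self.starts))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # The prediction is monotone in the key, so checking both endpoints
        # of each range bounds the error for every key inside that range.
        self.max_err = max(
            abs(self._predict(key) - i)
            for i, (lo, hi) in enumerate(self.ranges)
            for key in (lo, hi)
        )

    def _predict(self, key):
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.ranges) - 1)

    def lookup(self, key):
        """Return the index of the range containing `key`, or None."""
        guess = self._predict(key)
        first = max(guess - self.max_err, 0)
        last = min(guess + self.max_err, len(self.ranges) - 1)
        # Search only the window that the error bound allows.
        idx = bisect.bisect_right(self.starts, key, first, last + 1) - 1
        if idx >= first and self.ranges[idx][0] <= key <= self.ranges[idx][1]:
            return idx
        return None


index = LearnedRangeIndex([(0, 9), (20, 29), (100, 199), (1000, 1999)])
print(index.lookup(25), index.lookup(15), index.lookup(1500))
```

The final containment check is what keeps the lookup correct regardless of model quality: a poor model only widens the search window, it never returns a wrong range.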

arXiv.org e-Print Archive

Crossref