351 research outputs found
Evaluating and Optimizing IP Lookup on Many core Processors
International audienceIn recent years, there has been a growing interest in multi/many core processors as a target architecture for high performance software router. Because of its key position in routers, hardware IP lookup implementation has been intensively studied with TCAM and FPGA based architecture. However, increasing interest in software implementation has also been observed. In this paper, we evaluate the performance of software only IP lookup on a many core chip, the TILEPro64 processor. For this purpose we have implemented two widely used IP lookup algorithms, DIR-24-8-BASIC and Tree Bitmap. We evaluate the performance of these two algorithms over the TILEPro64 processor with both synthetic and real-world traces. After a detailed analysis, we propose a hybrid scheme which provides high lookup speed and low worst case update overhead. Our work shows how to exploit the architectural features of TILEPro64 to improve the performance, including many optimization in both single-core and parallelism aspects. Experiment results show by using only 18 cores, we can achieve a lookup throughput of 60Mpps (almost 40Gbps) with low power consumption, which demonstrates great performance potentials in many core processor
Design and Evaluation of Packet Classification Systems, Doctoral Dissertation, December 2006
Although many algorithms and architectures have been proposed, the design of efficient packet classification systems remains a challenging problem. The diversity of filter specifications, the scale of filter sets, and the throughput requirements of high speed networks all contribute to the difficulty. We need to review the algorithms from a high-level point-of-view in order to advance the study. This level of understanding can lead to significant performance improvements. In this dissertation, we evaluate several existing algorithms and present several new algorithms as well. The previous evaluation results for existing algorithms are not convincing because they have not been done in a consistent way. To resolve this issue, an objective evaluation platform needs to be developed. We implement and evaluate several representative algorithms with uniform criteria. The source code and the evaluation results are both published on a web-site to provide the research community a benchmark for impartial and thorough algorithm evaluations. We propose several new algorithms to deal with the different variations of the packet classification problem. They are: (1) the Shape Shifting Trie algorithm for longest prefix matching, used in IP lookups or as a building block for general packet classification algorithms; (2) the Fast Hash Table lookup algorithm used for exact flow match; (3) the longest prefix matching algorithm using hash tables and tries, used in IP lookups or packet classification algorithms;(4) the 2D coarse-grained tuple-space search algorithm with controlled filter expansion, used for two-dimensional packet classification or as a building block for general packet classification algorithms; (5) the Adaptive Binary Cutting algorithm used for general multi-dimensional packet classification. In addition to the algorithmic solutions, we also consider the TCAM hardware solution. In particular, we address the TCAM filter update problem for general packet classification and provide an efficient algorithm. Building upon the previous work, these algorithms significantly improve the performance of packet classification systems and set a solid foundation for further study
Fast and Lean Immutable Multi-Maps on the JVM based on Heterogeneous Hash-Array Mapped Tries
An immutable multi-map is a many-to-many thread-friendly map data structure
with expected fast insert and lookup operations. This data structure is used
for applications processing graphs or many-to-many relations as applied in
static analysis of object-oriented systems. When processing such big data sets
the memory overhead of the data structure encoding itself is a memory usage
bottleneck. Motivated by reuse and type-safety, libraries for Java, Scala and
Clojure typically implement immutable multi-maps by nesting sets as the values
with the keys of a trie map. Like this, based on our measurements the expected
byte overhead for a sparse multi-map per stored entry adds up to around 65B,
which renders it unfeasible to compute with effectively on the JVM.
In this paper we propose a general framework for Hash-Array Mapped Tries on
the JVM which can store type-heterogeneous keys and values: a Heterogeneous
Hash-Array Mapped Trie (HHAMT). Among other applications, this allows for a
highly efficient multi-map encoding by (a) not reserving space for empty value
sets and (b) inlining the values of singleton sets while maintaining a (c)
type-safe API.
We detail the necessary encoding and optimizations to mitigate the overhead
of storing and retrieving heterogeneous data in a hash-trie. Furthermore, we
evaluate HHAMT specifically for the application to multi-maps, comparing them
to state-of-the-art encodings of multi-maps in Java, Scala and Clojure. We
isolate key differences using microbenchmarks and validate the resulting
conclusions on a real world case in static analysis. The new encoding brings
the per key-value storage overhead down to 30B: a 2x improvement. With
additional inlining of primitive values it reaches a 4x improvement
An algorithm for fast route lookup and update
Increase in routing table sizes, number of updates, traffic, speed of links and migration to IPv6 have made IP address lookup, based on longest prefix matching, a major bottleneck for high performance routers. Several schemes are evaluated and compared based on complexity analysis and simulation results. A trie based scheme, called Linked List Cascade Addressable Trie (LLCAT) is presented. The strength of LLCAT comes from the fact that it is easy to be implemented in hardware, and also routing table update operations are performed incrementally requiring very few memory operations guaranteed for worst case to satisfy requirements of dynamic routing tables in high speed routers. Application of compression schemes to this algorithm is also considered to improve memory consumption and search time. The algorithm is implemented in C language and simulation results with real-life data is presented along with detailed description of the algorithm
- …