38 research outputs found

    MLET: A Power Efficient Approach for TCAM Based, IP Lookup Engines in Internet Routers

    Routers are among the most important entities in computer networks, especially the Internet. Forwarding IP packets is a vital function of Internet routers. A router extracts the destination IP address from each packet and looks it up in its routing table; this task is called IP lookup. IP lookup is a challenging problem because routing tables keep growing. Ternary Content-Addressable Memories (TCAMs) are becoming very popular for designing high-throughput address lookup engines in routers: they are fast, cost-effective and simple to manage. Despite their speed, TCAMs have one major drawback: high power consumption. This paper proposes the Multilevel Enabling Technique (MLET), a power-efficient TCAM-based hardware architecture. The scheme is applied after an Espresso-II minimization algorithm to achieve lower power consumption. The performance evaluation shows that the proposed approach can save a considerable amount of the routing table's power consumption. Comment: 14 Pages, IJCNC 201
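
    As a rough illustration of the ternary matching that a TCAM performs during IP lookup, the sketch below stores each prefix as a (value, mask) pair and returns the next hop of the longest matching prefix. It is not the MLET architecture: the names `make_entry`, `tcam_lookup` and `TABLE` are hypothetical, the table contents are made up, and the loop is sequential, whereas a real TCAM compares every entry in parallel (which is where its power cost comes from).

```python
from ipaddress import IPv4Address


def make_entry(prefix: str, length: int, next_hop: str):
    """Build a ternary entry: bits outside the mask are "don't care"."""
    mask = ((1 << length) - 1) << (32 - length) if length else 0
    value = int(IPv4Address(prefix)) & mask
    return value, mask, length, next_hop


def tcam_lookup(table, address: str):
    """Return the next hop of the longest prefix that matches the address."""
    key = int(IPv4Address(address))
    best = None
    for value, mask, length, next_hop in table:
        # An entry matches when the key agrees with it on every masked bit.
        if (key & mask) == value and (best is None or length > best[0]):
            best = (length, next_hop)
    return best[1] if best else None


# Hypothetical routing table used only for illustration.
TABLE = [
    make_entry("0.0.0.0", 0, "default"),
    make_entry("10.0.0.0", 8, "A"),
    make_entry("10.1.0.0", 16, "B"),
]

if __name__ == "__main__":
    for addr in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
        print(addr, "->", tcam_lookup(TABLE, addr))
```

    Because every stored entry takes part in every search, reducing the number of entries (minimization) or the number of entries enabled per search are the two natural levers for saving power.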

    Fast Packet Classification Using Bloom Filters

    While the problem of general packet classification has received a great deal of attention from researchers over the last ten years, there is still no really satisfactory solution. Ternary Content Addressable Memory (TCAM), although widely used in practice, is both expensive and consumes a lot of power. Algorithmic solutions, which rely on commodity memory chips, are relatively inexpensive and power-efficient, but have not been able to match the generality and performance of TCAMs. In this paper we propose a new approach to packet classification, which combines architectural and algorithmic techniques. Our starting point is the well-known crossproducting algorithm, which is fast but has significant memory overhead due to the extra rules needed to represent the crossproducts. We show how to modify the crossproduct method in a way that drastically reduces the memory required, without compromising on performance. We avoid unnecessary accesses to off-chip memory by filtering off-chip accesses using on-chip Bloom filters. For packets that match p rules in a rule set, our algorithm requires just 4 + p + ε independent memory accesses on average, to return all matching rules, where ε ≪ 1 is a small constant that depends on the false positive rate of the Bloom filters. Each memory access is just 256 bits, making it practical to classify small packets at OC-192 link rates using two commodity SRAM chips. For rule set sizes ranging from a few hundred to several thousand filters, the average rule set expansion factor attributable to the algorithm is just 1.2. The memory consumption per rule is 36 bytes in the average case
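
    The idea of filtering off-chip accesses with an on-chip Bloom filter can be sketched as follows. This is a minimal illustration, not the paper's data structure: `BloomFilter`, `FilteredTable`, the filter size, the number of hash functions and the SHA-256-based hashing are all assumptions chosen for readability, and a Python dict stands in for the off-chip memory.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class FilteredTable:
    """Consult the (on-chip) filter first; touch (off-chip) memory only on a hit."""

    def __init__(self):
        self.on_chip = BloomFilter()
        self.off_chip = {}          # stand-in for external memory
        self.off_chip_accesses = 0  # counts simulated external accesses

    def insert(self, key: bytes, rule: str) -> None:
        self.on_chip.add(key)
        self.off_chip[key] = rule

    def lookup(self, key: bytes):
        if key not in self.on_chip:
            return None             # definite miss, no off-chip access
        self.off_chip_accesses += 1  # true hit or false positive
        return self.off_chip.get(key)


if __name__ == "__main__":
    table = FilteredTable()
    table.insert(b"10.0.0.0/8|tcp|80", "rule-1")
    print(table.lookup(b"10.0.0.0/8|tcp|80"), table.off_chip_accesses)
    print(table.lookup(b"192.168.0.0/16|udp|53"), table.off_chip_accesses)
```

    Only false positives cause a wasted off-chip access, which is why the extra cost in the access count above shrinks as the filter's false positive rate falls.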

    Towards more power efficient IP lookup engines

    IP lookup in Internet routers requires an implementation of the longest prefix match algorithm. Software or hardware implementations of routing-trie-based approaches require several memory accesses to perform a single lookup, which limits throughput considerably. Meanwhile, IP lookup throughput requirements have been increasing continuously. This has led to ternary content addressable memory (TCAM) based IP lookup engines, which can perform one lookup every cycle. TCAM lookup engines are very power hungry because of the large number of entries that must be searched simultaneously. This has led to two disparate streams of research into power reduction techniques. The first stream focuses on routing table compaction using logic minimization techniques. The second stream focuses on routing table partitioning. This work proposes to bridge the gap by employing strategies that combine these two leading state-of-the-art schemes. The existing partitioning algorithms are generally applied to a binary routing trie, which precludes their application to a compacted routing table. The proposed scheme employs a ternary routing trie to represent the minimized routing table, in combination with a ternary trie partitioning algorithm. The combined scheme offers up to 50% reduction in silicon area while maintaining the power economy of the partitioning scheme
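
    To make the cost of trie-based lookup concrete, the following sketch implements longest prefix match on a plain binary trie and counts the nodes visited, each of which would be one memory access in a straightforward implementation. It is an illustration only, not the ternary trie or partitioning scheme of the paper; `TrieNode`, `insert`, `lookup` and the bit-string prefixes are hypothetical.

```python
class TrieNode:
    """One node of a binary routing trie."""

    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


def insert(root: TrieNode, prefix_bits: str, next_hop: str) -> None:
    """Insert a prefix given as a string of '0'/'1' characters."""
    node = root
    for bit in prefix_bits:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop


def lookup(root: TrieNode, address_bits: str):
    """Return (next hop of the longest matching prefix, nodes visited)."""
    node, best, visited = root, root.next_hop, 1
    for bit in address_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        visited += 1
        if node.next_hop is not None:
            best = node.next_hop  # remember the longest match seen so far
    return best, visited


if __name__ == "__main__":
    root = TrieNode()
    insert(root, "10", "A")
    insert(root, "1011", "B")
    print(lookup(root, "10110000"))
    print(lookup(root, "10010000"))
```

    The number of nodes visited grows with prefix length, in contrast to a TCAM, which resolves the same query in a single search at the price of activating every entry.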

    A Scalable High-Performance Memory-Less IP Address Lookup Engine Suitable for FPGA Implementation

    RÉSUMÉ IP address lookup is a very important operation for modern Internet routers. Many approaches have been proposed in the literature for building high-performance IP Address Lookup Engines (ALEs). Existing ALEs fall into one of three categories, based on ternary content-addressable memories (TCAMs), on tries, or on TCAM emulation. TCAM-based approaches are expensive and consume a lot of power. Trie-based techniques have nondeterministic latency and generally require accesses to external memory. TCAM-emulation techniques generally combine TCAMs with low-cost circuits. The main objective of this thesis is to propose an ALE architecture that enables fast IP address lookup and addresses the main shortcomings of TCAM-based and trie-based techniques. Achieving sufficient processing speed in the ALE is an important aspect. Hardware accelerators have been adopted to obtain search results at high speed. FPGAs allow specialized reconfigurable hardware accelerators to be implemented. Five TCAM-emulation ALE architectures are proposed in this thesis: a serial one, a parallel one, an architecture called IP-Split, a variant called IP-Split-Bucket, and a version of IP-Split-Bucket that supports updates. Each architecture builds progressively on the previous one in order to improve its performance. The serial architecture uses memories to store the forwarding address table and one comparator to perform a serial search over the entries.
The parallel architecture stores the table entries in the logic resources of an FPGA and performs a parallel search using N comparators for a table with N entries. The IP-Split architecture employs a level of decoders to avoid repetitive comparisons among equivalent table entries. The IP-Split-Bucket architecture is an improved version of the previous architecture that uses a partitioning method aimed at optimizing the IP-Split architecture. The update-enabled IP-Split-Bucket is the last architecture proposed; it supports both updates and high-speed IP address lookup. Implementation results show that the best-performing ALE architecture is IP-Split-Bucket, which does not rely on any memory. For a real IPv4 forwarding table with 524 k prefixes, the IP-Split-Bucket architecture achieves a throughput of 103.4 M packets per second and consumes 23% and 22%, respectively, of the look-up tables (LUTs) and flip-flops (FFs) of a Xilinx XC7V2000T chip.
ABSTRACT High-performance IP address lookup is in high demand for modern Internet routers. Many approaches in the literature describe special-purpose Address Lookup Engines (ALEs) for IP address lookup. The existing ALEs can be categorised into the following techniques: Ternary Content Addressable Memory-based (TCAM-based), trie-based and TCAM-emulation. TCAM-based techniques are expensive and consume a lot of power, since they employ TCAMs in their architecture. Trie-based techniques have nondeterministic latency and external memory accesses, since they store the Forwarding Information Base (FIB) in memory using a trie data structure. TCAM-emulation techniques commonly combine TCAMs with lower-cost circuits that handle less time-critical activities.
In this thesis, the main objective is to propose an ALE architecture with fast search that addresses the main shortcomings of TCAM-based and trie-based techniques. Achieving adequate throughput is a fundamental requirement for the proposed ALE, given recent improvements in network systems and the growth of the Internet of Things (IoT). To that end, hardware accelerators have been adopted to achieve high-speed search. In this work, Field Programmable Gate Arrays (FPGAs), which serve as specialized reconfigurable hardware accelerators, are chosen as the target platform for the ALE architecture. Five TCAM-emulation ALE architectures are proposed in this thesis: the Full-Serial, the Full-Parallel, the IP-Split, the IP-Split-Bucket and the Update-enabled IP-Split-Bucket architectures. Each architecture builds on the previous one with progressive improvements. The Full-Serial architecture employs memories to store the FIB and one comparator to perform a serial search on the FIB entries. The Full-Parallel architecture stores the FIB entries in the logic resources of the FPGA and employs a parallel search using one comparator for each FIB entry. The IP-Split architecture employs a level of decoders to avoid repetitive comparisons among equivalent entries of the FIB. The IP-Split-Bucket architecture is an upgraded version of the previous architecture that uses a partitioning scheme to optimize the IP-Split architecture. Finally, the Update-enabled IP-Split-Bucket supports IP address lookup at high update rates. The most efficient proposed architecture is the IP-Split-Bucket, which is a novel high-performance memory-less ALE. For a real-world FIB with 524 k IPv4 prefixes, IP-Split-Bucket achieves a throughput of 103.4 M packets per second and consumes 23% and 22%, respectively, of the Look Up Tables (LUTs) and Flip-Flops (FFs) of a Xilinx XC7V2000T chip

    Models, Algorithms, and Architectures for Scalable Packet Classification

    The growth and diversification of the Internet impose increasing demands on the performance and functionality of network infrastructure. Routers, the devices responsible for the switching and directing of traffic in the Internet, are being called upon to not only handle increased volumes of traffic at higher speeds, but also impose tighter security policies and provide support for a richer set of network services. This dissertation addresses the searching tasks performed by Internet routers in order to forward packets and apply network services to packets belonging to defined traffic flows. As these searching tasks must be performed for each packet traversing the router, the speed and scalability of the solutions to the route lookup and packet classification problems largely determine the realizable performance of the router, and hence the Internet as a whole. Despite the energetic attention of the academic and corporate research communities, there remains a need for search engines that scale to support faster communication links, larger route tables and filter sets and increasingly complex filters. The major contributions of this work include the design and analysis of a scalable hardware implementation of a Longest Prefix Matching (LPM) search engine for route lookup, a survey and taxonomy of packet classification techniques, a thorough analysis of packet classification filter sets, the design and analysis of a suite of performance evaluation tools for packet classification algorithms and devices, and a new packet classification algorithm that scales to support high-speed links and large filter sets classifying on additional packet fields

    Efficient binary cutting packet classification

    Packet classification is the process of distributing packets into ‘flows’ in an Internet router. A router processes all packets that belong to a predefined rule set in a similar manner and classifies them to decide which services each packet should receive. Packet classification plays an important role in both edge and core routers in providing advanced network services such as quality of service, firewalls and intrusion detection. These services require the ability to categorize and isolate packet traffic into different flows for proper processing. Packet classification remains a classical problem, even though many researchers have worked on it. Existing algorithms such as HyperCuts, boundary cutting and HiCuts achieve efficient performance by representing the rules of a classifier geometrically and searching for the geometric subspace to which each input packet belongs. However, fixed interval-based cutting that does not relate to the actual space each rule covers is ineffective and results in a huge storage requirement, so the memory consumption of these algorithms remains quite high when high throughput is required. Hence, in this paper we propose a new splitting criterion that is more memory- and time-efficient than the techniques mentioned above. Our proposed approach, Adaptive Binary Cutting (ABC), produces a set of different-sized cuts at each decision step, with the goal of balancing the distribution of filters and reducing the filter duplication effect. The proposed algorithm uses stronger and more straightforward criteria for decision tree construction. Experimental results show the effectiveness of the proposed algorithm compared to existing algorithms on parameters such as time and memory. In this paper, the cuts made at each decision node are not of symmetrical size, with the aim of balancing the distribution of filters and reducing redundancy among filters
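
    The filter duplication effect can be seen in a one-dimensional toy example. The sketch below is not the ABC algorithm, whose cutting criteria are not given in the abstract; it only compares a fixed midpoint cut with a cut placed on a rule boundary and counts how many rule copies each produces. The function names and the three ranges in `RULES` are hypothetical.

```python
def partition(rules, cut_points, lo, hi):
    """Split [lo, hi) at cut_points and list the rules overlapping each piece."""
    bounds = [lo, *sorted(cut_points), hi]
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        # A rule is stored in every piece that its half-open range overlaps.
        pieces.append(
            [name for name, (r_lo, r_hi) in rules.items() if r_lo < end and r_hi > start]
        )
    return pieces


def duplication_factor(pieces, rule_count):
    """Stored rule copies divided by the number of distinct rules."""
    return sum(len(piece) for piece in pieces) / rule_count


# Hypothetical rules as half-open ranges [lo, hi) over a 4-bit field.
RULES = {"r1": (0, 6), "r2": (6, 10), "r3": (10, 16)}

equal_cut = partition(RULES, [8], 0, 16)    # fixed midpoint cut
aligned_cut = partition(RULES, [6], 0, 16)  # cut on a rule boundary

if __name__ == "__main__":
    print(equal_cut, duplication_factor(equal_cut, len(RULES)))
    print(aligned_cut, duplication_factor(aligned_cut, len(RULES)))
```

    In this toy case the midpoint cut stores r2 on both sides, while the boundary-aligned cut stores each rule once, which is the kind of saving that different-sized cuts aim for.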

    On using content addressable memory for packet classification

    Packet switched networks such as the Internet require packet classification at every hop in order to apply services and security policies to traffic flows. The relentless increase in link speeds and traffic volume imposes stringent constraints on packet classification solutions. Ternary Content Addressable Memory (TCAM) devices are favored by most network component and equipment vendors due to the fast and deterministic lookup performance afforded by their use of massive parallelism. While able to keep up with high speed links, TCAMs suffer from exorbitant power consumption, poor scalability to longer search keys and larger filter sets, and inefficient support of multiple matches. The research community has responded with algorithms that seek to meet the lookup rate constraint with greater efficiency through the use of commodity Random Access Memory (RAM) technology. The most promising algorithms efficiently achieve high lookup rates by leveraging the statistical structure of real filter sets. Due to their dependence on filter set characteristics, it is difficult to provision processing and memory resources for implementations that support a wide variety of filter sets. We show how several algorithmic advances may be leveraged to improve the efficiency, scalability, incremental update and multiple match performance of CAM-based packet classification techniques without degrading the lookup performance. Our approach, Label Encoded Content Addressable Memory (LECAM), represents a hybrid technique that utilizes decomposition, label encoding, and a novel Content Addressable Memory (CAM) architecture. By reducing the number of implementation parameters, LECAM provides a vehicle to carry several of the recent algorithmic advances into practice. We provide a thorough overview of CAM technologies and packet classification algorithms, along with a detailed discussion of the scaling issues that arise with longer search keys and larger filter sets. 
We also provide a comparative analysis of LECAM and standard TCAM using a collection of real and synthetic filter sets of various sizes and compositions