66 research outputs found

    Energy Efficient Hardware Accelerators for Packet Classification and String Matching

    This thesis focuses on the design of new algorithms and energy-efficient, high-throughput hardware accelerators that implement packet classification and fixed string matching. These computationally heavy and memory-intensive tasks are used by networking equipment to inspect all packets at wire speed. The constant growth in Internet usage has made them increasingly difficult to implement at core network line speeds. Packet classification is used to sort packets into different flows by comparing their headers to a list of rules. A flow is used to decide a packet's priority and the manner in which it is processed. Fixed string matching is used to inspect a packet's payload to check whether it contains any strings associated with known viruses, attacks or other harmful activities. The contributions of this thesis to packet classification are hardware accelerators that allow packet classification to be implemented at core network line speeds when classifying packets using rulesets containing tens of thousands of rules. The hardware accelerators use modified versions of the HyperCuts packet classification algorithm. An adaptive clocking unit is also presented that dynamically adjusts the clock speed of a packet classification hardware accelerator so that its processing capacity matches the processing needs of the network traffic. This keeps dynamic power consumption to a minimum. Contributions to fixed string matching include a new algorithm that builds a state machine that searches for strings with the aid of default transition pointers. The use of default transition pointers keeps memory consumption low, allowing state machines capable of searching for thousands of strings to be small enough to fit in the on-chip memory of devices such as FPGAs. A hardware accelerator is also presented that uses these state machines to search through the payloads of packets for strings at core network line speeds.
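
The abstract above does not spell out the thesis's own string-matching algorithm. As a rough illustration of the general idea of default transitions, the sketch below uses the classic Aho-Corasick automaton, where each state stores only its explicit transitions plus one fallback ("failure") pointer. The function names `build_automaton` and `search` are illustrative assumptions, not names from the thesis.

```python
from collections import deque


def build_automaton(patterns):
    """Build a classic Aho-Corasick automaton (NOT the thesis's algorithm).

    Each state keeps only its explicit transitions; every other input
    symbol is handled by following a single default (failure) pointer.
    """
    goto = [{}]     # goto[state][symbol] -> next state (explicit transitions)
    fail = [0]      # fail[state] -> default state to fall back to
    out = [set()]   # out[state] -> patterns recognised on reaching state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)

    # Phase 2: breadth-first pass to compute the default pointers.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def search(payload, automaton):
    """Return (start_index, pattern) for every pattern occurrence in payload."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(payload):
        # Follow default pointers until an explicit transition exists.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

Because most states have only a handful of explicit transitions, the table stays far smaller than a full state-by-symbol matrix, which is the memory saving the abstract attributes to default transition pointers.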

    Multi-engine packet classification hardware accelerator

    As line rates increase, the task of designing high-performance architectures with reduced power consumption for the processing of router traffic remains important. In this paper, we present a multi-engine packet classification hardware accelerator, which gives increased performance and reduced power consumption. It follows the basic idea of decision-tree based packet classification algorithms, such as HiCuts and HyperCuts, in which the hyperspace represented by the ruleset is recursively divided into smaller subspaces according to some heuristics. Each classification engine consists of a Trie Traverser, which is responsible for finding the leaf node corresponding to the incoming packet, and a Leaf Node Searcher, which reports the matching rule in the leaf node. The packet classification engine stores the decision tree data structure in the ultra-wide memory words provided by FPGA block RAM, in an attempt to reduce the number of memory accesses needed for classification. Since the clock rate of an individual engine cannot catch up to that of the internal memory, multiple classification engines are used to increase the throughput. Implementations on two different FPGAs show that this architecture can reach a search speed of 169 million packets per second (Mpps) with synthesized ACL, FW and IPC rulesets. Further analysis reveals that, compared to state-of-the-art TCAM solutions, power savings of up to 72% and an increase in throughput of up to 27% can be achieved.
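
The two-stage structure described above (traverse the tree to a leaf, then search the leaf's rules) can be sketched in software. This is a minimal sketch under stated assumptions: a HiCuts-style tree that always cuts the widest dimension into equal parts, rules given as inclusive per-field ranges in priority order, and the hypothetical names `Node`, `build` and `classify`. The paper's actual heuristics and memory layout are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    box: tuple                                    # per-field (lo, hi), inclusive
    rules: list = field(default_factory=list)     # rule indices (leaves only)
    dim: int = -1                                 # field that this node cuts
    children: list = field(default_factory=list)  # equal-width sub-boxes


def overlaps(rule, box):
    """True if the rule's hyper-rectangle intersects the box."""
    return all(r_lo <= b_hi and b_lo <= r_hi
               for (r_lo, r_hi), (b_lo, b_hi) in zip(rule, box))


def build(rules, ids, box, binth=2, cuts=4):
    """Recursively cut the widest field (assumed heuristic) into `cuts` parts.

    `box` must be a tuple of (lo, hi) tuples; `ids` lists rule indices in
    priority order. A node becomes a leaf once it holds at most `binth`
    rules or its box is too narrow to cut again.
    """
    node = Node(box=box)
    widths = [hi - lo + 1 for lo, hi in box]
    dim = max(range(len(box)), key=lambda d: widths[d])
    if len(ids) <= binth or widths[dim] < cuts:
        node.rules = ids
        return node
    node.dim = dim
    lo, hi = box[dim]
    step = widths[dim] // cuts
    for k in range(cuts):
        c_lo = lo + k * step
        c_hi = hi if k == cuts - 1 else c_lo + step - 1
        child_box = box[:dim] + ((c_lo, c_hi),) + box[dim + 1:]
        child_ids = [i for i in ids if overlaps(rules[i], child_box)]
        node.children.append(build(rules, child_ids, child_box, binth, cuts))
    return node


def classify(root, rules, packet):
    """Return the index of the highest-priority matching rule, or None."""
    node = root
    # Stage 1 ("trie traverser"): index straight into the child array.
    while node.children:
        lo, hi = node.box[node.dim]
        step = (hi - lo + 1) // len(node.children)
        k = min((packet[node.dim] - lo) // step, len(node.children) - 1)
        node = node.children[k]
    # Stage 2 ("leaf node searcher"): linear search over the few leaf rules.
    for i in node.rules:
        if all(r_lo <= v <= r_hi for (r_lo, r_hi), v in zip(rules[i], packet)):
            return i
    return None
```

A tree is built once with, for example, `build(rules, list(range(len(rules))), ((0, 255), (0, 255)))` for two 8-bit fields, after which each call to `classify` touches one node per level plus one leaf.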

    Ternary content addressable memory for longest prefix matching based on random access memory on field programmable gate array

    Conventional ternary content addressable memory (TCAM) provides access to stored data, which consists of '0', '1' and 'don't care' values, and outputs the matched address. Content lookup in TCAM can be done in a single cycle, which makes it very important in applications such as address lookup and deep packet inspection. This paper proposes an improved TCAM architecture with fast update functionality. To support longest prefix matching (LPM), LPM logic is added to the proposed TCAM. The latency of the proposed LPM logic depends on the number of matching addresses in the address prefix comparison. Adding parallel LPM logic improves the throughput by 10× compared to the design without it. Although this incurs resource overhead, the cost of throughput per bit is lower than that of the design without parallel LPM logic.
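
To make the ternary match and the LPM step concrete, the following software model stores each prefix as a value/mask pair, where the zero bits of the mask act as 'don't care' positions. It is a sketch only: the helper names `make_entry` and `tcam_lookup` are assumptions, and the sequential loop stands in for comparisons that a TCAM performs in parallel.

```python
import ipaddress


def make_entry(prefix, next_hop):
    """Encode a prefix as (value, mask, prefix_len, next_hop).

    Mask bits set to 0 are the 'don't care' positions of the ternary word.
    """
    net = ipaddress.ip_network(prefix)
    return int(net.network_address), int(net.netmask), net.prefixlen, next_hop


def tcam_lookup(table, addr):
    """Return the next hop of the longest matching prefix, or None."""
    key = int(ipaddress.ip_address(addr))
    # Match stage: every entry is compared with the key under its mask.
    matches = [entry for entry in table if (key & entry[1]) == entry[0]]
    if not matches:
        return None
    # LPM stage: among all matching entries, keep the longest prefix.
    return max(matches, key=lambda entry: entry[2])[3]
```

The second stage is where the number of simultaneous matches matters, which corresponds to the abstract's remark that the LPM logic's latency depends on how many addresses match.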

    High-Performance Packet Processing Engines Using Set-Associative Memory Architectures

    The emergence of new optical transmission technologies has led to ultra-high link speeds measured in gigabits per second (Gbps). In addition, the switch from 32-bit IPv4 addresses to 128-bit IPv6 addresses is currently progressing. Both factors make it hard for new Internet routers and firewalls to keep up with wire-speed packet processing. By packet processing we mean three applications: packet forwarding, packet classification and deep packet inspection. In packet forwarding (PF), the router has to match the incoming packet's IP address against the forwarding table. It then directs each packet to its next hop toward its final destination. A packet classification (PC) engine examines a packet header by matching it against a database of rules, or filters, to obtain the best matching rule. Rules are associated with either an "action" (e.g., firewall) or a "flow ID" (e.g., quality of service or QoS). The last application is deep packet inspection (DPI), where the firewall has to inspect the actual packet payload for malware or network attacks. In this case, the payload is scanned against a database of rules, where each rule is either a plain text string or a regular expression. In this thesis, we introduce a family of hardware solutions that combine the above requirements. These solutions rely on a set-associative memory architecture called CA-RAM (Content Addressable-Random Access Memory). CA-RAM is a hardware implementation of hash tables with the property that each bucket of a hash table can be searched in one memory cycle. However, the classic hashing downsides have to be dealt with, such as collisions that lead to overflow and a longer worst-case memory access time. The two standard solutions to the overflow problem are either to use some predefined probing (e.g., linear or quadratic) or to use multiple hash functions. We present new hash schemes that extend both aforementioned solutions to tackle the overflow problem efficiently. We show by experimenting with real IP lookup tables, synthetic packet classification rule sets and real DPI databases that our schemes outperform previously proposed schemes.
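
The bucket-per-row idea and the overflow problem can be illustrated with a small software model. This is a sketch of the standard linear-probing baseline the abstract mentions, not of the thesis's new hash schemes. The class name `SetAssociativeTable`, the `key % num_rows` hash and the absence of deletion are all simplifying assumptions.

```python
class SetAssociativeTable:
    """Hash table whose rows hold several entries, with linear probing.

    One row models one wide memory word: in hardware all of its slots
    would be compared in a single access. Deletion is not supported,
    which is what makes the early exit in lookup() safe.
    """

    def __init__(self, num_rows=8, slots_per_row=4, max_probes=3):
        self.rows = [dict() for _ in range(num_rows)]
        self.slots = slots_per_row
        self.max_probes = max_probes

    def _probe(self, key):
        # Assumed hash: integer key modulo the row count, then next rows.
        base = key % len(self.rows)
        for p in range(self.max_probes):
            yield self.rows[(base + p) % len(self.rows)]

    def insert(self, key, value):
        for row in self._probe(key):
            if key in row or len(row) < self.slots:
                row[key] = value
                return
        raise OverflowError(f"no free slot for key {key} within probe limit")

    def lookup(self, key):
        """Return (value or None, number of rows accessed)."""
        accesses = 0
        for row in self._probe(key):
            accesses += 1
            if key in row:
                return row[key], accesses
            if len(row) < self.slots:
                break  # a non-full row would have accepted the key
        return None, accesses
```

The access count returned by `lookup` makes the trade-off visible: keys that fit in their home row cost one access, while overflowed keys cost one extra access per probed row.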

    Models, Algorithms, and Architectures for Scalable Packet Classification

    The growth and diversification of the Internet impose increasing demands on the performance and functionality of network infrastructure. Routers, the devices responsible for the switching and directing of traffic in the Internet, are being called upon to not only handle increased volumes of traffic at higher speeds, but also impose tighter security policies and provide support for a richer set of network services. This dissertation addresses the searching tasks performed by Internet routers in order to forward packets and apply network services to packets belonging to defined traffic flows. As these searching tasks must be performed for each packet traversing the router, the speed and scalability of the solutions to the route lookup and packet classification problems largely determine the realizable performance of the router, and hence the Internet as a whole. Despite the energetic attention of the academic and corporate research communities, there remains a need for search engines that scale to support faster communication links, larger route tables and filter sets, and increasingly complex filters. The major contributions of this work include the design and analysis of a scalable hardware implementation of a Longest Prefix Matching (LPM) search engine for route lookup, a survey and taxonomy of packet classification techniques, a thorough analysis of packet classification filter sets, the design and analysis of a suite of performance evaluation tools for packet classification algorithms and devices, and a new packet classification algorithm that scales to support high-speed links and large filter sets classifying on additional packet fields.

    A Scalable High-Performance Memory-Less IP Address Lookup Engine Suitable for FPGA Implementation

    High-performance IP address lookup is in high demand for modern Internet routers. Many approaches in the literature describe special-purpose Address Lookup Engines (ALEs) for IP address lookup. The existing ALEs can be categorised into the following techniques: Ternary Content Addressable Memory-based (TCAM-based), trie-based and TCAM-emulation. TCAM-based techniques are expensive and consume a lot of power, since they employ TCAMs in their architecture. Trie-based techniques have nondeterministic latency and generally require external memory accesses, since they store the Forwarding Information Base (FIB) in memory using a trie data structure. TCAM-emulation techniques commonly combine TCAMs with lower-cost circuits that handle less time-critical activities. 
In this thesis, the main objective is to propose an ALE architecture with fast search that addresses the main shortcomings of TCAM-based and trie-based techniques. Achieving an admissible throughput in the proposed ALE is its fundamental requirement, given the recent improvements in network systems and the growth of the Internet of Things (IoT). To that end, hardware accelerators have been adopted to achieve high-speed search. In this work, Field Programmable Gate Arrays (FPGAs), which allow specialized reconfigurable hardware accelerators to be implemented, are chosen as the target platform for the ALE architecture. Five TCAM-emulation ALE architectures are proposed in this thesis: the Full-Serial, the Full-Parallel, the IP-Split, the IP-Split-Bucket and the Update-enabled IP-Split-Bucket architectures. Each architecture builds on the previous one with progressive improvements. The Full-Serial architecture employs memories to store the FIB and one comparator to perform a serial search on the FIB entries. The Full-Parallel architecture stores the FIB entries in the logic resources of the FPGA and employs a parallel search using one comparator for each FIB entry. The IP-Split architecture employs a level of decoders to avoid repetitive comparisons in equivalent entries of the FIB. The IP-Split-Bucket architecture is an upgraded version of the previous architecture that uses a partitioning scheme to optimize the IP-Split architecture. Finally, the Update-enabled IP-Split-Bucket supports IP address lookup with a high update rate. The most efficient proposed architecture is the IP-Split-Bucket, which is a novel high-performance memory-less ALE. For a real-world FIB with 524 k IPv4 prefixes, IP-Split-Bucket achieves a throughput of 103.4 M packets per second and consumes 23% and 22%, respectively, of the Look-Up Tables (LUTs) and Flip-Flops (FFs) of a Xilinx XC7V2000T chip.

    Power Saving Strategies and Technologies in Network Equipment Opportunities and Challenges, Risk and Rewards

    Drawing from today's best-in-class solutions, we identify power-saving strategies that have succeeded in the past and look forward to new ideas and paradigms. We strongly believe that designing energy-efficient network equipment can be compared to building sports cars: task-oriented, focused and fast. However, unlike track-bound sports cars, ultra-fast and purpose-built silicon yields better energy efficiency when compared to more generic family sedan designs that mitigate go-to-market risks by being the masters of many tasks. Thus, we demonstrate that the best opportunities for power savings come via protocol simplification, best-of-breed technology, and silicon and software optimization, to achieve the least amount of processing necessary to move packets. We also look to the future of networking from a new angle, where energy efficiency and environmental concerns are viewed as fundamental design criteria and forces that need to be harnessed to continually create more powerful networking equipment. Comment: IEEE SAINT 2008 proceedings, July 28th - Aug 1st 2008, PCFNS workshop

    On using content addressable memory for packet classification

    Packet switched networks such as the Internet require packet classification at every hop in order to apply services and security policies to traffic flows. The relentless increase in link speeds and traffic volume imposes stringent constraints on packet classification solutions. Ternary Content Addressable Memory (TCAM) devices are favored by most network component and equipment vendors due to the fast and deterministic lookup performance afforded by their use of massive parallelism. While able to keep up with high speed links, TCAMs suffer from exorbitant power consumption, poor scalability to longer search keys and larger filter sets, and inefficient support of multiple matches. The research community has responded with algorithms that seek to meet the lookup rate constraint with greater efficiency through the use of commodity Random Access Memory (RAM) technology. The most promising algorithms efficiently achieve high lookup rates by leveraging the statistical structure of real filter sets. Due to their dependence on filter set characteristics, it is difficult to provision processing and memory resources for implementations that support a wide variety of filter sets. We show how several algorithmic advances may be leveraged to improve the efficiency, scalability, incremental update and multiple match performance of CAM-based packet classification techniques without degrading the lookup performance. Our approach, Label Encoded Content Addressable Memory (LECAM), represents a hybrid technique that utilizes decomposition, label encoding, and a novel Content Addressable Memory (CAM) architecture. By reducing the number of implementation parameters, LECAM provides a vehicle to carry several of the recent algorithmic advances into practice. We provide a thorough overview of CAM technologies and packet classification algorithms, along with a detailed discussion of the scaling issues that arise with longer search keys and larger filter sets. We also provide a comparative analysis of LECAM and standard TCAM using a collection of real and synthetic filter sets of various sizes and compositions.