Search CORE

66 research outputs found

Energy Efficient Hardware Accelerators for Packet Classification and String Matching

Author: Kennedy Alan
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 21/09/2010
Field of study

This thesis focuses on the design of new algorithms and energy efficient high throughput hardware accelerators that implement packet classification and fixed string matching. These computationally heavy and memory intensive tasks are used by networking equipment to inspect all packets at wire speed. The constant growth in Internet usage has made them increasingly difficult to implement at core network line speeds. Packet classification is used to sort packets into different flows by comparing their headers to a list of rules. A flow is used to decide a packet’s priority and the manner in which it is processed. Fixed string matching is used to inspect a packet’s payload to check if it contains any strings associated with known viruses, attacks or other harmful activities. The contributions of this thesis towards the area of packet classification are hardware accelerators that allow packet classification to be implemented at core network line speeds when classifying packets using rulesets containing tens of thousands of rules. The hardware accelerators use modified versions of the HyperCuts packet classification algorithm. An adaptive clocking unit is also presented that dynamically adjusts the clock speed of a packet classification hardware accelerator so that its processing capacity matches the processing needs of the network traffic. This keeps dynamic power consumption to a minimum. Contributions made towards the area of fixed string matching include a new algorithm that builds a state machine that is used to search for strings with the aid of default transition pointers. The use of default transition pointers keep memory consumption low, allowing state machines capable of searching for thousands of strings to be small enough to fit in the on-chip memory of devices such as FPGAs. A hardware accelerator is also presented that uses these state machines to search through the payloads of packets for strings at core network line speeds

Irish Universities

DCU Online Research Access Service

Multi-engine packet classification hardware accelerator

Author: Kennedy Alan
Liu Bin
Liu Zhen
Wang Xiaojun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

As line rates increase, the task of designing high performance architectures with reduced power consumption for the processing of router traffic remains important. In this paper, we present a multi-engine packet classification hardware accelerator, which gives increased performance and reduced power consumption. It follows the basic idea of decision-tree based packet classification algorithms, such as HiCuts and HyperCuts, in which the hyperspace represented by the ruleset is recursively divided into smaller subspaces according to some heuristics. Each classification engine consists of a Trie Traverser which is responsible for finding the leaf node corresponding to the incoming packet, and a Leaf Node Searcher that reports the matching rule in the leaf node. The packet classification engine utilizes the possibility of ultra-wide memory word provided by FPGA block RAM to store the decision tree data structure, in an attempt to reduce the number of memory accesses needed for the classification. Since the clock rate of an individual engine cannot catch up to that of the internal memory, multiple classification engines are used to increase the throughput. The implementations in two different FPGAs show that this architecture can reach a searching speed of 169 million packets per second (mpps) with synthesized ACL, FW and IPC rulesets. Further analysis reveals that compared to state of the art TCAM solutions, a power savings of up to 72% and an increase in throughput of up to 27% can be achieved

Crossref

Irish Universities

DCU Online Research Access Service

Ternary content addressable memory for longest prefix matching based on random access memory on field programmable gate array

Author: Kay Ng Shao
Marsono M. N.
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/08/2019
Field of study

Conventional ternary content addressable memory (TCAM) provides access to stored data, which consists of '0', '1' and ‘don't care’, and outputs the matched address. Content lookup in TCAM can be done in a single cycle, which makes it very important in applications such as address lookup and deep-packet inspection. This paper proposes an improved TCAM architecture with fast update functionality. To support longest prefix matching (LPM), LPM logic are needed to the proposed TCAM. The latency of the proposed LPM logic is dependent on the number of matching addresses in address prefix comparison. In order to improve the throughput, parallel LPM logic is added to improve the throughput by 10× compared to the one without. Although with resource overhead, the cost of throughput per bit is less as compared to the one without parallel LPM logic

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

High-Performance Packet Processing Engines Using Set-Associative Memory Architectures

Author: Hanna Michel
Publication venue
Publication date: 28/09/2013
Field of study

The emergence of new optical transmission technologies has led to ultra-high Giga bits per second (Gbps) link speeds. In addition, the switch from 32-bit long IPv4 addresses to the 128-bit long IPv6 addresses is currently progressing. Both factors make it hard for new Internet routers and firewalls to keep up with wire-speed packet-processing. By packet-processing we mean three applications: packet forwarding, packet classification and deep packet inspection. In packet forwarding (PF), the router has to match the incoming packet's IP address against the forwarding table. It then directs each packet to its next hop toward its final destination. A packet classification (PC) engine examines a packet header by matching it against a database of rules, or filters, to obtain the best matching rule. Rules are associated with either an ``action'' (e.g., firewall) or a ``flow ID'' (e.g., quality of service or QoS). The last application is deep packet inspection (DPI) where the firewall has to inspect the actual packet payload for malware or network attacks. In this case, the payload is scanned against a database of rules, where each rule is either a plain text string or a regular expression. In this thesis, we introduce a family of hardware solutions that combine the above requirements. These solutions rely on a set-associative memory architecture that is called CA-RAM (Content Addressable-Random Access Memory). CA-RAM is a hardware implementation of hash tables with the property that each bucket of a hash table can be searched in one memory cycle. However, the classic hashing downsides have to be dealt with, such as collisions that lead to overflow and worst-case memory access time. The two standard solutions to the overflow problem are either to use some predefined probing (e.g., linear or quadratic) or to use multiple hash functions. We present new hash schemes that extend both aforementioned solutions to tackle the overflow problem efficiently. We show by experimenting with real IP lookup tables, synthetic packet classification rule sets and real DPI databases that our schemes outperform other previously proposed schemes

D-Scholarship@Pitt

Models, Algorithms, and Architectures for Scalable Packet Classification

Author: Taylor David Edward
Turner Jonathan S.
Publication venue: Washington University Open Scholarship
Publication date: 28/07/2004
Field of study

The growth and diversiﬁcation of the Internet imposes increasing demands on the performance and functionality of network infrastructure. Routers, the devices responsible for the switch-ing and directing of trafﬁc in the Internet, are being called upon to not only handle increased volumes of trafﬁc at higher speeds, but also impose tighter security policies and provide support for a richer set of network services. This dissertation addresses the searching tasks performed by Internet routers in order to forward packets and apply network services to packets belonging to deﬁned trafﬁc ﬂows. As these searching tasks must be performed for each packet traversing the router, the speed and scalability of the solutions to the route lookup and packet classiﬁcation problems largely determine the realizable performance of the router, and hence the Internet as a whole. Despite the energetic attention of the academic and corporate research communities, there remains a need for search engines that scale to support faster communication links, larger route tables and ﬁlter sets and increasingly complex ﬁlters. The major contributions of this work include the design and analysis of a scalable hardware implementation of a Longest Preﬁx Matching (LPM) search engine for route lookup, a survey and taxonomy of packet classiﬁcation techniques, a thorough analysis of packet classiﬁcation ﬁlter sets, the design and analysis of a suite of performance evaluation tools for packet classiﬁcation algorithms and devices, and a new packet classiﬁcation algorithm that scales to support high-speed links and large ﬁlter sets classifying on additional packet ﬁelds

Washington University St. Louis: Open Scholarship

A Scalable High-Performance Memory-Less IP Address Lookup Engine Suitable for FPGA Implementation

Author: Sarbishei Ideh
Publication venue
Publication date: 01/11/2016
Field of study

RÉSUMÉ La recherche d'adresse IP est une opération très importante pour les routeurs Internet modernes. De nombreuses approches dans la littérature ont été proposées pour réaliser des moteurs de recherche d'adresse IP (Address Lookup Engine – ALE), à haute performance. Les ALE existants peuvent être classés dans l’une ou l’autre de trois catégories basées sur: les mémoires ternaires adressables par le contenu (TCAM), les Trie et les émulations de TCAM. Les approches qui se basent sur des TCAM sont coûteuses et elles consomment beaucoup d'énergie. Les techniques qui exploitent les Trie ont une latence non déterministe qui nécessitent généralement des accès à une mémoire externe. Les techniques qui exploitent des émulations de TCAM combinent généralement des TCAM avec des circuits à faible coût. Dans ce mémoire, l'objectif principal est de proposer une architecture d'ALE qui permet la recherche rapide d’adresses IP et qui apporte une solution aux principales lacunes des techniques basées sur des TCAM et sur des Trie. Atteindre une vitesse de traitement suffisante dans l'ALE est un aspect important. Des accélérateurs matériels ont été adoptés pour obtenir une le résultat de recherche à haute vitesse. Le FPGA permettent la mise en œuvre d’accélérateurs matériels reconfigurables spécialisés. Cinq architectures d’ALE de type émulation de TCAM sont proposés dans ce mémoire : une sérielle, une parallèle, une architecture dite IP-Split, une variante appelée IP-Split-Bucket et une version de l’IP-Split-Bucket qui supporte les mises à jours. Chaque architecture est construite à partir de l’architecture précédente de manière progressive dans le but d’en améliorer les performances. L'architecture sérielle utilise des mémoires pour stocker la table d’adresses de transmission et un comparateur pour effectuer une recherche sérielle sur les entrées. L'architecture parallèle stocke les entrées de la table dans les ressources logiques d’un FPGA, et elle emploie une recherche parallèle en utilisant N comparateurs pour une table avec N entrées. L’architecture IP-Split emploie un niveau de décodeurs pour éviter des comparaisons répétitives dans les entrées équivalentes de la table. L'architecture IP-Split-Bucket est une version améliorée de l'architecture précédente qui utilise une méthode de partitionnement visant à optimiser l'architecture IP-Split. L’IP-Split-Bucket qui supporte les mises à jour est la dernière architecture proposée. Elle soutient la mise à jour et la recherche à haute vitesse d'adresses IP. Les résultats d’implémentations montrent que l'architecture d’ALE qui offre les meilleures performances est l’IP-Split-Bucket, qui n’a pas recours à une ou plusieurs mémoires. Pour une table d’adresses de transmission IPv4 réelle comportant 524 k préfixes, l'architecture IP-Split-Bucket atteint un débit de 103,4 M paquets par seconde et elle consomme respectivement 23% et 22% des tables de conversion (LUTs) et des bascules (FFs) sur une puce Xilinx XC7V2000T.----------ABSTRACT High-performance IP address lookup is highly demanded for modern Internet routers. Many approaches in the literature describe a special purpose Address Lookup Engines (ALE), for IP address lookup. The existing ALEs can be categorised into the following techniques: Ternary Content Addressable Memories-based (TCAM-based), trie-based and TCAM-emulation. TCAM-based techniques are expensive and consume a lot of power, since they employ TCAMs in their architecture. Trie-based techniques have nondeterministic latency and external memory accesses, since they store the Forwarding Information Base (FIB) in the memory using a trie data structure. TCAM-emulation techniques commonly combine TCAMs with lower-cost circuits that handle less time-critical activities. In this thesis, the main objective is to propose an ALE architecture with fast search that addresses the main shortcomings of TCAM-based and trie-based techniques. Achieving an admissible throughput in the proposed ALE is its fundamental requirement due to the recent improvements of network systems and growth of Internet of Things (IoTs). For that matter, hardware accelerators have been adopted to achieve a high speed search. In this work, Field Programmable Gate Arrays (FPGAs) are specialized reconfigurable hardware accelerators chosen as the target platform for the ALE architecture. Five TCAM-emulation ALE architectures are proposed in this thesis: the Full-Serial, the Full-Parallel, the IP-Split, the IP-Split-Bucket and the Update-enabled IP-Split-Bucket architectures. Each architecture builds on the previous one with progressive improvements. The Full-Serial architecture employs memories to store the FIB and one comparator to perform a serial search on the FIB entries. The Full-Parallel architecture stores the FIB entries into the logical resources of the FPGA and employs a parallel search using one comparator for each FIB entry. The IP-Split architecture employs a level of decoders to avoid repetitive comparisons in the equivalent entries of the FIB. The IP-Split-Bucket architecture is an upgraded version of the previous architecture using a partitioning scheme aiming to optimize the IP-Split architecture. Finally, the Update-enabled IP-Split-Bucket supports high-update rate IP address lookup. The most efficient proposed architecture is the IP-Split-Bucket, which is a novel high-performance memory-less ALE. For a real-world FIB with 524 k IPv4 prefixes, IP-Split-Bucket achieves a throughput of 103.4M packets per second and consumes respectively 23% and 22% of the Look Up Tables (LUTs) and Flip-Flops (FFs) of a Xilinx XC7V2000T chip

PolyPublie

Power Saving Strategies and Technologies in Network Equipment Opportunities and Challenges, Risk and Rewards

Author: Ceuppens Luc
Kharitonov Daniel
Sardella Alan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/06/2009
Field of study

Drawing from todays best-in-class solutions, we identify power-saving strategies that have succeeded in the past and look forward to new ideas and paradigms. We strongly believe that designing energy-efficient network equipment can be compared to building sports cars, task-oriented, focused and fast. However, unlike track-bound sports cars, ultra-fast and purpose-built silicon yields better energy efficiency when compared to more generic family sedan designs that mitigate go-to-market risks by being the masters of many tasks. Thus, we demonstrate that the best opportunities for power savings come via protocol simplification, best-of-breed technology, and silicon and software optimization, to achieve the least amount of processing necessary to move packets. We also look to the future of networking from a new angle, where energy efficiency and environmental concerns are viewed as fundamental design criteria and forces that need to be harnessed to continually create more powerful networking equipment.Comment: IEEE SAINT 2008 proceedings, July 28th - Aug 1st 2008, PCFNS worksho

arXiv.org e-Print Archive

Crossref

On using content addressable memory for packet classiﬁcation

Author: Spitznagel Edward W.
Taylor David E.
Publication venue: Washington University Open Scholarship
Publication date: 03/03/2005
Field of study

Packet switched networks such as the Internet require packet classiﬁcation at every hop in order to ap-ply services and security policies to trafﬁc ﬂows. The relentless increase in link speeds and trafﬁc volume imposes astringent constraints on packet classiﬁcation solutions. Ternary Content Addressable Memory (TCAM) devices are favored by most network component and equipment vendors due to the fast and de-terministic lookup performance afforded by their use of massive parallelism. While able to keep up with high speed links, TCAMs suffer from exorbitant power consumption, poor scalability to longer search keys and larger ﬁlter sets, and inefﬁcient support of multiple matches. The research community has responded with algorithms that seek to meet the lookup rate constraint with greater efﬁciency through the use of com-modity Random Access Memory (RAM) technology. The most promising algorithms efﬁciently achieve high lookup rates by leveraging the statistical structure of real ﬁlter sets. Due to their dependence on ﬁlter set characteristics, it is difﬁcult to provision processing and memory resources for implementations that support a wide variety of ﬁlter sets. We show how several algorithmic advances may be leveraged to im-prove the efﬁciency, scalability, incremental update and multiple match performance of CAM-based packet classiﬁcation techniques without degrading the lookup performance. Our approach, Label Encoded Content Addressable Memory (LECAM), represents a hybrid technique that utilizes decomposition, label encoding, and a novel Content Addressable Memory (CAM) architecture. By reducing the number of implementation parameters, LECAM provides a vehicle to carry several of the recent algorithmic advances into practice. We provide a thorough overview of CAM technologies and packet classiﬁcation algorithms, along with a detailed discussion of the scaling issues that arise with longer search keys and larger ﬁlter sets. We also provide a comparative analysis of LECAM and standard TCAM using a collection of real and synthetic ﬁlter sets of various sizes and compositions

Washington University St. Louis: Open Scholarship