79 research outputs found

    Towards Terabit Carrier Ethernet and Energy Efficient Optical Transport Networks


    High performance modified bit-vector based packet classification module on low-cost FPGA

    Packet classification plays a significant role in many network systems: incoming packets must be categorized into different flows, and specific actions must be taken on them according to functional and application requirements. As network speeds continue to increase, so does the demand placed on the packet classifier, and the classifier's complexity grows further because multiple header fields must be matched against a large number of rules. In this manuscript, an efficient, high-performance modified bit-vector (MBV) based packet classification (PC) module is designed and implemented on a low-cost Artix-7 FPGA. The proposed MBV-based PC employs a pipelined architecture, which offers low latency and high throughput. It utilizes less than 2% of the slices, operates at 493.102 MHz, and consumes 0.1 W of total power on the Artix-7 FPGA. The proposed PC takes only 4 clock cycles to classify an incoming packet and provides 74.95 Gbps of throughput. Comparative results in terms of hardware utilization and performance efficiency show that the proposed work improves on existing, similar PC approaches.
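    Bit-vector (BV) classification, of which the MBV scheme above is a refinement, looks up each header field independently to obtain a bit vector of candidate rules and then ANDs the per-field vectors to find the matching rule. The sketch below is a minimal software illustration of that general BV idea under assumed inputs; the rule table, field names, and helper functions are hypothetical and do not reproduce the authors' MBV pipeline, where the per-field lookups run in parallel hardware stages.

```python
# Minimal sketch of bit-vector (BV) packet classification.
# Each field lookup yields a bit vector with bit i set when rule i's
# range/value covers that field; ANDing the vectors gives the rules
# matching on all fields. Rule set and fields are illustrative only.

RULES = [
    # (src_port range, dst_port range, protocol)
    ((0, 1023), (80, 80), 6),        # rule 0: TCP to port 80
    ((0, 65535), (53, 53), 17),      # rule 1: UDP to port 53
    ((1024, 65535), (0, 65535), 6),  # rule 2: any TCP from high ports
]

def field_bitvector(fields, match):
    """Return a bit vector with bit i set if rule i matches this field."""
    bv = 0
    for i, f in enumerate(fields):
        if match(f):
            bv |= 1 << i
    return bv

def classify(src_port, dst_port, proto):
    """AND the per-field bit vectors; lowest set bit = highest-priority rule."""
    bv_src = field_bitvector([r[0] for r in RULES],
                             lambda rng: rng[0] <= src_port <= rng[1])
    bv_dst = field_bitvector([r[1] for r in RULES],
                             lambda rng: rng[0] <= dst_port <= rng[1])
    bv_proto = field_bitvector([r[2] for r in RULES], lambda p: p == proto)
    combined = bv_src & bv_dst & bv_proto
    if combined == 0:
        return None                                  # no rule matched
    return (combined & -combined).bit_length() - 1   # index of lowest set bit

print(classify(500, 80, 6))     # -> 0 (TCP to port 80 wins over rule 2)
print(classify(2000, 9000, 6))  # -> 2
```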

    VR-ZYCAP: A versatile resource-level ICAP controller for ZYNQ SOC

    This article belongs to the Special Issue Architecture and CAD for Field-Programmable Gate Arrays (FPGAs). Hybrid architectures that integrate a processor with an SRAM-based FPGA fabric, such as the Xilinx Zynq SoC, are increasingly used as single-chip replacements for multi-chip designs in several market segments. These devices not only provide advantages in terms of logic density, cost and integration, but also offer run-time in-field reconfiguration capabilities. However, the reconfiguration capabilities provided by current vendor tools are limited to the module level. As a result, incremental run-time configuration memory changes under traditional vendor methodologies require a lengthy compilation for off-line bitstream generation, along with storage and reconfiguration time overheads. In this paper, an internal configuration access port (ICAP) controller that provides versatile, fine-grain, resource-level incremental reconfiguration of the programmable logic (PL) resources in the Zynq SoC is presented. The proposed controller, implemented in the PL and called VR-ZyCAP, can reconfigure look-up tables (LUTs) and flip-flops (FFs). Run-time reconfiguration of FFs is achieved through a partial bitstream with the reset-after-reconfiguration (RAR) feature, which avoids unintended state corruption of other memory elements. Along with this versatility, the proposed controller improves reconfiguration time by a factor of 30 for FFs compared to state-of-the-art works and is nearly 400 times faster for LUTs than vendor-supported software approaches, while achieving competitive resource utilization compared to existing approaches. This research was funded by the Spanish Ministry of Science and Innovation under the ACHILLES project, grant number PID2019-104207RB-I00, and by the Taif University Researchers Supporting fund, grant number TURSP-2020/144, Taif University, Taif, Saudi Arabia.
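    Fine-grain reconfiguration of a single LUT or FF generally follows a read-modify-write pattern on the configuration frame that holds that resource's bits. The sketch below illustrates only that generic flow, assuming a hypothetical `IcapPort` interface; the frame addressing, word indices, and bit masks are placeholders and do not reflect the actual VR-ZyCAP design or any vendor-documented frame format.

```python
# Illustrative read-modify-write flow for fine-grain LUT reconfiguration.
# IcapPort is a hypothetical stand-in for a memory-mapped ICAP controller;
# real frame addresses and LUT bit positions are device-specific and are
# not modeled here.

class IcapPort:
    """Hypothetical wrapper around an ICAP interface."""

    def read_frame(self, frame_addr: int) -> list[int]:
        """Read one configuration frame as a list of 32-bit words."""
        raise NotImplementedError("platform-specific readback")

    def write_frame(self, frame_addr: int, words: list[int]) -> None:
        """Write one configuration frame back through ICAP."""
        raise NotImplementedError("platform-specific write")

def update_lut_bits(icap: IcapPort, frame_addr: int,
                    word_index: int, bit_mask: int, new_bits: int) -> None:
    """Change only the target LUT's bits, leaving the rest of the frame intact."""
    frame = icap.read_frame(frame_addr)                 # 1. read current frame
    word = frame[word_index]
    word = (word & ~bit_mask) | (new_bits & bit_mask)   # 2. modify target bits
    frame[word_index] = word
    icap.write_frame(frame_addr, frame)                 # 3. write the frame back
```

    For FFs, the abstract above notes that a plain write-back is not enough: the partial bitstream must carry the reset-after-reconfiguration (RAR) feature so that the state of other memory elements captured in the same frame is not corrupted.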

    A Scalable High-Performance Memory-Less IP Address Lookup Engine Suitable for FPGA Implementation

    High-performance IP address lookup is in high demand for modern Internet routers. Many approaches in the literature describe special-purpose Address Lookup Engines (ALEs) for IP address lookup. Existing ALEs can be categorised into three families of techniques: Ternary Content Addressable Memory-based (TCAM-based), trie-based and TCAM-emulation. TCAM-based techniques are expensive and consume a lot of power, since they employ TCAMs in their architecture. Trie-based techniques have nondeterministic latency and require external memory accesses, since they store the Forwarding Information Base (FIB) in memory using a trie data structure. TCAM-emulation techniques commonly combine TCAMs with lower-cost circuits that handle less time-critical activities. In this thesis, the main objective is to propose an ALE architecture with fast search that addresses the main shortcomings of TCAM-based and trie-based techniques. Achieving an admissible throughput in the proposed ALE is a fundamental requirement, given recent improvements in network systems and the growth of the Internet of Things (IoT). Hardware accelerators have therefore been adopted to achieve high-speed search; in this work, Field Programmable Gate Arrays (FPGAs), as specialized reconfigurable hardware accelerators, are chosen as the target platform for the ALE architecture. Five TCAM-emulation ALE architectures are proposed in this thesis: the Full-Serial, the Full-Parallel, the IP-Split, the IP-Split-Bucket and the Update-enabled IP-Split-Bucket architectures. Each architecture builds on the previous one with progressive improvements. The Full-Serial architecture employs memories to store the FIB and one comparator to perform a serial search on the FIB entries. The Full-Parallel architecture stores the FIB entries in the logic resources of the FPGA and employs a parallel search using one comparator per FIB entry. The IP-Split architecture employs a level of decoders to avoid repetitive comparisons among equivalent FIB entries. The IP-Split-Bucket architecture upgrades the previous architecture with a partitioning scheme that optimizes the IP-Split architecture. Finally, the Update-enabled IP-Split-Bucket supports high-update-rate IP address lookup. The most efficient proposed architecture is the IP-Split-Bucket, a novel high-performance memory-less ALE. For a real-world FIB with 524 k IPv4 prefixes, IP-Split-Bucket achieves a throughput of 103.4 M packets per second and consumes respectively 23% and 22% of the Look-Up Tables (LUTs) and Flip-Flops (FFs) of a Xilinx XC7V2000T chip.
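    The core operation all five architectures implement is longest-prefix match (LPM), and the bucket variants narrow the search by partitioning the FIB on a few leading address bits. The sketch below is a minimal software illustration of that bucketed-LPM idea under assumed parameters; the FIB contents, the 4-bit bucket index, and the sequential inner loop are illustrative only (the hardware described above compares the entries of a bucket in parallel, and prefixes shorter than the bucket index would need replication across buckets, a detail omitted here).

```python
# Minimal software sketch of bucketed longest-prefix match (LPM).
# The FIB is partitioned into buckets keyed by the top few address bits;
# within a bucket, each prefix is checked and the longest match wins.
# Prefixes, next hops, and the bucket width are illustrative only.

import ipaddress

BUCKET_BITS = 4  # partition on the top 4 bits of the IPv4 address (assumed)

def build_buckets(prefixes):
    """Group (prefix, next_hop) pairs by the top BUCKET_BITS of the prefix."""
    buckets = {}
    for net_str, next_hop in prefixes:
        net = ipaddress.ip_network(net_str)
        key = int(net.network_address) >> (32 - BUCKET_BITS)
        buckets.setdefault(key, []).append((net, next_hop))
    return buckets

def lookup(buckets, addr_str):
    """Longest-prefix match restricted to the destination address's bucket."""
    addr = ipaddress.ip_address(addr_str)
    key = int(addr) >> (32 - BUCKET_BITS)
    best = None
    for net, next_hop in buckets.get(key, []):
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None  # default route omitted for brevity

fib = [("10.0.0.0/8", "A"), ("10.1.0.0/16", "B"), ("192.168.0.0/16", "C")]
buckets = build_buckets(fib)
print(lookup(buckets, "10.1.2.3"))     # -> "B" (the /16 beats the /8)
print(lookup(buckets, "192.168.5.9"))  # -> "C"
```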