46 research outputs found

    Data Structures and Algorithms for Scalable NDN Forwarding

    Named Data Networking (NDN) is a recently proposed general-purpose network architecture that aims to address the limitations of the Internet Protocol (IP), while maintaining its strengths. NDN takes an information-centric approach, focusing on named data rather than computer addresses. In NDN, the content is identified by its name, and each NDN packet has a name that specifies the content it is fetching or delivering. Since there are no source and destination addresses in an NDN packet, it is forwarded based on a lookup of its name in the forwarding plane, which consists of the Forwarding Information Base (FIB), Pending Interest Table (PIT), and Content Store (CS). In addition, as an in-network caching element, a scalable Repository (Repo) design is needed to provide large-scale long-term content storage in NDN networks. Scalable NDN forwarding is a challenge. Compared to the well-understood approaches to IP forwarding, NDN forwarding performs lookups on packet names, which have variable and unbounded lengths, increasing the lookup complexity. The lookup tables are larger than in IP, requiring more memory space. Moreover, NDN forwarding has a read-write data plane, requiring per-packet updates at line rates. Designing and evaluating a scalable NDN forwarding node architecture is a major effort within the overall NDN research agenda. The goal of this dissertation is to demonstrate that scalable NDN forwarding is feasible with the proposed data structures and algorithms. First, we propose a FIB lookup design based on the binary search of hash tables that provides a reliable longest name prefix lookup performance baseline for future NDN research. We have demonstrated 10 Gbps forwarding throughput with 256-byte packets and one billion synthetic forwarding rules, each containing up to seven name components. Second, we explore data structures and algorithms to optimize the FIB design based on the specific characteristics of real-world forwarding datasets. 
Third, we propose a fingerprint-only PIT design that reduces the memory requirements in the core routers. Lastly, we discuss the Content Store design issues and demonstrate that the NDN Repo implementation can leverage many of the existing databases and storage systems to improve performance.
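
    To make the "binary search of hash tables" idea in the abstract above more concrete, the following is a minimal Python sketch of longest name prefix matching that keeps one hash table per prefix length (counted in name components) and binary-searches over the lengths. It is not the dissertation's design, whose details the abstract does not give. The marker entries, the precomputed best match stored in each marker, and the names `build_tables`, `lookup`, `RULES` and `TABLES` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Name = Tuple[str, ...]  # an NDN-style name as a tuple of components


def build_tables(rules: Dict[Name, str], max_len: int) -> List[Dict[Name, Optional[str]]]:
    """Build one hash table per prefix length (1..max_len components).

    tables[k][key] holds the next hop of the longest rule that is a prefix of
    (or equal to) key. Marker entries that no rule covers hold None.
    Assumes every rule has between 1 and max_len components.
    """
    tables: List[Dict[Name, Optional[str]]] = [dict() for _ in range(max_len + 1)]
    for prefix, hop in rules.items():
        tables[len(prefix)][prefix] = hop  # real forwarding entries

    for prefix in rules:
        lo, hi = 1, max_len
        while lo <= hi:  # replay the binary search that must reach this rule
            mid = (lo + hi) // 2
            if mid < len(prefix):
                marker = prefix[:mid]
                if marker not in tables[mid]:
                    # Precompute the longest real rule covering the marker so
                    # that a lookup never has to backtrack.
                    best = None
                    for k in range(mid - 1, 0, -1):
                        if marker[:k] in rules:
                            best = rules[marker[:k]]
                            break
                    tables[mid][marker] = best
                lo = mid + 1
            elif mid == len(prefix):
                break
            else:
                hi = mid - 1
    return tables


def lookup(tables: List[Dict[Name, Optional[str]]], name: Name) -> Optional[str]:
    """Return the next hop of the longest matching name prefix, or None."""
    best = None
    lo, hi = 1, len(tables) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = name[:mid]
        if len(key) == mid and key in tables[mid]:
            if tables[mid][key] is not None:
                best = tables[mid][key]
            lo = mid + 1  # a hit means a longer match may still exist
        else:
            hi = mid - 1  # a miss means no longer match exists
    return best


# Hypothetical forwarding rules, for illustration only.
RULES = {
    ("com",): "face-A",
    ("com", "example", "video"): "face-B",
    ("org", "ndn", "a", "b", "c"): "face-C",
}
TABLES = build_tables(RULES, max_len=5)
```

    With these hypothetical rules, `lookup(TABLES, ("com", "example", "video", "clip1"))` probes at most three of the five tables. In general the number of hash probes grows with the logarithm of the maximum prefix length, not with the length itself.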

    A null convention logic based platform for high speed low energy IP packet forwarding

    By 2020, it is predicted that there will be over 5 billion people and 38.5 billion Internet-of-Things devices on the Internet. The data generated by all these users and devices will have to be transported quickly and efficiently. Routers forming the backbone of this Internet already support multiple 100 Gbps ports, meaning that they would have to perform upwards of 200 million destination address lookups per second in the packet forwarding block that lies in the router ‘data-path’. At the same time, there is also a huge demand to make the network infrastructure more energy efficient. The work presented in this thesis is motivated by the observation that traditional synchronous digital systems will have increasing difficulty keeping up with these conflicting demands. Further, with reducing device geometries, extremes in “process, voltage and temperature” (PVT) variability will undermine reliable synchronous operation. It is expected that asynchronous design techniques will be able to overcome many of these problems and offer a means of lowering energy while maintaining high throughput and low latency. This thesis investigates existing address lookup algorithms and the possibility of combining various approaches to improve energy efficiency without affecting lookup performance. A quasi delay-insensitive asynchronous methodology - Null Convention Logic (NCL) - is then applied to this combined design. Techniques that take advantage of the characteristics of the design methodology and the lookup algorithm to further improve the area, energy and latency characteristics are also analysed. The IP address lookup scheme utilised here is a recent algorithmic approach that uses compact binary-tries and was selected for its high memory efficiency and throughput. The design is pipelined, and the prefix information is stored in large RAMs. A Boolean synchronous implementation of the algorithm is simulated to provide an initial performance benchmark. 
It is observed that during the address lookup process nearly 68% of the trie accesses are to nodes that contained no prefix information. Bloom filter structures that use non-cryptographic hashes and single-bit memory are introduced into the address lookup process to prevent these unnecessary accesses, thereby reducing the energy consumption. Three non-cryptographic hashing algorithms (CRC32, Jenkins and Murmur) are also analysed for their suitability in Bloom filters, and the CRC32 is found to offer the most suitable trade-off between complexity and performance. As a first step to applying the NCL design methodology, NCL implementations of the hashing algorithms are created and evaluated. A significant finding from these experiments is that, unlike Boolean systems, latency and throughput in NCL systems are only loosely coupled. An example Jenkins hash implementation with eight pipeline stages and a cycle time of 3.2 ns exhibits a total latency of 6 ns, whereas an equivalent synchronous implementation with a similar clock period exhibits a latency of 25.6 ns. Further investigations reveal that completion detection circuits within the NCL pipelines impair throughput significantly. Two enhancements to the NCL circuit library aimed particularly at optimising NCL completion detection are proposed and analysed. These are shown to enable completion detection circuits to be built with the same delay but with 30% smaller area and about 75% lower peak current compared to the conventional approach using gates from the standard NCL library. An NCL SRAM structure is also proposed to augment the conventional 6-T cell array with circuits to generate the handshaking signals for managing the NCL data flow. Additionally, a dedicated column of cells called the Null-storage column is added, which indicates if a particular address in the RAM stores no Data, i.e., it is in its Null state. 
This additional hardware imposes a small area overhead of about 10% but allows accesses to Null locations to be completed in 50% less time and consume 40% less energy than accesses to valid Data locations. An experimental NCL-based address lookup system is then designed that includes all of the developed NCL modules. Statistical delay models derived from circuit-level simulations of individual modules are used to emulate realistic circuit delay variability in the behavioural modules written in Verilog. Simulations of the assembled system demonstrate that, unlike what was observed with the synchronous design, the NCL design that does not employ Bloom filters, but only the Null-storage column RAMs for prefix storage, exhibits the smallest area on the chip and also consumes the least energy per address lookup. It is concluded that, to derive maximum benefit from an asynchronous design approach, it is necessary to carefully select the architectural blocks that combine the peculiarities of the implemented algorithm with the capabilities of the NCL design methodology.

    Design and Evaluation of Packet Classification Systems, Doctoral Dissertation, December 2006

    Although many algorithms and architectures have been proposed, the design of efficient packet classification systems remains a challenging problem. The diversity of filter specifications, the scale of filter sets, and the throughput requirements of high speed networks all contribute to the difficulty. We need to review the algorithms from a high-level point of view in order to advance the study. This level of understanding can lead to significant performance improvements. In this dissertation, we evaluate several existing algorithms and present several new algorithms as well. The previous evaluation results for existing algorithms are not convincing because they have not been done in a consistent way. To resolve this issue, an objective evaluation platform needs to be developed. We implement and evaluate several representative algorithms with uniform criteria. The source code and the evaluation results are both published on a website to provide the research community with a benchmark for impartial and thorough algorithm evaluations. We propose several new algorithms to deal with the different variations of the packet classification problem. They are: (1) the Shape Shifting Trie algorithm for longest prefix matching, used in IP lookups or as a building block for general packet classification algorithms; (2) the Fast Hash Table lookup algorithm used for exact flow match; (3) the longest prefix matching algorithm using hash tables and tries, used in IP lookups or packet classification algorithms; (4) the 2D coarse-grained tuple-space search algorithm with controlled filter expansion, used for two-dimensional packet classification or as a building block for general packet classification algorithms; (5) the Adaptive Binary Cutting algorithm used for general multi-dimensional packet classification. In addition to the algorithmic solutions, we also consider the TCAM hardware solution. 
In particular, we address the TCAM filter update problem for general packet classification and provide an efficient algorithm. Building upon the previous work, these algorithms significantly improve the performance of packet classification systems and set a solid foundation for further study.
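
    As background for item (4) above, here is a minimal Python sketch of plain two-dimensional tuple-space search. It does not implement the dissertation's coarse-grained variant or its controlled filter expansion. Filters are grouped by their (source prefix length, destination prefix length) tuple, each group is an exact-match hash table, and a packet is classified by probing every group with suitably masked addresses. The class name `TupleSpace`, the rule for breaking ties by lowest priority number, and the example filters are assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict


def top_bits(addr: int, prefix_len: int) -> int:
    """Keep only the leading prefix_len bits of a 32-bit IPv4 address."""
    return addr >> (32 - prefix_len)


class TupleSpace:
    def __init__(self):
        # (src_len, dst_len) -> {(src_bits, dst_bits): (priority, action)}
        self.tuples = defaultdict(dict)

    def add(self, src: str, dst: str, priority: int, action: str) -> None:
        s = ipaddress.ip_network(src)
        d = ipaddress.ip_network(dst)
        key = (top_bits(int(s.network_address), s.prefixlen),
               top_bits(int(d.network_address), d.prefixlen))
        table = self.tuples[(s.prefixlen, d.prefixlen)]
        # Assumption: a lower priority number wins when filters collide.
        if key not in table or priority < table[key][0]:
            table[key] = (priority, action)

    def classify(self, src_ip: str, dst_ip: str):
        s = int(ipaddress.ip_address(src_ip))
        d = int(ipaddress.ip_address(dst_ip))
        best = None
        # One exact-match hash probe per distinct length tuple.
        for (src_len, dst_len), table in self.tuples.items():
            hit = table.get((top_bits(s, src_len), top_bits(d, dst_len)))
            if hit is not None and (best is None or hit[0] < best[0]):
                best = hit
        return best[1] if best else None


# Hypothetical filter set, for illustration only.
classifier = TupleSpace()
classifier.add("10.0.0.0/8", "0.0.0.0/0", priority=2, action="permit")
classifier.add("10.1.0.0/16", "192.168.1.0/24", priority=1, action="deny")
```

    The cost of a lookup is proportional to the number of distinct length tuples, not the number of filters, which is why merging tuples into coarser groups is an attractive direction.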

    Algorithms and Architectures for Network Search Processors

    The continuous growth in the Internet’s size, the amount of data traffic, and the complexity of processing this traffic gives rise to new challenges in building high-performance network devices. One of the most fundamental tasks performed by these devices is searching the network data for predefined keys. Address lookup, packet classification, and deep packet inspection are some of the operations which involve table lookups and searching. These operations are typically part of the packet forwarding mechanism, and can create a performance bottleneck. Therefore, fast and resource-efficient algorithms are required. One of the most commonly used techniques for such searching operations is the Ternary Content Addressable Memory (TCAM). While TCAM can offer very fast search speeds, it is costly and consumes a large amount of power. Hence, designing cost-effective, power-efficient, and high-speed search techniques has received a great deal of attention in the research and industrial community. In this thesis, we propose a generic search technique based on Bloom filters. A Bloom filter is a randomized data structure used to represent a set of bit-strings compactly and support set membership queries. We demonstrate techniques to convert the search process into table lookups. The resulting table data structures are kept in the off-chip memory and their Bloom filter representations are kept in the on-chip memory. An item needs to be looked up in the off-chip table only when it is found in the on-chip Bloom filters. By filtering the off-chip memory accesses in this fashion, the search operations can be significantly accelerated. Our approach involves a unique combination of algorithmic and architectural techniques that outperform some of the current techniques in terms of cost-effectiveness, speed, and power-efficiency.
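
    The filtering step described above can be sketched in a few lines of Python. This is an illustrative model, not the thesis's architecture. The Bloom filter stands in for on-chip memory, a dictionary stands in for the off-chip table, and a counter records how often the slow table is actually consulted. The class names, the filter size, the number of hash functions, and the use of SHA-256 with double hashing are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions from one digest via double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class FilteredTable:
    """Exact-match table whose lookups are screened by a Bloom filter."""

    def __init__(self):
        self.on_chip = BloomFilter()   # small, fast membership summary
        self.off_chip = {}             # stand-in for the large, slow table
        self.off_chip_accesses = 0

    def insert(self, key: bytes, value: str) -> None:
        self.on_chip.add(key)
        self.off_chip[key] = value

    def lookup(self, key: bytes):
        if not self.on_chip.might_contain(key):
            return None                # filtered out without touching the table
        self.off_chip_accesses += 1
        return self.off_chip.get(key)  # resolves any false positive
```

    Because a Bloom filter never produces false negatives, every stored key is still found. A false positive costs only one wasted table access and never yields a wrong answer.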

    Mémoires associatives algorithmiques pour l'opération de recherche du plus long préfixe sur FPGA

    SUMMARY Field Programmable Gate Arrays (FPGAs) are ubiquitous in data centers, where they accelerate indexing and machine learning tasks and, more recently, networking operations. In this thesis, we focus on the Longest Prefix Match (LPM) operation on FPGAs. This operation is used either to route packets or as a building block in a programmable data plane. Although the LPM operation is essential in a network, it suffers from inefficiency on FPGAs. In this thesis, we demonstrate that the performance of the LPM operation on FPGAs can be substantially improved using an algorithmic approach, in which the LPM operation is implemented with a data structure. Moreover, the results presented make it possible to reflect on a broader question: should the FPGA architecture be specialized for networking applications? First, for IPv6 routing in the Internet, we present SHIP. This solution exploits the characteristics of the prefixes to build a compact data structure that can be implemented efficiently on FPGAs. SHIP uses a "divide and conquer" approach to separate the prefixes into groups of low cardinality with similar characteristics. The prefixes in each group are then encoded in a hybrid data structure, in which the prefix encoding is adapted to their characteristics. On FPGAs, SHIP increases the efficiency of the LPM operation compared to the state of the art, while supporting a throughput greater than 100 Gb/s. Second, we present how to implement the LPM operation efficiently for a programmable data plane on FPGAs. In this case, unlike packet routing, no prior knowledge of the prefixes can be used. Consequently, we present a framework comprising a data structure that is efficient regardless of the characteristics of the prefixes it holds, and methods for implementing the data structure efficiently on FPGAs. A B-tree, extended for the LPM operation, is used because of its low algorithmic complexity. We present a method to allocate, at compile time, the minimum resources required by the B-tree to encode a set of prefixes, regardless of their characteristics. Several methods are then presented to increase the memory efficiency after the data structure is implemented on FPGAs. Evaluated on several scenarios, this solution can process more than 100 Gb/s while improving performance relative to the state of the art.
    ABSTRACT FPGAs are becoming ubiquitous in data centers. First introduced to accelerate indexing services and machine learning tasks, FPGAs are now also used to accelerate networking operations, including the LPM operation. This operation is used for packet routing and as a building block in programmable data planes. However, for the two use cases considered, the LPM operation is inefficiently implemented in FPGAs. In this thesis, we demonstrate that the performance of the LPM operation can be significantly improved using an algorithmic approach, where the LPM operation is implemented using a data structure. In addition, using the results presented in this thesis, we can answer a broader question: Should the FPGA architecture be specialized for networking? First, we present the SHIP data structure that is tailored to routing IPv6 packets in the Internet network. SHIP exploits the prefix characteristics to build a compact data structure that can be efficiently mapped to FPGAs. First, SHIP uses a "divide and conquer" approach to bin prefixes in groups with a small cardinality and sharing similar characteristics. Second, a hybrid-trie-tree data structure is used to encode the prefixes held in each group. The hybrid data structure adapts the prefix encoding method to their characteristics. Then, we demonstrate that SHIP can be efficiently implemented in FPGAs. Implemented on FPGAs, the proposed solution improves the memory efficiency over the state-of-the-art solutions, while supporting a packet throughput greater than 100 Gbps. While the prefixes and their characteristics are known when routing packets in the Internet network, this is not true for programmable data planes. Hence, the second solution, designed for programmable data planes, does not exploit any prior knowledge of the prefixes stored. We present a framework comprising an efficient data structure to encode the prefixes and methods to map the data structure efficiently to FPGAs. First, the framework leverages a B-tree, extended to support the LPM operation, for its low algorithmic complexity. Second, we present a method to allocate at compile time the minimum amount of resources that can be used by the B-tree. Third, our framework selects the B-tree parameters to increase the post-implementation memory efficiency and generates the corresponding hardware architecture. Implemented on FPGAs, this solution supports packet throughput greater than 100 Gbps, while improving the performance over the state of the art.

    Reconfigurable Data Planes for Scalable Network Virtualization

    Network virtualization presents a powerful approach to share physical network infrastructure among multiple virtual networks. Recent advances in network virtualization advocate the use of field-programmable gate arrays (FPGAs) as flexible, high-performance alternatives to conventional host virtualization techniques. However, the limited on-chip logic and memory resources in FPGAs severely restrict the scalability of the virtualization platform and necessitate the implementation of efficient forwarding structures in hardware. The research described in this manuscript explores the implementation of a scalable heterogeneous network virtualization platform which integrates virtual data planes implemented in FPGAs with software data planes created using host virtualization techniques. The system exploits data plane heterogeneity to cater to the dynamic service requirements of virtual networks by migrating networks between software and hardware data planes. We demonstrate data plane migration as an effective technique to limit the impact of traffic on unmodified data planes during FPGA reconfiguration. Our system implements forwarding tables in a shared fashion using inexpensive off-chip memories and supports both Internet Protocol (IP) and non-IP based data planes. Experimental results show that FPGA-based data planes can offer two orders of magnitude better throughput than their software counterparts and FPGA reconfiguration can facilitate data plane customization within 12 seconds. An integrated system that supports up to 15 virtual networks has been validated on the NetFPGA platform.

    Models, Algorithms, and Architectures for Scalable Packet Classification

    The growth and diversification of the Internet impose increasing demands on the performance and functionality of network infrastructure. Routers, the devices responsible for the switching and directing of traffic in the Internet, are being called upon to not only handle increased volumes of traffic at higher speeds, but also impose tighter security policies and provide support for a richer set of network services. This dissertation addresses the searching tasks performed by Internet routers in order to forward packets and apply network services to packets belonging to defined traffic flows. As these searching tasks must be performed for each packet traversing the router, the speed and scalability of the solutions to the route lookup and packet classification problems largely determine the realizable performance of the router, and hence the Internet as a whole. Despite the energetic attention of the academic and corporate research communities, there remains a need for search engines that scale to support faster communication links, larger route tables and filter sets and increasingly complex filters. The major contributions of this work include the design and analysis of a scalable hardware implementation of a Longest Prefix Matching (LPM) search engine for route lookup, a survey and taxonomy of packet classification techniques, a thorough analysis of packet classification filter sets, the design and analysis of a suite of performance evaluation tools for packet classification algorithms and devices, and a new packet classification algorithm that scales to support high-speed links and large filter sets classifying on additional packet fields.

    Towards Terabit Carrier Ethernet and Energy Efficient Optical Transport Networks

    Fast Packet Processing on High Performance Architectures

    The rapid growth of the Internet and the fast emergence of new network applications have brought great challenges and complex issues in deploying high-speed, QoS-guaranteed IP networks. For this reason, packet classification and network intrusion detection have assumed a key role in modern communication networks in order to provide QoS and security. In this thesis we describe a number of the most advanced solutions to these tasks. We introduce NetFPGA and Network Processors as reference platforms both for the design and the implementation of the solutions and algorithms described in this thesis. The rise in link capacity reduces the time available to network devices for packet processing. For this reason, we show different solutions which, either by heuristics and randomization or by smart construction of state machines, allow IP lookup, packet classification and deep packet inspection to be fast in real devices based on high-speed platforms such as NetFPGA or Network Processors.