189 research outputs found

    ADVANCED HASHING SCHEMES FOR PACKETFORWARDING USING SET ASSOCIATIVEMEMORY ARCHITECTURES

    Get PDF
    Building a high performance IP packet forwarding (PF) engine remains a challenge due to increasingly stringent throughput requirements and the growing sizes of IP forwarding tables.The router has to match the incoming packet's IP address against the forwarding table.The matching process has to be done in wire speed which is why scalability and low power consumption are features that PF engines must maintain.It is common for PF engines to use hash tables; however, the classic hashing downsides have to be dealt with (e.g., collisions, worst case memory access time, ... etc.).While open addressing hash tables, in general, provide good average case search performance, their memory utilization and worst case performance can degrade quickly due to collisions that leads to bucket overflows.Set associative memory can be used for hardware implementations of hash tables with the property that each bucket of a hash table can be searched in one memory cycle.Hence, PF engine architectures based on associative memory will outperform those based on the conventional Ternary Content Addressable Memory (TCAM) in terms of power and scalability.The two standard solutions to the overflow problem are either to use some sort of predefined probing (e.g., linear or quadratic) or to use multiple hash functions.This work presents two new hash schemes that extend both aforementioned solutions to tackle the overflow problem efficiently.The first scheme is a hash probing scheme that is called Content-based HAsh Probing, or CHAP.CHAP is a probing scheme that is based on the content of the hash table to avoid the classical side effects of predefined hash probing methods (i.e., primary and secondary clustering phenomena) and at the same time reduces the overflow.The second scheme, called Progressive Hashing, or PH, is a general multiple hash scheme that reduces the overflow as well.PH splits the prefixes into groups where each group is assigned one hash function, then reuse some hash functions in a progressive fashion to reduce the overflow.We show by experimenting with real IP lookup tables that both schemes outperform other hashing schemes

    High-Performance Packet Processing Engines Using Set-Associative Memory Architectures

    Get PDF
    The emergence of new optical transmission technologies has led to ultra-high Giga bits per second (Gbps) link speeds. In addition, the switch from 32-bit long IPv4 addresses to the 128-bit long IPv6 addresses is currently progressing. Both factors make it hard for new Internet routers and firewalls to keep up with wire-speed packet-processing. By packet-processing we mean three applications: packet forwarding, packet classification and deep packet inspection. In packet forwarding (PF), the router has to match the incoming packet's IP address against the forwarding table. It then directs each packet to its next hop toward its final destination. A packet classification (PC) engine examines a packet header by matching it against a database of rules, or filters, to obtain the best matching rule. Rules are associated with either an ``action'' (e.g., firewall) or a ``flow ID'' (e.g., quality of service or QoS). The last application is deep packet inspection (DPI) where the firewall has to inspect the actual packet payload for malware or network attacks. In this case, the payload is scanned against a database of rules, where each rule is either a plain text string or a regular expression. In this thesis, we introduce a family of hardware solutions that combine the above requirements. These solutions rely on a set-associative memory architecture that is called CA-RAM (Content Addressable-Random Access Memory). CA-RAM is a hardware implementation of hash tables with the property that each bucket of a hash table can be searched in one memory cycle. However, the classic hashing downsides have to be dealt with, such as collisions that lead to overflow and worst-case memory access time. The two standard solutions to the overflow problem are either to use some predefined probing (e.g., linear or quadratic) or to use multiple hash functions. We present new hash schemes that extend both aforementioned solutions to tackle the overflow problem efficiently. We show by experimenting with real IP lookup tables, synthetic packet classification rule sets and real DPI databases that our schemes outperform other previously proposed schemes

    CHAP : Enabling efficient hardware-based multiple hash schemes for IP lookup

    Get PDF
    Building a high performance IP lookup engine remains a challenge due to increasingly stringent throughput requirements and the growing size of IP tables. An emerging approach for IP lookup is the use of set associative memory architecture, which is basically a hardware implementation of an open addressing hash table with the property that each row of the hash table can be searched in one memory cycle. While open addressing hash tables, in general, provide good average-case search performance, their memory utilization and worst-case performance can degrade quickly due to bucket overflows. This paper presents a new simple hash probing scheme called CHAP (Content-based HAsh Probing) that tackles the hash overflow problem. In CHAP, the probing is based on the content of the hash table, thus avoiding the classical side effects of probing. We show through experimenting with real IP tables how CHAP can effectively deal with the overflow. © IFIP International Federation for Information Processing 2009

    Branch Prediction For Network Processors

    Get PDF
    Originally designed to favour flexibility over packet processing performance, the future of the programmable network processor is challenged by the need to meet both increasing line rate as well as providing additional processing capabilities. To meet these requirements, trends within networking research has tended to focus on techniques such as offloading computation intensive tasks to dedicated hardware logic or through increased parallelism. While parallelism retains flexibility, challenges such as load-balancing limit its scope. On the other hand, hardware offloading allows complex algorithms to be implemented at high speed but sacrifice flexibility. To this end, the work in this thesis is focused on a more fundamental aspect of a network processor, the data-plane processing engine. Performing both system modelling and analysis of packet processing functions; the goal of this thesis is to identify and extract salient information regarding the performance of multi-processor workloads. Following on from a traditional software based analysis of programme workloads, we develop a method of modelling and analysing hardware accelerators when applied to network processors. Using this quantitative information, this thesis proposes an architecture which allows deeply pipelined micro-architectures to be implemented on the data-plane while reducing the branch penalty associated with these architectures

    Shared memory with hidden latency on a family of mesh-like networks

    Get PDF

    Algorithms and Architectures for Network Search Processors

    Get PDF
    The continuous growth in the Internet’s size, the amount of data traffic, and the complexity of processing this traffic gives rise to new challenges in building high-performance network devices. One of the most fundamental tasks performed by these devices is searching the network data for predefined keys. Address lookup, packet classification, and deep packet inspection are some of the operations which involve table lookups and searching. These operations are typically part of the packet forwarding mechanism, and can create a performance bottleneck. Therefore, fast and resource efficient algorithms are required. One of the most commonly used techniques for such searching operations is the Ternary Content Addressable Memory (TCAM). While TCAM can offer very fast search speeds, it is costly and consumes a large amount of power. Hence, designing cost-effective, power-efficient, and high-speed search techniques has received a great deal of attention in the research and industrial community. In this thesis, we propose a generic search technique based on Bloom filters. A Bloom filter is a randomized data structure used to represent a set of bit-strings compactly and support set membership queries. We demonstrate techniques to convert the search process into table lookups. The resulting table data structures are kept in the off-chip memory and their Bloom filter representations are kept in the on-chip memory. An item needs to be looked up in the off-chip table only when it is found in the on-chip Bloom filters. By filtering the off-chip memory accesses in this fashion, the search operations can be significantly accelerated. Our approach involves a unique combination of algorithmic and architectural techniques that outperform some of the current techniques in terms of cost-effectiveness, speed, and power-efficiency

    Software-Driven and Virtualized Architectures for Scalable 5G Networks

    Full text link
    In this dissertation, we argue that it is essential to rearchitect 4G cellular core networks–sitting between the Internet and the radio access network–to meet the scalability, performance, and flexibility requirements of 5G networks. Today, there is a growing consensus among operators and research community that software-defined networking (SDN), network function virtualization (NFV), and mobile edge computing (MEC) paradigms will be the key ingredients of the next-generation cellular networks. Motivated by these trends, we design and optimize three core network architectures, SoftMoW, SoftBox, and SkyCore, for different network scales, objectives, and conditions. SoftMoW provides global control over nationwide core networks with the ultimate goal of enabling new routing and mobility optimizations. SoftBox attempts to enhance policy enforcement in statewide core networks to enable low-latency, signaling-efficient, and customized services for mobile devices. Sky- Core is aimed at realizing a compact core network for citywide UAV-based radio networks that are going to serve first responders in the future. Network slicing techniques make it possible to deploy these solutions on the same infrastructure in parallel. To better support mobility and provide verifiable security, these architectures can use an addressing scheme that separates network locations and identities with self-certifying, flat and non-aggregatable address components. To benefit the proposed architectures, we designed a high-speed and memory-efficient router, called Caesar, for this type of addressing schemePHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/146130/1/moradi_1.pd

    Fully Programming the Data Plane: A Hardware/Software Approach

    Get PDF
    Les réseaux définis par logiciel — en anglais Software-Defined Networking (SDN) — sont apparus ces dernières années comme un nouveau paradigme de réseau. SDN introduit une séparation entre les plans de gestion, de contrôle et de données, permettant à ceux-ci d’évoluer de manière indépendante, rompant ainsi avec la rigidité des réseaux traditionnels. En particulier, dans le plan de données, les avancées récentes ont porté sur la définition des langages de traitement de paquets, tel que P4, et sur la définition d’architectures de commutateurs programmables, par exemple la Protocol Independent Switch Architecture (PISA). Dans cette thèse, nous nous intéressons a l’architecture PISA et évaluons comment exploiter les FPGA comme plateforme de traitement efficace de paquets. Cette problématique est étudiée a trois niveaux d’abstraction : microarchitectural, programmation et architectural. Au niveau microarchitectural, nous avons proposé une architecture efficace d’un analyseur d’entêtes de paquets pour PISA. L’analyseur de paquets utilise une architecture pipelinée avec propagation en avant — en anglais feed-forward. La complexité de l’architecture est réduite par rapport à l’état de l’art grâce a l’utilisation d’optimisations algorithmiques. Finalement, l’architecture est générée par un compilateur P4 vers C++, combiné à un outil de synthèse de haut niveau. La solution proposée atteint un débit de 100 Gb/s avec une latence comparable à celle d’analyseurs d’entêtes de paquets écrits à la main. Au niveau de la programmation, nous avons proposé une nouvelle méthodologie de conception de synthèse de haut niveau visant à améliorer conjointement la qualité logicielle et matérielle. Nous exploitons les fonctionnalités du C++ moderne pour améliorer à la fois la modularité et la lisibilité du code, tout en conservant (ou améliorant) les résultats du matériel généré. Des exemples de conception utilisant notre méthodologie, incluant pour l’analyseur d’entête de paquets, ont été rendus publics.----------ABSTRACT: Software-Defined Networking (SDN) has emerged in recent years as a new network paradigm to de-ossify communication networks. Indeed, by offering a clear separation of network concerns between the management, control, and data planes, SDN allows each of these planes to evolve independently, breaking the rigidity of traditional networks. However, while well spread in the control and management planes, this de-ossification has only recently reached the data plane with the advent of packet processing languages, e.g. P4, and novel programmable switch architectures, e.g. Protocol Independent Switch Architecture (PISA). In this work, we focus on leveraging the PISA architecture by mainly exploiting the FPGA capabilities for efficient packet processing. In this way, we address this issue at different abstraction levels: i) microarchitectural; ii) programming; and, iii) architectural. At the microarchitectural level, we have proposed an efficient FPGA-based packet parser architecture, which is a major PISA’s component. The proposed packet parser follows a feedforward pipeline architecture in which the internal microarchitectural has been meticulously optimized for FPGA implementation. The architecture is automatically generated by a P4- to-C++ compiler after several rounds of graph optimizations. The proposed solution achieves 100 Gb/s line rate with latency comparable to hand-written packet parsers. The throughput scales from 10 Gb/s to 160 Gb/s with moderate increase in resource consumption. Both the compiler and the packet parser codebase have been open-sourced to permit reproducibility. At the programming level, we have proposed a novel High-Level Synthesis (HLS) design methodology aiming at improving software and hardware quality. We have employed this novel methodology when designing the packet parser. In our work, we have exploited features of modern C++ that improves at the same time code modularity and readability while keeping (or improving) the results of the generated hardware. Design examples using our methodology have been publicly released
    corecore