10 research outputs found

    Efficient register renaming and recovery for high-performance processors

    © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Modern superscalar processors implement register renaming using either random access memory (RAM) or content-addressable memory (CAM) tables. The design of these structures should address both access time and misprediction recovery penalty. Although direct-mapped RAMs provide faster access times, CAMs are more appropriate to avoid recovery penalties. The presence of associative ports in CAMs, however, prevents them from scaling with the number of physical registers and pipeline width, negatively impacting performance, area, and energy consumption at the rename stage. In this paper, we present a new hybrid RAM-CAM register renaming scheme, which combines the best of both approaches. In a steady state, a RAM provides fast and energy-efficient access to register mappings. On misspeculation, a low-complexity CAM enables immediate recovery. Experimental results show that in a four-way state-of-the-art superscalar processor, the new approach provides almost the same performance as an ideal CAM-based renaming scheme, while dissipating only between 17% and 26% of the original energy and, in some cases, consuming less energy than purely RAM-based renaming schemes. Overall, the silicon area required to implement the hybrid RAM-CAM scheme does not exceed the area required by conventional renaming mechanisms.
    This work was supported in part by the Spanish MINECO under Grant TIN2012-38341-C04-01.
    Petit Martí, SV.; Ubal Tena, R.; Sahuquillo Borrás, J.; López Rodríguez, PJ. (2014). Efficient register renaming and recovery for high-performance processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 22(7):1506-1514. https://doi.org/10.1109/TVLSI.2013.2270001
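    The division of labour described in this abstract (a RAM table for steady-state lookups, a CAM-style structure for recovery) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's circuit: the class name `HybridRenamer`, the valid-bit checkpoint, and the rebuild-on-recovery step are all hypothetical, and commit/freeing of registers is not modelled.

```python
class HybridRenamer:
    """Toy model: RAM map table for lookups, CAM-style valid bits for recovery.

    Hypothetical illustration only; not the scheme from the paper.
    """

    def __init__(self, n_logical, n_physical):
        # RAM view: indexed by logical register, holds a physical register.
        self.ram = list(range(n_logical))
        # CAM view: one entry per physical register (logical tag + valid bit).
        self.cam_tag = [None] * n_physical
        self.cam_valid = [False] * n_physical
        for logical in range(n_logical):
            self.cam_tag[logical] = logical
            self.cam_valid[logical] = True
        self.free = list(range(n_logical, n_physical))

    def lookup(self, logical):
        # Steady state: a direct-mapped RAM read, no associative search.
        return self.ram[logical]

    def rename_dest(self, logical):
        # Allocate a new physical register for a destination operand.
        new = self.free.pop(0)
        old = self.ram[logical]
        self.cam_valid[old] = False      # old mapping is no longer current
        self.cam_tag[new] = logical
        self.cam_valid[new] = True
        self.ram[logical] = new
        return new

    def checkpoint(self):
        # Taken at a predicted branch: only valid bits and the free list.
        return list(self.cam_valid), list(self.free)

    def recover(self, ckpt):
        # Misprediction: restore the valid bits, then rebuild the RAM view.
        valid, free = ckpt
        self.cam_valid = list(valid)
        self.free = list(free)
        for phys, (tag, is_valid) in enumerate(zip(self.cam_tag, self.cam_valid)):
            if is_valid:
                self.ram[tag] = phys


renamer = HybridRenamer(n_logical=4, n_physical=8)
ckpt = renamer.checkpoint()
renamer.rename_dest(2)      # speculative rename of logical register 2
renamer.recover(ckpt)       # branch was mispredicted
```

    In this toy model the checkpoint is one bit per physical register plus the free list, whereas a purely RAM-based model would copy the whole map table at every checkpoint.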

    A Ternary Unification Framework for optimizing TCAM-based packet classification systems


    High-Performance Packet Processing Engines Using Set-Associative Memory Architectures

    The emergence of new optical transmission technologies has led to link speeds of many gigabits per second (Gbps). In addition, the switch from 32-bit IPv4 addresses to 128-bit IPv6 addresses is currently progressing. Both factors make it hard for new Internet routers and firewalls to keep up with wire-speed packet processing. By packet processing we mean three applications: packet forwarding, packet classification, and deep packet inspection. In packet forwarding (PF), the router has to match the incoming packet's IP address against the forwarding table. It then directs each packet to its next hop toward its final destination. A packet classification (PC) engine examines a packet header by matching it against a database of rules, or filters, to obtain the best matching rule. Rules are associated with either an "action" (e.g., firewall) or a "flow ID" (e.g., quality of service or QoS). The last application is deep packet inspection (DPI), where the firewall has to inspect the actual packet payload for malware or network attacks. In this case, the payload is scanned against a database of rules, where each rule is either a plain text string or a regular expression. In this thesis, we introduce a family of hardware solutions that address the above requirements. These solutions rely on a set-associative memory architecture called CA-RAM (Content Addressable-Random Access Memory). CA-RAM is a hardware implementation of hash tables with the property that each bucket of a hash table can be searched in one memory cycle. However, the classic downsides of hashing have to be dealt with, such as collisions that lead to overflow and a poor worst-case memory access time. The two standard solutions to the overflow problem are either to use some predefined probing (e.g., linear or quadratic) or to use multiple hash functions. We present new hash schemes that extend both aforementioned solutions to tackle the overflow problem efficiently.
We show by experimenting with real IP lookup tables, synthetic packet classification rule sets, and real DPI databases that our schemes outperform other previously proposed schemes.

    Performance Analysis of TCAMs in Switches

    The Catalyst 6500 is a modern commercial switch, capable of processing millions of packets per second through the use of specialized hardware. One of the main hardware components aiding the switch in performing its task is the Ternary Content Addressable Memory (TCAM). TCAMs update themselves with data relevant to routing and switching based on the traffic flowing through the switch. This enables the switch to forward, at very high speed, future packets destined for a location that has already been discovered. The problem is that TCAMs have a limited size, and once they reach their capacity, the switch has to rely on software to perform the switching and routing, a much slower process than hardware switching that uses the TCAM. A framework has been developed to analyze the switch's performance once the TCAM has reached its capacity, as well as to measure the penalty associated with a cache miss. This thesis concludes with some recommendations and future work.

    Software-Driven and Virtualized Architectures for Scalable 5G Networks

    In this dissertation, we argue that it is essential to rearchitect 4G cellular core networks, sitting between the Internet and the radio access network, to meet the scalability, performance, and flexibility requirements of 5G networks. Today, there is a growing consensus among operators and the research community that software-defined networking (SDN), network function virtualization (NFV), and mobile edge computing (MEC) paradigms will be the key ingredients of next-generation cellular networks. Motivated by these trends, we design and optimize three core network architectures, SoftMoW, SoftBox, and SkyCore, for different network scales, objectives, and conditions. SoftMoW provides global control over nationwide core networks with the ultimate goal of enabling new routing and mobility optimizations. SoftBox attempts to enhance policy enforcement in statewide core networks to enable low-latency, signaling-efficient, and customized services for mobile devices. SkyCore is aimed at realizing a compact core network for citywide UAV-based radio networks that are going to serve first responders in the future. Network slicing techniques make it possible to deploy these solutions on the same infrastructure in parallel. To better support mobility and provide verifiable security, these architectures can use an addressing scheme that separates network locations and identities with self-certifying, flat, and non-aggregatable address components. To benefit the proposed architectures, we designed a high-speed and memory-efficient router, called Caesar, for this type of addressing scheme.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/146130/1/moradi_1.pd

    Representation of Classification Functions by Head-Tail Expressions

    Kyushu Institute of Technology doctoral dissertation. Degree number: 情工博甲第291号. Date conferred: March 25, 2014.
    Contents: 1 Introduction; 2 Preliminary; 3 Generating Prefix Sum-of-Products Expressions for Interval Functions; 4 Derivation of Head-Tail Expressions for Interval Functions; 5 Head-Tail Expressions for Single-Field Classification Functions; 6 Head-Tail Expressions for Multi-Field Classification Functions; 7 Conclusion and Future Work; Acknowledgements; List of Publications.
    Packet classification is used in various network applications such as firewalls, access control lists, and network address translators. This technology uses ternary content addressable memories (TCAMs) to perform high-speed packet forwarding. However, TCAMs dissipate high power and their cost is high. Thus, reducing TCAM size is crucial. First, this thesis derives the prefix sum-of-products expression (PreSOP) and the number of products in a PreSOP for an interval function. It then derives Ψ(n, τ_p), the number of n-variable interval functions that can be represented with τ_p products, and shows that more than 99.9% of the n-variable interval functions can be represented with ⌈(3/2)n − 1⌉ products when n is sufficiently large. These results are useful for a fast PreSOP generator and for estimating the size of TCAMs for packet classification. Second, this thesis shows a method to represent interval functions by using head-tail expressions. Head-tail expressions represent greater-than functions GT(n : A), less-than functions LT(n : B), and interval functions IN0(n : A,B) more efficiently than sum-of-products expressions, where n denotes the number of bits needed to represent the largest value in the interval (A,B). The thesis proves that a head-tail expression (HT) represents an interval function with at most n words in a TCAM realization. It also shows the average numbers of factors needed to represent interval functions by HTs for up to n = 16, which were obtained by computer simulation.
It also conjectures that, for sufficiently large n, the average number of factors needed to represent n-variable interval functions by HTs is at most (2/3)n − 5/9. Experimental results also show that, for n ≥ 10, HTs require on average at least 20% fewer factors than MSOPs to represent interval functions. Third, this thesis presents a method to generate head-tail expressions for single-field classification functions. It introduces a fast prefix sum-of-products (PreSOP) generator (FP), which generates products using the bit patterns of the endpoints. Next, it shows a direct head-tail expression generator (DHT). Experimental results show that DHT generates a much smaller TCAM than FP. The proposed algorithm is useful as a simplified TCAM generator for packet classification. Finally, this thesis shows two methods to simplify rules in TCAMs for packet classification. The first method partitions the rules into groups so that each group has the same source address, destination address, and protocol. It then simplifies the rules in each group by removing redundant rules. A computer program was developed to simplify rules among groups. Experimental results show that this method reduces the size of the rules to 57% of the original specification for ACL5 rules, 73% for ACL3 rules, and 87% for rules overall. This algorithm is useful for reducing TCAMs for packet classification. The second method reduces the number of words in a TCAM for multi-field classification functions by using head-tail expressions. The thesis presents MFHT, an O(r^2) algorithm to generate simplified TCAMs for two-field classification functions, where r is the number of rules. Experimental results show that MFHT achieves a 58% reduction of words for random rules and a 52% reduction of words for ACL and FW rules. Moreover, MFHT is fast. The methods are useful for simplifying TCAMs for packet classification.
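    To make the cost of storing an interval in a TCAM concrete, the sketch below splits a closed range of n-bit values into ternary prefix words, one TCAM word per prefix. It is the conventional range-to-prefix expansion, assumed here for illustration; it is not the thesis's FP or DHT generator, and the function name and the closed-interval convention are hypothetical choices.

```python
def range_to_prefixes(lo, hi, n):
    """Cover the closed range [lo, hi] of n-bit values with ternary prefixes.

    Each returned string has n symbols: leading fixed bits followed by '*'
    wildcards. Conventional prefix expansion, shown for illustration only.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at `lo` ...
        size = (lo & -lo) if lo else (1 << n)
        # ... shrunk until it no longer overshoots `hi`.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1          # number of wildcard bits
        fixed = format(lo >> wild, "b").zfill(n - wild) if wild < n else ""
        prefixes.append(fixed + "*" * wild)
        lo += size
    return prefixes


words = range_to_prefixes(1, 14, 4)
# words == ['0001', '001*', '01**', '10**', '110*', '1110']
```

    The range [1, 14] on 4 bits needs six words, which is the known worst case of 2n − 2 prefixes for this kind of expansion. Expansions of this size are what motivate more compact representations such as the head-tail expressions studied in the thesis.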

    Power and Memory Efficient Hashing Schemes for Some Network Applications

    Hash tables (HTs) are used to implement various lookup schemes, and they need to be efficient in terms of speed, space utilization, and power consumption. For IP lookup, hashing schemes are attractive due to their deterministic O(1) lookup performance and low power consumption, in contrast to TCAM-based and trie-based approaches. As the size of IP lookup tables grows exponentially, scalable lookup performance is highly desirable. For next-generation high-speed routers, this is a vital requirement, because IP lookup remains in the critical data path and demands predictable throughput. However, recently proposed hash schemes, such as the Bloomier filter HT and the Fast HT (FHT), suffer from a number of flaws, including setup failures, update overheads, duplicate keys, and pointer overheads. In this dissertation, four novel hashing schemes and their architectures are proposed to address the above concerns by using pipelined Bloom filters and a Fingerprint filter, which are designed for memory-efficient approximate matching. For IP lookups, two new hash schemes, a Hierarchically Indexed Hash Table (HIHT) and a Fingerprint-based Hash Table (FPHT), are introduced so that a perfect match is assured without pointer overhead. Further, two hash mechanisms are proposed to provide memory-efficient and power-efficient lookup for packet processing applications. Among the four proposed schemes, the HIHT and FPHT schemes are evaluated for their performance and compared with TCAM-based and trie-based IP lookup schemes. Various sizes of IP lookup tables are considered to demonstrate scalability in terms of speed, memory use, and power consumption. While an FPHT uses less memory than an HIHT, an FPHT-based IP lookup scheme reduces power consumption by a factor of 51 and requires 1.8 times the memory compared to TCAM-based and trie-based IP lookup schemes, respectively.
In this dissertation, a multi-tiered packet classifier is also proposed that consumes up to 3.2 times less power than the existing parallel packet classifier. Hashing schemes intrinsically lack high throughput, unlike partitioned Ternary Content Addressable Memory (TCAM)-based schemes, which are capable of parallel lookups despite their large power consumption. To address this, a hybrid CAM (HCAM) architecture is introduced. Simulation results indicate that HCAM achieves the same throughput as contemporary schemes while using 2.8 times less memory and 3.6 times less power.
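    For readers unfamiliar with hash-based IP lookup, the sketch below shows the textbook baseline: one hash table per prefix length, probed from the longest length to the shortest. It is an assumption-laden illustration, not the HIHT, FPHT, or HCAM designs from the dissertation; the class name `HashLpm` is hypothetical and only IPv4 is handled.

```python
import ipaddress


class HashLpm:
    """Longest-prefix match with one hash table per prefix length (IPv4).

    Textbook baseline for illustration; not a scheme from the dissertation.
    """

    def __init__(self):
        # prefix length -> {network address as int: next hop}
        self.tables = {}

    def add(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        table = self.tables.setdefault(net.prefixlen, {})
        table[int(net.network_address)] = next_hop

    def lookup(self, addr):
        value = int(ipaddress.ip_address(addr))
        # Probe the longest prefixes first so the first hit is the best match.
        for length in sorted(self.tables, reverse=True):
            mask = ((1 << length) - 1) << (32 - length)
            hop = self.tables[length].get(value & mask)
            if hop is not None:
                return hop
        return None


fib = HashLpm()
fib.add("10.0.0.0/8", "A")
fib.add("10.1.0.0/16", "B")
best = fib.lookup("10.1.2.3")   # matches both routes; the /16 wins
```

    Each probe is an O(1) hash access, but a lookup may touch one table per distinct prefix length. Filters such as the Bloom filters mentioned in the abstract are commonly placed in front of such tables to skip lengths that cannot match.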

    Implementation of packet processing functions in high capacity internet routers.

    The Internet is one of the most important foundations of modern society. It takes part in all aspects of everyday life: business, social, entertainment, education, and so on. The Internet achieved global success thanks to its robustness and its ability to interconnect various technologies into a single network. Routers enable the Internet's global connectivity and thus represent the foundation of its architecture. As routers are the main building blocks of the Internet, their performance and capabilities have a great impact on the quality of Internet operation. The number of Internet users grows continuously. New applications and services that demand high throughput are constantly being developed, and as a consequence higher-capacity links are installed. Internet traffic therefore grows continuously, so Internet routers are increasingly loaded, especially in the Internet core, where traffic is most intensive. Internet routers must be constantly improved to support high-speed processing of large amounts of traffic. QoS mechanisms and increasingly popular multicast traffic create additional difficulties for packet processing in routers. Many researchers and scientists work on improving router functionality and on developing new solutions and algorithms that enable more efficient router operation.
However, the main problem in developing new solutions and improving existing functions is the closed architecture of commercial routers, so the developed solutions are typically tested in isolation, without complete integration with the rest of the router functions. Such testing is incomplete because it does not give full insight into how a new solution performs in a real environment. To avoid these problems, a research team led by Dr. Aleksandra Smiljanić started development of an Internet router prototype in the project "System integration of the Internet router", supported by the Serbian Ministry of Science and Technological Development. The main goal of the project was the development of a commercial router. Another very important goal was to provide an open platform for researchers and students that could be used for education, to study the internal structure and architecture of routers, and for research in which new solutions could be developed and tested in a real environment. Internet routers contain two planes: the data plane and the control plane. The data plane is implemented in hardware and is responsible for fast IP packet processing. The control plane is implemented in software and is responsible for communication with the router's environment (neighbor routers, administrators, etc.). In this PhD thesis, IP packet processors are developed and implemented. IP packet processors represent the most important part of the data plane.