861 research outputs found
Branch Prediction For Network Processors
Originally designed to favour flexibility over packet processing performance, the future of the programmable network processor is challenged by the need to meet both increasing line rate as well as providing additional processing capabilities. To meet these requirements, trends within networking research has tended to focus on techniques such as offloading computation intensive tasks to dedicated hardware logic or through increased parallelism. While parallelism retains flexibility, challenges such as load-balancing limit its scope. On the other hand, hardware offloading allows complex algorithms to be implemented at high speed but sacrifice flexibility. To this end, the work in this thesis is focused on a more fundamental aspect of a network processor, the data-plane processing engine.
Performing both system modelling and analysis of packet processing functions; the goal of this thesis is to identify and extract salient information regarding the performance of multi-processor workloads. Following on from a traditional software based analysis of programme workloads, we develop a method of modelling and analysing hardware accelerators when applied to network processors. Using this quantitative information, this thesis proposes an architecture which allows deeply pipelined micro-architectures to be implemented on the data-plane while reducing the branch penalty associated with these architectures
HeteroSketch: coordinating network-wide monitoring in heterogeneous and dynamic networks
CNS-2107086 - National Science Foundation; CNS-2106946 - National Science FoundationPublished versio
Models, Algorithms, and Architectures for Scalable Packet Classification
The growth and diversification of the Internet imposes increasing demands on the performance and functionality of network infrastructure. Routers, the devices responsible for the switch-ing and directing of traffic in the Internet, are being called upon to not only handle increased volumes of traffic at higher speeds, but also impose tighter security policies and provide support for a richer set of network services. This dissertation addresses the searching tasks performed by Internet routers in order to forward packets and apply network services to packets belonging to defined traffic flows. As these searching tasks must be performed for each packet traversing the router, the speed and scalability of the solutions to the route lookup and packet classification problems largely determine the realizable performance of the router, and hence the Internet as a whole. Despite the energetic attention of the academic and corporate research communities, there remains a need for search engines that scale to support faster communication links, larger route tables and filter sets and increasingly complex filters. The major contributions of this work include the design and analysis of a scalable hardware implementation of a Longest Prefix Matching (LPM) search engine for route lookup, a survey and taxonomy of packet classification techniques, a thorough analysis of packet classification filter sets, the design and analysis of a suite of performance evaluation tools for packet classification algorithms and devices, and a new packet classification algorithm that scales to support high-speed links and large filter sets classifying on additional packet fields
High-Performance Packet Processing Engines Using Set-Associative Memory Architectures
The emergence of new optical transmission technologies has led to ultra-high Giga bits per second (Gbps) link speeds.
In addition, the switch from 32-bit long IPv4 addresses to the 128-bit long IPv6 addresses is currently progressing.
Both factors make it hard for new Internet routers and firewalls to keep up with wire-speed packet-processing.
By packet-processing we mean three applications: packet forwarding, packet classification and deep packet inspection.
In packet forwarding (PF), the router has to match the incoming packet's IP address against the forwarding table.
It then directs each packet to its next hop toward its final destination.
A packet classification (PC) engine examines a packet header by matching it against a database of rules, or filters, to obtain the best matching rule.
Rules are associated with either an ``action'' (e.g., firewall) or a ``flow ID'' (e.g., quality of service or QoS).
The last application is deep packet inspection (DPI) where the firewall has to inspect the actual packet payload for malware or network attacks.
In this case, the payload is scanned against a database of rules, where each rule is either a plain text string or a regular expression.
In this thesis, we introduce a family of hardware solutions that combine the above requirements.
These solutions rely on a set-associative memory architecture that is called CA-RAM (Content Addressable-Random Access Memory).
CA-RAM is a hardware implementation of hash tables with the property that each bucket of a hash table can be searched in one memory cycle.
However, the classic hashing downsides have to be dealt with, such as collisions that lead to overflow and worst-case memory access time.
The two standard solutions to the overflow problem are either to use some predefined probing (e.g., linear or quadratic) or to use multiple hash functions.
We present new hash schemes that extend both aforementioned solutions to tackle the overflow problem efficiently.
We show by experimenting with real IP lookup tables, synthetic packet classification rule sets and real DPI databases that our schemes outperform other previously proposed schemes
Recommended from our members
Design and Implementation of a High Performance Network Processor with Dynamic Workload Management
Internet plays a crucial part in today\u27s world. Be it personal communication, business transactions or social networking, internet is used everywhere and hence the speed of the communication infrastructure plays an important role. As the number of users increase the network usage increases i.e., the network data rates ramped up from a few Mb/s to Gb/s in less than a decade. Hence the network infrastructure needed a major upgrade to be able to support such high data rates. Technological advancements have enabled the communication links like optical fibres to support these high bandwidths, but the processing speed at the nodes remained constant. This created a need for specialised devices for packet processing in order to match the increasing line rates which led to emergence of network processors. Network processors were both programmable and flexible. To support the growing number of internet applications, a single core network processor has transformed into a multi/many core network processor with multiple cores on a single chip rather than just one core. This improved the packet processing speeds and hence the performance of a network node. Multi-core network processors catered to the needs of a high bandwidth networks by exploiting the inherent packet-level parallelism in a network. But these processors still had intrinsic challenges like load balancing. In order to maximise throughput of these multi-core network processors, it is important to distribute the traffic evenly across all the cores. This thesis describes a multi-core network processor with dynamic workload management. A multi-core network processor, which performs multiple applications is designed to act as a test bed for an effective workload management algorithm. An effective workload management algorithm is designed in order to distribute the workload evenly across all the available cores and hence maximise the performance of the network processor. Runtime statistics of all the cores were collected and updated at run time to aid in deciding the application to be performed on a core to to enable even distribution of workload among the cores. Hence, when an overloading of a core is detected, the applications to be performed on the cores are re-assigned. For testing purposes, we built a flexible and a reusable platform on NetFPGA 10G board which uses a FPGA-based approach to prototyping network devices. The performance of the designed workload management algorithm is tested by measuring the throughput of the system for varying workloads
Design of a Scalable Path Service for the Internet
Despite the world-changing success of the Internet, shortcomings in its routing and forwarding system have become increasingly apparent. One symptom is an escalating tension between users and providers over the control of routing and forwarding of packets: providers understandably want to control use of their infrastructure, and users understandably want paths with sufficient quality-of-service (QoS) to improve the performance of their applications. As a result, users resort to various “hacks” such as sending traffic through intermediate end-systems, and the providers fight back with mechanisms to inspect and block such traffic.
To enable users and providers to jointly control routing and forwarding policies, recent research has considered various architectural approaches in which provider- level route determination occurs separately from forwarding. With this separation, provider-level path computation and selection can be provided as a centralized service: users (or their applications) send path queries to a path service to obtain provider- level paths that meet their application-specific QoS requirements. At the same time, providers can control the use of their infrastructure by dictating how packets are forwarded across their network. The separation of routing and forwarding offers many advantages, but also brings a number of challenges such as scalability. In particular, the path service must respond to path queries in a timely manner and periodically collect topology information containing load-dependent (i.e., performance) routing information.
We present a new design for a path service that makes use of expensive pre- computations, parallel on-demand computations on performance information, and caching of recently computed paths to achieve scalability. We demonstrate that, us- ing commodity hardware with a modest amount of resources, the path service can respond to path queries with acceptable latency under a realistic workload. The ser- vice can scale to arbitrarily large topologies through parallelism. Finally, we describe how to utilize the path service in the current Internet with existing Internet applica- tions
- …