604 research outputs found

    Models, Algorithms, and Architectures for Scalable Packet Classification

    The growth and diversification of the Internet imposes increasing demands on the performance and functionality of network infrastructure. Routers, the devices responsible for the switching and directing of traffic in the Internet, are being called upon not only to handle increased volumes of traffic at higher speeds, but also to impose tighter security policies and provide support for a richer set of network services. This dissertation addresses the searching tasks performed by Internet routers in order to forward packets and apply network services to packets belonging to defined traffic flows. As these searching tasks must be performed for each packet traversing the router, the speed and scalability of the solutions to the route lookup and packet classification problems largely determine the realizable performance of the router, and hence of the Internet as a whole. Despite the energetic attention of the academic and corporate research communities, there remains a need for search engines that scale to support faster communication links, larger route tables and filter sets, and increasingly complex filters. The major contributions of this work include the design and analysis of a scalable hardware implementation of a Longest Prefix Matching (LPM) search engine for route lookup, a survey and taxonomy of packet classification techniques, a thorough analysis of packet classification filter sets, the design and analysis of a suite of performance evaluation tools for packet classification algorithms and devices, and a new packet classification algorithm that scales to support high-speed links and large filter sets classifying on additional packet fields.
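
    As a concrete illustration of the route-lookup problem the abstract refers to, the following Python sketch performs Longest Prefix Matching over a simple binary trie. It is a generic software illustration only, not the dissertation's hardware design; the prefixes and next-hop labels are invented for the example.

        # Minimal software sketch of Longest Prefix Matching (LPM) over a binary
        # trie, illustrating the lookup that a route-lookup engine must perform.
        # The prefixes and next-hop labels below are invented for the example.

        class TrieNode:
            def __init__(self):
                self.children = [None, None]   # 0-branch and 1-branch
                self.next_hop = None           # set only where a stored prefix ends

        def insert(root, prefix, length, next_hop):
            """Install an IPv4 prefix (given as an int) of the given bit length."""
            node = root
            for i in range(length):
                bit = (prefix >> (31 - i)) & 1
                if node.children[bit] is None:
                    node.children[bit] = TrieNode()
                node = node.children[bit]
            node.next_hop = next_hop

        def lookup(root, addr):
            """Walk the trie bit by bit, remembering the longest prefix seen so far."""
            node, best = root, None
            for i in range(32):
                if node.next_hop is not None:
                    best = node.next_hop
                bit = (addr >> (31 - i)) & 1
                node = node.children[bit]
                if node is None:
                    return best
            return node.next_hop if node.next_hop is not None else best

        def ip(s):
            a, b, c, d = (int(x) for x in s.split("."))
            return (a << 24) | (b << 16) | (c << 8) | d

        root = TrieNode()
        insert(root, ip("10.0.0.0"), 8, "port1")
        insert(root, ip("10.1.0.0"), 16, "port2")
        print(lookup(root, ip("10.1.2.3")))   # -> port2: the longer prefix wins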

    On using content addressable memory for packet classification

    Packet switched networks such as the Internet require packet classification at every hop in order to apply services and security policies to traffic flows. The relentless increase in link speeds and traffic volume imposes stringent constraints on packet classification solutions. Ternary Content Addressable Memory (TCAM) devices are favored by most network component and equipment vendors due to the fast and deterministic lookup performance afforded by their use of massive parallelism. While able to keep up with high speed links, TCAMs suffer from exorbitant power consumption, poor scalability to longer search keys and larger filter sets, and inefficient support of multiple matches. The research community has responded with algorithms that seek to meet the lookup rate constraint with greater efficiency through the use of commodity Random Access Memory (RAM) technology. The most promising algorithms efficiently achieve high lookup rates by leveraging the statistical structure of real filter sets. Due to their dependence on filter set characteristics, it is difficult to provision processing and memory resources for implementations that support a wide variety of filter sets. We show how several algorithmic advances may be leveraged to improve the efficiency, scalability, incremental update and multiple match performance of CAM-based packet classification techniques without degrading lookup performance. Our approach, Label Encoded Content Addressable Memory (LECAM), represents a hybrid technique that utilizes decomposition, label encoding, and a novel Content Addressable Memory (CAM) architecture. By reducing the number of implementation parameters, LECAM provides a vehicle to carry several of the recent algorithmic advances into practice. We provide a thorough overview of CAM technologies and packet classification algorithms, along with a detailed discussion of the scaling issues that arise with longer search keys and larger filter sets. We also provide a comparative analysis of LECAM and standard TCAM using a collection of real and synthetic filter sets of various sizes and compositions.
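
    The ternary matching behaviour the abstract contrasts with RAM-based algorithms can be sketched in a few lines of Python. The sketch below mimics TCAM value/mask matching and shows the difference between first-match and multi-match lookups; it is not the LECAM architecture itself, and the rules are invented.

        # Sketch of ternary (value/mask) matching as performed by a TCAM. A filter
        # matches when the key agrees with the stored value on every bit the mask
        # cares about. A real TCAM evaluates all entries in parallel; this loop is
        # purely illustrative, and the example rules are invented.

        class TernaryEntry:
            def __init__(self, value, care_mask, action):
                self.value = value          # bit pattern to match
                self.care_mask = care_mask  # 1 = compare this bit, 0 = wildcard
                self.action = action

            def matches(self, key):
                return (key & self.care_mask) == (self.value & self.care_mask)

        def first_match(entries, key):
            """TCAM-style priority lookup: the lowest-index matching entry wins."""
            for e in entries:
                if e.matches(key):
                    return e.action
            return None

        def all_matches(entries, key):
            """Multi-match lookup, which plain TCAMs support only inefficiently."""
            return [e.action for e in entries if e.matches(key)]

        # 8-bit toy keys: rule 0 matches 1010xxxx, rule 1 matches anything.
        rules = [
            TernaryEntry(0b10100000, 0b11110000, "deny"),
            TernaryEntry(0b00000000, 0b00000000, "permit"),
        ]
        print(first_match(rules, 0b10101111))   # -> deny
        print(all_matches(rules, 0b10101111))   # -> ['deny', 'permit']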

    JA-trie: Entropy-based packet classification

    Any improvement in packet classification performance is crucial to ensure Internet functions continue to track the ever-increasing link capacities. Packet classification is the foundation of many Internet functions: from fundamental packet forwarding to advanced features such as Quality of Service enforcement, monitoring and security functions. This work proposes a novel trie-based classification algorithm, named Jump-Ahead Trie (JA-trie), utilizing an entropy-based pre-processing phase and a novel approach to wildcard matching. Through extensive experimental tests, we demonstrate that our proposed algorithm is able to outperform a range of state-of-the-art classification algorithms. This work was jointly supported by the EPSRC INTERNET Project EP/H040536/1, by the National Science Foundation under Grant No. CNS-0855268, and by the MIUR project GreenNet (FIRB 2010). This is the accepted manuscript. The final version is available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6900878
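
    The abstract does not spell out how JA-trie's entropy-based pre-processing works, so the Python sketch below only illustrates the general idea such a phase suggests: scoring bit positions of a filter-set field by Shannon entropy so the most discriminating bits can be inspected first. The field values and the ranking heuristic are assumptions made purely for illustration.

        # Generic sketch of an entropy-based pre-processing pass: score each bit
        # position of a filter-set field by its Shannon entropy and inspect the
        # most informative bits first. JA-trie's actual heuristic is not described
        # in the abstract; the data below is invented.

        import math

        def bit_entropy(values, bit, width=8):
            """Shannon entropy (in bits) of one bit position across the value set."""
            ones = sum((v >> (width - 1 - bit)) & 1 for v in values)
            p = ones / len(values)
            if p in (0.0, 1.0):
                return 0.0
            return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

        def rank_bits(values, width=8):
            """Order bit positions from most to least discriminating."""
            scores = [(bit_entropy(values, b, width), b) for b in range(width)]
            return [b for _, b in sorted(scores, reverse=True)]

        # Toy field values: the high-order bits are nearly constant, so the
        # ranking prefers the low-order, high-entropy bits.
        fields = [0b10000001, 0b10000110, 0b10001011, 0b10001100]
        print(rank_bits(fields))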

    Branch Prediction For Network Processors

    Originally designed to favour flexibility over packet processing performance, the programmable network processor now faces the challenge of meeting both increasing line rates and demands for additional processing capabilities. To meet these requirements, trends within networking research have tended to focus on techniques such as offloading computation-intensive tasks to dedicated hardware logic or increasing parallelism. While parallelism retains flexibility, challenges such as load balancing limit its scope. Hardware offloading, on the other hand, allows complex algorithms to be implemented at high speed but sacrifices flexibility. To this end, the work in this thesis is focused on a more fundamental aspect of the network processor: the data-plane processing engine. Performing both system modelling and analysis of packet processing functions, the goal of this thesis is to identify and extract salient information regarding the performance of multi-processor workloads. Following on from a traditional software-based analysis of programme workloads, we develop a method of modelling and analysing hardware accelerators when applied to network processors. Using this quantitative information, this thesis proposes an architecture which allows deeply pipelined micro-architectures to be implemented on the data-plane while reducing the branch penalty associated with these architectures.
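
    The abstract does not describe the proposed architecture's predictor itself, so the Python sketch below shows only the textbook two-bit saturating-counter scheme, as a reminder of how branch prediction reduces the penalty of deep pipelines. The table size and indexing are arbitrary choices made for the sketch.

        # Textbook two-bit saturating-counter branch predictor, shown only to
        # illustrate the branch-penalty problem the thesis targets; it is not the
        # thesis's architecture. Table size and PC hashing are arbitrary here.

        class TwoBitPredictor:
            STRONG_NT, WEAK_NT, WEAK_T, STRONG_T = 0, 1, 2, 3

            def __init__(self, entries=1024):
                self.entries = entries
                self.table = [self.WEAK_NT] * entries

            def _index(self, pc):
                return pc % self.entries

            def predict(self, pc):
                """Return True if the branch at this PC is predicted taken."""
                return self.table[self._index(pc)] >= self.WEAK_T

            def update(self, pc, taken):
                """Move the counter one step toward the actual outcome."""
                i = self._index(pc)
                if taken:
                    self.table[i] = min(self.table[i] + 1, self.STRONG_T)
                else:
                    self.table[i] = max(self.table[i] - 1, self.STRONG_NT)

        bp = TwoBitPredictor()
        outcomes = [True, True, False, True, True]   # a mostly-taken loop branch
        hits = 0
        for taken in outcomes:
            hits += bp.predict(pc=0x40) == taken
            bp.update(pc=0x40, taken=taken)
        print(f"{hits}/{len(outcomes)} correct")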

    Packet Filtering Module For PFQ Packet Capturing Engine.

    The evolution of commodity hardware is pushing parallelism forward as the key factor that can allow software to attain hardware-class performance while still retaining its advantages. On one side, commodity CPUs are providing more and more cores (the next-generation Intel Xeon E7500 CPUs will soon make 10-core processors a commodity product), with a complex cache hierarchy which makes cache-aware data placement crucial to good performance. On the other side, server NICs are adapting to these new trends by increasing their own level of parallelism. While traditional 1Gbps NICs exchanged data with the CPU through a single ring of shared memory buffers, modern 10Gbps cards support multiple queues: multiple cores can therefore receive and transmit packets in parallel. In particular, incoming packets can be de-multiplexed across CPUs based on a hash function (the so-called RSS technology) or on the MAC address (the VMDq technology, designed for servers hosting multiple virtual machines). The Linux kernel has recently begun to support these new technologies. Though there is a lot of network monitoring software, most of it has not yet been designed with high parallelism in mind. Therefore a novel packet capturing engine, named PFQ, was designed, which allows efficient capturing and in-kernel aggregation, as well as connection-aware load balancing. Such an engine is based on a novel lockless queue and allows parallel packet capturing, letting the user-space application arbitrarily define its degree of parallelism. Therefore, both legacy applications and natively parallel ones can benefit from such a capturing engine. In addition, PFQ outperforms its competitors both in terms of captured packets and CPU consumption. In this thesis, a new packet filtering block is designed, implemented, and added to the existing PFQ capture engine; it drops unnecessary packets before they are copied into kernel space, thus improving the overall performance of the engine considerably. Because network monitors often want only a small subset of network traffic, a dramatic performance gain is realized by filtering out unwanted packets in interrupt context.
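
    PFQ's filtering block runs inside the Linux kernel and is written in C; the Python sketch below only illustrates the principle the abstract describes, namely testing each packet against the monitor's filter before paying the cost of copying it onward. The packet fields and the filter predicate are invented for the example.

        # Illustration of the principle behind PFQ's filtering block: test each
        # packet against the monitor's filter *before* paying the cost of copying
        # it onward. The real block runs in the Linux kernel in C; this user-space
        # sketch and its field names are purely illustrative.

        from dataclasses import dataclass

        @dataclass
        class Packet:
            src_ip: str
            dst_ip: str
            proto: str
            dst_port: int
            payload: bytes

        def wants(pkt, proto="tcp", ports=frozenset({80, 443})):
            """Keep only the small subset of traffic the monitor cares about."""
            return pkt.proto == proto and pkt.dst_port in ports

        def capture(packets, filter_fn):
            kept = []
            for pkt in packets:
                if not filter_fn(pkt):
                    continue            # dropped early: no copy, no queueing cost
                kept.append(pkt)        # stands in for the copy into the capture queue
            return kept

        traffic = [
            Packet("10.0.0.1", "10.0.0.2", "tcp", 443, b"..."),
            Packet("10.0.0.3", "10.0.0.2", "udp", 53,  b"..."),
        ]
        print(len(capture(traffic, wants)))   # -> 1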

    GPU Accelerated protocol analysis for large and long-term traffic traces

    This thesis describes the design and implementation of GPF+, a complete general packet classification system developed using Nvidia CUDA for Compute Capability 3.5+ GPUs. This system was developed with the aim of accelerating the analysis of arbitrary network protocols within network traffic traces using inexpensive, massively parallel commodity hardware. GPF+ and its supporting components are specifically intended to support the processing of large, long-term network packet traces such as those produced by network telescopes, which are currently difficult and time consuming to analyse. The GPF+ classifier is based on prior research in the field, which produced a prototype classifier called GPF, targeted at Compute Capability 1.3 GPUs. GPF+ greatly extends the GPF model, improving runtime flexibility and scalability, whilst maintaining high execution efficiency. GPF+ incorporates a compact, lightweight register-based state machine that supports massively-parallel, multi-match filter predicate evaluation, as well as efficient arbitrary field extraction. GPF+ tracks packet composition during execution, and adjusts processing at runtime to avoid redundant memory transactions and unnecessary computation through warp-voting. GPF+ additionally incorporates a 128-bit in-thread cache, accelerated through register shuffling, to accelerate access to packet data in slow GPU global memory. GPF+ uses a high-level DSL to simplify protocol and filter creation, whilst better facilitating protocol reuse. The system is supported by a pipeline of multi-threaded high-performance host components, which communicate asynchronously through 0MQ messaging middleware to buffer, index, and dispatch packet data on the host system. The system was evaluated using high-end Kepler (Nvidia GTX Titan) and entry-level Maxwell (Nvidia GTX 750) GPUs. The results of this evaluation showed high system performance, limited only by device-side IO (600 MBps) in all tests. GPF+ maintained high occupancy and device utilisation in all tests, without significant serialisation, and showed improved scaling to more complex filter sets. Results were used to visualise captures of up to 160 GB in seconds, and to extract and pre-filter captures small enough to be easily analysed in applications such as Wireshark.
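
    GPF+ itself expresses filters in a DSL and evaluates them massively in parallel on the GPU; the plain-Python sketch below only illustrates the two per-packet operations the abstract mentions, arbitrary field extraction by offset and width, and multi-match predicate evaluation. The header offsets assume a minimal IPv4/TCP layout without options, and the filter set is invented for the example.

        # Plain-Python illustration of two operations a classifier such as GPF+
        # performs per packet: extract arbitrary header fields by offset/width,
        # and evaluate a set of filter predicates in multi-match fashion. The
        # offsets assume a minimal IPv4+TCP layout with no options; the filters
        # are invented for the example.

        def extract(packet, byte_offset, width):
            """Read a big-endian field of `width` bytes at `byte_offset`."""
            return int.from_bytes(packet[byte_offset:byte_offset + width], "big")

        # (name, byte_offset, width, expected_value) -- a toy filter set.
        FILTERS = [
            ("is_ipv4",  0, 1, 0x45),   # version 4, header length 5 words
            ("is_tcp",   9, 1, 0x06),   # IPv4 protocol field
            ("dst_http", 22, 2, 80),    # TCP destination port
        ]

        def classify(packet):
            """Multi-match evaluation: return every filter the packet satisfies."""
            return [name for name, off, width, want in FILTERS
                    if extract(packet, off, width) == want]

        pkt = bytearray(40)
        pkt[0], pkt[9] = 0x45, 0x06
        pkt[22:24] = (80).to_bytes(2, "big")
        print(classify(bytes(pkt)))   # -> ['is_ipv4', 'is_tcp', 'dst_http']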