278 research outputs found

    Network Virtual Machine (NetVM): A New Architecture for Efficient and Portable Packet Processing Applications

    Get PDF
    A challenge facing network device designers, besides increasing the speed of network gear, is improving its programmability in order to simplify the implementation of new applications (see for example, active networks, content networking, etc). This paper presents our work on designing and implementing a virtual network processor, called NetVM, which has an instruction set optimized for packet processing applications, i.e., for handling network traffic. Similarly to a Java Virtual Machine that virtualizes a CPU, a NetVM virtualizes a network processor. The NetVM is expected to provide a compatibility layer for networking tasks (e.g., packet filtering, packet counting, string matching) performed by various packet processing applications (firewalls, network monitors, intrusion detectors) so that they can be executed on any network device, ranging from expensive routers to small appliances (e.g. smart phones). Moreover, the NetVM will provide efficient mapping of the elementary functionalities used to realize the above mentioned networking tasks upon specific hardware functional units (e.g., ASICs, FPGAs, and network processing elements) included in special purpose hardware systems possibly deployed to implement network devices

    Design of High Performance Packet Classification Architecture for Communication Networks

    Get PDF
    Packet classification is a crucial technique for secure communication and networking. Security tools and internet services use packet classification technique which involves checking of packets against predefined rules stored in a classifier. The performance of the available software solutions of classification is not desirable and efficient for wire speed processing in high speed networks. Ternary Content Addressable Memory (TCAM), Bit-Vector (BV), field split bit vector (FSBV) and StrideBV algorithm are hardware based packet classification algorithms. In this paper, simple and memory efficient approach for packet classification has been proposed using Xnor gate instead of using lookup tables called XnorBV approach. Packet header fields of Internet protocol (IP) addresses and protocol layer are classified using Xnor gate against predefined ruleset which also support ternary bit pattern of ‘1’, ‘0’ and ‘*’ while port numbers of packet header support range match by comparing port numbers against lower bound and upper bound. The proposed parallel pipelined architecture can sustain a high throughput of +100 Gbps and low latency. The proposed method is memory efficient than other existing techniques, also supports prefix, range and exact match without use of range to prefix conversion. Also proposed XnorBV architecture is independent of ruleset feature and supports multiple dimension classification

    Energy Efficient Hardware Accelerators for Packet Classification and String Matching

    Get PDF
    This thesis focuses on the design of new algorithms and energy efficient high throughput hardware accelerators that implement packet classification and fixed string matching. These computationally heavy and memory intensive tasks are used by networking equipment to inspect all packets at wire speed. The constant growth in Internet usage has made them increasingly difficult to implement at core network line speeds. Packet classification is used to sort packets into different flows by comparing their headers to a list of rules. A flow is used to decide a packet’s priority and the manner in which it is processed. Fixed string matching is used to inspect a packet’s payload to check if it contains any strings associated with known viruses, attacks or other harmful activities. The contributions of this thesis towards the area of packet classification are hardware accelerators that allow packet classification to be implemented at core network line speeds when classifying packets using rulesets containing tens of thousands of rules. The hardware accelerators use modified versions of the HyperCuts packet classification algorithm. An adaptive clocking unit is also presented that dynamically adjusts the clock speed of a packet classification hardware accelerator so that its processing capacity matches the processing needs of the network traffic. This keeps dynamic power consumption to a minimum. Contributions made towards the area of fixed string matching include a new algorithm that builds a state machine that is used to search for strings with the aid of default transition pointers. The use of default transition pointers keep memory consumption low, allowing state machines capable of searching for thousands of strings to be small enough to fit in the on-chip memory of devices such as FPGAs. A hardware accelerator is also presented that uses these state machines to search through the payloads of packets for strings at core network line speeds

    High performance modified bit-vector based packet classification module on low-cost FPGA

    Get PDF
    The packet classification plays a significant role in many network systems, which requires the incoming packets to be categorized into different flows and must take specific actions as per functional and application requirements. The network system speed is continuously increasing, so the demand for the packet classifier also increased. Also, the packet classifier's complexity is increased further due to multiple fields should match against a large number of rules. In this manuscript, an efficient and high performance modified bitvector (MBV) based packet classification (PC) is designed and implemented on low-cost Artix-7 FPGA. The proposed MBV based PC employs pipelined architecture, which offers low latency and high throughput for PC. The MBV based PC utilizes <2% slices, operating at 493.102 MHz, and consumes 0.1 W total power on Artix-7 FPGA. The proposed PC considers only 4 clock cycles to classify the incoming packets and provides 74.95 Gbps throughput. The comparative results in terms of hardware utilization and performance efficiency of proposed work with existing similar PC approaches are analyzed with better constraints improvement

    Interconnect architectures for dynamically partially reconfigurable systems

    Get PDF
    Dynamically partially reconfigurable FPGAs (Field-Programmable Gate Arrays) allow hardware modules to be placed and removed at runtime while other parts of the system keep working. With their potential benefits, they have been the topic of a great deal of research over the last decade. To exploit the partial reconfiguration capability of FPGAs, there is a need for efficient, dynamically adaptive communication infrastructure that automatically adapts as modules are added to and removed from the system. Many bus and network-on-chip (NoC) architectures have been proposed to exploit this capability on FPGA technology. However, few realizations have been reported in the public literature to demonstrate or compare their performance in real world applications. While partial reconfiguration can offer many benefits, it is still rarely exploited in practical applications. Few full realizations of partially reconfigurable systems in current FPGA technologies have been published. More application experiments are required to understand the benefits and limitations of implementing partially reconfigurable systems and to guide their further development. The motivation of this thesis is to fill this research gap by providing empirical evidence of the cost and benefits of different interconnect architectures. The results will provide a baseline for future research and will be directly useful for circuit designers who must make a well-reasoned choice between the alternatives. This thesis contains the results of experiments to compare different NoC and bus interconnect architectures for FPGA-based designs in general and dynamically partially reconfigurable systems. These two interconnect schemes are implemented and evaluated in terms of performance, area and power consumption using FFT (Fast Fourier Transform) andANN(Artificial Neural Network) systems as benchmarks. Conclusions drawn from these results include recommendations concerning the interconnect approach for different kinds of applications. It is found that a NoC provides much better performance than a single channel bus and similar performance to a multi-channel bus in both parallel and parallel-pipelined FFT systems. This suggests that a NoC is a better choice for systems with multiple simultaneous communications like the FFT. Bus-based interconnect achieves better performance and consume less area and power than NoCbased scheme for the fully-connected feed-forward NN system. This suggests buses are a better choice for systems that do not require many simultaneous communications or systems with broadcast communications like a fully-connected feed-forward NN. Results from the experiments with dynamic partial reconfiguration demonstrate that buses have the advantages of better resource utilization and smaller reconfiguration time and memory than NoCs. However, NoCs are more flexible and expansible. They have the advantage of placing almost all of the communication infrastructure in the dynamic reconfiguration region. This means that different applications running on the FPGA can use different interconnection strategies without the overhead of fixed bus resources in the static region. Another objective of the research is to examine the partial reconfiguration process and reconfiguration overhead with current FPGA technologies. Partial reconfiguration allows users to efficiently change the number of running PEs to choose an optimal powerperformance operating point at the minimum cost of reconfiguration. However, this brings drawbacks including resource utilization inefficiency, power consumption overhead and decrease in system operating frequency. The experimental results report a 50% of resource utilization inefficiency with a power consumption overhead of less than 5% and a decrease in frequency of up to 32% compared to a static implementation. The results also show that most of the drawbacks of partial reconfiguration implementation come from the restrictions and limitations of partial reconfiguration design flow. If these limitations can be addressed, partial reconfiguration should still be considered with its potential benefits.Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201

    An Impulse-C Hardware Accelerator for Packet Classification Based on Fine/Coarse Grain Optimization

    Get PDF
    Current software-based packet classification algorithms exhibit relatively poor performance, prompting many researchers to concentrate on novel frameworks and architectures that employ both hardware and software components. The Packet Classification with Incremental Update (PCIU) algorithm, Ahmed et al. (2010), is a novel and efficient packet classification algorithm with a unique incremental update capability that demonstrated excellent results and was shown to be scalable for many different tasks and clients. While a pure software implementation can generate powerful results on a server machine, an embedded solution may be more desirable for some applications and clients. Embedded, specialized hardware accelerator based solutions are typically much more efficient in speed, cost, and size than solutions that are implemented on general-purpose processor systems. This paper seeks to explore the design space of translating the PCIU algorithm into hardware by utilizing several optimization techniques, ranging from fine grain to coarse grain and parallel coarse grain approaches. The paper presents a detailed implementation of a hardware accelerator of the PCIU based on an Electronic System Level (ESL) approach. Results obtained indicate that the hardware accelerator achieves on average 27x speedup over a state-of-the-art Xeon processor
    corecore