Abstract-Intrusion detection and prevention systems must define more and more patterns to identify increasingly diverse intrusions. Pattern matching, the core of almost every modern intrusion detection system, must provide exceptionally high performance and be reconfigurable. FPGA-based pattern matching subsystems have become a popular solution for modern intrusion detection systems, but there is still significant room to improve FPGA resource efficiency. In this paper, we present a novel pattern matching implementation using Half Byte Comparators (HBCs). The HBC-based approach increases area efficiency at the cost of a slight decrease in operating frequency, and we also explore methods to improve the operating frequency. The results show that, for matching more than 22,000 characters (all the rules in SNORT v2.0), our implementation achieves an area efficiency of more than 3.13 matched characters per logic cell and an operating frequency of about 325 MHz (2.6 Gbps) on a Virtex-II Pro device. When quad parallelism is used to increase the matching throughput, the area efficiency decreases to 0.71 characters per logic cell for a throughput of almost 8.5 Gbps.
I. INTRODUCTION
Network security has become a pressing topic. Methods commonly used to protect against network attacks include firewalls, whose packet filters drop obviously dangerous packets, and Intrusion Detection Systems (IDS), which use much more sophisticated rules and pattern matching to identify potentially dangerous packets. These techniques demand a great deal of computing power from network security devices, and traditional software solutions cannot keep up with today's high-speed networks [14]. Hardware-based solutions can meet the performance requirements of current and future networks. The key module of a hardware-based network security device is pattern matching.
The signature of an attack may appear at any position in a data packet. To determine whether any of the predefined patterns is present in the target packet, the pattern matching module must inspect the packet byte by byte. In general, the input of a pattern matching system is one byte per clock period. To improve throughput, the input can be widened to N bytes in parallel per clock period. The outputs of the string matching system are a matching signal and a pattern index. The matching signal indicates whether any predefined pattern has matched. The pattern index identifies the predefined pattern found in the target data packet. The patterns defined by SNORT [10], a well-known open-source software IDS, are widely used in all kinds of IDS. SNORT defines thousands of patterns in its anti-attack rules. To check input packets at wire speed, the pattern matching module must compare the packet data against all the predefined patterns simultaneously as the packet passes by. This parallel comparison is the most important and most complex part of a hardware-based pattern matching system.
Hardware-based pattern matching systems have the advantages of high speed and parallel processing [6]. It [16]. Every character needs two half-byte comparators, one for the upper 4 bits and one for the lower 4 bits. In our design, identical upper and lower 4-bit comparators are fully shared.
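The sharing of half-byte comparators can be illustrated with a small software model. The sketch below is not the authors' RTL; it assumes one input byte per clock, models the 16 upper-nibble and 16 lower-nibble comparators as one-hot decoders shared by every pattern, and uses a short history buffer in place of the delay register array. The class and function names (`HBCMatcher`, `scan`) are hypothetical.

```python
from collections import deque


class HBCMatcher:
    """Illustrative model of a half-byte-comparator pattern matcher.

    Assumptions (not from the paper): one byte per clock, ASCII patterns,
    and a software history buffer standing in for the delay registers.
    """

    def __init__(self, patterns):
        self.patterns = [p.encode("ascii") for p in patterns]
        depth = max(len(p) for p in self.patterns)
        # history[d] holds the comparator outputs produced d clocks ago.
        self.history = deque(maxlen=depth)

    @staticmethod
    def decode(byte):
        # 16 comparators on the upper nibble and 16 on the lower nibble;
        # these 32 outputs are shared by all patterns.
        high = [int((byte >> 4) == v) for v in range(16)]
        low = [int((byte & 0xF) == v) for v in range(16)]
        return high, low

    def clock(self, byte):
        """Feed one byte; return indices of patterns ending at this byte."""
        self.history.appendleft(self.decode(byte))
        matched = []
        for idx, pat in enumerate(self.patterns):
            if len(self.history) < len(pat):
                continue
            hit = True
            for pos, ch in enumerate(pat):
                # The first character of the pattern is delayed the most.
                delay = len(pat) - 1 - pos
                high, low = self.history[delay]
                hit = hit and bool(high[ch >> 4]) and bool(low[ch & 0xF])
            if hit:
                matched.append(idx)
        return matched


def scan(patterns, data):
    """Return (byte_index, pattern_index) pairs for every match in data."""
    matcher = HBCMatcher(patterns)
    return [(i, idx) for i, b in enumerate(data) for idx in matcher.clock(b)]
```

For example, `scan(["ABCD", "CD"], b"xxABCD")` reports both patterns at byte index 5, and both reuse the same nibble comparator outputs for "C" and "D".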
Taking "ABCD" as an example pattern, we illustrate the system architecture in Figure 1. The ASCII codes of "ABCD" are split into half-bytes: 0x4 and 0x1 (delayed 3 cycles) for "A", 0x4 and 0x2 (delayed 2 cycles) for "B", 0x4 and 0x3 (delayed 1 cycle) for "C", and 0x4 and 0x4 (delayed 0 cycles) for "D". The matching circuit generates a final signal that tells the encoder whether the pattern matched. Figure 2 shows the implementation for all SNORT

If the start character of the predefined pattern occurs at the higher byte, the output of the upper AND gate in Figure 3 is asserted. If it occurs at the lower byte, the output of the bottom AND gate in Figure 3 is asserted. Either condition causes the output of the OR gate to be asserted. This scheme can be used for any value of M.
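The two-alignment scheme for M = 2 can be sketched in software as follows. This is an illustrative model rather than the circuit of Figure 3: it assumes the higher byte of each two-byte word is the earlier byte in the stream, and the function name `match_m2` is hypothetical. One branch handles a pattern that starts at the higher byte, the other a pattern that starts at the lower byte, and the two results are ORed.

```python
def match_m2(pattern, data):
    """Model a two-bytes-per-clock matcher for a single pattern.

    Returns a list of (clock, upper_hit, lower_hit) for clocks with a match.
    Assumption: the higher byte of each word arrives first in the stream.
    """
    stream = list(data)
    if len(stream) % 2:
        stream.append(-1)  # padding value that can never equal a real byte
    want = list(pattern)
    length = len(want)
    hits = []
    for clock in range(len(stream) // 2):
        branch = []
        for start_lane in (0, 1):  # 0: starts at higher byte, 1: lower byte
            # The start lane and the pattern length fix the lane where it ends.
            end_lane = (start_lane + length - 1) % 2
            end = 2 * clock + end_lane
            start = end - length + 1
            branch.append(start >= 0 and stream[start:end + 1] == want)
        upper_hit, lower_hit = branch  # the two AND gates
        if upper_hit or lower_hit:     # the OR gate
            hits.append((clock, upper_hit, lower_hit))
    return hits
```

Calling `match_m2(b"ABCD", b"ABCDxx")` fires the upper branch, while `match_m2(b"ABCD", b"xABCDx")` fires the lower branch one clock later. Each branch is a full copy of the matching logic, which is why the resource cost grows with M.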
B. Decreasing the fan-out
Because the outputs of the delay register array are shared, the fan-out of some registers becomes so large that the design is hard to implement in an FPGA at a high system frequency. To decrease the fan-out of these registers, we add registers that share the load. Figure 4 illustrates this sharing method.
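One way to reason about this load sharing is to partition the sinks of a heavily loaded register among duplicate registers. The sketch below is only a planning aid under assumed names (`split_fanout`, `register_levels`) and an assumed per-register fan-out limit; it is not taken from the paper, and the actual arrangement in Figure 4 may differ.

```python
import math


def split_fanout(loads, max_fanout):
    """Partition the sinks of one register among duplicate registers.

    `max_fanout` is an assumed design limit, not a value from the paper.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    return [loads[i:i + max_fanout] for i in range(0, len(loads), max_fanout)]


def register_levels(n_loads, max_fanout):
    """Number of inserted register levels needed if duplicates form a tree.

    Each level adds one clock of latency that the other paths must match.
    """
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2 for a tree")
    levels = 0
    while n_loads > max_fanout:
        n_loads = math.ceil(n_loads / max_fanout)
        levels += 1
    return levels
```

For instance, 100 sinks with a limit of 8 per register would need two inserted levels (13 registers driving the sinks, fed by 2 registers driven by the source).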
C. Decreasing the combinational logic delay
The propagation and gate delay of combinational logic can be bounded by inserting registers. Combinational logic in Xilinx devices [16] is made up of LUTs and routing resources. An AND gate with more than 4 inputs requires cascaded LUTs, which increases the propagation delay. To decrease the propagation delay, registers and small AND gates are used. Figure 5 shows the method.

Figure 3 shows a pattern module for M = 2. There are two possible alignments of a pattern in the target packet: the start character of the predefined pattern may occur at the higher byte or at the lower byte of the two parallel input bytes. We must design a separate pattern matching circuit for each condition, so the FPGA resource usage is doubled.

Figure 5. Using registers to split large combinational logic. The 32-input AND gate in a) can be divided into two groups of smaller AND gates (8-input and 4-input) separated by registers in b).
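The split described in the Figure 5 caption can be modeled as a two-stage pipeline: four 8-input AND gates, a register stage, and one 4-input AND gate. The following is a behavioral sketch under that reading of the caption; the class name `PipelinedAnd32` is hypothetical, and the model shows only the one-clock latency introduced by the registers, not timing.

```python
class PipelinedAnd32:
    """Behavioral model of a 32-input AND split by one register stage.

    Stage 1: four 8-input AND gates. Stage 2: one 4-input AND gate.
    The output therefore reflects the inputs of the previous clock.
    """

    def __init__(self):
        self.stage = [0, 0, 0, 0]  # registers between the two AND levels

    def clock(self, bits):
        if len(bits) != 32:
            raise ValueError("expected 32 input bits")
        # The second level reads the values registered on the previous clock.
        out = int(all(self.stage))
        # The first level computes four 8-input ANDs and registers them.
        self.stage = [int(all(bits[i:i + 8])) for i in range(0, 32, 8)]
        return out
```

Feeding 32 ones produces a 1 on the following clock. In a real design the extra cycle of latency would have to be accounted for wherever the match signal is consumed.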
V. EVALUATIONS
Two metrics are generally used to evaluate pattern matching modules: performance and area cost. Performance is expressed by the system operating frequency and throughput, and area efficiency by the number of implemented characters per logic cell. Obtaining these two metrics requires a complete implementation of the pattern matching system. Because the mapping from patterns to RTL code is simple in our design, we developed a program that translates the specified patterns into synthesizable Verilog HDL. We implemented the SNORT rule set v2.0 [9]. Figure 6 shows the performance results when the parallel parameter equals 1, and Figure 7 shows the performance results when the parallel parameter equals 4. Figure 8 shows the area efficiency results when the parallel parameter equals 1, and Figure 9 shows the area efficiency results when the parallel parameter equals 4. The operating frequency decreases as the number of implemented characters increases, whereas the area efficiency increases.
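The relation between these metrics is simple arithmetic and can be sketched as below. The helper names are hypothetical, and the formulas assume that throughput equals clock frequency times the number of input bits per clock, which is consistent with the 325 MHz and 2.6 Gbps figures quoted in the abstract.

```python
def throughput_gbps(freq_mhz, bytes_per_clock=1):
    """Throughput in Gbps, assuming 8 * bytes_per_clock bits enter per clock."""
    return freq_mhz * 8 * bytes_per_clock / 1000.0


def implied_frequency_mhz(gbps, bytes_per_clock):
    """Clock frequency implied by a throughput under the same assumption."""
    return gbps * 1000.0 / (8 * bytes_per_clock)


def area_efficiency(characters, logic_cells):
    """Implemented characters per logic cell."""
    if logic_cells <= 0:
        raise ValueError("logic_cells must be positive")
    return characters / logic_cells
```

Under this assumption, `throughput_gbps(325)` gives 2.6, matching the reported single-byte figure, and the reported throughput of almost 8.5 Gbps at four bytes per clock would correspond to a clock of roughly 266 MHz.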
VI. CONCLUSIONS
In this paper, we present a novel pattern matching approach using HBCs. With our approach, tens of thousands of characters can be implemented in a single FPGA chip. The approach relies on the following methods. First, we reduce the area cost per character by (i) using 32 HBCs to build all possible ASCII characters and (ii) using a register array and combinational logic to implement all the character combinations required by the SNORT signatures. Second, we achieve high operating frequencies by (iii) using parallel HBC groups to process M input bytes in parallel and (iv) using state-of-the-art pipelining for faster circuits.
In a Xilinx Virtex-II Pro 30 FPGA, we implemented the entire rule set for SNORT version 2.0 (more than 22,000 characters for 2,000 patterns). The pattern matching module 
