This paper proposes a hardware-based parallel pattern matching engine using a memory-based bit-split string matcher architecture. The proposed bit-split string matcher separates the transition 
Introduction
The pattern matching engine is an essential device for detecting target patterns in packet payloads for deep packet inspection (DPI). As the number of target patterns increases, a pattern matching engine should adopt multiple string matchers in order to recognize matches with target patterns in parallel. Because wire speeds keep increasing, a hardware-based pattern matching engine is preferred for high throughput. For the hardware-based pattern matching engine, the memory-based deterministic finite automaton (DFA) guarantees regularity with a linear pattern matching time [1]. However, the memory requirements of each DFA are proportional to the number of states and the number of distinct input symbols.
To minimize the total memory requirements of the memory-based DFA, bit-split pattern matching engines based on the Aho-Corasick algorithm [2] were proposed in [3] and [4], where homogeneous finite-state machine (FSM) tiles were adopted in each hardware-based string matcher. In the existing bit-split pattern matching engines, a character input symbol was split into multiple groups of input bit positions, one group for each FSM tile. In an FSM tile, each state contained state transitions for all values of the input bits. Because the number of target patterns mapped onto a string matcher is limited, many state transitions could go towards the initial state.
This paper proposes a bit-split string matcher architecture for the hardware-based parallel pattern matching. The memory requirements of each string matcher can decrease 
Proposed Bit-Split String Matcher
In the proposed bit-split string matcher, each homogeneous FSM tile takes n bits of one character (or one byte) as an input at each cycle. For example, when each FSM tile takes four bits as an input symbol, the number of FSM tiles is two.
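The bit splitting described above can be illustrated with a minimal Python sketch. The function name `split_byte` and the assignment of contiguous bit groups to tiles, starting from the LSBs, are assumptions made for illustration only.

```python
def split_byte(byte: int, n_bits: int) -> list[int]:
    """Split one 8-bit character into n-bit input symbols, one per FSM tile.

    Assumption: tile 0 receives the n least significant bits, tile 1 the
    next n bits, and so on.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    if n_bits not in (1, 2, 4, 8):
        raise ValueError("n_bits must divide 8")
    mask = (1 << n_bits) - 1
    return [(byte >> (tile * n_bits)) & mask for tile in range(8 // n_bits)]


# 'h' is 0x68: with four-bit symbols, two tiles see 0x8 and 0x6.
print(split_byte(ord("h"), 4))
```

With n = 4 the list has two entries, which matches the two FSM tiles mentioned in the text; with n = 2 it has four.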
In an output state of each FSM tile, the k-th bit of a partial match vector (PMV) indicates whether the k-th pattern is matched in that state. The predetermined number of bits in a PMV, P, limits the number of target patterns that can be mapped onto a string matcher. A matched pattern is recognized from the full match vector, which is obtained by the bitwise logical AND of the PMVs from all FSM tiles.
The architecture of the proposed FSM tile is shown in Fig. 1(a). Each FSM tile contains transition, state, and PMV lookup tables. The number in angle brackets in the first row gives the width of the entries. The transition table can store T entries, where the t-th entry holds the address of the s-th entry in the state table. The s-th entry contains, in its transition field, the base address of the next state transitions, that is, the starting address in the transition table for the s-th entry.
The input field is introduced to calculate the address offset in the transition table for the next state transition. Each bit in the input field indicates whether a state transition towards a non-initial state exists for the corresponding input symbol. The value of an input symbol i selects the i-th bit position from the LSB in the input field, where the 0-th bit position is the LSB. The counting value c is the number of valid bits from the LSB to the (i − 1)-th bit position. The value of the bit in the i-th position is denoted as input[i]. When input[i] is zero, the next transition pointer is set to zero; otherwise, the next transition pointer is the sum of the counting value c and the base address.
For example, the DFA in Fig. 1(b) contains four target patterns: "he," "hers," "his," and "she," where the four LSBs of each character are the input symbol.
On each arrow, the hexadecimal value of the input symbol is shown in parentheses. The state nodes in gray are output states. The dotted lines represent the failing pointers. Several table contents for the DFA are illustrated in Fig. 1(c). The pattern search starts from the initial node S0 in the first row of the state table. Assume that the input is 0x8 in S0. The value in the input field for S0 is 264 in decimal, or 0b100001000. Because the number of set bits (0b1) from the LSB to the seventh bit position is one, the offset is one, so the next transition pointer is two. Therefore, the next state is S1.
Copyright © 2010 The Institute of Electronics, Information and Communication Engineers
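The pointer calculation in this worked example can be checked with a short Python sketch. The function name is hypothetical, and the base address of 1 for S0 is an assumption inferred from the example (an offset of one yielding a pointer of two); the actual table contents are in Fig. 1(c).

```python
def next_transition_pointer(input_field: int, base_address: int, symbol: int) -> int:
    """Return the transition-table address for `symbol`, or 0 if no
    transition towards a non-initial state exists.

    input_field  : bitmap with bit i set when symbol i has such a transition
    base_address : starting address of this state's transitions
    symbol       : input symbol value i (bit position from the LSB)
    """
    if (input_field >> symbol) & 1 == 0:
        return 0  # input[i] is zero: pointer is set to zero
    # Counting value c: set bits from the LSB up to position i - 1.
    below = input_field & ((1 << symbol) - 1)
    c = bin(below).count("1")
    return base_address + c


# S0 in the example: input field 0b100001000, assumed base address 1.
S0_INPUT_FIELD = 0b100001000
S0_BASE = 1  # assumption inferred from the worked example
print(next_transition_pointer(S0_INPUT_FIELD, S0_BASE, 0x8))  # 2
```

Only the transitions that leave towards non-initial states occupy entries in the transition table, so the bitmap plus a population count is enough to locate each of them.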
In addition, PMVs are stored in a PMV lookup table. The s-th entry in the state table contains a pattern match index (PMI), which points to a unique PMV in the lookup table. As shown in [4], considering the worst case for identifying unique patterns, the number of entries in the PMV lookup table was set to the number of bits in a PMV, P.
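The PMI lookup and the AND of PMVs across tiles can be sketched in Python as follows. The table contents and function names are hypothetical and chosen only to illustrate the mechanism; they are not taken from Fig. 1(c).

```python
from functools import reduce


def full_match_vector(pmvs: list[int]) -> int:
    """Bitwise AND of the partial match vectors from all FSM tiles."""
    return reduce(lambda a, b: a & b, pmvs)


def matched_patterns(fmv: int, p_bits: int) -> list[int]:
    """Indices k whose bit is set in the full match vector."""
    return [k for k in range(p_bits) if (fmv >> k) & 1]


# Hypothetical PMV lookup tables for two tiles, indexed by PMI.
pmv_table_tile0 = [0b0000, 0b0101, 0b0010]
pmv_table_tile1 = [0b0000, 0b0110]

# Hypothetical PMIs read from the current state of each tile.
pmi_tile0, pmi_tile1 = 1, 1
fmv = full_match_vector([pmv_table_tile0[pmi_tile0], pmv_table_tile1[pmi_tile1]])
print(matched_patterns(fmv, 4))  # [2]
```

A pattern is reported only when every tile agrees on it, which is why a bit set in a single tile's PMV does not produce a match by itself.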
Based on several notations mentioned above, the memory requirements of a string matcher are given by:
where S denotes the maximum number of states in an FSM tile.
Pattern Mapping onto String Matchers
The number of state transitions stored in the transition table is not proportional to the number of states. Therefore, unlike the existing bit-split pattern matching approaches in [3] and [4], the maximum number of state transitions is not derived from the maximum number of states; the maximum number of state transitions, T, must be predetermined separately. For the proposed string matcher, the maximum number of target patterns that can be mapped onto a string matcher is limited by the width of each PMV, P. In addition, the number of states in an FSM tile must not exceed the maximum number of states, S. Because of these resource limitations, the number of target patterns mapped onto each string matcher is limited. Pattern mapping is performed in the order determined by the lexicographical sorting in [3], which increases the number of shared common prefixes. The pattern mapping is repeated until all target patterns are mapped. The pattern mapping for the proposed string matcher is described as follows:
(STEP 1) Initially, P target patterns are adopted. For the adopted target patterns, STEP 1 checks whether all FSM tiles can be built within the predetermined maximum numbers of state transitions and states, T and S. If all FSM tiles can be built, the iteration of STEP 1 stops and STEP 2 is executed; otherwise, the number of adopted unmapped target patterns is decreased by one, and the check is repeated.
(STEP 2) With the mappable target patterns obtained in STEP 1, the Aho-Corasick algorithm is applied to add failing pointers. The PMVs stored in each FSM tile can be obtained by the construction described in [3].
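The iteration in STEP 1 can be sketched in Python as a greedy grouping loop. This is only an illustration under stated assumptions: all function names are hypothetical, and the feasibility check counts the states and transitions of a plain per-tile trie as a stand-in for the actual FSM tile construction, which the text does not spell out. STEP 2 (failing pointers and PMVs) is not modeled.

```python
def tile_symbols(pattern: bytes, tile: int, n_bits: int) -> list[int]:
    """Project a pattern onto one tile: n_bits of every character."""
    mask = (1 << n_bits) - 1
    return [(b >> (tile * n_bits)) & mask for b in pattern]


def trie_size(patterns: list[bytes], tile: int, n_bits: int) -> tuple[int, int]:
    """(states, transitions) of the trie for one tile (assumed cost model)."""
    root: dict = {}
    states, transitions = 1, 0  # the initial state always exists
    for pattern in patterns:
        node = root
        for sym in tile_symbols(pattern, tile, n_bits):
            if sym not in node:
                node[sym] = {}
                states += 1
                transitions += 1
            node = node[sym]
    return states, transitions


def fits(patterns: list[bytes], n_bits: int, max_states: int, max_trans: int) -> bool:
    """Check that every tile stays within the S and T limits."""
    for tile in range(8 // n_bits):
        states, transitions = trie_size(patterns, tile, n_bits)
        if states > max_states or transitions > max_trans:
            return False
    return True


def map_patterns(patterns: list[bytes], p_bits: int, max_states: int,
                 max_trans: int, n_bits: int = 4) -> list[list[bytes]]:
    """Group lexicographically sorted patterns into string matchers."""
    remaining = sorted(patterns)
    groups = []
    while remaining:
        count = min(p_bits, len(remaining))  # start with up to P patterns
        while count > 0 and not fits(remaining[:count], n_bits, max_states, max_trans):
            count -= 1  # drop one adopted pattern and check again
        if count == 0:
            raise ValueError("a single pattern exceeds the S or T limit")
        groups.append(remaining[:count])
        remaining = remaining[count:]
    return groups


rules = [b"he", b"hers", b"his", b"she"]
print(map_patterns(rules, p_bits=16, max_states=128, max_trans=256))
```

With generous limits all four example patterns land in one string matcher; tightening S or T splits them across several, which mirrors the trade-off between the size and the number of string matchers.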
To determine the optimal number of FSM tiles in a string matcher, three cases with homogeneous FSM tiles are theoretically analyzed: two FSM tiles with four-bit input, four FSM tiles with two-bit input, and eight FSM tiles with one-bit input. Based on Eq. (1), the last case of eight FSM tiles with one-bit input compares unfavorably with the first and second cases and is not adopted for the proposed string matcher. Considering the memory requirements calculated using Eq. (1), the second case could be preferred if the following condition is satisfied:
Depending on the target patterns, the pattern length distributions and the number of shared common prefixes can differ. The optimal parameter values, including the maximum number of state transitions, are therefore obtained by analyzing experimental results for real rule sets.
Experimental Results
In our experiments, four large rule sets were extracted from Snort v2.8 rules [5]. Several parameters were swept to obtain optimal parameter values: the number of states in an FSM tile, S, was either 128 or 256, and the number of bits in a PMV, P, was 16, 32, or 64. In the proposed pattern matching engine, the maximum number of state transitions in an FSM tile, T, was 192, 256, 384, or 512 when S was 128, and 384, 512, 768, or 1024 when S was 256. From Eq. (2), four bits of one character were adopted as the input of an FSM tile. For all four rule sets, memory requirements were minimized when S and P were 128 and 16, respectively. This means that total memory requirements can be reduced when the number of target patterns mapped onto each string matcher is small. Figure 2 summarizes the total memory requirements for varying maximum numbers of state transitions T when S and P were 128 and 16, respectively. The numbers in parentheses denote the average length and the number of target patterns for each rule set, respectively. As shown in Fig. 2, memory requirements were minimized when T was 256. In this case, for the FSM tiles with four input bits, the
Table 1 shows the performance enhancements for the four rule sets in terms of reduced total memory requirements. Even though the number of adopted string matchers increased, total memory requirements could be reduced because the memory requirements of each string matcher became smaller. For each rule set, "over [3]" and "over [4]" denote the performance enhancements over the bit-split pattern matching engines in [3] and [4], respectively. In particular, memory requirements for the spyware and web-client rule sets were greatly reduced. This was mainly because the average numbers of required state transitions for the spyware and web-client rule sets could be smaller than those of the other rule sets. In Table 1, total memory requirements were reduced on average by 34.89% and 14.50% over the engines in [3] and [4], respectively.
Conclusion
The bit-split string matcher proposed in this paper eliminates the memory required to store state transitions towards the initial state. With this memory-efficient string matcher, a hardware-based parallel bit-split pattern matching engine reduces the total memory requirements for mapping real rule sets. Considering the performance enhancements, we conclude that the pattern matching engine with the proposed bit-split string matchers is useful for reducing storage cost.
