Abstract: This paper presents a hardware-based parallel pattern matching scheme that adopts heterogeneous bit-split string matchers for deep packet inspection (DPI) devices. Based on pattern length, the set of target patterns is partitioned into two subsets of short and long patterns. By adopting an appropriate bit-split string matcher type for each subset, the memory requirements of the bit-split parallel pattern matching engine can be optimized. Experimental results show that the total memory requirements decrease on average by 39.40% and 20.52% in comparison with the existing bit-split pattern matching approaches.
Introduction
To recognize illegal packet payload contents at line speed, DPI devices for intrusion detection systems need a pattern matching engine. In pattern matching engines, multiple string matchers detect their mapped patterns in payload contents in parallel. Because of its deterministic linear execution time and the fixed number of output transitions per state, memory- and deterministic finite automaton (DFA)-based pattern matching might be preferable [1]. However, memory requirements can be proportional to the number of states and the size of the input symbol. In the bit-split pattern matching approaches in [2] and [3], each character input symbol was split into bit groups for multiple finite-state machine (FSM) tiles in a string matcher. The number of states was reduced by sharing common prefixes based on the Aho-Corasick algorithm [4]. Target patterns were mapped onto homogeneous string matchers. Even though homogeneous string matchers guaranteed regularity, memory usage was inefficient because target patterns of widely varying lengths were mapped onto each string matcher.
This paper proposes a memory-efficient parallel pattern matching scheme with heterogeneous bit-split string matchers for two subsets of short and long target patterns. Several parameters are scaled to optimize the memory usage of the string matchers for each subset. Therefore, a memory-efficient pattern matching engine can be obtained by mapping short and long target patterns using only two different types of string matchers.
Target bit-split pattern matching engine architecture
In the target bit-split string matcher, each homogeneous FSM tile takes n bits of one character at each cycle (Fig. 1). In addition, the PMIs of different output states can indicate the same PMV in the bit-split string matcher. For example, assume that an FSM tile takes the two least significant bits of each input symbol. The matches with patterns ab and cd have the same PMV in the FSM tile. Therefore, the predetermined number of PMVs in the lookup table, V, can be smaller than the number of bits in a PMV, P. When the number of PMIs and the number of unique PMVs are I and V, respectively, the memory requirements of a string matcher are given by:
where S is the number of states in an FSM tile.
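The bit-splitting of the input can be illustrated with a minimal sketch. The function names `split_symbol` and `tile_inputs` are hypothetical, and the ordering of the bit groups (least significant group first) is an assumption made for illustration only; the sketch shows how one 8-bit character is divided into 8/n groups of n bits, one group per FSM tile.

```python
def split_symbol(byte: int, n: int = 2) -> list[int]:
    """Split one 8-bit character into 8 // n groups of n bits.

    Assumption: group 0 holds the least significant bits.
    """
    if n <= 0 or 8 % n != 0:
        raise ValueError("n must be a positive divisor of 8")
    mask = (1 << n) - 1
    return [(byte >> shift) & mask for shift in range(0, 8, n)]


def tile_inputs(text: bytes, n: int = 2) -> list[list[int]]:
    """For each FSM tile, return the stream of n-bit symbols it would consume."""
    groups = [split_symbol(b, n) for b in text]
    return [[g[tile] for g in groups] for tile in range(8 // n)]


if __name__ == "__main__":
    # With n = 2, four tiles each see one 2-bit symbol per character.
    for tile, stream in enumerate(tile_inputs(b"ab", n=2)):
        print(f"tile {tile}: {stream}")
```

Each tile therefore needs only 2^n outgoing transitions per state instead of 256, which is the source of the fixed, small fan-out of a bit-split FSM tile.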
In the pattern matching engine architecture, a string matchers are adopted in the short pattern matcher and b string matchers are adopted in the long pattern matcher. The pattern matching engine can be implemented with embedded memory and a small amount of logic (e.g., AND gates). In this case, memory blocks are the major part of the embedded string matchers. Therefore, the total memory requirements can be calculated by:

$$\mathrm{Mem}_{total} = a \cdot \mathrm{Mem}_{short} + b \cdot \mathrm{Mem}_{long} \qquad (2)$$

where $\mathrm{Mem}_{short}$ and $\mathrm{Mem}_{long}$ denote the memory requirements of one string matcher for the short and the long patterns, respectively.
Pattern Mapping for Parallel Pattern Matching Engine
In the proposed pattern mapping, a set of bit-split DFA contents for the adopted string matchers is obtained. The pattern length is defined as the number of characters in a pattern. The pattern mapping algorithm partitions the set of all target patterns into two subsets α and β by comparing the length of each target pattern with a predetermined value l. The pseudocode of the pattern mapping is described in Fig. 2. In the function Get partition, if the length of a target pattern is shorter than l, the target pattern becomes an element of α; otherwise, it becomes an element of β. Target patterns in each subset are sorted lexicographically by character code value to increase the number of shared common prefixes [2]. Using the function Get mapping and the determined order of target patterns, target patterns are mapped onto string matchers until no unmapped target patterns remain. The proposed pattern mapping provides mapping results for all string matchers of α and β.
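The partitioning and mapping steps can be sketched as follows. This is not the pseudocode of Fig. 2: the text does not state when a string matcher is considered full, so the sketch assumes two hypothetical limits, `max_states` and `max_patterns`, and uses the state count of a character-level trie as a stand-in for the state count of a bit-split FSM tile. The names `get_partition`, `get_mapping`, and `trie_states` are illustrative.

```python
def get_partition(patterns: list[bytes], l: int) -> tuple[list[bytes], list[bytes]]:
    """Split patterns into alpha (length < l) and beta (length >= l).

    Each subset is sorted lexicographically by byte value.
    """
    alpha = sorted(p for p in patterns if len(p) < l)
    beta = sorted(p for p in patterns if len(p) >= l)
    return alpha, beta


def trie_states(patterns: list[bytes]) -> int:
    """Count character-level trie states (root included); shared prefixes count once."""
    return len({p[:i] for p in patterns for i in range(len(p) + 1)})


def get_mapping(sorted_patterns: list[bytes], max_states: int,
                max_patterns: int) -> list[list[bytes]]:
    """Greedily fill string matchers in the given order.

    Assumption: a matcher is full when adding the next pattern would exceed
    max_states trie states or max_patterns patterns.
    """
    matchers: list[list[bytes]] = []
    current: list[bytes] = []
    for p in sorted_patterns:
        if trie_states([p]) > max_states:
            raise ValueError(f"pattern {p!r} does not fit in one string matcher")
        candidate = current + [p]
        if current and (len(candidate) > max_patterns
                        or trie_states(candidate) > max_states):
            matchers.append(current)  # close the full matcher
            current = [p]             # start a new one
        else:
            current = candidate
    if current:
        matchers.append(current)
    return matchers


if __name__ == "__main__":
    patterns = [b"cab", b"ab", b"bbab", b"abcdefghijkl", b"abcdefxyzuvw"]
    alpha, beta = get_partition(patterns, l=5)
    print(get_mapping(alpha, max_states=8, max_patterns=4))
    print(get_mapping(beta, max_states=32, max_patterns=4))
```

Because the patterns are sorted before mapping, patterns with a common prefix tend to land in the same string matcher, so the shared prefix states are stored only once.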
For each subset, several parameters of the string matcher are calibrated to minimize the total memory requirements. First, in the string matcher for the subset α, the number of states in an FSM tile, S, can be small because the maximum length of target patterns in α is shorter than l. In this case, the number of unused states could decrease by balancing target pattern lengths in a string matcher. In addition, the probability that the same PMV is shared by the output states of multiple target patterns can increase for α because of its short pattern lengths; therefore, the number of PMVs in the lookup table can be reduced. However, the number of PMIs in α is set to the number of states in an FSM tile, S. This is mainly due to the high probability that a target pattern is a subpattern of other target patterns. For example, a target pattern ab is a subpattern of the target patterns cab and bbab. In addition, for α, the number of output states can be greater than the number of target patterns mapped onto the string matchers. In contrast, the numbers of PMIs and PMVs in string matchers for β are equal to the number of bits in a PMV. Because of the large target pattern lengths in β, the probability that a PMV is shared by several target patterns might be low compared with the case in α. In addition, the probability that a target pattern is a subpattern of other target patterns can decrease with the pattern length. Depending on the distribution of target pattern lengths in a pattern set and the value of l, the total memory requirements of string matchers for α and β can differ. The optimal l for real pattern sets, therefore, is determined by analyzing experimental results.
Experimental results
The proposed algorithm was implemented in C++. The memory requirements were calculated based on Eq. 1 and Eq. 2. In our experiments, four sets of target patterns were extracted from Snort v2.8 rule sets [5]. Based on the design analysis in [2], an FSM tile was assumed to take two bits of one character as an input. Several parameters were swept to obtain the parameter values that minimize the total memory requirements. The number of states in an FSM tile, S, for the subset α was one of 64, 128, and 256. For the subset β, S was one of 128 and 256, considering the maximum length of each target pattern set. The number of bits in a PMV, P, was one of 8, 16, 32, and 64. The number of PMVs in a lookup table, V, for α was set to half of P. A set of target patterns was partitioned into α and β by varying l over 10, 15, and 20. For a like-for-like comparison, the bit-split pattern matching approaches in [2] and [3] were implemented, several parameters of their pattern matching engine architectures were swept, and the minimized total memory requirements were obtained. In the evaluations for the four pattern sets, the minimized memory requirements were obtained when l was 15. Therefore, Table I shows the minimized memory requirements and the comparisons with the existing bit-split pattern matching approaches for l = 15. In the first column, #T and ave denote the number of target patterns and the average length of target patterns for each pattern set, respectively. The optimal S and P for α and β are listed for each pattern set. As shown in the third column of Table I, the obtained numbers of states S for α and β are 64 and 128, respectively, for the first three pattern sets. This means that memory usage was optimized when S was smallest. However, for β of the web-client pattern set, the total memory requirements were minimized when S was 256. This was mainly because the average length of target patterns in the web-client pattern set was greater than those in the other pattern sets.
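The parameter sweep described above amounts to an exhaustive search over a small grid. The following is a minimal sketch of such a search, not the authors' C++ implementation. The candidate values are taken from the text; sweeping P separately for α and β is an assumption, and the function `placeholder_cost` is a made-up stand-in that exists only so the example runs. In a real evaluation it would be replaced by a function that maps the patterns and evaluates Eq. 1 and Eq. 2.

```python
from itertools import product

# Candidate values quoted in the text.
S_ALPHA = (64, 128, 256)   # states per FSM tile for alpha
S_BETA = (128, 256)        # states per FSM tile for beta
P_BITS = (8, 16, 32, 64)   # bits in a PMV
L_VALUES = (10, 15, 20)    # length threshold l


def sweep(cost):
    """Return (best_cost, best_config) over the whole grid.

    `cost` is a caller-supplied function from a configuration dict to a
    memory size; ties keep the first configuration encountered.
    """
    best = None
    for l, s_a, s_b, p_a, p_b in product(L_VALUES, S_ALPHA, S_BETA, P_BITS, P_BITS):
        cfg = {"l": l, "S_alpha": s_a, "S_beta": s_b, "P_alpha": p_a, "P_beta": p_b}
        c = cost(cfg)
        if best is None or c < best[0]:
            best = (c, cfg)
    return best


def placeholder_cost(cfg):
    """Arbitrary stand-in cost, NOT the memory model of the paper."""
    return (cfg["S_alpha"] * cfg["P_alpha"]
            + cfg["S_beta"] * cfg["P_beta"]
            + 100 * cfg["l"])


if __name__ == "__main__":
    print(sweep(placeholder_cost))
```

The grid has only 3 × 3 × 2 × 4 × 4 = 288 points, so an exhaustive search is cheap compared with building the bit-split DFAs for each configuration.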
For each pattern set, over [2] and over [3] represent the ratio of the memory requirements of the proposed pattern matching to the memory requirements of the bit-split pattern matching approaches in [2] and [3], respectively. For the web-client pattern set, because of the large number of states in an FSM for β and the long average pattern length, many states in each FSM could not be used. Therefore, the reduction ratio of memory requirements for the web-client pattern set was smaller than those of the other pattern sets. In Table I, the total memory requirements were reduced on average by 39.40% and 20.52% compared with the approaches in [2] and [3], respectively. Considering the experimental results, we conclude that the proposed bit-split parallel pattern matching is useful for reducing storage cost.
