ABSTRACT-Pattern recognition has many applications in design automation. A generalized pattern recognition algorithm is presented in this paper which can efficiently extract similar patterns in programs. Compared to previous pattern-based techniques, our approach overcomes their limitation in handling control-flow-aware patterns, and leads to more opportunities for optimization. Our algorithm uses a feature-based filtering approach for fast pruning, and an elegant graph similarity metric called the generalized edit distance for measuring variations in CDFGs. Furthermore, our pattern recognition algorithm is applied to solve the area optimization problem in behavioral synthesis. Our experimental results show up to a 40% area reduction on a set of real-world benchmarks with a moderate 9% latency overhead, compared to synthesis results without pattern extractions; and up to a 30% area reduction, compared to the results using only data-flow patterns.
INTRODUCTION
Regularity plays a very important role in circuit design, which leads to fewer multiplexing logics, better resource sharing and better physical layout. We believe that efficiently handling regularity is one of the advantages of designers in manual RTL coding. However, the tight time-to-market requirements have made it increasingly difficult for designers to thoroughly analyze every design in a short time; therefore, pattern recognition, which helps designers automatically extract and utilize regularities in applications of design optimizations, has drawn wide interest at every level of circuit designs, from layout to behavioral synthesis.
Among all the pattern recognition algorithms, the graph-matching-based approach provides the best trade-off between efficacy and scalability. Recently, these approaches have been applied to solve problems in various domains, such as data mining, biochemistry, and behavioral synthesis [4, 8, 9, 10, 11, 13] . The work in [10] generates the candidate templates with a clustering algorithm based on occurrence frequency; the work in [9] proposed a polynomial time algorithm with respect to the input/output port number; another approach in [11] constructs subgraphs by combining single input cones. All three are designed for customized instruction generation for ASIP synthesis. Recently, a pattern-based behavioral synthesis framework was proposed in [4] . It uses the concept of graph edit distance to capture graph similarity. However, all the work mentioned above are constricted to extracting patterns only in data flow graphs, without considering higher level control flow structures. In [17] both control and data flow level differences have been considered for program partition, but their method merely compares signatures between two partitions without accurate similarity evaluation.
In fact, control flows, such as loops and branches, have big impacts on the final QoR in many cases and introduce more opportunities for optimizations. The challenges for pattern extraction in CDFGs are quite different from that in DFGs: first, CDFG graphs have two kinds of edgescontrol flow edges and data flow edges. Second, the matching between two collections of graphs needed in the related control-flow-aware pattern recognition process can not be directly solved by known pattern matching techniques which are applicable to only two graphs.
In this paper we develop a generalized control data flow pattern recognition algorithm to identify regularity in the behavioral specification and help users to reduce the resource usage of a certain design. To our best knowledge, this is the first automated approach to considering control-flow patterns for behavioral synthesis. In particular, the contributions of our approach include: (1) A general hierarchical approach which automatically recognizes control data flow patterns from programs. (2) A graph similarity evaluation metric to estimate general edit distance between CDFG patterns. (3) A two-level feature-based filtering scheme to reduce the amount of expensive similarity evaluations effectively. (4) An efficient and accurate pattern selection strategy which helps to select optimal pattern combinations from discovered patterns.
The remainder of the paper is organized as follows. Section II discusses preliminaries knowledge; Section III presents our CDFG pattern recognition algorithm and related techniques; pattern selection strategy is discussed in Section IV; Section V shows the experimental results, and Section VI is the conclusion part. In the control-flow-aware pattern mining problem, a CDFG basic block normally consists of multiple data flow graphs, as shown in Figure 1 , therefore the original edit distance [1] In our approach, the concept of generalized edit distance is exploited to describe the similarity degree of CDFGs. For example, the generalized edit distance between CDFGs in Figure 1 (a) and 1(b) is three, since relabeling the node "+" with "*" at three places in 1(a) can transform it to the graph in 1(b). GED provides an effective metric to capture the similarity of CDFG subgraphs with more than one data flow graphs inside, and also gives the minimum number of edit operations needed to share common structures among similar subgraphs. 
II

III. PATTERN RECOGNITION
This section introduces our generalized pattern recognition algorithm, which is used to discover patterns with similar control flow, as well as data flow structures. The similarity between two CDFG patterns is measured in terms of generalized edit distance, and a feature-based filtering technique is applied to reduce total number of GED computations.
A. CDFG Subgraph Labeling
In our approach each pattern is represented by a labeled graph. Notice that a DFG node can be labeled by the operation it represents, while each supernode in CFG is a basic block and it is not straightforward to find a label. Thus further similarity evaluation technique discussed in Section III(C) is used to decide whether two supernodes in CFG can be assigned the same label or not. Once the label of a basic block has been obtained, we will append it to the original DFG label of the nodes it contains. Therefore the label of a node in a given CDFG graph is a combination of its own DFG label and the label for the supernode (basic block) it belongs to. And distinct labels are assigned to data dependence edges according to their commutability.
B. Two-level Feature-based Filter
The computation of generalized edit distance is expensive, considering currently there is no polynomial-time algorithm for solving the graph isomorphism problem. Therefore filtering techniques have been adopted in the pattern recognition algorithm to reduce the number of GED calculations. In our approach, a signature called two-level feature vector is introduced for each CDFG subgraph, based on the work in [4] . However, the features discussed in [4] are constricted to data flow graph. In our control-flow-aware approach the feature information needs to be collected at both levels. Figure 2 shows the features for a DFG graph and a CFG graph, along with their feature vectors. The symbol ε denotes an empty node if the input or output of a node is not in this graph. As discussed in [4] , DFG nodes with multiple outputs will also be counted as ε node. Notice that we assume supernode 1 and 2 are similar, therefore the number of the second feature in (c) is two instead of one. The combination of DFV and CFV is used in our approach to capture structural properties of the original CDFG graph. The L 1 distance of two feature vectors, which is easy to compute, serves as an indicator of the similarity degree between the corresponding two graphs.
DEFINITION 4: In a CDFG graph G = (V G ,E G ), a DFG-feature is a subgraph S = {u, l ,r | u, l ,r∈ V op }, such that (l, u) and (r, u)∈E d ; a CFG-feature is a subgraph S =
(d) ε + + - - ε ε ε (a) (b) ε - + + + - ε ε ε DFV = (1,2,1) CFV = (1,2,1) bb0 bb1 bb2 bb1 bb3 bb3 ε ε ε (c) bb0 bb1 bb2 bb3 ε ε ε ε ε ε (d) ε + + - - ε ε ε (a) (b) ε - + + + - ε ε ε DFV = (1,2,1) CFV = (1,2,1)
Theorem 1 [4]. Let d(G 1 ; G 2 ) be the edit distance between two data flow graphs G 1 and G 2 , and DFV(G i ) be the data flow feature vector of
Theorem 1 from [4] tells us that given an edit distance limit l dist , the maximal number of possible DFG feature misses between two CDFG subgraphs is 4 *l dist . However, this reveals no information for the similarity degree in their control flow graphs. In order to develop an upper bound for the number of possible CFG feature misses, we propose a data structure called feature map matrix. Each row of the feature map matrix corresponds to a DFG feature, while each column corresponds to a CFG feature. Each entry records whether a DFG feature appears in a target CFG feature. Figure 3 shows the feature map matrix built for the CDFG in Figure  2 (c), with its CFG/DFG features. F i in Figure 3 (a) corresponds to the i th CFG feature in Figure 2 (d) and all the occurrences of a CFG feature will be recorded in the matrix. For instance, columns 3 and 4 are two occurrences of F 2 in the original CDFG. In Figure 3 (b) and 3(c), we can observe that DFG features f 1 and f 3 appear once in CFG feature F 1 , therefore the corresponding entries are set to one. Now, with the newly built feature map matrix M and the maximal number of allowed DFG feature misses 4*l dist , we can find out the upper bound for possible CFG feature misses l missUb . If a DFG feature f i is missing, all the CFG features containing f i in their inside data flow structures will be destroyed correspondingly, namely any column j satisfying M(i,j) = 1 needs to be removed from the matrix. Since the total number of missing DFG features is no more than 4*l dist , any set of missing CFG features can not cover more than 4*l dist rows. Therefore l missUb equals the maximal number of columns hit by 4*l dist rows in M -here "column j is hit by row i" means M(i,j) equals one.
Theorem 2. Let l dist be the given edit distance limit, and CFV(G i ) be the CFG feature vector of G i . If the generalized edit distance between G 1 and G 2 does not exceed l dist , we have: ||CFV(G 1 )-CFV(G 2 ) || 1 ≤ l missUb
Proof: Assume the generalized edit distance between G 1 and G 2 is less than l dist , Theorem 1 tells us that the DFV distance between G 1 and G 2 is no more than 4*l dist , therefore their CFV distance ||CFV(G 1 )-CFV(G 2 ) || 1 can not exceed l missUb , which is the maximal CFG feature misses brought by 4*l dist DFG feature misses.
To find the maximal number of columns hit by k rows is a classic max-cover problem, which has been proved to be NP-complete [18] . For the runtime trade-off, we can approximate the optimal solution by using a greedy algorithm. The greedy algorithm first selects a row that hits the largest number of columns, then removes this row and the columns covering it. This operation will continue until 4*l dist rows have been selected. l missGreed equals the number of columns removed by this algorithm.
Theorem 3. [14]
Let l missGreed and l missUB be the greedy and optimal solution for the max-cover problem, we have:
Theorem 3 from [14] shows the optimal solution is approximated by the greedy solution within a ratio of e/(e-1), therefore 1.59* l missGreed can be used as an estimation of l missUb . If the actual CFG feature misses between two graph candidates are greater than l missUb , we can safely assert that the difference in their control flow structure will result in a generalized edit distance larger than l dist , which means that further GED computation is not necessary. Experimental results show that on average only about 3 to 7 GED calculations are needed with hundreds of pattern candidates, which brings dramatic runtime speedup.
C. Similarity Evaluation
Given two CDFG subgraphs G 1 and G 2 which have passed the two-level feature-based filter, similarity evaluation will first be performed between their control flow structures, in which each basic block is treated as a supernode. After that, we will look into the data flow graphs inside to do further comparison. Figure 4 shows the data flow structure inside a supernode in one of our test cases -BH. We can observe that the data flow structure inside each supernode is not connected and consists of several separate subgraphs which we call subgraph fragments, marked with dashed rectangle in Figure 4 . This situation is very common. Here we do not allow edge insertion between two separate subgraph fragments, since it will introduce data dependency between two originally parallel data flow graphs, and make it hard to measure the latency of a pattern. Under this constraint we can approximate generalized edit distance as below:
Given two sets of subgraph fragments SF 1 (1,2, .., N) .
If the numbers of subgraph fragments of the two sets are different, we can introduce dummy graphs with 0 nodes and 0 edges to make it equal. To calculate GED between two sets of subgraph fragments, we construct a fragmentedit-distance matrix as follow: Figure 5 records the edit distance between the i th subgraph fragment in set 1 and the j th fragment in set 2. When we build M, the feature vector distance constraint ||DFV(fg 1i ) -DFV(fg 2j ) || 1 ≤ 4*l dist must be satisfied, otherwise, an infinite value will be assigned to the corresponding entry. In our experiments, we find that for most cases, the number of fragments with more than ten nodes is less than five; therefore, even though we need to compute edit distance between every two fragments, the cost is still acceptable.
With the fragment edit-distance matrix, our problem is to find an optimal index permutation (p 1 ,p 2 ,…,p N ) of (1,2,…,N) , so that the sum of edit distance between the i th fragment in the first set and p i th fragment in the second set is minimal, for i = 1 to N. This problem is similar to the assignment problem, and can be formulated as below: In the optimal solution, if the m th subgraph fragment in the first set is matched to the n th fragment in the second set, X mn will be set to 1, otherwise 0. Hopfield network has been developed to solve this problem efficiently in polynomial time [2] .
D. Pattern Recognition Algorithm
With all the important techniques discussed, our pattern recognition algorithm is presented in this section. The algorithm does pattern extraction in both data flow graphs and control flow graphs, and the two-level recognition process is performed in an interweaving way.
Our CDFG pattern recognition process discovers patterns in a breadth-first order. All the subgraphs of size k in a given CDFG are enumerated, among which the non-similar subgraphs will be directly removed by our proposed two-level feature-based filter. Then the remaining subgraph pairs will perform GED calculation to measure their similarity accurately. The pseudo code of the algorithm is shown in Algorithm 1.
Our algorithm iteratively finds patterns of size k starting from k = 1. At step k + 1, all the size k CDFG pattern instances are extended by one supernode using the subgraph enumeration techniques discussed in Section III(A). If a subgraph s k is not a pattern instance of a certain pattern P at step k, it is impossible to be a subgraph of another pattern instance larger than k, which means we don't need to further extend it.
Lines 15 to 25 show the pruning process in Algorithm 1. After a new subgraph s k+1 is generated, it will be compared to the existing patterns by calculating the CFG level edit distance between itself and existing patterns. First, the CFV of a subgraph is calculated and used as a signature to find the patterns which have similar control flow structures. After getting the list of possible pattern candidates, the generalized edit distances are calculated by the techniques discussed in Section III(C). If s k+1 matches a pattern P, it will be added to the pattern instance list of P, otherwise a new pattern will be generated based on s k+1 . When all size k + 1 subgraphs are processed, each newly generated pattern will be examined to see whether it satisfies the significance limit, which is measured by the pattern size times the number of its instances. If not, the pattern itself and its instances will be removed together. Create a new pattern based on s k+1 , add to P; 24:
Add s k+1 to S k+1 ; 25: for all new pattern P i ∈P do 26:
if |INST(P i )|*|P i | < l sig then 27:
Remove P i from P; 28:
Remove INST(P i ) from S k+1 ;
IV. PATTERN SELECTION
Given a set of patterns discovered by the pattern recognition process, pattern selection algorithms attempt to find an appropriate set of patterns with the largest gain.
is the number of instances of P and area overhead is the area overhead brought by newly introduced multiplexors when resources are shared among similar patterns. The second term measures the "flatness" of P -"flat" means the critical path of P is small compared to the number of nodes in P. The variable α is a user-given parameter to trade off between resource reduction and latency overhead. Now, with a given set of pattern candidates, the problem is how to find a subset of non-conflicting patterns to maximize total gain. Here "non-conflicting" means non-overlapping, and no loop will be in control flow graph formed after selecting a certain set of patterns.
For example, given a CDFG graph G which consists of seven supernodes, indexed from 0 to 6, assume our pattern recognition algorithm finds three patterns P 0 , P 1 and P 2 in G. The corresponding pattern groups are denoted by { P 0 | 1 ; 2 }, { P 1 | 0 1 ; 1 3 ; 4 5 } and { P 2 | 3 4 ; 5 6 }. That is, pattern P 0 has two 1-node instances, and the node index for each instance is 1 and 2; P 1 has three 2-node instances, and the node indices for each instance is 0 and 1, 1 and 3, 4 and 5; P 2 has two 2-node instances with node indices 3,4 and 5,6. Assume no data flow loops exist among the three patterns. To exclude the overlapping case, each pattern group first will be partitioned to eliminate intrinsic instance overlapping. In the example above, the first and second instance of pattern P 1 have a common node 1, thus the corresponding pattern group needs to be partitioned. After that we get four new pattern groups: { P 0 | 1 ; 2 }, { P 1 | 0 1 ; 4 5 }, { P 2 | 1 3 ; 4 5 }, { P 3 | 3 4 ; 5 6 }. .
A CNF representation F(p 0 , … , p 3 ) is used in our approach to describe the non-overlapping constraint among pattern group candidates. If p 0 is set to 1, the corresponding pattern P 0 will be selected; therefore those pattern groups having instances overlapping with the instances of P 0 will be invalidated. For example, pattern group 0 has two one-node instances; these nodes also appear in pattern group 1 and 2, then its non-overlapping constraint can be represented by setting f 0 = (¬p 0 + ¬ p 1 2 • f 3 = 1, and our objective is to maximize total area reduction. With this formulation, our problem can be reduced to a Binate Covering problem, and the bounding technique in [16] can be used to compute the optimal solution. The pattern recognition framework can be applied to many practical problems, including ASIP custom instruction set selection, and resource sharing in high-level synthesis, etc. Here, we evaluate the performance of our approach in one of its applications -FPGA area reduction in high level synthesis.
V. EXPERIMENTAL RESULTS
SSDM
A. Experimental Setup
The pattern-based synthesis flow is implemented in our behavioral synthesis system [5] . The whole design flow is shown in Figure 6 . Our synthesis tool takes high-level languages and parses them into control data flow graphs. The GMT toolkit [3] is used for graph edit distance calculation. The synthesis engine will perform the pattern-based synthesis flow to reduce resource usage with certain design constraints. Xilinx Virtex-4 FPGA and ISE 9.1 tool [15] are used in our experiments.
Our test cases include a set of real-life programs: IDCT, SYNFLIT, BH, BLKSORT, HEAP, and LEXTREE. All test cases contain common control flows, and the last three cases are from the SPEC2006 benchmark.
B. Pattern Recognition Results
In this section we discuss the pattern recognition results. Figure 7 shows the two selected pattern instances in one of our test cases IDCT. There are two similar for loops in IDCT, and the corresponding pattern instances include basic blocks 1,2 and basic blocks 3,4 respectively. The effectiveness of our proposed pruning techniques using FV is shown in Table I and we can observe a substantial decrease in the number of average GED computations. In Table I , #Line is the size of each test case, #Pattern and #Inst is the number of patterns and total instances respectively, #Avg. Calc is the average number of GED calculations needed before a subgraph matches with a certain pattern, and #MAX is the maximal size of patterns in terms of DFG nodes. 
C. Resource Reduction Results
This section shows the pattern-based FPGA resource reduction results on the test cases mentioned before. Our work is compared to a traditional behavioral synthesis flow [6, 7] , and the DFG pattern-based synthesis result in [4] . Table II shows the QoR of our proposed pattern-based synthesis algorithm compared to the other two approaches. In Table II , the second, third and fifth columns are the synthesis results for the number of registers used without pattern-based technique [6, 7] , with a DFG pattern-based technique [4] , and with CDFG pattern-based technique, respectively. Similarly, columns 7 to 11 list the amount and comparison of logic elements usage.
Overall, our pattern-based synthesis flow can achieve a 24% resource reduction on average over the traditional behavioral synthesis flow. For most test cases, CDFG pattern extraction outperforms work in [4] , which can not efficiently deal with sharing at the basic block level. The performance improvement is especially substantial in test cases BLKSORT, where similar patterns are distributed in separate basic blocks instead of the same one. The average runtime overhead is less than one minute.
We also use a heuristic method to balance latency trade-off. If the latency increases beyond a user-given limit with the current pattern selection result, we will recursively "relax" the area reduction by partially recovering shared resources until the latency falls into the acceptable region. However, in our experiments, latency doesn't show drastic changes after our pattern algorithm is applied. There is about 9% latency overhead and 3.5% clock period increase on average.
VI.
CONCLUSIONS In this paper we present a generalized control-flow aware pattern recognition algorithm which can efficiently extract patterns from behavioral specifications. To our knowledge, this is the first published work that can identify approximate patterns in control data flow graphs. Furthermore, the pattern recognition framework is evaluated in resource reduction problem and shows a 24% improvement on average in our experiments. 
VII. ACKNOWLEDGEMENT
