Abstract: Productivity for digital circuit design is being outpaced by the rate at which silicon density is growing. Complex designs take a significant number of engineering hours to complete in both ASICs and FPGAs. Design reuse can potentially decrease cost and increase design productivity. The central thesis of this paper is that design productivity could be enhanced by assisting the designer in discovering archived designs that help complete a reference circuit. An overview of different methods for characterizing and comparing digital circuits is provided in order to suggest candidate circuits that engineers can reuse. A new design entry methodology is also proposed. Several of these methods are implemented, modified, and compared to show the feasibility of using this work to increase overall productivity for hardware design.
I. INTRODUCTION
FPGAs have become a target platform for many high performance computing applications. In spite of this, a significant limitation in using FPGAs is the design process. As hardware circuits become more complex, design time can increase significantly, outpacing productivity [1]. The design productivity gap [2] shows that design capabilities are unable to keep up with the doubling of silicon density every two years according to Moore's Law. One solution to increase productivity is to reuse existing digital circuits, or intellectual property (IP) cores. Nelson et al. [2] suggested that, depending on the fraction of the design that is being reused and the overhead of reusing the design, the use of existing hardware can lead to a significant increase in FPGA productivity.
Design reuse in the hardware community has not gained widespread acceptance. Reasons include obsolescence of different platforms, lack of standards, the overhead of designing a module to be reusable, incompatible databases [2], incomplete and inconsistent documentation, and performance [3]. Even when reuse is seen in practice, the user has to allocate time to search through sources, relevant IP cores, and documentation, which may or may not be organized so that designs are easily found. There are also challenges in the domain of design entry in FPGA tools. Current design environments for FPGAs lack interactive features that could assist and promote design reuse. Even though standards such as IP-XACT are available to help facilitate the reuse of IP cores across various sources, many designers are unwilling to conform to these standards because of the overhead and complexity associated with them. For example, the XML data format of IP-XACT is difficult to read and modify without the support of additional tools [4]. If a majority of these disadvantages are made transparent to the user during the design phase, reuse of existing hardware can become a more appealing solution for increasing overall productivity.
There has been considerable effort toward increasing the reusability of IP cores for FPGAs. The primary goal of OpenFPGA's CoreLib is to create a standard for hardware libraries in order to promote interoperability so that existing cores can be seamlessly integrated into FPGA tools. Vendors provide many pre-designed hardware modules that are optimized for their specific products. For example, Altera provides a library of parameterized modules (LPM) whose purpose is to provide efficient designs of specific functions that are independent of the technology [5].

This paper explores a new methodology for hardware design entry. The idea is to have a design environment where reusing existing hardware is integrated into the design process, requiring little to no effort from the user. As the user designs a circuit, the proposed tool continuously monitors the circuit design during the entry process, performing a large number of comparisons between the emerging design and an archive of existing designs. Candidate designs are suggested and ranked based on similarity. Comparing two circuits does not necessarily imply that the search looks for an exact match; instead, a similarity metric decides how similar two circuits are. Because the tool automatically presents similar circuit alternatives, the designer can remain in the scope of the project instead of relying on external documentation, and as a result, suggested circuits are more likely to be reused. In this view, the most effective way to improve the design process is to turn it into a discovery process. Once similar circuits are discovered, detailed documentation can be viewed and analyzed to see whether the candidate circuits fit the designer's requirements. If a circuit is of interest, the design is readily available to the designer to reuse.
The next section explores different methods regarding a back-end that determines the reusability of circuits in order to provide a real-time environment in which reusing existing hardware is essentially integrated into the design process.
II. BACKGROUND
Determining suitable existing hardware for reuse requires a system that is able to compare a design against a multitude of patterns in order to search for a match. A common way to represent a circuit is to take its netlist and represent it as a graph, where the vertices represent the logic components and the edges represent the wires connecting the components together. Previous methods for circuit matching focused on whether there exists an exact match between two circuits, used primarily for layout-versus-schematic (LVS) checking [7], [8]. Whitham et al. [9] searched for circuits in a repository that are similar to a reference for educational purposes. However, their definition of similar refers to circuits in the repository that are either a sub-circuit, super-circuit, or an exact match to the reference. Another method of circuit matching compares the function of two designs, as in combinational equivalence checking [10]. This paper focuses on using structural matching for evaluating the similarity between circuits. Several different algorithms for extracting structural data from the graph representations for comparison were explored.
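As a minimal sketch of the graph view described above, the following Python fragment stores a netlist as a directed graph whose vertices carry a component type and whose edges are wires. The class name `Netlist`, its methods, and the toy `one_bit_counter` circuit are hypothetical names introduced here for illustration; they are not taken from the tool described in this paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Netlist:
    """Directed graph: vertices are logic components, edges are wires."""
    kind: dict = field(default_factory=dict)  # vertex name -> component type
    fanout: dict = field(default_factory=lambda: defaultdict(set))  # driver -> sinks

    def add_component(self, name, kind):
        self.kind[name] = kind

    def add_wire(self, src, dst):
        self.fanout[src].add(dst)

    def edges(self):
        """Return every wire as a (driver, sink) pair."""
        return [(s, d) for s in self.fanout for d in sorted(self.fanout[s])]


def one_bit_counter():
    # Assumed toy circuit: an XOR gate feeding a flip-flop, with feedback.
    n = Netlist()
    n.add_component("x0", "XOR")
    n.add_component("ff0", "DFF")
    n.add_wire("x0", "ff0")
    n.add_wire("ff0", "x0")
    return n
```

Keeping the component type on the vertex is what lets a structural matcher compare gates by type rather than by instance name.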
1) Subgraph Isomorphism: Subgraph isomorphism has been commonly used for comparing two different circuits [7], [8], [9]. Given two graphs G1 and G2, the subgraph isomorphism problem asks whether G1 contains a subgraph that is isomorphic to G2 [9]. In other words, it determines whether the smaller graph, G2, is contained within the larger graph, G1. Because of this, subgraph isomorphism is too restrictive for similarity search: only circuits that make up, or are made up of, the reference are returned.
2) Maximum Common Subgraph: Maximum common subgraph (MCS) is a type of subgraph isomorphism in which the largest subgraph that is common to two given input graphs is determined [11]. The MCS problem can be described as a maximum clique problem. In order to find the MCS, the product graph of the two input graphs is needed. The product graph G of the two input graphs shows the possible compatibility between the vertices and edges of the two graphs. With the product graph, the MCS problem becomes that of determining the largest clique of G. The clique problem asks whether G contains a complete subgraph of size k; the maximum clique is the complete subgraph with the largest such k. By finding the largest clique in G, the mapping of the common edges and nodes between the input graphs can be determined, resulting in the maximum common subgraph.
3) Decomposition Subgraph Isomorphism: The previous two methods only compare one pattern circuit at a time. In decomposition subgraph isomorphism (DSI), the pattern graphs are recursively decomposed in an offline step into smaller graphs until only single vertices remain [12]. Patterns with similar subgraphs are represented once by reusing the same subgraph in the decomposition. The entire decomposition hierarchy of the pattern database is stored in a decomposition tree structure. After every pattern graph has been decomposed, the decomposition tree can be used to determine whether a subgraph exists in the input graph. Subgraph detection is done by trying to combine and build the circuit from the bottom up. Any pattern with a mapping to the input is a subgraph of the input graph.
III. MATCHER IMPLEMENTATION
This section explains the details of the three main graph comparison algorithms implemented. The reference circuit is designed using a front-end tool. The circuit netlist is imported into the back-end, where the similarity between the reference circuit and each of the pattern circuits is determined. The overall back-end was designed to be extensible to various front-end design environments. Results of the comparators are displayed in an interactive window so that the user can easily compare and explore the similar designs. Figure 2 shows the high-level system diagram.
A. System Overview
The overall system setup can be seen in Figure 3. Xilinx ISE is used as the front-end design environment. From RTL, a netlist is generated and passed into the back-end tool for comparison. After the matchers complete the search to find circuits similar to the input, the results are passed to the output to be displayed for the user. The matches returned are ranked based on how similar the pattern circuit and the input are, with the best matched circuit listed first. In order to provide a better sense of similarity, a graphical representation of the results was explored. The game engine Unity was used to develop an interactive interface. With a visual aid and an interactive interface, the data is condensed and expressed in a manner that lets the user identify and compare circuits quickly and easily.
B. Custom Subgraph Isomorphism
The implementation of the custom subgraph isomorphism (CSI) matcher draws on [8] and [13]. First, a candidate vector is determined between two circuits, G1 and G2. The candidate vector is a list of candidate pairs, that is, possible matches between a vertex in G1 and a vertex in G2. The candidate pairs are determined by the most uncommon logic component between the two circuits.
Using each candidate pair as a starting point, both G1 and G2 can be traversed. Traversal is done recursively so that if a mismatch is found, the matcher can backtrack to a valid state and check the next possible node for a match. At each level of traversal, the type of the logic component is compared. If the component types of the nodes in question in G1 and G2 are identical, then the match counter is incremented, the node is marked, and the next node is compared.
Each matching node is marked so that if a feedback loop exists, it does not pair a vertex with an existing match. The next nodes to be searched are determined by looking at all the connected outputs. If there is more than one possible match, the matcher peeks at the next level down and decides which path to take so that the input and pattern match. If the matcher reaches a dead end, it backtracks and attempts to find a valid matching pair to continue searching. The matcher continuously scans the nodes until the match counter reaches the number of nodes in the smaller graph. This indicates that all the nodes in the smaller graph were successfully mapped to the nodes on the other graph and that a subgraph exists.
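The traversal described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implemented matcher: each circuit is given as a `kind` dictionary (vertex to component type) and a `succ` dictionary (vertex to list of driven vertices), the smaller circuit is passed first, and all function names are hypothetical. The look-ahead ("peek") heuristic is replaced by plain exhaustive backtracking.

```python
from collections import Counter


def candidate_pairs(kind1, kind2):
    """Pair vertices of the rarest component type present in both circuits."""
    shared = set(kind1.values()) & set(kind2.values())
    if not shared:
        return []
    counts = Counter(kind1.values()) + Counter(kind2.values())
    rarest = min(sorted(shared), key=lambda t: counts[t])
    return [(u, v) for u in kind1 if kind1[u] == rarest
            for v in kind2 if kind2[v] == rarest]


def consistent(u, v, mapping, succ1, succ2):
    """Wires between u and already-marked nodes must also exist around v."""
    for w, mw in mapping.items():
        if u in succ1.get(w, ()) and v not in succ2.get(mw, ()):
            return False
        if w in succ1.get(u, ()) and mw not in succ2.get(v, ()):
            return False
    return True


def grow(mapping, used, kind1, succ1, kind2, succ2):
    """Extend the match along outputs; return the largest match counter seen."""
    best = len(mapping)
    for u, v in list(mapping.items()):
        for nu in succ1.get(u, ()):
            if nu in mapping:
                continue  # already marked, e.g. reached through a feedback loop
            for nv in succ2.get(v, ()):
                if nv in used or kind1[nu] != kind2[nv]:
                    continue
                if not consistent(nu, nv, mapping, succ1, succ2):
                    continue
                mapping[nu] = nv
                used.add(nv)
                best = max(best, grow(mapping, used, kind1, succ1, kind2, succ2))
                del mapping[nu]  # backtrack and try the next possible node
                used.discard(nv)
    return best


def csi_match(kind1, succ1, kind2, succ2):
    """Return the best match counter over all candidate pairs."""
    best = 0
    for u, v in candidate_pairs(kind1, kind2):
        best = max(best, grow({u: v}, {v}, kind1, succ1, kind2, succ2))
        if best == len(kind1):
            break  # every node of the smaller graph has been mapped
    return best
```

When `csi_match` returns the number of vertices in the smaller circuit, that circuit is contained in the larger one; smaller counts indicate a partial match. The search is exponential in the worst case, so this sketch is only suitable for small examples.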
1) Similarity: The similarity of the circuit can be determined by calculating the ratio of the number of nodes in the best possible match between the input and pattern and the size of the pattern graph. The equation to calculate the similarity is shown in Equation 1.
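Read directly from the description above, the CSI similarity is a single ratio. The symbol names below are assumptions introduced here: M_best denotes the set of nodes in the best match found between the input and the pattern, and V(G_pat) denotes the vertex set of the pattern graph.

```latex
S_{\mathrm{CSI}}(G_{\mathrm{in}}, G_{\mathrm{pat}})
  = \frac{\lvert M_{\mathrm{best}} \rvert}{\lvert V(G_{\mathrm{pat}}) \rvert}
\tag{1}
```

For instance, if two nodes of a three-node pattern can be matched, this ratio evaluates to 2/3, or roughly 67%.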
C. Maximum Common Subgraph
Maximum common subgraph (MCS) allows for a more inexact form of matching circuits by determining the largest circuit common to two graphs G1 and G2. The algorithm implemented is based on [14]. MCS focuses on edges that are compatible between G1 and G2: two edges are said to be compatible if they have the same source type and the same destination type. Each pair of compatible edges forms a vertex of the compatibility graph.

The modular product of G1 and G2 is used to determine the relationship between the nodes of the compatibility graph. Let cv1 and cv2 be two vertices of the compatibility graph, each consisting of an edge e1 from G1 and an edge e2 from G2. Then cv1 and cv2 are adjacent if e1 of cv1 is incident on the same vertex as e1 of cv2 and e2 of cv1 is incident on the same vertex as e2 of cv2, or if neither e1 of cv1 is incident to e1 of cv2 nor e2 of cv1 is incident to e2 of cv2.
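The adjacency rule above can be illustrated with a short Python sketch. It assumes each circuit is given as a `kind` dictionary (vertex to component type) and a list of directed edges `(source, destination)`; the function names are hypothetical. As a further assumption, the sketch is slightly stricter than the prose: it requires the two edge pairs to share endpoints in exactly the same way (for example, head to tail), which covers both the "incident on the same vertex" case and the "not incident" case.

```python
from itertools import combinations


def compatible_edge_pairs(kind1, edges1, kind2, edges2):
    """Pair edges whose source types match and whose destination types match."""
    return [(e1, e2) for e1 in edges1 for e2 in edges2
            if kind1[e1[0]] == kind2[e2[0]] and kind1[e1[1]] == kind2[e2[1]]]


def touch(a, b):
    """Describe which endpoints edges a and b share (all False: not incident)."""
    return (a[0] == b[0], a[0] == b[1], a[1] == b[0], a[1] == b[1])


def compatibility_graph(pairs):
    """Connect two compatibility vertices when their edges relate the same way."""
    adj = {p: set() for p in pairs}
    for (a1, a2), (b1, b2) in combinations(pairs, 2):
        if a1 == b1 or a2 == b2:
            continue  # one edge cannot be mapped to two different edges
        if touch(a1, b1) == touch(a2, b2):
            adj[(a1, a2)].add((b1, b2))
            adj[(b1, b2)].add((a1, a2))
    return adj
```

A clique in the resulting graph is a set of edge pairings that are mutually consistent, which is why the largest clique corresponds to the largest common sub-circuit.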
After the modular product of G1 and G2 has been determined, the largest cliques found are the largest subgraphs that are common to both G1 and G2. The clique detection algorithm implemented is a variation of the recursive backtracking Bron-Kerbosch (BK) algorithm described in [11].
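For reference, a textbook form of Bron-Kerbosch with pivoting is sketched below in Python. It is not necessarily the variation used in the implemented matcher. The input `adj` is assumed to be a dictionary mapping each vertex to the set of its neighbours, and the function name is hypothetical.

```python
def bron_kerbosch_max_clique(adj):
    """Return one largest clique of an undirected graph as a sorted list."""
    best = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it, x: excluded
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = sorted(r)  # r is maximal; keep it if it is the largest
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(adj), set())
    return best
```

The worst-case running time is exponential in the number of vertices, which is consistent with MCS becoming expensive as the compatibility graph grows.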
1) Similarity: The largest common circuit of the input and pattern circuit can be used to determine a similarity metric. The similarity is calculated by finding the ratio between the size of the largest common circuit and the size of the input circuit. The same ratio is calculated again but with the size of the pattern circuit. The larger ratio of the two is chosen as the similarity between the input and the pattern circuit. The equation is shown below in Equation 2.
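Written out from the description above, the MCS similarity takes the larger of two ratios. The symbols are assumptions introduced here: G_mcs is the largest common circuit, and the size of a circuit is taken to be its number of vertices.

```latex
S_{\mathrm{MCS}}(G_{\mathrm{in}}, G_{\mathrm{pat}})
  = \max\!\left(
      \frac{\lvert V(G_{\mathrm{mcs}}) \rvert}{\lvert V(G_{\mathrm{in}}) \rvert},\;
      \frac{\lvert V(G_{\mathrm{mcs}}) \rvert}{\lvert V(G_{\mathrm{pat}}) \rvert}
    \right)
\tag{2}
```

Taking the maximum means that a small circuit fully contained in a large one scores 1 regardless of which of the two is the input.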
D. Decomposition
The CSI and MCS algorithms both work on two graphs at a time. Therefore, to determine a similar reusable design from a database of patterns, the input circuit has to be compared individually with each pattern. The idea of decomposition subgraph isomorphism (DSI) [12] is that the entire database of patterns is decomposed, such that circuits with similar sub-circuits can be represented and searched through once, leading to a more compact representation. The decomposition of the pattern database is a one-time incremental procedure that is done offline. Moreover, additional patterns can be added without having to re-decompose the entire database. The first step of decomposition is the partitioning of the circuits.
The partitioner decomposes, or partitions, the pattern into two connected graphs. If the graphs were disconnected, every possible combination that can be made with the disconnected graphs would have to be mapped, increasing the search space. A random vertex is chosen as a starting point. The circuit is then traversed until the number of vertices traversed is equal to half of the total number of vertices in the circuit. Since the first partition is created by traversal, the sub-circuit is connected. To ensure that the second partition is connected, the second partition is traversed from a starting point, and all the nodes that are not reached are moved to the first partition.

When the patterns in the database are all decomposed, the comparison between an input circuit and the decomposition can take place. An example is shown in Figure 4. There are three states a decomposition node can be in: unsolved, dead, or alive. Each decomposition node is initially labeled unsolved. The search first focuses on the decompositions that consist of a single vertex. For each of these, the vertices with the same component type in the input graph are added to a list of possible matches. An unsolved decomposition node contains a possible match if its two children nodes are both marked alive. When no more unsolved decomposition nodes can be combined, all the decompositions that are marked alive are subgraphs of the input graph.
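The partitioning step described above can be sketched as follows. This is an illustration under assumptions, not the implemented partitioner: the circuit is treated as a connected undirected graph given as a dictionary from each vertex to its set of neighbours, breadth-first traversal is used, and the starting vertex is passed in explicitly rather than chosen at random so that the result is reproducible. The function names are hypothetical.

```python
from collections import deque


def bfs_order(adj, start, allowed):
    """Breadth-first visiting order from start, restricted to allowed vertices."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in sorted(adj[u]):
            if w in allowed and w not in seen:
                seen.add(w)
                queue.append(w)
    return order


def partition(adj, start):
    """Split a connected circuit into two connected halves."""
    nodes = set(adj)
    half = max(1, len(nodes) // 2)
    # A prefix of a traversal order is connected by construction.
    first = set(bfs_order(adj, start, nodes)[:half])
    rest = nodes - first
    if not rest:
        return first, rest
    # Keep only what is reachable inside the remainder as the second half.
    second = set(bfs_order(adj, min(rest), rest))
    first |= rest - second  # stranded vertices move to the first partition
    return first, second
```

Because the whole circuit is assumed to be connected, any vertex stranded in the remainder must be adjacent to the first partition, so moving it there keeps the first partition connected. The two halves can end up unbalanced, as in a star-shaped circuit.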
1) Similarity: A similarity metric can be derived from the result of the matching. From each full pattern graph in the decomposition tree, the distance from the current position to the first node that is alive can be used as the distance metric. Furthermore, the number of matching nodes and the number of edges missing are used to further refine the similarity metric. Equation 3 shows the similarity metric used.
IV. RESULTS AND ANALYSIS
This section discusses the results obtained from the implemented matching algorithms with respect to digital circuits. The accuracy and performance of each algorithm were analyzed. The core of the matcher was implemented in C++ on Ubuntu 12.04, running on a Dell Vostro with a 2.8 GHz Intel Core2 Duo processor and 2.9 GB of RAM.
A custom benchmark was constructed and used for initial testing of accuracy and functionality. The initial benchmark contains about 20 circuits ranging from 8 to 50 components. The circuits consist of logic gates, arithmetic operators, and counters. To test scalability, a larger dataset was used: the ISCAS models of the International Workshop for Logic Synthesis (IWLS) 2005 benchmarks [15]. Around 30 circuits ranging from about 20 nodes to several thousand nodes were used to test the scalability and accuracy of the different matchers.
A. Accuracy
The custom benchmark was used to test the accuracy of the matchers because it contains circuits that are known to have matches, so the results could be analyzed reliably. A 1-bit counter circuit was used as the input circuit. Table I displays the similarity metric returned by each of the three matchers between the 1-bit counter and each of the patterns.
1) CSI: Unlike traditional subgraph isomorphism algorithms, CSI can determine the similarity between two circuits. The similarity metric returned by the CSI algorithm is a reasonably accurate representation of the similarity between the input and pattern circuits. Since the AOI circuit has a completely different structure, its similarity measure should be fairly low. The Kogge-Stone adder and the counter have a similar feature in that they both contain adders. Therefore, the adders should have a higher similarity metric than the AOI circuit, as seen in Table I.
2) MCS: Compared with CSI, MCS provides more data on the circuits that did not return exact matches. For example, the two 4-bit adders returned a possible match of seven input circuits. However, it is not a complete match since the similarity between the input and pattern is only 66%. Counters have adders in them, or more specifically, XOR gates. Therefore, based on the number of potential XOR gates found, the 4-bit adders contain seven possible counter circuits.
3) DSI: One main limitation of DSI is that decomposition is primarily a subgraph isomorphism matcher. With CSI, the sizes of the two circuits are checked and the graphs are passed to the matcher according to which is larger, so super-circuits can also be determined. However, because DSI decomposes the patterns as an offline step, only exact subgraph matches can be found when using the decomposition tree during matching, and trying to find exact super-circuits will not work as intended. Instead, the similarity metric is used to try to detect super-circuits of the input. From Table I, exact matches for the subgraph were found along with the number of possible instances. There is no way to definitively determine whether a super-circuit exists given the results from DSI; however, the similarity can be used to suggest that a circuit is a possible super-circuit. For example, the 2-bit counter has the highest similarity score among the circuits and is indeed a super-circuit of the input. At the same time, false positives may occur, such as the AOI gate, whose structure is different from that of a counter.
B. Performance
The performance data was averaged over five runs. The custom benchmark was used first. For small circuits of fewer than about 50 nodes, MCS performed exceptionally well; however, as the number of components increased beyond 50, its execution time increased significantly compared to the other matchers, and execution of MCS was terminated early. The other two matchers performed well as the input graph size increased. Results can be seen in Figure 5.
To see how far the system can be pushed, models from the ISCAS benchmark were used. The input circuit was a slightly modified version of s5378, containing over a thousand logic gates. The entire database contains forty circuits ranging from fewer than 100 gates to over 1000 gates. The tool was able to detect the original s5378 as a matching design.
In order to test how well the program scales as the number of patterns in the database grows, CSI and DSI were tested with varying database sizes of 5, 10, 20, and 40 pattern circuits. MCS was not tested with the ISCAS benchmark due to the significant execution time with increased circuit size.
From the results shown in Figure 6, decomposition scales extremely well compared to the CSI matcher. Again, the primary reason is that similar sub-circuits are represented once wherever possible, and the tree-like structure of the decomposition allows for an efficient search for similarities. However, DSI uses large amounts of memory because it builds the entire decomposition tree ahead of time. Furthermore, as DSI builds the circuits from the bottom up, it saves each state and therefore starts hitting its limit as the number of pattern circuits increases.
V. CONCLUSION
Circuit designers can reduce design time significantly by not having to redesign circuits that already exist. The IP discovery system uses different matching techniques to find similarities between digital circuits with an emphasis on design reuse. By reusing existing hardware, the designers can focus more on the application rather than the verification of existing hardware. The main focus of this paper is to introduce a new methodology of design entry with the potential to increase FPGA productivity by discovering reusable designs. Preliminary results are given in terms of using structural matching as a possible form of comparison.
Of the different circuit matching techniques that were explored, DSI is the best way to determine similar circuits in terms of performance and scalability. It provides an effective and efficient way of searching through a large database of circuits. MCS does not scale well for large circuits, but provides a detailed analysis of the similarity between two circuits. If some sort of hierarchy is inferred, the overall circuit can be simplified, which can lead to more efficient matches in general. The CSI algorithm scales linearly with the number of circuits in the database; however, the performance of each match does not necessarily depend on the size of the circuit. Because similar sub-circuits of each pattern are represented once, DSI scales better than the other methods, but requires significant amounts of memory. Unfortunately, with DSI there is no easy way to determine whether the input circuit is a sub-circuit of those in the database; the similarity measure DSI returns provides some insight as to which pattern could be a possible super-circuit, but does not indicate whether one definitely exists.
A. Process Enhancements
Structural matching of circuits has various limitations, such as the need for an explicit pattern circuit for each structurally different but functionally equivalent circuit. Matching a circuit functionally would allow one specific function to be defined rather than requiring every possible structure to be described as a pattern circuit. The end goal is not only to determine a similar circuit structurally or functionally, but to try to predict the end behavior of the circuit based on the current state of the design. In other words, we want to be able to predict the end circuit or component the user is designing.
Hierarchy can be applied as well. By doing so, the overall netlist of the circuit is smaller resulting in a quicker overall search. Circuit hierarchy can be inferred by simplifying the circuit by compressing patterns with exact matches into a black box. However, this would mean that the library components and naming would have to be identical throughout the hierarchy.
The matching process can start at the RTL level so that the structure of the circuit does not have to be extracted from the netlist. In addition, the back-end can be parallelized due to the nature of the algorithms. Different similarity metrics can be explored, and support for super-graph matching for the DSI algorithm can be investigated.
B. Extended Applications
The overall system can be extended to a wider variety of applications. Finding similar circuits and then providing multiple implementations of the circuit could provide optimization suggestions. Similar to code-complete for software, the tool could also automatically complete a given hardware circuit. This could be further refined as an autocorrect function. In addition, the tool could be applied to reverse engineering. The system could also be used to build reliable hardware libraries by finding circuits similar in structure and to help broaden the community of hardware developers as designers contribute and learn as a whole.
