Introduction

Motivation
Systems are increasingly being constructed from off-the-shelf components acquired through a globally distributed supply chain. There is a rising concern over the trustworthiness of these components, especially when used in mission-critical applications. The possibility that, a small but malicious modification could compromise the entire integrity, security and reliability of an integrated circuit (IC), is becoming a pressing concern. For example, the Integrity and Reliability of Integrated Circuits (IRIS) program from DARPA [3] seeks to develop techniques for deriving functions of digital, analog and mixed-signal ICs from limited operational specifications, as a way to ensure the overall integrity of a system possibly constructed using multiple third-party components. Such malicious modifications, commonly known as hardware trojans (HTs), can provide footholds for software based attacks, where the attacks are orchestrated by colluding software [18] . They can also provide side-channels for leaking sensitive information [23] , or simply subvert the operation of the system under special conditions (e.g. special instruction sequences that trigger the trojan) [19] .
In general, modifications can be introduced at different stages of the design and fabrication flow, and efforts specifically targeted at each stage have been made to mitigate these threats [36] . In this report, we consider the context where a logic alteration was made either at the Register Transfer Level (RTL) or at the gate level. At the RTL level especially, one can inject malicious behavior that can undermine the correct operation of the entire system with only a few lines of code written in a Hardware Description Language (HDL). Such design-time modifications are also difficiult to detect due to possible obfuscation [34] and small physical footprints [36] . Since they may be activated only under very specific conditions, they are unlikely to be triggered and detected in simulation or functional tests. Even if suspicion is raised during the operation or inspection of a system, currently there is no way to zoom into a particular portion (say the ALU unit) of a system by simply looking at the overall gate-level netilist. Most high-level structures such as word declaration, modularization, function separation are lost once the design has undergone logic synthesis, thus making it extremely difficult to perform targeted function search in the flattened netlist. In this report, we present a systematic framework for automatically deriving high-level structures from the gate-level netlist of a digitial circuit. First, we formally define the problem of reverse engineering a bit-level description into high-level description (henceforth referred to as REHLD) of a digital circuit. We then present several techniques for solving the REHLD problem.
Reverse engineering, if successful, can bring significant benefits. For example, identifying a word-level datapath allows the user to navigate the netlist at a higher level. Also, such word-level datapaths allow automatic graph-based inference techniques to be applied without complication from low-level details (e.g. the function that a particular gate implements). On the other hand, identifying the function (even partially) of a block of gates means one can now perform more comprehensive analyses to it in an isolated fashion. HT detection techniques, such as the ones presented in [18] , are largely based on simulating the netlist. Therefore, such dynamic analysis techniques can be synergetically applied with the static techniques presented in this report, and in a more scalable manner. In addition to complementing existing HT and counterfeit detection techniques, reverse engineering can be applied directly to find IP violation, which is another rising concern in the semiconductor industry [14] .
Algorithmic reverse engineering of an unstructured netlist, however, can be particularly challenging due to the following reasons.
• Lack of structure. A flattened netlist does not contain any of the hierarchy or module information that its RTL counterpart would typically have. In addition, optimization techniques such as multi-level logic minimization, technology mapping and retiming further destroy high-level structures in the netlist, and can result in overlapping functional blocks and gate sharing.
• Large functional space. Designs can vary from one to another and the space of possible functions that a circuit may implement quickly becomes intractable. This problem is further exacerbated by the assumption that only scanty documentation is given, which means well-defined and well-specified reference models are not available, thereby making it especially difficult to reason about the function of a netlist.
• Large implementation space. In case a high-level description of a function is given, e.g., the netlist contains an 8-bit multiplier, the space of all possible implementations for this particular function can be again enormous. The implication is that structural matching techniques quickly become impractical as the size of the target structure grows.
In this report, we present a systematic framework for automatically deriving high-level structures from the gate-level netlist of a digitial circuit, and techniques that specifically address each of these challenges. To cope with the large functional space, we crafted a library that contains more than a thousand commonly used components. Leveraging formal verification techniques such as symbolic evaluation, model checking and equivalence checking, we further address the problem of a large implementation space per function, and recover structure from an unstructured netlist.
Contributions
To summarize, we make the following contributions:
• We present a formal definition and a systematic framework for the netlist reverse engineering problem (REHLD), based on the notion of matching against abstract components.
• We describe a two-stage algorithm for deriving word-level datapaths in a bit-level netlist, which first finds candidate words and then propagates them using symbolic evaluation.
• We present an approach for matching an unknown sub-netlist against an abstract component library based on mining behavioral patterns from simulation or execution traces, followed by model checking.
• We present a second approach for matching an unknown sub-netlist against a known circuit using (conditional) equivalence checking -formulating the problem as solving a Quantified Boolean Formula (QBF) (as opposed to SAT in traditional equivalence checking). This formulation addresses the challenge of gate sharing in an optimized flattened netlist.
• We apply our approach to a collection of open-source designs. In particular, we demonstrate the effectiveness of our approach on netlists that contain up to 400,000 IBM 12SOI cells.
The rest of the report is organized as follows. We first describe the assumptions we make in this work in Section 1.3. Aterwards, we survey related work in Section 1.4, with a focus on algorithmic reverse engineering of netlists. Afterwards, we review background materials and formally define the REHLD problem. We then present techniques that try to address this problem from two angles -finding word-level datapaths in Section 2.3 and identifying functional modules in Section 2.4. Experimental results on applying these techniques to a number of circuit benchmarks are reported in Section 2.5. Finally, we conclude in Chapter 3.
The work described in this report is based on [22] and [21] .
Assumptions
In this report, we work with the following assumptions for reverse engineeirng a gate-level netlist.
• RTL-level description of the original design is not available.
• Only limited operational specifications (e.g., a datasheet describing the high-level functionallities of a chip) are given.
• Micro-architectural and design-specific information is also not available.
• Information about how individual wires should be grouped together is only known at the primary input and output.
• The cell library for which the netlist is synthesized from is given.
Related Work
Digital circuit designers usually proceed from a high-level description to a gate-level netlist, and then to a physical layout and mask; it is rare to proceed in the opposite direction. However, as noted in Sec. 1.1, the study of reverse engineering of digital circuits has been gaining importance in recent years. We review some of the closely related work in this section. For a general survey on the taxonomy and detection techniques of hardware trojans, we point the readers to [36] . Hansen et al. [17] present a study of reverse engineering the well-known ISCAS-85 combinational circuits. They present several strategies, mostly manual, to reverse engineer circuit functionality from a gate-level schematic. Some of these include looking for common library components, repeated structures, computing truth tables of small blocks, and identifying bus structures and control signals. In this report, we provide a formal definition of the reverse engineering problem and present several automatic techniques for solving it. Particularly, our techniques reason with components that are at a much higher level of abstraction than those suggested by Hansen et al. Moreover, we demonstrate that they are effective in dealing with netlists that are several orders of magnitude larger than the ones they considered.
More recently, Subramanyan et al. [35] propose several techniques to identify high-level components such as register files, counters, adders and subtracters. Our work complements well with their solution. In fact, words generated by their bitslice aggregation algorithm are used as candidate words in our word propagation technique to infer more words. Our framework also provides additional features, such as the capability of navigating the netlist at the word level and that of handling gate sharing in module identification.
Our work does not address reverse engineering of arbitrary finite-state machine functionality. While any sequential circuit can be trivially viewed as a monolithic FSM, the challenge is to be able to decompose that FSM into a set of smaller FSMs, each of which performs a distinguishable function, thus making the resulting high-level description easier for a human to understand. The recent work by Shi et al. [33] is a step in this direction.
A key component of our work is to find the input-output signal correspondence between an abstract component and a block in the unknown circuit. There is not very much prior work in this area. Mohnke and Malik [26] present a BDD-based approach for comparing two combinational circuits whose input correspondence is not known apriori. The authors also extended their idea to finding latch correspondence for sequential circuits, by considering the combinational circuit computing the next-state function [25] ; this does not address our problem, though, as sequential equivalence is required between the two circuits. Our approach, based on behavioral pattern matching followed by property checking, is a novel attempt in this direction.
Torrance and James [38] describe the practice of reverse engineering semiconductor-based products. Their approach includes product tear-downs (stripping packaging and disassembling the unit), "system-level analysis" (identifying components on a board and performing functional analysis through probing), process analysis, and circuit extraction (deriving a schematic from a stripped IC). Our work is complementary to this effort. Once a gate-level schematic is derived, our techniques can then be used to identify high-level components within the schematic.
Our technique addresses the REHLD problem for a system integrator who has not designed the circuit being reverse-engineered, but instead needs to verify its functionality prior to integration. We do not address the problem of untrusted manufacturing and IC piracy, where the designer is trusted, which can be tackled by techniques such as EPIC [31] . Our technique is complementary to other recent work on malicious trojan circuit detection (e.g., [18, 34] ). We do not seek to find trojans, instead focusing on detecting if a sub-circuit exhibits correct behavior which is captured by a set of logical specifications; if our approach reports that a sub-circuit matches an abstract component, it is guaranteed to do so due to the use of formal verification. In addition, if the sub-circuit violates some security-related specification, our approach will report that as well.
Reverse Engineering Gate-Level Netlists 2.1 Solution Overview angles. First, we aim to uncover word-level dataflows from the bit-level netlist. Second, we try to associate individual parts of the netlist to components in a predefined library with known functionalities. The composition of these two is eventually produced as a high-level description of the netlist. In this report, we only partially address the problem of module identification -dividing the netlist into candidate modules where each of them will be mapped to a component in the library. We plan to pursue this direction further in future work.
Word-Level Datapath Extraction
A word is simply a bounded array of bits. A word-level dataflow is then a directed graph summarizing how one word is propagated to another word. Such word-level information is common at the RTL level. For example, below is a snippet of Verilog codes showing some simple word-level operations. However, in a post-synthesis netlist, such information is lost and this is the first problem we will tackle in this report. Given a bit-level netlist, the first stage identifies candidate words. We employ two techniques for solving this problem, one based on bitslice aggregation [35] and the other based on a notion called shapehashing. The first technique uses functional matching while the second one uses structural information to group "equivalent" wires into words. We discuss these techniques in detail in Section 2.3.1. Starting with the candidate words and other known words (such as ones at the primary input and output), the second stage infers more words by iteratively propagating them across gates in the netlist, by leveraging ideas in symbolic evaluation. We describe this in detail in Section 2.3.2. These words can also be used as boundaries to form a netlist slice, which naturally provides a divide-and-conquer strategy to reason about separate parts of the overall netlist.
Function Identification
The second collection of techniques we present in this report is known as function identification. Essentially, given a slice of the netlist, the task is to identify the functionality that this slice implements. We reduce this problem to a verification problem -checking whether the netlist slice is functionally equivalent to a known component. To deal with the large space of possible functions, we created a database consisting of more than a thousand circuit components inspired by common design patterns, ranging from simple arithmetic circuits such as adders to communication protocols such as I 2 C. This component library by no means captures all possible functionality a netlist may contain. However, it serves as a testbed for us to experiment with and evaluate the proposed approach. We give more detail about this library in Section 2.4.1.
The key steps of the overall approach are illustrated in Figure 2 .3. Given this component library, we pursue two different ways to functionally identify if a netlist slice matches a component in this library.
The first is based on property checking -model checking if the netlist slice satisfies a set of logical properties associated with the component. The added challenge here, compared with similar verification task, is that the correspondence of inputs and outputs between the two circuits are not known. Our solution is to use a pattern mining technique to heuristically determine this correspondence. The general function of the unknown netlist slice is then determined by finding the closest match in the component library, by model checking the unknown netlist slice against each logical specification.
The second approach is based on equivalence checking. It is well known that the equivalence checking problem (for combinational circuits) can be formulated as a SAT problem [16] . However, in the context of reverse engineering, gate-sharing (i.e. overlapping functional blocks) is quite common. Specifically, this results in additional side inputs to the (smallest) netlist slice that implements a particular function.
To address this problem, we extend equivalence checking to handle side wires, by formulating the problem as a Quantified Boolean Formula (QBF). We describe this formulation in more detail in Section 2.4.3. 
Example -CMP Router
To illustrate the overall approach, consider the following example.
Example 1. Consider the high-level description of a chip multiprocessor (CMP) router as shown in Figure 2 .13. It is a composition of four high-level modules. The input controller comprises a set of FIFOs buffering incoming flits and interacting with the arbiter. When the arbiter grants access to a particular output port, a signal is sent to the input controller to release the flits from the buffers, and at the same time, an allocation signal is sent to the encoder which in turn configures the crossbar to route the flits to the appropriate output ports. Our goal is to reverse engineer such high-level information including the flow of flits and function of FIFOs, as captured in Figure 2 .13, from its gate-level netlist.
Problem Definition
We begin with some basic definitions and terminologies in Section 2.2.1, followed by the formal problem definition in Section 2.2.2.
Preliminaries Circuits and Netlists
In this report, we consider the following two abstractions of modeling digital designs. • O is a finite set of output values;
• Q is a finite set of states;
• Q 0 ⊆ Q is a finite set of initial states;
• δ : Q × I → Q is the transition function, and
A gate-level netlist N is a set of registers R (state-holding elements) and combinational gates G interconnected by wires W . Each gate g ∈ G is associated with a Boolean function f g with input pins I g and output pins O g . Such a netlist N with I input wires and O output wires is naturally associated with a circuit C N as follows. The input space I = 2 I , the output space O = 2 O , and the state space Q = 2 R . The transition and output functions are defined correspondingly by the logic of the gates.
A collection of registers and gates naturally induces a netlist slice (denoted as n, i.e. a portion of the original netlist. Let C n be the circuit associated with n. In this report, we consider slices that do not contain disjoint slices. For simplicity, we henceforth use circuit(s) for sequential circuit(s) and netlist(s) for gate-level netlist(s).
An input-output trace (or simply, a trace) τ of a circuit is a sequence of input-output val-
2 starting from a state q 0 such that o j = θ(q j ) and q j+1 = δ(q j , i j ) for all j ≥ 0. A finite trace τ k of length k is then a sequence of values
Word-Level Datapath
Words are desirable in datapath representations, especially for ease of human understanding. In a RTL description of a circuit, many signals are moved together and operated on at the word-level. However, identifying such word-level structures in a netlist poses significant challenges, since all the signals are given at the bit-level. We use the vector notation w = [w 0 , . . . , w k−1 ] to denote an ordered set of (bit-level) wires with cardinality k. For convenience, we use w η to denote the word formed by taking the individual bits of w according to and as ordered by an index array η. For example, if
It is also convenient to view word-level datapaths as a directed graph G, where the vertices represent words, and an edge ( u, v) represents the word u propagating to v (across a single layer of gates).
Formal Specification
A formal specification of a sequential circuit N is a set S of input-output traces of that circuit. Intuitively, every trace in S is an allowed behavior of N and every trace outside S is disallowed.
There are broadly two ways to write a formal specification for a digital circuit. The automata-theoretic approach is to describe S as a finite-state machine over infinite input sequences [37] . The other approach is to write a logical formula (or set of formulas) characterizing the input-output behavior of the circuit. The latter approach has gained favor in the EDA community over the years, especially in the form of assertion languages that allow one to specify temporal properties of a system, and which are usually slight extensions of linear temporal logic (LTL) [29] . We follow this logic-based approach in this report. Abusing notations, we also use S to denote the LTL formula characterizing the input-output behavior of a circuit.
A brief overview of LTL is provided below. A LTL formula is built from atomic propositions AP , Boolean connectives (i.e. negations, conjunctions and disjunctions), and temporal operators X (next) and U (until ). Given an atomic proposition p ∈ AP , a formula in linear temporal logic (LTL) can be constructed as follows.
Other temporal operators F (eventually) and G (globally) can be derived using the temporal operators X and U, and Boolean connectives: F ψ = trueU ψ and G ψ = ¬F ¬ψ.
For example, we can express that "every request must be eventually followed by a grant" in LTL as G (request ⇒ F grant), where the operator G specifies that globally at every point in time a certain property holds, and F specifies that a property holds either currently or at some point in the future.
Given a netlist N and a LTL specification ψ, we can verify if C N |= ψ. This is commonly known as LTL model checking. In this report, we assume the readers are familiar with the notion of property checking and have knowledge of LTL model checking. More details about model checking and LTL model checking can be found in [11] and [32] .
Abstract Components
An abstract component α is a triple (I, O, S), where I and O are sets of input and output signals, respectively, and S is a formal specification defining allowed input-output behavior of the component. An instance of an abstract component α is any circuit or netlist that satisfies the specification S of α.
We illustrate the notion of an abstract component using an example.
Example 2. Consider an arbiter servicing two (input) request lines r 0 and r 1 with two grant lines g 0 , g 1 as outputs. The abstract arbiter component would comprise the following LTL properties:
The first property states that both requests cannot be granted at the same cycle. The last two properties state that a request must eventually be granted, provided there are infinitely many cycles where no competing requests are present -the latter assumption is the property G F ¬(r 0 ∧ r 1 ).
High-Level Description
Informally, a high-level description of a netlist is a composition of abstract components. This includes a mapping from netlist slices to abstract components, as well as the datapaths that connect these components.
A library of abstract components (component library, for short) L is a set {α 1 , α 2 , . . . , α n } of abstract components. We will assume that each abstract component is also accompanied by at least one concrete instance; this is reasonable, as the abstract component library is typically constructed from observations of commonly occurring components in hardware designs.
Example 3. Examples of abstract components include common hardware design patterns and modules such as an arbiter, FIFO buffer, adder, multiplier, content-addressable memory (CAM), and crossbar. Abstract descriptions of finite-state machines are relevant for circuits that implement protocols; e.g., transmitter or receiver modules implementing the Ethernet or I 2 C protocols.
A high-level description (HLD) of a netlist N with input signals I and output signals O is a tuple (I, O, Γ) where Γ is a set of instances of abstract components drawn from L plus some "glue" logic such that the synchronous composition of these instances and the "glue" logic is (sequentially) equivalent 3 to the original netlist N .
Problem Definition
We are now ready to formally define the problem of reverse engineering a high-level description (REHLD) of a circuit.
Definition 2.1. Given
• an unknown netlist N containing cells R and G interconnected by wires W ,
• a library L = {α 1 , α 2 , . . . , α n } of abstract components, the REHLD problem for N is to derive a high-level description N = (I, O, Γ) where N is equivalent to N and Γ is a composition of instances of abstract components from L and a collection of registers R ⊆ R, gates G ⊆ G and wires W ⊆ W that make up the "glue" logic.
Solving the REHLD problem involves multiple steps. We rephrase each of these steps using the notation introduced in this section. There are four main steps:
• Word-level datapaths extraction: Extract word-level datapaths G from the bit-level netlist N .
• Library definition: Constructing an abstract component library L;
• Netlist slice identification: Extract from the given unknown netlist N a set of netlist slices n 1 , n 2 , . . . , n k , where each n i can be used as a candidate for matching against the component library L;
• Matching against component library: Given a candidate netlist slice n and an abstract component library L = {α 1 , α 2 , . . . , α n }, determine whether there exists an abstract component α i such that:
(a) Input-output correspondence: Each input (respectively, output) signal of α i that appears in the formal specification of α i is mapped 1-1 to some input (respectively, output) signal of n. (b) Verification: n is an instance of α i ; i.e., C n satisfies the specification S i associated with α i .
These steps can be categorized into datapath extraction ( Step 1) and function identification (Step 2, 3 and 4). In the rest of this report, we first describe how word-level datapaths can be extracted from a bit-level netlist. Afterwards, we present two novel approaches for finding the mapping from netlist slices to abstract components.
Finding Word-Level Datapath
In this section, we describe a systematic approach for identifying word-level datapaths in a (bit-level) netlist. The approach involves a combination of two sub-procedures -word identification and word propagation. In word identification, the goal is to find candidate groupings of wires that may form words. Starting with the candidate words and other known words (such as ones at the primary inputs and outputs), the second procedure tries to infer more words by iteratively propagating them across gates in the netlist.
Word Identification
We employ two techniques for finding candidate words in a netlist. The first is based on bitslice aggregation [35] , and the second is based on a notion called shapehashing. On a high level, the first technique uses functional matching while the second uses structural information to group "equivalent" wires into words.
Bitslice Aggregation
This technique is not our contribution and is described in detail in [35] . We briefly review it in this section.
The function of a wire w in the netlist can be characterized by a feasible cut of w. This is defined as a set of wires in the transitive fan-in cone of w such that an arbitrary assignment of truth values to each wire in the set completely determines the value of w [10] . A cut is said to be k-feasible if it has no more than k inputs. As in [35] , we enumerate the set of 6-feasible cuts for each node. Each cut then forms a bitslice rooted at w, which is a Booolean function with a single output and no more than 6 inputs. 4 Once all the bitslices are identified, they can be grouped into equivalence classes using permutation-independent Boolean matching. For example, a bitslice matching the function y = a ∧ b ∨ c and a bitslice matching the function y = b ∧ c ∨ a are grouped into the same class. Now that we have found all the duplicated bitslices across the netlist that compute a particular function, we can look for aggregates of them that are connected in specific patterns. Following the approach described in [35] , we group all matching bitslices that (1) have a common select signal; (2) the output of one bitslice feeds to the input of another (e.g. carry chain in a ripple carry adder). Since aggregated bitslices are essentially circuits that operate on sets of nodes simultaneously. We group the inputs or outputs of aggregated bitslices together to form candidate words. Note that in the case of a carry chain, the words are ordered in the carry direction.
Shape Hashing
The idea of shape hashing is to assign each wire in the netlist a shape, and then create a hash function for all the shapes such that we can easily identify equivalent wires if they have the same shape. The shape of a wire w is defined as the directed graph formed by the set of gates backward reachable 5 from w. A k-bounded shape is simply a shape where all the gates in the graph are backward reachable from the root w within k steps, and at least one of them is reachable at exactly k steps. When the directed graph is cyclic, we unroll the loops by duplicating gates along the loops until the gates are not backward reachable from w in k steps. In our experiments, we used all values of k ∈ {2, 3, 4}
6 . To compute a hash key from each shape efficiently, we perform a depth-first-search traversal of the DAG (i.e. shape) backward starting from w to produce a serialization of the DAG using gate and wire types. Multiple children gates in the traversal are tie-broken lexicographically. However, we do not check for graph isomorphism, especially one that is induced by having gates with commutative ports, for efficiency reasons.
We further refine the equivalence classes by taking into account the relative location of the wires. We define the distance between two wires as the smallest number of gates between them in the netlist 7 . With this distance measure, we form equivalence subclasses for wires that have the same shape. In each subclass, the wires are located to one another by at most d distance (number of gates). The collection of wires in each subclass then forms a candidate word. Individual wires of the same word are ordered arbitrarily.
Word Propagation
In this section, we describe the second important piece of the overall word-level datapath extraction process -an algorithm for propagating words. Intuitively, we want to see if arbitrary values of a word can propagate across a set of gates to reach a new set of wires. To do this efficiently, we use symbolic evaluation [9] , which allows the evaluation of a set of values simultaneously in a single run.
Similar to Roth's D-calculus [30] , we redefine the functions of the logic gates in the netlist to operate over the expanded domain {0, 1, D, D, X}, where D represents a symbolic value in {0, 1}, D is the negation of D, and X represents an unknown/undetermined value. Our solution for word propagation uses a "guess-and-check" approach. Starting from some known or candidate word (source word), we first try to find a set of wires (target word) whose cardinality is no greater than the cardinality of the source word, that are located one level of gates away from this word. For simplicity of discussion, we search forward for such a 5 In general, one can use both forward and backward reachable gates to assess structural similarity. In our case, the set of backward reachable gates also describes a Boolean function with a single output at w.
6 As k increases further, even wires originally declared as parts of the same word become structurally dissimilar in an optimized netlist.
7 A gate can be traversed to either from an input or from an output of the gate. set of wires, using a procedure called FindForwardWordPairs. This set of wires constitutes a target word, i.e. the word to propagate to. Next, we construct a netlist slice for symbolic evaluation using SetupSymbolicEvaluation. Finally, we check if symbolic values of the source word can indeed be propagated to the target word, by using the procedure TryPropagate. FindForwardWordPairs, i.e., the "guessing" stage of the algorithm, consists of finding promising target words. Figure 2 .6 shows two structural heuristics we use to find target words forward and backward. The idea is to group gates according to their function types and group wires according to the ports they connect to. For instance, consider the source word w as shown in Figure 2 .6a, which is 3-bit wide, and assume that "Gate1" and "Gate2" have the same function type, we guess two target words u and v, each of 2-bit wide. We then check if the subword w [0,1] can propagate to u and v respectively, using Roth's D-calculus.
SetupSymbolicEvaluation is the first step of the "checking" part of our algorithm. In general, if we just evaluate the gates between the source word and the target word by assigning X to the other inputs of these gates, we would not be able to propagate to the target word. The key insight here, which leads to effective word propagation as we will show in Section 2.5.2, is that if the target word is indeed a word (in the sense that it would be naturally declared as a bit-vector in a RTL description language like Verilog), then often there exist a few nearby (in the fan-in cone of these gates) wires which behave as control signals. This means that for some specific concrete assignments to these control signals, the target word will take the values of the source word (or their negation). This pattern is quite common in digital circuits, such as in the case of a multiplexer, or a conditional assignment in an "if-then-else" statement. In fact, even if the condition involves many signals and Boolean operators, the synthesis tool will likely create a wire in the netlist that is the evaluation result of this condition. We elaborate on how we make use of this insight in Figure 2 .7. We consider wires and gates in the fan-ins of "Gate1" and "Gate2", up to some small depth k. Any wire that lies in the intersection of these fan-ins is treated as control, e.g. wire a. We construct a netlist slice including the gates between the source word and the target word, as well as the gates in the aforementioned fan-ins. The fan-in gates are aggregated in a backward traversal manner up to depth k or when a sequential logic cell such as a flip-flop is reached. This ensures that the overall netlist slice forms a combinational circuit. This is the netlist that we will symbolically evaluate, and is set up in the following way.
• Source word: each bit is assigned the symbolic value D.
• Control wires: If the number of control wires is greater than 3, we enumerate all combinations of wires of size 3 in the set of control wires. In each combination, we further try all possible concrete assignments (assignments using 0s and 1s only) to the wires involved. The choice of the constant value 3 is related to the insight mentioned above that the synthesis tool will likely add wires capturing the evaluation of complicated conditions.
• Other inputs: each wire is assigned the unknown value X.
TryPropagate is the second piece of the "checking" stage of the algorithm. For each set of concrete assignments to condition wires, the netlist slice is symbolically evaluated afresh using the D-calculus. Because the netlist slice is a combinational circuit, symbolic evaluation can be done efficiently by evaluating each gate in the slice in a topological order. If every bit of the target word is evaluated to D or D for any concrete assignment to the control wires, then the target word is considered as a true word. In this case, the propagation process iterates and tries to use this new word to infer more words. The overall algorithm is summarized in Algorithm 1. For simplicity, we show the only parts for forward propagation. The algorithm iterates over the set of source words W . At each iteration, a word w is removed from the source pool. FindForwardWordPairs then uses the structural heuristics described in Figure 2 .6a to find promising pairs of words to test for propagation. For each pair of source word u and target word v, if no propagation has been attempted so far from u to v, the pair is added to H and we proceed to checking if u can propagate to v. This is achieved by SetupSymbolicEval which first sets up the netlist slice C and control wires cw for symbolic evaluation as described in Figure 2 .7, and TryPropagate which then evaluates C . If propagation succeeds, this means we have verified u and v are words and we add them to the set of inferred words W . Since v may be further propagated forward, we add it to the set of source words W. The algorithm terminates when we have tried all words that can be inferred from, i.e. W is depleted. for ( u, v) ∈ P do 8:
Add ( u, v) to H.
10:
if TryPropagate(C ,cw, u, v) succeeds then
12:
Add u, v to W . Add v to W. In this section, we remark on the potential limitation and robustness of the proposed techniques, and discuss open problems.
• Order of bits in a word. In general, it is difficult to infer the ordering of bits in a word as it was originally described (e.g, in RTL). Other than a few special cases, such as in a carry chain or at the primary I/O, an arbitrary order is picked for a candidate word during word propagation. Due to the generally large number of candidate words produced, we hash any new word produced in the propagation step unordered. This prunes out a significant portion of the search space since the search stops if the same collection of bits is encountered. However, the tradeoff is that we may miss re-ordering operations on words. This also has implications on function identification when a candidate module is generated using propagated words as boundaries. For example, if we want to match a candidate module to an adder, we need to know the correspondence of the input and output bits of the module to those of the adder. We describe a technique that addresses this problem in Section 2.4.2.
• Robustness of structural heuristics. Even though we start with an unstructured netlist, we use structural heuristics to aid both word identification and propagation -grouping wires based on shapes and grouping gates based on types of gates. This is based on our observation that certain regularities still remain after the circuit is synthesized and optimized. For example, the same type of gates (and respectively the same input/output pins of these gates) are often used when a word is propagated to another word. In fact, this grouping heuristics allows us to identify instances when only part of a word is propagated or when a word is propagated across one level of gates to two different words. In our experience, the simple structural heuristic we use in Section 2.3.2 does not result in much reduction in performance (number of words found). Alternatively, we can expand the domain further by assigning each wire a different symbolic value, and then check if the same value is obtained at an output of the gate(s) that the wire leads to. However, in case several target wires take the same symbolic value, a similar grouping based on gate and pin types would have to be performed to distinguish these wires.
In the context of dealing with (maliciously) modified circuits, we assume that the modification is introduced before logic synthesis (e.g., in the Verilog RTL description). This means that the structure of the post-synthesis is not completely obfuscated, thus allowing us to exploit structural similarities to heuristically guide our techniques. While this assumption does not apply to all attacker models, it is largely valid from an economic perspective. Changing the function of a post-synthesis netlist also requires the attacker to satisfy other design constraints, such as timing and capacitive balancing. Hence, while it is theorectically feasible to apply arbitrary modifications to a circuit at the netlist (or even mask) level, the attacker may not have enough incentives to do so.
• Reachability of propagating conditions. During word propagation, we check if a word can be symbolically propagated to another word under some assignment to a set of condition wires. Theorectically, one needs to check if these values are reachable. However, this would incur significant overhead in runtime since a large number of these reachability tests are needed and we are dealing with netlists with thousands to millions gates and registers. As a compromise, we use a small number of condition wires and propagate words optimistically. This compromise is also motivated by the observation that a conditional assignment in the RTL level typically translates to a single or a few wires (even if the condition is a large Boolean expression) taking the values of the condition in the netlist.
Functional Module Identification
In this section, we first describe our library of abstract components, and then present two techniques, both based on reduction to formal verification problems, for finding a match of a given netlist slice against the component library.
Library of Abstract Components
In general, a burden is placed on the end user to create an initial component library. While this is a limitation for any library-based approach, common hardware design patterns [12] and standardized interfaces are ideal starting components for the library. Our component library currently contains the following types of circuits (the same type of circuits may be parameterized by, for example, the width of inputs and outputs), ranging from simple multiplexer to complex controllers.
• Multiplexer and demultiplexer
• Boolean operations, including NOT, AND, OR and XOR
• Comparator
• Shifter
• Modulo operation
• Adder and subtractor
• Multiplier and divider
• Counter
• Decoder and encoder
• Register file
• FIFO
• Arbiter
• Arithmetic core
• Communication interfaces/controller
• Memory controller
• DSP core
• Error correcting code
• Audio decoder
• GPIO
• Video controller
• Processor core For ease of access and future exploration, each component is associated with the following information.
• Number of input bits;
• Number of output bits;
• Port declaration in words, e.g. input in1 [7:0];
• Behavioral (RTL) Verilog description;
• Synthesized Verilog netlist using the IBM 12SOI cell library;
• A Verilog test bench;
• Formal specificaton S of the component. Currently, we have gathered a total of 1285 different circuits and stored them in a MySQL database. Remark. The simpler components are generally more useful due to the relatively smaller sizes and smaller variations in implementations. A significant effort is also required if one wishes to fully specify a circuit using formal specifications. However, we believe partial specification is still useful, since knowing what properties a netlist slice satisfies (and does not satisfy) is itself a stride towards understanding the high-level function of the netlist.
Matching by Property Checking
In this section, we first show how temporal patterns mined from simulation or execution traces of the circuits can be used to determine input-output correspondence. Afterwards, we use model checking [11] to determine whether a given netlist slice satisfies the formal specification associated with an abstract component. The approach is illustrated in Figure 2 .8. Given an unknown sub-circuit n and an abstract library component α, we must first compute the correspondence between their input and output signals, if one exists. A bruteforce approach will have to try all possible permutations of the signals. In addition, the numbers of inputs and outputs of the two circuits may not be identical. We use a heuristic procedure for this step. First, a concrete instance of α (e.g. a reference circuit that satisfies that formal specification of α), denoted as n is obtained. Then, given the two circuits n with interface signals V I = I ∪ O and n with interface signals V I = I ∪ O , our method tries to find the corresponding signals between V I and V I . Definition 2.2. Given two sets of signals A and B, the signal correspondence between A and B is a bijective mapping σ :Ā →B, whereĀ ⊆ A andB ⊆ B. We say the mapping is maximum if there does not exist another mapping σ :Ã →B such thatĀ ⊂Ã ⊆ A or B ⊂B ⊆ B.
Input-Output Correspondence
Our approach to finding a good signal correspondence (so that the inputs and outputs of two equivalent circuits are matched correctly) on mining patterns from a set of input-output traces of the two circuits. Other approaches, e.g., based on the structure of the circuits, are also possible, and will be left to future work.
The key insight in our approach is that two signals are similar if they exhibit similar behaviors in relation to other signals. In this paper, we measure the similarity of two signals by checking if they satisfy some particular patterns (in relation to other signals) in the traces. However, our framework is general -one can use other definitions of similarity such as statistical measures.
In a nutshell, our method uses a two-step combination of pattern mining and graph matching. The pattern mining step infers likely temporal properties of a circuit from inputoutput traces of that circuit. It is important to note here that for the unknown component, we can only observe the trace induced by a test bench at the chip-level. This means that it is not possible to control the simulation as in the case of the known component. Consequently, even if the unknown circuit is identical to the library component, the behaviors of the two traces can be very different. In this work, we use pattern mining as a way to concisely capture key features of a trace. The mined properties are represented in terms of a pattern graph. Given circuits n and n , we compute pattern graphs for each of them. Then, a graph matching procedure is used to compute the maximum common subgraph between these two graphs, which yields the desired maximum bijection. We elaborate below.
Pattern Mining
We follow the approach described in [20] for mining temporal patterns from input-output traces of a circuit. A pattern is defined over (delta) events, and is given either as a LTL formula or a regular expression. An event e is a tuple s, v, t , where s is a set of signals and v is the corresponding valuations at cycle t. We denote the valuation of a Boolean signal s at cycle t as v s,t . A delta event, denoted ∆e, is an event such that at least one of its constituent signals changes value from the previous valuation, i.e. ∆e := s, v, t such that ∃ s ∈ s, v s,t = v s,t−1 . In this report, we use the follow simple pattern templates.
• Alternating (A) An alternating pattern between two delta events ∆a and ∆b is true when each occurrence of ∆a alternates with an occurrence of ∆b. Note that this does not mean ∆b follows ∆a immediately in the next cycle. This pattern can be described by the regular expression (∆a ∆b) * .
• Next (X) The next pattern corresponds to the LTL formula "G (∆a ⇒ X ∆b)." One can easily generalize this pattern to fixed-delay pairs.
• Until (U) The until pattern can be used to describe behaviors such as "the request line stays high until a response is received." Figure 2 .9 shows a trace where this pattern is satisfied. Formally, the LTL formula is "G (a 0→1 ⇒ X (a U b 0→1 ))."
• Eventual (F) The eventual pattern can be described by the LTL formula "G (∆a ⇒ X F ∆b)." The idea of pattern mining is similar to online monitoring. For each instantiation of these patterns templates (patterns over concrete delta events), one can generate a deterministic monitor that checks if the pattern is violated by some given trace. A pattern is said to be mined from a trace (or a collection of traces) if it is not violated by the trace(s). An approach for evaluating these instantiations efficiently is described in [20] .
Now given a set of input-output traces and a pattern template, our technique first generates a pattern graph G = (V, E ⊆ V × V ). A vertex v ∈ V represents an event over I ∪ O. For example, a vertex labeled with ∆a 0→1 represents the event of signal a transitioning from 0 to 1. There is a directed edge e = (u, v) ∈ E if and only if the pattern involving the delta events represented by u and v is satisfied in all traces.
As an example, let α be the arbiter described in Section 2.2.1. Suppose n uses a roundrobin priority scheme and n uses a fixed priority scheme for arbitration.
Given the input-output traces of n and n and the Until pattern, suppose that Figure 2 .10 shows the pattern graphs generated for n and n . Table 2 .1 shows how the named signals a-d of the round-robin priority arbiter and w-z of the fixed-priority arbiter correspond to request/grant signals r 0 ,r 1 ,g 0 , and g 1 as described in Sec. 2.2.1. (Our task is to compute this correspondence, but we provide it here so that the reader can follow this example.) Thus, for example, the edges from a 0→1 to c 0→1 and from w 0→1 to y 0→1 are instances of the property G [r 00→1 ⇒ X (r 0 U g 0 0→1 )] (request stays high until the corresponding grant is asserted). Graph Matching:
Given the two pattern graphs G and G corresponding to n and n respectively, a common subgraph is a graph which is isomorphic to induced subgraphs of G and G . We wish to find a maximum common subgraph (MCS) between the two graphs, i.e. a common subgraph between G and G that has the maximum number of vertices.
The key observation here is that the bijective function f that defines the MCS is exactly the signal correspondence mapping σ that we are looking for. In fact, when the vertices represent delta events, our approach can identify corresponding signals even if they are implemented with opposite polarities in the two circuits.
Consider our arbiter example. The MCS problem is known to be NP-hard [15] . Most complete approaches are based on reformulating the problem into a maximum clique problem in a compatibility graph between G and G [13] . The compatibility graph is a product graph of G and G such that a vertex in the product graph is a pair of vertices (i, k) where i ∈ V and k ∈ V . There is an edge from (i, k) to (j, l) (i = j and k = l) if and only if one of the following conditions hold:
In a directed graph G, we say two vertices u and v are connected if both (u, v) ∈ E and (v, u) ∈ E. A clique is then a subset of vertices such that every pair of vertices in this set are connected. A maximum clique is a clique with the most number of vertices.
We omit details of how the maximum clique problem is solved but refer the readers to [8] . The technique can scale to graphs with hundreds of vertices. Solving the maximum clique problem correctly generates the signal correspondence as depicted in Table 2 .1.
Matching by Property Checking
If all inputs and outputs of the abstract component α appearing in its specification S are mapped onto some inputs and outputs of the unknown circuit n, then we proceed to the next step, where we verify whether n satisfies S. In our approach, we perform this step using model checking [11] . (If S were a reference circuit, it is possible to replace the model checker with a sequential equivalence checker.) If n satisfies S, then we terminate and report that n matches the component α. Otherwise, we report that it does not match (and move to trying to match n to a different abstract component).
However, if some inputs/outputs of α appearing in S are not matched to inputs/outputs of n, then we stop and declare that there is no match between n and α. Note that this approach is conservative: it is possible for n to be an instance of α but not pass this phase, since our input-output signal correspondence algorithm is heuristic. However, it is important to note that our matching approach is sound due to the use of formal verification.
While formal verification may not scale to the full circuit, we envision applying this procedure mainly to netlist slices with hundreds to thousands of library cells, which is within the capacity of state-of-the-art model checkers today. We demonstrate our approach on benchmarks from OpenCores, as discussed in Section 2.5.3.
Lastly, it should be noted that while it may be difficult to write specifications that account for all possible behaviors of each library component, it is generally possible to distinguish one design block from another using only a few logical specifications (e.g. distinguish an arbiter from an adder). We believe the automation that we provide in the proposed approach can benefit reverse-engineering greatly, as it is still mostly a manual process today.
Matching by Conditional Equivalence Checking
The previous section described an approach when a candidate module with proper boundaries is mapped to components in a predefined library. However, it is in general difficult to carve out such perfect modules. Additionally, we observe that due to optimizations performed during logic synthesis, netlist slices with different functions can overlap, i.e. some gates are shared between slices. For example, instead of having a separate set of gates implementing each opcode function, a single set of gates between the inputs and outputs of an ALU unit will suffice for all operations, each having overlapping logical block with another. In fact, gate-sharing can be a result of design choice and customization. The ripple-carry addersubtractor design demonstrates this phenomenon where a single signal value can convert the adder to a subtractor. As a result, extra inputs are present when a netlist slice is carved out. Consequently, we can only reason about the function of the netlist slice when these side inputs are properly set. This means depending on the assignment to these side inputs, the netlist may or may not behave according to a certain operation, e.g. addition or subtraction. In this section, we describe an approach that addreses this problem, based on modifying the traditional equivalence checking problem into a quantified equivalence checking problem.
Generating Candidate Netlist Slices
Recall that we have all the words that can be inferred by propagation, we are now interested in finding computation structures that operate on these words. The main idea is to cut out the portion of the netlist that lies between words, and then check if this structure implements a particular word operation. We currently support only operations that are combinational logic. However, these still include many that are commonly found in circuit designs, such as addition, subtraction, Boolean operation (e.g. NOT, AND, OR, XOR) and shifting/rotation. Note that the user can extend this set with other word operations by providing reference models for those operations.
To extract the netlist between words, we first arrange the words in topological order between sequential boundaries. For example, given three words w a , w b , w c of the same width such that w a and w b are followed by w c in the topological order, and we are interested in checking if w c = w a + w b modulo a carry-in, we can form a netlist slice by using the gates between w a and w c and those between w b and w c .
QBF Formulation
Our solution is to model the problem as a Quantified Boolean Formula (QBF) and make use of state-of-the-art QBF solvers to solve it. This is different from the traditional SAT-based formulation for combintional equivalence checking [16] , with the side inputs existentially quantified. QBF is the canonical complete problem for PSPACE. It extends propositional formulas by including the universal quantifier ∀ and existential quantifier ∃. The particular instance of QBF we have formulated here involves a single alternation of ∀ and ∃, which is also known as 2QBF. Figure 2 .12 illustrates the miter construction for creating the 2QBF formula.
The reference circuit for the word operation we are interested in checking has inputs X. However, the extracted netlist slice also contains side inputs Y in addition to X. We can describe the miter circuit (as typically done for SAT-based combinational equivalence checking [16] in the verification literature) with a single Boolean formula φ such that φ is true if and only if the two circuits are equivalent. The intuition behind the existential quantification over Y is that if the netlist implements the function of the reference circuit, 
Discussion
We remark on the potential limitation of the function identification techniques and discuss open problems.
• Candidate module generation. We only partially address this problem in this paper -combinational blocks are carved out between words. However, the success of such matching by verification approach depends heavily on finding the correct input/output interfaces. In the case of matching by property checking, a partially matched module (when only a subset of the properties of the abstract component is satisfied) is still useful. In general, this is an open problem for us and we plan to explore it in future work.
• Robustness of pattern mining for solving I/O correspondence. We use pattern mining in this work. However, in general, any signature generation technique is applicable. In our experience, the simple pattern templates we use correspond to common hardware behaviorable patterns and thus are effective in capturing them.
• QBF formulation. Similar to word propagation, when the QBF procedure produces a satisfying assignment to the side inputs, the reachability of these values needs to Another limitation of the current QBF formulation is that it only applies to combinational netlist slices. For sequential circuits, we hypothesize that one can use a bounded enrolling to leverage the same QBF approach, and this is a subject of future work.
Experimental Results
We have developed an inference tool using Python and C++ that implements the algorithms described in this paper. The tool takes as input a synthesized netlist, analyzes it and produces as output word-level datapaths as well as a set of annotated netlist slices. The tool uses DepQBF [24] as the backend solver for solving QBFs. Table 2 .2 summarizes the netlist information of the designs that are used in this paper. The generic name BigSoC is used for confidentiality reasons. All the designs were synthesized with an IBM/ARM cell library for a 45nm SOI process using the Synopsys Design Compiler with its default optimization setting. The CMP router is a simplified version of the chip-multiprocessor router proposed in [28] . BigSoC is a system-on-a-chip design which consists of 7 subsystems: a 32-bit ARMcompatible RISC processor, a Singular Value Decomposition module, a SPI interface, a UART interface, an I 2 C interface, a VGA controller and a memory controller. The subsystems are further interconnected through an AXI4S switch. The rest of the benchmarks are available on OpenCores [4] .
Benchmarks
We also separately evaluate the "matching-by-property-checking" approach and used the following three circuits obtained from OpenCores [4] in our experiments.
• WISHBONE-compatible SPI (WB-SPI) [7] • WISHBONE-compatible SimpleSPI (WB-SimpleSPI) [5] • WISHBONE-compatible I 2 C (WB-
The serial peripheral interface (SPI) is a full duplex, serial communication link. Devices operate in either master or slave mode. Typically there is a single master device and one or more slave devices. SPI specifies four logic signals: SCLK (serial clock), MOSI (master output/slave input), MISO (master input, slave output), and SS (slave select). The SS signal is only necessary if more than one slave device is connected to the master. To initiate a data transfer, the SS pin for the desired slave is first pulled low. Then data are clocked from the master to the slave via the MOSI port and data are clocked from the slave to the master via the MISO port. When the transfer is complete, the SS pin is pulled high. SimpleSPI is a simplified version which supports only one master and one slave.
The inter-integrated circuit or I 2 C is a serial two-wire communication bus. Devices operate in either master or slave mode, similar to SPI; however, there can be multiple masters on an I 2 C bus. The bus is made up of two logic signals: SCL (clock) and SDA (data). A typical data transfer starts with a master sending a START bit along with a 7-bit address for the slave it wishes to communicate with, and a bit indicating a read or write operation. The slave responds with an ACK bit and proceeds to operate in either read or write mode, depending on the master's request. Once the transmission is over, the master sends a STOP bit.
WISHBONE is a communication interface for IP cores that enhances design reuse by enforcing compatibility between cores. Furthermore, WISHBONE is open source, which makes it easy for engineers to share hardware designs. As such, many projects on OpenCores, a website dedicated to open source hardware designs, include a WISHBONE interface. The protocol supports handshaking, single read/write cycles, block read/write cycles, and readmodify-write cycles. All three circuits are supposed to be WISHBONE compatible.
In this part of the experiment, we consider the scenario where the WISHBONE protocol [6] has been pre-characterized as a library component and the WB-SPI circuit is given as a concrete implementation of the WISHBONE protocol. We treat the WB-SimpleSPI and WB-I 2 C as unknown candidate netlist slices. The goal is to determine whether an unknown circuit also implements the WISHBONE interface. Table 2 .3 summarizes the results of applying WordRev to the netlist designs. Columns 2 and 3 record the number of input and output words used respectively. Column 4 records the number of candidate words identified using the algorithms in Section 2.3.1. Column 5 is the number of words produced by word propagation. Column 6 is the runtime for word propagation in minutes. In all experiments, we limited the search to only words between 4 and 32 bits wide.
Finding Word-Level Flow
Input and output words were assumed to be known, since these are at the I/O of the design. Candidate words were selected from the word identification step described in Section 2.3.1. For BigSoC, we only selected 16 32-bit words identified by using the bitslice aggregation algorithm in Section 2.3.1. This is because a lot of words were identified as candidate words, which caused the tool to timeout as a result of trying to propagate each of them. Additionally, we consider words that can be propagated to be more valuable than just identified words, since propagation indicates that these arrays of bits are more likely to be operated together at the RTL level.
Case Study: CMP Router
In this section, we use the CMP router to evaluate the effectiveness of WordRev in detail. Particularly, we focus on the following analysis.
• Usefulness of the flow of word-level information presented as a directed graph.
• Effect of the netlist being synthesized from different cell libraries.
• Effect of the netlist being synthesized from the same cell library but with different optimization settings.
For convenience, we called the router netlist used in the previous section the "TR Router", the resulting netlist synthesized from a different cell library the "TS Router", and the optimized netlist the "Optimized TR Router". The overview of the CMP router design is illustrated in Figure 2 .13. It is a composition of four high-level modules. The input controller comprises a set of FIFOs buffering incoming flits and interacting with the arbiter. A flit is a flow control unit. A data packet is composed of multiple flits. In this implementation, a flit has 12 bits, with the lower two bits as a header (used for channel reservation) and the remaining 10 bits as address (6 bits) and data (4 bits). Each input controller contains a circular FIFO buffer of 4 flits deep. When the arbiter grants access to a particular output port, it sends a signal to the input controller to release the flits from the buffers, and at the same time, it sends the allocation information to the encoder which in turn configures the crossbar to route the flits to the appropriate output port.
Word-Level Datapath: Figure 2 .14 shows the word-level datapath as a directed graph produced by our tool. We have highlighted regions of the graph that correspond to highlevel modules of the router as described previously. The nodes in yellow are the known words that we start with at the primary input and output of the router. The number in each node denotes the numbering of the word (e.g. "w11") followed by the width of the word (e.g. 12 for "w11"). If a node is contained in another node, it indicates that the inner word is a subword of the outer word.
The top half of the word graph is in fact isomorphic to the bottom half, each corresponding to a port of the router. The input controller FIFO is the subgraph that has two special nodes, marked as "Writing into FIFO" and "Reading from FIFO". From the first node, which is the write pointer of the FIFO, it splits to four other words. Each of these words is then latched (as indicated by the red arrow). The latched output has three possible out-going paths: one going back to the itself (typical for state holding elements), one goes to the write pointer of the FIFO (multiplexed to determine whether it will get overwritten), and the last one goes to the read pointer of the FIFO (another multiplexing for determining which flit will be read). The crossbar on the right is easier to recognize -it consists of two 12-bit words that interweave to two other 12-bit words downstream. While the module identification described here was performed manually, comparing to Figure 2 .13, the word graph in Figure 2 .14 essentially reduces the router netlist with approximately 1100 cells to a single graph (which can fit on a page) that captures most aspects of the data-flow and architectural features of the router. This allows to a human analyst to look for patterns at a higher level of abstraction. Moreover, the word graph provides a structure for further automation, such as module identification, which we plan to explore in future work. Different Cell Libraries: "TS Router" had about 4% less gates than "TR Router", but had the same number of latches.
11 Applying the same word propagation algorithm to "TS Router", 52 (instead of 54) words were found. Comparing the words inferred in the two Figure 2 .14: Word-Level Datapath for the Same CMP Router Design netlists directly was difficult since the wires were named differently. However, upon close examination of the word graphs, we verified that the topology of the "TS Router" word graph is almost identifical to that of "TR Router". The only differences were at the read and write pointers of the FIFOs. In addition, the widths of the words were also the same, showing the splitting of I/O words into words of 10 bits wide. This shows that our heuristic for finding target words to propagate is robust to small change in the cell library used in logic synthesis. Different Synthesis Parameters: "Optimized TR Router" contained about 25% less gates than "TR Router". 12 In addition, it used 176 instead of 182 latches. In this case, WordRev found 50 words, 4 less than the number of words found in "TR Router". We analyzed the word graph generated for "Optimized TR Router" and found interesting discrepancies. While the topology of the graph remained largely the same, which means the key features were again visible, the words being propagated were only 4 bits wide. In fact, they correspond to the 4 data bits in each flit. In "TR Router", both the 6-bit wide address field and the 4-bit wide data field were propagated together. This shows that while our algorithm can extract the word-level dataflow of a netlist, its performance can be influenced by the optimization setting during logic synthesis.
Functional Module Identification QBF-Based Matching
In this section, we focus on reporting results that demonstrate the effectiveness of using the QBF formulation to identify structures and conditions in reasoning various word operations. The OC8051 microcontroller is widely used in many embedded system products. It contains an 8-bit CPU optimized for control applications. Using word identification techniques described in Section 2.3.1 and the structure extraction process described in Section 2.4.3, we extracted a netlist slice that was part of the ALU unit of the microcontroller. This netlist slice contained 435 12SOI gates. In addition to the two 8-bit input words, the netlist slice had 87 other side inputs. We formulated a QBF problem for each of the word-level operation (each word is 8-bit wide and each operation corresponds to a specific ALU opcode) shown in Table 2 . 4 , and verified that the single netlist slice could implement that operation by making appropriate assignments to the side inputs.
Property Checking with Unknown I/O Correspondence Signal Correspondence:
As described in Section 2.4.2, we followed the approach in [20] for generating the pattern graph by mining the Until pattern on the simulation traces of each circuit. The simulation traces were produced by the test benches provided on OpenCores. We further assume the clock, reset (wb rst i) and data-path signals at the interface are already identified (these are easy to identify by structural methods). Next, a compatibility graph was generated for the pattern graph of the unknown circuit and WB-SPI. 13 The time taken to generate each pattern graph was under a second. We used the program Cliquer [27] which implements an exact branch-and-bound algorithm developed by PatricÖstergård for finding the maximum clique in a graph. The time taken to find a maximum clique in the compatibility graph was under a second in all instances. Table 2 .5 shows the signal map derived between SPI and SimpleSPI by using the first MCS produced. All the WB-related signals were mapped correctly. In addition, two SPIrelated signals were also mapped correctly. The signal names used here were taken from the respective RTL files and only serve as an illustration. In an actual reverse engineering exercise, the signal names of the unknown circuit will be arbitrary. These results were obtained despite the fact that the traces were generated by two test benches with very different behaviors. Table 2 .6 shows the signal map derived between SPI and I 2 C by also using the first MCS produced. Table 2 .6: Signal Mapping between WB-SPI and WB-I 2 C WB-SPI WB-I 2 C wb we i wb we i wb cyc i wb cyc i wb ack o wb ack o All the WB-related signal were again matched correctly. However, the wb stb i signal was not matched. It was identified by iterating the approach with the Alternating pattern, given the current matches. This suggests an incremental approach to our framework but this will be left to future work. On the other hand, there were also four other incorrectly matched signals. This is because other than being both WISHBONE compatible, the two circuits were actually implementing different functions. The next step which checks these signals against their logical specifications will eliminate the incorrect matches.
Matching by Verification:
We focused on specifying the slave interface of the WISHBONE protocol, although we used properties of the master interface as assumptions.
At a minimum, a slave interface requires the following signals: wb ack o, wb clk i, wb cyc i, wb stb i, and wb rst i. Since the WISHBONE interface does not explicitly require the data lines, wb dat o and wb dat i, the only properties we could specify were about the reset operation and the handshaking protocol.
The properties for the reset operation and handshaking protocol are as follows:
• G (wb rst i ⇒ X ¬wb ack o)
• G ¬(wb ack o ∧ X wb ack o)
• G ((wb cyc i ∧ wb stb i) ⇒ F wb ack o)
The assumptions we made on the master interface, part of the specification, are as follows:
• G (wb rst i ⇒ (¬wb stb i ∧ ¬wb cyc i))
• G (wb stb i ⇒ wb cyc i)
These LTL properties were manually translated from the WISHBONE documentation Rev. B.4 [6] .
Taking the netlist descriptions of WB-SimpleSPI and WB-I 2 C, we translated them to the SMV format and verified the properties described above using the Cadence SMV model checker [1] . For both circuits, all the properties passed. This confirmed that both indeed implemented the WISHBONE interface. The verification times were under 0.1 second for either benchmark.
Chapter 3
Conclusion
Summary
In this report, we have presented a portfolio of novel techniques for reverse engineering circuit netlists to their high-level descriptions. We are also the first to provide a formal definition for the netlist reverse engineering problem (REHLD), based on the notion of matching against abstract components. Many of the techniques proposed here have connections to techniques used in formal verification. For example, symbolic evaluation is used to infer word propagation, and property checking and equivalence checking are used to match a unknown circuit to a known abstract component. These techniques allow us to cope with challenges such as the netlist being unstructured and an intractably large implementation space exacerbated by scarce documentation.
While reverse engineering does not directly address the problems of trojan detection and isolation, it enables one to perform more targeted and comprehensive analysis once the high-level description of the netlist is obtained. We envision the techniques proposed in this report will be applicable in the grand picture of ensuring the integrity of a fabricated circuit, and will be a valuable complement to "dynamic" techniques that exist in the literature.
We have proposed an initial and integrated solution to the reverse engineering problem and the experimental results are promising. In fact, we tackled netlists that are orders of magnitude larger than those considered in previous reverse engineering approaches (e.g. [17] ). However, we also recognize the potential limitations of the proposed techniques, as described in Section 2.3.3 and Section 2.4.4. We plan to address these problems in future work.
Acknowledgement
The author would like to thank his collaborators Prof. Sanjit A. Seshia, Zach Wasson and Wei Yang Tan from UC Berkeley, Adriá Gàscon, Dr. Natarajan Shankar and Dr. Ashish Tiwari from SRI International, and Pramod Subramanyan and Prof. Sharad Malik from Princeton University. Some of the work done in this report could not have been possible without their assistance and guidance. In particular, the observation that setting a few "control" wires is likely to be sufficient for propagating the values of a word was first made by Dr. Ashish Tiwari. The work on word propagation also benefited much from discussion and working closely with Adriá Gàscon. The idea of using shapehashing is due to Dr. Natarajan Shankar. The author would like to thank Pramod Subramanyan and Prof. Sharad Malik from Princeton University for the help with their tool, BSIM, which was used in the word identification step. The author would also like to thank Prof. Sanjit A. Seshia and Prof. Robert K. Brayton for reviewing this report and giving insightful feedback. The work described in this report was supported in part by Defense Advanced Research Projects Agency (DARPA) under the IRIS program, and by the Hellman Family Faculty Fund.
