A novel trust-based design method for FPGA circuits that uses error-correcting code (ECC) structures for detecting design tampers (changes, deletion of existing logic, and addition of extra-design logic such as Trojans) is proposed in this article. We determine ECC-based CLB (configurable logic block) "parity groups" and embed the check CLBs for each parity group in the FPGA circuit. During a trust-checking phase, a Test-Pattern Generator (TPG) and an Output Response Analyzer (ORA), configured in the FPGA, are used to check that each parity group of CLB outputs produces the expected parities. We use two levels of randomization to thwart attempts by an adversary to discover the parity groups and inject tampers that mask each other, or to tamper with the TPG and ORA so that design tampers remain undetected: (a) randomization of the mapping of the ECC parity groups to the CLB array; (b) randomization within each parity group of odd and even parities for different input combinations (classically, all ECC parity groups have even parities across all inputs). These randomizations, along with the error-detecting property of the underlying ECC, lead to design tampers being uncovered with very high probabilities, as we show both analytically and empirically. We also classify different CLB function structures and impose a parity group selection in which only similarly structured functions are randomly selected to be in the same parity group in order to minimize check function complexity. Using the 2D code as our underlying ECC and its 2-level randomization, our experiments with inserting 1-10 circuit CLB tampers and 1-5 extraneous logic CLBs in two medium-size circuits and a RISC processor circuit implemented on a Xilinx Spartan-3 FPGA show promising results of 100% tamper detection and 0% false alarms, obtained at a hardware overhead of only 7-10%.
INTRODUCTION
In recent years, a significant number of technology products have started using FPGAs due to their substantially lower costs compared to ASICs, and due to their recent increases in performance. Due to the high complexities in current designs, FPGA circuit design and FPGA system integration (e.g., integrating an FPGA and its circuit design with other FPGAs and ASICs on, say, a PCB-based system) are generally performed across multiple organizations. Thus, circuit designers have no control over the integration process, while the integrators need design details. These products are thus vulnerable to malicious tampers that could cause them to operate incorrectly at some future time. This article is concerned with the design of trusted FPGA circuits and trust-check mechanisms to detect intentional tampering of circuits, an emerging area of research [DARPA; Trimberger 2007] (unintentional tampers are also automatically covered by our techniques).
Most work on trust and security in VLSI involves techniques to protect critical data on a chip (e.g., cryptographic keys) by prevention of side-channel attacks [Kocher et al. 1999; Yang et al. 2005] and other techniques [Kommerling and Kuhn 1999], secure coprocessors that can execute sensitive functions without interference [Dyer et al. 2001], and packaging-based tamper-proofing mechanisms (tamper resistance, evidence, detection, and response) [Weingart 2000; Dipert 2000]. These works address issues that are orthogonal to the design of trusted circuits, and in particular to the detection and prevention of design tampers, which is the thrust of this article. In this article, by design tamper (henceforth termed "tamper") we mean deliberate insertion of extraneous logic (e.g., Trojans) in a circuit, deletion of essential logic from it, or a change in some portion of its logic.
Classical testing, which needs exact knowledge of the circuit being tested, can only check for the latter two types of tampers, but cannot detect any extraneous logic. Furthermore, testing generally targets 1-2 faults/tampers. It is thus of limited use in the trust-checking scenario.
Another related set of techniques [Adell and Allen 2008; Bolchini et al. 2007; Carmichael et al. 2000; Heiner et al. 2008; Xilinx 1,2] is concerned with fault tolerance of configuration memory changes due to a limited number (generally, 1-2) of upsets or errors per frame of the configuration memory; a frame is the smallest unit of configuration memory that can be read or written in Xilinx FPGAs. These methods involve detection of the changes in the configuration memory of an FPGA followed by correction of these errors. Detection techniques include: (a) duplicate-and-compare implementation of the design [Bolchini et al. 2007]; (b) use of a single-error-correcting double-error-detecting (SECDED) Hamming code in the Xilinx Virtex-4 and Virtex-5 configuration memory, and detection of up to two errors using an on-chip Frame ECC logic during readback of the configuration memory using an on-chip circuit called the Internal Configuration Access Port (ICAP) that is part of the Virtex-4/Virtex-5 FPGA (see footnote 1, 2); (c) use of off-chip detection during readback of configuration memory using precomputed cyclic redundancy check (CRC) codes [Carmichael et al. 2000]. Fault tolerance or correction of detected errors involves the following approaches: (a) triple-modular redundancy in the design and/or the ICAP circuit [Adell and Allen 2008; Heiner et al. 2008]; (b) reloading of the detected erroneous configuration memory frames via partial reconfiguration [Bolchini et al. 2007; Carmichael et al. 2000]; (c) scrubbing, which is a periodic reloading of the entire configuration memory of the FPGA from the external configuration bit storage in order to prevent accumulation of configuration memory errors [Adell and Allen 2008; Carmichael et al. 2000]; (d) bit-level error correction using the SECDED Hamming ECC on the detection of a single error in a configuration memory frame (see footnote 1, 2).
The aforementioned set of techniques can only catch on-chip configuration memory errors, but cannot detect preconfiguration tampering of the FPGA design/configuration bits, which is the main focus of this article. Further, unless a time-consuming bit-by-bit comparison is performed of the readback configuration memory frames with the frames of the original configuration bits (stored off chip), these techniques can only catch a few errors in each frame. Thus these fault-tolerance-oriented approaches are not very germane to trust design and checking.
Recently, Trimberger [2007] addressed various issues in the protection of FPGA designs from tampering. That work argues that: (i) FPGA chip fabrication is completely separated from FPGA circuit design, so there is no scope for an adversary at the fabrication company to tamper with any circuit design; (ii) FPGA configuration bitstreams are hard to reverse engineer in order for an adversary to determine the circuit and tamper with it; (iii) the bitstream can be further protected by encrypting it, and decryption hardware is available in many current FPGAs for internal decryption of bitstreams. While the first point is correct, the circuit design could be required at an integration company (as opposed to at the fabrication company) that assembles, for example, application-specific multichip PCBs that include FPGAs, in which the designs are needed to test and fine-tune the product. This exposes the designs to an adversary at such a company. Furthermore, bitstream reverse engineering, while hard, is not impossible to accomplish for a determined adversary, as acknowledged in Trimberger [2007] . Finally, while encryption provides a reasonable trusted FPGA design execution environment, it imposes certain restrictions on FPGA use, like disabling of partial reconfiguration [Trimberger 2007] , and is thus not suitable for applications such as adaptive computing, an important niche area for FPGAs. Furthermore, encryption does not protect against tampered IPs that are unknowingly used in an otherwise trustworthy design.
Another possible trust-design approach is signature computation and checking of the bitstream. In this approach, a signature $S$ of the bitstream is computed (e.g., using a linear-feedback shift-register (LFSR)-based computation) on the "golden" (correct) FPGA bitstream and sent directly to the user. When the FPGA, possibly integrated on a PCB with other components, arrives at the user's location, he/she needs to compute the signature $\hat{S}$ of the FPGA circuit's bitstream (that is, for example, stored on a ROM). If $S \neq \hat{S}$, then this indicates the presence of tampers in the new bitstream. There are two drawbacks to the signature-checking scheme. First, signature checking is an off-chip checking approach, and thus cannot protect against malicious IPs (since IPs are encrypted), tampers introduced by the device programming unit, or remote attacks like those using high-energy electromagnetic pulses (EMPs) that tamper with the on-chip configuration bits. Second, in the case of checking only the design configuration bits, even with strong signature computation techniques like LFSR, the nondetection probability (equal to the aliasing probability, i.e., the probability of two distinct bitstreams producing the same signature) is an order of magnitude higher than ours. For example, as we show in Section 2, for a medium-size FPGA like the Spartan-3 with a 24 × 20 CLB array, our nondetection (or masking) probability is $\approx 16 \times 10^{-9}$. In Saxena et al. [1992], it has been shown that in the context of testing circuits with one output bit, in which the output bits across input vectors to the circuit are compacted into a signature using an LFSR approach, the aliasing probability is lower bounded by $1/L$ if the signature register period is greater than $L$, where $L$ is the number of random test inputs (in other words, $L$ is the number of output bits compacted).
Translating this to the signature computation of an FPGA with a configuration bit size of $L$, we get a lower bound for the aliasing probability of $1/L$. Large Virtex-4 FPGAs have $L$ of the order of 5M-50M bits (see footnote 1), and we thus estimate that the $L$ for the smaller Spartan-3 FPGA with a 24 × 20 CLB array is at most 5M bits (we could not find the configuration bit size of this FPGA), leading to an aliasing probability lower bound of $2 \times 10^{-7}$. The nondetection probability of our ECC-based technique for the aforesaid Spartan-3 FPGA is thus about 12.5 times less than what would be obtained by an LFSR-based signature computation and checking approach for FPGA trust verification. This, coupled with the fact that our technique can detect tampers in many scenarios in which signature checking is ineffective, makes our ECC-based on-chip checking approach superior. Figure 1 shows four different tamper vulnerability points of an FPGA circuit (design, IP, device programming, and remote attacks using, say, high-energy EMPs) that are not all protected by fault tolerance, signature-checking, or encryption approaches. Thus it is necessary to employ explicit trust design and checking techniques for FPGA circuits that: (1) check the circuit's configuration bits on chip, and (2) are functionality-based, which allows all four vulnerabilities to be addressed. The techniques presented in this article are of this type. Such methods also enable FPGAs to be used in a trusted manner in all types of applications, without restrictions on the use of any of their features.
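The comparison above is simple arithmetic, and a short Python sketch makes it easy to recheck. The value of L is the article's own upper estimate for the Spartan-3 (not a datasheet figure), and the variable names are illustrative only.

```python
# Figures quoted in the text; L_bits is the article's estimate, not a datasheet value.
L_bits = 5_000_000      # assumed configuration size of the Spartan-3, in bits
p_mask_ecc = 16e-9      # nondetection probability of the ECC scheme (Section 2)

aliasing_lower_bound = 1 / L_bits           # LFSR aliasing probability bound, 1/L
ratio = aliasing_lower_bound / p_mask_ecc   # how much larger the LFSR bound is

print(f"aliasing lower bound = {aliasing_lower_bound:.1e}")
print(f"ratio = {ratio:.1f}")
```

A larger true value of L would lower the aliasing bound and shrink this ratio, so the factor of 12.5 depends on the 5M-bit estimate.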
In this article we address design tampering of FPGA circuits' logic elements, namely the CLBs. Trust design and verification for tampering of interconnect routing will be addressed in a future paper. We assume the following in our trust design and check methodology.
(1) The original design process of the FPGA circuit is trusted.
(2) Either the correct original design (the "golden reference") is available for subsequent trust-based design (trust structure insertions) or trust-based design immediately follows the original design phase (thus almost eliminating any possibility of any design tamper between the original design and trust-based design phases).
We also note that we do not need the golden reference during trust verification/checking.
The rest of the article is organized as follows. Section 2 presents the basic idea of ECC-based trust checking for design tamper detection in FPGA circuits. Section 3 gives an introduction to the Spartan-3 FPGA architecture for which we have instantiated our techniques. Section 4 discusses TPG control of inaccessible inputs of CLB logic and determination of the TPG size needed to check a given circuit. In Section 5, we give a classification of logic structures of slices (a slice is a smaller logic unit within a CLB, which has multiple slices; see Section 3) called parity group patterns that is useful for determining parity groups with similarly structured slice outputs that yield low check-CLB/slice overhead. Next, in Section 6 we discuss and analyze techniques for tackling tampers in the checker circuit (TPG and ORA) that use a second level of randomization, namely that of the parity vectors of parity groups. Section 7 discusses different embeddings of the check components in an FPGA and their attendant hardware overheads. In Section 8 we present our experimental results, and we conclude in Section 9.
ECC-BASED DESIGN-TAMPER DETECTION
Our core idea in designing trusted FPGA circuits is to impose an ECC's parity group structure on the CLB array. During a trust-checking phase, a Test-Pattern Generator (TPG) is connected to the inputs of each parity group (PG) of CLBs, one at a time, and feeds identical input/test vectors to each CLB in a group, while an Output Response Analyzer (ORA) determines the parity of the outputs of the CLBs in the PG for each test vector (see footnote 3); see Figure 2(b). The output vector produced by the ORA (the ORA produces a parity bit per test vector) is then checked to determine if it is the expected parity vector (PV) for this PG; for the even-parity scheme this PV is the zero vector, but for the random-parity technique to be discussed in Section 6, the PV is a predetermined random bit vector. For example, consider a PG with 2-input functions, with input vectors $I_0, I_1, I_2, I_3$, and a randomly determined parity of (odd, even, even, odd) across these input vectors. Then the expected PV for this PG is (1, 0, 0, 1). If, on the other hand, we use the even-parity scheme, then this PG's expected PV is (0, 0, 0, 0).
In either parity scheme (even or random), if the ORA output is the expected PV, then the CLBs in this group are tamper free with very high probability (as we will establish shortly, both analytically and empirically). The entire FPGA is tamper free with very high probability if all PGs are determined to be tamper free; otherwise, the FPGA is determined to have tampers. Figure 4 shows our trust checking flow for Xilinx FPGAs that uses the Xilinx ISE toolset.
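The check described above can be modeled in a few lines of Python. This is a minimal behavioral sketch, not the hardware TPG/ORA: the functions `f1` and `f2` are hypothetical circuit outputs, and the way the random-parity check function is formed (XORing the target parity bit into the even-parity check) is an assumption made for illustration, since the actual construction is the subject of Section 6.

```python
from itertools import product

def parity_vector(functions, num_inputs):
    """ORA model: one parity bit (XOR of all PG outputs) per TPG test vector."""
    pv = []
    for vec in product((0, 1), repeat=num_inputs):  # TPG: exhaustive vectors
        bit = 0
        for f in functions:
            bit ^= f(*vec)
        pv.append(bit)
    return tuple(pv)

# Hypothetical 2-input circuit functions placed in one PG.
def f1(a, b): return a & b
def f2(a, b): return a | b

# Even-parity check function: XOR of the circuit functions.
def check_even(a, b): return f1(a, b) ^ f2(a, b)

# Random-parity variant (assumed construction): fold the target parity bit in.
TARGET_PV = (1, 0, 0, 1)  # (odd, even, even, odd) for I0..I3
def check_random(a, b): return f1(a, b) ^ f2(a, b) ^ TARGET_PV[2 * a + b]

pv_even = parity_vector([f1, f2, check_even], 2)
pv_random = parity_vector([f1, f2, check_random], 2)

# A tampered f2 (OR changed to XOR) disturbs the parity vector.
def f2_tampered(a, b): return a ^ b
pv_tampered = parity_vector([f1, f2_tampered, check_even], 2)

print(pv_even, pv_random, pv_tampered)
```

The PG passes only when the observed vector equals the expected PV; here the tampered group yields a nonzero bit for the last test vector and is flagged.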
Classically, in ECCs the designed-for parity is always even, and the rest of this discussion, unless otherwise stated, is based on even-parity groups in the FPGA circuit; in Section 6, we augment our method with random parities across input vectors of a PG. Any deviation from the expected parity implies some tampering of the CLB logic for one or more outputs in that group.

Footnote 3: It is also possible to check multiple PGs at a time by connecting the TPG outputs to all their inputs, and using separate ORAs to check each PG; the number $k$ of PGs that can be checked simultaneously is limited only by the FPGA size and the resources needed by $k$ ORAs. For simplicity of exposition, we assume henceforth, unless otherwise stated, that one PG is checked at a time.

In an ECC for information bits, for the $i$th parity group, there is a check bit $C_i$ and $k$ information bits $x_{i,1}, \ldots, x_{i,k}$, where

$$C_i = x_{i,1} \oplus x_{i,2} \oplus \cdots \oplus x_{i,k}. \qquad (1)$$
Figure 2(a) shows a 2D parity code in which the information bits and check bits are arranged in a 2D matrix, with the check bits occupying the last row and column; each row and column of this matrix defines a parity group. Similar to bits, CLBs can also be arranged in parity groups, where circuit CLB outputs correspond to information bits and check CLB outputs to check bits. The output function of the check CLB in each PG is then the parity or XOR of the output functions of the circuit CLBs in the PG. Here C(j, i) is used to denote the output function of the CLB that is also labeled C(j, i); similarly for the row-check CLBs CC(i, 3). As mentioned earlier, tamper detection is performed by a TPG generating all possible input vectors that simultaneously feed all CLBs in a parity group, and an ORA determining if the parity of the outputs of the CLBs in the same parity group is even across all input vectors.
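As an illustration of the 2D arrangement, the following Python sketch represents each CLB output by its truth table, appends a check column and a check row, and reports which row and column PGs lose even parity after a tamper. The 2 × 2 grid of functions and the helper names are hypothetical and are not taken from Figure 2.

```python
from functools import reduce
from operator import xor

def xor_tables(tables):
    """Bitwise XOR of equal-length truth tables (tuples of 0/1)."""
    return tuple(reduce(xor, bits) for bits in zip(*tables))

def add_checks(grid):
    """Append a check column, then a check row, to a grid of truth tables."""
    coded = [list(row) + [xor_tables(row)] for row in grid]
    coded.append([xor_tables(col) for col in zip(*coded)])
    return coded

def failing_groups(coded):
    """Return indices of row PGs and column PGs whose parity is not even."""
    zero = (0,) * len(coded[0][0])
    bad_rows = [i for i, row in enumerate(coded) if xor_tables(row) != zero]
    bad_cols = [j for j, col in enumerate(zip(*coded)) if xor_tables(col) != zero]
    return bad_rows, bad_cols

# Truth tables of 2-input functions over inputs 00, 01, 10, 11.
AND, OR, XOR, NAND = (0, 0, 0, 1), (0, 1, 1, 1), (0, 1, 1, 0), (1, 1, 1, 0)

coded = add_checks([[AND, OR], [XOR, NAND]])
clean = failing_groups(coded)

coded[0][1] = XOR          # tamper: the OR function is changed to XOR
tampered = failing_groups(coded)

print(clean, tampered)
```

A single tampered output disturbs exactly one row PG and one column PG, which is why a lone tamper cannot escape detection in the 2D code.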
We note that we always need to map the underlying ECC to the entire CLB array of the FPGA, irrespective of whether a CLB is part of the functional circuit. This is needed in order to detect the presence of extraneous logic inserted into nonfunctional CLBs/slices. We also configure all nonfunctional CLBs/slices to implement the zero function (one whose output is always logic 0). The reason for this is that for an extraneous circuit to disrupt the application circuit, all its CLBs/slices cannot be configured with zero functions, as otherwise it would do nothing (e.g., there must at least be a multiplexer in the extraneous circuit to connect one of its outputs to at least one input of some slice/CLB of the application circuit in order to disrupt it, and the multiplexer output will be a nonzero function). Hence, at least some CLB output of the extraneous circuit would have to be a nonzero function for this to be possible. Since this is a change to its "normal" zero-function output, and since we check the entire CLB array during the trust-checking phase, any extraneous circuit insertions will be detected by our techniques with high probability.
Randomization of parity group mapping. In an adversarial design tampering scenario (as opposed to one of randomly occurring faults), an obvious mapping of the 2D code to the CLB array can be easily defeated by arranging tampers in a 2 × 2 subarray in which they "mask" each other (masking is formally defined and discussed shortly); see Figure 3(a). The way around this problem is to randomize the mapping of the m × n 2D parity code onto an m × n CLB array. In other words, if $r : H \times V \rightarrow H \times V$ is a random one-to-one function, where $H = \{0, \ldots, m-1\}$ is the set of row coordinates and $V = \{0, \ldots, n-1\}$ is the set of column coordinates, then the $i$th row $X(i, 0), \ldots, X(i, n-1)$ of the 2D code is mapped to CLBs $C(r(i, 0)), \ldots, C(r(i, n-1))$, and similarly its $j$th column $X(0, j), \ldots, X(m-1, j)$ is mapped to CLBs $C(r(0, j)), \ldots, C(r(m-1, j))$; a random mapping is shown in Figure 3(b). It is virtually impossible for an adversary to determine the randomized embedding of a 2D code (or of any other ECC) into the CLB array unless he/she exhaustively analyzes all subsets of CLB outputs and determines if, for each subset, one of the outputs is the parity of the others. Note that it will be necessary for an adversary to examine all subsets of CLBs (as opposed to subsets of sizes m and n only), since it is not necessary that we only map an m × n 2D code to the m × n CLB array; any $m' \times n'$ 2D code can be mapped to this array, where $m'n' \geq mn$, $m'(n'-1) < mn$, and $(m'-1)n' < mn$. This means analyzing $2^{mn}$ subsets, a virtually impossible task for current FPGAs, which have $mn$ on the order of 64-20,000.
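A randomized embedding of this kind can be sketched in Python as a random permutation of the array coordinates. The function names and the use of a seeded `random.Random` are assumptions for illustration; a real flow would presumably draw the permutation from a secret, unpredictable source.

```python
import random

def random_embedding(m, n, seed=None):
    """Map each 2D-code position (i, j) to a distinct CLB coordinate."""
    rng = random.Random(seed)          # illustrative; not a secure RNG
    cells = [(i, j) for i in range(m) for j in range(n)]
    shuffled = cells[:]
    rng.shuffle(shuffled)
    return dict(zip(cells, shuffled))  # code position -> CLB coordinate

def parity_groups(r, m, n):
    """CLB coordinates of every row PG followed by every column PG."""
    rows = [[r[(i, j)] for j in range(n)] for i in range(m)]
    cols = [[r[(i, j)] for i in range(m)] for j in range(n)]
    return rows + cols

r = random_embedding(4, 5, seed=7)
groups = parity_groups(r, 4, 5)
print(len(groups), groups[0])
```

Each CLB still belongs to exactly one row PG and one column PG, but which CLBs share a PG is no longer evident from their positions in the array.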
Tamper masking. In an ECC for information bits, the flipping of two bits in the same parity group avoids detection in the PG (but not necessarily in the entire ECC, since these two flipped bits could also be present in other disjoint parity groups, where their errors can be detected if no other bits have flipped in these groups). Such a phenomenon of nondetection due to multiple bit errors is called masking; in such cases the errors are said to mask each other. Similarly, masking can occur in CLB PGs due to multiple tampers, so that across all test vectors the output parities of a PG with tampers remain unchanged. Let $f_{i_1}, \ldots, f_{i_t}$ be CLB outputs that belong to the same PG, and let there be tampers in the CLB logic used to generate these outputs. We denote the tampered outputs by $\tilde{f}_{i_1}, \ldots, \tilde{f}_{i_t}$. These tampered outputs are said to mask each other if

$$\tilde{f}_{i_1}(I_r) \oplus \cdots \oplus \tilde{f}_{i_t}(I_r) = f_{i_1}(I_r) \oplus \cdots \oplus f_{i_t}(I_r) \quad \text{for each test vector } I_r, \qquad (2)$$

since under this condition, it is clear that these tampers will not change the parity of the PG for any test vector. We have been able to determine two general approaches to inserting tampers that mask each other in a PG, and these are described next.
(2) Inserting tampers in an even number of outputs in a PG so that each output function is complemented also causes the tampers to mask each other.
Since $a \oplus b = \bar{a} \oplus \bar{b}$, we satisfy Eq. (2) if the number $s$ of outputs tampered with in this manner is even.
From the second masking-tamper pattern, it is possible to extrapolate the following ECC-oblivious scheme for tampering with all functional outputs of CLBs such that they are never detected by our ECC-based technique: complement all CLB outputs. However, this will not work if any PG has an odd number of functions in it (including the check function), since Eq. (2) is not satisfied when $\tilde{f}_{i_j} = \bar{f}_{i_j}$ and $t$ is odd (e.g., $a \oplus b \oplus c \neq \bar{a} \oplus \bar{b} \oplus \bar{c}$); thus such a tamper pattern will be detected in PGs with odd cardinality. Thus, in order to defeat the aforesaid strategy we will always embed an $m' \times n'$ 2D code in an FPGA, where at least one of $m'$ and $n'$ is odd; we will thus have PGs with odd cardinality of $m'$ and/or $n'$. It might be argued that in such PGs, the adversary can insert the "complementing" tampers in only an even number of functions in each PG. But there are two problems with this strategy: (a) since PGs of an ECC intersect each other in complex ways, it may not be possible to satisfy the requirement of having an even number of complementing tampers in each PG; (b) even if this were possible, the adversary still has to know the ECC structure (its PGs), and, as we determined earlier in this section, this is a practically impossible task. Note that the first masking-tamper pattern also requires knowledge of the embedded ECC structure.
Finally, consider two output functions $f_1$ and $f_2$ that belong to the same PG. If they are neither identical nor have complementing tampers, then it seems extremely difficult to tamper with the logic of these functions so that $\tilde{f}_1(I_r) \oplus \tilde{f}_2(I_r) = f_1(I_r) \oplus f_2(I_r)$ for each test vector $I_r$. Thus, it appears that the only viable way for an adversary to insert undetectable tampers is to insert the aforementioned two types of tampers randomly (since the embedded ECC structure will not be known) and hope that there will be an even number of tampers (including 0 tampers) of each of these types in each PG. We note that this pattern of an even number of tampers in each PG is a necessary condition for undetectable tampers irrespective of the tamper type; that is, tampers of any kind, not only those in our two categories of masking tampers, must satisfy this pattern across the entire embedded ECC in order to be undetectable. Thus the even-tampers-in-each-PG pattern is a necessary condition for undetectability (masking), but not a sufficient one. Sufficiency for tamper undetectability is provided by the combination of an even tamper pattern and the tamper type belonging to one of our two categories (or to a mix of the two categories, in which case it is necessary and sufficient to have an even number of tampers of each type in each PG).
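The effect of PG cardinality on the complementing tamper can be checked directly. The sketch below, with a hypothetical three-function PG given as truth tables, complements an even and then an odd number of outputs and compares the resulting parity vectors.

```python
def parity_vector(tables):
    """Parity (XOR) of all PG outputs, one bit per test vector."""
    return tuple(sum(bits) % 2 for bits in zip(*tables))

def complement_outputs(tables, indices):
    """Complement the truth tables at the given positions (the tamper)."""
    return [tuple(1 - b for b in t) if i in indices else t
            for i, t in enumerate(tables)]

# Hypothetical PG: two circuit functions and their even-parity check function.
AND, OR = (0, 0, 0, 1), (0, 1, 1, 1)
CHECK = tuple(a ^ b for a, b in zip(AND, OR))
pg = [AND, OR, CHECK]

expected = parity_vector(pg)
even_tamper = parity_vector(complement_outputs(pg, {0, 1}))     # 2 outputs
odd_tamper = parity_vector(complement_outputs(pg, {0, 1, 2}))   # all 3 outputs

print(expected, even_tamper, odd_tamper)
```

Complementing two outputs leaves the parity vector unchanged (masked), whereas complementing all three flips every parity bit, so the "complement everything" attack is caught in any PG of odd cardinality.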
Masking probability. Since, as we have determined, an adversary cannot, for all practical purposes, extract the randomly embedded ECC's structure from the FPGA circuit, he/she has no choice but to throw in tampers at random on the CLB array and hope for masking. We also note that the minimum number of tampers required for a 2D code to meet the even-tampers-per-PG pattern is four, as illustrated in Figure 3(a): two tampers in each PG of an intersecting group of 4 PGs. Furthermore, to have the highest probability of masking he/she should have exactly 4 tampers; fewer than 4 means this probability is 0, while more than 4 means every subset of 4 tampers should conform to the masking pattern in the embedded 2D code shown in Figure 3(a), and the probability of such an event is less than that of exactly 4 tampers fitting this pattern. We thus analyze the probability of masking $p_{mask}$ with 4 tampers in an m × n 2D code with tampers thrown in at random. A masking pattern is a 2 × 2 subarray of the 2D code, and a random insertion of 4 tampers will conform to this pattern with a probability of

$$p_{mask} = \frac{\binom{m}{2}\binom{n}{2}}{\binom{mn}{4}};$$

the numerator is the number of 2 × 2 subarrays, while the denominator is the number of ways in which the 4 tampers can be distributed in the CLB array.
A medium-size FPGA like the Spartan-3 that we have used for our experiments has a 24 × 20 CLB array, and thus $p_{mask}$ with a randomized 2D code mapped to it is $2.4 \times 10^{-5}$. While this probability is minuscule, in our techniques it is even lower by orders of magnitude due to the following.
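The quoted figure can be reproduced from the counting argument in the text: the number of 2 × 2 subarrays (choose 2 rows and 2 columns) divided by the number of ways to place 4 tampers among the mn array positions. A minimal Python check, with an illustrative function name:

```python
from math import comb

def p_mask_2d(m, n):
    """Probability that 4 random tampers form a 2 x 2 masking pattern
    in an m x n 2D code (pattern count / placements of 4 tampers)."""
    return comb(m, 2) * comb(n, 2) / comb(m * n, 4)

p_spartan3 = p_mask_2d(24, 20)
print(f"p_mask for a 24 x 20 CLB array = {p_spartan3:.2e}")
```

For the 24 × 20 array this evaluates to about 2.4 × 10^-5, in agreement with the value stated above.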
(a) We actually map a 2D code of the appropriate size to the function output array of the CLBs (as opposed to the CLB array) (see footnote 5); thus if there are $t$ outputs per CLB, then close to a $\sqrt{tmn} \times \sqrt{tmn}$ 2D code is mapped to the output array (a square 2D code minimizes $p_{mask}$; see footnote 6). Thus

$$p_{mask} \approx \frac{\binom{\sqrt{tmn}}{2}^{2}}{\binom{tmn}{4}}.$$

For the Spartan-3, each CLB has 28 outputs, and assuming roughly 70% of the outputs are used, $t \approx 20$, a 100 × 96 2D code is mapped to the output array, and $p_{mask} \approx 6.4 \times 10^{-8}$.
(b) As we will see in Section 5, the CLB outputs are partitioned into four categories based on their function structures, and four independent 2D codes are mapped to these four output arrays. Assuming for simplicity that the four output arrays are of the same size ($tmn/4$), $p_{mask}$ for 4 tampers = (the probability that all 4 tampers fall in one output array) × (the probability of a masking pattern in the containing output array). Thus

$$p_{mask} \approx 4\left(\frac{1}{4}\right)^{4} \cdot \frac{\binom{\sqrt{tmn/4}}{2}^{2}}{\binom{tmn/4}{4}} = \frac{1}{64} \cdot \frac{\binom{\sqrt{tmn/4}}{2}^{2}}{\binom{tmn/4}{4}}.$$

For the Spartan-3, we have a 50 × 48 size for each of the four output arrays, and $p_{mask} \approx 16 \times 10^{-9}$.
Footnote 5: Consider a 2 × 2 CLB array C(1, 1), C(1, 2), C(2, 1), C(2, 2), and let each CLB C(i, j) have two outputs $f^1_{i,j}$ and $f^2_{i,j}$. Thus we have a 4 × 4 output array and we map a 4 × 4 2D code to this output array (instead of mapping a 2 × 2 2D code to the CLB array).

Footnote 6: Assume there are $M$ outputs and we map an $a \times M/a$ 2D code to the array of $M$ outputs. Expressing $p_{mask}$ as a function of $a$ and setting its derivative with respect to $a$ to 0, we get $a = \sqrt{M}$ for minimizing $p_{mask}$.

In general, to obtain a certain trust or detection probability $P_d = 1 - p_{mask}$, we can partition each of the four subarrays into smaller subarrays and embed separate 2D codes (or any other ECC like the full-2, full-3, or 3D code [Gibson et al. 1989; Dutt and Mahapatra 1997]) in each such subarray. With $k$ CLB function output subarrays of equal size, $p_{mask}$ is given by

$$p_{mask} \approx \frac{1}{k^{3}} \cdot \frac{\binom{\sqrt{tmn/k}}{2}^{2}}{\binom{tmn/k}{4}},$$

and thus the required $k$ can be determined from the desired $P_d$.
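The partitioned case can be explored numerically with a small Python sketch. It assumes, as the text does for simplicity, k equal-size output subarrays, each holding a near-square 2D code, and it models "all 4 tampers fall in one subarray" as k(1/k)^4; the function names, the rounding of subarray dimensions, and the target value in the example are illustrative assumptions.

```python
from math import comb, isqrt

def p_mask_2d(rows, cols):
    """Masking probability of 4 random tampers in one rows x cols 2D code."""
    return comb(rows, 2) * comb(cols, 2) / comb(rows * cols, 4)

def p_mask_partitioned(total_outputs, k):
    """k equal subarrays, each with a near-square code (assumed layout)."""
    size = total_outputs // k
    rows = isqrt(size)
    cols = size // rows
    # P(all 4 tampers in the same subarray) ~ k * (1/k)**4 = 1 / k**3
    return p_mask_2d(rows, cols) / k**3

def smallest_k(total_outputs, target_p_mask, k_max=1024):
    """Smallest k whose masking probability does not exceed the target."""
    for k in range(1, k_max + 1):
        if p_mask_partitioned(total_outputs, k) <= target_p_mask:
            return k
    return None

p_one = p_mask_2d(100, 96)                 # single 100 x 96 code
p_four = p_mask_partitioned(9600, 4)       # four 48 x 50 codes
k_needed = smallest_k(9600, 1e-9)          # hypothetical target p_mask
print(f"{p_one:.2e} {p_four:.2e} k = {k_needed}")
```

With 9600 outputs, the single-code and four-code cases come out near the 6.4 × 10^-8 and 16 × 10^-9 figures quoted for the Spartan-3.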
THE SPARTAN-3 FPGA CLB ARCHITECTURE
We briefly cover relevant features of the CLB architecture of the Xilinx Spartan-3 family of FPGAs for which we have instantiated our trust design and checking techniques. The CLB architectural features of another popular family of Xilinx FPGAs, the Virtex-4, are similar, and our trust design and checking techniques are easily extendible to it. An FPGA is an m × n array of CLBs. The CLBs can be programmed to implement combinational or sequential logic functions. All the CLBs are identical before they are configured. There are also input/output blocks (IOBs) and routing channels to provide an interface between the package pins and the internal circuit. Each CLB comprises four interconnected slices, as shown in Figure 5. All four slices have the following elements: two logic function generators or lookup tables (LUTs), two storage elements, wide-function multiplexers, and carry logic, as shown in Figure 6. A slice has 16 inputs and 9 outputs. All the inputs/outputs are externally accessible, except inputs CIN, FXINA, FXINB, and F5 (shown in Figure 6), which can be accessed only internally within the CLB. We define an input/output as externally accessible (EA) if it can be connected to a routing channel (via a switch matrix).
Combinational logic functions. As we can see from Figure 6 , a LUT has four inputs, and thus can implement at most a 4-input function. Moreover, if we drive the outputs of the two LUTs to the 2:1 multiplexer (mux) F5MUX, then, also counting its select input, its output F5 (which is non-EA) will be at most a 9-input function. If needed, F5 can be transmitted via another 2:1 mux FXMUX to its EA output X in order to access external routing.
Sequential logic functions. The output of some combinational logic components in a slice drives the D-input of storage elements whose outputs are thus sequential. As we can see from Figure 6 , the storage element has four more input signals: CK, CE, SR, REV. CK is the clock input signal; CE is the clock enable signal. SR is a Set/Reset signal (its function is determined by a configuration bit): When SR is asserted, the storage element is Set or Reset depending on its configuration. REV is a reverse signal which is used together with SR to force the storage element into the state opposite from that achieved by asserting SR. In the rest of the article, we omit REV, as it is related to SR (see footnote 7). So taking the SR and CE signals into account, the sequential logic in one slice can be at most an 11-input function, since its data input can at most be a 9-input function.
Dedicated multiplexers. Each slice has a F5MUX and a second expansion mux, called the FiMUX, that functions as either an F6MUX, F7MUX, or F8MUX; their interconnections are shown in Figure 7 . Each FiMUX receives inputs from muxes in the previous level; for example, the two F6MUX outputs drive the F7MUX. As mentioned earlier, F5 can be at most a 9-input function.
With the F6MUX, F7MUX, and F8MUX interconnections in a CLB being tree structured as shown in Figure 7 , and noting that every mux has a select input, the outputs of F6MUX, F7MUX, and F8MUX can be at most 19-input (9+9+1), 39-input, and 79-input functions, respectively.
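The input counts quoted above follow a simple recurrence: each 2:1 mux combines two functions from the previous level and adds one select input. A short Python check (the helper name is illustrative):

```python
LUT_INPUTS = 4  # a Spartan-3 LUT implements at most a 4-input function

def mux_level_inputs(child_inputs):
    """Max inputs of a 2:1 mux output fed by two child functions plus select."""
    return 2 * child_inputs + 1

f5 = mux_level_inputs(LUT_INPUTS)   # F5MUX over two LUTs
f6 = mux_level_inputs(f5)           # F6MUX over two F5 outputs
f7 = mux_level_inputs(f6)           # F7MUX over two F6 outputs
f8 = mux_level_inputs(f7)           # F8MUX over two F7 outputs

print(f5, f6, f7, f8)
```

The count roughly doubles at every level, which is what makes exhaustive checking of wide mux-tree outputs through their EA inputs impractical.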
Carry logic. In a slice, seven multiplexers CYMUXG, CYSELG, CY0G, CYMUXF, CYSELF, CY0F, and CYINIT control the carry output COUT, as shown in Figure 6. When forming a carry chain through multiple slices in a column of CLBs, the initial carry-in is BX, an EA input. In the rest of the carry chain, the COUT of a slice is connected to the next slice's CIN, which is a non-EA input. At the end of the carry chain in the last CLB of the column, the COUT of the last slice serves as the carry-out of this carry chain, as shown in Figure 12.
ISSUES IN TPG DESIGN
We discuss here various issues related to the design of TPGs with the goal of reducing TPG size (number of output bits), check time, and check logic complexity.
Efficient Controllability of Non-EA Logic Inputs
Non-EA signals into a slice (e.g., FXINA, FXINB, CIN, F5) come from other slices, and thus, depending on the slice's configuration, its output(s) can be functions of non-EA inputs; let $f$ be one such output. These inputs are not directly accessible by the TPG for checking such slice outputs. The non-EA inputs will, however, be functions of EA inputs of the slices they come from, and if we propagate backwards from all non-EA inputs of $f$ until we reach all EA inputs (of adjacent/connected slices), then we will find a set of EA inputs, some in other slices, that $f$ is a function of (see Figure 11(a)). A straightforward way to check $f$ is then for the TPG to provide test vectors to feed these EA inputs. Unfortunately, the number of these inputs could be very high (e.g., 79 inputs for the output Y of F8MUX, as discussed in Section 3, when its inputs come from a mux-tree configuration formed of muxes in multiple slices; see Figure 7), thus making it not only expensive in terms of TPG cost but also practically impossible to check exhaustively.
An alternative approach to checking such $f$'s is to reconfigure, for the purpose of trust checking, the slices that generate the non-EA inputs to $f$'s slice so that the non-EA inputs are functions of very few EA inputs of their slices; see the conceptual depiction of this approach in Figure 11(b). This makes $f$ a function of a few EA inputs in the locally reconfigured FPGA, making it much less expensive and more practical to check $f$; note that $f$'s slice is not reconfigured, and the checking process can thus verify the correctness of the configuration of $f$'s slice, which is the purpose of the trust checking of $f$. Since $f$ will be a function of not too many EA inputs for the purpose of checking, the check logic will also be a function of only a few inputs. Thus this local-reconfiguration-based approach for slice outputs with non-EA inputs reduces both TPG and check logic cost, and also makes exhaustive checking (generation of all input vectors to check a circuit slice output) practical. Further details of this approach are given next. We define uncuttable routings as routings from outputs of a slice to non-EA inputs of another slice in the same or a different CLB; see Figure 11. Due to the presence of inaccessible inputs of a slice, we divide the configuration of a slice into three parts for the purpose of efficient trust checking.
(1) Fully accessible configuration (FAC): a configuration (logic function) for which all inputs are EA inputs (Figure 8).
(2) Partially accessible configuration (PAC): a configuration for which some inputs are non-EA inputs (Figure 11(a)).
(3) Uncuttable routings.
In Figure 11(a), the PAC has non-EA inputs $Y_1, \ldots, Y_m$ and EA inputs $Z_1, \ldots, Z_k$. We can see that the non-EA inputs $Y_1, \ldots, Y_m$ of the PAC in slice 2 are connected to the outputs of the FAC in slice 1 through uncuttable routings. Thus $Y_1, \ldots, Y_m$ are functions of the FAC's inputs $X_1, \ldots, X_n$. To check this PAC in a straightforward, though expensive, manner, we will require $n + k$ TPG outputs to drive these EA inputs. This is generally much larger than would be required if we had direct access to the $m$ non-EA inputs (in which case we would require $m + k$ TPG bits), since in most cases $n \gg m$. In Step B of the TPG connection determination algorithm, R&T, given in Figure 10, we give an approach to reduce the required TPG outputs (test vector size) for checking PACs.
Example of step B.1 of Algorithm R&T (Figure 10). Consider the checking of the F6MUX, which is a PAC. As we can see from Figure 9(a), the F6MUX with output Y has non-EA inputs FXINA and FXINB that are connected to two F5 outputs, which are outputs of F5MUXes. As we mentioned in Section 3, F5 can be a function of at most 9 EA inputs (see Figure 9(a)). Thus Y can be a function of at most 19 EA inputs, which means that we need a 19-bit TPG in order to check Y via all its EA inputs. Furthermore, the parity check function of the parity group to which Y belongs will be complex (at least a 19-input function). However, as shown in Figure 9(b), for trust checking, we can reconfigure the LUTs feeding the F5MUXes so that LUT 1 generates 0 and LUT 2 generates 1 (both 0-input functions). For such a configuration, F5 = BX (the select input of each F5MUX) and hence F5 will be a function of only one input. As a result, Y will be a function of just 3 EA inputs (the BX select inputs of the two feeding F5MUXes and the F6MUX's own select input BY). We thus decrease the number of test vector bits needed to check Y from 19 to 3. Note that the original configurations of LUT 1 and LUT 2 are FACs and are checked in a separate FPGA configuration in which the original configurations of LUT 1 and LUT 2 are maintained, and the output of F5MUX is driven to its EA output.
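The reconfiguration in this example can be illustrated with a small behavioral model. The sketch below is only an illustration under stated assumptions: the helper names (`mux2`, `f6mux_under_check`) are hypothetical, and the select polarity of the F6MUX (BY = 0 selecting FXINA) is assumed rather than taken from the device documentation. It models LUT 1 as the constant 0 and LUT 2 as the constant 1, so that each F5 output equals its BX input.

```python
from itertools import product

def mux2(d0, d1, sel):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def f6mux_under_check(bx_a, bx_b, by):
    """Y of the F6MUX when the feeding LUTs are reconfigured for checking.

    LUT 1 is the constant 0 and LUT 2 the constant 1, so F5 = BX in each
    feeding slice. Assumed polarity: BY = 0 selects FXINA, BY = 1 FXINB.
    """
    fxina = mux2(0, 1, bx_a)  # F5 of the first feeding slice
    fxinb = mux2(0, 1, bx_b)  # F5 of the second feeding slice
    return mux2(fxina, fxinb, by)

# Exhaustive check needs only 2**3 = 8 test vectors instead of 2**19.
table = {bits: f6mux_under_check(*bits) for bits in product((0, 1), repeat=3)}
for bits, y in sorted(table.items()):
    print(bits, y)
```

With this model a 3-bit counter is enough to exercise every input combination of Y, which is the reduction from 19 to 3 test vector bits described in the example.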
Determining TPG Size
We define two slice outputs X and Y to be independent if no other slice output depends on both X and Y . Note that this requires that the PACs and their driving FACs (that need to be reconfigured) cannot be in the same PG, as they cannot be tested together. Our PG determination step takes care of this constraint. The argument for sharing a TPG output among independent outputs extends to min(m, n) TPG bits for min (m, n) 
We perform exhaustive checking of each CLB output, and thus the TPG is a simple counter; we define the size T of the TPG as the number of counter bits. T will be the larger of T 1 and T 2 , where T 1 is the TPG size determined in step A and T 2 is the TPG size determined in step B of algorithm R&T (Section 4.1). As we mentioned in Section 3, the combinational output corresponding to a FAC will be a function of at most 9 inputs, and a sequential logic output will depend on 11 inputs at most. So T 1 is at most 11. We determine T 2 , the TPG size required to check PACs, as follows. As we discussed in the example for step B of algorithm R&T, the output of F6MUX can be made to depend on only three inputs during checking. Using the same approach as for checking F6MUX and referring to Figure 7 , an output of F7MUX can be made to depend on seven inputs (3+3+1), and similarly the output of F8MUX can be made a function of 15 inputs. Also, as we can see in Figure 12 , the carry logic takes an EA input BX of slice 1 as a carry-in and COUT of slice N as the carry-out of this carry chain. Through reconfiguring the FAC and PAC in the carry chain, each COUT in it can be made equal to the input CIN. Thus the final COUT of the carry chain can be made dependent on just one EA input, BX. The preceding cover all non-EA inputs; thus T 2 is at most 15, and the upper bound for T is also 15.
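The input counts quoted in this paragraph follow from one recurrence over the mux tree: each mux level combines two subtrees and adds its own select input. The following sketch reproduces the numbers; the function name `mux_tree_inputs` and the variable names are hypothetical, and the values 9, 11, and 1 (carry chain) are the ones stated in the text.

```python
def mux_tree_inputs(f5_inputs, levels=3):
    """Worst-case input counts at F6MUX, F7MUX, F8MUX for a given F5 fan-in."""
    counts = []
    n = f5_inputs
    for _ in range(levels):
        n = 2 * n + 1  # two data subtrees of n inputs each, plus one select
        counts.append(n)
    return counts

normal_mode = mux_tree_inputs(9)  # F5 is a function of up to 9 EA inputs
check_mode = mux_tree_inputs(1)   # after reconfiguration, F5 = BX (1 input)

T1 = 11                           # largest FAC: sequential output, 11 inputs
CARRY_CHAIN_INPUTS = 1            # final COUT made to depend on BX only
T2 = max(check_mode + [CARRY_CHAIN_INPUTS])
T = max(T1, T2)

print(normal_mode, check_mode, T)
```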
If we can determine a slice output f max in the application circuit that is a function of the largest number of inputs (either as a FAC or a PAC with reconfigured FACs feeding it as described in algorithm R&T), then we can determine the exact size of the TPG required as the number of these inputs (≤ 15 from the previous analysis). This is because f max has to be independent of all the other outputs. Suppose this is not the case. Then there must be another output g which depends on f max and at least one other slice output, so the input size of g must be larger than the input size of f max , which contradicts our assumption that f max is a function of the largest number of inputs. Thus, since all other slice outputs are independent, with respect to f max , by Lemma 4.1, the inputs of any of these functions can share TPG outputs with f max 's inputs (if they are in the same PG).
PARITY GROUP PATTERNS AND THEIR CHECK FUNCTIONS
Based on the number k of EA slice inputs that a slice output O i depends on (either when O i 's logic is a FAC or a PAC with local reconfiguration of feeding FACs as described in Algorithm R&T), and its logic configuration structure, we categorize four parity group patterns (PGPs). Each parity group is then formed with slice outputs of the same PGP in order to minimize the complexity of the corresponding check function. Here we discuss the check function complexity for parity groups belonging to each PGP for the case of even-parity functions. We will later extend these to random-parity functions (see Section 6). 
PGP 1: Combinational Logic Functions with k ≤ 4
The first PGP is characterized by combinational outputs of no more than four inputs. Suppose there are t outputs O 1 , O 2 , . . . , O t , all of them independent, each a combinational logic output of at most four inputs. According to Lemma 4.1, the TPG size needs to be at most four so as to feed the inputs of all the O i 's in order to check them simultaneously as outputs in the same parity group. The check slice output function O c will be

$$O_c = O_1 \oplus O_2 \oplus \cdots \oplus O_t,$$
and thus O c will also be a function of no more than four inputs (note again that during checking, the inputs of each O i are fed by the same test vectors; see Figure 3 (a)). Since a LUT can implement any function of four inputs, for a parity group belonging to PGP 1, we just need one LUT to implement the check function of all t outputs.
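A minimal sketch of this construction, assuming each 4-input LUT is represented by a 16-bit truth table whose bit j holds the output for input vector j (a representation chosen here for illustration, not taken from the article). The three example tables and the helper names are hypothetical.

```python
from functools import reduce
from operator import xor

def lut_eval(truth_table, vec):
    """Output of a 4-input LUT (16-bit truth table) for input vector vec."""
    return (truth_table >> vec) & 1

def even_parity_check_lut(truth_tables):
    """Truth table of O_c: the bitwise XOR of all member truth tables."""
    return reduce(xor, truth_tables, 0)

# Three illustrative 4-input functions sharing the same TPG-fed inputs.
group = [0x8000,  # 4-input AND
         0xFFFE,  # 4-input OR
         0x6996]  # 4-input XOR
check = even_parity_check_lut(group)

# The ORA sees even parity over (O_1, ..., O_t, O_c) for every test vector.
ora = [reduce(xor, (lut_eval(tt, v) for tt in group + [check]), 0)
       for v in range(16)]
print(hex(check), ora)
```

Because the check function is itself a 16-bit truth table, it fits in a single LUT regardless of how many outputs t are in the group.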
PGP 2: Sequential Logic Functions with k ≤ 6
PGP 2 is characterized by outputs of storage elements whose data input (D input) is a combinational logic function of at most four inputs; see Figure 13 .
The sequential logic may have two more input signals, SR and CE. The combinational logic part of a PGP-2 output is the data input to its storage element, which may be configured as a flip-flop or a latch. We obtain the XOR of all the combinational logic parts of PGP-2 outputs to determine the combinational logic part of their check function. Then, as shown in Figure 13 , we connect this combinational parity check function to the data input of a storage element to form the final sequential parity check function of the PGP-2 outputs. Thus, we need just one LUT and one storage element for a PGP-2 parity check function.
We term the outputs of storage elements as SE outputs and the outputs of any combinational logic as non-SE outputs. For an SE output O i , we denote the combinational logic function input to its storage element as comb (O i ), as shown in Figure 13 . In an SE output, besides the combinational logic data input, there are two more inputs that affect the output: SR and CE that were introduced in Section 3. For an SE output, there are thus four different sequential (SE) configurations: (1) It has both SR and CE; (2) it has SR only; (3) it has CE only; (4) it has neither SR nor CE. If the storage element does not have SR or CE inputs, it means that the storage element will not be Set/Reset or that the clock is always enabled. It so happens that SE outputs with different SE configurations cannot be parity-checked simultaneously, as we prove next. The previous lemma implies that SE outputs with different SE configurations will need to be partitioned into different parity groups, which will increase the total check function cost. However, this requirement can be circumvented by configuring in, during the trust-checking phase, SR and/or CE signals to those SE outputs which do not have them as inputs. This enables us to check all of them in one parity group, and in only one check slice configuration, as established in the next theorem. 
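The effect of giving every storage element in a PGP-2 group the same SR and CE signals during checking can be illustrated with a cycle-level simulation. This is a sketch under several assumptions that the text does not specify: SR is modeled as a synchronous reset to 0 with priority over CE, all storage elements start at 0, and the combinational parts are 16-bit truth tables. All names are hypothetical.

```python
import random
from functools import reduce
from operator import xor

def ff_next(q, d, sr, ce):
    """Next state of a storage element (assumed: SR resets to 0, beats CE)."""
    if sr:
        return 0
    return d if ce else q

def simulate(comb_tables, stimulus):
    """Return the ORA parity per cycle over all SE outputs plus the check SE."""
    check_table = reduce(xor, comb_tables, 0)   # XOR of all comb(O_i)
    tables = list(comb_tables) + [check_table]
    q = [0] * len(tables)                       # assumed common initial state
    parities = []
    for vec, sr, ce in stimulus:
        q = [ff_next(qi, (tt >> vec) & 1, sr, ce) for qi, tt in zip(q, tables)]
        parities.append(reduce(xor, q, 0))
    return parities

rng = random.Random(1)
stimulus = [(rng.randrange(16), rng.randint(0, 1), rng.randint(0, 1))
            for _ in range(200)]
parities = simulate([0x8000, 0xFFFE, 0x6996], stimulus)
print(set(parities))
```

In this model the parity stays even in every cycle because all storage elements, including the one in the check slice, reset, load, or hold together.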
PGP 3: Shannon's Expansion Functions with 4 ≤ k ≤ 11
As we mentioned in Section 3, when driving two LUTs' outputs to F5MUX, we can get at most a 9-input function at the output of F5MUX. We term the output of F5MUX as an "intermediate" output f v . It can serve as a D-input to the storage element in the slice and thus become an SE output XQ (which can be at most an 11-input function if it has SR and CE inputs), or it can bypass the storage element and be a non-SE output X, as shown in Figure 14 . f v can be represented as a Shannon's expansion expression:
where I is the mux's select signal, and f and g are the functions implemented by the two LUTs driving the data inputs of F5MUX. Outputs that fall into PGP 3 are those that are directly driven by f v (X in Figure 14) or the output of a storage element driven by f v (XQ in Figure 14). Consider s such intermediate outputs of F5MUXes. We have
. . .
The parity check function O c of these intermediate outputs is
We prove the following result about the structure of O c . 
PROOF. We prove this by induction on s. Induction Basis. We prove the theorem statement for s = 2.
Induction Hypothesis. The theorem statement is true for s = k. Induction
Step. We need to show that, given the basis and hypothesis, the theorem statement is true for s = k + 1.
( from the induction hypothesis )
It follows from the preceding theorem that
where f c = f 1 ⊕ f 2 ⊕ . . . ⊕ f s , and g c = g 1 ⊕ g 2 ⊕ . . . ⊕ g s . Thus, O c also has the same Shannon's expansion function structure as the O i 's, and can thus be configured in just one slice as the F5MUX output with the mux's inputs driven by its two LUTs that implement f c and g c (similar to the configuration shown in Figure 14) .
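The structure-preservation property can be checked exhaustively for a small group. The sketch below assumes truth-table LUTs and an arbitrary select polarity (select = 1 picks the f LUT); the property does not depend on which polarity is used. The three (f, g) pairs and all function names are hypothetical examples.

```python
from functools import reduce
from itertools import product
from operator import xor

def lut(truth_table, x):
    """Output of a 4-input LUT for input index x (0..15)."""
    return (truth_table >> x) & 1

def f5mux(f, g, sel, xf, xg):
    """F5MUX output; assumed polarity: sel = 1 selects f, sel = 0 selects g."""
    return lut(f, xf) if sel else lut(g, xg)

# s = 3 intermediate outputs, each given by its (f_i, g_i) LUT pair.
pairs = [(0x8000, 0x0001), (0xFFFE, 0x6996), (0x1234, 0xABCD)]
f_c = reduce(xor, (f for f, _ in pairs), 0)  # f_c = f_1 xor ... xor f_s
g_c = reduce(xor, (g for _, g in pairs), 0)  # g_c = g_1 xor ... xor g_s

def structure_preserved():
    """True if the XOR of all f_v's equals one F5MUX built from f_c and g_c."""
    for sel, xf, xg in product((0, 1), range(16), range(16)):
        parity = reduce(xor, (f5mux(f, g, sel, xf, xg) for f, g in pairs), 0)
        if parity != f5mux(f_c, g_c, sel, xf, xg):
            return False
    return True

print(structure_preserved())
```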
Unlike PGP 1 and PGP 2, we will not check a set of PGP-3 SE outputs and non-SE outputs separately (in two separate parity groups), because this would require two check slices. Our technique for checking PGP-3 outputs is as follows.
(1) During the trust-checking process, change, if necessary (i.e., if the original configuration is not the following desired configuration) the configuration of each PGP-3 output so that both X and XQ are available at the corresponding slice outputs as shown in Figure 14 (the original output of the slice could have been only X, only XQ, or both).
(2) We configure the check slice in the same manner so that we can check the X and XQ outputs corresponding to each O i simultaneously (i.e., in one parity group and in one check configuration). Thus, the f v of the check slice is configured as the XOR function of all the f v 's in the parity group as derived earlier. Since the f v 's drive the X outputs directly, the X output of the check slice is the parity function of the X outputs in the parity group. (3) For the corresponding SE outputs XQ, their data inputs are the corresponding f v 's, and similar to PGP-2 outputs, and as spelled out in the proof of Theorem 5.2, during the trust-checking phase, SR and CE signals are added to those XQ's which do not have them, and also to the check slice's XQ output. Thus by Theorem 5.2, the XQ output of the check slice is always the parity of all the XQ's in the PGP-3 parity group for each of the four combinations of SR and CE values.
We are thus able to check both SE outputs and non-SE outputs of this parity group simultaneously. Note that we may check more X and XQ outputs than there are in the original circuit, but we check all the original X and XQ outputs as well.
PGP 4: Complex Output Functions with k > 4
There can be very complex functions of more than four inputs that do not fit into PGP-3 categorization. For example, the output of F8MUX can be a 15-input function during trust checking with local reconfiguration (as described in Section 4), and while the output of the corresponding parity check function of such functions will be a 15-input function as well, it may not be implementable in a single slice. Unlike the PGP-3 case, the function structure may not be preserved when XOR'ing such 15-input functions. It is thus not possible to determine the configuration required to implement parity check functions analytically, as we have done for PGPs 1-3. We need to use synthesis tools to determine the implementation of these parity check functions, and their implementation may require one or more slices.
TACKLING TPG/ORA TAMPERING VIA RANDOM PARITY POLARITIES
Our discussion so far assumed that the TPG and ORA are tamper free. However, if an adversary is able to obtain the TPG/ORA designs, he/she can tamper with them so that the actual circuit tampers go undetected. For example, a simple tamper of an ORA in which its output is changed to implement the zero function will lead to all parities being detected as even, in spite of the presence of circuit tampers and the resulting erroneous slice outputs. Similarly, the TPG can be tampered with to skip exactly those test vectors that exercise tampers inserted in the circuit CLBs, so that these tampers are not detected. The TPG and ORA may also be tampered with by the device programming unit during the loading of their configuration bits for trust checking. We present here a technique using random-parity vectors that ensures very high probability detection of design tampers, even in the presence of tampers in the checking circuit (TPG and/or ORA).
Random parities. Consider that for a parity group $G_i$, the correct TPG produces the test vector sequence (TVS) $S = (I_0, I_1, \ldots, I_{2^N-1})$, where N-bit input vectors are needed to check each CLB output in $G_i$. A tampering of the TPG may result in an incorrect $2^N$-vector TVS $S' = (I_{j_0}, I_{j_1}, \ldots, I_{j_{2^N-1}})$. In such a sequence, for example, vectors $\{I_k\}$ that exercise tampers inserted in circuit CLBs may not be present, and thus some input vectors are repeated one or more times in the tampered TVS. Let the CLB outputs in $G_i$ be $O_1, \ldots, O_t$, and its check function be $O_c$. If $O_c$ is an even-parity function (the type of check function we have seen so far), then since $S'$ does not exercise the tampers in $G_i$, the outputs of $G_i$ will be correct over all vectors in $S'$, and the ORA will detect even parity for inputs $O_1, \ldots, O_t, O_c$, so that the tamperings in $G_i$, as well as in the TPG, go undetected.
The way around this problem is to design the check function for each parity group so that the designed-for parity for each input vector is randomly chosen to be odd or even. We term such a check function a random-parity function. For a parity group $G_i$, let $P_{even}$ be the sum-of-products expression of a set of minterms corresponding to randomly chosen input vectors of $G_i$ for which the check function $O_c$ is to be an even-parity function. We define a parity vector $PV(G_i)$ of $G_i$ as the bit vector $(a_0, a_1, \ldots, a_m)$, where $a_j = 0$ (respectively, 1) if $O_c$ is an even- (respectively, odd-) parity function for the j'th input vector $I_j$, and $m = 2^N - 1$. In the tamper-free case, $PV(G_i)$ will be the output vector produced by the ORA during trust checking across all input vectors in $G_i$.
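A random-parity check function can be sketched by XORing the even-parity check value with a randomly chosen bit per input vector. The code below is an illustration only: the truth-table representation, the fixed seed, the example functions, and all names are assumptions made here. It also replays a tampered TVS in which one vector is replaced by another, which is masked exactly when the two vectors happen to share the same designed-for parity.

```python
import random
from functools import reduce
from operator import xor

def group_parity(outputs, j):
    """XOR of all circuit outputs of the group for input vector index j."""
    return reduce(xor, ((tt >> j) & 1 for tt in outputs), 0)

def random_parity_check(outputs, n_inputs, rng):
    """Return (truth table of O_c, parity vector PV) for a random-parity group."""
    pv = [rng.randint(0, 1) for _ in range(2 ** n_inputs)]
    check = 0
    for j, a_j in enumerate(pv):
        check |= (group_parity(outputs, j) ^ a_j) << j
    return check, pv

def ora_trace(outputs, check, tvs):
    """ORA output sequence when the TPG applies the vectors listed in tvs."""
    return [group_parity(outputs, j) ^ ((check >> j) & 1) for j in tvs]

rng = random.Random(0)                     # fixed seed, illustration only
outputs = [0x8000, 0xFFFE, 0x6996]         # hypothetical 4-input functions
check, pv = random_parity_check(outputs, 4, rng)

correct_tvs = list(range(16))
tampered_tvs = correct_tvs.copy()
tampered_tvs[5] = 6                        # tampered TPG skips vector 5

tamper_free_ok = ora_trace(outputs, check, correct_tvs) == pv
masked = ora_trace(outputs, check, tampered_tvs) == pv
print(tamper_free_ok, masked)
```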
TPG-CLB masking probability. We define the distance $d(S_1, S_2)$ between two test vector sequences $S_1$ and $S_2$ as the number of positions in which they have different test vectors. We say that a tampered TVS $S'$ masks tampers in the circuit CLBs it checks if the ORA outputs seen during the checking of $G_i$ using $S'$ are identical to $PV(G_i)$. We term such maskings TPG-CLB masking, and note that these are different from the 2 × 2 patterned CLB-CLB masking (which also applies to random parities) discussed in Section 2. Since each $a_j \in PV(G_i)$ is randomly chosen to be 0 or 1 for determining $O_c$, the probability that two different test vectors fed to $G_i$ produce identical parities (odd or even) is 1/2. Thus the probability $p^{\text{mask}}_{\text{tpg-clb}}(G_i)$ of TPG-CLB masking is the probability $1/2^{d(S,S')}$ that $S$ and $S'$ will produce identical ORA outputs; since $d(S, S') \ge 1$, this probability is upper bounded by 1/2. However, as a practical matter, a disrupting circuit tamper, and especially the insertion of Trojans, can be detected by quite a few test vectors; let the number of these vectors be $m_{det}$. These test vectors will not be present in $S'$ (due to a careful tampering of the TPG by the adversary; otherwise there will not be a complete TPG-CLB masking), and thus $d(S, S') \ge m_{det}$ and $p^{\text{mask}}_{\text{tpg-clb}}(G_i)$ would be quite low. Furthermore, if we assume for simplicity that CLB outputs in each parity group are functions of the same number N of inputs, then $S$ is the tamper-free TVS for all parity groups. Thus, irrespective of whether a parity group is tampered with or not, the probability that the ORA will not detect any abnormality when the parity group is checked is $1/2^{d(S,S')}$. Hence if g is the number of parity groups, then the FPGA-wide TPG-CLB tamper masking probability $p^{\text{mask}}_{\text{tpg-clb}}$ is the probability that for each $G_i$, the ORA's output is the same as the expected parity vector $PV(G_i)$.
Thus, given that the random-parity vectors are chosen independently for each parity group, we have

$$p^{\text{mask}}_{\text{tpg-clb}} = \prod_{i=1}^{g} p^{\text{mask}}_{\text{tpg-clb}}(G_i) = \frac{1}{2^{\,g \cdot d(S,S')}} \le \frac{1}{2^{g}}.$$

In a medium-size FPGA such as the Spartan-3, as discussed in Section 2, there will in general be four 50×48 2D codes mapped to the CLB-output array, where each 2D code has 50+48+2 = 100 parity groups; thus g = 400 and $p^{\text{mask}}_{\text{tpg-clb}} \le 1/2^{400}$, a minuscule value.
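The arithmetic behind this bound can be reproduced exactly with rational numbers. The sketch below uses the counts given in the text (four 50×48 codes, 50+48+2 parity groups per code) and assumes the smallest possible distance of 1; the function name is hypothetical.

```python
from fractions import Fraction

def fpga_wide_masking_bound(num_codes, rows, cols, distance=1):
    """Return (g, bound) with g parity groups and bound = 2**-(g * distance)."""
    groups = num_codes * (rows + cols + 2)   # rows + cols + 2 groups per code
    return groups, Fraction(1, 2 ** (groups * distance))

g, bound = fpga_wide_masking_bound(num_codes=4, rows=50, cols=48)
print(g, float(bound))
```

A larger distance, for example d(S, S') ≥ m det, only shrinks the bound further, since the exponent scales with the distance.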
ORA-CLB masking probability. For a correct TVS $S$, we denote the output vector sequence of a parity group $G_i$ by $OV_i^S = (OV_0, \ldots, OV_{2^N-1})$, where $OV_j$ is the output corresponding to test vector $I_j$. If a tamper in the CLB logic corresponding to output function $O_k \in G_i$ produces an incorrect output for test vector $I_r$, then $OV_r$ will have an incorrect parity that will be detected by a tamper-free ORA. We design the ORA as a tree of 4-input XOR functions, where each XOR is implemented by a 4-input LUT. When some LUTs are tampered with, and one of the tampers is exercised by the incorrect output vector $OV_r$, then the ORA produces the reverse of the parity it should produce for the incorrect $OV_r$, and thus the error in $OV_r$ goes undetected; we term such a masking of the underlying tamper in the CLB logic that resulted in an incorrect $OV_r$ ORA-CLB masking. Irrespective of how the tampers are distributed in the ORA and the CLBs of the parity group $G_i$, we can capture their effect by two parameters x < 1 and y < 1, where x is the fraction of ORA inputs that exercise the tamper(s) in the ORA so that an incorrect ORA output is produced, and y is the fraction of output vectors in $OV_i^S$ that are erroneous with an odd number of errors (and hence detectable by a correct ORA).
The probability $p^{\text{mask}}_{\text{ora-clb}}(G_i)$ of masking a tampered parity group by a tampered ORA is the probability of a tampered ORA not producing any incorrect output for any output vector of a tampered parity group $G_i$, which equals (probability that all odd-erroneous output vectors in $OV_i^S$ exercise ORA tamper(s)) × (probability that no other output vectors in $OV_i^S$ exercise any ORA tamper). Thus
Obtaining the partial derivatives of Eq. (3) with respect to x and y, equating them to 0, and solving yields x = 1/2 and y = 1/2 for maximizing $p^{\text{mask}}_{\text{ora-clb}}(G_i)$. Thus we have

Fig. 15. Our ECC-based trust design flow. Steps 1 and 2 comprise the conventional design flow, and the actual trust design flow begins from step 3. It is assumed that the conventional design flow produces a correct design that is then used by the trust design flow to insert trust structures for detection of tampers that may be inserted after the trust design flow is completed (e.g., at any point in the system integration pathway, at the user's end, or by remote attacks using high-energy radiation such as electromagnetic pulses that can flip on-chip configuration bits).
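As a worked reading of the verbal definition of the ORA-CLB masking probability, the product of the two probabilities can be written out as follows. This is a sketch under an assumption that is not stated in the surviving text: each of the $2^N$ output vectors exercises an ORA tamper independently with probability x, so that $y\,2^N$ vectors are odd-erroneous and $(1-y)\,2^N$ are not. It is not a quotation of Eq. (3).

```latex
\begin{align*}
p(G_i) &= x^{\,y\,2^N}\,(1-x)^{(1-y)\,2^N}, \\
\ln p(G_i) &= 2^N\bigl[\,y\ln x + (1-y)\ln(1-x)\,\bigr], \\
\frac{\partial \ln p}{\partial y} = 0
  &\;\Rightarrow\; \ln x = \ln(1-x) \;\Rightarrow\; x = \tfrac{1}{2}, \\
\frac{\partial \ln p}{\partial x} = 0
  &\;\Rightarrow\; \frac{y}{x} = \frac{1-y}{1-x} \;\Rightarrow\; y = x = \tfrac{1}{2}, \\
p(G_i)\big|_{x=y=1/2} &= 2^{-2^N}.
\end{align*}
```

Under this reading, setting both partial derivatives to zero gives x = 1/2 and y = 1/2, which agrees with the values reported in the text.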
Finally, since in the previous expression for O c , each subexpression AND'ed with I and Ī is a 4-input function, O c for a random-parity PGP-3 group is also a Shannon's expansion function implementable in one slice with two LUTs and one mux driven by them (just as for the O c 's for an even-parity PGP-3 group).
Our trust design flow that uses the various concepts developed in this article is given in Figure 15 .
EMBEDDING CHECK COMPONENTS IN THE FPGA
We discuss here issues in the embedding of check components (check logic, TPG, and ORA) in the FPGA for the trust-checking phase, and the corresponding hardware overheads. There are two ways to embed check components into an FPGA circuit: one is integrated embedding, the other is nonintegrated embedding.
Integrated embedding. In this approach, we embed the check components into the unused CLBs in the FPGA along with the application circuit. During each parity group checking iteration, we just need to choose the corresponding check components and configure the required routing to connect the TPG to the inputs of the circuit and check CLBs of the parity group to be checked, and the outputs of these CLBs to the ORA. The check components in this scenario will impose a hardware overhead, and may minimally affect the timing of the original circuit. The advantage, however, of this type of embedding is the minimization of configuration time since only the new routings need to be configured for each parity group checking. An example of integrated embedding is shown in Figure 2b . Our trust design flow given earlier (Figure 15 ) is for the nonintegrated embedding scenario, though it is easily modified for integrated embedding.
Nonintegrated embedding. In this scenario, the check CLBs, TPG, and ORA are not integrated with the original circuit, but configured in when needed during the tamper trust-checking phase. During this phase, one parity group is checked at a time, and each time we embed the corresponding check components in other parity groups' space which are not being checked. In this way, our hardware overhead is 0% as for each parity group checking, we actually reuse the hardware that is not being checked. Similarly, since the check circuit components are not present during the normal circuit functioning, its performance is unaffected by this approach. Figure 16 shows a nonintegrated checking of a parity group column (for simplicity, we show here a nonrandomized parity group) of a 4 × 4 CLB array with the TPG, ORA, and check CLB configured in nearby positions. Our trust design and checking flows given earlier (in Figures 15 and 4 , respectively) are for nonintegrated embedding. After the checking phase, the original FPGA circuit (that includes nonfunctional CLBs configured with the logic zero function) is restored in the FPGA, and normal circuit operation resumes.
ORA and check slice overheads. Consider an FPGA with N CLBs. Irrespective of the size of the two dimensions of the FPGA array, we can map a "logical" square m × m CLB array onto the actual FPGA array, where $(m-1)^2 < N \le m^2$. For simplicity of exposition, we thus assume in the overhead analysis here that the FPGA has an m × m CLB array; we also assume that an m × m 2D code is (randomly) embedded in it. As shown in Figure 6, each slice has 7 EA outputs, and so each CLB has 28 EA outputs. Thus, in each (random) CLB row/column of the embedded 2D code, there are 28m outputs. These outputs will fall into different PGPs. We assume that these outputs are, on average, evenly distributed across the 4 PGPs available in each row/column of the embedded 2D code, so that each parity group corresponding to each such PGP will have 7m outputs on average.
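The logical array side m and the resulting output counts can be computed directly from the inequality above. The sketch below uses the 24 × 20 CLB array of the XC3S200 mentioned in the experiments as an example; the function name is hypothetical.

```python
import math

EA_OUTPUTS_PER_SLICE = 7
SLICES_PER_CLB = 4
NUM_PGPS = 4

def logical_array_side(n_clbs):
    """Smallest m with (m - 1)**2 < n_clbs <= m**2."""
    m = math.isqrt(n_clbs)
    return m if m * m == n_clbs else m + 1

n_clbs = 24 * 20                       # XC3S200 CLB array
m = logical_array_side(n_clbs)
outputs_per_line = EA_OUTPUTS_PER_SLICE * SLICES_PER_CLB * m   # 28m
outputs_per_pgp = outputs_per_line // NUM_PGPS                 # 7m on average

print(m, outputs_per_line, outputs_per_pgp)
```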
The hardware of an ORA of size p (number of inputs = p) is an XOR-gate tree (a parity detector). Each LUT can implement a 4-input XOR function and each slice has 2 LUTs. Thus the slice cost of an ORA of size p is
If the ORA size is p, then each potential parity group of size X will need to be divided into $\lceil X/p \rceil$ parity subgroups that are checked separately; each subgroup will have its own parity check function (i.e., each such subgroup is a finally determined bona fide parity group). So the average check slice cost for each row/column of the embedded 2D code is $\sum_{j=1}^{4} \lceil 7m/p \rceil \cdot L_j$, where the summation is over the four different types of parity groups, and $L_j$ is the number of slices required to implement the parity function of PGP j. Thus the total hardware cost of check slices is

$$2m \sum_{j=1}^{4} \left\lceil \frac{7m}{p} \right\rceil L_j,$$
where 2m is the number of rows and columns of the embedded 2D code. Hence the overall hardware cost A( p) we want to minimize is
From Section 5, we know that L 1 = L 2 = 0.5 and L 3 = 1, and we assume here that L 4 = 2 on average. To minimize the hardware overhead, we equate the derivative of Eq. (6) with respect to p to zero (with the simplifying assumption that ⌈ ≈ 7m slices. We assume that the average TPG (across all circuits) is an 11-bit counter (note that its maximum possible size is 15), which uses 12 slices. So the total hardware overhead percentage will be $\frac{\frac{4\sqrt{21}\,m}{3} + 12}{4m^2} \times 100\%$. The FPGA we used in this experiment is the Spartan-3 XC3S200 FPGA, which has a 24 × 20 CLB array. We map a 22 × 22 CLB array onto it; thus m = 22. From the preceding expressions, the minimized hardware overhead percentage will on average be 7.56% across different circuit implementations on the FPGA; this is borne out by our experimental results (see Tables I-III).
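The 7.56% figure can be reproduced numerically. The sketch below rests on two assumptions made here because the corresponding equations did not survive in the text: the ORA of size p is taken to cost about p/6 slices (an XOR tree of 4-input LUTs needs roughly p/3 LUTs, at 2 LUTs per slice), and the ceiling in the check slice cost is dropped. The L values, the 12-slice TPG, the 4 slices per CLB, and m = 22 are from the text; the function names are hypothetical.

```python
import math

L = (0.5, 0.5, 1.0, 2.0)   # slices per check function for PGP 1..4
TPG_SLICES = 12            # 11-bit counter
SLICES_PER_CLB = 4

def area(p, m):
    """ORA plus check slice cost in slices (assumed model, ceilings dropped)."""
    ora = p / 6                                        # assumed ORA cost
    check = 2 * m * sum((7 * m / p) * lj for lj in L)  # 2m rows and columns
    return ora + check

def optimal_p(m):
    """ORA size at which d(area)/dp = 0 for the model above."""
    return m * math.sqrt(6 * 14 * sum(L))

def overhead_percent(m):
    """Total overhead of ORA, check slices, and TPG relative to 4*m*m slices."""
    total = area(optimal_p(m), m) + TPG_SLICES
    return 100.0 * total / (SLICES_PER_CLB * m * m)

m = 22
print(round(optimal_p(m)), round(area(optimal_p(m), m), 2),
      round(overhead_percent(m), 2))
```

For this model the minimum area equals 4√21·m/3 slices, so the computed percentage matches the closed-form overhead expression evaluated at m = 22.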
EXPERIMENTAL RESULTS
Using the two trust-design methodologies based on even- and random-parity groups with an underlying 2D ECC developed in this article, we check three circuits: I2C Controller Core (ICC) [Herveille 2001] (size of 45 CLBs, 132 slices), a 64-bit Tree Comparator (TC) (size of 24 CLBs, 75 slices), and a simple RISC processor risc16f84 [Clayton 2002] (size of 290 CLBs, 1003 slices). Both the trust-design (Figure 15) and trust-checking (Figure 4) flows were manually followed, and ISE tools were used where indicated. We obtained experimental data for the following metrics.
(a) Hardware overhead (includes TPG, check slice, and ORA overheads) for the integrated embedding scenario (for nonintegrated embedding the hardware overhead is 0).
(b) Tamper detection probability P D , defined as the percentage of time inserted tampers in the circuit or unused slices (extraneous logic) are detected in their respective parity group checking, and when TPG/ORA tampers are detected during parity group checking as unexpected ORA outputs.
(c) False alarm probability P FA , defined as the percentage of time the checking of a tamper-free parity group results in an unexpected ORA output when the TPG and ORA are also tamper free.
(d) Detection latency (DL) of the parity group checking process, defined as the number of test vectors fed across tampered parity groups before the first tamper is correctly detected.
(e) Circuit time overhead, which is the increase in the critical path delay of the circuit in the integrated embedding scenario (for nonintegrated embedding the circuit time overhead is 0).
Hardware overhead. As explained in Section 7, for nonintegrated embedding, the hardware overhead is 0%. Using integrated embedding, the hardware (CLB) overhead of the three circuits is 7.5% for ICC, 7.3% for TC, and 9.8% for RISC, as we can see from Tables I-III.
Fault-detection and false-alarm probabilities. We insert 3 types of CLB tampers: (i) Logic Function Tamper (LFT): tamper of only the combinational logic; (ii) Storage Element-related Tamper (SET): tamper of the sequential elements of a CLB; (iii) Interconnection Tamper (IT): tamper of internal interconnections within a CLB. We insert 1-10 tampers into the circuit CLBs in the following proportion: 60% LFT, 30% SET, and 10% IT. We also insert 1-3 tampers each into the TPG and ORA (half of them LFT and the other half IT).
The average P D and P FA results for the three circuits are shown in Tables IV-V; each P D and P FA is calculated based on 20 different tamper injection experiments. While we check all parity groups of the two medium-size circuits, for efficiency, we insert tampers in and check a few randomly chosen parity groups of the RISC circuit; statistically, the P D and P FA results obtained for RISC are representative of the P D and P FA that would be obtained if all its parity groups were checked.
As we can see from Table IV , for the even-parity technique, all the P FA 's are 0%, which is very promising. Almost all the P D 's are 100% except for two cases:
(1) During the experiment, we insert 1-3 tampers in the TPG circuit 20 times when there are no tampers in the circuit CLBs. The P D obtained is 0%, which indicates that TPG tampers cannot be detected when using the even-parity technique.
(2) We randomly insert stuck-at-0 tampers (among other tampers) at various CLB outputs of the ORA, including the final ORA output. The latter case will always lead to a nondetection of circuit tampers for the even-parity technique.
Overall (over various random tampers of the ORA), in the ORA-tampered and the TPG-and-ORA-tampered cases, the P D is 95%. As shown in Table V, in the random-parity technique, however, the P D 's are 100% for all cases, which underscores the significantly higher reliability of this technique, and its promise for FPGA trust design.
We also randomly insert extraneous logic in 1-5 empty CLBs in the following PGP categories: 50% PGP 1, 30% PGP 2, and 20% PGP 3. Average results across 10 such experiments are given in Tables IV and V; all the P D 's are 100% except the P D 's for the even-parity technique when the ORA is tampered with, while the P FA 's are always 0%. This empirically establishes that our techniques can also detect extraneous logic with very high probability (as is also theoretically evident from the discussion of our techniques) and without false alarms.
Detection latency. For the detection latency (DL) metric, for expediency we obtained results for both parity techniques for only the ICC circuit and with one tamper in it (either LFT, SET, or IT). Once again, taking the TPG and ORA into account, there are four different tamper configurations, as tabulated in Tables VI and VII. As shown in Tables VI and VII, the DLs of the random-parity technique are smaller than those of the even-parity technique when the TPG is tampered with, while the DLs of the two techniques are the same when both TPG and ORA are tamper free or only the ORA is tampered with. The DLs of the random-parity technique when the TPG is tampered with are all 2's.
Circuit time overhead. For nonintegrated embedding, after checking every parity group, we remove the check CLBs, TPG, and ORA, so the timing of the original circuit will not be influenced. When using integrated embedding, the timing effect is shown in Table VIII. While embedding the checking circuit, the original routing may be influenced even when there is no trust checking in progress. Table VIII shows that after check components are embedded, the original circuit timings are surprisingly improved for the TC and ICC circuits, but this is probably noise for the medium-size circuits. For the RISC circuit the minimum clock period increases by no more than 2%.
CONCLUSIONS AND FUTURE WORK
In this article, we presented a novel two-level randomized structural error-correcting-code-based method for the design of trusted FPGA circuits in which design tampers are detected with very high probability during a trust-checking phase. We analytically obtained masking probabilities for the 2D code ECC under different masking scenarios (CLB-CLB, TPG-CLB, and ORA-CLB) and showed that these probabilities are minuscule. Experimental results on two medium-size circuits and a moderately large circuit (a RISC processor) showed that we are able to detect tampers of all varieties: (i) in circuit CLBs, in terms of LUT functions, storage elements, and internal interconnections; (ii) in the checker circuit comprising the TPG and ORA; (iii) insertions of extraneous logic. With a small hardware overhead, we achieved a tamper detection probability of 100% and a false alarm probability of 0% for the random-parity technique.
We have recently developed routing-tamper checking as a significant extension to the CLB/slice checking techniques discussed here; this will be addressed in a future paper. In future work, we will develop techniques to detect IP tampering. An approach for this is to use the IP's output functions, which are known (e.g., for an FFT IP, the functions of the inputs for each output are known) to derive check logic for these functions and those of other (IP or 
