We present a chip architecture for a compressive sensing based method that can be used in conjunction with the JTAG standard to detect IC Trojans. The proposed architecture compresses chip output resulting from a large number of test vectors applied to a circuit under test (CUT). We describe our designs in sensing leakage power, computing random linear combinations under compressive sensing, and piggybacking these new functionalities on JTAG. Our architecture achieves approximately a 10× speedup and 1000× reduction in output bandwidth while incurring a small area overhead.
INTRODUCTION
Fabless integrated circuit (IC) manufacturing has become mainstream, as many companies outsource the manufacturing of their IC designs to wafer foundries. While this outsourcing model offers advantages in manufacturing cost and access to advanced fabrication facilities, it raises growing concern about the security of externally manufactured chips. Security measures are generally harder to enforce at external manufacturers under separate management, so there is less assurance that these chips are free from malicious modifications introduced during fabrication.
A Trojan [1] [2] [3] is malicious circuitry implanted in, for example, a CPU or an encryption IC. A Trojan may be a small addition to a normal circuit and may remain dormant until triggered by a special signal. While dormant, a Trojan is especially difficult to detect, because Trojan-free and Trojan-embedded circuits show no functional difference.
The literature studies three basic premises for Trojan detection: a) circuit path timing may be slower because of the additional Trojan gates [4]; b) a Trojan inevitably draws some static power [5]; and c) the physical structure of the circuit is altered [6]. When the Trojan is not on the critical path, however, the timing change is difficult to notice, and detecting structural alterations can be impractical because it requires costly, destructive inspection. Detecting the power consumption difference induced by a Trojan is therefore an attractive alternative [7] [8] [9] [10] [11]. This approach, which belongs to the class of detection methods known as side-channel analysis, is the one this paper takes.
Ideally, a Trojan can be detected by observing the leakage power difference between a circuit under test (CUT) and the corresponding Trojan-free circuit. Given gate-level details, the total leakage power of a circuit can be modeled and described statistically [11, 12]. However, because of fabrication process variation [13, 14], the leakage current distribution varies from gate to gate even under the same input states. The Trojan's power consumption can therefore be masked by process variation.
To combat process variation, statistical methods such as [11] that involve a large number of test vectors are needed. In DISTROY [7], we proposed an I/O-efficient method to discover revealing test vectors that distinguish a Trojan-embedded circuit from a Trojan-free circuit. The approach relies on the assumption that such test vectors are rare, i.e., sparse. Compressive sensing (CS) [15, 16], which exploits signal sparsity, can therefore find the revealing test vectors in a large candidate pool (i.e., the test vector space) without incurring excessive chip I/O.
In this paper we describe a chip architecture for a compressive sensing based IC Trojan detection method under the DISTROY framework [7] . We explore architecture issues for various CS regularity conditions such as the restricted isometry property (RIP) [15] . Our goal is to provide an IC design house or a foundry with a realistic and low-cost Trojan detection infrastructure.
Our architecture leverages the commonly used Joint Test Action Group (JTAG) boundary scan standard [17] . We therefore name this architecture compressive sensing-JTAG, or CS-JTAG. CS-JTAG provides not only the original function of JTAG but also compressive encoding capabilities for efficient detection of possible malicious implants.
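The rest of the paper is organized as follows. Section 2 explains compressive sensing and the CS-based Trojan detection approach. Section 3 describes the CS-JTAG architecture, and Section 4 discusses its design considerations. Section 5 analyzes performance, and Section 6 concludes.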
COMPRESSIVE SENSING BASED TROJAN DETECTION

Compressive Sensing (CS) Fundamentals
Compressive sensing [15, 16, 18] is a new signal processing paradigm that acquires a sparse signal representation through random linear projections. It encodes a signal under sparsity constraints and recovers it from only a few compressed measurements. More precisely, suppose that a signal $x$, an $N \times 1$ vector, has a sparse representation in a basis $\Psi$:

$$ x = \sum_{i=1}^{N} s_i \psi_i = \Psi s, $$

where the $\psi_i$ are basis functions and only $K \ll N$ of the coefficients $s_i$ are nonzero. Then $x$ is said to be $K$-sparse and compressible. CS encoding computes an $M \times 1$ vector $y$ of compressive measurements, each a random linear combination of the entries of the $K$-sparse $x$, using an $M \times N$ measurement matrix $\Phi$:

$$ y = \Phi x. $$
The matrix $\Phi$, also called the sensing matrix, has randomly chosen entries.
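Reconstructing $x$ from $y$ requires solving the underdetermined linear system of Eq. 3, which has more unknowns ($N$) than equations ($M$). CS theory states that the signal can nevertheless be reconstructed from as few as $M = cK \log(N/K)$ measurements, for a small constant $c$, by $\ell_1$-norm minimization over the coefficient vector $s$ of $x$ in the basis $\Psi$ ($x = \Psi s$):

$$ \hat{s} = \arg\min_{s} \|s\|_1 \quad \text{subject to} \quad y = \Phi \Psi s, \qquad \hat{x} = \Psi \hat{s}. $$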
Other recovery algorithms can also be used, such as compressive sampling matching pursuit (CoSaMP) and iterative soft-thresholding (IST). In general, CS theory shows that more measurements yield more accurate recovery.
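To make the encode/recover cycle concrete, the following is a minimal plain-Python sketch. It assumes the signal is sparse in the identity basis ($\Psi = I$), uses a seeded ±1 Bernoulli matrix, and swaps in orthogonal matching pursuit (OMP), a simpler greedy relative of CoSaMP, in place of $\ell_1$-norm minimization. The sizes, seed, and spike values are illustrative only.

```python
import random


def bernoulli_matrix(m, n, seed=0):
    """M x N sensing matrix with i.i.d. +1/-1 entries."""
    rng = random.Random(seed)
    return [[rng.choice((1, -1)) for _ in range(n)] for _ in range(m)]


def matvec(phi, x):
    """CS encoding y = phi @ x: each y_i is a random +/- combination of x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def least_squares(cols, y):
    """Solve min ||A c - y|| (A's columns = cols) via the normal equations."""
    k = len(cols)
    aug = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           + [sum(a * b for a, b in zip(cols[i], y))] for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, k):
            f = aug[r][c] / aug[c][c]
            for cc in range(c, k + 1):
                aug[r][cc] -= f * aug[c][cc]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        s = sum(aug[r][cc] * coef[cc] for cc in range(r + 1, k))
        coef[r] = (aug[r][k] - s) / aug[r][r]
    return coef


def omp(phi, y, k):
    """Orthogonal matching pursuit: recover a k-sparse x from y = phi @ x."""
    n = len(phi[0])
    cols = [[row[j] for row in phi] for j in range(n)]
    support, coef, residual = [], [], list(y)
    for _ in range(k):
        # Pick the unused column most correlated with the current residual.
        best = max((j for j in range(n) if j not in support),
                   key=lambda j: abs(sum(a * r for a, r in zip(cols[j], residual))))
        support.append(best)
        # Re-fit all selected coefficients, then update the residual.
        coef = least_squares([cols[j] for j in support], y)
        residual = [y[t] - sum(c * cols[j][t] for c, j in zip(coef, support))
                    for t in range(len(y))]
    x_hat = [0.0] * n
    for c, j in zip(coef, support):
        x_hat[j] = c
    return x_hat


# Illustrative sizes: N = 128 unknowns, M = 48 measurements, K = 3 nonzeros.
N, M, K = 128, 48, 3
x = [0.0] * N
x[5], x[40], x[99] = 9.0, -6.0, 4.0
phi = bernoulli_matrix(M, N, seed=1)
y = matvec(phi, x)
x_hat = omp(phi, y, K)
```

With 48 measurements for 128 unknowns, the three nonzero entries are typically recovered exactly; shrinking $M$ toward $K$ makes recovery fail, in line with the $cK \log(N/K)$ requirement.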
CS-based Trojan Detection Approach
Our proposed CS-based approach (Fig. 1) reduces the I/O bandwidth requirement while maintaining the same testing quality. First, we avoid inputting test vectors by generating them on-chip. Second, chip output is reduced from $N$ power measurements to $M$ random linear combinations of these measurements, where $M \ll N$. The off-chip Trojan detector then recovers the most significant power variations from the $M$ compressed measurements.
An on-chip test vector generator (TVG) generates $N$ test vectors $v_1, v_2, \ldots, v_N$ that are applied to the CUT. The corresponding $N$ leakage power measurements $x_{c1}, x_{c2}, \ldots, x_{cN}$ (collectively $x_C$) are compressed on the fly into $M$ linear combinations $y_{c1}, \ldots, y_{cM}$ (collectively $y_C$) through multiplication by a sensing matrix $\Phi$. The simulated reference ("gold") measurement $x_G$ is multiplied by the same $\Phi$ to obtain $y_G$.
We perform CS reconstruction of $x_C - x_G$ off-chip from $y_C - y_G$, which by linearity equals $\Phi(x_C - x_G)$. Since $x_C - x_G$ is expected to be sparse, the required number of measurements $M$ can be small. Based on the recovered $x_C - x_G$, the statistical analysis proposed in [7] is then applied to examine suspicious chips and decide whether the CUT is deemed Trojan-embedded. CS reconstruction can be performed off-chip on highly parallel computing platforms such as GPUs; in this paper, we focus on the on-chip sensing architecture.
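The differential step relies only on the linearity of $\Phi$: a dense gold trace cancels out, leaving a sparse difference. The plain-Python sketch below uses a hypothetical leakage trace, a single revealing vector at index 77, and a seeded Bernoulli matrix. It handles only the 1-sparse case, where picking the largest entry of $|\Phi^T(y_C - y_G)|$ is enough; it ignores process variation and is not the DISTROY decoder.

```python
import random


def bernoulli_matrix(m, n, seed=0):
    """M x N sensing matrix with i.i.d. +1/-1 entries."""
    rng = random.Random(seed)
    return [[rng.choice((1, -1)) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """y = phi @ x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def locate_revealing_vector(phi, dy):
    """1-sparse special case: index j maximizing |(phi^T dy)_j|."""
    n = len(phi[0])
    corr = [abs(sum(row[j] * d for row, d in zip(phi, dy))) for j in range(n)]
    return max(range(n), key=lambda j: corr[j])


N, M = 256, 32
rng = random.Random(7)
# Hypothetical gold trace: dense, since every test vector draws leakage.
x_gold = [100.0 + rng.uniform(-1.0, 1.0) for _ in range(N)]
# Hypothetical CUT: identical to gold except at one revealing vector (index 77).
x_chip = list(x_gold)
x_chip[77] += 25.0

phi = bernoulli_matrix(M, N, seed=3)
y_chip = compress(phi, x_chip)  # would be computed on-chip by the MG
y_gold = compress(phi, x_gold)  # computed off-chip from simulation
dy = [a - b for a, b in zip(y_chip, y_gold)]
suspect = locate_revealing_vector(phi, dy)
```

Neither trace is sparse on its own; only their difference is, which is why the gold measurement is compressed with the same $\Phi$.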
CS-JTAG ARCHITECTURE
Our chip architecture aims to provide an automatic self-examination scheme for discovering malicious Trojans without complicated interfaces. As shown in Fig. 2, the on-chip detection architecture includes a TVG, a leakage power sensor (LPS), a measurement generator (MG), and the CS-JTAG controller governing the original JTAG controller. Except for the LPS, which is analog, all circuits are digital. The following subsections describe the data flow and major modules in detail.
Data Flow and Schedule
Fig. 3 depicts the data flow and work scheduling of the architecture. The TVG applies one test pattern $v_i$ to the CUT per clock cycle. After the CUT settles, the LPS measures the leakage power sample $x_{ci}$ corresponding to that pattern. The MG performs the matrix-vector multiplication that forms the compressed measurements $y_{ci}$ while receiving each $x_{ci}$. All $y_{ci}$ become available at the same time, once the last leakage power sample $x_{cN}$ has been received, and are then output one by one.
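The schedule can be modeled as a stream in which each sample is consumed as soon as it is sensed. In this sketch the TVG is assumed to be an exhaustive counter (matching the exhaustive test vectors used in the performance analysis), `lps` is a made-up leakage model, and $\Phi$ is a seeded random ±1 matrix; none of these names come from the design.

```python
import random


def tvg(n_bits):
    """Hypothetical TVG: exhaustive counter over all 2**n_bits input patterns."""
    yield from range(2 ** n_bits)


def lps(vector):
    """Stand-in leakage model (hypothetical): baseline plus one unit per '1' bit."""
    return 500 + bin(vector).count("1")


def mg_stream(samples, phi):
    """Accumulate y_i += phi[i][j] * x_j as each sample x_j arrives."""
    y = [0] * len(phi)
    for j, x in enumerate(samples):  # x_j is used once and then discarded
        for i in range(len(phi)):
            y[i] += phi[i][j] * x
    return y  # every y_i is final only after the last sample x_N


n_bits, M = 8, 16
N = 2 ** n_bits
rng = random.Random(0)
phi = [[rng.choice((1, -1)) for _ in range(N)] for _ in range(M)]

# Pipeline: TVG -> CUT/LPS -> MG, one test vector per clock cycle.
y_stream = mg_stream((lps(v) for v in tvg(n_bits)), phi)
```

Only the $M$ partial sums are stored, so on-chip storage does not grow with $N$.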
CS-JTAG Controller
The JTAG standard was originally developed for boundary scan and internal device tests during chip production. Fig. 4 depicts the original JTAG controller and how it interfaces to our CS-JTAG controller. Three input ports control the JTAG controller: clock sets the debug operating frequency, enable controls the state transitions of the output control signals, and input lets the user insert the test vector and the expected result one bit per cycle. JTAG operation can be divided into three steps: get, shift, and set. These steps manage the register scan chain around the CUT to, respectively, get the output data, shift the test vector along the scan chain, and update the input register buffers. Our proposed architecture is likewise a three-step process: parallel test vector insertion, leakage power sensing, and measurement generation. This similarity suggests that the JTAG ports can serve as the chip infrastructure for Trojan detection. To this end, CS-JTAG drives a new Trojan enable signal that controls the state transitions of the JTAG controller, and the resulting JTAG output control signals are used to control the three new modules. As a result, the CS-JTAG architecture needs no additional ports, and the CS-JTAG controller reduces to enabling the JTAG controller to support CS-related operations.
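The reuse of JTAG control can be pictured as a toy controller in which the Trojan enable signal changes only what the existing three-step sequence drives. The positional step-to-module assignment, the `Step` enum, and `controller_trace` are assumptions for illustration; the real JTAG TAP controller defined in IEEE 1149.1 is a 16-state machine, not this three-step loop.

```python
from enum import Enum


class Step(Enum):
    GET = "get"      # capture CUT outputs into the scan chain
    SHIFT = "shift"  # shift test data along the scan chain
    SET = "set"      # update the input register buffers


# Assumed positional correspondence between the two three-step sequences.
CS_ROLE = {
    Step.GET: "parallel test vector insertion (TVG)",
    Step.SHIFT: "leakage power sensing (LPS)",
    Step.SET: "measurement generation (MG)",
}


def controller_trace(trojan_enable, n_steps):
    """Return what each successive step drives; no new ports are involved."""
    order = [Step.GET, Step.SHIFT, Step.SET]
    trace = []
    for t in range(n_steps):
        step = order[t % len(order)]
        # The same control outputs are reused; only their consumer changes.
        trace.append(CS_ROLE[step] if trojan_enable else step.value)
    return trace
```

For example, `controller_trace(False, 3)` yields the plain JTAG sequence, while `controller_trace(True, 3)` yields the three CS operations in the same slots.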
Leakage Power Sensor
Since ICs typically operate at a fixed voltage, we can refer to leakage current $I_o$ and leakage power $P_L$ interchangeably. As illustrated in Fig. 5, we duplicate the circuit current with a current mirror so that $I = I_o$. By fixing the resistance $R$, the voltage drop across it, $V^+ - V^- = IR$, becomes proportional to the leakage current.
An analog-to-digital converter (ADC) is used in the LPS to digitize the voltage drop, as suggested by [19]. Once $V^+ - V^-$ is known, the circuit leakage power can be determined as given in Eq. 5. To obtain $P_L$ for each test vector, the ADC must be fast enough to deliver a power result every cycle. However, the leakage current is usually too small to drive the ADC, so an op-amp A is added to increase the current driving capability and shorten the power settling time.
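Assuming Eq. 5 has the form $P_L = V_{DD}(V^+ - V^-)/R$ and an ideal linear ADC, converting an ADC code into leakage power takes one scaling step per quantity. The ADC width, reference voltage, $R$, and $V_{DD}$ below are placeholder values, not taken from the design.

```python
def adc_to_voltage(code, n_bits, v_ref):
    """Ideal n-bit ADC: code in [0, 2**n_bits) -> code * v_ref / 2**n_bits volts."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("ADC code out of range")
    return code * v_ref / 2 ** n_bits


def leakage_power(code, n_bits, v_ref, r_sense, v_dd):
    """Leakage power from the digitized drop across the sense resistor R."""
    v_drop = adc_to_voltage(code, n_bits, v_ref)  # V+ - V-
    i_leak = v_drop / r_sense                       # mirrored current I = I_o
    return v_dd * i_leak                            # assumed P_L = V_DD * I_o


# Placeholder values: 10-bit ADC, 1.0 V reference, 10 kOhm, 1.2 V supply.
p = leakage_power(512, n_bits=10, v_ref=1.0, r_sense=10e3, v_dd=1.2)
```

Because $P_L$ is then a fixed multiple of the ADC code, the MG can accumulate raw codes directly and the scale factor can be applied off-chip after decoding.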
Measurement Generator
The measurement generator comprises circular shift registers (CSRs), linear feedback shift registers (LFSRs), and selective adders (Fig. 6). The compressed measurement $y_{ci}$ is computed in a shift-and-accumulate manner using a Bernoulli sensing matrix with coefficients $\phi_{ij} \in \{+1, -1\}$. The original leakage power data are sparse in the time domain, and the Bernoulli sensing matrix is incoherent with this representation basis.
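Note that each $y_{ci}$ is the inner product of a row of $\Phi$ and $x_C$:

$$ y_{ci} = \sum_{j=1}^{N} \phi_{ij} x_{cj}, \qquad \text{e.g.,} \quad y_{c1} = \phi_{11} x_{c1} + \phi_{12} x_{c2} + \cdots + \phi_{1N} x_{cN}. $$

Because each $\phi_{ij}$ is $\pm 1$, every term is either an addition or a subtraction of $x_{cj}$, which is what the selective adders implement.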
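The shift-and-accumulate organization can be sketched for the simplest case $h = 1$: one LFSR supplies coefficient bits, one selective adder adds or subtracts each sample, and one CSR of $M$ registers rotates the partial sums past the adder. The 16-bit Fibonacci LFSR with taps 16, 14, 13, 11, the seed, and the bit ordering are assumptions, not the design's actual choices; the point is that the off-chip decoder can regenerate $\Phi$ from the seed instead of receiving it.

```python
from collections import deque


def lfsr16_step(state):
    """One step of a 16-bit Fibonacci LFSR, taps 16,14,13,11 (assumed polynomial).
    State must be nonzero. Returns (next_state, output_bit)."""
    out = state & 1
    fb = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (fb << 15), out


def selective_add(acc, x, bit):
    """Selective adder: +x when the coefficient bit is 1 (phi = +1), else -x."""
    return acc + x if bit else acc - x


def mg_h1(samples, m, seed):
    """MG sketch with h = 1: one LFSR, one selective adder, one CSR of r = m registers."""
    csr = deque([0] * m)
    state = seed
    for x in samples:           # one leakage sample per LPS period
        for _ in range(m):      # r = m MG cycles per sample
            state, bit = lfsr16_step(state)
            csr[0] = selective_add(csr[0], x, bit)
            csr.rotate(-1)      # bring the next partial sum to the adder
        # after m rotations the CSR is back in the order y_1, ..., y_m
    return list(csr)


def regenerate_phi(m, n, seed):
    """Off-chip decoder rebuilds phi from the same seed and bit ordering."""
    phi = [[0] * n for _ in range(m)]
    state = seed
    for j in range(n):
        for i in range(m):
            state, bit = lfsr16_step(state)
            phi[i][j] = 1 if bit else -1
    return phi


def lfsr16_period(seed):
    """Number of steps before the LFSR state repeats."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr16_step(state)
        steps += 1
        if state == seed:
            return steps


samples = [3, -1, 4, 1, -5, 9, -2, 6]  # hypothetical offset leakage samples
y = mg_h1(samples, m=4, seed=0xACE1)
```

Since these taps give a maximal-length sequence (period 65535), the coefficient stream does not repeat within a single row of a $2^{16}$-vector test.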
DESIGN CONSIDERATIONS
Operating Frequency Aspects
We denote the sampling frequency of the LPS by $f_{LPS}$, and the operating frequencies of the MG and the CUT by $f_{MG}$ and $f_{CUT}$, respectively. To obtain one stable power sample per test vector, $f_{LPS}$ should not be smaller than $f_{CUT}$. For instance, if $f_{CUT}$ is 100 MHz, $f_{LPS}$ should be at least 100M samples/s. We denote the maximum $f_{CUT}$ by $f_{CUT,max}$ and consider the following cases.
• High-capability LPS, $f_{LPS} \ge f_{CUT,max}$. In this case, CS-JTAG achieves maximum throughput and the shortest latency for generating $y_{ci}$, provided that $f_{MG} \ge f_{LPS}$. Since testing cannot be performed faster than $f_{CUT,max}$, there is no need to design $f_{LPS}$ to be any larger.
• Low-capability LPS, $f_{LPS} < f_{CUT,max}$. In this case, $f_{LPS}$ is the CS-JTAG bottleneck. Note that the design constraint $f_{MG} \ge f_{CUT}$ holds in both cases; however, area efficiency may improve by selecting a higher $f_{MG}$ (discussed in Section 4.2). Depending on the target testing frequency and throughput requirement, we then design the architecture with a suitable $f_{LPS}$ and $f_{MG}$.
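The two cases reduce to a single rule: the sensing rate is the smaller of $f_{LPS}$ and $f_{CUT,max}$, and the MG must keep up with it. The helper names are hypothetical, and the model ignores settling margins and the time to shift out the $M$ results.

```python
def sensing_rate(f_lps, f_cut_max, f_mg):
    """Test vectors per second sustained by CS-JTAG (one LPS sample per vector)."""
    rate = min(f_lps, f_cut_max)  # high-capability LPS: capped by the CUT; else by the LPS
    if f_mg < rate:               # f_MG >= f_CUT (= sensing rate here) in both cases
        raise ValueError("need f_MG >= sensing rate")
    return rate


def sensing_time(n_vectors, f_lps, f_cut_max, f_mg):
    """Seconds to sense all vectors (excludes shifting out the M results)."""
    return n_vectors / sensing_rate(f_lps, f_cut_max, f_mg)


t_high = sensing_time(2 ** 16, 200e6, 200e6, 200e6)  # high-capability LPS
t_low = sensing_time(2 ** 16, 50e6, 200e6, 200e6)    # LPS-limited
```

For $N = 2^{16}$ at 200 MHz this gives 327.68 µs of sensing time, while an LPS limited to 50 MHz stretches it to about 1.31 ms.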
Area Aspects
The most area-intensive part is the MG (Fig. 6), so reducing its area is the main concern. We define a frequency ratio $r = f_{MG}/f_{LPS}$ and a parallelism ratio $h = M/r$. After receiving a power sample $x_{ci}$ from the LPS, the MG has $r$ clock cycles to update all $M$ partial sums before the next sample arrives. In each cycle, the MG updates $h$ partial sums in parallel (i.e., it updates the partial sums of $y_{c1} \sim y_{ch}$ in the first clock cycle, those of $y_{c(h+1)} \sim y_{c2h}$ in the second, and so on). Thus, the MG contains $h$ CSRs, $h$ LFSRs, and $h$ selective adders in total, and each CSR consists of $r$ registers. We now discuss three factors influencing circuit area.
• Number of Measurements, $M$. The MG requires $M$ registers for buffering the partial sums of the compressed measurements, so the area cost grows linearly with $M$.
• Frequency Ratio, $r$. With $M$ fixed, a lower frequency ratio results in higher MG parallelism $h$, so more selective adders and LFSRs are required.
• Bitwidth of Measurements, $BW(y_{ci})$. The bitwidth of $y_{ci}$, denoted $BW(y_{ci})$, directly affects the gate count of the registers in the MG. We can reduce $BW(y_{ci})$ while maintaining compression quality. First, each $x_{ci}$ is offset by a reference value in the middle of the leakage power range. Because the Bernoulli random matrix has coefficients in $\{+1, -1\}$, the offset samples then largely cancel within each sum, and we reduce $BW(y_{ci})$ depending on the coefficient distribution.
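The area bookkeeping can be sketched as follows. Rounding $h$ up when $r$ does not divide $M$ is an assumption, and `worst_case_bitwidth` gives only a conservative bound; it shows why offsetting samples by a mid-range reference helps, since narrower samples need narrower accumulators.

```python
import math


def mg_organization(m, f_mg, f_lps):
    """Return (r, h): MG cycles per power sample and partial sums updated per cycle."""
    r = int(f_mg // f_lps)  # r = f_MG / f_LPS
    if r < 1:
        raise ValueError("need f_MG >= f_LPS")
    h = math.ceil(m / r)    # assumed rounding up when r does not divide M
    return r, h             # h CSRs of r registers, h LFSRs, h selective adders


def cycle_schedule(m, r):
    """1-based indices of the measurements whose partial sums MG updates in each cycle."""
    h = math.ceil(m / r)
    return [list(range(c * h + 1, min((c + 1) * h, m) + 1)) for c in range(r)]


def worst_case_bitwidth(n_terms, max_abs_sample):
    """Signed width that holds any +/-1 combination of n_terms samples."""
    return (n_terms * max_abs_sample).bit_length() + 1


r, h = mg_organization(64, f_mg=800e6, f_lps=200e6)
```

For example, $N = 2^{16}$ unbiased 10-bit codes (up to 1023) need 27 bits in the worst case, whereas hypothetical offset samples within ±31 need 22. The reduction to 16 bits reported for this design relies on the ±1 coefficient distribution rather than on this worst case.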
PERFORMANCE ANALYSIS
Circuit Under Test with Embedded Trojan
As a test circuit for CS-JTAG, we designed an encryption circuit (Fig. 7) that includes a deliberately inserted Trojan for performance evaluation. The circuit uses an LFSR to generate pseudorandom numbers for an XOR-based cipher and produces the corresponding cyclic redundancy check code. The Trojan is activated when triggered by a specific input pattern; it then changes the seed and load control of the pseudorandom number generator, resulting in an unreliable cipher. The Trojan-free circuit is obtained by removing the Trojan and reconnecting the corresponding ports. In the simulation, we set the 64-bit seed to a constant and apply exhaustive test vectors ($N = 2^{16}$) to the 16-bit data input. The maximum testing frequency $f_{CUT,max}$ is assumed to be 200 MHz. All circuits, including CS-JTAG and the CUT, are synthesized and simulated in a 90 nm general-purpose process.
Without CS, output time dominates system performance: testing all $2^{16}$ vectors takes about 3.27 ms with $f_{LPS} = f_{MG} = 200$ MHz. With CS, the testing time is reduced to 0.33 ms, about a 10× speedup. Note that when $f_{LPS}$ is extremely slow, power sensing time becomes critical and adopting CS brings no gain. Fig. 9 shows the output bandwidth reduction ratio. When $y_{ci}$ is 26 bits wide and $M$ is 32, the proposed approach achieves about a 780× improvement over the baseline approach; the improvement is even higher when $y_{ci}$ is 16 bits wide and $M$ is 8. In addition, reducing the bitwidth of $y_{ci}$ from 26 bits to 16 bits reduces MG area by about 22%.
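The bandwidth figures follow from counting output bits. The per-sample width of the raw leakage data is not given in this section, so `SAMPLE_BITS = 10` below is a placeholder; output is assumed to be shifted serially at 200 MHz, one bit per clock, as on JTAG's single-bit data output.

```python
def bits_without_cs(n_vectors, sample_bits):
    """Baseline: every raw leakage sample is shifted out."""
    return n_vectors * sample_bits


def bits_with_cs(m, bw_y):
    """CS-JTAG: only the M compressed measurements are shifted out."""
    return m * bw_y


def serial_time(bits, f_shift):
    """Time to shift bits out one per clock."""
    return bits / f_shift


SAMPLE_BITS = 10  # placeholder ADC width; not stated in this section
N = 2 ** 16
ratio_32x26 = bits_without_cs(N, SAMPLE_BITS) / bits_with_cs(32, 26)
ratio_8x16 = bits_without_cs(N, SAMPLE_BITS) / bits_with_cs(8, 16)
t_raw = serial_time(bits_without_cs(N, SAMPLE_BITS), 200e6)
```

With this placeholder, $M = 32$ measurements of 26 bits give a ratio of about 788, $M = 8$ measurements of 16 bits give 5120, and shifting all raw samples out would take about 3.28 ms. These are of the same order as the reported figures, but the real sample width should be taken from the design.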
Synthesis Results
Simulation Results
In the CS-based Trojan detection scheme, we only need to find the most revealing vectors. Owing to its largest-first decoding property, CS decodes the largest abnormal power consumption values first. In our design experiments, we set the sparsity $K$ to less than 8, $M$ to 64, $N$ to $2^{16}$, and the bitwidth of $y_{ci}$ to 16 bits, with $f_{LPS}$ = 200 MHz. Note that $M$ should satisfy $M \ge cK \log(N/K)$ for correct reconstruction. The total area cost of CS-JTAG is about 32K gates at 200 MHz. The Trojan-embedded circuit is about 2K gates, with the Trojan approximately 0.18K gates (about 8% of the total area of the CUT). We then perform gate-level power analysis of this design with simulation CAD tools and show the leakage power distributions of the Trojan-embedded and Trojan-free circuits under process variation; for illustration, only the leakage power values for 1024 consecutive test vectors are shown (Fig. 10). The proposed architecture then encodes the sensed power data of the CUT online into compressed measurements. Using our previous work, DISTROY [7], the compressed measurements are decoded offline to discover several vectors that separate the leakage power distribution of the Trojan-free circuit from that of the Trojan-embedded circuit. The statistical analysis in DISTROY then determines the false positive rate and the detection rate for detection decisions. In the future, we plan to verify the proposed architecture and testing procedure through a hardware implementation.
CONCLUSION
Compressive-sensing-based detection of IC Trojans can identify test vectors that reveal Trojans without requiring excessive chip I/O. In this paper, we have presented a chip architecture that realizes the required chip functionalities, such as sensing leakage power and computing random projections, and we have shown how it can leverage the existing JTAG architecture. Based on these results, we conclude that chip realization of compressive-sensing-based IC Trojan detection is feasible.
Acknowledgment
This material is in part based on research sponsored by Air Force Research Laboratory under agreement number FA8750-10-2-0180. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory or the U.S. Government.
