This paper presents a resilient cache memory for dynamic variation tolerance in 40-nm CMOS. The cache sustains operation under large-amplitude voltage droops. To achieve this, the resilient cache uses a 7T/14T bit-enhancing SRAM and on-chip voltage/temperature monitoring circuits. The 7T/14T bit-enhancing SRAM can dynamically reconfigure itself into a more reliable bit-enhancing mode. The on-chip voltage/temperature monitoring circuits precisely sense the supply voltage on the power rail of the cache. Using the monitoring results, the proposed cache dynamically changes its operation mode and operates reliably under large-amplitude voltage droops. Experimental results show that it does not fail under 25% and 30% droops of Vdd. Under a 35% droop of Vdd, its failure rate is 91 times lower than that of the conventional design.
Key words: design for robustness, cache, variation tolerance, 7T/14T SRAM
Introduction
Technology scaling increases the threshold-voltage (Vth) variation of MOS transistors, mainly because of random dopant fluctuation (RDF), negative-bias temperature instability (NBTI), and random telegraph noise (RTN). As Vth variation grows with scaling, the minimum operating voltage (Vmin) of SRAM cells rises, which degrades the operating margin of a processor. A processor with a shrinking operating margin is more susceptible to power supply noise, IR drops, and temperature fluctuations. In particular, electronic control units (ECUs) in electric vehicles experience large temperature fluctuations and large voltage fluctuations and droops. These are caused by motor noise, EMI, voltage surges, and sudden interruptions in wiring-harness connections. A sudden interruption, for example, can disconnect the ECU from the power supply for several milliseconds. The power supply circuits in an ECU therefore include large capacitors to improve tolerance against sudden interruptions. With a capacitance of hundreds of microfarads, the voltage droops caused by sudden interruptions are limited to less than 20%, with droop durations on the order of milliseconds. However, large capacitors should be avoided in ECUs for reasons of reliability, cost, and size. Consequently, ECUs in electric vehicles need processors that tolerate both voltage variation (droops of 20% of Vdd) and temperature variation.
Earlier designs [1]-[3] have addressed timing errors caused by high-frequency (ca. 100 MHz) voltage droops. A tunable replica scheme [4] reduces the Vmin of SRAM by 9% under a 13% voltage droop. However, these techniques cannot mitigate embedded-SRAM margin failures caused by large-amplitude (ca. 20% of Vdd) voltage droops. Because the SRAM blocks in a processor are highly integrated and built from minimum-size transistors, they determine the Vmin of the entire processor. A dynamic-variation-tolerant processor therefore needs a fault-tolerant cache.
A common fault-tolerant cache architecture uses redundant columns/rows [5]. This architecture needs many redundant columns/rows to accommodate a large number of faults, and that redundancy is inefficient when the failure rate is low. The PADed cache proposed in [6] uses a programmable decoder to remap faulty cache lines to nonfaulty ones. As another solution, error correction codes (ECCs) have been applied to caches [7]-[9]. The two-dimensional ECC proposed in [8] combines vertical and horizontal error coding. In [9], 1-bit ECC is applied uniformly to cache blocks, and blocks containing two or more defective cells are selectively protected with multi-bit ECC. The multi-bit ECC check bits and the block locations are stored in a small dedicated cache. None of these techniques is effective against large-amplitude voltage droops, which cause many faults.
Herein, we present a resilient cache memory that sustains operation under a large-amplitude voltage droop. To achieve this, the resilient cache uses a 7T/14T bit-enhancing SRAM, which provides an additional, more reliable operation mode. It also uses on-chip voltage monitoring circuits.
Copyright © 2014 The Institute of Electronics, Information and Communication Engineers
Proposed Resilient Cache
The resilient cache (Fig. 1) is a 256 KB 8-way cache memory array. It combines the 7T/14T bit-enhancing (BE) SRAM bitcell structure [10], voltage and temperature monitoring circuits [11], and an autonomous resilient cache controller. Each memory block's power supply can be switched individually between the supply for runtime operation (Vdd_rt) and the supply for testing (Vdd_test). The monitoring circuits and the controller are powered separately from the memory blocks. The voltage monitoring circuits watch the local power rails of the memory blocks. They obtain a precise supply voltage level during testing and track voltage fluctuations during runtime. In addition, a temperature monitoring circuit senses the on-chip temperature, and the temperature recorded at testing time is used to correct Vmin for temperature. The autonomous resilient cache controller has two parts: an autonomous controller, and an online testing controller that contains a test module and a data transfer unit. The online testing controller performs memory testing that is completely transparent to user accesses and obtains the operating margin and Vmin of each memory block. The autonomous controller selects the probing point of the voltage monitor and sets the reference voltage (Vref) through the external DAC. It receives the results from the monitoring circuits and uses them for voltage droop detection and block-basis voltage droop control, as described in the remainder of this paper.
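The controller sets Vref through a DAC and reads back the latch comparator's decision. The text does not say how a voltage value is derived from those comparisons. The sketch below is minimal and rests on several assumptions. It treats the source follower as ideal (no level shift), uses a hypothetical 10-bit DAC with a 1200 mV range, and runs a successive-approximation search. It only shows how single-bit comparator outputs can become a voltage reading. A linear sweep of Vref would work equally well.

```python
from typing import Callable


def measure_rail_mv(compare: Callable[[float], bool],
                    dac_bits: int = 10,
                    full_scale_mv: float = 1200.0) -> float:
    """Estimate a rail voltage by successive approximation over DAC codes.

    compare(vref_mv) stands in for the latch comparator: it returns True when
    the sensed (source-follower) voltage is at or above vref_mv. The DAC
    resolution and full-scale range are hypothetical parameters.
    """
    lsb_mv = full_scale_mv / (1 << dac_bits)
    code = 0
    # Decide one DAC bit per comparison, from MSB down to LSB.
    for bit in reversed(range(dac_bits)):
        trial = code | (1 << bit)
        if compare(trial * lsb_mv):
            code = trial  # sensed voltage is at least this Vref: keep the bit
    return code * lsb_mv


# Example: an ideal comparator probing a rail that sits at 1015 mV.
rail_mv = 1015.0
reading = measure_rail_mv(lambda vref_mv: rail_mv >= vref_mv)
```

Each comparison settles one bit, so a reading takes `dac_bits` comparator decisions and is accurate to within one DAC step.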
7T/14T Bit-Enhancing SRAM Bitcell
Each SRAM cell in the proposed resilient cache uses the 7T/14T BE SRAM cell structure [10]. A 7T/14T BE SRAM cell consists of a pair of conventional 6T SRAM bitcells whose internal nodes are connected directly by two additional PMOS transistors, as shown in Fig. 2. This structure adds a second operation mode, the enhancing mode, alongside the normal mode. The two modes are summarized in Table 1. In the enhancing mode, the added transistors are turned on and the two bitcells are combined, so the BE SRAM operates reliably, especially at low voltages. Figure 3 shows the bit error rates of the 7T/14T BE SRAM and of another scheme.
On-Chip Monitoring Circuits
The on-chip monitoring circuits, shown in Fig. 4, consist of a source follower (SF) and a latch comparator (LC) [11]. They are area efficient and accurately sense the voltage level of the SRAM array as well as the cache temperature. This makes them suitable for online built-in self-tests (BISTs) and voltage droop detection. Figure 5 shows the block-basis online testing scheme of the proposed resilient cache. The online test controller tests the memory blocks one at a time, in order of physical block address. During testing, the supply voltage of the block under test is gradually lowered. For each operation mode of the BE SRAM, the controller uses the on-chip monitoring circuits to record the testing voltage at which the first failure is detected, together with the temperature. Because testing is block based, the resilient cache still has cache lines to which data can be allocated while testing is in progress. All memory blocks except the current memory under test (MUT) remain accessible and operate as runtime (RT) blocks.
Block Basis Online Testing
The testing controller uses a test bus that is separate from the user bus, so the proposed testing scheme is transparent to processor operation. Although one cache way is unavailable during testing, the IPC degradation on the SPEC 2006 benchmarks [12] is at most 1%, depending on the testing cycle. Testing is conducted periodically, and the testing cycle can be set from outside the cache (e.g., to match a control period of the software).
The flowchart in Fig. 6 depicts the online testing flow. At the start of testing on each block, the data transfer unit moves the data from the MUT block to the previous MUT block. The MUT block's power supply is then switched to the testing voltage (Vdd_test), and a test is run to check whether a failure occurs. If no failure is detected, Vdd_test is lowered by one step and the test is repeated. If a failure is detected, the voltage at that point is recorded together with the temperature. After completing the test on one block, the online test controller resets Vdd_test to the nominal Vdd and makes the next block the MUT. This flow continues until all blocks have been tested.
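The per-block flow in Fig. 6 is essentially a downward voltage sweep, repeated for each BE SRAM operation mode. In the minimal sketch below, the memory test is abstracted as a hypothetical `block_fails(v_mv)` callback. The nominal voltage (1100 mV), step size (5 mV), and safety floor (600 mV) are illustrative values, not taken from the text. The example block borrows the average Vmin values from the measurement section as stand-ins.

```python
from typing import Callable, Dict, Optional, Tuple


def find_first_failure_mv(block_fails: Callable[[int], bool],
                          v_nominal_mv: int,
                          step_mv: int,
                          v_floor_mv: int) -> Optional[int]:
    """Lower Vdd_test step by step until the memory test first fails.

    Returns the voltage (mV) at which the first failure is detected, or None
    if the block still passes at the (hypothetical) floor voltage.
    """
    v_mv = v_nominal_mv
    while v_mv >= v_floor_mv:
        if block_fails(v_mv):   # run the memory test at this Vdd_test
            return v_mv
        v_mv -= step_mv         # no failure: decrease Vdd_test by one step
    return None


def test_block(fails_by_mode: Dict[str, Callable[[int], bool]],
               read_temperature_c: Callable[[], float],
               v_nominal_mv: int = 1100,   # hypothetical defaults
               step_mv: int = 5,
               v_floor_mv: int = 600) -> Dict[str, Tuple[Optional[int], float]]:
    """Test one MUT block in each BE SRAM mode; record (voltage, temperature)."""
    results = {}
    for mode, fails in fails_by_mode.items():
        v_fail = find_first_failure_mv(fails, v_nominal_mv, step_mv, v_floor_mv)
        results[mode] = (v_fail, read_temperature_c())
    return results


# Hypothetical block whose true Vmin is 1015 mV (normal) and 806 mV (enhancing).
table = test_block({"normal": lambda v: v < 1015, "enhancing": lambda v: v < 806},
                   read_temperature_c=lambda: 25.0)
```

The recorded voltage is the first failing step, so it lies up to one step below the true Vmin. A finer step narrows this gap but lengthens each test.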
Figure 7 illustrates the operation of the data transfer unit. First, physical block 0 is tested while physical blocks 1-7 operate as runtime blocks. Next, the data transfer unit moves the data from physical block 1 (the next MUT block) to physical block 0 (the previous MUT block). After the transfer, physical block 1 is tested while physical block 0 and physical blocks 2-7 operate as runtime blocks. In this way, the MUT moves through all 8 blocks without losing the memory contents.
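The rotation in Fig. 7 can be modeled in a few lines. This sketch tracks only which data sits in which physical block. It assumes that block 0 holds no live data when a pass starts. It does not model what happens between passes, because the text does not describe that.

```python
from typing import List, Optional


def run_test_pass(contents: List[Optional[str]]) -> List[List[int]]:
    """Walk the MUT through all physical blocks once, as in Fig. 7.

    contents[i] is a label for the data held in physical block i; None marks
    an empty block. Block 0 is assumed to be empty when the pass starts.
    Returns, for each step, the physical blocks acting as runtime blocks.
    """
    runtime_per_step = []
    for mut in range(len(contents)):
        if mut > 0:
            # Data transfer unit: move the next MUT's data into the previous MUT.
            contents[mut - 1] = contents[mut]
            contents[mut] = None
        # The memory test of physical block `mut` would run here.
        runtime_per_step.append([b for b in range(len(contents)) if b != mut])
    return runtime_per_step


blocks = [None, "w1", "w2", "w3", "w4", "w5", "w6", "w7"]
steps = run_test_pass(blocks)
```

After one pass every data set has shifted down by one physical block and the last block is empty, so at every step seven blocks serve runtime accesses.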
Figure 8 shows an example of test results. The online testing controller keeps a test result table that records Vmin for each temperature. The recorded testing voltage … Fig. 9. Vdd is monitored by the monitoring circuit … If the gradient is greater than the threshold value, the controller predicts that Vdd will cross Vmin_normal. Otherwise, it predicts that Vdd will not cross Vmin_normal (Fig. 9(b)). The resilient cache switches to the 14T enhancing mode when the voltage falls below Vmin_normal. The controller may fail to detect very steep droops and very slow droops. Very steep droops are caused by high-frequency noise, to which SRAM cells are less susceptible [14]. Very slow droops are still handled, because the autonomous controller switches to the 14T enhancing mode whenever the supply voltage falls below a specified level.
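The two detection rules stated above can be sketched as a single decision function. The first is a gradient test that predicts a crossing of Vmin_normal. The second is a fallback level that catches slow droops. This is a minimal sketch: the sampling interval, gradient threshold, and specified level are hypothetical parameters, and the hardware used to measure the gradient is not modeled.

```python
def should_enter_enhancing(v_prev_mv: float, v_now_mv: float, dt_us: float,
                           slope_threshold_mv_per_us: float,
                           v_specified_mv: float) -> bool:
    """Hypothetical droop-detection rule for one memory block.

    * If Vdd is falling faster than the gradient threshold, predict that it
      will cross Vmin_normal and switch to the 14T enhancing mode early.
    * Otherwise, still switch once Vdd has fallen below a specified level,
      which covers very slow droops that the gradient test misses.
    """
    falling_rate = (v_prev_mv - v_now_mv) / dt_us  # mV/us, positive while drooping
    if falling_rate > slope_threshold_mv_per_us:
        return True
    return v_now_mv < v_specified_mv


# Steep droop: 20 mV in 10 us exceeds a 1 mV/us threshold.
steep = should_enter_enhancing(1100, 1080, 10, 1.0, 1020)
# Slow droop still above the specified level: stay in normal mode.
slow_high = should_enter_enhancing(1100, 1099, 10, 1.0, 1020)
# Slow droop that has reached the specified level: switch anyway.
slow_low = should_enter_enhancing(1021, 1019, 10, 1.0, 1020)
```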
This voltage-variation-adaptive control is performed block by block. As shown in Fig. 10, only the blocks whose Vdd drops below their own Vmin_normal switch to the 14T enhancing mode. The other blocks remain in the 7T normal mode. Dirty lines in the proposed cache can be written back to main memory even if Vdd drops by 35% within 100 µs.
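The block-basis decision can be sketched as a per-block comparison. The sketch assumes that each block has its own, temperature-corrected Vmin_normal from the test result table. It also reuses the Vref_high condition described for the return transition. Between the two levels, the block keeps its current mode. That hysteresis band and the sample voltages are illustrative assumptions.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    NORMAL_7T = "7T normal"
    ENHANCING_14T = "14T enhancing"


def select_modes(vdd_mv: List[float], vmin_normal_mv: List[float],
                 modes: List[Mode], vref_high_mv: float) -> List[Mode]:
    """Choose the operation mode of each block independently."""
    new_modes = []
    for v, vmin, mode in zip(vdd_mv, vmin_normal_mv, modes):
        if v < vmin:
            new_modes.append(Mode.ENHANCING_14T)   # this block lost its margin
        elif v > vref_high_mv:
            new_modes.append(Mode.NORMAL_7T)       # supply has recovered
        else:
            new_modes.append(mode)                 # in between: keep current mode
    return new_modes


step1 = select_modes([1100, 1000, 1030], [1015, 1015, 1015],
                     [Mode.NORMAL_7T] * 3, vref_high_mv=1050)
step2 = select_modes([1100, 1030, 1060], [1015, 1015, 1015],
                     step1, vref_high_mv=1050)
```

In the second step, block 1 has recovered above its Vmin_normal but not above Vref_high, so it stays in enhancing mode. This avoids toggling the mode while the supply hovers near the threshold.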
To reconfigure the blocks, the tag array of the resilient cache must be modified. One bit is added to the tag of each cache line, and the tag comparators are extended by one bit. The additional bit holds the MSB of the index and is compared as the LSB of the tag. In addition, the decoder must be designed so that it does not select the unused half of the indices. In the bit-enhancing mode, the LSB of the decoder input is fixed to "0".
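The text does not say how the index bits are wired to the decoder. One wiring is consistent with both the stated rules and the migration procedure, in which lines at even indices survive the mode transition: the index MSB drives the decoder-input LSB. The sketch below assumes that wiring and a hypothetical 4-bit index. It shows that a line stored at an even row stays valid after the switch, whereas a line stored at an odd row becomes unreachable and must be migrated or written back.

```python
INDEX_BITS = 4                           # hypothetical: 16 rows per block
HALF_MASK = (1 << (INDEX_BITS - 1)) - 1  # selects the index bits below the MSB


def decoder_row(index: int, enhancing: bool) -> int:
    """Physical row chosen by the decoder.

    Assumed wiring: the index MSB drives the decoder-input LSB, so that in
    bit-enhancing mode (decoder LSB fixed to 0) only even rows are selected.
    """
    msb = index >> (INDEX_BITS - 1)
    lsb = 0 if enhancing else msb
    return ((index & HALF_MASK) << 1) | lsb


def extended_tag(tag: int, index: int) -> int:
    """Tag with one extra bit: the index MSB, compared as the tag LSB."""
    return (tag << 1) | (index >> (INDEX_BITS - 1))


def lookup(block: dict, tag: int, index: int, enhancing: bool) -> bool:
    """Hit if the addressed row holds a matching extended tag."""
    return block.get(decoder_row(index, enhancing)) == extended_tag(tag, index)


# Fill two lines in normal mode: index 3 (MSB 0) and index 11 (MSB 1).
block = {}
for idx in (3, 11):
    block[decoder_row(idx, enhancing=False)] = extended_tag(5, idx)
```

In this sketch the extra bit is stored and compared in both modes, so no tag needs rewriting at the transition. Only the decoder behavior changes.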
Vmin at runtime is corrected according to the current temperature to compensate for temperature fluctuations (Fig. 11). The autonomous controller reads the current temperature from the on-chip temperature monitor and looks up Vmin in the test result table. From these data, it calculates the Vmin for the current temperature. The coefficient data used to compensate for the temperature difference between the testing time and the current time are recorded …

When the autonomous controller switches blocks to bit-enhancing mode, the dirty cache lines in those blocks must be migrated. Figure 12 shows the migration process. In this example, the target block of the mode transition is block 7. The controller searches for dirty cache lines at the odd indices of block 7. Dirty lines at even indices do not need to be migrated, because these lines remain in use after the mode transition. When a dirty cache line is found, it is migrated to the LRU cache line in the same set. If the LRU line is also dirty, it is written back to main memory before the detected dirty line is migrated. If the detected dirty line is itself the LRU line, it is written back to main memory.
If Vdd rises above Vref_high again, the autonomous controller switches the blocks from bit-enhancing mode back to normal mode. No cache lines need to be migrated in this case. The controller simply deactivates the control signal of the 7T/14T bit-enhancing SRAM (CL in Fig. 2) and marks the cache lines at odd indices as invalid.
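The migration rules for a block entering bit-enhancing mode, and the cheaper return path, can be sketched for a single set. This sketch makes several simplifying assumptions. LRU state is encoded as an `age` field, invalid ways are ignored, and clean odd-index lines are dropped without further action. The text itself specifies only how dirty lines are handled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Line:
    tag: int
    dirty: bool
    age: int  # larger value = less recently used (hypothetical LRU encoding)


def migrate_odd_line(ways: List[Optional[Line]], target: int,
                     write_back: Callable[[Line], None]) -> None:
    """Evacuate the odd-index line of the block entering bit-enhancing mode.

    `ways` is one set (one line per way/block); `target` is the way whose
    block changes mode. Invalid (None) ways are ignored for simplicity.
    """
    line = ways[target]
    if line is not None and line.dirty:
        valid = [w for w, l in enumerate(ways) if l is not None]
        lru = max(valid, key=lambda w: ways[w].age)
        if lru == target:
            write_back(line)             # the dirty line is itself the LRU line
        else:
            if ways[lru].dirty:
                write_back(ways[lru])    # free the LRU slot first
            ways[lru] = line             # move the dirty line into the LRU slot
    ways[target] = None                  # odd index is unusable after the switch


def leave_enhancing_mode(valid_bits: List[bool]) -> None:
    """Return to normal mode: just invalidate the lines at odd indices."""
    for row in range(1, len(valid_bits), 2):
        valid_bits[row] = False


written = []
set_lines = [Line(1, False, 0), Line(2, True, 3), Line(3, True, 1)]
migrate_odd_line(set_lines, target=2, write_back=written.append)
```

In the example, way 1 holds the LRU line and it is dirty, so it is written back first. The dirty line from the target way then takes its place.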
Measurement Results

On-Chip Voltage Droop Waveform and Vmin of Memory Blocks
Figures 14-16 present measurement results obtained with a test chip fabricated in 40-nm CMOS (Fig. 13). The voltage monitoring circuit measures the on-chip voltage droop waveform (Fig. 14). The upper waveform in Fig. 14 is the waveform injected from outside the chip, measured at an off-chip probing point on the global power rail. The lower waveform is measured by the on-chip monitoring circuit, which probes the local power rail of each memory block. Because of the chip's parasitic elements, the on-chip waveform differs in shape from the injected waveform. This shows that the on-chip monitoring circuit is necessary to obtain a precise voltage level. Figure 15 shows the measured Vmin characteristics of the memory blocks. The Vmin values were acquired for 8 blocks on each of 11 chips in each operation mode of the BE SRAM, at normal (25 °C) and high (100 °C) temperature. The Vmin of the worst block is the Vmin of the entire cache. Averaged over the 11 chips, it is 1015 mV in normal mode and 806 mV in bit-enhancing mode at 25 °C. At 100 °C, the averages are 1050 mV and 827 mV, respectively. Switching the BE SRAM to bit-enhancing mode thus improves the operating margin by 205 mV at 25 °C and 223 mV at 100 °C, on average.
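The text states that Vmin is corrected for temperature using recorded coefficient data, but it does not give the form of the correction. The sketch below assumes a linear coefficient. Under that assumption, the averaged worst-block values above imply about (1050 - 1015) / (100 - 25) ≈ 0.47 mV/°C in normal mode and (827 - 806) / 75 = 0.28 mV/°C in bit-enhancing mode. Chip averages are used here only for illustration, and the 60 °C operating point is hypothetical. A real table would hold per-block values.

```python
def temp_coefficient_mv_per_c(vmin_a_mv: float, t_a_c: float,
                              vmin_b_mv: float, t_b_c: float) -> float:
    """Linear temperature coefficient from two (temperature, Vmin) points."""
    return (vmin_b_mv - vmin_a_mv) / (t_b_c - t_a_c)


def corrected_vmin_mv(vmin_test_mv: float, t_test_c: float,
                      t_now_c: float, coeff_mv_per_c: float) -> float:
    """Shift the Vmin recorded at test time to the current temperature."""
    return vmin_test_mv + coeff_mv_per_c * (t_now_c - t_test_c)


# Averaged worst-block Vmin from the measurements above (normal mode).
k_normal = temp_coefficient_mv_per_c(1015, 25, 1050, 100)
# Hypothetical case: block tested at 25 C, now running at 60 C.
vmin_now = corrected_vmin_mv(1015, 25, 60, k_normal)
```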
Voltage Variation Tolerance
The voltage variation tolerance of the resilient cache is evaluated by injecting voltage droops into the external power supply rail. During droop injection, cache access traces are fed to the resilient cache, and accesses to failing bits are counted. Five cache traces were taken from SPEC 2006 [12]. The evaluation shows that the resilient cache does not fail, irrespective of droop duration, when the voltage droop amplitude is 20%. The resilient cache is therefore applicable to ECUs in electric vehicles.
To investigate the voltage variation tolerance further, we evaluated the resilient cache under voltage droops with amplitudes above 20%. The amplitudes are 25%, 30%, and 35% of Vdd (Fig. 16(a)), and the droop durations are 50 µs, 500 µs, 5 ms, and 50 ms. Figures 16(b)-16(d) show the evaluation results under the 25%, 30%, and 35% droop conditions, respectively. Under the 25% and 30% droop conditions without the proposed scheme (no variation-adaptive control, always in normal mode), the number of failures increases linearly with droop duration. With the proposed scheme (variation-adaptive control with switching to enhancing mode), the resilient cache does not fail, irrespective of droop duration. Under the severe 35% droop condition, failures without the proposed scheme increased to about ten times those under the 25% droop condition. With the proposed scheme, the failure rate is 91 times lower than without it for a 50 ms droop duration.
Processor Performance
The cache reconfiguration affects processor performance. When one block changes to bit-enhancing mode, the cache capacity decreases by 16 KB. The smaller capacity degrades processor performance because cache misses occur more frequently. Figure 17 shows the normalized instructions per cycle (IPC) versus the number of blocks in bit-enhancing mode. The evaluation uses the gem5 simulator [13] with benchmarks selected from SPEC 2006 [12]. The average IPC loss is 2.88% when all blocks are in bit-enhancing mode (128 KB cache capacity). Because the resilient cache enters bit-enhancing mode only when the operating margin is insufficient, it maintains stable operation at the cost of some processor performance.
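The capacity figures follow from the organization described for the proposed cache. Spreading 256 KB over 8 blocks gives 32 KB per block, one block per way as implied by the 8-way organization. The bit-enhancing mode halves a block's capacity. The short helper below reproduces the 16 KB loss per block and the 128 KB capacity when every block is in enhancing mode. It is only a check of the arithmetic, not part of the design.

```python
TOTAL_KB = 256
NUM_BLOCKS = 8
BLOCK_KB = TOTAL_KB // NUM_BLOCKS   # 32 KB per block (one block per way)


def capacity_kb(enhancing_blocks: int) -> int:
    """Usable capacity when `enhancing_blocks` blocks are in bit-enhancing mode.

    In bit-enhancing mode two 6T cells store one bit, so a block keeps half
    of its capacity.
    """
    if not 0 <= enhancing_blocks <= NUM_BLOCKS:
        raise ValueError("enhancing_blocks must be between 0 and NUM_BLOCKS")
    return TOTAL_KB - enhancing_blocks * (BLOCK_KB // 2)
```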
Conclusion
In this paper, we proposed a resilient cache with bit-enhancing memory and on-chip diagnosis structures in 40-nm CMOS. The resilient cache combines a bit-enhancing memory, which can dynamically switch itself to enhancing mode, with on-chip voltage/temperature monitoring circuits. It dynamically reconfigures its operation mode using the monitoring results. Under a 35% droop of Vdd, it achieves a failure rate 91 times lower than that of the conventional design.
