Abstract-As semiconductor process technology continues to scale deeper into the nanometer regime, intrinsic parameter fluctuations will increasingly affect the performance and reliability of future microprocessors and System-on-Chip (SoC) applications. These systems require large SRAM arrays that occupy an increasing fraction of the chip real estate. To investigate the impact of various sources of intrinsic parameter fluctuation (IPF) from a system point of view, a framework that bridges architecture-level and device-level simulation is applied to data caches built from transistors at the 25 nm, 18 nm and 13 nm technology nodes. The study finds that IPF has no significant impact on data cache memory systems built at 25 nm, while increasing the memory cell ratio (β) to two overcomes the impact of IPF at 18 nm. However, the 13 nm data cache could not operate even with the higher cell ratio. Common cache memory fault detection and correction techniques such as ECC and redundancy can only partially remove the transaction errors caused by these fluctuation sources.
I. INTRODUCTION
The International Technology Roadmap for Semiconductors (ITRS) predicts that the microelectronics industry will benefit enormously from MOSFET miniaturization into the nanometer regime over the coming decades. However, the scaling of conventional devices is approaching fundamental physical limits [1]. One of the most challenging by-products of feature scaling, and one that is proving extremely difficult to manage, is the increasing variation of transistor characteristics due to intrinsic parameter fluctuations (IPF). This problem is associated with the fundamental discreteness of charge and matter [2] and cannot be removed by better processing steps or improved equipment [3]. It has been demonstrated experimentally at device and circuit level that, with the continued scaling of conventional MOSFETs, IPF will adversely affect circuit performance [4], [5], [6]. Because the processes and technology needed to build the next generation of devices and ICs are very complex and still unavailable, several simulation methodologies have been introduced to investigate the impact of IPF at circuit level [7], [8], [9].
The application of circuit-level simulation is limited because it is only suitable for investigating a small part of the system associated with a single circuit block. To clearly understand and evaluate the performance of nanometer-regime semiconductor devices, it is critically important to examine the impact of intrinsic parameter fluctuations from the architecture-level perspective. The cache memory is the component most susceptible and sensitive to manufacturing defects and process variations because it uses minimum-size transistors. Improving the performance-to-cost ratio of microprocessor and SoC applications requires evaluating the performance of these cache memories, which constitute an increasing fraction of the chip real estate.
The rest of the paper is organized as follows. Section II presents the nature of intrinsic parameter fluctuation in individual 6T SRAM cells based on UTB-SOI MOSFETs, of the kind commonly used to build cache memory. The baseline faults observed are used as input to the fault injection framework. Section II also elaborates on the cache memory setup, the tolerance techniques implemented on the framework cache prototype, the fault injection strategy and the benchmark program used in this study. The results are presented and discussed in detail in Section III. The conclusions of the work are given in Section IV.
II. METHODOLOGY
A. Intrinsic Parameter Fluctuation in 6T-SRAM
This work studies how the scaling limit affects the fault tolerance of fluctuation-sensitive microprocessor cache memory, considering random discrete dopants (RDD) in the source/drain regions, line edge roughness (LER) and body-thickness variations (BTV) of Ultra-Thin-Body (UTB) SOI MOSFETs. The impact of these intrinsic parameter fluctuations on individual 6T SRAM cells based on UTB SOI MOSFETs with physical channel lengths of 7.5 nm and 5 nm has been presented elsewhere [10]. SRAM cells are classified as faulty if the variation of the static noise margin (SNM) [11] exceeds the 6σ manufacturing requirement [12], as shown in Figure 1. SNM can be used to characterize failures in an SRAM cell due to destructive reads and unsuccessful writes. Increasing the SRAM cell ratio improves the resistance of the cell to IPF, as shown in Figure 1, but at the cost of a larger overall cell size. These data sets have been statistically analyzed, and a Gaussian probability distribution function is used to inject faulty cells into the cache memory of a microprocessor, characterizing the impact of each individual source of IPF and of the combined sources. Each simulation is repeated ten times and the average is calculated. Inter-die variation is ignored in order to clearly illustrate the impact of IPF on cache memory. Since the probability of a write transaction fault is virtually zero, write results are not presented.
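The injection and averaging procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: it assumes that each cell's SNM is drawn independently from a Gaussian distribution and that a cell is faulty when its sampled SNM falls below a threshold. The names `snm_mean`, `snm_sigma` and `snm_min`, and the function names themselves, are hypothetical.

```python
import random


def inject_faults(n_cells, snm_mean, snm_sigma, snm_min, rng):
    """Return the indices of cells whose sampled SNM is below snm_min.

    Assumption: per-cell SNM values are independent Gaussian samples.
    """
    return [
        cell
        for cell in range(n_cells)
        if rng.gauss(snm_mean, snm_sigma) < snm_min
    ]


def average_faulty_cells(n_cells, snm_mean, snm_sigma, snm_min,
                         runs=10, seed=0):
    """Repeat the injection `runs` times and average the faulty-cell count."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = [
        len(inject_faults(n_cells, snm_mean, snm_sigma, snm_min, rng))
        for _ in range(runs)
    ]
    return sum(counts) / runs
```

For example, `average_faulty_cells(65536, snm_mean, snm_sigma, snm_min)` would average ten injections over a cache of 65536 cells, with the three SNM parameters taken from the device-level data sets for the fluctuation source under study.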
B. Fault Injection Framework and Cache Memory
The fault injection framework is built on top of Simics [13], a system-level instruction set simulator that allows execution of machine code and monitoring of various components of the computer hardware with cycle-accurate capability. The operation of the computer hardware can be traced, which is especially beneficial for evaluating and debugging architectural and software implementations. A fully virtual computer platform is used; its specifications are listed in Table I.
The target architecture for this study is a generic ARMv5 processor that models an Intel StrongARM 1110 microprocessor. It was chosen for its simple architecture, including the instruction set, and for the widely available literature documenting the processor. A minimal Linux 2.4 kernel is used to boot the virtual machine in order to run the benchmark program. Modern processors have large, specialized, multi-level cache memories; however, the L1 data cache is the architectural component most susceptible to the impact of process variation [14], [15].
The configuration of the cache is summarized in Table II. A small data cache is used to clearly demonstrate the impact of IPF. By default, Simics does not model a platform-specific cache system, so a data cache model for the ARM architecture was developed. A generic Simics cache system was used for the instruction cache. A mechanism that integrates the faults due to IPF is introduced into the cache memory read and write handlers to simulate faulty cell behavior. Fault injection applies only to the data blocks (the actual cache lines), while all tags are guaranteed to be free of faulty cells. Faulty cells in the cache memory do not translate directly into failed read and write transactions. The read and write fault status is determined from the previous state of the faulty cell and the specific values involved in the read or write transaction [16], [14], as summarized in Tables III and IV. Two simulation modes are available in the framework to analyze the behavior of IPF in cache memory. Normal mode performs fault injection in the cache memory system and invokes the handler for the read and write fault mechanism. Fault emulation mode only emulates the occurrence of faulty cells; the read and write fault mechanism is not enabled. This mode allows analysis of a target architecture that does not implement a cache memory fault tolerance policy.
C. Fault Tolerance for Cache Memory
Increasing cache memory reliability has always been of interest to the computer industry and the research community. The cache memory system affected by IPF has been analyzed with different cache memory fault tolerance configurations. The fault tolerance techniques include parity checking, error-correcting code (ECC) and hardware redundancy, applied individually and in combination. An ECC with 1-bit correction and 2-bit detection was implemented for each slip (addressed word) using a Hamming code, assuming that the additional transistors required are fault-free. Hardware redundancy is implemented by extending the rows and columns of the array by between 5 and 50 percent. Faulty cells may also fall within the redundant area, and the rows and columns with the largest number of faulty cells are deactivated.
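A single-error-correcting, double-error-detecting Hamming code of the kind described above can be sketched as follows. This is an illustrative extended Hamming code, not the framework's implementation; the bit layout (overall parity at index 0, check bits at power-of-two positions) and the function names `encode` and `decode` are assumptions made for this example.

```python
def encode(data_bits):
    """Encode a list of 0/1 data bits as an extended Hamming codeword.

    Index 0 holds an overall parity bit; indices 1..n follow the classic
    Hamming layout with check bits at the power-of-two positions.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # smallest r with 2^r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(bits)
    for k in range(r):
        p = 1 << k
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity  # makes the parity of group p even
    code[0] = sum(code) % 2  # makes the total number of ones even
    return code


def decode(code):
    """Return (status, data_bits) for a possibly corrupted codeword."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        fixed[syndrome] ^= 1  # syndrome 0 points at the overall parity bit
        status = "corrected"
    elif overall == 1:
        status = "uncorrectable"  # syndrome outside the codeword
    else:
        status = "double_error"  # detected but not correctable
    data = [fixed[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data
```

With three or more flipped bits the decoder may miscorrect, which corresponds to the memory fault case discussed in Section III.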
D. Benchmark Selection
The Dhrystone benchmark program [17] is used to represent microprocessor activity when studying the impact of IPF on cache memory. Dhrystone code is dominated by integer arithmetic, string operations, logic decisions and memory accesses, which are frequently found in most general-purpose computing applications. Since Dhrystone is very compact, with a small binary size, memory accesses beyond the cache are not exercised. These characteristics of the benchmark facilitate the study of the impact of IPF on cache memory.
III. RESULTS AND DISCUSSIONS
Faulty cells due to the different individual and combined sources of IPF have been introduced into a processor cache memory. The cache model for the processor architecture used in this study consists of 65536 individual SRAM cells (8 KB). Figure 2 illustrates the typical number of faulty cells in each block of a cache memory built from 5 nm transistors subject to RDD. At circuit level, the cell ratio β of each cell has to be increased to control the adverse effects of IPF [5]. Table V summarizes the average number of faulty cells for β of one and two, obtained by injecting UTB-SOI MOSFET devices from the baseline fault data sets with a 6σ manufacturing tolerance. Cache memory built from 10 nm devices does not have any faulty cells for any individual or combined source (Table V). Increasing the cell ratio to two significantly reduces the number of faulty cells in the cache memory based on 5 nm UTB-SOI MOSFETs, by 98.8 percent for RDD and by 89.9 percent for the combined sources of IPF. There are no faulty cells in the cache memory based on 5 nm gate length devices injected with LER and BTV. Note that the individual sources of IPF are statistically independent, and there is some evidence that these sources of fluctuation are uncorrelated [9].
ARM's Dhrystone version 2.1 benchmark program has been executed on the framework for cache memory built with 5 nm physical gate length UTB-SOI MOSFETs. Dhrystone caused a total of 171675 cache memory transactions in the framework. Whenever the cache memory contains faulty cells, the benchmark program exits prematurely, so the fault emulation mode of the framework is used to obtain the following results. Figure 3 illustrates the impact of the various individual and combined sources of IPF on a cache memory system built using 5 nm UTB-SOI MOSFETs with a β of one. With RDD, memory transactions that access addressed words containing faulty cells (FT) account for almost 75 percent of the total transactions, followed by line edge roughness with 42 percent of transactions in FT and 28 percent actual fault occurrence during read transactions. The large number of faulty cells in the cache built from 5 nm devices with β = 1 motivates adding common fault tolerance techniques to the cache memory.
Protecting a cache memory affected by IPF with parity checking alone always causes the system to crash, because of the high occurrence of memory transactions with more than one bit error. The ECC technique implemented for each slip is able to correct a 1-bit fault but can only detect a 2-bit fault. An addressed word (slip) requested by the microprocessor that has three or more faulty cells causes a memory fault. The fault tolerance performance of the cache memory is therefore determined by the number of faulty cells that lie in each slip. Figure 4 presents the exponential relation between the percentage of faulty slips and the number of cells that could cause a read transaction fault in the 8 KB data cache, for the different individual and combined sources of IPF.
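The dependence of the ECC outcome on the number of faulty cells per slip can be sketched as a simple classification. This is an illustration only: the slip width of 32 bits, the linear mapping of cell index to slip, and the function name `classify_slips` are assumptions, and every faulty cell is treated as if it caused a read fault.

```python
from collections import Counter


def classify_slips(faulty_cells, n_cells, slip_bits=32):
    """Count slips by ECC outcome, given the indices of faulty cells.

    0 faulty cells: clean; 1: corrected by ECC; 2: detected only;
    3 or more: memory fault.
    """
    per_slip = Counter(cell // slip_bits for cell in faulty_cells)
    outcome = Counter()
    for slip in range(n_cells // slip_bits):
        k = per_slip.get(slip, 0)
        if k == 0:
            outcome["clean"] += 1
        elif k == 1:
            outcome["corrected"] += 1
        elif k == 2:
            outcome["detected"] += 1
        else:
            outcome["memory_fault"] += 1
    return outcome
```

Dividing the counts by the total number of slips gives the percentage of slips in each category for one injected fault map.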
The reliability of the system, defined as operation of the cache system without errors, can be estimated from the percentage of faulty slips that exist in the cache memory. Figure 5 illustrates the reliability of the L1 data cache built from 5 nm UTB-SOI MOSFETs with a β of one under the impact of RDD. It is clear that the ECC technique cannot overcome the significant fluctuation of the transistors in the cache memory, achieving only 10 percent reliability. Applying 5 percent and 10 percent hardware redundancy contributes no more than 7 percent and 10 percent improvement respectively. Although 30 percent and 50 percent hardware redundancy improve the reliability of the cache memory significantly, the overall system can only achieve 40 percent and 60 percent reliability respectively.
[Figure: Reliability of L1 data cache with ECC and various hardware redundancy under the impact of RDD]
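The row deactivation step of the hardware redundancy scheme described in Section II-C can be sketched as follows. This is a simplified illustration that handles rows only and assumes the fault map already covers the extended array, spare rows included; the function name `deactivate_worst_rows` and its parameters are hypothetical.

```python
def deactivate_worst_rows(fault_map, n_spare):
    """Disable the n_spare rows that contain the most faulty cells.

    fault_map is a list of rows, each a list of 0/1 flags (1 = faulty).
    Returns the indices of the rows kept active and the number of
    faulty cells remaining in them.
    """
    ranked = sorted(
        range(len(fault_map)),
        key=lambda row: sum(fault_map[row]),
        reverse=True,
    )
    disabled = set(ranked[:n_spare])
    kept = [row for row in range(len(fault_map)) if row not in disabled]
    remaining = sum(sum(fault_map[row]) for row in kept)
    return kept, remaining
```

The same greedy selection could be applied to columns by transposing the fault map. When faulty cells are spread across most rows, removing the worst few leaves many faulty cells behind, which is consistent with the limited gains reported above for small amounts of redundancy.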
IV. CONCLUSIONS
The impact of various individual and combined sources of intrinsic parameter fluctuation from UTB-SOI transistors at technology nodes between 25 nm and 13 nm on a virtual cache memory system has been presented. Common fault tolerance techniques for cache memory, such as parity checking, error-correcting code and hardware redundancy, applied individually or in combination, could not suppress the adverse effects of IPF. Unless the cache memory system is carefully designed, IPF will affect the performance and yield of the corresponding system. An IPF-tolerant cache architecture and cache management policy is required that can overcome the impact of IPF without sacrificing the performance and area efficiency of future microprocessors.
