The TRAMS (Terascale Reliable Adaptive Memory Systems) project addresses, in an evolutionary way, the ultimate CMOS scaling technologies and paves the way for the most promising revolutionary beyond-CMOS technologies. In this abstract we show the significant variability levels of future 18 and 13nm bulk-CMOS device technologies, their dramatic effect on the yield of memory cells, and the kind of circuit solutions that would be required to maintain current yield levels. We then discuss the impact of errors at the system level, and different approaches at system level to adapt the
I. INTRODUCTION
The great challenge for future information technologies is building reliable systems on top of unreliable components, which will degrade and even fail during the normal lifetime of the chip. New approaches that make future integrated circuits (especially memories and microprocessors) resilient to lifetime degradation, transparently to the user, are becoming a requirement.
To achieve these ambitious targets, the TRAMS project models and analyzes variability and reliability issues from the technology level to the circuit level. This information is then leveraged to develop new structures and innovative monitoring and countermeasure mechanisms at the circuit, microarchitectural and system levels that help mitigate variability and improve reliability. The focus of TRAMS is the memory subsystem of tera-device multicores in the personal computing and server domains, enabling reliable and cost-efficient applications based on the grid and cloud computing models.
In this abstract we start with a variability analysis of future sub-22nm CMOS devices. Next, we study the effects of variations on the widely used 6T SRAM cell. We then discuss new circuit-level mechanisms to enhance the reliability (and yield) of memories. We conclude with an analysis of the impact of errors at the application level and some hints on how TRAMS plans to deal with them, and with the heterogeneity they cause, at different levels.
II. ANALYSIS OF SUB-22NM CMOS DEVICES
The aim of TRAMS at the device level is to deliver key components of the preliminary design kits (PDKs) for bulk, FinFET and III-V/Ge devices. These will include nominal compact models, design rules, and statistical compact models that capture the evolution of the statistical device characteristics as a result of NBTI, PBTI, hot-carrier degradation and oxide breakdown. First results have been obtained for bulk CMOS.
In statistical device simulation, the main sources of variability considered are random discrete dopants (RDD) and line edge roughness (LER). Extracting statistical compact models for future technologies requires simulating an ensemble of devices; in this case we have simulated 200 microscopically different devices. The compact model extraction requires full ID-VG characteristics to be simulated at low and high drain voltages (VD=0.05V and VD=1.0V for the 18nm devices, and VD=0.05V and VD=0.9V for the 13nm devices) for each unique statistical sample within the ensemble. All the required simulations have been performed for the 18nm and 13nm n- and p-channel devices. Figure 1 shows the structure and doping profile of these devices.
Most device reliability problems are associated with the generation of fixed charges or the trapping of electrons and/or holes in defect states in the gate stack during circuit operation. The effect of these fixed charges cannot be considered in isolation, because it is inextricably linked to the other sources of variability, in particular to the other discrete charges due to RDD. The ID-VG curves of the 200 fresh devices have been re-simulated for three levels of degradation, with assumed average trap densities of 10^11 cm^-2. Figure 2 shows the results obtained for the 200 samples in the 18nm technology.
A two-stage statistical compact model parameter extraction strategy has been implemented. During the first stage, a combination of local optimization and a group extraction strategy is employed to extract the complete nominal set of BSIM4 parameters using the Glasgow statistical compact model extractor MYSTIC [1].
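The text does not say how the variability summarized in Table 1 is quantified from the 200-device ensemble. As a hedged illustration only, the sketch below uses one common approach: constant-current threshold-voltage extraction on each ID-VG curve, followed by the ensemble mean and standard deviation. The exponential subthreshold model, the assumed 0.30 V / 30 mV Vth distribution, the Icrit value (without W/L normalization) and all function names are hypothetical; they are not TRAMS data, and the actual flow relies on BSIM4 extraction with MYSTIC.

```python
import math
import random
import statistics

def synthetic_idvg(vth, vg_points, i0=1e-7, swing=0.08):
    """Hypothetical subthreshold-only model: ID = i0 * 10**((VG - Vth) / swing)."""
    return [i0 * 10 ** ((vg - vth) / swing) for vg in vg_points]

def extract_vth_constant_current(vg_points, id_points, i_crit=1e-7):
    """Gate voltage at which ID first reaches i_crit, interpolated in log10(ID)."""
    for k in range(1, len(vg_points)):
        i_lo, i_hi = id_points[k - 1], id_points[k]
        if i_lo <= i_crit <= i_hi:
            # Subthreshold log10(ID) is roughly linear in VG, so interpolate there.
            frac = (math.log10(i_crit) - math.log10(i_lo)) / (math.log10(i_hi) - math.log10(i_lo))
            return vg_points[k - 1] + frac * (vg_points[k] - vg_points[k - 1])
    raise ValueError("ID never reaches i_crit in the simulated VG range")

def ensemble_vth_stats(curves, vg_points, i_crit=1e-7):
    """Mean and standard deviation of the extracted Vth over an ensemble of curves."""
    vths = [extract_vth_constant_current(vg_points, ids, i_crit) for ids in curves]
    return statistics.mean(vths), statistics.stdev(vths)

# Illustrative ensemble of 200 devices with an assumed Gaussian Vth spread.
rng = random.Random(1)
vg = [i * 0.01 for i in range(101)]  # VG sweep from 0 V to 1.0 V
true_vths = [rng.gauss(0.30, 0.03) for _ in range(200)]
curves = [synthetic_idvg(v, vg) for v in true_vths]
mean_vth, sigma_vth = ensemble_vth_stats(curves, vg)
print(f"mean Vth = {mean_vth:.3f} V, sigma Vth = {1000 * sigma_vth:.1f} mV")
```

Because the synthetic curves are purely exponential, the log-linear interpolation recovers each assumed Vth almost exactly; real simulated curves would add LER, RDD and mobility effects that this toy model ignores.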
The statistical compact models have been extracted for each fresh device and again for each device under the three levels of aging. Table 1 shows the resulting high level of variability expected in future technologies due to the distribution of dopants in the transistor channel. These levels are much higher than those of current CMOS technologies (>22nm). We present an accurate analysis of the robustness of the 6T bit-cell for different technologies: 45, 32, 22 and 16nm (PTM models), and 18 and 13nm (TRAMS models) [2].
Figure 3 shows the expected effect of device random variability alone on the yield of an assumed 512 Kb memory, for six technologies and several variability scenarios. A dramatic drop in yield is observed for the moderate and high PTM variability scenarios, while for PTM with very high variability and for the 18 and 13nm technologies the yield is practically zero, which is unacceptable. This part of the analysis has been performed at the 6T cell level assuming minimum device sizes, which implies a certain overestimation of the effects.
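The yield collapse follows from a simple relation: if the N cells of a memory without redundancy fail independently with probability p_fail, the memory works only if every cell works, so Y = (1 - p_fail)^N. The sketch below assumes independent failures and one bit per 6T cell, evaluates this for a 512 Kb array, and inverts it to find the largest p_fail compatible with a 90% yield target. The p_fail values in the loop are illustrative, not the values behind Figure 3.

```python
import math

def memory_yield(p_cell_fail, n_cells):
    """Yield of an unprotected memory: every cell must work (independent failures)."""
    # log1p/exp keeps precision when p_cell_fail is tiny and n_cells is large.
    return math.exp(n_cells * math.log1p(-p_cell_fail))

def max_cell_fail_prob(target_yield, n_cells):
    """Largest per-cell failure probability that still meets target_yield."""
    # Solve (1 - p)^N = Y  =>  p = 1 - exp(ln(Y) / N).
    return -math.expm1(math.log(target_yield) / n_cells)

N_CELLS = 512 * 1024  # 512 Kb memory, assumed one bit per 6T cell
p_max = max_cell_fail_prob(0.90, N_CELLS)
print(f"p_fail must stay below {p_max:.2e} for 90% yield")
for p in (1e-8, 1e-7, 1e-6, 1e-5):
    print(f"p_fail={p:.0e} -> yield={memory_yield(p, N_CELLS):.4f}")
```

The cell failure probability tolerated by a 90% yield target is of the order of 1e-7, which is why even modest increases in per-cell variability make the unprotected yield collapse.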
IV. CIRCUIT SOLUTIONS
Because of the demonstrated lack of reliability of devices and memory blocks in late CMOS and emerging technologies, strategic hardware countermeasures are required to obtain blocks of reasonable quality for use at the architectural levels. Figure 4 shows, as an example, the yield of a 32x512-bit memory as a function of the cell failure probability for different reliability-improvement mechanisms: no redundancy (original circuit), information redundancy (SEC-DED ECC, implying an area increase of the order of 37.5%), dynamic reconfiguration (spare parts with self-testing and dynamic adaptation, with an overhead that may reach 100%), and hardware redundancy (RMR, with overheads of 200% or higher). The failure probabilities of minimum-size memory cells for the 45, 32, 22, 16, 18 and 13nm bulk-CMOS technologies are also shown. For a 90% yield requirement, the figure shows the interval of failure probability in which each countermeasure technique can be applied: 45nm technology does not require any specific countermeasure mechanism (Y=90%), 32nm could require information redundancy, 22, 18 and 16nm would require dynamic redundancy, while 13nm requires RMR hardware redundancy (all cases are for 6T cells with minimum-size devices).
2011 IEEE 17th International On-Line Testing Symposium
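The countermeasure intervals can be approximated with binomial yield models. The sketch below is a simplified, hypothetical model, not the one used to produce Figure 4. It assumes independent cell failures and treats the 32x512 memory as 16,384 data bits; it models information redundancy as a (22,16) SEC-DED code (6 check bits per 16 data bits, chosen only because 6/16 matches the 37.5% overhead quoted above) and RMR as triple modular redundancy with an ideal, fault-free majority voter. Dynamic reconfiguration is omitted because its yield depends on self-test and repair details the text does not give. A log-scale bisection then finds, for each scheme, the largest p_fail that still meets a 90% yield target.

```python
import math

DATA_BITS = 32 * 512  # assumed organization: 16,384 data bits

def yield_no_redundancy(p):
    """Every data bit must work."""
    return (1.0 - p) ** DATA_BITS

def yield_secded(p, data_per_word=16, check_bits=6):
    """Assumed (22,16) SEC-DED: a word survives if at most one stored bit fails."""
    n = data_per_word + check_bits
    words = DATA_BITS // data_per_word
    word_ok = (1.0 - p) ** n + n * p * (1.0 - p) ** (n - 1)
    return word_ok ** words

def yield_tmr(p):
    """Each bit stored three times; an ideal majority voter masks one bad copy."""
    bit_ok = (1.0 - p) ** 3 + 3 * p * (1.0 - p) ** 2
    return bit_ok ** DATA_BITS

def max_p_fail(yield_fn, target=0.90, lo_exp=-12.0, hi_exp=math.log10(0.5), iters=100):
    """Bisection on log10(p) for the largest p with yield_fn(p) >= target.

    Assumes yield_fn decreases monotonically with p.
    """
    if yield_fn(10 ** lo_exp) < target:
        return 0.0
    for _ in range(iters):
        mid = 0.5 * (lo_exp + hi_exp)
        if yield_fn(10 ** mid) >= target:
            lo_exp = mid
        else:
            hi_exp = mid
    return 10 ** lo_exp

for name, fn in [("none", yield_no_redundancy),
                 ("SEC-DED (22,16)", yield_secded),
                 ("TMR", yield_tmr)]:
    print(f"{name:16s} p_fail <= {max_p_fail(fn):.2e} for 90% yield")
```

Under these assumptions the tolerated p_fail grows from no redundancy to SEC-DED to TMR, the same ordering as the countermeasure intervals described above; the absolute values depend on the assumed organization and code.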
For brevity, this abstract discusses only the proactive dynamic reconfiguration mechanism, since it is expected to be the most commonly used countermeasure technique for improving the quality (yield) of memory blocks in next-decade technologies.
One of the first proactive wearout recovery approaches for extending cache SRAM lifetime was proposed by Shin et al. of IBM in 2008 [3]. The main difference between proactive redundancy and conventional (reactive) static redundancy is that the latter repairs the memory block once a cell becomes faulty (due to manufacturing defects or aging), whereas the proactive approach is a dynamic reconfiguration that extends the lifetime of the memory system by continuously interchanging main and spare parts. Because degradation mechanisms are a key limiting factor for memory systems in sub-22nm CMOS and beyond, this is a key countermeasure technique for future systems.
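The difference between reactive and proactive use of spares can be illustrated with a toy wearout model. The sketch below is an assumption-laden illustration, not the mechanism of Shin et al. [3] or the TRAMS implementation. A memory needs n_active working units (e.g., cache rows) out of a pool that includes spares. A unit degrades by a fixed amount per epoch while stressed and, when idle, recovers a fraction of its degradation (a crude, hypothetical stand-in for BTI recovery); it fails once its degradation reaches a threshold. The reactive policy keeps spares idle until an active unit fails, while the proactive policy rotates units every epoch so the least-degraded ones are in use. Function names, rates, threshold and recovery fraction are all hypothetical.

```python
def reactive_lifetime(rates, n_active, threshold, max_epochs=10_000):
    """Reactive (static) sparing: spares stay unused until an active unit fails."""
    deg = [0.0] * len(rates)                  # accumulated degradation per unit
    active = list(range(n_active))
    spares = list(range(n_active, len(rates)))
    for epoch in range(1, max_epochs + 1):
        for u in active:                      # only active units are stressed
            deg[u] += rates[u]
        for u in [u for u in active if deg[u] >= threshold]:
            if not spares:
                return epoch                  # a failed unit cannot be replaced
            active[active.index(u)] = spares.pop(0)
    return max_epochs

def proactive_lifetime(rates, n_active, threshold, recovery=0.0, max_epochs=10_000):
    """Proactive rotation: each epoch the least-degraded healthy units are used.

    Hypothetical recovery model: an idle unit loses a fraction `recovery`
    of its accumulated degradation per epoch.
    """
    deg = [0.0] * len(rates)
    alive = set(range(len(rates)))
    for epoch in range(1, max_epochs + 1):
        active = set(sorted(alive, key=lambda u: (deg[u], u))[:n_active])
        for u in alive:
            if u in active:
                deg[u] += rates[u]
            else:
                deg[u] *= 1.0 - recovery      # partial recovery while unstressed
        alive = {u for u in alive if deg[u] < threshold}
        if len(alive) < n_active:
            return epoch                      # not enough healthy units remain
    return max_epochs

rates = [1.0] * 5  # hypothetical: 4 active units + 1 spare, equal wear rates
print(reactive_lifetime(rates, 4, 10.0))
print(proactive_lifetime(rates, 4, 10.0))
print(proactive_lifetime(rates, 4, 10.0, recovery=0.2))
```

With four active units, one spare and no recovery, the reactive policy fails at epoch 10 (all active units wear out together) while rotation stretches the lifetime to epoch 12; with recovery enabled, rotation lasts longer still because idle units shed part of their degradation. These toy numbers are not comparable with the lifetime gains reported for TRAMS.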
In the TRAMS project, the proactive wearout recovery mechanism is applied to different types of cells (6T and 3T1D), technologies (22, 18, 16 and 13nm CMOS, 10nm FinFETs, CNT) and cell sizings, where in all cases the devices exhibit extreme random process variability. The results show a 2X to 6X increase in memory lifetime.
V. MICROARCHITECTURE SOLUTIONS
The levels of unreliability and variability expected for sub-22nm technologies will require very strong reliability techniques. However, error protection mechanisms are designed for the worst case (here, extreme variability), which has a cost in silicon area, power and performance. Therefore, we envision a system that, instead of being designed for the worst case, dynamically adapts the processor to the actual requirements, employing different levels of reliability as needed.
In this section we discuss new techniques that adapt the heterogeneous multicore to three basic user requirements: besides reliability, we also need to consider performance and power. This layer takes as inputs the memory cell behavior and the circuit reliability solutions developed at the other layers and discussed in the previous sections. The solutions provided will include microarchitectural dynamic reconfiguration based on program characteristics and runtime cell behavior, dynamic test allocation, thread allocation, etc.
A. User Requirements Definition
High levels of unreliability will produce what are known as heterogeneous multicores by accident. Therefore, we will need mechanisms to reconfigure multicores so that they fit the applications.
An important step in defining how to reconfigure a multicore platform is knowing what the user expects from it, so that the adaptation suits the user requirements.
Most existing reconfiguration mechanisms focus on maximizing overall instruction throughput without taking the user needs into consideration. A high instruction throughput, however, does not always guarantee the best user experience. Consider, for example, a video decoding application running alongside an antivirus with very high throughput: what matters to the user experience is clearly ensuring an adequate frame rate in the video decoder. Therefore, we propose a different approach that lets the user select different performance levels based on priorities.
A division by user priorities takes the user experience into account and suits all market segments. In the desktop/laptop segment, a user can define primary applications (e.g., video), prioritizing the ones required for the current activity.
In the server market, on the other hand, different priorities can be sold at different prices, creating high-priority and low-priority users. Another benefit of this prioritization is that it allows taking advantage of increasingly heterogeneous CMPs without directly exposing this heterogeneity to the user. Once the user options have been defined, the reconfiguration mechanism can be implemented taking them into account. The reconfiguration mechanism, shown in Figure 4, is divided into two parts: a chip-level mechanism and a core-level mechanism. The chip-level mechanism is responsible for assigning applications to cores (only when applications start or finish) and for assigning priorities to each core. The core-level mechanism is responsible for reconfiguring each core for maximum throughput (IPS) given the power budget and reliability constraints.
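The two-level split can be sketched as two small functions. Everything concrete here is a hypothetical assumption for illustration: the App and CoreConfig types, the per-core quality score (standing in for test and monitoring information about each core), the policy of giving the best cores to the highest-priority applications and splitting the chip power budget in proportion to priority, and the candidate configurations with their estimated IPS, power and AVF. The text only specifies that the chip level assigns applications and priorities to cores, and that the core level maximizes IPS under power and reliability constraints.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class App:
    name: str
    priority: int   # hypothetical: larger value = more important to the user (> 0)

@dataclass(frozen=True)
class CoreConfig:
    name: str
    ips: float      # estimated instructions per second
    power_w: float  # estimated power in watts
    avf: float      # estimated architectural vulnerability factor

def chip_level_assign(apps, core_quality, total_power_w):
    """Chip level (run when apps start/finish): map apps to cores, split power by priority.

    Assumed policy: highest-priority apps get the highest-quality cores.
    """
    ranked_apps = sorted(apps, key=lambda a: a.priority, reverse=True)
    ranked_cores = sorted(range(len(core_quality)), key=lambda c: core_quality[c], reverse=True)
    pairs = list(zip(ranked_apps, ranked_cores))  # extra apps wait if cores run out
    weight_sum = sum(app.priority for app, _ in pairs)
    return {core: (app, total_power_w * app.priority / weight_sum) for app, core in pairs}

def core_level_configure(configs, power_budget_w, max_avf):
    """Core level: highest-IPS configuration meeting the power and reliability limits."""
    feasible = [c for c in configs if c.power_w <= power_budget_w and c.avf <= max_avf]
    return max(feasible, key=lambda c: c.ips) if feasible else None

# Hypothetical scenario from the text: a video decoder and an antivirus.
apps = [App("antivirus", 1), App("video", 3)]
plan = chip_level_assign(apps, core_quality=[0.7, 0.9], total_power_w=20.0)
configs = [
    CoreConfig("low", ips=1.0e9, power_w=4.0, avf=0.20),
    CoreConfig("mid", ips=2.0e9, power_w=8.0, avf=0.25),
    CoreConfig("high", ips=3.0e9, power_w=14.0, avf=0.35),
]
for core, (app, budget) in sorted(plan.items()):
    cfg = core_level_configure(configs, budget, max_avf=0.30)
    print(core, app.name, budget, cfg.name if cfg else "none")
```

In this scenario the video decoder receives the healthier core and 15 W of the 20 W budget, but the AVF limit still rules out the fastest configuration, so the core level settles on "mid"; the antivirus runs in "low" within its 5 W.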
B. Reliability Requirements
Estimating power and performance at runtime has been studied extensively; for instance, estimates can be based on measured current and IPS. Estimating reliability at runtime, however, is not easy.
Most studies estimate reliability in terms of the architectural vulnerability factor (AVF), and most of them rely on offline analysis with complex simulators. We follow previous work on runtime AVF calculation [4], which uses linear regressions of processor events, and extend it to memory structures such as the integer and floating-point register files and the different caches.
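A runtime AVF predictor of this kind fits AVF values measured offline (e.g., with a detailed simulator) as a linear function of processor event counts, then evaluates that function cheaply at runtime. The sketch below is a generic ordinary-least-squares fit, solved through the normal equations in plain Python. The two features (register-file occupancy and IPC), the synthetic training data and the coefficients are hypothetical and are not the events or model of [4].

```python
import random

def fit_linear(features, targets):
    """Ordinary least squares with intercept, solved via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented system [X^T X | X^T y].
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        coef[i] = (aug[i][k] - sum(aug[i][j] * coef[j] for j in range(i + 1, k))) / aug[i][i]
    return coef

def predict_avf(coef, counters):
    """Runtime estimate from event counters, clamped to [0, 1] since AVF is a fraction."""
    avf = coef[0] + sum(w * x for w, x in zip(coef[1:], counters))
    return min(max(avf, 0.0), 1.0)

# Hypothetical offline training set: (occupancy, IPC) per interval -> measured AVF.
rng = random.Random(0)
train_x = [(rng.random(), rng.uniform(0.5, 2.0)) for _ in range(50)]
train_y = [0.05 + 0.6 * occ + 0.1 * ipc for occ, ipc in train_x]
coef = fit_linear(train_x, train_y)
print(coef, predict_avf(coef, (0.5, 1.2)))
```

Training happens offline; at runtime only the dot product in `predict_avf` is evaluated per interval, which is what makes this style of estimator cheap enough to drive reconfiguration.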
C. User-Driven Reliability
Protecting the processor against errors requires valuable transistors that could have been used for other purposes. Moreover, these transistors burn power to protect against the worst case. Therefore, we have explored how to relax the reliability requirements: our goal is to analyze the opportunities and challenges of fulfilling the user requirements under faulty hardware conditions. In a computer system, whether a fault is unacceptable or benign depends on the level of abstraction at which correctness is evaluated. At the lower levels, correctness must be much more precise and strict: for instance, circuit blocks must ensure their functionality under nominal conditions, and the architectural state must be consistent and numerically exact. At the higher levels, on the other hand, the threshold for correctness can be softened: some kinds of errors may affect only performance or quality, and the program can still appear to execute correctly. Our results based on error injection campaigns show that ~50% of the SPEC and Mediabench programs crash when a single soft error occurs in the register file. This percentage increases to ~90% for hard faults. Results for the cache are not much better: with a pfail of 0.01%, which seems reasonable for future operating voltage conditions, 80% of the programs crash and ~20% deliver a wrong output. Figure 5 highlights some results for SpecFP.
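Error injection campaigns classify each run with a single injected fault as masked (correct output), wrong output (silent data corruption) or crash. The sketch below shows that classification on a toy 4-register machine that sums an array. The machine, its registers and the fault model (one bit flip at a random step, register and bit position) are hypothetical stand-ins, not the SPEC/Mediabench setup used in the study, and the resulting percentages say nothing about real programs.

```python
import random

DATA = list(range(1, 33))  # toy input array held in "memory"
NUM_REGS, REG_BITS = 4, 32

def run(fault=None, max_steps=10_000):
    """Sum DATA on a toy register machine; fault = (step, reg, bit) flips one bit.

    Returns ("ok", result) or ("crash", None).
    """
    regs = [0, 0, len(DATA), 0]  # r0=index, r1=accumulator, r2=length, r3=loaded value
    step = 0
    while regs[0] != regs[2]:
        if fault is not None and step == fault[0]:
            regs[fault[1]] ^= 1 << fault[2]  # single-bit soft error
        if step >= max_steps:
            return ("crash", None)           # treat a hang as a crash
        idx = regs[0]
        if not 0 <= idx < len(DATA):
            return ("crash", None)           # out-of-bounds load
        regs[3] = DATA[idx]
        regs[1] = (regs[1] + regs[3]) & 0xFFFFFFFF
        regs[0] += 1
        step += 1
    return ("ok", regs[1])

def campaign(trials, seed=0):
    """Inject one random fault per run and count masked / SDC / crash outcomes."""
    rng = random.Random(seed)
    golden = run()[1]
    counts = {"masked": 0, "sdc": 0, "crash": 0}
    for _ in range(trials):
        fault = (rng.randrange(len(DATA)), rng.randrange(NUM_REGS), rng.randrange(REG_BITS))
        status, result = run(fault)
        if status == "crash":
            counts["crash"] += 1
        elif result == golden:
            counts["masked"] += 1
        else:
            counts["sdc"] += 1
    return counts

print(campaign(1000))
```

In this toy machine a flip in the scratch register is always overwritten (masked), a flip in the length register always ends in an out-of-bounds access (crash), and a flip in the accumulator always corrupts the output, which illustrates why the outcome depends so strongly on which structure the error hits.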
VI. CONCLUSIONS
In the statistical device simulations we have observed an unexpectedly large variability in the on-current of aggressively scaled MOSFETs, due to RDD in the source/drain extensions; this effect is electrostatic in origin and can be captured by drift-diffusion simulations. Rare combinations of individual donors and acceptors result in localized depletion and mobile-charge starvation in the extensions, and a corresponding on-current collapse that affects both the linear and the saturation regimes of MOSFET operation. This explains the increased variability.
At the memory cell level, the robustness of the 6T bit-cell against process variations (random independent variations only) has been analyzed. The results show a dramatic drop in yield for the variability levels determined in TRAMS for 18 and 13nm, as well as for the PTM technologies under the high and very high variability scenarios. For 6T cells with minimum-size devices, 18 and 16nm would require dynamic redundancy, while 13nm requires RMR hardware redundancy.
Solutions at the system level can provide dynamic hardware redundancy. Such techniques need to take into account power and performance costs and must also consider user requirements. The reconfiguration mechanisms can employ multi-layer options, ranging from circuit solutions such as redundancy to thread allocation at the system level. However, we have seen that error protection cannot be removed, since even small errors make applications crash. Therefore, we believe that methods to estimate the AVF at runtime are required to guide the reconfigurations.
REFERENCES

