This paper presents a time-redundant technique to mitigate Negative and Positive Bias Temperature Instability (NBTI/PBTI) ageing effects on the functional units of a processor. We have analysed the sources and effects of ageing from the device level to the Instruction Set Architecture (ISA) level, and have found that an application may stress the critical paths in such a way that the circuit has half of its nodes always NBTI-stressed. To mitigate this behaviour, we propose an application-level solution to balance the stress and put the timing-critical gates of the critical path into a relaxed (balanced) mode. The results show that the lifetime of the system can be doubled by applying balanced stress patterns at the software level during the idle time of a processor system.
Introduction
Ageing or time-dependent variations in CMOS devices represent a challenge to the design of integrated circuits, along with power consumption and performance. With technology feature sizes scaling down, critical issues related to system reliability, such as soft errors, hard errors, and process variations, [1], are emerging. Ageing degradation can manifest itself as soft errors that become hard errors, bringing the system to a state where timing constraints are violated, and finally, the system fails to function properly. The mechanisms at the device level responsible for that degradation include Positive/Negative Bias Temperature Instability (PBTI/NBTI), Hot Carrier Injection (HCI) and Time-Dependent Dielectric Breakdown (TDDB); they manifest themselves as changes in the device threshold voltage, the carrier mobility, and the insulating properties of gate dielectrics, [2].
In this work, we are primarily concerned with the mitigation of Bias Temperature Instability (BTI) and particularly NBTI. BTI has two different phases:
• Stress: Interface traps are generated at the interface of the substrate and gate oxide layers due to electrical stress (i.e. negative bias for PMOS and positive bias for NMOS) that leads to breaking of some of the Si-H or Si-O bonds. Consequently, the threshold voltage of the transistor increases over time.
• Relaxation/Recovery: some of the generated traps are removed from the interface. However, the relaxation phase cannot completely compensate for the effect of the stress phase and therefore the overall effect of BTI is degradation in the threshold voltage of each transistor. The amount of degradation depends on the ratio between the stress period and the total operating period (the duty cycle).
Techniques for reducing stress and increasing recovery include controlling the signal probabilities. This can be done from the inputs of the circuit (Input Vector Control) or during the synthesis process by hiding those nodes with high probabilities of being zero, or by changing the pin order of the gates on the critical paths, [3, 4, 5]. However, if we try to relax one node, other nodes in the signal path may be stressed.
The hypothesis of this work is that a processor needs to actively relax to recover from stress, rather than simply doing nothing during idle periods. In modern applications, processors tend to have many short idle periods; thus, simple power gating would not be a good solution for stress optimization, [6]. During normal operational periods, the input states of the transistors may be constant, leaving the transistors stressed. NBTI/PBTI could then be mitigated by applying balanced-stress stimuli to the critical paths at the software level. Running a program on the processor for a non-functional purpose has been used in on-line testing (i.e. Software-Based Self-Test (SBST) methods) as this does not require modification of the hardware design [7].
The main contributions of this work are:
• A high-level ageing prediction model is proposed which takes into account the actual signal probabilities of the system. To achieve this we have developed a tool to derive the actual stress-recovery ratio of each logic gate.
• An application-level technology-independent mitigation technique is proposed to balance NBTI/PBTI effects. This brings the circuit into a recovery state by changing the nodes that are BTI-stressed to BTI-relaxed.
The organisation of this paper is as follows: ageing causes and mitigation techniques are covered in Section 2. In Section 3, NBTI/PBTI stress analysis is presented. The proposed technique for generating and applying the balancing patterns is presented in Section 4. The results of running the balancing program are given in Section 5. Section 6 concludes the paper.
Background And Related Work

BTI Stress-Recovery
BTI in a transistor is caused by the generation of traps at the channel and dielectric interface. BTI in PMOS transistors is referred to as Negative BTI (NBTI), since the gates of PMOS devices are negatively biased with respect to the source (i.e. $V_{gs} = -V_{dd}$). BTI for NMOS transistors is referred to as Positive BTI (PBTI), since the gates of NMOS transistors are positively biased with respect to the source (i.e. $V_{gs} = V_{dd}$). NBTI and PBTI have the same consequences, namely that they increase the threshold voltages and decrease the driving currents of the devices, but for different physical reasons. It is generally accepted that NBTI degradation in SiO2 and High-κ dielectric is due to dielectric interface traps. PBTI was considered to be negligible before the introduction of High-κ MOSFET technology [8]. In High-κ processes, PBTI degradation is due to a build-up of negative charges in the High-κ layer. Although High-κ technology is impacted by both NBTI and PBTI degradation, NBTI is still much more dominant according to recent results for Replacement Metal Gate (RMG) Technology, [9, 10].
The probability of a signal being zero, SP(0), or the duty cycle, reflects the fraction of time spent in the stress state. We simulated the effect of the duty cycle on the threshold voltage using a commercial reliability model, namely the HSPICE MOSRA Built-in Model Level 3 [11]. MOSRA can evaluate both NBTI and PBTI for a circuit at the SPICE level and obtain gate or path degradation, rather than just threshold voltage degradation or leakage current increase at the transistor level, as given by the basic Reaction-Diffusion (RD) model [12]. Decreasing the duty cycle can impact positively on the $V_{th}$ degradation of PMOS devices and negatively on the $V_{th}$ degradation of NMOS devices. MOSRA simulations use degraded device parameters in HSPICE to calculate gate or path delay degradation in timing simulations, [13].
Simulation parameters have been tuned to match the degradation given by statistical data, [14, 15]. We applied different SP(0) values to a circuit consisting of two inverters in series to calculate the maximum path delay after 10 years for two different technologies (90nm from Synopsys and 65nm from TSMC). We activate only the NBTI effects because PBTI should barely exist at these technology nodes and if modelled would exaggerate the ageing effect. As can be seen from Figs. 1 and 2, the signal probability has an impact on path delay degradation and, thus, on the lifetime. While one node is highly stressed, it will tend to have more static NBTI and by balancing the signal probabilities, the static stress will be reduced as well. As the delay degradation is less at the end of the device lifetime than at the beginning, decreasing the path delay by a small amount could enhance the lifetime of a device by several years. Fig. 3 shows the lifetime of the two-inverter chain with a target maximum delay of 0.237ns for the 90nm technology from Synopsys and 0.149ns for the 65nm technology from TSMC.
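The sensitivity of lifetime to a small reduction in degradation can be illustrated with a minimal sketch. It assumes, purely for illustration, that path delay grows as a power law d(t) = d0 (1 + a t^n) with n = 1/6, an exponent often quoted for reaction-diffusion style models. This is not the MOSRA model used in this work, and `d0`, `a_unbalanced` and the 11% reduction are hypothetical numbers rather than values read from Figs. 1 to 3.

```python
def lifetime_years(d0, d_max, a, n=1.0 / 6.0):
    """Years until the delay d0 * (1 + a * t**n) reaches d_max (t in years).

    Assumed power-law ageing model, used here only as an illustration.
    """
    if d_max <= d0:
        return 0.0
    return ((d_max / d0 - 1.0) / a) ** (1.0 / n)


# Hypothetical fresh delay and delay limit, both in ns.
d0, d_max = 0.200, 0.237
a_unbalanced = 0.13                 # assumed degradation coefficient
a_balanced = a_unbalanced * 0.89    # assumed 11% less degradation when balanced

t_unbalanced = lifetime_years(d0, d_max, a_unbalanced)
t_balanced = lifetime_years(d0, d_max, a_balanced)
print(f"lifetime ratio balanced/unbalanced: {t_balanced / t_unbalanced:.2f}")
```

Because the assumed curve flattens with time, a reduction of roughly a tenth in the degradation coefficient moves the crossing point of the delay limit out by about a factor of two under these assumptions.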
NBTI Mitigation Techniques
One approach to mitigation is to reduce or even to eliminate design uncertainties. However, in practice, there is more than one contributor to these uncertainties, including EDA tool limitations and complex environmental stress conditions, [16]. Another solution is conservative design under worst-case scenarios, but circuits do not always run at the worst-case condition, and such an over-designed approach is extremely costly with respect to power and area. At gate level, BTI manifests itself as a time-dependent gate delay, leading eventually to timing violations. In [17], critical gates are identified and optimisation methodologies, e.g. gate resizing, have been proposed for these critical gates. However, NBTI has a dependence on the dynamic operation, such as the supply voltage, spatial or temporal temperature and signal probability and these parameters vary dynamically from gate to gate. Another solution, [3], uses signal probabilities to restructure the logic gates and arrival times of the input signals to reorder the pins. However, signal probabilities are assumed to be 50% for input signals, but in larger systems, the signal probability is dynamic and application-dependent. A technique to mitigate NBTI has been proposed, [18, 19], in which the signal probability is modified using Input Vector Control (IVC) or Multiple Input Vector Control (M-IVC) during the stand-by mode of the circuit. However, saving these vectors in memory has costly overheads in terms of area and power and the techniques do not consider the stress probabilities during the operational mode of the circuit.
At circuit and gate levels, different methods have been proposed to reduce NBTI degradation. Supply voltage scaling over time to reduce guard-bands and increase the lifetime of the circuit has been explored, [20]. Scaling the clock frequency has been proposed to detect and mask late transitions generated due to ageing, [21]. Reordering the gate pins and restructuring the transistor network to mitigate the NBTI degradation has also been suggested, [3, 22].
There are many different approaches to controlling these effects. A technique called Dynamic Wearout/NBTI Management (DNM) has been proposed, with the aim of reducing design margins [23, 24] . The DNM approach reduces the power consumption by running the circuit with the least possible supply voltage and changes the supply voltage periodically based on readings from delay sensors. However, the main challenge of this approach is the accuracy of the sensors and the area overheads that could exceed design constraints.
In general, there are three possible approaches to ageing mitigation: proactive, reactive and ageing-aware (protective). The proactive approach works as an estimator for ageing behaviour and is based on a physical model that describes ageing effects (NBTI, HCI and TDDB) at a low level, [25, 12, 4]. Usually, this approach needs to take into account the different contributions of time-dependent variations (e.g. signal probability, switching activity, temperature and supply voltage). The reactive approach is to monitor on-line the real behaviour resulting from ageing by using delay sensors [23, 24]. This approach is more precise and bypasses the complexity of modelling the ageing effects at the system level. However, the main problem of on-line sensors is area and power overheads and, to moderate this, only a limited number of nodes can be monitored. The third, protective, approach tries to alleviate the source of ageing either by reducing the operating temperature or reducing the stress probability in the critical path, [26, 17].
Data Dependency of Ageing-Induced Degradation of the Processor
Program data determines whether the nodes of the processor are stressed or not. The data includes both opcodes and operands. Firouzi et al., [26], looked at possible NOP instructions in the MIPS processor to reduce ageing. As well as the standard NOP instruction (sll r0, r0, 0) they considered other instructions that had no result (e.g. adding zero) to minimise stress. They proposed software and hardware techniques to assign the best input vector for these NOP instructions. This method would only be helpful, however, if the rate of NOP instructions is high with respect to the total number of operational instructions. To test this hypothesis, we ran different benchmarks from the MiBENCH suite, [27], for two different architectures on the gem5 simulator, [28]. Fig. 4 shows that the number of NOP instructions on the MIPS processor is significant, while on the ARM architecture it is negligible. Data from other paths can also stress the critical path. For example, assume the adder is in the critical path, and that during the execution of other operations data is routed to the adder, even though the result is not used. From a BTI perspective the critical path through the adder will be stressed. It has been claimed, [29], that the core of a processor can be brought to a failing state by executing a malicious program to age the circuit; this does not consider the signal probabilities of intermediate nodes, however (see Fig. 5). In practice, it is not possible to put all critical path nodes into a fully relaxed state (a Signal Probability of zero, SP(0) = 0%) or a fully stressed state (SP(0) = 100%), Fig. 5. On the other hand, it is possible to balance the stress by controlling the signal probabilities during the idle time of the processor.
NBTI/PBTI Stress Analysis
Ageing-Sensitive Critical Path Selection
The selection of paths to reverse the stress needs to consider both the initial path delays from the post-synthesis analyses and the gate types in the paths. A non-critical path at time zero could become a critical path after a number of years because paths degrade according to different factors, for example, duty cycle, temperature, frequency and circuit topology. Estimating the path most sensitive to ageing depends on model parameters that would not be available until the system has been fabricated and tested in the required environment. So, in this research, we have tried to avoid using an ageing model to define the criteria for selecting specific paths that are potentially vulnerable to ageing. Instead, we define a threshold ($\theta$) for the ageing-critical path delay. For example, the maximum critical path degradation has been measured for different benchmark circuits for 10 years and found to be between 12.3% and 19.5%, [30]. Thus, with $\delta_0$ denoting the maximum path delay at time zero, we can define ageing-sensitive critical paths as those paths that have slack in the range of zero to $\delta_0 \times \theta$, inclusive, or equivalently have a path delay between $\delta_0 (1 - \theta)$ and $\delta_0$, inclusive.
We also have to consider the effect of process variations on selected paths by defining the worst-case possible path deviation due to the process variations as $\Delta\delta_{pv}$. Then, an ageing- and process-sensitive critical path should be selected if its path delay is in the range:
These paths have nearly balanced path delays, but they could share instances with the first critical path (for example in an adder, if the carry chain path is shared between nearly-critical paths and the first critical path, then any degradation or reversed degradation on the shared part will also affect the ageing-sensitive critical paths). Alternatively, if the critical paths have instances that are independent from one path to another, then all the ageing-sensitive critical paths need to be individually analysed for ageing and possible reversal.
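The selection rule can be sketched in a few lines. This is an illustration only: it assumes that $\Delta\delta_{pv}$ is expressed as a fraction of $\delta_0$ and simply added to $\theta$, which is one reading consistent with the combined 20% margin quoted in the Discussion. The function name, path names and delays are hypothetical.

```python
def select_sensitive_paths(path_delays, theta, delta_pv=0.0):
    """Return path names whose delay lies within the margin below the worst delay.

    path_delays: dict mapping path name -> post-synthesis delay at time zero.
    theta:       ageing margin as a fraction of the worst delay (delta_0).
    delta_pv:    assumed process-variation margin, also a fraction of delta_0.
    """
    delta_0 = max(path_delays.values())
    lower = delta_0 * (1.0 - (theta + delta_pv))
    selected = [name for name, d in path_delays.items() if lower <= d <= delta_0]
    # Most critical path first.
    return sorted(selected, key=lambda name: -path_delays[name])


# Hypothetical delays in ns.
paths = {"p1": 1.00, "p2": 0.95, "p3": 0.83, "p4": 0.70}
print(select_sensitive_paths(paths, theta=0.15))                  # ageing margin only
print(select_sensitive_paths(paths, theta=0.15, delta_pv=0.05))   # with process margin
```

Adding the process-variation margin widens the window, so a path such as `p3` that is outside the ageing-only window is also analysed.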
SP(0) Distribution on the Critical Paths of the Processor
Combinational logic circuits may show different degradations in each PMOS transistor because different input patterns can lead to different inputs to the CMOS transistors. Some PMOS transistors may degrade more because they have a SP(0) of 99%, but others may not degrade if they have a SP(0) of 1% at their gates. To date, however, there has been no consideration of the SP(0) of the intermediate nodes of the complex gates. For example, in [31, 29], the analysis was done only for the input transistors of gates. In the OR gate of Fig. 6, if IN1 is at "1", Q would be "1" regardless of IN2, but there is a node, QN, inside the OR gate that would be stressed. To measure the delay degradation on each net of the critical path, we need to consider the SP(0) of each input and of the internal nodes of the gate cells. We used the OpenRISC core for this analysis. First, synthesis was done with the full set of cells available in the 90nm Synopsys library, including those cells that have internal nodes. There are 2888 critical paths in the circuit, but various critical paths pass through the same gate cells. For example, the first 100 critical paths share more than 92% of the cells in the most critical path, Table 1. Thus, if there is any degradation in the shared part, it would affect all these critical paths. Moreover, the average SP(0) for the first 100 critical paths is around 80% when executing a "Hello World" program. This means the program will stress the critical path. In this example, the nodes are totally NBTI stressed because the probabilities of signals being zero are close to 100%. However, this does not consider the hidden nodes of the compound gates and so the average SP(0) is not correct. These hidden nodes would have complementary values and therefore have no NBTI stress but could have PBTI stress. In other words, a circuit with SP(0) close to 0% would have hidden nodes with SP(0) close to 100%.
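The hidden-node effect can be made concrete with a small sketch. It assumes that the OR cell is built as a NOR stage followed by an inverter, so that the internal node QN is the complement of Q, and that the two inputs are statistically independent; both are assumptions made for illustration, and the input probabilities are hypothetical.

```python
def or_gate_sp0(sp0_in1, sp0_in2):
    """SP(0) of the output Q and of the internal node QN of an OR cell.

    Assumes OR = NOR followed by an inverter, with independent inputs.
    """
    sp0_q = sp0_in1 * sp0_in2   # Q is 0 only when both inputs are 0
    sp0_qn = 1.0 - sp0_q        # QN is the complement of Q
    return sp0_q, sp0_qn


# IN1 is almost always 1 (SP(0) = 1%), IN2 is balanced (SP(0) = 50%).
sp0_q, sp0_qn = or_gate_sp0(0.01, 0.50)
print(f"SP(0) at Q:  {sp0_q:.3f}")
print(f"SP(0) at QN: {sp0_qn:.3f}")
```

An analysis restricted to the cell pins would see an output that is almost never at zero, while the internal node sits at zero almost all of the time.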
To calculate a more accurate figure, we ran different instructions on a processor synthesised using only cells that have a single level of transistors, in order to avoid any hidden nodes, as given in Table 2. The SP(0) at each node is generally the complement of that at the previous node and the average SP(0) is around 50%. So, the objective should be to reverse these signal probabilities to obtain signal probabilities that are as balanced as possible (around 50%), rather than reducing one signal probability to avoid NBTI stress. This example is only used to show the signal probabilities of the hidden nodes, as this case is not feasible in real synthesis.

Table 2 (final rows as extracted; one SP(0) column per test case)
Node           Cell       SP(0)
…              …          50%  50%  50%  50%  34%  34%  34%  35%
151            NAND2X0    0%   0%   0%   0%   0%   0%   0%   0%
152            INVX0      99%  99%  99%  99%  99%  99%  99%  99%
153            NAND3X0    0%   0%   0%   0%   0%   0%   0%   0%
154            NAND2X0    99%  99%  99%  99%  99%  99%  99%  99%
155            DFFX1      99%  99%  99%  99%  99%  99%  99%  99%
Average SP(0)             49%  49%  49%  49%  48%  49%  48%  48%

Impact of Instruction/Program Level Workload on the Stress Probabilities
To study the effect of different instructions on the stress of the critical or nearly critical paths, we ran two different instructions with four different operands each, to determine whether the opcode or the data have a significant impact. The synthesis was done using only cells that have no hidden nodes and the SP(0) at each node is generally the inverse of that of the previous node in the path. For the 155 nodes in the critical path, the average SP(0) was around 50%, as can be seen from the symmetry of the histograms in Fig. 7. The average SP(0) of the paths does not change significantly with the opcodes or operands.
However, the signal probability distributions on the critical path do depend on the opcode and operands, as shown in Fig. 7 ("movhi rD,5555_H"), where the signal probabilities tend to the extremes (SP(0) < 10% or SP(0) > 90%) even though the distributions are symmetrical, so the critical paths are stressed by both NBTI and PBTI. Similarly, different MiBench benchmarks show nearly symmetrical average stress, as can be seen from Fig. 8. Therefore, it would be desirable to reduce the number of nodes at the extremes of these histograms (e.g. SP(0) < 25% or SP(0) > 75%).
Hence we conclude that a single instruction or program will not relax or stress 100% of the critical paths, but that a program-level solution could relax or stress some specific nodes. In other words, if there are some nodes in the processor that could face a continual stress while others are not stressed at all, it is possible to balance that effect at the application level.
Gate Level Stress Balancing
We considered balancing the signal state of basic logic gates (inverter, NAND and NOR) compared with inverting the signal probability. We examined how inverting the signal probability would affect the ageing degradation. We have used HSPICE for simulating path delays and modelled the NBTI using MOSRA Level 3 considering two different cases:
• CASE A: Unbalanced stressed nodes: the nodes of the critical paths are either significantly NBTI-stressed (SP(0) greater than 75%) or significantly unstressed (or PBTI-stressed) (SP(0) less than 25%).
• CASE B: Balanced stressed nodes: the nodes of the critical path have SP(0) around 50%.
For the inverter, we simulated the degradation of a path of two inverters over ten years using the cases discussed above. The results show an advantage of 23.17% in the path delay and more than 50% in terms of time, Fig. 9. For NAND and NOR gates, the same simulation as for the inverter was done but also considering two further dependencies: the signal probabilities of the secondary inputs of the gates, and the input pin order. The results show that swapping input pins can decrease the advantage obtained from balancing the signal probabilities. Also, the signal probabilities of the secondary input will not significantly affect the benefits of balancing the signal probabilities over the critical path, Table 3 and Table 4.
We also considered how the balanced stress patterns affect the remaining paths of the circuit. To answer this fundamental question, we examined the proposed technique on a two-bit adder, which has one target critical path and two nearly-critical paths, Fig. 10. We extracted the list of critical paths after synthesizing the circuit using Design Compiler. Again we used the MOSRA Level 3 model in HSPICE simulations to model the degradation of the circuit, using the two above-mentioned cases. The results show that for all paths, there will be an advantage of up to 50% in the expected lifetime from balancing the signal probabilities in the critical path (Fig. 11). Fig. 11 also shows that nearly critical paths share more than half of their nodes with the target critical path. Thus, any advantage in balancing the signal probabilities of the critical path will lead to an advantage in the remaining paths. If the nearly-critical paths do not share nodes with the critical path, then it is possible to control both the critical and the nearly-critical paths in parallel.
Proposed Technique
We propose a two-phase technique to mitigate the BTI ageing effects. In the first phase, anti-ageing patterns (the balance states) are generated. In the second phase, these patterns are applied by executing a stress-relief program instead of running a process idle task.
The flow of the first phase of the proposed technique is illustrated in Fig. 12. We find the normal states (the Critical Path Stress States) of the nodes that need to be balanced by running different benchmarks and instructions. We obtain the signal probabilities of the nets being stressed to logic zero (SP(0)) from gate-level simulations of the processor executing benchmarks. We calculate the SP(0) of the critical paths that have slack less than the predefined maximum path delay degradation.
The second phase of the technique is to balance the effect of BTI by reversing the average signal probabilities, applying stress-relaxing patterns to the timing-critical components in the functional unit of the processor during idle states.
Case Study: Program-Level NBTI/PBTI Balancing
We synthesized an OpenRISC 1200 processor core using the 90nm Synopsys technology. The VCD (Value Change Dump) file from each post-synthesis simulation contains both the switching activity, which is used to estimate the dynamic power at the design phase, and the signal probability, which we used to estimate the BTI effect on performance degradation. From this we extracted the SP(0) for all the nets of the processor. To balance the effect of signal probability on the critical path, we need to find input patterns that will invert the signal states. This is effectively the same as generating test patterns for single stuck faults. We used an ATPG tool to find test patterns for stuck-at-0 faults on nets that have high SP(0) so as to set those nets to '1'. In the same way, we generated patterns for stuck-at-1 faults on the nets that have low SP(0).
The critical paths of the OpenRISC 1200 processor are in the adder. 38 nodes have an SP(0) greater than 75%; 10 nodes have an SP(0) less than 25%. The ATPG tool found 8 test patterns to set these nodes to balanced stress conditions. Each pattern will apply balanced stress to one or more nodes; the full set is needed to cover every node. As the results given in Section 5 show, the percentages of stressed nodes will be reduced significantly after applying these patterns. Table 5 shows these patterns as they would be applied to inputs A[31..0] and B[31..0] of the adder. The patterns could be applied either in a test mode or by writing a program.
The OpenRISC 1200 Instruction Set Architecture has only a 16-bit immediate mode. These balance-stress patterns are 32 bits wide and so are stored in consecutive memory locations starting with address K. The program, Fig. 13, transfers K (immediate value) to register 1 for use as an offset address. Then two patterns are loaded from memory to registers 2 and 3. The two patterns are applied to the adder with an ADD operation. The same sequence is applied for the remaining patterns. Finally, this program sits in a loop to be run during the idle states of the system.
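As a minimal sketch of how SP(0) can be derived from value-change data, the function below computes the fraction of the observed time that a single net spends at logic zero. It does not parse a real VCD file: the list of `(time, value)` pairs stands in for the changes recorded for one net, and the function name and sample data are hypothetical.

```python
def sp0_from_changes(changes, t_end):
    """Fraction of the interval [first change, t_end) that a net spends at 0.

    changes: list of (time, value) pairs sorted by time, value in {0, 1}.
    t_end:   end time of the simulation, in the same time unit.
    """
    if not changes:
        raise ValueError("no value changes recorded")
    t_start = changes[0][0]
    if t_end <= t_start:
        raise ValueError("t_end must be after the first change")
    time_at_zero = 0
    # Pair every change with the time of the next one (or the end of simulation).
    following = changes[1:] + [(t_end, None)]
    for (t, value), (t_next, _) in zip(changes, following):
        if value == 0:
            time_at_zero += t_next - t
    return time_at_zero / (t_end - t_start)


# Net is 0 for 0..30, 1 for 30..40, and 0 again for 40..100.
print(sp0_from_changes([(0, 0), (30, 1), (40, 0)], t_end=100))  # 0.9
```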
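The mapping from signal probabilities to ATPG targets can be sketched as follows. The 75% and 25% thresholds are those used in this section; the function name, the net names and the output format are hypothetical and do not correspond to the input format of any particular ATPG tool.

```python
def balancing_faults(sp0_by_net, high=0.75, low=0.25):
    """List the stuck-at faults whose test patterns would reverse the stress.

    A test for stuck-at-0 must drive the net to 1, so it is requested for nets
    that are mostly 0; a test for stuck-at-1 drives the net to 0.
    """
    faults = []
    for net, sp0 in sorted(sp0_by_net.items()):
        if sp0 > high:
            faults.append((net, "stuck-at-0"))
        elif sp0 < low:
            faults.append((net, "stuck-at-1"))
    return faults


# Hypothetical SP(0) values for three nets on a critical path.
sp0 = {"n12": 0.99, "n13": 0.01, "n14": 0.50}
for net, fault in balancing_faults(sp0):
    print(net, fault)
```

Nets that are already close to 50%, such as `n14` here, generate no fault and are left to the patterns chosen for the other nets.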
The above program can be optimised further to reduce the number of memory accesses and thus the power consumption of the running program, Fig. 14.
Another consideration is that this program may not have the privileges to run while another program is running. So the scheduler should give the lowest priority to this program and run it when the system is idle. However, if it is decided that this program should run as a routine in response to an interrupt, then the context of the interrupted process needs to be saved. In this case the
Figure 11: Path delay degradation of the three most critical paths (Critical Path 1, 2 and 3) in the two-bit adder, considering unbalanced (CASE A) and balanced (CASE B) signal probabilities over the critical path.
Evaluation and Discussion
To evaluate the effect of running the balancing program and how the signal probability will be affected on the critical path, we ran the balancing program along with different benchmarks and varied the share of the running time given to the balancing program from 10% to 50%. To compare results, we calculated the percentage of stressed nodes as follows:
$$\text{Percentage of stressed nodes} = \frac{\#\,\text{stressed nodes}}{\#\,\text{critical path nodes}}$$

where the stressed nodes are those critical path nodes that have an SP(0) greater than 75% or less than 25%. As would be expected, to obtain a balanced state for the stressed nodes requires the balancing program to run for 50% of the time. Needless to say, it is not always possible to have this idle time or to add redundant time for the purpose of relaxing BTI. Fig. 16 shows the effect on the stressed nodes of running the BTI balancing program for different percentages of the overall time. Fig. 17 shows the effect of running the BTI balancing program for different times on the percentages of the stressed nodes in critical paths of the OpenRISC processor.

    l.movhi r1, K
    # Stimulate the first pattern
    l.lws   r2, 0(r1)
    l.lws   r3, 1(r1)
    l.add   r4, r2, r3
    # Stimulate the second pattern
    l.lws   r2, 2(r1)
    l.lws   r3, 3(r1)
    l.add   r4, r2, r3
    ...
    # Stimulate the eighth pattern
    l.lws   r2, 14(r1)
    l.lws   r3, 15(r1)
    l.add   r4, r2, r3
    l.rfe   # Return From Exception
Figure 13: Balancing program

The results show that balancing one critical path will balance other near-critical paths as they share nodes with the first path. On the other hand, if the near-critical paths do not share many nodes with the targeted critical path, it is possible to apply balancing patterns in the same way for the first critical path independently of the other paths. Although balancing signal probabilities would work with embedded systems that run specific applications, it is also possible to use the technique for a general-purpose processor. Fig. 18 shows how the percentages of stressed nodes on the first critical path of the OpenRISC processor reduce when executing the balancing program along with a different program from the MiBENCH benchmarks.
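The dependence of the stressed-node percentage on the share of time given to the balancing program can be sketched with a simple time-weighted average. This is an idealised illustration: it assumes that each node has a fixed SP(0) under the workload and a fixed SP(0) while the balancing program runs, and that the two combine linearly with the time share. The function names and all SP(0) values are hypothetical.

```python
def mixed_sp0(sp0_workload, sp0_balancing, fraction):
    """Time-weighted SP(0) when the balancing program runs for `fraction` of the time."""
    return (1.0 - fraction) * sp0_workload + fraction * sp0_balancing


def stressed_percentage(sp0_values, low=0.25, high=0.75):
    """Percentage of nodes with SP(0) above `high` or below `low`."""
    stressed = sum(1 for s in sp0_values if s > high or s < low)
    return 100.0 * stressed / len(sp0_values)


def stressed_after_balancing(workload, balancing, fraction):
    mixed = [mixed_sp0(w, b, fraction) for w, b in zip(workload, balancing)]
    return stressed_percentage(mixed)


# Hypothetical four-node path: SP(0) under the workload and under the balancing patterns.
workload = [0.99, 0.00, 0.50, 0.99]
balancing = [0.00, 1.00, 0.50, 0.00]
for fraction in (0.0, 0.1, 0.3, 0.5):
    print(fraction, stressed_after_balancing(workload, balancing, fraction))
```

Under these assumptions a node held at zero by the workload only reaches an SP(0) of 50% when the complementary pattern is applied for half of the time, although it may leave the stressed band (above 75% or below 25%) at a smaller time share.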
Next, to verify that the balancing program will reduce the degradation in the path delay of the processor, we simulated the adder using HSPICE and modelled NBTI using the MOSRA Level 3 model. We stimulated the circuit with two cases:
• CASE A (Normal Stressed Mode): Stress patterns with the equivalent signal probabilities of the Hello World program.
• 
Discussion
In our analysis, we expect, for example, 11% degradation in six years as can be seen in Fig. 19 , so a simple solution could be guardbanding. Guardbanding is inevitable, not only for ageing but also for PVT variations. However, adding more guardbanding would negate the advantage of using a smaller technology size. So we have to find an active protective approach as well as estimating, or sensing and reacting to degradations.
The idea of this work is to utilise the short idle periods in a processor, [6]. These are used to reverse the BTI stress rather than running empty loops. In our case study, the OpenRISC processor, the critical paths are in the adder and we can propagate patterns simply by loading the patterns into a register and executing an addition operation. This program should replace the idle task and should be executed whenever the operating system tries to schedule the idle task. In general, if the timing-critical component is not the adder, then we have to replace the operation accordingly. If the critical paths are not controllable at the instruction level (e.g. in a control unit that may have many flip-flops) then we need an architectural solution rather than a software solution to propagate the patterns; we are currently working on this issue.
We also need to consider how process variations could affect the critical path ranking. If we get this wrong, we might heal a non-critical path and leave the real critical path unaffected. For this reason, we have to consider not only the critical path but also the nearly critical paths that could become critical with PVT and time-dependent variations. In our case study, we have predefined $(\theta + \Delta\delta_{pv})$ to be 20% of the maximum path delay at time zero ($\delta_0$), as described in Section 3.1, which covers the first 100 critical paths. However, we found that the first 100 critical paths share more than 92% of the cells with the most critical path and in this case balancing the most critical path also includes the 92% shared with the nearly critical paths. If the nearly critical paths do not share a large percentage of their cells with the first path, then we have to consider every single path in our analysis and generate patterns for them to balance signal probabilities in parallel. So, even with process variations, this technique would target the nearly-critical paths. If the nearly-critical paths do not share cells with the most critical path, it is important to define a threshold that considers the process and ageing variation contributions and to control these paths in parallel.
Finally, running a program to balance the BTI stress raises other design issues. Time overheads and power consumption need to be optimised by reducing the memory access or by using a program that has only immediate mode operands, as memory accesses increase the power consumption. Another issue that needs to be considered is when and for how long the program needs to run. The obvious answer is during the idle time of the processor, but we also need to consider the operating system actions during the idle state.
Conclusion
Application-specific high-level ageing analysis has been done to find a technique for CMOS ageing mitigation. In this work, the stress probability has been found from the application level down to the gate level. A cross-layer mitigation technique is proposed to apply stress-relaxing patterns to the critical paths of a functional unit of a processor during idle times. This paper presents a two-phase technique to mitigate the BTI ageing effects. The first phase generates balancing patterns. The second phase applies these patterns by executing a program to balance the stress on the critical paths of embedded systems to alleviate BTI effects, instead of running an empty process idle task. In future work we will apply this technique as an architectural solution to control the paths in the non-software-controllable units of the processor. We will also apply stress balancing in multiprocessor or many-processor systems. The operating system scheduler of these systems will have more opportunities to run anti-ageing programs by assigning an anti-ageing process to a processor in an idle state, concurrently with other processors that are running user or system tasks.
