# An Embedded Health-Monitoring Infrastructure for a Reliable Multi-core Processor

Yong Zhao and Hans G. Kerkhoff

CTIT / Testable Design and Testing (TDT) Group, University of Twente, Enschede, the Netherlands

yong.zhao@utwente.nl, h.g.kerkhoff@utwente.nl

**Abstract**: An embedded health-monitoring infrastructure for highly reliable SoCs in data-streaming systems is presented. Unlike the traditional testing approach to a dependable design, our infrastructure is based on prognostics from health-monitoring sensors that are embedded in the target processor. This enables preventive repair using spare parts, or priority ranking of tasks among processors. The health-monitoring scheme and the sensor structure are presented.

**Keywords**: health-monitoring, dependability testing, MP-SoC, prognostics, repair, reliability sensor, nano electronics.

## 1. Introduction

Multi-Processor SoCs (MP-SoCs) have been intensively researched and developed for their powerful data-processing capability. However, an emerging problem is their reduced reliability as transistors shrink and complexity increases. A test-for-dependability method was designed earlier to obtain a dependable MP-SoC [1]. A more recent approach [2], which forms the basis of this paper, is rather different: it applies health monitoring for prognostics to enhance dependability. The architecture of this approach is described in Section 2.

An all-in-one health-monitoring sensor has been designed that is capable of carrying out voltage and temperature measurements as well as non-invasive reliability tests. This technique employs prognostics and health monitoring, which is emerging as a way to achieve efficient maintenance and lower life-cycle costs [3]. Using the IJTAG (IEEE P1687) standard [4], the sensor communicates with the target processor, and the health information is used directly by the processor during field operation. Section 3 introduces the health-monitoring infrastructure, and Section 4 proposes the all-in-one health-monitoring sensor and presents its simulated performance. Finally, conclusions and future work directions are given.

## 2. The architecture of the reconfigurable Multicore Processor SoC

An MP-SoC for security applications was first published in [1] (Fig. 1a). This system features a high degree of scalability, reconfigurability and dependability (including high reliability and availability). High dependability is obtained via periodic structural scan-based tests of the reconfigurable 9-processor SoC (UMC 90nm CMOS) to detect potential stuck-at faults. There are five Reconfigurable Fabric Devices (RFD) on one board, and if a processor core in an RFD is found faulty by the on-chip controlling Dependability Manager (DM), it is quarantined and a spare processor takes over its tasks. The disadvantage of this approach is that it reacts only after a fault has occurred.

To achieve the full system availability required in several security applications, a new approach based on health monitoring and prognostics is applied: key parameters that can provide a clue to performance degradation in an SoC are measured (Fig. 1b). During field operation, as degradation develops near a core, one can at some point decide either to reduce the stress on the processor or to label it faulty. The health-monitoring approach points to the potentially faulty circuit in a non-invasive way.
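The decision described above (keep a core, reduce its stress, or label it faulty) can be sketched as a simple threshold policy. This is only an illustrative sketch: the threshold values, the `CoreAction` names and the `rank_cores` helper are assumptions and do not come from the paper.

```python
from enum import Enum
from typing import Dict, List


class CoreAction(Enum):
    KEEP = "keep"
    REDUCE_STRESS = "reduce_stress"
    MARK_FAULTY = "mark_faulty"


# Hypothetical thresholds on relative degradation (fraction of the fresh value).
WARN_THRESHOLD = 0.03
FAIL_THRESHOLD = 0.08


def decide_action(degradation: float) -> CoreAction:
    """Map a measured relative degradation to a maintenance action."""
    if degradation >= FAIL_THRESHOLD:
        return CoreAction.MARK_FAULTY
    if degradation >= WARN_THRESHOLD:
        return CoreAction.REDUCE_STRESS
    return CoreAction.KEEP


def rank_cores(degradations: Dict[str, float]) -> List[str]:
    """Return the names of usable cores, healthiest (least degraded) first."""
    usable = {
        name: value
        for name, value in degradations.items()
        if decide_action(value) is not CoreAction.MARK_FAULTY
    }
    return sorted(usable, key=lambda name: usable[name])
```

A ranking of this kind could serve the priority ranking of tasks among processors mentioned in the abstract; how the actual system maps tasks to cores is not specified here.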



Figure 1. a) Our existing MP-SoC system (45 cores) at PCB level. The inset shows the reconfigurable 9-processor SoC. b) Basic architecture of a new generic reconfigurable-processor (RP) mixed-signal SoC.

## 3. The Health-Monitoring infrastructure

Fig. 2 shows the health-monitoring infrastructure, an IJTAG-based structure that is normally used during final test via the TAP controller. To improve the quality of the prognostic results, a number of embedded health-monitoring sensors are applied; they are located close to the core, in a wrapper style. The monitor wrapper is basically a smart register that can provide commands to the monitors and collect digital measurement data that can be shifted out. It follows the IJTAG protocol.

To make the health-monitoring information available directly in the processor for a subsequent task (e.g. remapping), an 8-bit 2-terminal multiplexer, a single select line and an 8-bit bus are introduced between the IJTAG node and the TAP controller. The mux is controlled by the resident embedded Xentium.
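The behaviour of the monitor wrapper as a "smart register" can be illustrated with a toy software model. This is a sketch under assumptions, not the actual hardware or an IEEE P1687 implementation: the class name, the method names, the 8-bit width and the LSB-first shift order are all hypothetical.

```python
class MonitorWrapper:
    """Toy model of a monitor wrapper: command in, measurement bits out."""

    def __init__(self, width: int = 8):
        self.width = width          # assumed register width
        self.command = 0            # last command given to the monitor
        self.bits = [0] * width     # shift-register contents, MSB first

    def update(self, command: int) -> None:
        """Latch a command for the attached monitor."""
        self.command = command

    def capture(self, measurement: int) -> None:
        """Load a digital measurement word into the shift register."""
        if not 0 <= measurement < (1 << self.width):
            raise ValueError("measurement does not fit the register width")
        self.bits = [(measurement >> i) & 1 for i in reversed(range(self.width))]

    def shift(self, scan_in: int = 0) -> int:
        """Shift by one position and return the bit leaving the register."""
        scan_out = self.bits[-1]
        self.bits = [scan_in & 1] + self.bits[:-1]
        return scan_out


def read_word(wrapper: MonitorWrapper) -> int:
    """Shift out a full word (LSB leaves first) and reassemble it."""
    word = 0
    for i in range(wrapper.width):
        word |= wrapper.shift() << i
    return word
```

In this model a measurement is captured in parallel and then read out serially, which mirrors the capture-and-shift style of scan-based access in general terms.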



Figure 2. Proposed Health-monitoring infrastructure. a) shows the set-up of embedded health monitoring for a single Xentium core [2], b) the communication between health-monitoring sensors and Xentium cores.

## 4. The all-in-one health monitoring design

An on-chip ring oscillator has been proven effective in measuring circuit degradation (NBTI) that causes additional delay [5]. At the same time, the output of the oscillator is affected by environmental conditions (e.g., temperature and supply voltage). An all-in-one health-monitoring sensor is proposed in Fig. 3a. Two ring oscillators, each with 101 NAND-gate stages, and some switches are applied for different tasks. The oscillators are placed close to each other to ensure that they lie in the same isothermal region. The sensor has three operational modes: calibration mode, normal voltage measurement mode and reliability measurement mode. Two pairs of multiplexers (i.e. T\_msm, T\_msm\_L and V\_msm, V\_msm\_L) are used to switch between the modes. First, they can set ROSCr into the calibration mode if Vdd is selected. Finally, one can turn off ROSCr to activate the reliability measurement mode. The outputs of the sensor in Fig. 3b, c and d have been simulated in Cadence Spectre and RelXpert.
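How a ring-oscillator output can be turned into a digital word and a degradation figure is sketched below. The first-order relation f = 1 / (2 N t_d) for an N-stage ring oscillator is standard; the counting window, the function names and the use of a fresh-versus-aged count ratio are assumptions made for illustration and are not taken from the sensor design in Fig. 3a.

```python
def ring_oscillator_frequency(stages: int, stage_delay_s: float) -> float:
    """Ideal first-order ring-oscillator frequency: f = 1 / (2 * N * t_d)."""
    if stages <= 0 or stage_delay_s <= 0:
        raise ValueError("stages and stage delay must be positive")
    return 1.0 / (2 * stages * stage_delay_s)


def count_cycles(frequency_hz: float, window_s: float) -> int:
    """Digital word obtained by counting full cycles in a fixed time window."""
    return int(frequency_hz * window_s)


def relative_delay_shift(fresh_count: int, aged_count: int) -> float:
    """Relative increase in oscillation period derived from two counts.

    The period is inversely proportional to the count, so
    T_aged / T_fresh - 1 = fresh_count / aged_count - 1.
    """
    if fresh_count <= 0 or aged_count <= 0:
        raise ValueError("counts must be positive")
    return fresh_count / aged_count - 1.0


# Example with assumed numbers: 101 stages and a 20 ps stage delay.
f_fresh = ring_oscillator_frequency(101, 20e-12)
```

Because temperature and supply voltage also change the count, a comparison like this is only meaningful when both counts are taken under the same conditions, which is the role of the calibration mode described above.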





Figure 3. a) Proposed structure of the all-in-one sensor for measurement of voltage and delay aging for reliability evaluation. b) and c) abstracted Cadence simulation of measured voltage and temperature versus equivalent integer digital word. d) Cadence RelXpert aging simulation over a 4000-hour lifetime under two different operating conditions.

## 5. Conclusions

An embedded health-monitoring infrastructure for a new highly dependable MP-SoC has been proposed. An all-in-one sensor has been designed that is capable of carrying out voltage and temperature measurements as well as non-invasive reliability tests. The sensor information is sent to the embedded processor via IJTAG. Based on prognostics technology, this guarantees high dependability and 100% availability for a certain lifetime of a very complex system. Future work will focus on the accurate design of the sensor in order to obtain better health-monitoring prognostics of the target Xentium and, finally, on the demonstration of the method on a platform.

Acknowledgements: This research is conducted as part of the Sensor technology Applied in Reconfigurable systems for sustainable Security (STARS) project. For further information: www.starsproject.nl

#### **References**:

- [1] H. G. Kerkhoff and X. Zhang, "Design of an Infrastructural IP Dependability Manager for a Dependable Reconfigurable Many-Core Processor," in *Proceedings of the Fifth IEEE International Symposium on Electronic Design, Test & Applications*, 2010, pp. 270-275.
- [2] H. G. Kerkhoff and Y. Zhao, "The design of dependable flexible multi-sensory System-on-Chips for security applications," in *IEEE 15th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)*, 2012, pp. 133-138.
- [3] N. M. Vichare and M. G. Pecht, "Prognostics and health management of electronics," *Components and Packaging Technologies, IEEE Transactions on*, vol. 29, pp. 222-229, 2006.
- [4] *IEEE P1687 Documentation*. Available: http://grouper.ieee.org/groups/1687/documentation.html
- [5] J. Keane, *et al.*, "On-chip reliability monitors for measuring circuit degradation," *Microelectronics Reliability*, vol. 50, pp. 1039-1053, 2010.