Abstract: Semiconductor manufacturers aim at delivering high-quality new devices within shorter times in order to gain market share. First silicon debug and diagnosis are important issues to be tackled in order to minimise the time-to-market and avoid expensive re-spins, while volume testing is necessary for guaranteeing acceptable quality levels. In this study, the authors propose an infrastructure intellectual property (I-IP) intended to be a companion for embedded processor cores. The proposed I-IP is an efficient, flexible, low-cost and easy-to-adopt solution for managing silicon debug, diagnosis and production test of microprocessor cores and of other cores in a system-on-chip (SoC), offering full support to the three domains of test, diagnosis and debug. A key characteristic of the proposed solution is that the requirements from the three domains are addressed in an integrated manner, and the interface to the device during test, diagnosis and debug is a single one, supporting command-based (instead of bit-based) interaction. A prototypical design has been developed and integrated in an OGG Vorbis decoder SoC including a Leon2 microprocessor core, thus allowing a first practical evaluation of the costs and benefits of the introduced I-IP-based approach. On this sample scenario, the key aspects in the process of testing, diagnosing and debugging a typical SoC are discussed.
Introduction
New test, diagnosis and debug techniques supporting the designers' efforts have to be adopted to face the increasing complexity of whole systems implemented on a single die [systems-on-chip (SoC)] [1]. The main challenges are related to two different phases in the product design and manufacturing flow. The first one concerns the definition of effective debug and diagnosis methodologies for the new devices, in order to quickly ramp up to volume production. The second one involves devising efficient volume testing techniques for discriminating defective parts from the production lots: their effectiveness guarantees product quality, whereas their efficiency greatly impacts production cost.
Careful up-front planning of the test, diagnosis and debug strategies for a new circuit during system integration allows faster production yield improvement and minimisation of final costs. Together with advanced pattern generation abilities, on-chip design-for-testability (DfT) hardware structures are mandatory for current electronic systems, and are often exploited for diagnosis, possibly with some enhancement [2]; focusing on the SoC yield ramp-up problem, on-chip-specific silicon debug infrastructures have been successfully exploited in recent years [3-7]. This paper investigates common requirements and shared characteristics of test, diagnosis and silicon debug strategies for microprocessor-based SoCs; it presents an overview of the hardware and software aspects of these topics, and proposes an infrastructure intellectual property (I-IP) able to manage the execution of test, diagnosis and debug while taking advantage of the cross-fertilisation between the activities.
The I-IP provides a unified interface for supporting test and diagnosis of microprocessor-based SoCs, and for interacting with the execution flow, enabling easier silicon debug. Differently from other approaches in the literature [5, 7], it does not require any modification to the architecture of the cores or to the main SoC structure, which may introduce performance penalties or may even be impossible to implement on hard IP cores; in addition, it has been designed with a precise aim of flexibility and generality, so as to be easily adaptable to different system configurations. Finally, the proposed solution complies with the emerging low-cost testing paradigm [8] by supporting software-based self-test (SBST) and software-based diagnosis (SBD) strategies and by featuring an interface wrapper compliant with the IEEE 1500 Standard for Embedded Core Test [9]. In this way, test, diagnosis and debug can be performed resorting to low-cost testers working at low frequency, while allowing at-speed test and diagnosis when required.
A prototypical design has been developed and integrated in an OGG Vorbis decoder SoC [10] including a Leon2 [11] microprocessor core, thus allowing a practical evaluation about costs and advantages of the approach.
Silicon debug and diagnosis terminology
In order to better understand the objectives and the purposes of our research, the employed terminology has to be initially clarified: as a matter of fact, the terms used concerning test, diagnosis and debug are sometimes confusing and inconsistent, with different professionals often using the same word with different meanings.
In general, the term 'debug' refers to finding errors ('bugs') that are introduced during design before silicon manufacturing. Such errors include logic errors, timing errors or physical design/process errors. A debug strategy should therefore aim at discovering, locating and identifying those errors and providing the designers with the information needed for design and process corrections. Many errors are found during pre-silicon (i.e. before manufacturing) design verification (design debug), relying on static analysis, simulations, design-rule checking and formal verification techniques; some other errors are found post-silicon (after manufacturing), resorting to silicon debug strategies, often based on improved observation and control on the execution of functional procedures on the chip. Diagnosis takes place in the post-silicon phase and aims at finding, locating and identifying physical defects that are possibly modelled by logical faults. The selected fault model determines the diagnosis procedure, which is often composed of a sequence of different test applications. Defects observed post-silicon can be of random nature (spots), or systematic (caused by physical design/process errors).
When the first silicon fails during volume (or end-of-production) testing, the nature of the problem (error or defect) is not known yet: this phase can be referred to as silicon debug and diagnosis [12].
Cross-fertilisation principles
The issues introduced in the previous paragraph concern different manufacturing aspects, but they present many similarities from the point of view of flow application and software procedure development. Let us consider the common points regarding test, diagnosis and debug.
Testing commonly relies on dedicated DfT hardware structures, such as scan chains and built-in self-test modules, or may exploit functional approaches like SBST. Besides the implementation of hardware infrastructures, issues arise regarding the development of suitable test patterns and test application programs. The problem is particularly significant in the case of the SBST methodology [13, 14], which is nevertheless a very attractive solution for current SoCs because of its flexibility, the possibility of applying at-speed tests and its reduced application costs and requirements.
Diagnosis implementation stems from the testing background. Principally, it consists in the development of more powerful and usually longer procedures than those devoted to testing, which provide a larger amount of very selective results, relying on scan-based and/or software-based strategies [15-18]. DfT suitable for testing purposes may also be effective for diagnosis, if adequate ways to monitor SoC responses and to customise test stimuli are available; otherwise, hardware add-ons are required [design-for-diagnosability (DfD)].
A typical flow for silicon debug on a chip which fails system test includes the following steps [19]:
† replicate and detect the failure on the tester,
† isolate and identify the root cause of the failure,
† confirm and fix the failure.
During a debug session, the SoC is run executing a functional operation for a while; afterwards, upon the occurrence of an internal event, the system is stopped or halted, and the contents of some SoC memory elements are read out and/or modified, for example, through scan chains. A debug session stops the execution at different points and can lead to a rapid identification of possible design errors in the actual prototypical silicon version; therefore, mechanisms to stop a procedure, read the system status, possibly modify it and restart circuit operations are mandatory. In principle, procedure development for debug is more complicated than for test and diagnosis; nevertheless, SBST test and diagnosis pattern sets may represent a good starting point towards the identification of design or manufacturing process errors; these sets guarantee to stress [3] [4] [5] [6] [7] : they enable the interaction with the program execution flow, incrementing control and observation abilities, either through structural or functional methodologies [6].
Proposed approach
The objective of this paper is to propose a viable methodology for the implementation of an integrated functional/structural strategy for test, diagnosis and debug while maximising the exploitation of common aspects of the different activities. This goal is pursued through the development of a suitable I-IP core [20] that encompasses a set of features allowing the application of test, diagnosis and debug procedures at different SoC manufacturing/design stages. The I-IP provides a common interface for the application of the different flows, based on the same hardware and protocol. The adoption of a single architecture serving different purposes is aimed at reducing the redesign efforts, minimising area occupation and stimulating the interaction among the different divisions involved in the time-to-market challenges.
The presented I-IP offers full support not only to the execution of SBST and SBD strategies, but also to scan test, and incorporates features for interacting with the execution flow and helping during silicon debug, supplying functional and structural accessibility to the cores [6] through a common interface. Cross-fertilisation among test, diagnosis and debug techniques is achieved by
† resource sharing,
† common automatic test equipment (ATE) interface,
† common system interface.
The proposed methodology is not intended to substitute scan-based test and diagnosis flows, but to complement test and diagnosis with at-speed software-based methodologies and to add functionalities for interacting with the inspected system, facilitating and accelerating the discovery and localisation of errors. 
I-IP integrated abilities
In this sub-section, we address the analysis of the characteristics of a single I-IP supporting software-based test and diagnosis and silicon debug bottom-up: first of all, test requirements are dealt with, since they are the basis for diagnosis and debug, then diagnosis and debug requirements constitute an addition to the test ground.
Test:
Supporting different strategies is a primary need when testing SoCs. The adoption of SBST strategies is not an alternative to the usage of scan chains; rather, it aims at addressing different fault models at lower cost and, possibly, at overcoming scan test weaknesses. When SBST procedures are included in the testing flow, a suitable test program is written to be run by a microprocessor core; its execution results provide information for discriminating working from failing devices. To support SBST application, the I-IP structure should include facilities for
† uploading the test program in a suitable memory area,
† activating its execution, that is, letting the program run, thus stimulating the SoC components,
† observing the microprocessor behaviour and retrieving the results suitably stored during the test program execution.
Test program code and data uploading strategies are based on the reuse of the system bus. Therefore, suitable mechanisms for code/data upload include:
† selection circuitry for directly accessing the memory,
† code and data burst transfers from temporary buffers to the memory by running a preliminarily loaded software procedure.
Test program activation may consist in simply resetting the SoC as soon as the upload operation is terminated, then letting the procedure run until its end. An alternative, more advanced solution assumes that the processor supports the interrupt mechanism [21]; in this case, the basic idea is to transform the test program into an interrupt service routine (ISR). The address of the self-test program must be preliminarily recorded in the interrupt vector table and the interrupt activation sequence must be user controllable. Consequently, the complexity of activating the self-test procedure depends on the interrupt protocol supported by the processor.
Results collection and download rely on the test program's ability to activate possible faults and transfer their effects to registers or memory locations. Hence, additional instructions need to be added to the test programs to further transfer the fault effects to some easily accessible observability point (e.g. an external port, or a specific location in memory). The amount of test data produced can be huge, and mechanisms able to compact the results and send only a final signature to the ATE are often mandatory. A possible solution consists in the use of a multiple input signature register (MISR) integrated on-chip, for computing and storing the test signature, which can finally be downloaded or compared with an expected value to be condensed into a go/no go bit.
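The signature compaction described above can be illustrated with a minimal software model of a MISR. This is only a sketch: the 32-bit width, the feedback polynomial and the function names are assumptions chosen for illustration, not the parameters of the I-IP.

```python
def misr_step(state: int, data: int, taps: int, width: int) -> int:
    """Advance the MISR by one clock: shift, apply feedback, XOR in the data word."""
    mask = (1 << width) - 1
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & mask
    if msb:
        state ^= taps          # feedback polynomial (assumed value)
    return (state ^ data) & mask


def signature(words, taps=0x04C11DB7, width=32, seed=0):
    """Compact a stream of result words into a single signature."""
    state = seed
    for word in words:
        state = misr_step(state, word, taps, width)
    return state


def go_no_go(words, expected):
    """Condense the comparison with the expected signature into one bit."""
    return signature(words) == expected


# Results written by a (hypothetical) test program to the results port.
golden_results = [0xDEADBEEF, 0x12345678, 0x0BADF00D, 0xCAFEBABE]
golden_signature = signature(golden_results)

faulty_results = list(golden_results)
faulty_results[2] ^= 0x00000400   # a single-bit fault effect

print(go_no_go(golden_results, golden_signature))
print(go_no_go(faulty_results, golden_signature))
```

Only the final signature has to cross the slow tester interface, which is the point of the compaction; the price is a small aliasing probability when several erroneous words cancel each other.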
Diagnosis:
The same I-IP features devised to support SBST procedures (program upload and activation, results collection) are exploited to implement a SBD strategy, such as the ones in [17, 22] . Differently from the test procedure, which provides go/no go information, this process may require the execution of a wider set of specific programs and a detailed analysis of results in order to determine and discriminate the faults causing the specific misbehaviour.
The diagnosis programs are generated following the approach proposed by Chen and Dey [22], through the following steps:
† a great number of short test programs are generated in order to partition the fault universe in as many subspaces as possible;
† each program presents a reduced set of instructions to isolate faults related to different processor functional parts;
† multiple copies of the same program are created, each propagating errors on different observable points in order to distinguish the faults affecting the processor outputs;
† at the end of the test set creation a binary tree is built for use in the actual diagnosis process.
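The role of the binary tree in the last step can be sketched with a simplified fault dictionary. The example below is not the algorithm of [22]; it assumes that each fault is summarised by a pass/fail syndrome (one flag per diagnosis program), and all fault names and data are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(syndromes):
    """Group faults that produce identical pass/fail syndromes."""
    classes = defaultdict(list)
    for fault, syndrome in syndromes.items():
        classes[tuple(syndrome)].append(fault)
    return list(classes.values())


def build_tree(faults, syndromes, n_programs, program=0):
    """Binary tree: each level splits the candidate faults on one program outcome."""
    if len(faults) <= 1 or program == n_programs:
        return sorted(faults)                      # leaf: remaining candidates
    passing = [f for f in faults if not syndromes[f][program]]
    failing = [f for f in faults if syndromes[f][program]]
    return {
        "program": program,
        "pass": build_tree(passing, syndromes, n_programs, program + 1),
        "fail": build_tree(failing, syndromes, n_programs, program + 1),
    }


def diagnose(tree, observed):
    """Walk the tree with the outcomes observed on the failing device."""
    while isinstance(tree, dict):
        tree = tree["fail" if observed[tree["program"]] else "pass"]
    return tree


def resolution(syndromes):
    """Fraction of faults that end up alone in their equivalence class."""
    single = sum(1 for c in equivalence_classes(syndromes) if len(c) == 1)
    return single / len(syndromes)


# Hypothetical dictionary: 1 means "program fails in presence of the fault".
syndromes = {
    "f1": (1, 0, 0),
    "f2": (1, 1, 0),
    "f3": (0, 1, 0),
    "f4": (0, 1, 0),
}
tree = build_tree(list(syndromes), syndromes, n_programs=3)
print(diagnose(tree, (1, 1, 0)))
print(resolution(syndromes))
```

In this toy dictionary f3 and f4 share a syndrome and cannot be told apart, which is why adding copies of a program that propagate errors to different observable points improves the resolution.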
The ability to stop the test program at a specific instant and to observe the results produced up to that point, through functional reading of specific registers or through scan chains [18], as described next, is crucial to better support diagnosis.
Debug:
A diagnosis/debug session is based on iterated executions of test programs and/or mission-like procedures, stopped at different points in time to access the contents of state holding elements, in order to identify the phases (time) when errors or unexpected behaviours are generated and understand the causes (and location) of the failures. These activities are supported by modelling and simulation, and usefully exploit the possibility of interacting with the system execution flow by reading and writing specific data. In addition to the features to support SBST/SBD approaches, the I-IP structure provides advanced functionalities for the silicon debug of embedded microprocessors/microcontrollers and the surrounding cores in a SoC.
Once a debug program has been loaded and launched, the following functionalities provided by the I-IP may be exploited:
† To allow the user to activate breakpoint conditions based on the observation of the system bus and/or other critical signals and on internal counters.
† When a given breakpoint condition is met, to enter one of the following debug modes, depending on the solution programmed by the user:
- ISR mode: A special ISR is activated that halts the processor; it allows the ATE to access user registers and memory locations, or other peripherals.
- Clock-gating mode: It stops the system and allows the ATE to access the scan chains.
- Snapshot mode: The flip-flop contents are gathered by downloading the core's scan chains into the I-IP and, optionally, compressing the collected data through a linear feedback shift register (LFSR).
† To single-step the processor execution either acting on the clock (cycle-by-cycle) or relying on trap mechanisms (step-by-step).
† To resume the execution of the suspended program, possibly after having changed the contents of the system state-holding elements.
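A breakpoint condition based on bus observation and an internal counter can be modelled in a few lines. The sketch below follows the behaviour given for the SET_BREAKPOINT x n command in Table 1 (trigger when the address bus holds value x for the nth time); the class, its method names and the sample trace are hypothetical and only serve as an illustration.

```python
class BreakpointUnit:
    """Software model of an address-match breakpoint with a hit counter."""

    def __init__(self):
        self.address = None   # value x to be matched on the address bus
        self.count = 0        # n: number of matches required
        self.hits = 0         # internal counter

    def set_breakpoint(self, address, count):
        self.address, self.count, self.hits = address, count, 0

    def reset_counter(self):
        self.hits = 0

    def observe(self, bus_address):
        """Call once per bus access; returns True when the breakpoint triggers."""
        if self.address is None or bus_address != self.address:
            return False
        self.hits += 1
        return self.hits == self.count


def first_trigger(unit, trace):
    """Index of the bus access at which the breakpoint fires, or None."""
    for index, address in enumerate(trace):
        if unit.observe(address):
            return index
    return None


unit = BreakpointUnit()
unit.set_breakpoint(0x40000010, 3)
trace = [0x40000010, 0x40000020, 0x40000010, 0x40000030, 0x40000010]
print(first_trigger(unit, trace))
```

Counting matches rather than stopping at the first one lets the user break inside a given loop iteration, which matters for the loop-intensive test programs discussed later.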
ISR debug mode: This debug mode requires the prior configuration of the processor interrupt control logic and the upload of a specifically developed debug ISR code into memory. These operations can be performed resorting to the previously described code upload features. When activated, the debug-tailored ISR allows the user to interactively perform read and write operations on the processor registers and on memories and, if possible, to access other cores connected to the processor bus, providing functional accessibility. When running the debug ISR, the processor cyclically polls the content of an I-IP port connected to the system bus for commands. Based on the word read from this port, the ISR executes the proper action (e.g. read or write a specific register/memory location, halt, single-step, continue) and may write back results on another I-IP port. The user interactively sends suitable commands and reads results through the IEEE 1500 interface: Fig. 2 shows the realisation principle of this debug mode and the pseudocode of a sample debug ISR.
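The polling behaviour of the debug ISR can be sketched in software as follows. This is not the pseudocode of Fig. 2: the port model, the tuple-based command encoding and the WRITE_REGISTER, WRITE_MEMORY and CONTINUE command names are assumptions made for illustration (READ_REGISTER and READ_MEMORY appear in Table 1).

```python
from collections import deque


class Ports:
    """Stand-in for the I-IP command and result ports on the system bus."""

    def __init__(self, commands):
        self.commands = deque(commands)
        self.results = []

    def read_command(self):
        # In this model an empty queue is treated as CONTINUE so that the
        # simulation terminates; a real ISR would simply keep polling.
        return self.commands.popleft() if self.commands else ("CONTINUE",)

    def write_result(self, value):
        self.results.append(value)


def debug_isr(ports, registers, memory):
    """Poll the command port and serve requests until CONTINUE is received."""
    while True:
        op, *args = ports.read_command()
        if op == "READ_REGISTER":
            ports.write_result(registers[args[0]])
        elif op == "WRITE_REGISTER":
            registers[args[0]] = args[1]
        elif op == "READ_MEMORY":
            ports.write_result(memory.get(args[0], 0))
        elif op == "WRITE_MEMORY":
            memory[args[0]] = args[1]
        elif op == "CONTINUE":
            return  # leave the ISR and resume the suspended program


registers = {"g1": 42, "g2": 0}
memory = {}
ports = Ports([
    ("READ_REGISTER", "g1"),
    ("WRITE_MEMORY", 0x100, 7),
    ("READ_MEMORY", 0x100),
    ("WRITE_REGISTER", "g2", 5),
    ("CONTINUE",),
])
debug_isr(ports, registers, memory)
print(ports.results)
```

Because the ISR runs on the processor itself, it can only reach state that is visible to software; pipeline registers and similar elements need the scan-based modes described next.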
Clock-gating debug mode: For a more thorough debug technique, however, the I-IP supports structural accessibility through scan-based approaches: in this case, the I-IP has to be interposed between the clock source and the inspected core. An integrated clock gating mechanism allows stopping the microprocessor as soon as the breakpoint hits, so that its internal registers can be accessed through the scan chains (supplying an external clock). The same mechanism may be exploited for single stepping the processor execution (cycle-by-cycle); suitable user commands request the execution of a step or the reactivation of the clock.
Snapshot debug mode: If some reason (e.g. timing closure requirements) prevents the use of clock gating techniques, or the flip-flops do not allow the usage of an alternative clock source, this debug mode allows taking a snapshot of the flip-flop contents. In this case, when the breakpoint logic hits, the clock is left running while the microprocessor flip-flops are switched to scan mode, so as to allow the flip-flop data to flow through the scan chains. Data coming from them are saved (if needed, compressed through an LFSR module) for observation, and are concurrently fed back into the same scan chain so as not to modify the original content. As soon as the whole chain has been downloaded, the scan enable signal is de-asserted, and the system continues running as before the break.
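The snapshot mechanism, in which scan-out data are captured while being fed back into scan-in, can be modelled as one full rotation of the chain. The sketch below assumes a single chain represented as a list of bits and a 16-bit serial signature register with an arbitrarily chosen polynomial; these parameters and the function name are illustrative assumptions.

```python
def snapshot(chain, width=16, taps=0x1021):
    """Rotate a scan chain once; return (captured_bits, signature, chain_after)."""
    chain = list(chain)
    mask = (1 << width) - 1
    captured = []
    sig = 0
    for _ in range(len(chain)):
        out_bit = chain[-1]                 # bit appearing on scan-out
        chain = [out_bit] + chain[:-1]      # shift, feeding scan-out back to scan-in
        captured.append(out_bit)            # raw copy kept for observation
        msb = (sig >> (width - 1)) & 1      # optional LFSR compression
        sig = ((sig << 1) & mask) | out_bit
        if msb:
            sig ^= taps
    return captured, sig, chain


state = [1, 0, 1, 1, 0, 0, 1, 0]
captured, sig, after = snapshot(state)
print(after == state)
```

After as many shift cycles as there are flip-flops in the chain, every bit is back in its original position, so the core can resume from the state it had at the breakpoint.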
I-IP-specific features
The I-IP is primarily intended to add test, diagnosis and debug functionalities when no other integrated debug infrastructure is included on-chip. However, with respect to state-of-the-art embedded processor-specific proprietary debug interfaces (e.g. Motorola/Freescale's BSM and OnCE, Texas Instruments' MPSD, IBM's JTAG debugger, ARM's EmbeddedICE etc.), the proposed I-IP presents the following main features:
† Close integration of test, diagnosis and silicon debug features and flows, from supporting test programs upload in the system memory, test program run and result compaction to circuit behaviour analysis.
† Integrated access to scan chains in order to increase observability also on the memory elements normally not accessible through software (e.g. processor pipeline registers).
† Support for the debug of other cores within the SoCs, not only for embedded microprocessors.
† Flexibility and adaptability to different processor architectures and SoC configurations with minor modifications.
† Compliance with the IEEE 1500 Standard for Embedded Core Test, thus guaranteeing easy integration in current test flows.
With respect to other SoC silicon debug approaches, the proposed methodology also provides integrated abilities for software-based test and diagnosis through a single low-cost interface, and does not introduce any requirement for core or system modifications.
I-IP architecture
The SBST support unit includes the logic for test program upload and activation through the interrupt request. The module takes advantage of the connection to the system bus, through which all the required operations are performed. Memory and register addressing and system bus protocol managing devices are included.
The debug unit includes the logic for programming and activating the breakpoints, for selecting the debug mode and for actuating it accordingly. It includes the logic for implementing the clock gating mechanism and for taking the scan chain snapshot (the latter feature is illustrated in Fig. 4). Moreover, it is the module in charge of interacting with the microprocessor during the debug ISR by communicating on the system bus through suitable ports. The main clock and the scan chain control signals (test scan enable, scan in and scan out) need to be routed inside this unit, thus allowing control over the core execution and scan chain operations.
The results module includes a programmable LFSR/MISR and its control circuitry. Considering processor models adopting memory mapped I/O addressing, the results module is connected to the system bus and accessed like a memory location: signatures to be compressed are sent by using generic transfer instructions. Otherwise, the results module is connected to a port of the processor core and accessed by specific I/O instructions.
The (optional) memory buffer can be seen by the processor core as a set of memory locations in the I/O space and used to store both instructions and data; it may be used to store the entire test program, or a part of it, while waiting to move it into the system memory, or to directly execute it, if the processor allows this.
Communication with the ATE is achieved resorting to a proper IEEE 1500-compliant interface wrapper, which includes, in addition to the registers required by the standard (WIR, WBY and WBR), two registers directly connected to the test command and test data ports of the I-IP. This test interface configuration introduces at least two benefits towards the usage of low-cost ATEs. The first is related to ATE test program creation: communication between ATE and I-IP is not intended as vectorial pattern transmission but relies on high-level commands controlling the test/diagnosis/debug operations and data sending/receiving; this convenience allows abstracting the communication protocol and easily generating the final test/diagnosis/debug program. The second involves frequency management: the IEEE 1500 wrapper structure allows separating the frequency of test/diagnosis/debug procedure execution (fast) from that of command and data transmission (slow). This property allows the use of low-cost ATEs providing limited high-speed channels, while at-speed test strategies take great advantage of the PLL clock generators included in SoC designs [23].
The I-IP grants application flexibility to different processor and SoC architectures without drastic redesign. When switching from a system to another one, the I-IP 
Case study
In order to demonstrate the feasibility of the proposed approach for processor test, diagnosis and silicon debug, an implementation of the I-IP has been applied to the Ogg-on-a-chip, an Ogg Vorbis decoder [10] whose structure is depicted in Fig. 5. This system is able to receive compressed audio data blocks and to process this information so that it is sent out as a complete music stream.
The Ogg Vorbis SoC is mainly composed of four functional modules:
† Aeroflex Gaisler LEON2 [11], a 32-bit processor compliant with the SPARC v8 architecture,
† a 256 kbyte memory (organised in two banks),
† two combinational user defined logic (UDL) cores.
Processor-related issues
There are some issues that have to be addressed once a specific target processor has been selected. In the current case, the I-IP has to be connected with the AMBA bus employed in the system, and suitable addressing and protocol have been implemented. In addition, the ISR needs to be tailored to the specific interrupt protocol. The LEON2 microprocessor integrates an interrupt controller that is used to prioritise and propagate interrupt requests from internal or external devices to the integer unit. In total 15 interrupts are handled, divided in two priority levels. A hardware interrupt causes a trap to occur, transferring the control to a supervisor software addressed by a table, whose address is given by a privileged register (the trap base register).
The processor I/O ports can be used as interrupt inputs from the external devices, corresponding to several interrupt levels. The steps needed to configure the interface to the I-IP are the following:
1. programming the memory configuration register, to enable data transfers on the parallel I/O ports;
2. programming the I/O port interrupt configuration register, to associate one of the parallel I/O ports to the required interrupt;
3. programming the interrupt mask register, to enable the necessary interrupts;
4. updating the trap table, by inserting the first four instructions of each trap handler in the boot sequence.
Implemented commands
Table 1 lists the test, diagnosis and debug commands supported by the I-IP, which can be sent from the outside to the I-IP resorting to the IEEE 1500 interface wrapper through the test command and test data ports. The implemented commands allow performing software-based test and diagnosis operations and grant functional (through the use of an ISR) and structural (through scan chains) accessibility to the SoC state holding elements for debug, as described in Section 2.1.

Experimental results
To evaluate the effectiveness of the approach in terms of hardware cost of the proposed architecture, the RT-level VHDL description of the I-IP and its IEEE 1500-compliant wrapper have been synthesised using the Synopsys Design Compiler tool with a generic gate library. Results are reported in Table 2, showing that the added area overhead is negligible when compared to the size of the whole system. It is worth recalling that the test, diagnosis and debug functionalities were added to the SoC without modifying the microprocessor core.
Concerning the procedures run on the SoC for test and diagnosis of stuck-at (SA) faults in the Leon2 core, Table 3 shows some figures related to the test results obtained on the described environment through the usage of suitable test procedures written by a skilled test engineer in about 1 month of work. For the SA fault model, the SBST/SBD procedures were generated following the manual approach described in [14] for test and the automatic one discussed in [17] for diagnosis. The test-oriented procedure is mainly loop intensive and consists of 18 separate programs, each of them oriented to a particular functional part of the core.
The software-based test runs on a low-cost tester: test data transfers are performed at low frequency (10 MHz) through the IEEE 1500 interface, while for the at-speed program execution a free running 100 MHz clock is employed. Application time in Table 3 includes test programs upload and execution.
The SBD procedure directly descends from the test set and maintains unchanged its coverage ability [17] . The diagnostic resolution achieved in the case study is quite high as about 70% of faults can be singularly diagnosed.
For the sake of comparison, a scan test set has been developed and applied to the circuit. The processor counts 49 510 flip-flops and eight scan chains; the generated scan set includes 2826 patterns and guarantees 99.7% fault coverage. The application time is calculated considering a high-performance tester running at 100 MHz.
The supported software-based test and diagnosis strategies provide interesting results when considering different fault models as well, including delay-dependent ones [18, 24]; in this case, the obtained coverage complements the one reachable by scan-based approaches without introducing overtesting issues because of functionally untestable faults [25].
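As a back-of-the-envelope illustration of how such an application time can be estimated, the sketch below uses the flip-flop, chain and pattern counts quoted above. It assumes perfectly balanced chains, shift-in of one pattern overlapped with shift-out of the previous response, one capture cycle per pattern, and shifting at the tester frequency; these assumptions are not stated in the text, so the result is only indicative and is not the figure reported in Table 3.

```python
import math


def scan_test_cycles(flip_flops, chains, patterns):
    """Clock cycles for a scan test with overlapped shift-in/shift-out."""
    chain_length = math.ceil(flip_flops / chains)   # balanced chains assumed
    shift_cycles = (patterns + 1) * chain_length    # extra shift unloads last response
    capture_cycles = patterns                       # one capture per pattern
    return shift_cycles + capture_cycles


def scan_test_time(flip_flops, chains, patterns, frequency_hz):
    """Application time in seconds at the given shift frequency."""
    return scan_test_cycles(flip_flops, chains, patterns) / frequency_hz


cycles = scan_test_cycles(49510, 8, 2826)
seconds = scan_test_time(49510, 8, 2826, 100e6)
print(cycles, round(seconds, 3))
```

Under these assumptions the scan test needs on the order of tens of millions of cycles, all of which must be driven by the tester, whereas the software-based test only transfers programs and signatures through the slow interface and runs at speed on-chip.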
Concerning debug, test and diagnostic programs have been initially employed; afterwards, additional investigation has been performed by starting the OGG Vorbis operating system and playing a selection of MPEG-4 audio files. The debug sessions were run on sabotaged versions of the OGG Vorbis SoC (including the I-IP) synthesised on a Xilinx
Table 1 Test, diagnosis and debug commands supported by the I-IP
| Command | Description |
| --- | --- |
| SET_BREAKPOINT x n | set breakpoint parameters x and n; the I-IP will trigger a breakpoint as soon as the address bus holds value x for the nth time |
| RESET_BREAKPNT_COUNTER | resets the breakpoint counter |
| RESET | restart the system (acts on reset signal) and the counter |
| PROGRAM_CK_INT, PROGRAM_AD_INT | these instructions set the counter to be activated at each rising clock edge or when a matching address appears on the system bus, respectively, and program the debug unit to generate an interrupt for activating the ISR debug mode at breakpoint |
| PROGRAM_CK_CKG, PROGRAM_AD_CKG | same as the previous instructions, but activate clock-gating debug mode at breakpoint (stop the clock at the specified cycle and enable scan chain download) |
| PROGRAM_CK_SS, PROGRAM_AD_SS | same as the previous instructions, but activate snapshot debug mode at breakpoint |
| RD_STATUS | read internal status of the I-IP monitoring the microprocessor core (whether it is executing a normal program, a test program, the debug ISR, or it is in halt mode) |
| READ_REGISTER x, READ_MEMORY x | selects the specific microprocessor internal register/memory location to be read |
When faults cannot be determined through predetermined diagnostic procedures, the debug activity is based on the execution of a functional procedure. As soon as an unexpected event arises, the execution is stopped. Then, a new execution is launched in the same conditions, and a breakpoint is set at some point in time preceding the evidence of the misbehaviour. Then, the ISR debug mode is started and the contents of the processor registers are read; single stepping is used to investigate the circuit behaviour in the following clock cycles, and scan-chain content download can provide more information when needed. Following the guidelines given in [26], through a reasonable number of iterations of the described process, it has been possible to find the critical clock cycle (i.e. to identify when a fault occurs) and to precisely locate the cause of the failures, by following and comparing the firmware execution on an instruction set architecture simulator and by checking the algorithms computation.
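The iterative search for the critical clock cycle can be sketched as a comparison between the state read from silicon at a breakpoint and the state predicted by the instruction set simulator. The text does not state which search strategy was used; the binary search below is an assumption, valid only if the run is repeatable and a corrupted state never returns to the expected one. The two state functions are hypothetical stand-ins for "re-run with a breakpoint at cycle c and read the state" and "query the simulator at cycle c".

```python
def first_divergence(silicon_state_at, golden_state_at, last_cycle):
    """Earliest cycle at which the silicon state differs from the golden model.

    Assumes the states match at cycle 0, differ at last_cycle, and stay
    different once they have diverged.
    """
    lo, hi = 0, last_cycle          # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if silicon_state_at(mid) == golden_state_at(mid):
            lo = mid
        else:
            hi = mid
    return hi


# Toy models: the "silicon" deviates from the simulator from cycle 37 onwards.
def golden(cycle):
    return cycle * 3


def silicon(cycle):
    return cycle * 3 if cycle < 37 else cycle * 3 + 1


print(first_divergence(silicon, golden, 100))
```

Each probe corresponds to one re-execution with a new breakpoint setting, so the number of debug iterations grows only logarithmically with the length of the failing run; single stepping can then be used around the reported cycle.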
Conclusions
It is widely acknowledged that the issues of test, diagnosis and silicon debug of SoCs require analysis and solutions from the early chip design phases. Although the addressed problems are related to different stages of the chip manufacturing flow, they have common requirements that can be supported by a single infrastructure, thus reducing the redesign efforts, minimising area occupation and unifying the protocol for test/diagnosis/debug procedure development and application.
The I-IP architecture presented in this paper provides effective support for test, diagnosis and debug of microprocessor-based SoCs, leveraging the cross-fertilisation among the introduced resources and functionalities. The proposed approach guarantees high flexibility in the applicable flows and easy adaptability to different processor configurations, and is particularly suitable if the SoC includes a microprocessor core that cannot be modified by the designer.
The I-IP supports SBST and SBD strategies, together with different debug methods, including those relying on scan-based read out. We demonstrated the effectiveness of software-based procedures for test and diagnosis in complementing scan-based strategies on a sample case study. Since the I-IP is IEEE 1500-compliant and can be programmed through high-level commands, it is particularly suited to be interfaced with low-cost external test equipment. Preliminary experimental results show the feasibility of the approach at the cost of limited area overhead.
References
