The advent of the core-based system-on-a-chip (SOC) and reuse methodologies enables integration of cores from different sources into a single chip. Compared with the traditional multichip system on a board, SOCs offer benefits including higher performance, lower power consumption, smaller size, and so on. Different types of cores are usually incorporated into a single SOC design. These cores can include CPUs, digital signal processors, synchronous RAM, flash memory, analog-to-digital converters, digital-to-analog converters, and phase-locked loops.
In addition to these types, some cores are hierarchical compositions, that is, one complex core composed of multiple simple cores. 1 Although the SOC design process is analogous to that for boards, a SOC's manufacturing test methods are quite different. On a board, the IC providers have already tested the ICs, so board integrators normally only need to test the board interconnects for manufacturing defects. In SOC design, however, the cores are not yet manufactured and tested. The core integrator is responsible for manufacturing and testing the chip, including the cores. SOC testing includes core internal and external tests, core test knowledge transfer, test access, test integration and optimization, and so on. 1 Testing SOCs is therefore more challenging than testing boards.
Recently, researchers have reported results in testing SOCs using architectures compliant with IEEE 1149.1, also known as the Joint Test Action Group (JTAG) standard. One work proposes a systematic solution for accessing embedded JTAG cores hierarchically, using a test access port (TAP) linking module to handle the interaction between the upstream and downstream TAP controllers. 2 Another work reported a hierarchical TAP architecture. 3 This architecture supports test access to embedded JTAG cores with a snoopy TAP. The pin requirement and behavior of this design's TAP controller are fully compatible with IEEE 1149.1. However, these methods do not handle the test control of IEEE P1500 cores, an important capability because P1500 is better suited to core-based SOC testing and is increasingly popular.
Another research project uses a central TAP controller consisting of an 1149.1-like TAP finite-state machine and a counter to control P1500 and other cores with a TAP. 4 Another test control mechanism provides a hierarchical test capability for IEEE 1149.1 and P1500 cores, but it requires 10 additional pins to operate the major component (the central test controller) and tests only one core at a time. 5 In all these works, researchers did not consider controlling memory cores with built-in self-test (BIST).
Our approach is a hierarchical test methodology for testing a SOC with heterogeneous cores, including the 1149.1-wrapped, P1500-wrapped, and BIST memory cores. We propose an 1149.1-based hierarchical test manager that also provides P1500 test control signals. This scheme includes a memory BIST interface, providing both serial and parallel access ports for BIST circuits. Our approach offers low area and pin overhead, and high flexibility.
IEEE P1500 scalable architecture
To solve the problems mentioned previously and for easy test automation, we require a standard test interface for the cores. Figure  1a shows a generic, scalable architecture for SOC test. The IEEE P1500 working group proposed this architecture. The proposed IEEE P1500 standard aims to standardize the core test wrapper and core test language. The scalable architecture consists of
• the user-defined, parallel test access mechanism (TAM) for delivering the test patterns and responses in parallel,
• standard core test wrappers that can isolate the cores and provide different test modes, and
• a user-defined test controller for controlling the wrapper and TAM. 6

Sources, which can be off- or on-chip, generate test patterns. Sinks, which can also be off- or on-chip, evaluate test responses. Serial test access is always possible using the serial interface layer (SIL) provided by the standardized P1500 wrapper, a mandatory part of any P1500-compliant implementation.
Although the wrapper is mandatory, the TAM and test controller are user defined. Adopting a common test integration and optimization procedure is unnecessary and usually impossible since test requirements and goals of core providers, SOC integrators, and chip fabricators are different. Figure 1b depicts the IEEE P1500 core test wrapper architecture. 6 It contains the following elements:
• Wrapper instruction register (WIR).
• Wrapper boundary register (WBR). Consisting of the wrapper boundary cells (WBCs) that wrap the cores' normal I/O pins, this register adds control, observation, and isolation capabilities to the core's normal functions.
• TAM. The external TAM couples to the core via the parallel interface layer.
The wrapper has four major operational modes: 6
• normal, in which the wrapper is transparent and the core operates normally;
• inward facing, in which the test access is for the core itself;
• outward facing, in which the test access is for the external circuitry; and
• safe, in which the WBCs force the core inputs to a fixed pattern.
The first three modes are mandatory in P1500-compliant implementations, and the last one is recommended.
At the chip level, we must optimize the TAM and schedule the core tests. Test wrapper and TAM co-optimization is important for the SOC integrator, because it has a direct impact on the area overhead and on the vector memory depth of the automatic test equipment. Integrators must design the TAM to accommodate the routing constraints among the cores and the system-level power constraints. 7 Integrators must schedule the core tests to minimize the total SOC test time, subject to power and area constraints for testing. 8 Here, we explored parallelism at both the chip and core levels to reduce test time.
Test wrapper generation
The P1500 test wrapper provides an interface between the core and its surrounding test circuit on the chip. Manually wrapping the core not only requires full knowledge about the wrapper, but it also is time consuming and prone to error. Therefore, a wrapper generator is necessary.
Wrapper cell library
To support the wrapper's operational modes, we define five events for the WBC: shift, capture, apply, update, and transfer. P1500 mandates the inclusion of the first three events; the last two are optional. Figure  2 depicts three types of single flip-flop wrappers for core input terminals. In the figure
• CFI denotes the cell functional input (from a chip input, another core, or user-defined logic);
• CFO, the cell functional output (connected to the core);
• CTI, the cell test input; and
• CTO, the cell test output.
The control signals include shift control (SC), capture control (CC), apply or capture control (AC), and safe value (SV). The WIR decoder and WIP provide these control signals; they depend on the instruction stored in the WIR and the WIP input signals. The SC, AC, and CC signals determine whether the wrapper cell is under the shift, apply, or capture event. The SV signal forces the CFO's connection to a prespecified safe value.

Figure 2a shows a minimal wrapper cell that does not support safe mode. 9 Assume that the upper input is connected to the output of each multiplexer if the multiplexer control signal is logic 1. When the test controller shifts in the test data with SC = 1, the data will load into the flip-flop via CTI. The shift event occurs by asserting ShiftWR. When ShiftWR, UpdateWR, and CaptureWR are 0, SC and AC will both be 0 to hold the test data. In the capture event, the flip-flop captures data when CaptureWR transitions to 1, and both AC and SC are 0.

The circuit in Figure 2b supports safe mode, but is otherwise the same as Figure 2a. During test data transformation, SV can select 0 to tie CFO to the safe value. But any signal passing from CFI to CFO incurs a delay crossing each multiplexer. Figure 2c therefore shows a wrapper cell with a lower delay penalty but higher area overhead than the circuit in Figure 2b.

SEPTEMBER-OCTOBER 2002
Wrapper cells for core output terminals are the same as those for the inputs, but the control signals differ. This is because in some instructions, input and output wrapper cells stay in different modes. For example, in the WCORETEST instruction, the capture event only exists in the output wrapper cell, while the apply event only exists in the input wrapper cell. Tristate and bidirectional wrapper cells work similarly, but require an additional enable wrapper cell. We have built a wrapper cell library in a 0.25-micron CMOS technology, implementing 16 types of wrapper cells for input, output, tristate, and bidirectional terminals.
Wrapper instructions
The WIR stores an instruction and provides the interface to other P1500 components (for example, the data registers) and the core. Setting SelectWIR high selects WIR to connect WSI and WSO, and readies WIR for instruction loading. Setting SelectWIR and ShiftWR high shifts in instructions during the clock's rising edge. The instruction will not change if ShiftWR stays low. Setting UpdateWR high loads the instruction into WIR's update stage during the clock's falling edge. WIR will then decode the instruction and send the control signals to WBR and the core data registers.
WIR also supports the capture event. Setting both SelectWIR and CaptureWR high captures a user-specified vector into WIR's shift path.
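The WIR behavior described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class and method names are hypothetical, the register width and bit ordering are arbitrary, and gating the update event with SelectWIR is an assumption that the text does not state explicitly.

```python
class WrapperInstructionRegister:
    """Behavioral sketch of a WIR with separate shift and update stages.

    Names, width, and bit order are illustrative assumptions, not taken
    from the P1500 proposal.
    """

    def __init__(self, width, capture_vector):
        self.shift_stage = [0] * width              # serial path from WSI to WSO
        self.update_stage = [0] * width             # instruction seen by the decoder
        self.capture_vector = list(capture_vector)  # user-specified vector

    def rising_edge(self, select_wir, shift_wr, capture_wr, wsi=0):
        """Shift or capture on the rising clock edge; returns the WSO bit."""
        wso = self.shift_stage[-1]
        if select_wir and shift_wr:
            # Shift one bit in from WSI; the oldest bit leaves toward WSO.
            self.shift_stage = [wsi] + self.shift_stage[:-1]
        elif select_wir and capture_wr:
            # Capture the user-specified vector into the shift path.
            self.shift_stage = list(self.capture_vector)
        # With ShiftWR low and no capture, the instruction does not change.
        return wso

    def falling_edge(self, select_wir, update_wr):
        """Load the update stage on the falling clock edge.

        Assumption: the update is gated by SelectWIR.
        """
        if select_wir and update_wr:
            self.update_stage = list(self.shift_stage)


wir = WrapperInstructionRegister(width=3, capture_vector=[1, 0, 1])
for bit in (1, 1, 0):
    wir.rising_edge(select_wir=1, shift_wr=1, capture_wr=0, wsi=bit)
wir.falling_edge(select_wir=1, update_wr=1)
print(wir.update_stage)
```

Keeping the two stages separate means that shifting a new instruction does not disturb the active one until the update event occurs.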
The proposed P1500 standard provides the following instructions and corresponding wrapper configurations:
• WBYPASS (mandatory). This instruction selects the bypass register to connect WSI and WSO, and makes the WBR transparent to support normal functional access.
• WEXTEST (mandatory). To provide external controllability and observability of the core, this instruction selects the WBR to connect WSI and WSO.
• WCORETEST (optional). This instruction enables external access to WBR through the SIL or TAM, supporting user-specified tests (for example, scan-based tests).
• WSCORETEST (optional). To support core testing, this instruction configures WBR and internal scan chains into a single scan chain between WSI and WSO.
• WPRELOAD (optional). This instruction connects WSI and WSO through WBR to load the desired test pattern into WBR.
• WSAFESTATE (optional). In this configuration, WBY forms the path between WSI and WSO, forcing a particular core into a safe state such that tests for other cores do not damage or interfere with it.
• WCLAMP (optional). This configuration functions similarly to WSAFESTATE, except that it loads the safe value from WBR.

The proposed standard also permits users to define more instructions.
Hierarchical test scheme
A hierarchical test scheme lets test engineers use test results obtained at a lower level at higher levels with test protocol expansion. That is, they can reuse a lower-level core test at a higher level. The test architecture must be reusable; that is, engineers develop the test hardware only once and target it for repeated use. Also, designers must define a single test interface and control signals that work for all levels; this makes plug-and-play tool integration possible. Figure 3 shows an example SOC implementation of the top-level architecture of our proposed hierarchical test scheme. The basic SOC includes 1149.1- and P1500-compliant cores, and a BIST memory core. It also incorporates a hierarchical-test core containing two P1500 cores.

[...]tion registers before performing each step. During test configuration, loading the instructions for the wrappers and MBIs configures each core into a specific mode for test or for bypass.
Next, the user specifies the cores to be tested by the TAM by shifting a binary sequence into the selection register of each HTM. The most significant bit of the selection register in a lower-level HTM specifies whether or not the lower-level TAM connects to the TAM at the current level. Other bits specify the connection between the cores and TAM at the same level. For example, there is a three-bit selection register (bit 2, bit 1, bit 0) in HTM 2 of Figure 3. Bit 2 specifies whether TAM 2 connects to TAM 1, and bit 1 and bit 0 specify whether cores 4 and 5 connect to TAM 2. HTM, wrapper, or MBI instructions can also specify TAM selection. In this case, the designer can remove the selection register and omit this test step.
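As an illustration of the HTM 2 example, the following sketch decodes a three-bit selection value. The function name is hypothetical, and two details are assumptions rather than statements from the text: that a bit value of 1 means "connected", and that bit 1 maps to core 4 while bit 0 maps to core 5.

```python
def decode_htm2_selection(value):
    """Decode the 3-bit selection register (bit 2, bit 1, bit 0) of HTM 2.

    Assumptions: 1 means connected; bit 1 selects core 4, bit 0 selects core 5.
    """
    if not 0 <= value <= 0b111:
        raise ValueError("selection register is three bits wide")
    return {
        "TAM 2 connects to TAM 1": bool(value & 0b100),   # bit 2 (MSB)
        "core 4 connects to TAM 2": bool(value & 0b010),  # bit 1
        "core 5 connects to TAM 2": bool(value & 0b001),  # bit 0
    }


# Example: link TAM 2 to TAM 1 and test only core 4 through it.
for link, connected in decode_htm2_selection(0b110).items():
    print(link, connected)
```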
For test transportation, the TAM and serial I/Os import the test patterns to the cores under test and export their test results according to the test configuration.
Hierarchical test manager
The HTM architecture in Figure 4 consists of a test manager and a hierarchical test interface. The test manager extends from the TAP controller and consists of a finite state machine (FSM); instruction, bypass, selection, and boundary registers; and a wrapper control encoder (WCE). The main difference between the TAP controller and the test manager is the WCE. With the TAP controller, the WCE generates the ECS signals (ECS0, ECS1, and ECS2) according to the instruction and the FSM state. This approach greatly reduces the number of control signals for a SOC with many P1500 cores. The hierarchical test interface consists of a switch box, which specifies the connection of the serial test access I/Os. The switch box allots access based on control signals from the test manager.

The P1500 BYPASS, EXTEST, and SAMPLE/PRELOAD instructions are the same as those for 1149.1. Using these instructions, we can interpret the states in group 1 of the FSM as (state 0, state 1, …, state 6) = (SelectDR, CaptureDR, ShiftDR, Exit1DR, PauseDR, Exit2DR, UpdateDR). That is, the HTM's function is the same as that of the TAP controller.
The LSELECTWIR, LTAMSELECT, and LSELECTWR instructions are the local P1500 instructions. Under these instructions, only the cores at the same level with the HTM form the WBR; configuring the switch box forms the WBR. The LSELECTWIR instruction forces the HTM into the test configuration phase, where (state 0, state 1, …, state 6) = (ShiftWIR, CaptureWIR, ShiftWIR, ShiftWIR, UpdateWIR, CaptureWIR, UpdateWIR), and selects the bypass register. The WCE encodes the ECS signals according to these HTM states.
The wrappers and MBIs shift and update the instructions into the instruction registers by controlling the TMS_UP. The LTAMSELECT instruction enables the selection register such that a binary sequence can be shifted into the register to specify the cores connected to the TAM. The LSELECTWR instruction is for test transportation, under which (state 0, state 1, …, state 6) = (ShiftWR, CaptureWR, ShiftWR, ShiftWR, UpdateWR, CaptureWR, UpdateWR); it also selects the bypass register. The WCE encodes the ECS signals according to these states, and the WCI decodes the ECS signals into the WIP signals. Controlling TMS_UP can then transfer the test patterns to the cores under test.

Table 1 shows the WCE outputs for the respective instructions and FSM states. ECS0 depends on the instructions, while the FSM states determine ECS1 and ECS2, such that we can specify the operations by TMS_UP. If it is an 1149.1 mandatory instruction, then (ECS0, ECS1, ECS2) = (0, 0, 0). P1500 instructions define only three different states in group 1: ShiftWR, [...]

Table 2 lists all possible switch box configurations. For example, if the BYPASS, EXTEST, SAMPLE/PRELOAD, or LTAMSELECT instruction is loaded, then the TDO connects to TDO_UP, both TDI_C and TDI_H are 0, and both TDO_C and TDO_H are disconnected (represented by an "X" or don't-care signal). The configurations for LTAMSELECT and GTAMSELECT are different from those of other P1500 instructions because only the HTM selection registers need configuring.
Having the FSM in a group 2 state forces the switch box to the same configuration as GTAMSELECT. All HTMs are serially connected into a chain so that we can load the desired instructions into the HTMs by controlling the TMS_UP pin.
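The three interpretations of the group 1 FSM states given in this section can be collected into one lookup table. The sketch below simply transcribes the state tuples from the text; the dictionary layout and the function name are hypothetical, and instructions not listed here (for example LTAMSELECT and GTAMSELECT) are deliberately left out because the text does not give their state mapping.

```python
# Group 1 state names (state 0 .. state 6) as listed in the text.
DR_STATES = ("SelectDR", "CaptureDR", "ShiftDR", "Exit1DR",
             "PauseDR", "Exit2DR", "UpdateDR")
WIR_STATES = ("ShiftWIR", "CaptureWIR", "ShiftWIR", "ShiftWIR",
              "UpdateWIR", "CaptureWIR", "UpdateWIR")
WR_STATES = ("ShiftWR", "CaptureWR", "ShiftWR", "ShiftWR",
             "UpdateWR", "CaptureWR", "UpdateWR")

GROUP1_INTERPRETATION = {
    "BYPASS": DR_STATES,
    "EXTEST": DR_STATES,
    "SAMPLE/PRELOAD": DR_STATES,
    "LSELECTWIR": WIR_STATES,   # test configuration phase
    "LSELECTWR": WR_STATES,     # test transportation phase
}


def interpret_group1_state(instruction, state_index):
    """Return the meaning of a group 1 FSM state under an HTM instruction."""
    return GROUP1_INTERPRETATION[instruction][state_index]


print(interpret_group1_state("LSELECTWR", 4))
print(sorted(set(WR_STATES)))
```

Under the local P1500 instructions, the seven FSM states collapse onto only three distinct wrapper events, which is what allows a small number of encoded control signals to drive the wrappers.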
Wrapper control interface
The wrapper control interface (WCI) decodes the ECS signals into wrapper signals SelectWIR, ShiftWR, UpdateWR, and CaptureWR. Figure 6 shows the WCI architecture, composed of a 1-bit register and decoder. All the WCI registers at the same level connect serially into the selection register. Each bit of the register specifies whether the corresponding core connects to a TAM or not. The ECS0 signal directly connects to SelectWIR. The ShiftWR, UpdateWR, CaptureWR, and MuxSelect signals are 1 when (ECS1, ECS2) = (0, 0), (1, 0), (0, 1), and (1, 1), respectively. This reduces the routing overhead because only the ECS signals go to the cores from the HTM. If MuxSelect = 1, then the register takes the input from Pre_out (previous register output). If the register value is 1, then TAMSel = 1, and the core connects to the TAM.
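The decoding just described can be sketched as a pair of small functions. This is a minimal sketch, assuming that the four (ECS1, ECS2) combinations map one-to-one, in the order listed, onto ShiftWR, UpdateWR, CaptureWR, and MuxSelect; the function names are hypothetical, and the hold behavior of the 1-bit register when MuxSelect = 0 is an assumption.

```python
def wci_decode(ecs0, ecs1, ecs2):
    """Decode the three ECS signals into wrapper control signals.

    Assumption: (ECS1, ECS2) selects exactly one of the four outputs.
    """
    code = (ecs1, ecs2)
    return {
        "SelectWIR": ecs0,                  # ECS0 connects directly
        "ShiftWR": int(code == (0, 0)),
        "UpdateWR": int(code == (1, 0)),
        "CaptureWR": int(code == (0, 1)),
        "MuxSelect": int(code == (1, 1)),
    }


def wci_register_next(current, pre_out, mux_select):
    """Next value of the 1-bit WCI register (this value drives TAMSel).

    Assumption: the register holds its value when MuxSelect is 0.
    """
    return pre_out if mux_select else current


signals = wci_decode(ecs0=1, ecs1=1, ecs2=0)
print(signals)
tam_sel = wci_register_next(current=0, pre_out=1, mux_select=1)
print(tam_sel)
```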
Memory BIST interface
Testing for embedded memories generally uses BIST, which is becoming the default test solution in this area. However, a SOC can contain tens or even hundreds of memory cores (including buffers and register files). Although most chips usually have only a few (about eight) BIST controlling pins, the total number of test pins would be huge if each memory core had its own BIST interface. This makes it necessary to share the BIST interface (and even the BIST circuits). Figure 7 shows our proposed memory BIST interface (MBI), which consists of the instruction, bypass, monitor, and status registers, and the programmable switch.

[...] output (MSO) transfer test data to and from the registers. The MBI interface port (MIP) contains the same control signals as the P1500 WIP. Although the P1500 wrapper can wrap embedded memories, the MBI does not wrap the functional I/Os because the BIST circuit already isolates them, removing the WBR cost. The programmable switch determines whether the TAM or the MBI handles the BIST I/Os. In the latter case, the monitor and status registers observe the BIST outputs. The monitor register monitors the error flag (indicating whether a memory fault is detected or not) or exports the diagnostic data (consisting of the faulty cell/word address, March syndrome, and Hamming syndrome 10) on the fly. The status register records key status values, such as the FAIL (go/no-go) output from the BIST circuit.
Our hierarchical test architecture uses the BYPASS instruction as the MBI default instruction, as does P1500. The RUN_BIST instruction runs the BIST circuit in test mode, with the monitor register connected between the MSI and MSO. The RUN_DIAGN instruction forces the BIST circuit to operate in diagnosis mode, in which the monitor register exports the diagnosis data. Normally, only one memory core can be in the diagnosis mode at any time. However, if the TAM handles BIST operations, then the TAM's width determines the number of memory cores that BIST can concurrently diagnose. The EXPORT_STATUS instruction exports the status register's content, and the TAM_CONTROL instruction configures the programmable switch to connect the BIST I/Os to the TAM.
Under the RUN_BIST instruction, we can test multiple memory cores concurrently. Figure 8 depicts the monitor registers' configuration. When one or more BIST circuits detect faults, the primary MSO (MSO_N) will be high after N − K clock cycles if the concurrent outputs of memory cores K + 1 through N are fault free. The EXPORT_STATUS instruction then exports the status register's values, and the faulty memory cores can be identified.
Hardware and test reuse
One major feature of our proposed test methodology is reusability. If a core design uses this methodology, integrators can incorporate the core into a larger core that implements the same test methodology. Thus the integrator does not need to modify the test hardware. Assume that core 1 incorporates the hierarchical test scheme. If users want to design another core (core 2) with the same test scheme, then they can integrate core 1 into core 2 without modifying its test hardware. In this case, only the upstream HTM I/Os of core 1 connect to the downstream HTM I/Os of core 2. This strategy therefore reuses both the test hardware and the tests.
As discussed previously, the test procedure has three steps: test configuration, TAM specification, and test transportation. [...]sion. We assume the two-level hierarchy shown in Figure 3 to simplify the explanation. Initially, let the FSMs of the HTMs be in the test-logic reset state in Figure 5.
Experimental results
We have implemented the proposed hierarchical test scheme in an industrial design containing 11 components: a phase-locked loop, chipset, CPU, two universal asynchronous receiver transmitters (UARTs), and six RAMs. We tested the phase-locked loop through the chip's primary I/Os. Three cores contain the other components: Core 1 is the CPU with four RAMs, core 2 is the chipset with two RAMs, and core 3 includes the two UARTs. Cores 1 and 2 are simple cores with P1500 wrappers, having BIST circuits for their embedded memories. Core 3 is a hierarchical core, containing two smaller P1500 cores. We used internal scan to test the logic circuits. Cores 1, 2, and 3 have 16, eight, and two scan chains, respectively. An 8-bit daisy chain TAM also transports the test data. Figure 10 shows a photo of the chip implemented using a 0.25-micron CMOS technology. We tested 90 samples using the proposed test methodology, with a yield of 92.22 percent.
The hardware overhead of the hierarchical test scheme is (in terms of gate count) [...]

We define the hardware overhead as

HO = (wrapped core area − core area) / core area.
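The hardware overhead definition can be checked with a short calculation. The function name is hypothetical and the gate counts in the example are made-up illustration values, not measurements from this design.

```python
def hardware_overhead(wrapped_core_area, core_area):
    """HO = (wrapped core area - core area) / core area."""
    if core_area <= 0:
        raise ValueError("core area must be positive")
    return (wrapped_core_area - core_area) / core_area


# Hypothetical gate counts, for illustration only.
ho = hardware_overhead(wrapped_core_area=10_500, core_area=10_000)
print(f"HO = {ho:.1%}")
```

Because HO is a ratio, it can be computed from gate counts or from silicon area, as long as both quantities use the same unit.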
The clocks, reset signals, and so on are normally not wrapped. We consider only the wrapped I/Os. Table 3 shows the hardware overhead values for the core test wrappers. Table 4 shows statistics for the internal scan chains. The scan length denotes the length of the longest scan chain in each core. To fit the number of the scan chains to the TAM width, we configured the 16 scan chains into eight scan chains. Core 1 has the lowest fault coverage and the most test patterns. In contrast, core 2 has the highest fault coverage and fewer test patterns. The fault coverage numbers are low because the designs have many unscanned latches. Table 4's last column reports the corresponding power consumption for individually tested cores (all cores except the core under test are in safe mode). Safe mode can reduce the power consumption and avoid damage to the untested cores during the pattern shifting process.
We now analyze each core's testing time. Here, we do not account for the time to shift the instructions to instruction registers because it's short compared with the test time. [...]

In the future, we will investigate the SOC diagnosis methodologies based on our hierarchical test scheme. We will also study the development and integration of infrastructure cores for SOC manufacturability, especially along the directions of design debugging and yield enhancement.
MICRO
Jeng-Bin Chen is an associate researcher at the Chip Implementation Center. His research interests include design for testability and SOC testing. Chen has a BS in electronic engineering from Chung Yuan Christian University, Chungli, Taiwan, and an MSEE from National Tsing Hua University.
Chih-Pin Su is a PhD student at National Tsing Hua University. His research interests include design and test of high-performance VLSI circuits and SOCs. Su has a BSEE and an MSEE from National Tsing Hua University. [...]

Chuang Cheng is an ASIC consultant at Faraday Technology, Taiwan. His research interests include design for testability, especially memory and SOC testing. Cheng has a BS and MS in electrical engineering from National Tsing Hua University.
Shao-I Chen was a DFT engineer at Faraday Technology. She is now a graduate student studying the management of intellectual property law at Queen Mary College, University of London. Her research interests include design for testability and design automation. Chen has a BSEE from Yuan Ze University, Taiwan.
Chi-Yi Hwang is vice president of ASIC technology at Faraday Technology. His research interests include design automation, especially physical design. Hwang has a BS in computer science from Tatung Institute of Technology, Taipei, and an MS and PhD in computer engineering from National Tsing Hua University.
Hsiao-Ping Lin
