2FEUP / DEEC, Rua dos Bragas, 4000 Porto, PORTUGAL
test, and availability during field operation debugging [3,4,5]. During functional debug, the golden vectors extracted from simulation are compared against the values captured on the prototype, thus covering the third class of errors. This process is carried out at a reduced clock frequency and generally involves a reduced number of vectors. The timing debug phase is done with the prototype working at its normal operating speed. Errors not detected during the structural test (due to the fault models used) or functional debug (due to the reduced clock frequency) have to be detected and diagnosed during this last verification step.
INTRODUCTION
Increasing complexity and quality demands on electronic products, combined with shortening time-to-market, are creating a bottleneck in the prototype validation phase. Any sound validation strategy must favour overlapping the several verification steps with the design flow and the first prototype releases, to prevent delays and shorten the design cycle. Prototype validation usually comprises the following steps: simulation, structural test, functional debug and timing debug. These typically address three classes of errors:
Human errors in the specification or design.
Technological, implementation or manufacturing errors. These include defective components, soldering problems, broken or short lines, etc.
Errors related to the tools. These include errors associated with the synthesis, model generation, simulation or layout (at the IC or PCB level) tools.
Simulation provides the first and best platform for detecting and debugging human errors in the prototype specification or design, although this process is itself prone to human and tool errors. The values obtained during simulation provide a database of golden vectors that can later be used for the prototype functional and timing debug phases [1]. ATPG and fault simulation are done to create the test program and a fault dictionary able to assist in the diagnosis of faults detected during structural test. Static timing analysis helps to determine the maximum clock frequency by revealing the longest paths within the design. Pin-to-pin and other delay types are also calculated during this process.
PROTOTYPE DEBUG AND TEST REQUIREMENTS
The initial phase of our approach included the identification of the prototype debug and test requirements and the conversion of these requirements into operations implemented by mandatory and / or optional BST instructions, and / or by instructions executed by the built-in controller. Requirements analysis covered characteristics of simulation and debug tools, current debug and test techniques, and debug and test mechanisms accessible through BST [1, 3, 7, 8, 9]. The analysis process led to a "simplified" debug and test model with five operation types:
Single
Step ( 
Next, a set of criteria was defined so as to allow an exhaustive decomposition of each operation type into a list of individual operations. The following list presents the criteria considered for each operation type. Individual operations were obtained by examining each combination of criteria in detail. The last phase consisted of analysing the individual operations included in each operation type and converting these into specifications of instructions implemented by the built-in controller or the BST infrastructure. A description of this rather extensive process is beyond the scope of this paper and is omitted for the sake of clarity.
Due to the requirement of reusing the board-level BIST processor it was decided to design the built-in controller as a dual-processor architecture. One of the processors is responsible for the control of the test logic (the board-level BIST processor), while the other is responsible for the control of the system functional logic, and the synchronisation between the functional and the test logic.
A DUAL-PROCESSOR BUILT-IN CONTROLLER FOR DEBUG AND TEST
Control of test logic
Selects the BST chain to be controlled by the following instructions.
Forces an asynchronous reset through the /TRST output.
Forces a state transition in the internal BST logic of each IC.
N bits will be shifted into the selected chain. Bits shifted out of the chain are not compared.
N bits will be shifted into the selected chain. Bits shifted out of the chain
NSHFCP
NCSHF and NCSHFCP are used to observe and verify the contents of scan chains without modifying their current value (the value captured on the scan output is placed at the scan input, resulting in a circular shift of the scan chain contents; the number of cycles must match the scan chain length). The values shifted out of the active chain are internally stored in a previously selected temporary buffer (using STMPBx) and deserialized into 8-bit words placed on an external dual-port FIFO for outside observation. NSHFB2C and NSHFCPB2C are used to shift the selected temporary buffer contents into the active scan chain. These instructions enable the debug and test program to return the active scan chain to a previously saved state. STCK enables the second processor to control TCK through synchronism channel A, since during RT operations it is sometimes necessary to supply an unknown number of TCK cycles.
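The circular shift used by NCSHF and NCSHFCP can be illustrated with a minimal Python sketch. It is not the controller's implementation: the function names, the position of the scan output (last list element) and the bit order used when packing 8-bit words (first bit received is the LSB) are all assumptions made for illustration.

```python
from typing import List, Tuple


def circular_shift_observe(chain: List[int]) -> Tuple[List[int], List[int]]:
    """Shift a scan chain once around, feeding scan-out back into scan-in.

    Returns (chain_after, observed_bits). Because the number of clock
    cycles equals the chain length, the chain ends in its original state.
    """
    state = list(chain)
    observed = []
    for _ in range(len(state)):
        out_bit = state[-1]             # assumed: scan output is the last cell
        observed.append(out_bit)
        state = [out_bit] + state[:-1]  # scan-out re-enters at the scan input
    return state, observed


def deserialize(bits: List[int], word_size: int = 8) -> List[int]:
    """Pack a bit stream into words; first bit received is the LSB (assumption)."""
    words = []
    for i in range(0, len(bits), word_size):
        word = 0
        for pos, bit in enumerate(bits[i:i + word_size]):
            word |= bit << pos
        words.append(word)
    return words


chain = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
after, seen = circular_shift_observe(chain)
fifo_words = deserialize(seen)  # words as they would be placed on the FIFO
```

After the call, `after` equals `chain`, which is the property that lets the chain be observed without being modified, provided the cycle count matches the chain length.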
Control of functional logic
The instruction set of the processor controlling the functional logic is presented in table 2. A first group of instructions implements COV operations on directly accessible pins. A second group of instructions controls the system clock output for SS operations. A third group of instructions controls the system clock output for BP operations. CSTRETCH-N partially implements the cycle stretch technique [1], which consists of selectively stretching the clock-cycle length for isolated cycles prior to a detected failure. The theory is that when the cycle in which a long path is exercised is stretched, enough time is allowed for the correct data to be captured / registered. To accomplish this, the process has to be run iteratively, with successive cycles stretched, until the subsequent external failure has been eliminated. When it has, the currently stretched cycle is the one that exercises the long path. The last group of instructions controls the synchronism channels and the internal resources. STORE C24 stores the contents of the internal 24-bit counter in an external dual-port FIFO. This counter is used for implementing the cycle stretch technique, so that when the time-related fault is no longer detected, its contents identify the exact cycle in which the long path is exercised.
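The iterative search behind the cycle stretch technique can be sketched as follows. This is only an illustration under stated assumptions: `run_passes` is a hypothetical hook standing in for one rerun of the program with a single cycle stretched, and walking backwards from the cycle just before the failure is an assumed iteration order, not one given in the text.

```python
from typing import Callable, List, Optional


def find_long_path_cycle(
    run_passes: Callable[[int], bool],
    failing_cycle: int,
) -> Optional[int]:
    """Return the cycle whose stretching removes the failure, or None.

    run_passes(k) is a hypothetical hook: it reruns the program with only
    cycle k stretched and reports whether the external failure disappeared.
    """
    # Only cycles prior to the detected failure are candidates.
    for cycle in range(failing_cycle - 1, -1, -1):
        if run_passes(cycle):
            return cycle
    return None


# Toy prototype model: long path exercised at cycle 37, failure seen at 41.
LONG_PATH_CYCLE = 37
attempts: List[int] = []


def toy_run(stretched_cycle: int) -> bool:
    attempts.append(stretched_cycle)
    return stretched_cycle == LONG_PATH_CYCLE


found = find_long_path_cycle(toy_run, failing_cycle=41)
```

In this toy run `found` plays the role of the value held in the 24-bit counter when the fault stops being detected.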
Internal control and synchronisation
STMPB0, Selects the internal 2048 x 1 bit temporary buffer 0 or 1 for storing the
Detect condition
The goal is to activate a Condition Detected Output (CDO) pin when the comparison between the vector present at the BS register PIs and the vectors stored at the capture/shift and update stages results true, according to one of eight condition types:
Equal to expected vector (vector compared through a mask)
Different from expected vector (compare with mask)
Greater than limit A (vector > limit A)
Greater/equal to limit A (vector ≥ limit A)
Lesser than limit A (vector < limit A)
Lesser/equal to limit A (vector ≤ limit A)
Between limit A & B (limit A < vector < limit B)
Outside limit A or B (vector < limit A or vector > limit B)
The optional instruction SEL-COND places a 3-bit test register between TDI and TDO, which selects the type of condition to be detected. The expected vector (or limit A) is stored in the update stage and the mask (or limit B) is stored in the capture/shift stage. To place the expected vector in the update stage it is necessary to shift in the SAMPLE/PRELOAD instruction and then shift in the expected vector. During Update-DR, the vector is stored in the update stage. To place the mask in the capture/shift stage it is necessary to shift in the optional instruction DET-COND and then shift in the mask. The values present in the BS register when DET-COND is active are not modified in the Capture-DR or Update-DR states. At the end of the shift process the mask is stored in the capture/shift stage. The condition is evaluated while the TAP controller is in Run-Test/Idle, where CDO exhibits the result. TCK has no effect on the evaluation process. To support this operation the BS cells have to be modified to the structure illustrated in fig. 1.
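A small behavioural model may help to fix the eight condition types. It is a sketch only: the 3-bit codes, the mask polarity (a 1 means the bit takes part in the comparison) and the unsigned interpretation of vectors are assumptions, since none of them is specified in the text, and the names `Cond` and `condition_detected` are hypothetical.

```python
from enum import IntEnum


class Cond(IntEnum):
    # Illustrative 3-bit codes; the actual SEL-COND encoding is not given.
    EQUAL = 0
    DIFFERENT = 1
    GREATER = 2
    GREATER_EQUAL = 3
    LESSER = 4
    LESSER_EQUAL = 5
    BETWEEN = 6
    OUTSIDE = 7


def condition_detected(cond: Cond, vector: int, update: int, capture_shift: int) -> bool:
    """Model the CDO value for one vector present at the BS register PIs.

    update        holds the expected vector or limit A
    capture_shift holds the mask or limit B
    """
    if cond == Cond.EQUAL:
        return (vector & capture_shift) == (update & capture_shift)
    if cond == Cond.DIFFERENT:
        return (vector & capture_shift) != (update & capture_shift)
    if cond == Cond.GREATER:
        return vector > update
    if cond == Cond.GREATER_EQUAL:
        return vector >= update
    if cond == Cond.LESSER:
        return vector < update
    if cond == Cond.LESSER_EQUAL:
        return vector <= update
    if cond == Cond.BETWEEN:
        return update < vector < capture_shift
    if cond == Cond.OUTSIDE:
        return vector < update or vector > capture_shift
    raise ValueError(f"unknown condition: {cond!r}")


# Masked compare: only the upper nibble is checked.
cdo_equal = condition_detected(Cond.EQUAL, 0b10110110, 0b10110000, 0b11110000)
# Range checks with limit A = 0x10 and limit B = 0x80.
cdo_between = condition_detected(Cond.BETWEEN, 0x40, 0x10, 0x80)
cdo_outside = condition_detected(Cond.OUTSIDE, 0x40, 0x10, 0x80)
```

The model is purely combinational, mirroring the statement that TCK has no effect on the evaluation.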
Store sequence after condition
The goal is to store a sequence of two contiguous vectors after a certain condition. The condition detection and the sequence storing correspond to the functionality defined in the previous optional instructions. To implement the optional STORE-AFTER-COND instruction, a dedicated FSM with the states monitor condition, capture sequence I, capture sequence II, and end of sequence was added. The expected vector (or limit A) and the mask (or limit B) are entered as defined for instruction DET-COND. The FSM is initially at monitor condition. Merging the BS cells illustrated in fig. 1 and fig. 2 allows the implementation of this optional instruction. Fig. 3 presents the time diagram of the STORE-AFTER-COND instruction.
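The four-state FSM can be sketched in Python as below. The state names follow the text, but the transitions are an assumption (the timing diagram is not reproduced here): the sketch captures the two vectors that follow the cycle in which the condition is detected and then stops. The function name `store_after_cond` and the sample trace are hypothetical.

```python
from enum import Enum, auto
from typing import Iterable, List, Tuple


class State(Enum):
    MONITOR_CONDITION = auto()
    CAPTURE_SEQUENCE_I = auto()
    CAPTURE_SEQUENCE_II = auto()
    END_OF_SEQUENCE = auto()


def store_after_cond(samples: Iterable[Tuple[int, bool]]) -> Tuple[State, List[int]]:
    """Run the FSM over (vector, condition_detected) pairs, one per clock.

    Assumed behaviour: once the condition is seen, the next two vectors
    are stored and the FSM remains in END_OF_SEQUENCE.
    """
    state = State.MONITOR_CONDITION  # the FSM starts at monitor condition
    stored: List[int] = []
    for vector, cond in samples:
        if state is State.MONITOR_CONDITION:
            if cond:
                state = State.CAPTURE_SEQUENCE_I
        elif state is State.CAPTURE_SEQUENCE_I:
            stored.append(vector)
            state = State.CAPTURE_SEQUENCE_II
        elif state is State.CAPTURE_SEQUENCE_II:
            stored.append(vector)
            state = State.END_OF_SEQUENCE
        else:
            break  # END_OF_SEQUENCE: nothing more to capture
    return state, stored


trace = [(0x11, False), (0x22, True), (0x33, False), (0x44, False), (0x55, False)]
final_state, stored = store_after_cond(trace)
```

With this trace the condition is detected on the second vector, so the third and fourth vectors are the two that end up stored.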
Store sequence until condition
The goal is to store a sequence of two contiguous vectors until a certain condition. Condition detection corresponds to detecting a logic '1' at CDI. The BS cell illustrated in fig. 2 is also able to implement this optional instruction.
IMPLEMENTATION
The built-in controller is implemented in an EPF10K30. The MAX+PLUS II development system is able to generate a gate-level VHDL description of the design, thus enabling an easy transition to other development systems. The complete set of optional instructions is implemented, together with the mandatory BST infrastructure, in FPGAs emulating the '244 (an 8-bit non-inverting buffer) and the '373 (an 8-bit latch with tri-state outputs). The complete system, including two memories containing the programs for each processor, is now undergoing extensive functional and timing co-simulation. The test programs are initially written in assembly, and an application developed in-house generates the corresponding Memory Initialisation Files (MIFs). These files are read each time a new simulation is performed. The system-level model consists of the individual models of each component (memories, built-in controller, '244, and '373) interconnected for system-level co-simulation. The simulation tool accepts mixed-level modelling, so each component model may be either a behavioural or a gate-level model.
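The last step of such a flow, writing assembled words to a MIF, can be sketched as follows. This is not the in-house application: the function name `to_mif`, the memory width and depth, and the program words are assumptions chosen for illustration, and the words are arbitrary values rather than real opcodes of either processor.

```python
from typing import Sequence


def to_mif(words: Sequence[int], width: int, depth: int) -> str:
    """Render memory contents as MIF text; unused addresses are zero-filled."""
    if len(words) > depth:
        raise ValueError("program does not fit in memory")
    digits = (width + 3) // 4  # hex digits needed per data word
    lines = [
        f"WIDTH={width};",
        f"DEPTH={depth};",
        "ADDRESS_RADIX=HEX;",
        "DATA_RADIX=HEX;",
        "CONTENT BEGIN",
    ]
    for addr, word in enumerate(words):
        if not 0 <= word < (1 << width):
            raise ValueError(f"word at address {addr} exceeds {width} bits")
        lines.append(f"  {addr:X} : {word:0{digits}X};")
    if len(words) < depth:
        # Address range syntax fills the remaining locations with zeros.
        lines.append(f"  [{len(words):X}..{depth - 1:X}] : {0:0{digits}X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


program = [0x12, 0xA0, 0xFF]  # arbitrary words, not real opcodes
mif_text = to_mif(program, width=8, depth=16)
```

Regenerating the file from the assembly source before each run matches the statement that the MIFs are read each time a new simulation is performed.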
CONCLUSION
A set of prototype debug and test requirements was initially identified and converted into five basic operation types forming a "simplified" debug and test model. Individual operations included in each operation type were obtained by considering all possible combinations of the gathered criteria. These were then analysed and converted into specifications of instructions implemented by the BST infrastructure or by a board-level built-in controller for debug and test. Mandatory / optional instructions described in the Std. were first considered, and a set of new optional instructions for debug support was then defined. These included:
DET-COND: concurrently detecting conditions in RT at the BS register (corresponding to values appearing at IC pins);
STORE-SEQ: storing sequences of two contiguous vectors at the BS register;
STORE-AFTER-COND: storing sequences of two contiguous vectors after a certain condition;
STORE-UNTIL-COND: storing sequences of two contiguous vectors until a certain condition.
The estimated overhead of the circuitry needed to implement the optional instructions is approx. 100% relative to the mandatory BST infrastructure. This number suggests that for an IC where the mandatory BST infrastructure represents an overhead of 2-3%, implementing the optional instructions would raise this value to 4-6%. The built-in controller was implemented as a dual-processor architecture. One of the processors controls the board-level scan chains, while the other controls the system clock, thus guaranteeing synchronisation between the system functional logic and the test logic.
The proposed solution is now undergoing extensive system-level functional / timing co-simulation. Small and large programs are being run in the simulation environment, and a database of golden vectors is being extracted for later comparison with values captured during normal system operation.
