Abstract-Most digital systems at some time during use have areas (modules) that are "dead" in the sense that they do not contain valid data, i.e., the data that was processed or generated by that area has been passed on to a subsequent stage and will not be required (read) again. In a synthesized system, where the flow of data is determined explicitly by an on-chip (synthesized) controller, the question of which areas will be dead or not (and when) is known in advance. There are areas and times when the "use" is data-dependent, but then the use is known to the controller at that time. This deadtime can be exploited to run a test pattern (either complete or in part) through the unused area, thereby giving the ability to continuously monitor the "health" of the overall system with very little (sometimes zero) impact on the processing capability. This has obvious applications in situations where reliability is a concern. There exist systems where an area is so heavily used that it is impossible to perform any testing at a serious rate; in this case the area may either be partially tested (or tested at a lower rate) or the processing of "real" data periodically halted to allow a more thorough test to take place with concomitant throughput degradation. This paper describes a behavioral synthesis system that can detect and exploit dead areas for automatic testing. Pertinent aspects of the controller are described, and a number of dead area statistics (including "test throughput") generated from real designs are reported.
Index Terms-Synthesis, test.
I. INTRODUCTION
THE COMPLEXITY increase seen by ASICs over the last few years is comparable to that experienced by very large scale integration (VLSI) a decade ago. This increase is not only due to advances in processing technology but also to the use of DA tools enabling predefined cells to be "glued" together. The use of synthesis tools has brought an added dimension to the complexity of digital systems: the internal architecture of synthesized circuits is largely opaque to the designer.
The testing of these increasingly complex systems is a vital part of the design process, and issues of testability require consideration at all levels. Techniques such as hierarchical testing, partial scan paths, and the analysis of high-level descriptions to determine hard-to-detect (HTD) faults and their correction through test statement insertion can help to enhance the testability of designs early in the design process [1], [2]. Other relevant work in the area incorporates scan and partial scan paths into the overall penalty function calculations [3], efficient transforms for the inclusion of area-efficient scan paths [4], [5], and speed-efficient testing regimes [6].
For critical systems, the testing problem becomes an even greater design issue. To ensure their continual successful operation, critical systems require frequent testing and/or the use of redundancy at the component or system level. The testing of a system conventionally requires part or all of the system to be temporarily taken out of service while a built-in self-test (BIST) or external test procedure takes place. Even though methods such as adding deflection operations to the control and dataflow graph to reduce scan paths [4] improve out-of-service testing, this approach is usually unsatisfactory where the system downtime can jeopardize the function of the system.
The approach described in this paper utilizes the idle time of units in a synthesized synchronous system in a way that enables individual units to be tested while the system remains in operation. This approach provides an almost continuous (discrete, granular) indication of the condition of the system without the overhead of excessive redundancy or the inconvenience of taking the system out of service. The purpose of this paper is to describe and evaluate only the testing infrastructure. The nature and derivation of the test vectors and the coverage of the tests are beyond the scope of this paper, but it should be noted in passing that a high stuck-at fault coverage does not necessarily imply a corresponding defect coverage. The techniques described here are not limited to tests based on the stuck-at model. They apply equally well to functional and exhaustive tests.
Fault tolerant synthesis [7], [8] introduces orthogonal complexity into the issue and is not considered here.
II. THE SYNTHESIS SYSTEM
The synthesis system, on which this project is based, is described in detail elsewhere [9], [10]. It is a high-level behavioral synthesis system that optimizes a design with respect to a combination of user specified constraints on silicon area and speed. The optimization is performed by applying a series of local reversible transformations (similar to those used in other systems such as CAMAD [11]) to the control and datapath graphs under the general guidance of a simulated annealing algorithm¹ [12], [13]. The design space has as many dimensions as the user optimization criteria; for example, area, speed, power dissipation, and "testability." The space, defined by the user-supplied behavioral description, is discrete, degenerate, and highly irregular. Steepest descent and related algorithms perform extremely badly on problems of this nature, notwithstanding the observation that the concept of adjacency in a discrete space is poorly defined. Details of the algorithm, cooling schedule, and termination criteria may be found in [9] and [10].
The input design is specified at a behavioral level using VHDL, and the design is implemented as a datapath plus controller architecture which is output as a netlist of parameterizable cells in structural VHDL. A variety of implementations can be generated from a single design specification by varying the target optimization criteria.
¹ Recently, an optimization technique using "randomized branch and bound with steepest descent" has been reported [3]. This may be considered similar to a simulated annealing algorithm with a position-sensitive bias on the random number generator.
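As a rough illustration of the optimization style described in this section (not the system's actual algorithm, which is documented in [9] and [10]), the following Python sketch runs simulated annealing over a toy discrete design space. The two-parameter design point, the cell areas, the schedule-length model, the cost weights, and the geometric cooling schedule are all assumptions made for this example only.

```python
import math
import random

def anneal(initial, neighbors, cost, t_start=10.0, t_end=0.01, alpha=0.95,
           moves_per_temp=50, seed=0):
    """Generic simulated annealing over a discrete space (illustrative only)."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = rng.choice(neighbors(current))  # one local move
            delta = cost(candidate) - current_cost
            # Always accept improvements; accept uphill moves with
            # probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost

# Toy design point: (number of adders, number of multipliers), each 1..4.
def neighbors(design):
    adders, mults = design
    steps = [(adders + d, mults) for d in (-1, 1)]
    steps += [(adders, mults + d) for d in (-1, 1)]
    return [(a, m) for a, m in steps if 1 <= a <= 4 and 1 <= m <= 4]

def cost(design, w_area=1.0, w_speed=1.0):
    adders, mults = design
    area = 2 * adders + 8 * mults  # hypothetical cell areas
    cycles = math.ceil(6 / adders) + math.ceil(4 / mults)  # hypothetical schedule
    return w_area * area + w_speed * cycles

best, best_cost = anneal((4, 4), neighbors, cost)
print(best, best_cost)
```

A testability term could be added to `cost` in the same weighted fashion; the toy landscape here is deliberately simple and does not reproduce the irregularity of a real design space.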
III. DEAD AREA ANALYSIS
Accepting that a certain temporal "slack" may now be acceptable in a synthesized design to allow time for testing somewhat warps the penalty function values associated with each optimization transform. For example, without testing, in a design having two adders (A1, A2) only used in separate temporal intervals, it makes sense to multiplex access to a single physical adder [A1 ≡ A2]. The overhead is simply the cost of the multiplexers. Including the dimension of testability into the design space complicates the situation, as using two physical adders means that the first can more easily be tested in its idle or dead time while the second is in use. Note that this is not simply device redundancy. The overhead is now the extra adder, but the multiplexer has gone, and the test objective is more easily met.
An Example
Consider the rather contrived example of Fig. 1: the design takes data input from three registers a, b, and c and produces the roots to the quadratic equation $ax^2 + bx + c = 0$, with the results appearing in registers b and c. The figure shows a possible dataflow implementation of the system at some early point in the synthesis process. The vertical axis indicates processor slots, which are not in general of equal length, and each column in this example is a common register element.
3.1.1 The Dead Area Profile: From the dataflow graph of Fig. 1, we can trivially construct the area activity intervals: M1; M2; S1; M3: (0-2); M4: (2-4); S2: (4-5); Q1: (5-9); A1; S3: (9-10); D1; D2: (10-12). These simply show the times at which various modules in the design are in use. We can then construct the dead area profile for all the operators in the design. This may be likened to the inverse of the lifetime of a variable, but applied to operators rather than variables. Assuming that all the operator instances in Fig. 1 are mapped onto different physical devices, we get the profiles shown in Fig. 2. These show the distribution of time in which each particular device is dead (idle) and can therefore be tested. For example, Q1 is unused for slots A, B, and C (i.e., the first 2 + 2 + 1 clock cycles) and slots E and F (i.e., the last 1 + 2 clock cycles). The profile, therefore, shows one dead area of five cycles in length and one of three. If we associate with each type of operator a testing threshold, T_o, the number of clock cycles required to test the operator, we are in a position to assess how well this particular design meets the "testability" criteria set by the user. Looking at the profiles of Fig. 2, we see that every instance of every operator has at least one dead area above the appropriate threshold, and so the design may be considered completely testable in the sense that we have the time to test each operator at least once inside the normal execution of the synthesized design. Applying a "conventional" transformation allows us to see how this new dimension interacts with the existing two.
If we move operator M3 to slot B, we can map (S1, S2, and S3), (M1 and M4), and (M2 and M3) onto the same physical operators: [S1 ≡ S2 ≡ S3], [M1 ≡ M4], and [M2 ≡ M3], respectively. The profiles change as a consequence of this transformation.
The maximum of [S1 ≡ S2 ≡ S3] now falls below the T(S) threshold, and so there are insufficient contiguous clock cycles to test the operator during the normal course of operation of the design. This, of course, does not necessarily cause rejection of the design. It may be possible to perform a number of partial subtests or to increase the overall execution costs in order to get the test in, or indeed, to omit the test altogether, if the "testing" criterion is weaker than other constraints. Note that [M1 ≡ M4] and [M2 ≡ M3] still have profile components above T(M).
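The effect of operator sharing on the profiles can be checked with a short Python sketch. It assumes the slot timings of the quadratic example (M3 moved from cycles 0-2 to cycles 2-4) and hypothetical thresholds T(S) = 5 and T(M) = 6, which the text does not specify; the dictionary keys naming the shared operators are likewise illustrative.

```python
def dead_areas(busy, total):
    """Return the lengths of the idle gaps left by the busy intervals."""
    gaps, cursor = [], 0
    for start, end in sorted(busy):
        if start > cursor:
            gaps.append(start - cursor)
        cursor = max(cursor, end)
    if cursor < total:
        gaps.append(total - cursor)
    return gaps

TOTAL_CYCLES = 12

# Activity after moving M3 from slot A (0-2) to slot B (2-4).
ACTIVITY = {
    "M1": (0, 2), "M2": (0, 2), "S1": (0, 2),
    "M3": (2, 4), "M4": (2, 4), "S2": (4, 5), "S3": (9, 10),
}

# Operator instances mapped onto one physical unit.
SHARED = {
    "S1=S2=S3": ["S1", "S2", "S3"],
    "M1=M4": ["M1", "M4"],
    "M2=M3": ["M2", "M3"],
}

merged = {
    unit: dead_areas([ACTIVITY[op] for op in ops], TOTAL_CYCLES)
    for unit, ops in SHARED.items()
}

THRESHOLD = {"S": 5, "M": 6}  # hypothetical values of T(S) and T(M)

for unit, gaps in merged.items():
    ok = max(gaps) >= THRESHOLD[unit[0]]
    print(unit, gaps, "testable" if ok else "below threshold")
```

Under these assumptions the shared subtracter is left with dead areas of two, four, and two cycles, none of which reaches T(S), while each shared multiplier keeps a single dead area of eight cycles.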
Design Statistics
This section contains a discussion of the dead area analysis of a number of real circuits. Complexity precludes schematics, but the salient details of each circuit are given in Table I . During each clock cycle, the controller activates a set of datapath units, thereby implementing instructions in the original high-level description. At the end of each clock cycle, the instruction results are loaded into registers. The "register load" instructions give the data from which the necessary statistics and results are derived.
3.2.1 Deterministic Dataflow: Circuit A contains 21 instructions and operates as either a complex number divider or multiplier depending on an external signal. The datapath is thus configurable. For either operation a subset of the datapath units will be active. Fig. 3 shows the datapath of circuit A, optimized for speed, with no area constraints. The controller path length is four cycles, and the corresponding dead area profiles for each unit type are obvious from the figure. The profile for each unit changes depending on whether the circuit is configured for multiplying or dividing. The profiles for the multiply and divide unit types (these may be trivially derived from Fig. 3) depict more than one physical unit where individual dead areas are accumulated.
Ideally the profile for each individual unit should be separately shown. However, in this case the deadtimes for similar unit types are equivalent. Fig. 4 represents the same design, where the design has been optimized for area. For each operator type, the units of Fig. 3 have been merged into single units to conserve area. Here, to accommodate unit sharing the controller path length has increased to seven cycles.
Each unit may have one or more dead areas. In order for a unit to be completely tested, the longest dead area is the most significant. The longest dead area profile is constructed using only the longest deadtime for each physical unit. Although the cumulative profiles lose some information about individual units, they give an overall indication for the testable state of the design and are useful for illustration within this paper.
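The difference between a cumulative profile and a longest dead area profile can be illustrated as follows. This is a minimal Python sketch in which the unit names and dead area lengths are invented for the example; they are not taken from circuit A.

```python
from collections import Counter

# Hypothetical dead-area lengths (in clock cycles) for each physical unit.
unit_dead_areas = {
    "add0": [2, 1, 1],
    "add1": [3, 1],
    "mul0": [4],
    "mul1": [2, 2],
}

def complete_profile(units):
    """Histogram {dead length: count} over every dead area of every unit."""
    return Counter(length for gaps in units.values() for length in gaps)

def longest_profile(units):
    """Histogram built from only the longest dead area of each unit."""
    return Counter(max(gaps) for gaps in units.values() if gaps)

print(sorted(complete_profile(unit_dead_areas).items()))
print(sorted(longest_profile(unit_dead_areas).items()))
```

The longest profile has exactly one entry per physical unit, so comparing it against a testing threshold shows directly how many units can be completely tested within a single dead area.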
To test the 16-bit adder used in circuit A, assuming that the application of each test vector takes one clock cycle, 16 dead cycles are required for the adder. The dead area profile shows there is insufficient deadtime to completely test a unit in one dead area or even one controller cycle. Within the framework outlined, therefore, circuit A cannot be tested properly without incurring a speed penalty, which is the extra cycles required to complete the test.
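The arithmetic behind this observation can be made explicit. The Python sketch below assumes that a test may be split into partial subtests, with one vector applied in every dead cycle and the intermediate test state held between dead areas; the dead area lengths used in the example are hypothetical, and the function names are chosen for illustration.

```python
import math

def fits_in_one_dead_area(test_cycles, dead_areas_per_pass):
    """True if a single dead area is long enough for the whole test."""
    return max(dead_areas_per_pass, default=0) >= test_cycles

def passes_to_complete(test_cycles, dead_areas_per_pass):
    """Controller passes needed when the test is spread over all dead cycles."""
    usable = sum(dead_areas_per_pass)
    if usable == 0:
        raise ValueError("unit is never idle; normal processing must be halted")
    return math.ceil(test_cycles / usable)

# A 16-cycle adder test against hypothetical dead areas of 2, 1 and 1 cycles.
print(fits_in_one_dead_area(16, [2, 1, 1]))
print(passes_to_complete(16, [2, 1, 1]))
```

With four dead cycles available per controller pass, the 16-cycle test cannot fit in one dead area and would need four passes to complete under the stated assumptions.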
3.2.2 Nondeterministic (Probabilistic) Dataflow:
We have so far looked only at dataflow graphs where activation of units is deterministic. However, in the majority of synthesized designs, a significant fraction of the dead area profiles will be data-dependent, a consequence of conditionals in the high-level description. This leads to circuits containing units that do not have clearly defined activities for nondeterministic operations. Without knowledge of the exact data that will be passing through the system, we cannot precisely calculate the dead area times. However, we can make some assumptions and derive best/worst case times and corresponding profiles between which the actual profiles will fall.
The worst case dead area times are calculated assuming that a unit always executes all of its assigned instructions, whereas the best case dead area times are calculated by assuming that only one of a unit's probabilistically executed instructions is executed each controller cycle. This results in n + 1 different dead area profiles, where n is the number of possible instructions: one for each probabilistically executed instruction, and one where none are executed. For example, the adder (N20) of Fig. 4 has a worst case dead area of two where each add operation is assumed to be executed. The best case dead area times where each instruction is assumed probabilistic are two and four for the first instruction [i5], four and two for the second instruction [i8], six for the third [i19], and seven for none. The best and worst dead area times for all units of circuit A, their complete and longest dead area profiles, and combined best/worst profiles are shown in Fig. 5 for the area optimized design of Fig. 4. Comparing the actual dead area profiles corresponding to Fig. 4 with the best and worst case profiles of Fig. 5 illustrates that the real profiles lie between, and are similar to, the combined best/worst profiles.
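The best and worst case calculation for the adder example can be reproduced with a small Python sketch. The controller path length of seven cycles is taken from the area-optimized design; the cycle positions in `SCHEDULE` and the instruction labels used as keys are assumptions, chosen so that the resulting dead times agree with the figures quoted above, since the actual schedule is not given in the text.

```python
def dead_areas(busy_cycles, path_length):
    """Lengths of the idle runs in one controller pass."""
    gaps, run = [], 0
    for cycle in range(path_length):
        if cycle in busy_cycles:
            if run:
                gaps.append(run)
            run = 0
        else:
            run += 1
    if run:
        gaps.append(run)
    return gaps

PATH_LENGTH = 7

# Assumed cycle in which each add instruction occupies the adder.
SCHEDULE = {"i5": 2, "i8": 4, "i19": 6}

# Worst case: every assigned instruction executes in every pass.
worst = dead_areas(set(SCHEDULE.values()), PATH_LENGTH)

# Best case: n + 1 profiles, one per single instruction plus "none".
best = {name: dead_areas({cycle}, PATH_LENGTH) for name, cycle in SCHEDULE.items()}
best["none"] = dead_areas(set(), PATH_LENGTH)

print("worst:", worst, "longest:", max(worst))
for name, gaps in best.items():
    print("best,", name, "only:", gaps)
```

The actual profile of a data-dependent unit in any given pass then falls between the worst case list and the corresponding best case lists.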
3.2.3 Further Statistics: The testability situation eases with increasing design size and varies with design data or control domain dominance. For a 32-bit subtracter within circuit B. 
IV. THE CONTROLLER SUBSYSTEM
If the design is small or highly data-domain dominant, and complete testing is required with no timing impact, then some units must be tested over a number of dead areas and controller cycles. Once the test requirements are defined in terms of dataflow, the testing functional units can of course be optimized alongside the rest of the circuit. The optimizer will never compromise the tests per se, as the result of the tests is always required by the output of the system. This is a complex area and will be covered in detail in a later paper.
