In-line Test of Synthesised Systems Exploiting Latency Analysis by Williams, A.C. et al.
 The effect of these assumptions on the results and hypotheses thus formulated will be
discussed in the results section.
2. Latency analysis
Section 2.1 describes the principles of latency analysis and the obstacles that must be
overcome in order to obtain accurate results. The discussion describes latency analysis in the
context of the architecture generated by the MOODS synthesis system. This is followed by a
detailed description of the methods and algorithms implemented in an analysis tool used to
determine latency and testability figures.
2.1 Design latency analysis
The MOODS synthesis system generates a distributed architecture containing a
control unit which organises the flow of data through a data path. The control path consists of
a single-bit register for each state, joined together by arcs through which tokens are
conditionally passed, in a similar manner to that of Petri nets. The control path comprises a
combination of parallel and conditional branches, nested loops and sub-program calls. Each state may
conditionally activate any number of functional units (adders, multipliers etc.) in the data
path, which are interconnected by nets directing data through the system.
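
The token-passing behaviour described above can be illustrated with a minimal Python sketch. It is not the MOODS implementation: the graph, the state names (s1 to s4), the unit names (cmp0, add0, sub0) and the condition variable flag are all hypothetical, and unit activation is treated as unconditional for simplicity.

```python
from typing import Callable, Dict, List, Set, Tuple

# An arc passes a token from `src` to `dst` when its condition holds.
Arc = Tuple[str, str, Callable[[Dict[str, int]], bool]]


def step(active: Set[str], arcs: List[Arc], env: Dict[str, int]) -> Set[str]:
    """Advance the control path by one clock cycle.

    `active` models the single-bit state registers that currently hold a token.
    """
    nxt: Set[str] = set()
    for src, dst, cond in arcs:
        if src in active and cond(env):
            nxt.add(dst)
    return nxt


# Hypothetical graph: s1 branches on `flag`; both paths rejoin at s4.
ARCS: List[Arc] = [
    ("s1", "s2", lambda e: e["flag"] == 1),
    ("s1", "s3", lambda e: e["flag"] == 0),
    ("s2", "s4", lambda e: True),
    ("s3", "s4", lambda e: True),
    ("s4", "s1", lambda e: True),
]

# Data path units activated by each state (illustrative names only).
ACTIVATES: Dict[str, Set[str]] = {
    "s1": {"cmp0"},
    "s2": {"add0"},
    "s3": {"sub0"},
    "s4": {"add0"},
}


def run(start: str, env: Dict[str, int], cycles: int) -> List[Set[str]]:
    """Return the set of data path units activated in each clock cycle."""
    active = {start}
    trace: List[Set[str]] = []
    for _ in range(cycles):
        units: Set[str] = set()
        for state in active:
            units |= ACTIVATES.get(state, set())
        trace.append(units)
        active = step(active, arcs=ARCS, env=env)
    return trace


if __name__ == "__main__":
    for cycle, units in enumerate(run("s1", {"flag": 1}, 4)):
        print(cycle, sorted(units))
```

In this sketch the unit add0 is activated by two different states, which is the situation that arises once units are shared between instructions.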
An initial behavioural description is translated from VHDL[16] source (the primary
input language for the system) to an unoptimised register-level implementation whereby each
instruction occupies a single control state, and each data path unit executes one instruction,
the results of which are stored in registers at the end of each clock cycle. During optimisation
both the control and data path graphs are transformed so that the user’s objectives for
constraints such as area, speed and power dissipation are more closely met. This results in
control states conditionally activating many data path units, each of which may be used to
implement a number of instructions.
The purpose of latency analysis is to calculate the distribution of inactive clock cycles
for a given data path unit. Armed with this information and the time required to perform a
complete test on the unit, a figure may be obtained describing its in-line testability. This
section explains the basis behind the various terms and figures used for latency analysis and
in-line testability.
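
As an illustration of the idea, the following Python sketch extracts the lengths of the idle periods of one unit from a per-cycle activity trace and derives a simple testability figure from them. The function names are hypothetical, and the definition of completion probability used here (the fraction of idle periods long enough to hold a complete test) is an assumption made for the example, not necessarily the definition used by the analysis tool.

```python
from collections import Counter
from typing import Dict, Iterable, List


def dead_vector_lengths(active: Iterable[bool]) -> List[int]:
    """Lengths of the maximal runs of inactive cycles in an activity trace.

    A run that is still open when the trace ends is counted as it stands.
    """
    runs: List[int] = []
    current = 0
    for busy in active:
        if busy:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


def length_distribution(runs: List[int]) -> Dict[int, float]:
    """Relative frequency of each idle-run length."""
    counts = Counter(runs)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


def completion_probability(runs: List[int], test_cycles: int) -> float:
    """Fraction of idle runs of at least `test_cycles` cycles (assumed definition)."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if r >= test_cycles) / len(runs)


if __name__ == "__main__":
    # Hypothetical trace: True means the unit is in use during that clock cycle.
    trace = [True, False, False, False, True, False,
             True, False, False, False, False, True]
    runs = dead_vector_lengths(trace)
    print(runs)
    print(length_distribution(runs))
    print(completion_probability(runs, test_cycles=3))
```

A unit whose idle periods are mostly shorter than its test time scores poorly under this measure, however many idle cycles it accumulates in total.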
Figure 2a shows a simple control graph comprising a sequential section of four states
linked by arcs. During execution control may exist in any one of the four states and follows
clock speed. This improved performance is achieved by sacrificing area in order to increase
the component count and enable a different scheduling scheme.
There are two main mechanisms involved in the optimisation process: unit sharing
and graph compaction. Unit sharing takes two data path units, each activated by one control
state, and combines these into a single unit with multiplexed inputs activated by a number of
control states. This results in a reduction in area, but there may be a speed penalty to pay due
to increased complexity resulting from the use of input multiplexors. In addition, combining,
for example, a 16-bit and 32-bit adder into a shared 32-bit unit will significantly slow down
the smaller instruction and may affect the maximum clock speed permitted depending on the
schedule. Graph compaction utilises the slack time within a state, that is, the difference
between the state delay and the clock period, to combine two adjacent control states into one,
thereby shortening the control path.
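
Graph compaction can be sketched as follows for a purely sequential chain of states. This is a simplified greedy illustration, not the MOODS optimiser: the function name compact is hypothetical, the delay of a merged state is assumed to be the sum of the delays of its constituent states, and data dependencies, branches and area cost are ignored.

```python
from typing import List


def compact(delays_ns: List[float], clock_ns: float) -> List[List[int]]:
    """Greedily merge adjacent states of a sequential chain.

    Returns groups of original state indices. Each group becomes one control
    state whose summed delay fits within the clock period.
    """
    groups: List[List[int]] = []
    used = 0.0  # delay already consumed by the group being built
    for index, delay in enumerate(delays_ns):
        if delay > clock_ns:
            raise ValueError(f"state {index} exceeds the clock period")
        if groups and used + delay <= clock_ns:
            # Enough slack remains: fold this state into the previous one.
            groups[-1].append(index)
            used += delay
        else:
            groups.append([index])
            used = delay
    return groups


if __name__ == "__main__":
    # Hypothetical state delays in ns, with a 10 ns clock period.
    print(compact([4.0, 5.0, 8.0, 2.0, 3.0], clock_ns=10.0))
```

With these example figures five states collapse into three. A shorter control path leaves fewer clock cycles in which any given unit is idle, which is why compaction bears on in-line testability.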
Both of these mechanisms have a profound effect on the length and distribution of
dead vectors. Figure 4 shows histograms of the test completion probability for each data path
unit in the area and delay optimised implementations based upon the sample test sets. These
results serve to illustrate a number of important factors influencing the in-line testability of an
implementation. It should be noted that comparisons may only be drawn between the full set
of units of the same type since each unit implements a different subset of instructions. For
example, in the area optimised design (figures 3b and 4b) unit 29 implements all six of the
system’s subtract instructions, whereas the delay optimised version (figures 3a and 4a)
spreads these across two subtractors, units 29 and 30.
The units in each histogram split into two general groups: arithmetic units and
comparators. During optimisation none of the comparators will be shared since, for the
technology library used, the additional hardware cost involved in sharing a unit is greater than
the cost of a separate comparator. Comparing the results of the two optimisations, it can be
seen that the comparators in the area optimised implementation have a generally higher test
completion probability than those in the delay optimised version. This is attributed to the
greater number of control states (and hence longer execution time) as illustrated in figure 3b.
In contrast, the arithmetic units in this implementation are much more heavily shared, and thus
the test completion probability for these units is lower than in the delay optimised implementation.
The results for the two types of unit show that both increased sharing and decreased graph
length conspire to limit the in-line testability of a synthesised design.
[5] V.D. Agrawal, C.R. Kime and K.K. Saluja, “A tutorial on built-in self test (I):
Principles”, IEEE Design and test of computers, 1993, DTC-10, no 1, pp 73-92.
[6] D. Gizopoulos, A. Paschalis and Y. Zorian, “An effective BIST scheme for
datapaths”, 1996, IEEE International test conference, pp 76-85.
[7] A.D. Brown, K.R. Baker and A.C. Williams, “On-Line Testing of Statically and
Dynamically Scheduled Synthesized Systems”, IEEE Transactions on Computer-Aided
Design, CAD-16, No. 1, January 1997, pp. 47-57.
[8] K.R. Baker, A.J. Currie and K.G. Nichols, “Multiple objective optimisation in a
behavioural synthesis system”, IEE Proceedings-G, 140, no. 4, August 1993, pp 253-260.
[9] K.R. Baker, A.D. Brown and A.J. Currie, “Optimisation Efficiency in Behavioural
Synthesis”, IEE Proceedings-G, 141, no. 5, October 1994, pp 399-406.
[10] A.C. Williams, “A Behavioural VHDL Synthesis System using Data Path
Optimisation”, PhD. Thesis, University of Southampton, October 1997.
[11] S. Dey and M. Potkonjak, “Nonscan design for testability techniques using RT-level
design information”, IEEE Transactions on computer-aided design, CAD-16, no 12, pp 1488-1506.
[12] I. Ghosh, A. Raghunathan and N.K. Jha, “Design for hierarchical testability of RTL
circuits obtained by behavioural synthesis”, IEEE Transactions on computer-aided design,
1997, CAD-16. no 9, pp 1001-1014.
[13] I. Ghosh, A. Raghunathan and N.K. Jha, “A design for testability technique for RTL
circuits using control/data flow extraction”, IEEE Transactions on computer-aided design,
1998, CAD-17. no 8, pp 706-723.
[14] G.L. Craig, C.R. Kime and K.K. Saluja, “Test scheduling and control for VLSI
built-in self-test”, IEEE Transactions on computers, 1988, C-37, no 9, pp 1099-1109.
[15] TransTest User Guide, Version 1.0, TransEDA Ltd., 1992.
[16] IEEE Standard VHDL Reference Manual, IEEE Std 1076-1987, IEEE Catalogue No.
SH11957, 1987.
[17] P.G. Harrison and N.M. Patel, “Performance Modelling of Communication Networks
and Computer Architectures”, Addison-Wesley, 1994, pp 81-163, ISBN 0-201-54419-9.
[18] D. Freedman, “Markov Chains”, Springer-Verlag, 1983, ISBN 0-387-90808-0.
[19] F.N. Najm, “Transition Density: A New Measure of Activity in Digital Circuits”,
IEEE Transactions on Computer-Aided Design, CAD-12, No. 2, February 1993, pp. 310-323.
[20] D. Singh, J.A. Rabaey, M. Pedram, F. Catthoor, S. Rajgopal, N. Sehgal and T.J.
Mozdzen, “Power Conscious CAD Tools and Methodologies: A Perspective”, Proceedings of
the IEEE, 83, No. 4, April 1995, pp. 570-594.
[21] Microelectronics Center of North Carolina, “High-Level Synthesis Workshop
Benchmarks”, 1989, 1991.
[22] P.R. Panda and N. Dutt, “1995 High Level Synthesis Design Repository”, University
of California, Irvine, Technical Report #95-04, 7 February 1995.
Table and figure captions
Table 1 Functional unit test sets
Figure 1 Design space
Figure 2 Example control graphs
Figure 3 Area and delay optimised FFT control graphs
Figure 4a Delay optimised FFT test completion probability per unit
Figure 4b Area optimised FFT test completion probability per unit
Figure 5 FFT subtractor dead vector probabilities
Figure 6 FFT subtractor test completion probability vs. test time
Figure 7 Area optimised FRISC test completion probability per unit, with and without
fully unshared adder
Figure 8 Area optimised FRISC test completion probability per unit optimised for
in-line testing
DP unit type (number)   Bit width   Number of test sets   Fault coverage (%)
Add (14)                32          22                    98.4
Minus (15)              32          22                    99.6
                        64          30                    99.6
Rshift (32)             16          16                    N/A
Comparators (20-25)     1           3                     100
                        16          18                    100
                        32          35                    100
Table 1 Functional unit test sets
     
[Figure 1: design space plot. Axes: Delay (ns, clock cycles) and Area (mm2 of silicon, CLB usage in an FPGA). Labelled regions: quickest structural designs; smallest structural designs; “best” structural design; original (user input) design; boundary ("hull") of the set of realisable designs. Marked points: (A), (B), (C), (D).]
Figure 1 Design space
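[Figure 2: two example control graphs, panels (a) and (b). One graph has four states, each labelled 25.0%, with every arc labelled 1.0. The other has five states labelled 28.6%, 14.3%, 14.3%, 14.3% and 28.6%, with two arcs labelled 0.5 and four arcs labelled 1.0.]
Figure 2 Example control graphs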
[Figure 3: two control graphs with numbered states. a) Delay optimised FFT control graph, with states annotated as activating units N29 (minus) and N30 (minus). b) Area optimised FFT control graph, with states annotated as activating unit N29 (minus).]
Figure 3 Area and delay optimised FFT control graphs
[Figure 4a: histogram. Horizontal axis: Data path unit (DP node), showing div (44), lshift (63), lshift (64), lshift (66), lshift (67), lshift (84), lshift (86), lshift (90), lshift (92), minus (29), minus (30), mult (34), mult (35), mult (79), ne (62), plus (32), plus (33). Vertical axis: Test completion probability (%), 0 to 100. Series: 1 bit, 32 bit, 64 bit.]
Figure 4a Delay optimised FFT test completion probability per unit
[Figure 4b: histogram. Horizontal axis: Data path unit (DP node), showing div (34), le (86), lshift (63), lshift (64), lshift (66), lshift (67), lshift (84), lshift (90), lshift (92), minus (29), mult (30), ne (62), plus (32). Vertical axis: Test completion probability (%), 0 to 100. Series: 1 bit, 32 bit, 64 bit.]
Figure 4b Area optimised FFT test completion probability per unit
[Figure 5: plot. Horizontal axis: Dead vector length. Vertical axis: Dead vector probability (%), 0 to 16. Series: Unit 29 - Delay Optimised, Unit 30 - Delay Optimised, Unit 29 - Area Optimised.]
Figure 5 FFT subtractor dead vector probabilities
[Figure 6: plot. Horizontal axis: Required test time (cycles). Vertical axis: Test completion probability (%), 0 to 100. Series: Unit 29 - Delay, Unit 30 - Delay, Unit 29 - Area.]
Figure 6 FFT subtractor test completion probability vs. test time
[Figure 7: two histograms. Horizontal axis: Data path unit (DP node). Vertical axis: Test completion probability (%), 0 to 100. Panel "Area Optimised": minus (16), plus (14), rshift (18). Panel "Area Optimised - Adder Unshared": minus (16), plus (12), plus (13), plus (15), plus (17), plus (19), plus (21), plus (22), plus (24), plus (52), plus (53), plus (54), plus (55), plus (60), plus (62), plus (64), plus (82), plus (84), plus (86), rshift (18). Legend entries: 16 bit adders, 32 bit adders, 16 bit, 32 bit.]
Figure 7 Area optimised FRISC test completion probability per unit, with and without
fully unshared adder
[Figure 8: histogram. Horizontal axis: Data path unit (DP node), showing minus (13), minus (17), plus (12), plus (14), plus (15), rshift (16), rshift (18). Vertical axis: Test completion probability (%), 0 to 100. Series: 16 bit, 32 bit.]
Figure 8 Area optimised FRISC test completion probability per unit optimised for
in-line testing