True Coverage: A Goal of Verification by Gary Feierbach
True Coverage: A Goal of Verification
Gary Feierbach
Apple Computer, Inc.
feierbach@apple.com
Vijay Gupta
Apple Computer, Inc.
vgupta@apple.com
Abstract
There are a number of RTL coverage tools on the
market today that essentially tell you only that a set of
signals has been toggled by a particular diagnostic test.
This is useful in showing what areas of the RTL design
are definitely not covered by the diagnostic test but tells
you very little about the set of signals that have been
toggled. In an extreme case a diagnostic test may fail to
fail when any one of these signals is in error. The
following is a strategy for examining the coverage
indicated by a commercial coverage testing software
package and obtaining a truer picture of a diagnostic
test's real coverage. This concept is extended to a full
regression test suite.
Keywords
Coverage, design verification, VLSI, stuck faults, fault
simulation, toggle coverage, true coverage, diagnostic
strategy.
1. Introduction
It is usual for an ASIC or VLSI verification group to
run a verification regression test suite through an RTL
simulation with a coverage tool before tape out to see if
their test suite has missed checking some of the RTL
design. This has proved very useful in finding major gaps
in coverage, but no one is under the illusion that there are
no other gaps in areas that the coverage tools indicate as covered.
This is because the coverage tools only check to see if
signals have been toggled in the course of running a
diagnostic; they have no way of knowing
whether a diagnostic would pass independent of the state
of a given signal. A way to test this would be to force
each "covered" signal to true, one by one, then to false, one by
one, rerunning the test for each case. This would, if
done blindly, consume enormous computational
resources. What follows is a strategy to minimize the
necessary resources, making this a practical alternative.
We will term a signal one covered if the diagnostic
test fails when that signal is forced to true. Similarly a signal will
be called zero covered if that signal forced to false causes
the diagnostic test to fail. If a signal is both one and zero
covered it will be termed one/zero covered.
1.1. Historic perspective
The idea of forcing a given signal true or false to see if
failures occur in a given test is not new and is routinely
done in hardware  emulators (e.g. Quickturn) and
simulation accelerators (e.g. Ikos).  It is generally  called
stuck fault testing or fault simulation and can be used for
a variety of purposes from producing stuck fault
syndromes to measuring the value of a diagnostic test. It
was not thought to be practical in normal  simulations
since the run time would be:
Tn = 2*N*Tdn where
Tn = the total time to explore the coverage of diagnostic
test n.
N = the number of signals in the RTL to be tested for
one/zero coverage
Tdn = the time to run diagnostic test n
To get an idea of the numbers involved suppose that
test n takes one minute to run and there are 5000 signals
in the RTL to be covered. Tn would then be 10000
minutes or over 160 hours. A typical regression suite for
a complex VLSI chip generally consists of from 500 to
1000 tests and some tests may run for several hours.
Clearly the numbers become intractable.
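As a quick check of the arithmetic above, the brute force estimate can be reproduced in a few lines of Python. This is only an illustrative sketch; the function name and its parameters are our own and are not part of any tool described in this paper.

```python
def brute_force_minutes(num_signals: int, test_minutes: float) -> float:
    """Tn = 2 * N * Tdn: every signal is forced once to true and once to false."""
    return 2 * num_signals * test_minutes


# Figures from the text: 5000 signals, a one-minute test.
total = brute_force_minutes(num_signals=5000, test_minutes=1.0)
print(f"{total:.0f} minutes = {total / 60:.1f} hours")  # 10000 minutes = 166.7 hours
```

Multiplying this by the 500 to 1000 tests of a typical regression suite shows why the blind approach is not practical.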
1.2. Exploiting hierarchy
Efforts have been made to exploit the hierarchy in a
design by using a bottom up approach that concentrates
on the lowest levels of the design hierarchy first and builds
toward the top [1]. Signals that are one/zero covered at a
lower level are then eliminated from consideration as the
next layer is considered. This is certainly a valid approach
and the results achieved were several times better than
brute force. For reasons that should become clear, we feel
that the design hierarchy can be considered irrelevant and
that results orders of magnitude better can be achieved. It is
also the case that to exploit the design hierarchy, tests
need to be written that are customized to that approach.
Taking advantage of a suite of previously written tests is
not part of that strategy.
2. Methodology
The approach used is not bottom up but rather more of
a top down approach. It involves using already existing
commercial coverage tools as a guide to which signals to
test for one, zero and one/zero coverage. The next step is
to automatically divide tests into groups that have the
fewest signals in common. A group is then
selected from this collection of groups based on having
the most distinctive signals and a diagnostic test is
selected from this group on the basis of a figure of merit
formula. This is done iteratively until no further signals
can be eliminated as one and zero covered.
0-7695-1881-8/03 $17.00 © 2003 IEEE
Special tests can be constructed (or selected) for their
short run times and maximum coverage. These can be
used first to one and zero cover as many signals as
possible and therefore eliminate them from consideration
in longer tests, which may be necessary to probe and test
signals that require a complex state. The Perl script
implemented to perform this was called tcov. It was
also able to use a pool of machines (using another
commercial tool called LSF).
2.1. Using a coverage tool first
If we use a coverage tool first then we can reduce these
numbers significantly as the following formula indicates:
Tn = 2*Nn*Tdn where
Tn = the total time to explore the coverage of diagnostic
test n.
Nn= the number of signals in the RTL indicated as
covered by the coverage tool for diagnostic test n
Tdn = the time to run diagnostic test n
In this case the coverage tool¹ tells us that there is no
need to run other than the Nn signals through the one/zero
coverage testing. If, say, a particular one-minute test n covers 500
signals then the time is reduced to 1000 minutes or about
17 hours. This is a bit more tractable but still far too
long.
The coverage tool indicates a signal is covered if it
cycles through at least one complete state change, such as
1->0 then 0->1 or 0->1 then 1->0, within the duration of the
diagnostic test run. 1->Z and Z->0 are considered to be
1->0 transitions. Likewise 0->Z and Z->1 are considered to
be 0->1 transitions.
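The toggle rule just described can be illustrated with a small Python sketch. It is a literal reading of the rule, not the algorithm of any commercial coverage tool: a signal counts as covered when its sampled values contain at least one falling and one rising transition, with transitions through Z classified as stated above. The string representation of a trace and the function name are assumptions made for the example.

```python
# Transition classes, following the rule in the text (assumed encoding: "0", "1", "Z").
RISE = {("0", "1"), ("0", "Z"), ("Z", "1")}
FALL = {("1", "0"), ("1", "Z"), ("Z", "0")}


def is_toggle_covered(trace: str) -> bool:
    """True if the sampled values contain both a rising and a falling transition."""
    pairs = set(zip(trace, trace[1:]))  # consecutive (previous, next) value pairs
    return bool(pairs & RISE) and bool(pairs & FALL)


print(is_toggle_covered("0110"))  # True: 0->1 then 1->0
print(is_toggle_covered("0011"))  # False: only a rising transition
```

Note that nothing in this check says whether the diagnostic test would notice a wrong value on the signal, which is exactly the gap that one/zero coverage is meant to close.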
The coverage results are saved in a common directory
for ease of subsequent processing.
2.2. Using multi-machine parallelism
The first obvious way to shorten calendar time is to
use a pool of machines. Using a machine farm
management tool (e.g. LSF) we can first run the coverage
tool on all of our diagnostic tests and compile covered
signal lists for each. Next we take a short test with fairly
broad coverage (as indicated by the coverage list) and
compile the test with a parametrically driven forcing
function that can force any listed signal to either true or
false. We now have 2*Nn test runs we can distribute to the
machine pool. If there are 100 machines in the pool and
all have the appropriate simulator runtime licenses then
the elapsed time is just 10 to 15 minutes for this one test.
This hardly solves the overall problem since longer tests
will still take an inordinate amount of computation.

¹ The coverage tool used was CoverMeter by Synopsys but almost any
coverage tool will do if one can gain access to the coverage database.
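The job list and the elapsed time estimate for a machine pool can be sketched as follows. This is a minimal illustration under the assumption that the runs are spread evenly over the pool and that scheduling overhead is ignored; the signal names and function names are hypothetical, and no farm management tool is invoked.

```python
import math
from itertools import product


def forced_runs(covered_signals):
    """One run per (signal, forced value) pair: 2 * Nn runs in total."""
    return [(sig, val) for sig, val in product(covered_signals, (1, 0))]


def elapsed_minutes(num_runs, test_minutes, machines):
    """Wall-clock estimate when runs are spread evenly over the pool."""
    return math.ceil(num_runs / machines) * test_minutes


signals = [f"sig{i}" for i in range(500)]  # hypothetical signal names
runs = forced_runs(signals)
print(len(runs), elapsed_minutes(len(runs), 1.0, 100))  # 1000 10.0
```

With 500 covered signals, a one-minute test and 100 machines, the estimate is 10 minutes, consistent with the 10 to 15 minutes quoted above once overhead is allowed for.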
2.3. Binning tests
Dividing tests up into bins that are serviced in a round
robin fashion is a way to make sure that each subsequent
test run picks up a maximum number of new signals.
This is achieved with a script, generate_diag_bin, that
selects tests for grouping based on the following
parameters:
Selection_lower_bound SLB (5%)
Selection_upper_bound SUB (10%)
Inclusion_lower_bound ILB (18%)
Inclusion_upper_bound IUB (60%)
A signal is first examined to see how many diagnostic tests it is
covered by. If it is covered by at least SLB percent of the tests but
not more than SUB percent then it is considered a grouping
signal. If the tests that cover this signal have at least an
IUB percent overlap with an existing bin then that signal
and those tests are now represented by that bin. If these
criteria are met by more than one bin then these tests are
merged into all bins that meet the criteria. If the overlap
is less than ILB percent then a new bin is created.
Everything that doesn't meet these criteria goes into a
catchall bin we call other. The process of running the
script that creates these bins takes only a few minutes, so
these parameters can be manipulated to get the desired
results. The percentages shown are those we used to get
eight bins. We felt this was optimum for distributing our
suite of diagnostic tests with a minimum of overlap.
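The binning rules above can be sketched in Python as follows. This is not the generate_diag_bin script itself but a simplified illustration under stated assumptions: the overlap with a bin is taken to be the fraction of the tests covering the grouping signal that are already in that bin, signals are visited in sorted order, and tests that never land in a bin form the other bin. The function name and data layout are our own.

```python
def bin_tests(coverage, slb=0.05, sub=0.10, ilb=0.18, iub=0.60):
    """coverage maps test name -> set of signals reported as covered for it."""
    total = len(coverage)
    tests_by_signal = {}
    for test, signals in coverage.items():
        for sig in signals:
            tests_by_signal.setdefault(sig, set()).add(test)

    bins = []  # each bin is a set of test names
    for sig in sorted(tests_by_signal):
        tests = tests_by_signal[sig]
        if not slb <= len(tests) / total <= sub:
            continue  # not a grouping signal
        # Assumed overlap measure: share of these tests already in the bin.
        overlaps = [len(tests & b) / len(tests) for b in bins]
        matches = [b for b, o in zip(bins, overlaps) if o >= iub]
        if matches:
            for b in matches:  # merge into every bin that meets the criterion
                b |= tests
        elif all(o < ilb for o in overlaps):
            bins.append(set(tests))  # little overlap anywhere: start a new bin
    other = set(coverage) - set().union(*bins)
    return bins, other


# Toy data with loosened bounds so that five tests are enough to show the effect.
coverage = {"t1": {"a", "x"}, "t2": {"a", "b"}, "t3": {"c"},
            "t4": {"c", "y"}, "t5": {"z"}}
bins, other = bin_tests(coverage, slb=0.3, sub=0.5)
print(bins, other)
```

In the toy data, signals a and c each act as grouping signals and produce one bin apiece, while t5 shares no grouping signal and falls into other.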
2.4. Regression reordering
Now that we have a set of bins containing tests, they
can be serviced in a round robin manner, but in order to
know which test in a bin to run next we need to compute
a figure of merit. This figure of merit needs to be based
on the number of signals not yet one covered or zero
covered at this point and the run time for each test. The
run time can vary since other jobs can be running
simultaneously on a given Unix system so we have used
the number of simulation clocks the test takes to
complete instead. The figure of merit M for test n is then:
Mn = (Nnz+Nno)/Cn where
Mn = the  figure of merit (potential signal  coverage per
clock)
Nnz = the number of signals indicated by coverage and
not zero covered at this point.
Nno = the number of signals indicated by coverage and
not one covered at this point.
Cn = the number of simulation clocks to complete (pass)
test n.
After each test is run through the one/zero coverage
testing the additional one and zero covered signals are
added to the coverage list. The selected test is deleted
from all bins. The figures of merit for the remaining tests
in the next bin are recomputed to select the next test to be
run. An empty bin is skipped. This is repeated until
either the regression test list is exhausted or there are no
more signals indicated as covered that have not been
one/zero covered (i.e. all remaining Nnz and Nno = 0).
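The figure of merit and the selection of the next test from a bin can be sketched as follows. This is an illustration of the formula Mn = (Nnz+Nno)/Cn only; the function names, the dictionary layout and the toy numbers are assumptions and do not come from tcov.

```python
def figure_of_merit(covered, zero_done, one_done, clocks):
    """Mn = (Nnz + Nno) / Cn for a single test."""
    nnz = len(covered - zero_done)  # indicated by coverage, not yet zero covered
    nno = len(covered - one_done)   # indicated by coverage, not yet one covered
    return (nnz + nno) / clocks


def pick_next(bin_tests, coverage, clocks, zero_done, one_done):
    """Return the test in the bin with the highest nonzero merit, else None."""
    scored = {t: figure_of_merit(coverage[t], zero_done, one_done, clocks[t])
              for t in bin_tests}
    best = max(scored, key=scored.get, default=None)
    return best if best is not None and scored[best] > 0 else None


coverage = {"short": {"a", "b"}, "long": {"a", "b", "c", "d"}}
clocks = {"short": 100, "long": 1000}
print(pick_next(["short", "long"], coverage, clocks, set(), set()))  # short
```

The short test wins here because it offers 4 potential results in 100 clocks against 8 in 1000 clocks for the long test. Returning None when every merit is zero corresponds to the stopping condition described above.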
If we find a signal is one/zero covered by a prior test
then it can be eliminated from further testing since we
have it "covered". This allows us to totally ignore
overlapping coverage as we progress through a series of
tests.
An additional saving can be accomplished by not
repeating the same testing of signals that are already either one
covered or zero covered. If one test
finds a signal one covered and another test finds the
signal zero covered then it should be sufficient to declare the signal
one/zero covered. The logic here is that these tests, if
combined, would have indicated the signal to be one/zero
covered.
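The bookkeeping described in the last two paragraphs can be sketched as follows: one covered and zero covered signals are accumulated across tests, and each later test is only asked for the forced runs that could still add information. The function names and the (signal, forced value) representation are assumptions made for illustration.

```python
def record_results(failures, one_done, zero_done):
    """failures: (signal, forced_value) pairs for which the test failed."""
    for sig, val in failures:
        (one_done if val == 1 else zero_done).add(sig)
    return one_done & zero_done  # signals that are now one/zero covered


def remaining_runs(covered, one_done, zero_done):
    """Forced runs still worth doing for a test with the given covered set."""
    runs = [(s, 1) for s in sorted(covered - one_done)]
    runs += [(s, 0) for s in sorted(covered - zero_done)]
    return runs


one_done, zero_done = set(), set()
record_results([("a", 1), ("b", 0)], one_done, zero_done)  # results of a first test
both = record_results([("a", 0)], one_done, zero_done)     # results of a second test
print(both)                                                # {'a'}
print(remaining_runs({"a", "b", "c"}, one_done, zero_done))
```

Signal a becomes one/zero covered even though no single test covered it both ways, and the third test only needs to force b to one and c to both values.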
2.5. Using specially structured tests
One can add tests to the mix that are specifically
structured to cover a large amount of territory (i.e. have
large coverage lists) but do so in a minimum number of
cycles. Such tests should run very fast and hopefully that
coverage will hold up during the one/zero coverage
testing. In one particular design we created a test with
85% coverage as measured by standard coverage testing
and the test runs for about 7 minutes. In our case
Nn=3173 out of a total of 3721 signals (85%) giving us a
total run time over a 100 machine farm of about 8.5
hours. This led to our first surprise: only 1183 signals
(32% of the total) were one/zero covered! This indicates
that conventional coverage results can be highly
optimistic and very misleading.
It should be a goal to build a few tests that will test
(and therefore eliminate from future testing) signals that
the coverage tools indicate are in nearly every test.
Examples of these are signals triggered by reset, busses,
unit activity indicators and performance counters.
With a little care a series of very short tests can be designed
that cover the vast majority of
the RTL. If these are added to the regression suite they
should have a high figure of merit and get run early. The
remaining longer tests, which will likely have a low
figure of merit and get run later, may only need to be run
a few times since they will be toggling very few signals
that are disjoint from the set of prior one/zero covered
signals.
2.6. Common signals
There will be a number of signals that get toggled
during coverage that most tests in question are not
concerned with. This is especially true of bus signals that
go to a unit that most tests don't address, performance
counters, etc. If these are not taken care of initially in
specially structured tests then it generally is a good idea
to eliminate them from the figure of merit calculation.
This can be accomplished by eliminating signals that are
common to most tests from the figure of merit calculation
(but not from the actual testing). If this is not done the
figure of merit can be substantially biased toward short
tests that test very little.
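Excluding common signals from the figure of merit can be sketched as follows. The 80% threshold for "common to most tests" is an assumed value chosen for the example, since no figure is given here, and the function names are hypothetical.

```python
def common_signals(coverage, threshold=0.8):
    """Signals reported for more than `threshold` of all tests (assumed cutoff)."""
    counts = {}
    for signals in coverage.values():
        for sig in signals:
            counts[sig] = counts.get(sig, 0) + 1
    return {s for s, c in counts.items() if c / len(coverage) > threshold}


def merit_signals(covered, common):
    """Signals counted in the figure of merit; the full set is still tested."""
    return covered - common


coverage = {"t1": {"reset", "a"}, "t2": {"reset", "b"}, "t3": {"reset", "c"}}
common = common_signals(coverage)
print(common)                                 # {'reset'}
print(merit_signals(coverage["t1"], common))  # {'a'}
```

Only the merit calculation uses the reduced set. The forcing runs themselves still include the common signals, as stated above.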
2.7. Rebinning
After each test is run, that test and many signals have
been eliminated, so it makes sense to re-compute the bins
with this reduced set of signals and tests. It was decided
to make this automatic and part of the main coverage
regression script tcov. The method used to attempt to get
a spread of about 8 bins was simulated annealing. That is,
each parameter (SLB, SUB, ILB and IUB) was
incremented and decremented in the script
generate_diag_bin, each time keeping the results that were
closest to 8 bins. This process is repeated until either 8
bins are achieved or a pass through the parameters makes
no improvement.
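The parameter adjustment loop can be sketched as follows. The sketch follows the literal description (step each parameter up and down, keep whatever gets closest to the target, stop at the target or after a pass with no improvement), which amounts to a simple coordinate-wise search rather than a full simulated annealing schedule. The step size, the function name and the stand-in count_bins function are assumptions; in practice count_bins would rerun the binning with the trial parameters.

```python
def tune_parameters(count_bins, params, step=0.01, target=8):
    """count_bins: caller-supplied function mapping a parameter dict to a bin count."""
    names = list(params)
    best = dict(params)
    best_err = abs(count_bins(best) - target)
    improved = True
    while best_err > 0 and improved:
        improved = False
        for name in names:
            for delta in (step, -step):
                trial = dict(best)
                trial[name] = round(trial[name] + delta, 4)
                err = abs(count_bins(trial) - target)
                if err < best_err:  # keep the result closest to the target
                    best, best_err, improved = trial, err, True
    return best


# Stand-in for the real binning run, purely for demonstration.
def toy_count_bins(p):
    return round(100 * p["SLB"]) + 3


tuned = tune_parameters(toy_count_bins, {"SLB": 0.03, "SUB": 0.10})
print(tuned, toy_count_bins(tuned))
```

With the stand-in function the search raises SLB in two passes until eight bins are reported and leaves SUB unchanged, since changing it never brings the count closer to the target.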
After another test is selected from the bin with the
largest set of disjoint signals and put through one/zero
testing, the rebinning process is repeated, and so on until
there are no more signals or no more tests.
2.8. Some results
Table 1 shows the beginning of a run using the
described techniques. Note that the specially constructed
diagnostic test Coverall got picked early on in the run,
indicating that our selection criteria were working
reasonably well in this instance. It is noteworthy that we
were able to attain nearly 60% one/zero coverage with
only 10 automatically selected tests in just over 25 hours.
Using one or more specially constructed tests can give
one a significant head start in obtaining 100% coverage as
illustrated above and in Table 1. As previously stated, the
objective is to remove as many signals from consideration
as possible so that we might be able to run a full
regression through this procedure. In particular, signals
with high commonality among tests should be targeted
early on since eliminating them will ultimately save the
most time.
2.9. Regression grouping
In a short while this strategy will fail to keep all the
machines in the pool busy since the number of new
signals left to one/zero cover will be less than the pool
size. This need not be the case if two or more tests in the
still to be run group have mutually disjoint sets of
signals left to cover. One can automatically pick a group
of tests with reasonably disjoint sets of signals and the
highest figures of merit to run simultaneously. The worst
case is that one may redundantly one and/or zero cover a
few signals but that isn't really a problem.

Table 1. Cumulative coverage
SIGNAL COVERAGE – TOTAL NETS=3721

Test Name      CoverMeter    1 Covered/   one/zero  Sum 1/0        Run Time
               Add. covered  0 Covered    covered   covered        100 CPUs
cr_mult_write  679           421/473      373       373/10.02%     0.6 hrs
Coverall       2829          1017/1038    872       1245/33.46%    5.0 hrs
vlut_rand      1998          194/221      130       1375/36.95%    0.5 hrs
vvld_ENF       141           18/49        3         1378/37.03%    0.7 hrs
vvld_ADBS      1491          32/71        18        1396/37.52%    1.7 hrs
vart_coherbit  137           50/81        36        1432/38.48%    0.2 hrs
random_ist     1867          97/117       37        1469/39.48%    8.2 hrs
vvld_addr2ex   1631          651/696      631       2100/56.34%    14.4 hrs
macroblks5     1006          96/131       64        2164/58.16%    2.2 hrs
venum          86            9/35         1         2165/58.18%    0.1 hrs
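The cumulative column of Table 1 can be re-derived from the per-test one/zero covered counts, which makes the relationship between the columns explicit. The short Python check below uses only the numbers in the table; the variable names are our own.

```python
TOTAL_NETS = 3721
# "one/zero covered" column of Table 1, in run order.
one_zero_per_test = [373, 872, 130, 3, 18, 36, 37, 631, 64, 1]

running = 0
for count in one_zero_per_test:
    running += count
    print(f"{running:5d}  {100 * running / TOTAL_NETS:.2f}%")
# The last line printed is: 2165  58.18%
```

Each printed pair matches the corresponding "Sum 1/0 covered" entry, so that column is simply the running total of newly one/zero covered signals expressed as a count and as a percentage of all nets.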
It is very likely that all of the signals that can be
one/zero covered by a given regression suite will be
covered before completely exhausting the regression suite.
This doesn't necessarily mean that the remaining tests
can be thrown away. It is possible that one might want to
keep a test because it is a specific sub-unit test that may
be run independently as part of a partial regression suite
whenever a change is made to that sub-unit. Such a test
may also be checking for signal interactions or timing
that one/zero coverage doesn't address. In any case, care
should be taken before eliminating tests from a regression
suite.
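The selection of a group of tests to run simultaneously, described at the start of this section, can be sketched as a greedy choice. This is a simplified illustration: it insists on strictly disjoint sets of remaining signals, whereas "reasonably disjoint" would tolerate some overlap, and it stops once the group offers enough forced runs to fill the pool. The function name and data layout are assumptions.

```python
def pick_group(candidates, remaining, merit, pool_size):
    """Greedily pick high-merit tests whose remaining signals do not overlap.

    remaining maps test -> signals still to be one/zero covered;
    merit maps test -> figure of merit.
    """
    group, claimed, runs = [], set(), 0
    for test in sorted(candidates, key=lambda t: merit[t], reverse=True):
        todo = remaining[test]
        if not todo or todo & claimed:
            continue  # nothing left to gain, or overlaps a test already chosen
        group.append(test)
        claimed |= todo
        runs += 2 * len(todo)  # upper bound: forced true and forced false per signal
        if runs >= pool_size:
            break  # enough work to keep the pool busy
    return group


remaining = {"t1": {"a", "b"}, "t2": {"b", "c"}, "t3": {"d"}}
merit = {"t1": 3.0, "t2": 2.0, "t3": 1.0}
print(pick_group(remaining, remaining, merit, pool_size=100))  # ['t1', 't3']
```

Test t2 is skipped because it shares signal b with t1. Relaxing the strict disjointness test would only cost a few redundant runs, which is the worst case noted above.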
3. Caveats
A problem arose testing vector registers. We are only
able to force an entire vector into one state or another
(under Verilog IEEE 1364-1995, Section 9.3.2, one cannot force
a particular bit of a vector register). We haven't tried to
refine this as of this date. This is a considerable lapse for
data path logic.
Interactions between signals are only minimally
checked through one/zero coverage testing. Neither timing
nor protocol violations are directly addressed by one/zero
coverage. In these areas we rely on the skill of the
diagnostic programmer and a comprehensive test review
process.
We have also found that there are a few signals that
when forced to one or zero can cause either a false pass or
a timeout. An example of a false pass is a signal that
indicates that no more instructions are left in the queue to
execute. A timeout can be caused by a cache control
signal forced to indicate that whatever line is fetched is
never in the cache, causing an infinite fetch loop. There are
not likely to be many such signals so they can be
handled on a case by case basis.
4. Conclusion
The task of finding the one/zero coverage of a full
regression suite of tests, if done in a brute force manner,
could easily take months if not years of calendar time,
making it impractical. Using the simple methodology
above an entire regression suite can be one/zero coverage
tested in just a few days, making it practical for VLSI
development. In our case a regression suite of tests had
already been created but this need not be the case. Tcov
can be used as an optimization tool to get the most out of
a new test design just as we did with test "Coverall"
before adding it to the regression suite.
This methodology, embodied in tcov, proved to be a
very useful tool not only for getting a better handle on
our actual coverage but also for checking whether tests were
really working as specified. We have been surprised by
the lack of real coverage indicated by tcov for a given
diagnostic test versus the coverage indicated by
commercial coverage tools. This may be a property of the
design under consideration but it should serve as a
warning that standard coverage tools are far from adequate
in painting a real picture of test coverage.
As with any coverage tool, tcov doesn't check that
which isn't present. One still must rely on monitors and
the ability of the diagnostic writer to ferret out missing
functionality. In this respect it is also prudent to have an
independent verification engineer and not use a logic
designer to verify his own design. If the logic designer
misunderstands the design specification then tests could
end up being written with that same misunderstanding.
5. Reference
[1] Bernhard H. Seiss and Hannes C. Wittmann, "Highly Efficient Fault
Simulation Exploiting Hierarchy in Circuit Description", Asian Test
Symposium, 1992.