High quality functional coverage based trace signal selection for post-silicon validation by Ma, Sai
HIGH QUALITY FUNCTIONAL COVERAGE BASED TRACE SIGNAL
SELECTION FOR POST-SILICON VALIDATION
BY
SAI MA
THESIS
Submitted in partial fulfillment of the requirements
for the degree of Master of Science in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2014
Urbana, Illinois
Adviser:
Assistant Professor Shobha Vasudevan
ABSTRACT
Due to the drastic growth of design complexity and the shrinkage of the
time-to-market window, it is practically impossible to capture all bugs during
the pre-silicon verification phase; therefore, a small number of bugs escape
into manufactured chips. Once a chip is manufactured, it is considered
as a black box since internal observability is lost. To increase observability
for post-silicon validation, an effective silicon debug technique is to use an
on-chip trace buffer to monitor and capture the circuit response of certain
selected signals during its post-silicon operation. Since this debugging
trace buffer introduces area overhead, the number of signals that can be
stored in it is very limited. Therefore, a major challenge in this field is
to select a powerful subset of internal signals from which the majority of
the remaining signal values can be reconstructed. Existing methods use a
greedy selection process to converge to a locally optimal selection; this kind
of method suffers from severe diminishing restoration ratio effect as more
trace signals are selected. In addition, all of the previous publications have
been focused on increasing restorability; none of them has ever been able to
interpret the trace signals as high level meaningful debugging information.
In this thesis, we formulate the trace signal selection problem as a
data-mining problem and propose two approaches based on the PageRank and HITS
algorithms. Our experimental results demonstrate that our algorithms can
effectively alleviate the diminishing restoration ratio effect. Furthermore,
instead of restorability, we propose a new metric to evaluate the quality of
selected trace signals: the number of functionalities they cover during
debugging, an angle previous publications have not addressed. According
to our experimental results, the two new algorithms proposed in this thesis
select better and more meaningful trace signals in terms of debugging.
To my family and friends, for their love and support
ACKNOWLEDGMENTS
I am using this opportunity to express my gratitude to everyone who sup-
ported me throughout the course of my master’s study. I am thankful for
their inspiring guidance and friendly advice during the project. I am sin-
cerely grateful to them for sharing their truthful and illuminating views on
a number of issues related to my study, my research and my thesis.
Foremost, I would like to express my sincere gratitude to my advisor, Pro-
fessor Shobha Vasudevan, for the continuous support of my graduate study
and research. She trusted me from the very beginning of my graduate study
and never stopped trusting me until the day I graduated. She established
my confidence by sending me to the BaseBand ESL R&D department of
FutureWei Technologies to implement an intelligent debugging and anal-
ysis system. She demonstrated extreme professionalism and perseverance
through the process of publishing the paper “Code Coverage of Assertions
Using RTL Source Code Analysis,” which won the best paper award of the
2014 Design Automation Conference. She has guided me to overcome numer-
ous difficulties encountered during the research and taught me a lot about
both academia and life.
I would like to particularly express my appreciation to Debjit Pal for count-
less inspiring discussions and patient tutoring of EDA tools and hardware
verification concepts, Rui Jiang for implementing the evaluation platform for
my thesis and constantly helping me with debugging and project evolving,
Seyed Nematollah Ahmadyan for invaluable constructive criticism and sug-
gestions of the direction of my project, Matthew Curtis Amrein for sharing
experience and research ideas about post-silicon validation, and Tian Xia for
discussion and manual evaluation in the early stage of my thesis.
I would also like to express my heartfelt thankfulness to my wonderful col-
leagues, mentors and friends, who supported me throughout ups and downs
and shared with me their wisdom and knowledge.
TABLE OF CONTENTS
CHAPTER 1 INTRODUCTION
1.1 Pre-silicon Verification vs. Post-silicon Validation
1.2 Introduction to Post-silicon Validation
1.3 Scan Chain and Trace Buffer
1.4 Motivation
1.5 Outline
CHAPTER 2 PRELIMINARIES
2.1 Signal Restoration
2.2 Probability Based Trace Signal Selection Algorithm Using Restorability
2.3 Problems of Probability Based Trace Signal Greedy Selection Algorithm
CHAPTER 3 TRACE SIGNAL SELECTION USING HITS ALGORITHM WITH RESPECT TO RESTORABILITY
3.1 Network Construction
3.2 Restoration Probability Computation
3.3 Authority and Hub Score Calculation
3.4 Post-Processing of Analysis of Authority and Hub Score
CHAPTER 4 PAGERANK ALGORITHM WITHOUT RESTORATION PROBABILITY
4.1 Restoration Probability vs. Functionality
4.2 Functionality Coverage Definition
4.3 PageRank Algorithm
CHAPTER 5 EXPERIMENTAL RESULTS
5.1 Signal Restoration Ratio Comparison
5.2 Functionality Coverage Comparison
CHAPTER 6 SUMMARY AND FUTURE WORK
6.1 Conclusion
6.2 Future Work
REFERENCES
CHAPTER 1
INTRODUCTION
1.1 Pre-silicon Verification vs. Post-silicon Validation
Over the past four decades, microprocessors have permeated our world, ush-
ering in the digital age and enabling numerous technologies, without which
today’s lifestyle would be all but impossible. Nowadays, the everlasting
progress of processors brings extremely complex architectures along with
inevitable challenges in verifying their functionality. According to recent re-
search, the verification effort takes more than 70% of the design cycle and has
become the single most challenging bottleneck of current industry [1]. The
current hardware verification process can be divided into two major phases:
pre-silicon verification and post-silicon validation. Pre-silicon techniques are
deployed in the early stage of a processor’s design. During the pre-silicon
phase, simulation and formal verification are applied to capture functional
errors. At this stage, verification engineers have full observability
of the design under verification (DUV), but the number of cycles for which
the DUV can be simulated is several orders of magnitude smaller than on actual
hardware. As a result, untested cases and corner cases let bugs slip into the
post-silicon phase, in which a manufactured chip only has standard inputs
and outputs. Post-silicon validation involves operating one or more manufac-
tured chips in actual application environments to validate correct behaviors
over specified operating conditions. According to several industry reports,
post-silicon validation is becoming significantly difficult and prohibitively ex-
pensive because existing techniques cannot cope with the sheer complexity of
future systems [2, 3, 4]. At this stage, verification engineers have no internal
observability of the DUV, but can run the DUV at actual hardware operation
speed. The key advantage of post-silicon validation is therefore that it runs
significantly faster than pre-silicon software-based simulation, so it can
provide stronger assurance of correctness and help ensure that bugs do not
escape to the field. However, post-silicon validation suffers
from low observability since it is impossible to monitor internal signals once
a prototype is implemented. Therefore, errors can only be detected when
they generate invalid results, or cause the system to hang. As a result, the
exposure and diagnosis of bugs are difficult.
Post-silicon validation has significant overlap with pre-silicon design verifi-
cation and manufacturing (or production) testing. Traditionally, most hard-
ware design bugs are detected during pre-silicon verification, and manufactur-
ing defects are targeted by manufacturing testing. While both manufacturing
testing and pre-silicon verification continue to be essential, post-silicon val-
idation is becoming extremely important because of several unique aspects
[5, 6, 7].
1. Pre-silicon verification alone cannot be relied upon to capture all bugs
since simulation is several orders of magnitude slower than actual hardware.
Popular pre-silicon verification techniques such as constrained random
simulation and formal verification all suffer from limited scalability;
therefore, they are infeasible for full-chip verification.
2. As circuits become more complex and harder to model correctly,
electrical bugs, such as signal integrity problems (cross-talk, power supply
noise) and thermal effects, can only be detected once a chip is manufactured;
therefore, it is impossible to test for them before post-silicon validation.
3. Unlike manufacturing defects, post-silicon bugs may be caused by sub-
tle interactions between a design and physical effects (the so-called
electrical bugs) or by design errors (the so-called logic bugs). It may
be very difficult to create accurate and effective fault models for such
bugs.
4. A primary reason behind the success of manufacturing testing tech-
niques is the existence of test metrics such as single-stuck-at cover-
age, transition fault coverage and N-detect coverage, and experimental
demonstration of the effectiveness of such metrics using actual chips.
Such metrics enable automatic test pattern generation and fault sim-
ulation. For pre-silicon design verification, such metrics are far less
standardized. However, there exist new opportunities for establishing
coverage metrics for post-silicon validation.
5. Unlike manufacturing testing where the primary objective is to detect
defects, post-silicon validation involves localizing, root-causing and fix-
ing bugs. Most defect diagnosis techniques rely on scan design for
testability (DFT), which enables a sequential circuit to be treated as
a combinational circuit in test mode. Such opportunities may not be
available for bug localization during post-silicon validation.
1.2 Introduction to Post-silicon Validation
Post-silicon validation comprises four major steps [5]; it can be summarized
by the flow illustrated in Fig 1.1:
1. Detecting a problem by running test programs ranging from random
instruction sequences to end-user applications.
2. Localizing the problem to a small area from the system failure.
3. Identifying the root cause of the problem.
4. Fixing or bypassing the problem by patching, circuit editing or as a
last resort, re-spinning using a new mask.
Numerous challenges exist in every aspect of post-silicon validation. In-
stead of a comprehensive survey of all challenges, we highlight a few most
important ones as follows:
1. Test pattern generation: In order to check a circuit for functional cor-
rectness, it is necessary to verify the internal signal states of the circuit
with some golden reference. While tests can be run at-speed on the
hardware, test generation and simulation constitute the bottleneck in
this process, limiting it to the performance level of pre-silicon sim-
ulation. Consequently, design houses are forced to spend enormous
computational resources on test generation and simulation servers [8].
2. Reliance on system-level simulation: System-level simulation is several
orders of magnitude slower than actual silicon (e.g., 1,000 cycles / second
in simulation vs. 1 billion cycles / second for a 1GHz chip) [6].
In order to obtain a golden reference, system-level simulation is
required to compute the correct signal values of every cycle for the entire
design. This simulation is very slow; therefore, a functional bug typically
takes hours to days to localize, whereas an electrical bug requires
days to weeks.
Figure 1.1: General flow of post-silicon validation
3. Failure reproduction: When a bug is detected, we need to restore the
circuit to a bug-free state and then re-simulate the circuit with the
error-causing stimuli. Unfortunately, many bugs, such as electrical bugs
[9, 10] and bugs in complex SoCs with multiple clock domains and
asynchronous I/Os, are very hard to reproduce [11, 12].
4. Coverage metrics: Code coverage analysis is already a very hard do-
main of pre-silicon verification. Quantifying coverage of post-silicon
validation tests is very challenging due to limited controllability and
observability; hence, it is harder to mutate a design or monitor an
assertion failure [13, 14].
5. Observability enhancement: A major challenge in post-silicon valida-
tion is the limited observability of internal states caused by the limited
storage capacity available for post-silicon validation. To gather as much
data as possible from the DUV in order to understand the nature of an error,
design-for-debug (DFD) hardware is commonly inserted into the design. Scan
chains and trace buffers are the two most commonly used DFD techniques.
Because the hardware and software interactions between different blocks in
an SoC are difficult to verify during pre-silicon verification, it is
expected that more DFD hardware will be deployed in the future. Making
suitable decisions for DFD insertion, so as to shorten the implementation
cycle while avoiding excessive overhead from the DFD hardware, is becoming
a significant challenge [8][15-35].
6. Error detection: Long error detection latency, the time elapsed be-
tween the occurrence of an error due to a bug and its manifestation as
an observable failure, limits the effectiveness of existing bug localiza-
tion techniques. Simulation is orders of magnitude slower than actual
silicon [36]; formal analysis over more than hundreds of cycles can be
difficult [37]; and tracing is limited by the availability of on-chip stor-
age [2]. In addition, long error detection latencies may also result in
increased error masking, i.e., an error may not propagate to an observ-
able point [38, 39, 40]. Bugs in uncore components of SoCs, such as
cache controllers, memory controllers and on-chip networks, can result
in very long error detection latencies of several millions to billions of
clock cycles unless special attention is paid to shorten these long error
detection latencies [41].
7. Bug localization: The process of identifying the location of a detected
hardware bug and the cycle(s) during which the bug produces error(s),
is a major bottleneck for complex integrated circuits. Among the four
major steps we mentioned in this chapter, bug localization dominates
the cost [10]. Many post-silicon bug localization techniques can be
used, such as IFRA [42, 43, 44] and BLoG [45], simulation-based debug
[46, 47], or debug techniques based on formal methods [48, 49, 50].
These techniques can directly benefit from the extremely short error
detection latencies and improved coverage of Quick Error Detection (QED).
With the increasing
complexity of uncore components in SoCs, new techniques for localizing
bugs inside uncore components are required.
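To put the simulation-speed gap quoted in item 2 above into perspective, the following back-of-the-envelope calculation converts one second of at-speed silicon execution into simulation time. It is only a sketch: the two rates are the example figures quoted above, and the function and constant names are illustrative, not part of any tool.

```python
# Example rates quoted in the text: 1,000 cycles/s in simulation
# versus 1 billion cycles/s for a 1 GHz chip.
SIM_CYCLES_PER_SEC = 1_000
SILICON_CYCLES_PER_SEC = 1_000_000_000


def simulation_seconds(silicon_seconds: float) -> float:
    """Wall-clock seconds a simulator needs to replay `silicon_seconds` of silicon time."""
    cycles = silicon_seconds * SILICON_CYCLES_PER_SEC
    return cycles / SIM_CYCLES_PER_SEC


# Ratio between the two rates (six orders of magnitude for these figures).
slowdown = SILICON_CYCLES_PER_SEC // SIM_CYCLES_PER_SEC

# One second of silicon execution, expressed in days of simulation.
days_per_silicon_second = simulation_seconds(1.0) / 86_400

print(f"slowdown: {slowdown:,}x")
print(f"1 s of silicon ~ {days_per_silicon_second:.1f} days of simulation")
```

Under these assumed rates, replaying even a single second of silicon activity takes more than a week of simulation, which is why obtaining a golden reference by simulation dominates the localization time.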
To tackle the aforementioned challenges, many excellent and inspiring
works have been proposed. They can be categorized into, but are not restricted
to, the following fields. In this thesis, we focus on observability
enhancement; specifically, we propose new methods to select high quality trace
buffer signals to help debugging in post-silicon validation.
1. Error detection
(a) Test suites generation
(b) Quick error detection
2. Observability enhancement
(a) Efficient scan/trace signal selection
(b) Trace signal compression
(c) Bridging pre-silicon verification and post-silicon validation
3. Bug localization
(a) Formal method for error localization
(b) Special on-chip recorder, collecting footprints of execution
(c) Root cause identification
4. Bug fixing
(a) Microcode patches or special design techniques
5. Error-resilient system design
1.3 Scan Chain and Trace Buffer
Once silicon has been fabricated, it is no longer possible to observe each
and every internal signal. To capture the bugs that escaped pre-silicon
verification, design-for-debug (DFD) structures are commonly inserted
into the DUV. Scan chains and trace buffers are the two main DFD
techniques [35].
Figure 1.2: Scan-based debug
The primary goal of the scan-based technique is to reuse the internal scan
chains, which are placed in the DUV to increase the controllability and ob-
servability during manufacturing test [51]. During manufacturing test, the
functional pins are used as scan pins for loading multiple scan chains con-
currently to reduce test time. However, during debug, these scan chains
are concatenated, as shown in Fig 1.2. In this case the scan chain is
loaded/unloaded through a serial interface, which is accessible in-system. By
capturing data in the internal state elements and offloading them through
the scan chains (called scan dump), failure analysis can be performed offline
to identify bugs in a design [52]. However, unless the debug experiment is
reproducible, which is not the case for most in-field failures, there is little
knowledge about what caused the failure at the observable outputs, and
stopping the debug experiment and re-running it will not guarantee
that the failure will occur again. One can let the circuit execution continue
by offloading the scan chains through shadow latches, but this may incur a
larger area penalty; besides, until a scan dump is completed it is not possible
to capture data in consecutive clock cycles. More importantly, data capture
is always performed in reaction to an event of interest, and hence it is
difficult to record the states that lead to that particular event, which may
be crucial during debugging [53, 54].
Figure 1.3: Example of an embedded logic analyzer
The previously outlined limitations of scan chains are addressed through
the use of embedded logic analyzers (ELAs) [55]. An example of an ELA is shown
in Fig 1.3; it is divided into four components: control unit, trigger unit,
sample unit, and offload unit. The control unit contains one or more finite
state machines (FSMs) with programmable registers. The programmable registers
can be configured through a serial interface such as JTAG for receiving control
instructions. This allows the FSMs to control the other units in the ELA to
gather different sets of data in multiple experiments. Embedded logic analysis
has enabled storing some of the signal states in an on-chip trace
buffer, which can later be used to reconstruct the unknown signals. Because
this trace buffer introduces an additional area penalty to the actual circuit,
its size is strictly limited. This limited capacity constrains both the number
of signals and the number of cycles that can be stored; hence, selecting a
powerful subset of internal signals becomes one of the most important topics
in post-silicon validation [14-34].
As mentioned before, a scan chain is more like a snapshot of the system at one
moment, while a trace buffer records more temporal information about the system
execution [56]. A scan chain can provide the values of many signals, but only
over a short time span, say, one cycle; in contrast, a trace buffer can only
store a small number of signals, but each of them over thousands of cycles.
Since the trace buffer is very small, the amount of data that can be collected
during a single post-silicon validation run is ultimately limited by the
capacity of on-chip trace buffers [57]. First, only a small set of signals over
a specified number of clock cycles is loaded into the trace buffer; more
signals over more cycles are then restored from this original set of signals
to be used for pinpointing and fixing bugs. Second, there is no systematic
way to select trace buffer signals in industry; engineers simply put signals of
interest onto the trace buffer based on experience. Since only about 2% of bugs
escape to post-silicon validation [1], manual selection of trace buffer signals
might not help, because these bugs might appear in unexpected parts of the
circuit. Furthermore, the more human effort involved, the lower the efficiency.
Post-silicon validation relies on the accumulated experience of verification
engineers; therefore, training new engineers takes a long time.
1.4 Motivation
Different techniques to select trace buffer signals have been proposed over
the years [14-34]. Both Ko et al. [15, 18] and Liu et al. [16, 17] have
proposed similar approaches to signal selection based on partial restoration,
in which the partial restorability of a signal refers to the probability that
the signal value can be reconstructed using known values of some other traced
signals. For each signal, the sum of the partial restorability of all the
signals in the circuit is computed. If the trace buffer width is n, the n
signals providing the highest sum of partial restorability are chosen for
tracing. Since partial restoration techniques are insufficient for signal
reconstruction, Basu et al. [19, 20, 24] later proposed a method using total
restoration, in which the group of selected signals can completely restore a
certain number of untraced signals; that is, it is a special case of partial
restorability with a restorability value of 100%. Their method was found to
provide a higher signal restoration ratio than any of the existing approaches
on the ISCAS 89 benchmarks. All aforementioned methods share a common
structure: a metric to estimate the restoration capacity of a certain set of
state elements, and a greedy selection algorithm to decide which ones to
trace based on the estimator metric.
Since the regional growth of selected signals has a clustering effect, and the
later selected signals depend on the initially selected signals, these methods
suffer from a diminishing restoration ratio effect, in which the restoration
ratio decreases dramatically as more trace signals are selected. A method
proposed by Chatterjee, McCarter and Bertacco [21] differs fundamentally from
these previous ones,
as it relies on simulation for estimation instead of a probabilistic metric.
This novel algorithm overcomes many key shortcomings of different heuris-
tics for estimating the state restoration capabilities of a group of signals
and provides up to 34% better state restoration. Most importantly, this
new method overcomes the intrinsic shortcoming of probabilistic methods of
diminishing returns; namely, when the number of traced signals increases,
additional restored state elements increase sub-linearly. Although the qual-
ity of the selected trace signals increased dramatically, this simulation-based
method is extremely computationally intensive; therefore it is not scalable to
circuits of even moderate size. At this stage, the state-of-the-art has shifted
to hybrid methods combining probability-based and simulation-based meth-
ods [28], scalable algorithms applying machine learning [33], multi-mode se-
lection of both scan signals and trace signals [23, 26, 27], and eventually
specialized system bridging pre-silicon verification and post-silicon valida-
tion [29, 22, 58, 59, 60]. Other works targeting specific tasks, systems or
platforms were also proposed during the past few years, such as post-silicon
verification of cache coherence [61], post-silicon validation of mixed/analog
circuits [62], post-silicon bug diagnosis with inconsistent execution [63] and
in-system silicon validation [64, 65]. All these aforementioned works build
upon the current best probability-based method [20] and simulation-based
method [21]; none of them has ever improved these two fundamental el-
ements. Furthermore, all of the previous works have focused on increasing
the visibility, and none of them has explained how those selected trace signals
are used during debugging.
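The common structure shared by the probability-based methods above, an estimator metric driving a greedy loop, can be sketched as follows. This is a minimal illustration and not the algorithm of any cited paper: `greedy_select`, the `estimate` callback, and the toy `coverage` map are hypothetical names, and the toy estimator simply counts how many signals a selection covers.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


def greedy_select(candidates: Iterable[str],
                  estimate: Callable[[FrozenSet[str]], float],
                  width: int) -> List[str]:
    """Pick `width` signals, each time adding the one that maximizes the estimate."""
    selected: List[str] = []
    remaining = list(candidates)
    for _ in range(min(width, len(remaining))):
        base = frozenset(selected)
        # Later choices depend on earlier ones: the estimate is always
        # evaluated on the already selected set plus one candidate.
        best = max(remaining, key=lambda s: estimate(base | {s}))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy stand-in for a restorability metric (assumption): each traced
# flip-flop "restores" a fixed set of flip-flops.
coverage: Dict[str, Set[str]] = {
    "ff_a": {"ff_a", "ff_b", "ff_c"},
    "ff_b": {"ff_b", "ff_c"},
    "ff_c": {"ff_c"},
    "ff_d": {"ff_d", "ff_e"},
    "ff_e": {"ff_e"},
}


def toy_estimate(selection: FrozenSet[str]) -> float:
    """Number of distinct flip-flops covered by the selection."""
    restored: Set[str] = set()
    for signal in selection:
        restored |= coverage[signal]
    return float(len(restored))


chosen = greedy_select(coverage, toy_estimate, width=2)
print(chosen, toy_estimate(frozenset(chosen)))
```

Because each step only looks one signal ahead, the loop converges to a locally optimal selection, which is the behaviour the text identifies as the source of the diminishing restoration ratio effect.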
We propose two methods to select trace signals. One uses a pure
PageRank algorithm without restorability, in which we transform the design
circuit into a network and apply the PageRank algorithm to select important
sequential components. The other is a trace signal selection method
using the HITS algorithm and restorability, in which we transform the design
circuit into a network with combinational components replaced by a single
restorability value and then analyze the authority and hub scores of each
signal to decide which signals to choose. The reasons that we apply the
PageRank and HITS algorithms are as follows. In general, the World Wide Web
(WWW) is a system of interlinked hypertext documents [66], and a gate-level
circuit transformed into a network is likewise a system of interlinked
elements. The WWW provides an architectural framework for
accessing linked documents spread out over millions of machines all over the
Internet; similarly, an industrial-level circuit consists of billions of logic
elements connected to each other. Formulating the trace buffer signal selection
problem as a page ranking problem is therefore a suitable, though not an
obvious, idea. Finding the set of most valuable websites among the haystack of
internet pages and selecting the most important set of trace signals for
post-silicon validation are actually similar problems.
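To make the authority and hub scores mentioned above concrete, the sketch below runs the textbook HITS iteration on a small weighted network of flip-flops. It is an assumption-laden illustration: the edge weights stand in for restoration probabilities, the node names are invented, and the thesis applies a slightly modified HITS variant that is not reproduced here.

```python
from math import sqrt
from typing import Dict, Tuple


def hits(edges: Dict[Tuple[str, str], float], iterations: int = 50):
    """Textbook HITS on a weighted digraph given as {(src, dst): weight}."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: weighted sum of the hub scores of predecessors.
        auth = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            auth[dst] += w * hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: weighted sum of the authority scores of successors.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_hub[src] += w * auth[dst]
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return auth, hub


# Hypothetical network: nodes are flip-flops, weights are illustrative
# stand-ins for the restoration probability between them.
edges = {
    ("ff1", "ff3"): 0.9,
    ("ff2", "ff3"): 0.8,
    ("ff1", "ff4"): 0.5,
    ("ff3", "ff4"): 0.3,
}
auth, hub = hits(edges)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

In this toy network, `ff3` ends up with the highest authority score because it is pointed to by the strongest edges, while `ff1` is the best hub because it points to both well-ranked nodes.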
The pure PageRank method comprises two phases: network construction
and trace signal selection using PageRank algorithm. In the network con-
struction phase, we write a SystemVerilog netlist parser to parse the connec-
tion relationship of each circuit element and then construct a corresponding
network, in which logic elements are denoted as nodes and their connections
are denoted as directed edges. Then we directly apply PageRank algorithm
to select the most important nodes as trace buffer signals. The HITS method
comprises four phases: network construction, restoration probability compu-
tation, authority and hub score calculation and post-processing analysis of
authority and hub score. The network construction phase is the same as
the pure PageRank method; however, we have added a Structural Verilog
parser to make our method compatible with the ISCAS 89 benchmark. In
the restoration probability computation phase, we replace all combinational
circuits with a single value of restoration probability using independent and
dependent probability calculation methods and then reassign the connections
between all sequential circuits. All sequential circuits are preserved and the
directed edges between them have weights in terms of the aforementioned
restoration probability. In the authority and hub score calculation phase, we
slightly vary the original HITS algorithm to compute the authority and hub
score of each node. In the post-processing phase, we analyze the restora-
tion power of each node with a combination of authority and hub scores and
eventually select trace signals based on trace buffer size. As our experimental
results demonstrate, these two new algorithms select better trace signals in
terms of functionality coverage in the debugging process.
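The pure PageRank flow described above (build a directed network from the netlist, rank the nodes, keep the top-ranked ones) can be sketched as follows. This is a minimal sketch under stated assumptions: the netlist is hand-written rather than parsed from SystemVerilog, the damping factor 0.85 is the conventional default rather than a value taken from the thesis, and the `ff_` prefix used to pick out sequential elements is a naming convention invented for this example.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank on a digraph given as {node: [successors]}."""
    nodes = sorted(set(graph) | {d for outs in graph.values() for d in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without successors is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for src in nodes:
            outs = graph.get(src, [])
            for dst in outs:
                new_rank[dst] += damping * rank[src] / len(outs)
        rank = new_rank
    return rank


# Hypothetical gate-level network: logic elements are nodes and each
# connection from a driver to a load is a directed edge.
netlist = {
    "in0": ["g1"],
    "in1": ["g1", "g2"],
    "g1": ["ff_a"],
    "g2": ["ff_a", "ff_b"],
    "ff_a": ["g3"],
    "ff_b": ["g3"],
    "g3": ["ff_a"],
}

ranks = pagerank(netlist)
trace_width = 1  # number of signals the trace buffer can hold (assumed)
flops = [v for v in ranks if v.startswith("ff_")]
trace_signals = sorted(flops, key=ranks.get, reverse=True)[:trace_width]
print(trace_signals)
```

Restricting the final choice to flip-flops mirrors the idea of selecting important sequential components; whether combinational nodes are ranked alongside them, as they are here, is a design choice of this sketch.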
The contributions of this thesis are summarized as follows:
1. Represents design under test as a network (di-graph).
2. Formulates the trace signal selection problem as a data-mining problem
(ranking sequential components with respect to their importance).
3. Points out the intrinsic limitations of restoration probability and restora-
tion probability based methods.
4. Proposes a new metric, functionality coverage, to evaluate the quality
of selected trace signals.
5. Selects trace signals with much better quality in terms of functionality
coverage, which helps engineers with debugging in post-silicon valida-
tion.
6. With dynamic trace buffer infrastructure, enables the dynamic selection
of signals with different characteristics.
1.5 Outline
The rest of the thesis is organized as follows.
• Chapter 2 provides preliminaries of restorability and trace signal selec-
tion to maximize restoration ratio.
• Chapter 3 describes the HITS algorithm with respect to restoration
ratio.
• Chapter 4 describes PageRank algorithm without using restorability.
• Chapter 5 presents experimental results with several examples and
benchmarks.
• Chapter 6 concludes the thesis with a brief discussion of possible future
work.
CHAPTER 2
PRELIMINARIES
2.1 Signal Restoration
In digital circuit theory, combinational logic, sometimes referred to as time-
independent logic, is a type of digital logic, implemented by Boolean circuits,
where the output is a pure function of the present input only. This is in
contrast to sequential logic, in which the output depends not only on the
present input but also on the history of the input. In other words, sequential
logic has memory and so it can be used as storage, while combinational logic
does not have memory and so cannot store values.
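As a minimal illustration of this distinction (a sketch with hypothetical names, not code from the thesis), an AND gate can be modelled as a pure function, while a D flip-flop needs an object that remembers the value captured on the previous clock edge.

```python
def and_gate(a: int, b: int) -> int:
    """Combinational logic: the output depends only on the present inputs."""
    return a & b


class DFlipFlop:
    """Sequential logic: the output is the value stored on the previous edge."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial

    def clock(self, d: int) -> int:
        """Return the value held before this edge, then capture `d`."""
        previous = self.q
        self.q = d
        return previous


# The gate gives the same answer for the same inputs every time.
gate_outputs = [and_gate(1, 1), and_gate(1, 1)]

# The flip-flop's output lags its input by one clock edge.
ff = DFlipFlop()
ff_outputs = [ff.clock(d) for d in (1, 0, 1)]

print(gate_outputs, ff_outputs)
```

The stored state is exactly what a trace buffer samples and what state restoration tries to reconstruct for the flip-flops that are not traced.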
The values at flip-flop outputs are estimated assuming uniform random
distribution of 0 and 1 logic values at the primary inputs. Given these as-
sumptions and using the knowledge of the traced signal values, a probabilistic
model of the visibility of 0 and 1 values at the other circuit nodes can be
generated. This probabilistic model leverages the circuit topology and logic
functionality of individual gates, and the estimation process performs for-
ward and backward propagation of probability values across logic gates. The
final state restoration capacity estimate is then expressed as a sum of the
predicted visibility of 0 and 1 values at the state elements of the circuit.
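The forward propagation of probability values across logic gates can be illustrated with the textbook signal-probability rules below. This is a simplified sketch, not the visibility model of the cited work: it only propagates the probability of a node being 1, it assumes that gate inputs are statistically independent (which reconvergent fan-out violates), and the three-gate circuit is invented for the example.

```python
def p_not(p: float) -> float:
    """P(out = 1) for an inverter."""
    return 1.0 - p


def p_and(pa: float, pb: float) -> float:
    """P(out = 1) for a 2-input AND with independent inputs."""
    return pa * pb


def p_or(pa: float, pb: float) -> float:
    """P(out = 1) for a 2-input OR with independent inputs."""
    return 1.0 - (1.0 - pa) * (1.0 - pb)


def p_xor(pa: float, pb: float) -> float:
    """P(out = 1) for a 2-input XOR with independent inputs."""
    return pa * (1.0 - pb) + (1.0 - pa) * pb


# Primary inputs are uniformly random, as assumed in the text.
p_in = 0.5

# Hypothetical circuit: g1 = AND(in, in'), g2 = OR(g1, in''), g3 = XOR(g2, in''')
# where every primed input is a separate, independent primary input.
p_g1 = p_and(p_in, p_in)
p_g2 = p_or(p_g1, p_in)
p_g3 = p_xor(p_g2, p_in)

print(p_g1, p_g2, p_g3)
```

Knowing a traced value replaces the corresponding probability with 0 or 1, which is how traced signals sharpen the estimates at the remaining nodes in a model of this kind.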
The first paper to propose an automated algorithm for selecting trace sig-
nals and the idea of state restoration was [15]. In this paper, the authors
provided the first step to develop the understanding of how computer-aided
design (CAD) could contribute to the post-silicon validation. It also high-
lighted the key factors for efficient state restoration and trace signal selection.
Figure 2.1: Sample circuit for state restoration. (a) CUD. (b) Restored
data in sequential elements.
To illustrate the idea of state restoration that was introduced by [15],
we will use the simple sample circuit under debug (CUD) shown in Fig 2.1(a).
The easiest way to debug this sample circuit is to trace all five flip-flops
for five consecutive clock cycles. In order to store this signal information,
one needs a trace buffer with a size of 5 by 5 bits. However, when debugging
in reality, it is impossible to trace all the internal signals of a design,
especially when the design is very large and complex. Instead of tracing all
the signals, if only FFC is traced, one will still be able to restore some of
the missing data of the sample circuit using state restoration. The basic
idea of state restoration is to apply
forward propagation and backward justification of Boolean relations to the
known values of traced signals in order to reconstruct the missing data. State
restoration requires no branching decisions or backtracking. It only
checks whether a circuit node can be restored; if not, an undefined value is
recorded for that circuit node. However, to apply state restoration
to the trace signals, the gate-level netlist of the CUD needs to be available.
If the gate-level netlist of certain blocks in the circuit is not available,
then state restoration cannot be applied to those blocks. Once state
restoration is applied, the designers are able to use the expanded set of
data to verify the behavior of the CUD against the design specification.
In order to understand the algorithm proposed by [15], one will need to
understand the principal operations of state restoration. For any given digital
Figure 2.2: Principal operations for state restoration. (a) Forward. (b)
Backward. (c) Combined. (d) Not defined.
circuit design, the combinational logic part of the circuit can be decomposed
into a network of primitive gates (e.g., two-input AND, OR, and XOR gates, and
NOT gates). The proposed algorithm relies on two principal operations: forward
operations and backward operations. A forward operation can be applied to
a gate when the input values of that gate are known. It will try to determine
the value of the output from the given input values using Boolean relations.
As shown in Fig 2.2(a), the forward operations are applied to an AND gate
and an OR gate. When one of the inputs to the AND gate is 0, the output has
to be 0. When one of the inputs to the OR gate is 1, the output has to be 1. On the
other hand, a backward operation can be applied to a gate when the output
values of that gate are known. It will try to determine the value of the input
from the given output. As shown in Fig 2.2(b), the backward operations are
also applied to an AND gate and an OR gate. When the output of the AND
gate is 1, its inputs are guaranteed to be 1. When the output of the OR gate
is 0, its inputs also have to be 0. However, in some cases, the forward and
backward operations are insufficient to reconstruct the missing value. Thus
a combined method, in which both input and output values are known, is
used to find the missing values. As shown in Fig 2.2(c), the output of the
AND gate is 0, and one of its inputs is 1; thus the value of the other input has
to be 0. Similarly, the output of the OR gate is 1, and one of its inputs is
0; thus the value of the other input has to be 1. Note that the same principal
operations for state restoration can also be applied to primitive gates other
than AND and OR. From Fig 2.2(d), it is also obvious that in some cases,
missing values cannot be reconstructed using any of the principal operations
described above due to insufficient information.
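The forward, backward and combined operations can be sketched for a single two-input gate as follows. This is only an illustrative sketch: the function names `restore_and` and `restore_or` and the use of `None` for an unknown value are assumptions made here, not part of the algorithm in [15].

```python
X = None  # unknown (not yet restored) value; representation assumed for this sketch


def restore_and(a, b, out):
    """Apply forward, backward and combined rules to one 2-input AND gate."""
    # Forward: a 0 on any input, or 1 on both inputs, determines the output.
    if a == 0 or b == 0:
        out = 0
    elif a == 1 and b == 1:
        out = 1
    # Backward: an output of 1 forces both inputs to 1.
    if out == 1:
        a, b = 1, 1
    # Combined: output 0 together with one input at 1 forces the other input to 0.
    elif out == 0:
        if a == 1 and b is X:
            b = 0
        elif b == 1 and a is X:
            a = 0
    return a, b, out


def restore_or(a, b, out):
    """Dual rules for one 2-input OR gate."""
    # Forward: a 1 on any input, or 0 on both inputs, determines the output.
    if a == 1 or b == 1:
        out = 1
    elif a == 0 and b == 0:
        out = 0
    # Backward: an output of 0 forces both inputs to 0.
    if out == 0:
        a, b = 0, 0
    # Combined: output 1 together with one input at 0 forces the other input to 1.
    elif out == 1:
        if a == 0 and b is X:
            b = 1
        elif b == 0 and a is X:
            a = 1
    return a, b, out


if __name__ == "__main__":
    print(restore_and(0, X, X))  # forward case of Fig 2.2(a)
    print(restore_and(X, X, 1))  # backward case of Fig 2.2(b)
    print(restore_and(1, X, 0))  # combined case of Fig 2.2(c)
    print(restore_and(X, X, 0))  # not defined: nothing more can be restored
```

A full restoration pass would apply such rules repeatedly over all gates and clock cycles until no new value is found; no branching or backtracking is involved, in line with the description above.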
Fig 2.1(b) is a simple example that demonstrates how the principal
operations are applied. The sample circuit has only 5 flip-flops, and FFC is
the only signal that was traced for four cycles. Using forward, backward and
combined operations, the state restoration method is able to restore 10
data values for the other 4 flip-flops. All the values recorded as X refer to the values
that cannot be restored from available data.
An ideal post-silicon debugging solution would enable pre-silicon quality
observability; i.e., every signal value is observable at each cycle, with little
design effort and area overhead. A more realistic goal is to attain partial
observability by tracing a small set of signals and use them to restore more
signals over more cycles and eventually find the root cause of the bug. Several
previous solutions have suggested automatic signal selection algorithms to
determine which state elements allow maximum restoration if traced. An
intuitive measure for evaluating restoration quality is the state restoration
ratio, defined as
$$\mathrm{SRR} = \frac{N_{Traced} + N_{Restored}}{N_{Traced}} \qquad (2.1)$$
where $N_{Traced}$ is the number of traced state elements and $N_{Restored}$ is the
number of restored ones during the time window dictated by the trace buffer
depth. Automated signal selection strives to maximize SRR.
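As a small worked example of Eq. (2.1), the sketch below plugs in the numbers quoted for Fig 2.1(b), assuming that one flip-flop traced for four cycles counts as 4 traced values and that 10 values are restored. The function name `state_restoration_ratio` is introduced here only for illustration.

```python
def state_restoration_ratio(n_traced: int, n_restored: int) -> float:
    """SRR = (N_Traced + N_Restored) / N_Traced, as in Eq. (2.1)."""
    if n_traced <= 0:
        raise ValueError("at least one traced value is required")
    return (n_traced + n_restored) / n_traced


if __name__ == "__main__":
    # Assumed counts: FFC traced for 4 cycles, 10 values restored elsewhere.
    print(state_restoration_ratio(n_traced=4, n_restored=10))
```

With these counts the ratio is 14/4 = 3.5; a ratio of 1 would mean that nothing beyond the traced values was restored.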
2.2 Probability Based Trace Signal Selection
Algorithm Using Restorability
The algorithm proposed by Basu in [20] using total restorability consists of
five major steps:
1. Computation of edge values
2. Initial value computation for flip-flops
3. Initial region creation
4. Re-computation of flip-flop values
5. Region growth
Before computing the edge values, a network is constructed using the circuit
gate-level netlist, in which each node in the graph represents a flip-flop
and each edge represents the path taken between two nodes while passing
through only combinational gates between them. The edge can be either for-
ward or backward in direction. For calculating the edge values, there are two
cases that we need to consider: independent signals and dependent signals.
Figure 2.3: Sample circuit
Figure 2.4: Network constructed using gate-level netlist of Fig 2.3
Once the first step in the algorithm is done, the algorithm will perform
the second step, which is to calculate the initial value for flip-flops. The
algorithm defines the value of a flip-flop to be the sum of all the edges that
are attached to it. For example, the value of C in Fig 2.4 is 3, since it has
4 edges attached to it: CA, CB, CD and CE. Each edge has a value of 3/4,
thus the initial value for flip-flop C is 3/4× 4, which is 3. It is important to
note that the algorithm also uses a parameter called “threshold” in order to
prevent combinational loops.
Figure 2.5: Region creation and region growth
Once the initial values for flip-flops are calculated, we are able to use those
flip-flop values to create an initial region. A region is defined as a collection of
flip-flops that are attached together. It is not necessary that all the flip-flops
are connected with each other, but each flip-flop must have at least one edge
connected to a node in the region. The proposed algorithm selects the node
with the highest node value, and all flip-flops that have an edge connected
to that node will be added to the region to form an initial region. As shown
in Fig 2.5(a), the node with the highest value is C. Thus C will be selected
for tracing, and at the same time, every node that has an edge connected to
it will be included in the region. Thus the initial region includes C, A, D, B
and E.
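The initial value computation and initial region creation can be sketched as below. The four edges of C carry the value 3/4 quoted in the text; the A-G edge and its weight are hypothetical and are included only so that the graph has a node outside the initial region. The threshold parameter and the separate forward/backward edge directions are omitted from this sketch.

```python
from collections import defaultdict

# Edge values between flip-flops. The C edges follow the text (3/4 each);
# the ("A", "G") edge and its value 0.5 are hypothetical.
EDGES = {
    ("C", "A"): 0.75,
    ("C", "B"): 0.75,
    ("C", "D"): 0.75,
    ("C", "E"): 0.75,
    ("A", "G"): 0.5,
}


def flip_flop_values(edges):
    """Value of a flip-flop = sum of the values of all edges attached to it."""
    values = defaultdict(float)
    for (u, v), weight in edges.items():
        values[u] += weight
        values[v] += weight
    return dict(values)


def initial_region(edges):
    """Pick the highest-valued flip-flop and add every neighbour to the region."""
    values = flip_flop_values(edges)
    seed = max(sorted(values), key=values.get)  # sorted() makes ties deterministic
    region = {seed}
    for u, v in edges:
        if u == seed:
            region.add(v)
        elif v == seed:
            region.add(u)
    return seed, region


if __name__ == "__main__":
    seed, region = initial_region(EDGES)
    print(seed, sorted(region))
```

On this graph the seed is C with value 3, and the initial region is {A, B, C, D, E}, matching the walk-through of Fig 2.5(a).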
Once the first traced signal is selected, the values for flip-flops inside the
region are recomputed. Those flip-flops might have edges inside and outside
of the region. The edge inside the region will have greater weight than the
edge outside the region since many state restorations will require knowledge
of the inside of the region to increase the total restorability of those signals.
Once the values for flip-flops are recalculated, the flip-flop with the highest
value within the region will be selected. If two nodes have the same value, the
one with higher forward restorability will be chosen since forward restorability
tends to have a better restoration than backward justification. Once a new
node is selected, all nodes that are connected to that node will be added to
the region. As shown in Fig 2.5(b), the node A is selected, and thus G is also
added into the region as the region growth is performed. Once the region is
updated, steps 4 and 5 are repeated until the trace buffer is full.
2.3 Problems of Probability Based Trace Signal
Greedy Selection Algorithm
2.3.1 Misleading Restoration Probability Estimate
An intrinsic limitation of probabilistic algorithms is the low degree of correla-
tion with the actual SRR in the post-silicon post-analysis. This phenomenon
is demonstrated in Fig 2.6, which plots average real SRR vs. the estimated
one obtained with Liu and Xu’s [16] restoration capacity estimation metric.
Although the correlation is positive, the extent of correlation is poor. The
fundamental reason behind this phenomenon is the lossy nature of probabilis-
tic algorithms. According to Fig 2.7, a probabilistic method will estimate
the restoration probability of value 1 of this AND gate to be 0.5× 0.5=0.25.
However, if the sequence of V1(a) and V1(b) in real execution is 1X1X1X
and X1X1X1 correspondingly, the actual restoration is 0 for the output sig-
nal V1(c) in all cycles. Therefore, an inevitable misleading direction, which
the probabilistic method cannot avoid, is the low correlation with the actual
restoration.
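The gap between the estimate and the actual restoration in this AND gate example can be reproduced with a few lines. This is a sketch under the assumption that the two sequences list, cycle by cycle, whether a value of 1 is known (`1`) or unknown (`X`) at inputs a and b; the helper names are illustrative.

```python
X = None  # unknown value


def and_forward(a, b):
    """Forward restoration of a 2-input AND gate output; X if undetermined."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def parse(trace):
    """Turn a string such as '1X1X1X' into a list of 1 / X values."""
    return [X if ch == "X" else int(ch) for ch in trace]


a_trace = parse("1X1X1X")
b_trace = parse("X1X1X1")

restored = [and_forward(a, b) for a, b in zip(a_trace, b_trace)]
actual_ratio = sum(v is not X for v in restored) / len(restored)
estimated = 0.5 * 0.5  # probabilistic estimate for value 1 at the output

if __name__ == "__main__":
    print("estimated:", estimated, "actual:", actual_ratio)
```

Because the known 1s on the two inputs never coincide in the same cycle, no output value is restored, although the probabilistic estimate is 0.25.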
2.3.2 Diminishing Restoration Ratio
A second limitation of probabilistic methods, the diminishing restoration
ratio, results from the inaccuracy of the estimation metric as well as the
very nature of greedy selection: the restoration ratio saturates quickly when
a large number of flip-flops are traced. The newly selected flip-flops are constrained by
the previously selected flip-flops. Therefore, at the very beginning, if the initial
set of selected signals is not good, this effect will propagate to all subsequent
selections. Hence, the proposed method in [21] applies a backward greedy al-
gorithm, which starts off with the set of all FFs, and then iteratively reduces
this set until the desired cardinality is obtained.
By utilizing a backward greedy algorithm, pruning based simulation elim-
ination and customized weight assignment, this proposed method achieves
Figure 2.6: Region creation and region growth
Figure 2.7: Region creation and region growth
up to 34% performance improvement over all previous probabilistic methods;
however, this algorithm suffers from $O(N^2)$ complexity, which makes it
unscalable to any circuit of decent size.
CHAPTER 3
TRACE SIGNAL SELECTION USING HITS
ALGORITHM WITH RESPECT TO
RESTORABILITY
Our algorithm comprises four phases: network construction, restoration prob-
ability computation, authority and hub score calculation and post-processing
analysis of authority and hub score.
3.1 Network Construction
Figure 3.1: Example circuit
To formulate the trace signal selection process into a data-mining problem,
we first transform the DUV into a network, i.e., a directed graph, using a
Structural Verilog (or System Verilog netlist) parser. Structural Verilog is a
hardware description language describing the connection relationship of all
circuit elements. We write a Structural Verilog parser since all previous
publications in this field use ISCAS 89 benchmarks, and all the circuits are
written in Structural Verilog. For example, a physical circuit depicted in Fig
3.1 can be written in a Structural Verilog code presented in pseudo code 1:
We also write a System Verilog netlist parser since it is very hard to inter-
pret the functionality of ISCAS 89 benchmarks. To test the functionality of
Pseudo code 1: Structural Verilog code of example circuit
input in1, in2;
output out1, out2;
dff A(Clk,in3,in1);
dff B(Clk,in5,in2);
dff C(Clk,in4,in16);
dff D(Clk,in8,in7);
dff E(Clk,in9,in6);
dff F(Clk,in11,in10);
dff G(Clk,out1,in14);
dff H(Clk,out2,in15);
and and1(in7,in4,in3);
and and2(in6,in5,in4);
and and3(in14,in3,in11);
and and4(in15,in5,in11);
or or1(in16,in5,in3);
or or2(in10,in9,in8);
the selected trace signals, we synthesized several System Verilog projects into
gate-level netlists using Synopsys Design Compiler with a standard library.
Nested System Verilog modules are flattened by Design Compiler and eventu-
ally synthesized into a single netlist. We use parsers to parse the connection
relationship between all circuit elements and then construct a network such
as that shown in Fig 3.2. Each node is a logic element and each edge is the
connection between a pair of circuit elements. The edge is directed, recording
the circuit topology.
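A minimal sketch of the network construction step is given below for the instance lines of Pseudo code 1. It is not the parser used in this work: the regular expression, the function names, and in particular the assumed port orders (`dff name(Clk, Q, D)` and `gate name(out, in, ...)`) are assumptions made for illustration.

```python
import re

NETLIST = """
dff A(Clk,in3,in1);
dff B(Clk,in5,in2);
dff C(Clk,in4,in16);
dff D(Clk,in8,in7);
dff E(Clk,in9,in6);
dff F(Clk,in11,in10);
dff G(Clk,out1,in14);
dff H(Clk,out2,in15);
and and1(in7,in4,in3);
and and2(in6,in5,in4);
and and3(in14,in3,in11);
and and4(in15,in5,in11);
or or1(in16,in5,in3);
or or2(in10,in9,in8);
"""

INSTANCE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;", re.MULTILINE)


def parse_netlist(text):
    """Return (driver, loads, flops) for a flat structural netlist."""
    driver = {}  # net name -> instance that drives it
    loads = {}   # instance -> list of its input nets
    flops = set()
    for kind, name, ports in INSTANCE.findall(text):
        nets = [p.strip() for p in ports.split(",")]
        if kind == "dff":
            _clk, q, d = nets          # assumed port order: clock, Q, D
            flops.add(name)
            driver[q] = name
            loads[name] = [d]
        else:
            out, *ins = nets           # assumed port order: output first
            driver[out] = name
            loads[name] = ins
    return driver, loads, flops


def flip_flop_edges(driver, loads, flops):
    """Edges (src, dst): src reaches the D input of dst through gates only."""
    edges = set()
    for dst in flops:
        stack, seen = list(loads[dst]), set()
        while stack:
            net = stack.pop()
            src = driver.get(net)
            if src is None or net in seen:  # primary input or already visited
                continue
            seen.add(net)
            if src in flops:
                edges.add((src, dst))
            else:
                stack.extend(loads[src])
    return edges


if __name__ == "__main__":
    print(sorted(flip_flop_edges(*parse_netlist(NETLIST))))
```

Under these port-order assumptions the example yields twelve flip-flop-to-flip-flop edges, with A and B having no predecessors and G and H having no successors.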
3.2 Restoration Probability Computation
An edge between two flip-flops is the path taken to reach a flip-flop from
another, while passing through a number of combinational gates but not any
other flip-flop. For example, in Fig 3.1, flip-flops A and D are connected
passing through an AND gate. Generally, there can be any number and any
type of logic gates in between a pair of flip-flops. Once the circuit network
is constructed, we then remove all logic gates between any pair of flip-flops
and replace them with a single value of restoration probability, which is
the probability that the destination flip-flop can be restored by the start flip-
flop. The calculation of restoration probability is divided into two categories:
Figure 3.2: Constructed network of example circuit
independent probability and dependent probability.
3.2.1 Independent Probability
Consider the path AC and BC in Fig 3.1; these two paths are independent
since flip-flop C is driven independently by flip-flops A and B. To compute
independent probability, we use the generic example in Fig 3.3 to demonstrate
the calculation.
Figure 3.3: Independent scenario
Fig 3.3 has two flip-flops, K and L. We want to find how the input of L is
sensitized by the output of K. The input of L corresponds to the output of the
NAND gate. The path from K to L is independent of any other path through
which the output of K propagates. We define four probabilities: $P^{I}_{0,i}$, $P^{I}_{1,i}$,
$P^{O}_{0,i}$ and $P^{O}_{1,i}$, in which $P^{I}_{0,i}$ indicates the probability that a node $i$ (gate or
flip-flop) has an input value of 0 when another node is controlling it. Similarly,
$P^{I}_{1,i}$, $P^{O}_{0,i}$ and $P^{O}_{1,i}$ indicate the cases for input value of 1, output value of 0 and
output value of 1, respectively. Consider the first AND gate; we define the overall control
probability from K to the AND gate as
$$P_{AND} = P^{O}_{0,AND} + P^{O}_{1,AND} \qquad (3.1)$$
Now define $P^{O}_{0,AND}$ and $P^{O}_{1,AND}$. Let $P_{cond0,AND}$ and $P_{cond1,AND}$ be the
probabilities that the output of the first AND gate follows the output of K; i.e.,
the output of the AND gate is 0 or 1 when the output of K is 0 or 1, respectively.
Therefore,
$$P^{O}_{0/1,AND} = P_{cond0/1,AND} \times P^{I}_{0/1,AND} \qquad (3.2)$$
Consider the characteristics of a 2-input AND gate; if one of the inputs is
0, the output will be 0; the output is only 1 when both of the inputs are
1s. Therefore, $P_{cond0,AND}$ is 1, while $P_{cond1,AND}$ is 0.5. Consequently, we
obtain $P^{O}_{0,AND} = 0.5$ and $P^{O}_{1,AND} = 0.25$. It can be seen that the probability
of K controlling the first AND gate is 0.75. This specific process will then
be carried out on the remaining logic gates in the combinational circuit chain
between flip-flops K and L, in which the previous gate's $P^{O}_{0/1}$ will serve
as the $P^{I}_{0/1}$ of the next gate. For instance, $P^{O}_{0,AND} = 0.5$ and $P^{O}_{1,AND} = 0.25$;
hence, $P^{I}_{0,OR} = 0.5$ and $P^{I}_{1,OR} = 0.25$, $P^{O}_{0,OR} = P_{cond0,OR} \times P^{I}_{0,OR} = 0.5 \times 0.5 = 0.25$,
and $P^{O}_{1,OR} = P_{cond1,OR} \times P^{I}_{1,OR} = 1 \times 0.25 = 0.25$; $P_{OR}$, which is the
control probability of K controlling the OR gate, the second gate in the chain, is then 0.5. In this
way, the calculation continues until we reach L, to obtain the value of the
edge KL. When there are n combinational gates $G_1, \ldots, G_n$ between K and L, we get
$$P^{O}_{0/1,G_n} = P^{I}_{0/1,G_1} \times \prod_{1 \le i \le n} P_{cond0/1,G_i} \qquad (3.3)$$
Finally, $P^{O}_{0,G_n}$ and $P^{O}_{1,G_n}$ are added together to compute $P_{G_n}$, which corresponds to the
restoration probability of the edge between flip-flops K and L.
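The chained computation of Eqs. (3.1) to (3.3) can be sketched as follows, using the conditional probabilities quoted in the text for 2-input AND and OR gates. The table `P_COND` and the function name are illustrative assumptions; other gate types would need their own entries.

```python
# (Pcond0, Pcond1) per gate type, taken from the worked example in the text.
P_COND = {
    "AND": (1.0, 0.5),
    "OR": (0.5, 1.0),
}


def chain_restoration_probability(gates, p_in0=0.5, p_in1=0.5):
    """Propagate P_0 and P_1 through a chain of gates and return their sum.

    gates  : list of gate types between the two flip-flops, in order
    p_in0/1: probability of 0 / 1 at the output of the start flip-flop
    """
    p0, p1 = p_in0, p_in1
    for gate in gates:
        cond0, cond1 = P_COND[gate]
        # Eq. (3.2): output probability = conditional probability x input probability;
        # the result becomes the input probability of the next gate.
        p0, p1 = cond0 * p0, cond1 * p1
    return p0 + p1  # Eq. (3.1) applied to the last gate


if __name__ == "__main__":
    print(chain_restoration_probability(["AND"]))        # K controlling the AND gate
    print(chain_restoration_probability(["AND", "OR"]))  # K controlling the OR gate
```

The two calls reproduce the values 0.75 and 0.5 derived above.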
3.2.2 Dependent Probability
In case of dependent probability, we need to know the probability of a flip-flop
output influencing an m-input gate, when the output of the flip-flop affects
more than one input of the gate, i.e. there is more than one path between a
pair of flip-flops. Take the example circuit in Fig 3.4 as an example.
Figure 3.4: Dependent scenario
It can be seen from Fig 3.4 that both of the inputs x and y of L are
affected by flip-flop K; therefore, their probabilities of being 0 or 1 are not
independent of each other. At this stage, our goal is to compute the two
independent paths until the last AND gate and then treat the last AND
gate as a special dependent gate. To compute the independent probability
of the two separate paths, we can simply apply the equations discussed in
section 3.2.1. Once we get $P^{O}_{1,x}$, $P^{O}_{0,x}$, $P^{O}_{0,y}$ and $P^{O}_{1,y}$, the dependent probability
calculation is as below. If either x or y is 0, the output of the last gate will
be 0 since $P_{cond0,AND}$ is 1. The probability that one of the inputs of the last
AND gate is 0 is computed as
$$P^{I}_{0,AND} = P^{O}_{0,x} + P^{O}_{0,y} - P^{O}_{0,x} \times P^{O}_{0,y} \qquad (3.4)$$
Therefore,
$$P^{O}_{0,AND} = P_{cond0,AND} \times P^{I}_{0,AND} \qquad (3.5)$$
Only when both inputs of the last AND gate are 1s will the output of the
last AND gate be 1. The probability that both x and y are 1 is computed
as
$$P^{I}_{1,AND} = P^{O}_{1,x} \times P^{O}_{1,y} \qquad (3.6)$$
and then
$$P^{O}_{1,AND} = P_{cond1,AND} \times P^{I}_{1,AND} \qquad (3.7)$$
The overall dependent probability is then $P^{O}_{0,AND} + P^{O}_{1,AND}$, as in Eq. (3.1). Computing the
controlling input probability depends on input number m and path number
p; the equation to compute $P^{I}_{0/1(controlling)}$ varies slightly based on basic
probability knowledge. For example, if the last AND has three inputs and all of
them are initially generated from start flip-flop K, the probability of one of
the inputs being 0 is computed as
$$P^{I}_{0,AND} = P^{O}_{0,x} + P^{O}_{0,y} + P^{O}_{0,z} - P^{O}_{0,x} \times P^{O}_{0,y} - P^{O}_{0,x} \times P^{O}_{0,z} - P^{O}_{0,y} \times P^{O}_{0,z} + P^{O}_{0,x} \times P^{O}_{0,y} \times P^{O}_{0,z} \qquad (3.8)$$
This dependent probability will be carried out on all logic gates with depen-
dent scenarios; the rest of the gates will apply the independent probability
equations discussed in section 3.2.1.
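The input probabilities of the final AND gate in the dependent case can be sketched for any number of converging paths. The inclusion-exclusion sums of Eqs. (3.4) and (3.8) equal one minus the product of the complements, which is the form used below; the function names are assumptions made for this sketch.

```python
from math import prod


def p_in0_and(p_out0):
    """Probability that at least one input of the last AND gate is 0.

    p_out0 holds the per-path output probabilities of value 0.
    Equivalent to the inclusion-exclusion sums of Eqs. (3.4) and (3.8).
    """
    return 1.0 - prod(1.0 - p for p in p_out0)


def p_in1_and(p_out1):
    """Probability that all inputs of the last AND gate are 1, as in Eq. (3.6)."""
    return prod(p_out1)


if __name__ == "__main__":
    # Hypothetical per-path probabilities, chosen only for illustration.
    print(p_in0_and([0.5, 0.25]))       # two paths, Eq. (3.4)
    print(p_in0_and([0.5, 0.25, 0.5]))  # three paths, Eq. (3.8)
    print(p_in1_and([0.25, 0.5]))       # Eq. (3.6)
```

Multiplying these input probabilities by the conditional probabilities of the gate, as in Eqs. (3.5) and (3.7), then gives the output probabilities.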
3.2.3 Example
Consider the example circuit in Fig 3.1; following the computation described
in sections 3.2.1 and 3.2.2, the network constructed in Fig 3.2 will then be
transformed into a new network with only flip-flops preserved as nodes and
all edges of logic gates replaced with a single value of restoration probability
as shown in Fig 3.5.
Figure 3.5: Network with respect to restoration probability
3.3 Authority and Hub Score Calculation
Most signal selection algorithms presented in the literature so far share a
common structure. First, a metric is devised to estimate the state restoration
capacity of a given set of signals; second, a greedy selection process guided by
the metric is used to converge to a locally optimal selection. Our proposed
algorithm is fundamentally different, using a data-mining technique called
HITS.
Once the new network is constructed, we will apply a slightly varied HITS
algorithm to compute the authority and hub score of each node. Hyperlink-
Induced Topic Search, also known as HITS, is a link analysis algorithm that
rates Web pages, developed by Jon Kleinberg [67, 66, 68]. It is a precursor to
the famous Google PageRank algorithm [69, 70]. The idea behind Hubs and
Authorities stemmed from a particular insight into the creation of web pages
when the Internet was originally formed; that is, certain web pages, known
as hubs, served as large directories that were not actually authoritative in
the information they held, but were used as compilations of a broad catalog
of information that led users directly to other authoritative pages. In other
words, a good hub represented a page that pointed to many other pages,
and a good authority represented a page that was linked by many different
hubs. This evaluation metric to evaluate the quality of a single page in a
network can also be applied to a flip-flop in a DUV. If a flip-flop is a good
hub, it means the output of this specific flip-flop affects the operations of
many other logic elements behind it in the circuit topology, such as an input.
If a flip-flop is a good authority, it means the inputs of this specific flip-flop can
be affected by many other logic elements in front of it in the circuit topology,
such as an output. Since the number of internal signals is several orders of
magnitude larger than the number of standard inputs and outputs, almost
all of the nodes in the network created in section 3.2 are a combination of
hub and authority scores. How to select the trace signals based on their hub
and authority scores will be discussed in section 3.4. Now, let us discuss
the HITS algorithm we implemented in this phase to compute the hub and
authority score of each flip-flop.
The pseudo code of the HITS algorithm we use in this work is presented as
algorithm 1. The original HITS algorithm initializes the hub and authority
score of each node to be 1; instead, we initialize the authority and hub
Table 3.1: Authority and hub score of each flip-flop in example circuit
Node    Authority Score    Hub Score    Final Score
A       0.0                0.602        0.361
B       0.0                0.602        0.361
C       0.526              0.372        0.434
D       0.425              3.128e-43    0.170
E       0.425              3.128e-43    0.170
F       7.157e-43          0.372        0.223
G       0.425              0.0          0.170
H       0.425              0.0          0.170
score of each node to be the sum of weights of all incoming edges and sum
of weights of all outgoing edges respectively. We define max_iter as the
maximum iteration number of the whole HITS process; the authority and
hub score evaluation will be done at most max_iter times. We also define a
tolerance value of 1e-08 to decide whether the hub and authority of all nodes
in the network have converged. If within max_iter times, the aforementioned
tolerance is not satisfied, an error will be raised to request a new max_iter
number from the user. Otherwise, whenever the hub and authority converge,
the HITS process will stop and return the hub and authority score of each
flip-flop node in the network we constructed.
The hub and authority score of each flip-flop in the example circuit (Fig
3.1) is summarized in Table 3.1.
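A runnable sketch of this HITS variant is shown below. It assumes unit edge weights, so the initial scores reduce to in-degree and out-degree, and it uses the flip-flop edges that follow from Pseudo code 1 if each `dff` is read as `(Clk, Q, D)`. Both the edge list and the function name `hits` are assumptions for illustration rather than the implementation used in this work.

```python
from math import sqrt

# Flip-flop edges of the example circuit (assumed reading of Pseudo code 1).
EDGES = [
    ("A", "C"), ("B", "C"), ("A", "D"), ("C", "D"),
    ("B", "E"), ("C", "E"), ("D", "F"), ("E", "F"),
    ("A", "G"), ("F", "G"), ("B", "H"), ("F", "H"),
]


def hits(edges, max_iter=100, tol=1e-8):
    """Return (authority, hub) dictionaries, or raise if not converged."""
    nodes = sorted({n for edge in edges for n in edge})
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    # Initialization: sum of incoming / outgoing edge weights (all 1 here).
    auth = {n: float(len(preds[n])) for n in nodes}
    hub = {n: float(len(succs[n])) for n in nodes}

    for _ in range(max_iter):
        last_auth, last_hub = dict(auth), dict(hub)

        # Authority update followed by L2 normalization.
        auth = {n: sum(hub[q] for q in preds[n]) for n in nodes}
        norm = sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}

        # Hub update followed by L2 normalization.
        hub = {n: sum(auth[r] for r in succs[n]) for n in nodes}
        norm = sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}

        auth_error = sum(abs(auth[n] - last_auth[n]) for n in nodes)
        hub_error = sum(abs(hub[n] - last_hub[n]) for n in nodes)
        if auth_error < tol and hub_error < tol:
            return auth, hub

    raise RuntimeError("authority and hub scores did not converge within max_iter")


if __name__ == "__main__":
    authority, hub_score = hits(EDGES)
    for node in sorted(authority):
        print(node, round(authority[node], 3), round(hub_score[node], 3))
```

On this graph the sketch should yield scores close to those in Table 3.1, with A and B acting as pure hubs and G and H as pure authorities.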
3.4 Post-Processing Analysis of Authority and Hub
Score
One phenomenon of the probability-based trace signal selection algorithms
proposed so far is that the selected trace signals tend to cluster together
in the graph. This has two big drawbacks. First, in a large module, if
the selected signals cluster in one small group, it does not give enough in-
formation about the circuit in a global context. Second, once the selected
signals start to cluster, they will have a diminishing restoration ratio effect,
so that each additional signal selected will have a significantly smaller ad-
ditional gain added to the total restoration. This limitation is intrinsic so
that it cannot be solved within the common greedy selection structure.
Algorithm 1 Pseudo-code of HITS algorithm
Require: graph: network of circuit
Ensure: authority and hub score of each node in graph
/* initialization */
/* n.auth is the authority score of the flip-flop n */
/* n.hub is the hub score of the flip-flop n */
for flip-flop n in graph do
    n.auth = sum(incoming edge weights)
    n.hub = sum(outgoing edge weights)
end for
function HITS(graph)
    /* run the algorithm for at most max_iter times */
    for step from 1 to max_iter do
        /* keep the scores of the previous iteration for the convergence test */
        for flip-flop n in graph do
            n.last_auth = n.auth
            n.last_hub = n.hub
        end for
        norm = 0
        for flip-flop n in graph do
            n.auth = 0
            /* n.incoming_neighbours is the set of flip-flops that link to n,
               i.e., the predecessors of n in this network */
            for flip-flop q in n.incoming_neighbours do
                n.auth += q.hub
            end for
            norm += n.auth^2
        end for
        norm = sqrt(norm)
        for flip-flop n in graph do
            n.auth = n.auth / norm
        end for
        norm = 0
        for flip-flop n in graph do
            n.hub = 0
            /* n.outgoing_neighbours is the set of flip-flops that n links to,
               i.e., the successors of n in this network */
            for flip-flop r in n.outgoing_neighbours do
                n.hub += r.auth
            end for
            norm += n.hub^2
        end for
        norm = sqrt(norm)
        for flip-flop n in graph do
            n.hub = n.hub / norm
        end for
        auth_error = sum over all n of |n.auth - n.last_auth|
        hub_error = sum over all n of |n.hub - n.last_hub|
        if both auth_error and hub_error < tolerance then
            return
        end if
    end for
    raise error: authority and hub score don't converge within max_iter times
end function
Consequently, the whole research direction shifted to simulation-based trace signal
selection in the same year. Unlike all previous probability-based trace signal
selection algorithms, we analyze the hub and authority scores of each flip-flop
node globally. Comparatively, our algorithm has the potential to break the
aforementioned clustering effect.
Figure 3.6: Example circuit with authority and hub labeled
During the post-processing step, our goal is to figure out a way to use hub
and authority scores to assess the ability of each node to restore others. From
experimental results, we notice that a node with good hub and authority
scores is definitely an important node for state restoration. We also found
that in state restoration, we would prefer to select a node with high hub
score over a node with high authority score. As indicated in the sample
circuit in Fig 3.6, inputs A and B are pure hubs, and outputs G and H
are pure authorities. Intuitively, inputs can restore more data compared to
outputs. As a result, when calculating the final score for each node, we add
the authority score and the hub score together while giving a higher weight
to the hub score. Thus the formula for calculating the final node value is
$$Res_n = 0.6 \times n.hub + 0.4 \times n.auth \qquad (3.9)$$
For this example, the trace buffer size is 2; thus, the signals we selected are
C and A, which are the same results as those of the current best
probability-based trace signal selection algorithm [20].
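The final score and the resulting selection can be checked against Table 3.1 with the short sketch below. The scores are copied from the table; breaking the tie between A and B by node name is an assumption made here, since the text does not state how ties are resolved.

```python
# (authority, hub) scores copied from Table 3.1.
SCORES = {
    "A": (0.0, 0.602),
    "B": (0.0, 0.602),
    "C": (0.526, 0.372),
    "D": (0.425, 3.128e-43),
    "E": (0.425, 3.128e-43),
    "F": (7.157e-43, 0.372),
    "G": (0.425, 0.0),
    "H": (0.425, 0.0),
}


def final_score(auth, hub, hub_weight=0.6, auth_weight=0.4):
    """Weighted sum of hub and authority scores, as in Eq. (3.9)."""
    return hub_weight * hub + auth_weight * auth


def select_trace_signals(scores, buffer_width):
    """Return the buffer_width nodes with the highest final score."""
    # Ties are broken alphabetically; this rule is an assumption of the sketch.
    ranked = sorted(scores, key=lambda n: (-final_score(*scores[n]), n))
    return ranked[:buffer_width]


if __name__ == "__main__":
    for node, (auth, hub) in SCORES.items():
        print(node, round(final_score(auth, hub), 3))
    print(select_trace_signals(SCORES, buffer_width=2))
```

With a trace buffer width of 2, the sketch selects C and A, in agreement with the selection stated above.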
To summarize, our trace signal selection method using HITS algorithm
with respect to Restoration Probability has the potential to break the clus-
tering effect and therefore tackle the diminishing restoration ratio effect.
Also, as we discussed with our industry connection, currently there are trace
controller designs that would enable verification engineers to dynamically se-
lect a set of trace signals for improved error detection. This kind of dynamic
trace signals selection infrastructure is also discussed in [71, 72]. Our HITS
algorithm would become especially useful since we can categorize sequential
elements into hub and authority. If we detect errors when we are observing
authority nodes as trace buffer signals in post-silicon debugging, we can dy-
namically change trace signals into hub nodes to analyze or even localize the
cause of the bugs. Although we have analyzed and tried different algorithms
for trace signal selection to maximize state restoration ratio (SRR), we actually
discovered the more interesting phenomenon that higher SRR does
not imply better quality for debugging, and lower SRR does not mean the
signals are not good for debugging. Fundamentally, we discovered that SRR
is not a good metric to evaluate the trace buffer signals when we evaluate
their quality in terms of debugging.
CHAPTER 4
PAGERANK ALGORITHM WITHOUT
RESTORATION PROBABILITY
4.1 Restoration Probability vs. Functionality
As we discussed in section 2.3, the restoration probability based method
has not only the intrinsic limitation of low degree of correlation with the
actual SRR in the post-silicon analysis, but also diminishing restoration ratio
effect. These two disadvantages are generated because of the lossy nature of
probabilistic algorithms and the nature of greedy algorithms. As we try to
overcome these limitations, we realize the restoration probability itself is not
a good metric for post-silicon debugging. For example, we have analyzed the
implementation of an LC3B processor, in which the most important signals
would be isdu state signals, which indicate the current states of the LC3B
machine. However, the restoration probability based method will not select
these control state signals since their restoration probability is very low. An
intuitive explanation of this phenomenon is illustrated in Fig 4.1:
Figure 4.1: Example connection between isdu state signal register and
other registers.
As illustrated in Fig 4.1, one example path between an isdu state signal
register and a mem2io register has 9 combinational logic elements in-between.
Then the restoration probability value on this edge between these two nodes
in the circuit network will be very low. Since all the paths originating from
isdu state signal registers will go through many combinational gate elements
to arrive at the next register, the degree value of isdu state signal node, which
is the sum of the weights of all incoming and outgoing edges in the circuit
network, will be extremely small compared to other nodes. This phenomenon
can be generalized to all control signals, since control state registers usually go
through many more combinational logic elements than other regular registers
because they control other entities of the circuit, and a majority of the control
logic is done in combinational logic elements. Therefore, due to the very
nature of the restoration probability based method, those signals connected
to many other combinational logic elements, for instance, the aforementioned
control state registers, which are supposed to be the most important signals,
are actually ignored due to diluted degree.
In addition to the above analysis, all of the previous literature has used
many techniques and algorithms to maximize SRR; however, no work has
ever discussed what are the high-level meanings of those selected trace buffer
signals and how they are used in post-silicon debugging. Unlike all of the
previous methods, we propose a new method with a new metric that dis-
cards the whole restoration probability concept and selects the signals that
will cover more functionality in terms of assertions and eventually help engi-
neers debug better. As demonstrated in experimental results section, higher
restoration probability does not imply better functionality coverage for de-
bugging. Our method can select better signals than the previous restoration
probability based method in terms of functionality coverage.
4.2 Functionality Coverage Definition
Assertions, as succinct representations of the design under verification, are
widely used in pre-silicon hardware verification to monitor combinational and
sequential behaviors [73, 74]. In general, an assertion is a statement about the
intended behavior of a design. Assertions ensure consistency between design
intent and design implementation. Traditional verification approaches rely
on injecting random stimulus into the design under verification and checking
the results at the output. However, the increasing complexity of design
makes coverage and debugging much harder. Assertions can help improve
the verification process in many different ways, such as concisely expressing
behaviors that span multiple clock cycles, providing a mechanism for precise
documentation of design intent and assumption, pinpointing errors at the
point of origin, etc.
In industry, assertions are used by design verification teams at early stages
of the design cycle to monitor simulation; they are also heavily used by for-
mal verification teams to formally verify the functionality of designs in later
stages of the design cycle. Assertions are manually written by design engi-
neers and verification engineers in an ad-hoc manner targeting highly risky
behaviors of the design entity. Therefore, the most important characteris-
tics the designers are concerned with, i.e., the key functionality of a design
entity, can be represented by a set of assertions tested in the pre-silicon ver-
ification stage. However, since an assertion can monitor both combinational
and sequential behaviors, while the post-silicon validation trace buffer can
only trace sequential elements, the sequential functionality of a design entity
is represented by a subset of the assertions created in the pre-silicon stage.
Regarding the LC3B processor we mentioned above, example assertions
monitoring sequential behaviors for the instruction fetch stage would be as
follows:
1. (cpu/isdu/state_reg[4:0] == 5'd18) |-> (cpu/MAR/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0])
2. (cpu/isdu/state_reg[4:0] == 5'd18) |-> ##1 (cpu/PC/Dout_reg[15:0] == $past(cpu/PC/Dout_reg[15:0]) + 16'd1)
3. (cpu/isdu/state_reg[4:0] == 5'd18) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd33)
4. (cpu/isdu/state_reg[4:0] == 5'd33) |-> (cpu/MDR/Dout_reg[15:0] == mem2io/hex_data_reg[15:0])
5. (cpu/isdu/state_reg[4:0] == 5'd33) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd35)
6. (cpu/isdu/state_reg[4:0] == 5'd35) |-> (cpu/IR/Dout_reg[15:0] == cpu/MDR/Dout_reg[15:0])
7. (cpu/isdu/state_reg[4:0] == 5'd35) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd32)
An assertion is defined as covered if all signals appearing in the
assertion are selected as trace signals. For instance, if cpu/isdu/state_reg[4:0]
are selected as trace signals, assertions 3, 5 and 7 are covered. The other
assertions listed above are not covered, since the behavior they monitor
cannot be observed when only part of their signal bits is traced.
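The coverage definition above amounts to a subset test. The following is a minimal Python sketch of that test, not the tooling used in this thesis: the names ASSERTION_SIGNALS and covered_assertions are illustrative, and each of the seven example assertions has been reduced by hand to the set of register signals it mentions.

```python
# Register signals referenced by the seven example assertions (hand-extracted).
STATE = "cpu/isdu/state_reg[4:0]"
PC = "cpu/PC/Dout_reg[15:0]"
MAR = "cpu/MAR/Dout_reg[15:0]"
MDR = "cpu/MDR/Dout_reg[15:0]"
IR = "cpu/IR/Dout_reg[15:0]"
HEX = "mem2io/hex_data_reg[15:0]"

# Illustrative mapping: assertion number -> signals that must all be traced.
ASSERTION_SIGNALS = {
    1: {STATE, MAR, PC},
    2: {STATE, PC},
    3: {STATE},
    4: {STATE, MDR, HEX},
    5: {STATE},
    6: {STATE, IR, MDR},
    7: {STATE},
}


def covered_assertions(assertion_signals, traced):
    """Return the assertions whose signals are all contained in `traced`."""
    return sorted(a for a, sigs in assertion_signals.items() if sigs <= traced)


print(covered_assertions(ASSERTION_SIGNALS, {STATE}))      # [3, 5, 7]
print(covered_assertions(ASSERTION_SIGNALS, {STATE, PC}))  # [2, 3, 5, 7]
```

Tracing only the state register covers the three state-transition assertions; adding the complete PC additionally covers assertion 2.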
4.3 PageRank Algorithm
PageRank [70, 75, 76] is a Web page ranking technique that has been a
fundamental ingredient in the development and success of the Google search
engine. It is not the only algorithm used by Google to order search engine
results, but it is the first algorithm that was used by the company, and it
is the best-known. The method is still one of the many signals that Google
uses to determine which pages are most important [77, 78]. The main idea
behind PageRank is to determine the importance of a Web page in terms of
the importance assigned to the pages hyperlinking to it. For instance, suppose
we create a web page i that includes a hyperlink to web page j. If many other
pages also link to j, we consider j important on the web. On the other hand,
if j has only one in-link, but this link comes from an authoritative web page k
(like www.google.com, www.yahoo.com, or www.bing.com), we also consider j
important because k can transfer its popularity or authority to j. Although
PageRank is widely used for networks, it has never been used in the field of
post-silicon trace signal selection. However, a circuit can be transformed into
a directed graph, i.e., a network, using a gate-level netlist parser. Then, the
trace signal selection problem can be formulated as a data-mining problem.
The contributions of this method using PageRank to select trace signals
are summarized as follows:
1. Represents design under test as a network (di-graph).
2. Formulates trace signal selection problem as a data-mining problem
(rank sequential components with respect to their importance).
3. Points out the intrinsic limitations of restoration probability and restora-
tion probability based methods.
4. Proposes a new metric to evaluate the quality of selected trace signals.
The pure PageRank method comprises two phases: network construction
and trace signal selection using the PageRank algorithm. In the network con-
struction phase, we write a SystemVerilog netlist parser to parse the connec-
tion relationship of each circuit element and then construct a corresponding
network, in which logic elements are denoted as nodes and their connections
are denoted as directed edges. Then we directly apply PageRank to select
the most important sequential nodes as trace buffer signals.
4.3.1 Network Construction
Figure 4.2: Example circuit
To formulate the trace signal selection process as a data-mining problem,
we first transform the DUV into a network, i.e., a directed graph, using a
SystemVerilog netlist parser. We wrote a SystemVerilog netlist parser, rather
than relying on the ISCAS 89 benchmarks, since the functionality of those
benchmarks is very hard to interpret. To test the functionality of the selected
trace signals, we synthesized several SystemVerilog projects into gate-level
netlists using Synopsys Design Compiler with a standard library. Nested
SystemVerilog modules are flattened by Design Compiler and eventually
synthesized into a single netlist. The parser extracts the connection
relationship between all circuit elements and then constructs a network such
as that shown in Fig 4.3. Each node is a logic element and each edge is the
connection between a pair of circuit elements. The edges are directed,
recording the circuit topology.
Figure 4.3: Constructed network of example circuit
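The network construction step can be sketched as follows. This is a simplified Python illustration, not the SystemVerilog netlist parser written for this thesis: it assumes the netlist has already been reduced to tuples of (instance, cell type, input nets, output net), and the four-element NETLIST and the function build_network are hypothetical.

```python
from collections import defaultdict

# Hypothetical flattened netlist: (instance, cell type, input nets, output net).
NETLIST = [
    ("A", "DFF", ["n3"], "qa"),
    ("B", "DFF", ["n1"], "qb"),
    ("AND1", "AND", ["qa", "qb"], "n1"),
    ("OR1", "OR", ["qa", "n1"], "n3"),
]


def build_network(netlist):
    """Return {instance: sorted list of instances it drives} as a di-graph."""
    # Each net is driven by exactly one instance in this simplified model.
    driver_of = {out: name for name, _, _, out in netlist}
    edges = defaultdict(set)
    for name, _, inputs, _ in netlist:
        edges[name]  # make sure every logic element appears as a node
        for net in inputs:
            if net in driver_of:  # primary inputs have no driving element
                edges[driver_of[net]].add(name)
    return {node: sorted(sinks) for node, sinks in edges.items()}


network = build_network(NETLIST)
for node in sorted(network):
    print(node, "->", network[node])
```

Every logic element becomes a node, and a directed edge runs from the element driving a net to each element reading that net, which is the orientation used for the transition matrix in the next section.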
4.3.2 Degree Calculation Using PageRank Algorithm
Once the circuit network is constructed, we apply PageRank to compute
the importance/popularity of each node. Suppose, for instance, we have a
directed graph based on the circuit in Fig 4.3 that has only 14 logic elements
(8 sequential elements and 6 combinational logic gates), one for each node.
When node i references node j, we add a directed edge from node i to node j
in the graph. In the PageRank model, each node transfers its importance evenly
to the nodes that it links to. For example, node A has 3 out-links, so it
passes on 1/3 of its importance to node OR1, 1/3 to AND2, and 1/3 to AND3.
In general, if a node has k out-links, it passes on 1/k of its importance to
each of the nodes that it links to. According to this importance transition
rule, we can define the transition matrix of the graph, say P, in which the
entry in row i and column j is the fraction of its importance that node j
passes to node i, as follows:
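\[
P = \left[\begin{array}{*{14}{c}}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
\tfrac{1}{3} & \tfrac{1}{3} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{3} & \tfrac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\tfrac{1}{3} & 0 & \tfrac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\tfrac{1}{3} & 0 & 0 & 0 & 0 & \tfrac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{3} & 0 & 0 & 0 & \tfrac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right]
\]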
Starting with the uniform distribution, the importance of each node is
1/14. Let π denote the initial PageRank value vector, having all entries equal
to 1/14. Because each incoming link increases the PageRank value of a node,
we update the rank of each node by adding to the current value the
importance of the incoming links. This is the same as multiplying the matrix
P by π. Numeric computations are given as follows. We can observe that the
iterations π, Pπ, P^2π, P^3π, ..., P^kπ converge to the all-zero vector in
this case, since we have dangling nodes. However, in other scenarios, where
the matrix P has no rows or columns filled with all 0s, the iterations P^kπ
will converge to a vector π*, which is the PageRank vector of our circuit
network.
\[
\begin{aligned}
\pi &= \tfrac{1}{14}\,[1,\ 1,\ 1,\ 1,\ 1,\ 1,\ 1,\ 1,\ 1,\ 1,\ 1,\ 1,\ 1,\ 1]^{T} \\
P\pi &= [0,\ 0,\ 0.0714,\ 0.0714,\ 0.0714,\ 0.0714,\ 0.0714,\ 0.0714,\ 0.0476,\ 0.1429,\ 0.0595,\ 0.0595,\ 0.0595,\ 0.0595]^{T} \\
P^{2}\pi &= [0,\ 0,\ 0.0476,\ 0.0595,\ 0.0595,\ 0.1429,\ 0.0595,\ 0.0595,\ 0,\ 0.1429,\ 0.0357,\ 0.0357,\ 0.0357,\ 0.0357]^{T} \\
P^{k}\pi &= [0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0]^{T}
\end{aligned}
\tag{4.1}
\]
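The first iterations in Equation (4.1) can be reproduced with a few lines of Python. This is only a sketch: the dictionary OUT_LINKS was read off the columns of P, the nodes are numbered 1 to 14 in matrix order rather than by their names in Fig 4.3, and the helper names transition_matrix and multiply are illustrative.

```python
# Out-links read off the columns of P; nodes are numbered 1..14 in matrix
# order (1-8 are the sequential elements, 9-14 the combinational gates).
OUT_LINKS = {
    1: [9, 12, 13], 2: [9, 11, 14], 3: [11, 12], 4: [10], 5: [10],
    6: [13, 14], 7: [], 8: [], 9: [3], 10: [6], 11: [5], 12: [4],
    13: [7], 14: [8],
}
N = len(OUT_LINKS)


def transition_matrix(out_links):
    """P[i][j] = 1/k when node j+1 has k out-links, one of them to node i+1."""
    n = len(out_links)
    P = [[0.0] * n for _ in range(n)]
    for src, dsts in out_links.items():
        for dst in dsts:
            P[dst - 1][src - 1] = 1.0 / len(dsts)
    return P


def multiply(P, v):
    """Matrix-vector product P v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


P = transition_matrix(OUT_LINKS)
pi = [1.0 / N] * N
p1 = multiply(P, pi)   # P pi
p2 = multiply(P, p1)   # P^2 pi
print([round(x, 4) for x in p1])
print([round(x, 4) for x in p2])

# Importance leaks out through the dangling nodes 7 and 8 (G and H).
v = pi
for _ in range(10):
    v = multiply(P, v)
print(all(x == 0.0 for x in v))  # True
```

Because the example graph contains no cycle, every unit of importance reaches a dangling node after at most eight steps and is then lost, which is why the iteration collapses to the zero vector.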
As demonstrated in our simple circuit example, some nodes, such as node G
and node H, have no out-links; these are called dangling nodes. If we picture
the importance flowing through the network as a random surfer who follows
links from node to node, the surfer gets stuck on these nodes, and the
importance received by these nodes cannot be propagated. In another scenario,
if our network has two disconnected components, a random surfer that starts
from one component has no way to get into the other component, so all nodes
in the other component receive 0 importance. Dangling nodes and disconnected
components are quite common on the Internet as well as in circuits,
considering the large scale of the web and of industrial circuits. To deal
with these two problems, a positive constant d between 0 and 1 (typically
0.15) is introduced, which we call the damping factor [70]. We then modify
the previous transition matrix based on d into P′ = (1 − d) · P + d · R, where
\[
R = \frac{1}{N}
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & 1 & \cdots & 1 \\
\vdots & \vdots & \ddots & \vdots \\
1 & 1 & \cdots & 1
\end{bmatrix}
\]
This new transition matrix models the random walk as follows: most of
the time, a surfer follows links from a node if that node has outgoing
links. A smaller, but positive, percentage of the time, the surfer abandons
the current node, chooses an arbitrary node from the network, and
“teleports” there. The damping factor d reflects the probability that the
surfer quits the current node and “teleports” to a new one. Since the surfer
can teleport to any node, each node has probability 1/N of being chosen. This
justifies the structure of R.
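Each entry of the adjusted matrix follows directly from P′ = (1 − d) · P + d · R. As a minimal sketch for the 14-node example, assuming d = 0.15 (the helper names damped_entry and column_sum are illustrative), the four distinct values that an entry of P can take map as follows:

```python
N = 14     # number of nodes in the example circuit network
D = 0.15   # teleportation probability d used in P' = (1 - d) P + d R


def damped_entry(p, d=D, n=N):
    """Entry of P' corresponding to an entry p of P."""
    return (1 - d) * p + d / n


def column_sum(k, d=D, n=N):
    """Sum of a column of P' for a node with k out-links (k = 0: dangling)."""
    linked = k * damped_entry(1.0 / k, d, n) if k else 0.0
    return linked + (n - k) * damped_entry(0.0, d, n)


for p in (0.0, 1.0 / 3, 1.0 / 2, 1.0):
    print(round(damped_entry(p), 2))  # 0.01, 0.29, 0.44, 0.86

print(round(column_sum(3), 2))  # 1.0
print(round(column_sum(0), 2))  # 0.15
```

A column that belongs to a node with out-links sums to 1, whereas a dangling column sums only to d. Teleportation alone therefore does not stop importance from leaking out of dangling nodes, which is why the PageRank formula also needs a separate dangling-node term.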
After this adjustment, the new matrix P′ for our example (with d = 0.15 and
N = 14, entries rounded to two decimals) is as follows:
\[
P' = \left[\begin{array}{*{14}{c}}
0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 \\
0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 \\
0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.86 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 \\
0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.86 & 0.01 & 0.01 \\
0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.86 & 0.01 & 0.01 & 0.01 \\
0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.86 & 0.01 & 0.01 & 0.01 & 0.01 \\
0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.86 & 0.01 \\
0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.86 \\
0.29 & 0.29 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 \\
0.01 & 0.01 & 0.01 & 0.86 & 0.86 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 \\
0.01 & 0.29 & 0.44 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 \\
0.29 & 0.01 & 0.44 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 \\
0.29 & 0.01 & 0.01 & 0.01 & 0.01 & 0.44 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 \\
0.01 & 0.29 & 0.01 & 0.01 & 0.01 & 0.44 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01
\end{array}\right]
\]
Eventually, a new PageRank vector generated from the iterations P^kπ is
shown as follows:
\[
\pi^{*} = [0.029,\ 0.029,\ 0.043,\ 0.052,\ 0.052,\ 0.104,\ 0.074,\ 0.074,\ 0.032,\ 0.104,\ 0.042,\ 0.042,\ 0.068,\ 0.068]^{T}
\tag{4.2}
\]
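The PageRank formula based on the previous discussion is as follows:
\[
PR(p_i) = \frac{1-d}{N} + d\left( \sum_{p_j \text{ links to } p_i} \frac{PR(p_j)}{L(p_j)} + \sum_{p_j \text{ has no out-links}} \frac{PR(p_j)}{N} \right)
\tag{4.3}
\]
Here N is the number of nodes and L(p_j) is the number of out-links of node
p_j. Note that in Equation (4.3) and in Algorithm 2, d denotes the probability
of following a link (0.85), i.e., the complement of the teleportation
probability d = 0.15 used in the definition of P′ above.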
The pseudocode of our PageRank implementation is given in Algorithm 2. For
the example circuit, if the trace buffer width is 2, the sequential
elements selected would be registers F and G.
Algorithm 2 Pseudocode of the PageRank algorithm
procedure PageRank(G, iteration)
    d ← 0.85                               /* damping factor */
    oh ← out-link count hash from G
    ih ← in-link hash from G
    N ← number of nodes from G
    for p in G do                          /* initialize PageRank */
        opg[p] ← 1/N
    end for
    while iteration > 0 do
        dp ← 0
        for p in G with no out-links do    /* PageRank values from dangling nodes */
            dp ← dp + d × opg[p]/N
        end for
        for p in G do
            npg[p] ← dp + (1 − d)/N        /* PageRank values from random jumps */
            for ip in ih[p] do             /* PageRank values from in-links */
                npg[p] ← npg[p] + d × opg[ip]/oh[ip]
            end for
        end for
        opg ← npg
        iteration ← iteration − 1
    end while
    return opg
end procedure
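Algorithm 2 translates almost line by line into Python. The version below is a sketch under stated assumptions rather than the implementation used for the experiments: the graph is passed as a dictionary of out-link lists, and the four-node EXAMPLE graph (with one dangling node, D) is hypothetical.

```python
def pagerank(out_links, iterations=100, d=0.85):
    """Iterative PageRank following Algorithm 2; returns {node: rank}."""
    nodes = list(out_links)
    n = len(nodes)
    # In-link lists derived from the out-link lists.
    in_links = {p: [] for p in nodes}
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links[dst].append(src)
    rank = {p: 1.0 / n for p in nodes}          # uniform initialization
    for _ in range(iterations):
        # Share of rank redistributed evenly from dangling nodes.
        dangling = sum(d * rank[p] / n for p in nodes if not out_links[p])
        new_rank = {}
        for p in nodes:
            value = dangling + (1.0 - d) / n    # random jumps
            for q in in_links[p]:               # contributions from in-links
                value += d * rank[q] / len(out_links[q])
            new_rank[p] = value
        rank = new_rank
    return rank


# Hypothetical graph: D is a dangling node.
EXAMPLE = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": []}
ranks = pagerank(EXAMPLE)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

With the dangling-node term included, the ranks always sum to 1, so the importance lost in the undamped iteration is preserved. Selecting trace signals then amounts to sorting the sequential nodes by rank and taking as many as the trace buffer width allows.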
CHAPTER 5
EXPERIMENTAL RESULTS
5.1 Signal Restoration Ratio Comparison
We applied our approach to a standard LC3B processor and a cache coherence
controller (MESI ISC), comparing our two methods with the current best
restoration-probability-based method, created by Basu [20], to show the
effectiveness of our algorithms. We designed an event-driven simulator
along the lines of that described in [15], which conducts simulation in
both forward and backward directions. The simulator is implemented as an
iterative process that terminates when it is not possible to restore any
more states. We fed the simulator with 10 sets of random values and
recorded the average restoration ratio.
The experimental results are shown in Table 5.1.
Table 5.1: Restoration ratio of three methods
Method    LC3B  MESI ISC
Basu      1.3   5.1
HITS      3.3   /
PageRank  3.4   2.2
Although our two methods, the HITS method and the PageRank method, were
not designed to maximize restoration ratio, Table 5.1 shows that their
restoration ratios are still higher than that of Basu’s method for the
LC3B benchmark. The possible reasons for this phenomenon are summarized
as follows:
• Basu’s method selects incomplete and sparse trace signals; hence, we
know neither the stage the processor is in nor the instruction it is
executing.
• All the previous works use the ISCAS 89 benchmarks, which only include
simple logic gates: AND, OR and NOT. Owing to the characteristics of these
three simple logic gates, their restorability is quite high. In this
thesis, however, we use real projects implemented in SystemVerilog and
synthesize them with an industry-standard tool, Design Compiler, which
generates more complicated logic components such as adders, subtractors,
muxes and tri-state buffers. These logic components make it harder to
obtain a deterministic restored value.
• Besides more complicated combinational logic elements, the sequential
element, the D flip-flop (DFF) synthesized by Design Compiler, is much
closer to industrial practice than the DFF used in ISCAS 89. The DFF
synthesized by Design Compiler has a control signal synch_enable, which is
the load signal of the DFF. This is a typical industry-standard DFF. In
contrast, the DFF used in ISCAS 89 has no load signal; that is to say, the
output is always updated at the rising edge of the clock, which only
represents a small portion of the DFFs used in circuits nowadays.
– ISCAS 89 DFF: dff DFF_22(Clk,g6031,g6027);
– Design Compiler synthesized DFF:
**SEQGEN** cpu/PC/Dout_reg[15]
( .clear( 0 net ), .preset(1'b0), .next_state(cpu/PCin [15]),
.clocked_on(Clk), .data_in(1'b0), .enable(1'b0),
.Q(cpu/PCout [15]), .synch_clear(1'b0), .synch_preset(1'b0),
.synch_toggle(1'b0), .synch_enable(cpu/LD_PC ) );
• Basu’s method does not select cpu/isdu/state_reg[4:0], which are the
most important signals in LC3B, indicating which stage the processor is
currently in. These 5 control signals control the synch_enable signal of
the majority of the DFFs in LC3B through combinational circuits, such
as muxes and tri-state buffers. The direct consequence of not selecting
cpu/isdu/state_reg[4:0] at gate level is that the load signal of each
DFF is undetermined, and so is the output of each DFF. Therefore,
the restorability is severely limited.
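The effect of an unknown load signal on restoration can be illustrated with three-valued (0/1/X) logic. The sketch below is an illustration only, not the event-driven simulator used in our experiments; the function names dff_next and iscas_dff_next are hypothetical, and X is modeled as Python's None.

```python
X = None  # unknown logic value


def dff_next(q, next_state, synch_enable):
    """Next output of a load-enabled DFF under three-valued logic."""
    if synch_enable == 1:
        return next_state          # load the new value
    if synch_enable == 0:
        return q                   # hold the old value
    # Load signal unknown: the output is determined only when holding
    # and loading would give the same known value.
    return q if (q is not X and q == next_state) else X


def iscas_dff_next(d_in):
    """ISCAS 89 style DFF without a load signal: always captures its input."""
    return d_in


print(dff_next(q=0, next_state=1, synch_enable=1))  # 1
print(dff_next(q=0, next_state=1, synch_enable=0))  # 0
print(dff_next(q=0, next_state=1, synch_enable=X))  # None (X)
print(iscas_dff_next(1))                            # 1
```

When the state register is not traced, synch_enable is X for most flip-flops, so their next values cannot be restored even if the data input is known; a DFF without a load signal has no such dependency.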
5.2 Functionality Coverage Comparison
5.2.1 LC3B Processor
The LC-3 processor is a simple micro-controller with 15 types of instructions
implemented for its ISA. The architecture is a load-store architecture; values
in memory must be brought into the register file before they can be operated
upon. It specifies a word size of 16 bits for its registers and uses a 16-bit
addressable memory with a 2^16-location address space. The register file
contains eight registers, referred to by number as R0 through R7. All of the
registers are general-purpose in that they may be freely used by any of the
instructions that can write to the register file, but in some contexts (such as
translating from C code to LC-3 assembly or JUMP instruction) some of the
registers are used for special purposes.
Arithmetic instructions available include addition, bitwise AND, and bit-
wise NOT, with the first two of these able to use both registers and sign-
extended immediate values as operands. These operations are sufficient to
implement a number of basic arithmetic operations, including subtraction
(by negating values) and bitwise left shift (by using the addition instruction
to multiply values by two). The LC-3 can also implement any bitwise logical
function, because NOT and AND together are logically complete.
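The derived operations mentioned above can be checked with a small Python model of the three arithmetic/logic instructions on 16-bit words. This is an illustrative sketch only; the function names are hypothetical and do not correspond to any module of the LC3B design.

```python
MASK = 0xFFFF  # LC-3 words are 16 bits wide


def add(a, b):
    """ADD: 16-bit addition with wrap-around."""
    return (a + b) & MASK


def and_(a, b):
    """AND: bitwise AND."""
    return a & b & MASK


def not_(a):
    """NOT: bitwise complement."""
    return ~a & MASK


def sub(a, b):
    """Subtraction by negation: a - b = a + (NOT b + 1) in two's complement."""
    return add(a, add(not_(b), 1))


def shl1(a):
    """Left shift by one bit, implemented as a + a."""
    return add(a, a)


def or_(a, b):
    """OR built from NOT and AND via De Morgan's law."""
    return not_(and_(not_(a), not_(b)))


print(sub(5, 3))                 # 2
print(hex(sub(3, 5)))            # 0xfffe, i.e. -2 in two's complement
print(hex(shl1(0x8001)))         # 0x2
print(hex(or_(0x00F0, 0x0F00)))  # 0xff0
```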
Memory accesses can be performed by computing addresses based on the
current value of the program counter (PC) or a register in the register file;
additionally, the LC-3 provides indirect loads and stores, which use a piece
of data in memory as an address to load data from or store data to. Values
in memory must be brought into the register file before they can be used as
part of an arithmetic or logical operation.
The LC-3 provides both conditional and unconditional control flow instruc-
tions. Conditional branches are based on the arithmetic sign (negative, zero,
or positive) of the last piece of data written into the register file. Uncondi-
tional branches may move execution to a location given by a register value
or a PC-relative offset. Three instructions (JSR, JSRR, and TRAP) support
the notion of subroutine calls by storing the address of the code calling the
subroutine into a register before changing the value of the program counter.
The LC-3 does not support the direct arithmetic comparison of two values;
comparing two register values arithmetically requires subtracting one from
the other and evaluating the result.
Figure 5.1: High-level block diagram of LC-3 micro-controller
The main components of the LC-3 microarchitecture are the data path and the
control unit, as shown in Fig 5.1. The behavior of the LC-3 microarchitecture
during a given clock cycle is completely determined by 49 control signals,
combined with nine bits of additional information (inst[15:11], PSR[15],
BEN, INT, and R), as shown in Fig 5.1. During each clock cycle, 39 of these
control signals determine the processing of information in the data path,
and the other 10 control signals combine with the nine bits of additional
information to determine which set of control signals will be required in
the next clock cycle. We say that these 49 control signals specify the state
of the control structure of the LC-3 microarchitecture. The behavior of the
LC-3 microarchitecture can be completely described by the state machine
depicted in Fig 5.2.
To test the functionalities of the LC-3 micro-controller, we wrote
SystemVerilog Assertions to monitor the behaviors of the control unit (Fig
5.2), which governs the flow of the entire LC-3 micro-controller. In each
stage, the operations performed and the state transition are represented as
assertions. For example, in stage 18, the operation MAR ← PC is transformed
into assertion A1:
(cpu/isdu/state_reg[4:0] == 5'd18) |-> (cpu/MAR/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0]).
Overall, we wrote 79 assertions monitoring only sequential logic elements to
thoroughly test the functionalities of the LC-3 micro-controller, and these
assertions are listed in Table 5.2.
Figure 5.2: Finite state machine in LC-3 control unit
The LC-3 micro-controller we implemented has 216 registers and 1341
combinational logic elements, including basic logic gates, such as AND, OR
and NOT, as well as more advanced components such as full-adder, tri-state
buffer, MUX and buffer. Since LC-3 is a 16-bit architecture, we need at
least 16 registers selected to have a complete design component such as PC,
MAR and MDR. Therefore, we selected approximately 20% of registers to be
Table 5.2: Assertion table of LC3. Stage: stage number in the LC3 control
unit finite state machine (Fig 5.2). Name: name of the assertion. Assertion:
assertion contents.
Stage  Name  Assertion
18     A1    (cpu/isdu/state_reg[4:0] == 5'd18) |-> (cpu/MAR/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0])
18     A2    (cpu/isdu/state_reg[4:0] == 5'd18) |-> ##1 (cpu/PC/Dout_reg[15:0] == $past(cpu/PC/Dout_reg[15:0]) + 16'd1)
18     A3    (cpu/isdu/state_reg[4:0] == 5'd18) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd33)
33     A4    (cpu/isdu/state_reg[4:0] == 5'd33) |-> (cpu/MDR/Dout_reg[15:0] == mem2io/hex_data_reg[15:0])
33     A5    (cpu/isdu/state_reg[4:0] == 5'd33) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd35)
35     A6    (cpu/isdu/state_reg[4:0] == 5'd35) |-> (cpu/IR/Dout_reg[15:0] == cpu/MDR/Dout_reg[15:0])
35     A7    (cpu/isdu/state_reg[4:0] == 5'd35) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd32)
32     A8    (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd1) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd1)
32     A9    (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd2) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd2)
32     A10   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd3) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd3)
32     A11   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd4) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd4)
32     A12   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd5) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd5)
32     A13   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd6) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd6)
32     A14   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd7) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd7)
32     A15   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd8) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd8)
32     A16   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd9) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd9)
32     A17   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd10) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd10)
32     A18   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd11) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd11)
32     A19   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd12) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd12)
32     A20   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd13) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd13)
32     A21   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd14) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd14)
32     A22   (cpu/isdu/state_reg[4:0] == 5'd32 && cpu/IR/Dout_reg[15:12] == 4'd15) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd15)
1      A23   (cpu/isdu/state_reg[4:0] == 5'd1 && cpu/IR/Dout_reg[5] == 1'd0) |-> (cpu/Registers/DR/Dout_reg[15:0] == cpu/Registers/SR1/Dout_reg[15:0] + cpu/Registers/SR2/Dout_reg[15:0])
1      A24   (cpu/isdu/state_reg[4:0] == 5'd1 && cpu/IR/Dout_reg[5] == 1'd1) |-> (cpu/Registers/DR/Dout_reg[15:0] == cpu/Registers/SR1/Dout_reg[15:0] + cpu/IR/Dout_reg[4:0])
1      A25   (cpu/isdu/state_reg[4:0] == 5'd1) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
5      A26   (cpu/isdu/state_reg[4:0] == 5'd5 && cpu/IR/Dout_reg[5] == 1'd0) |-> (cpu/Registers/DR/Dout_reg[15:0] == cpu/Registers/SR1/Dout_reg[15:0] & cpu/Registers/SR2/Dout_reg[15:0])
5      A27   (cpu/isdu/state_reg[4:0] == 5'd5 && cpu/IR/Dout_reg[5] == 1'd1) |-> (cpu/Registers/DR/Dout_reg[15:0] == cpu/Registers/SR1/Dout_reg[15:0] & cpu/IR/Dout_reg[4:0])
5      A28   (cpu/isdu/state_reg[4:0] == 5'd5) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
9      A29   (cpu/isdu/state_reg[4:0] == 5'd9) |-> (cpu/Registers/DR/Dout_reg[15:0] == cpu/Registers/SR/Dout_reg[15:0])
9      A30   (cpu/isdu/state_reg[4:0] == 5'd9) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
15     A31   (cpu/isdu/state_reg[4:0] == 5'd15) |-> (cpu/MAR/Dout_reg[15:0] == {8'd0, cpu/IR/Dout_reg[7:0]})
15     A32   (cpu/isdu/state_reg[4:0] == 5'd15) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd28)
28     A33   (cpu/isdu/state_reg[4:0] == 5'd28) |-> (cpu/MDR/Dout_reg[15:0] == mem2io/hex_data_reg[15:0])
28     A34   (cpu/isdu/state_reg[4:0] == 5'd28) |-> (cpu/Registers/R7/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0])
28     A35   (cpu/isdu/state_reg[4:0] == 5'd28) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd30)
30     A36   (cpu/isdu/state_reg[4:0] == 5'd30) |-> (cpu/PC/Dout_reg[15:0] == cpu/MDR/Dout_reg[15:0])
30     A37   (cpu/isdu/state_reg[4:0] == 5'd30) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
14     A38   (cpu/isdu/state_reg[4:0] == 5'd14) |-> (cpu/Registers/DR/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0] + cpu/IR/Dout_reg[9:0])
14     A39   (cpu/isdu/state_reg[4:0] == 5'd14) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
10     A40   (cpu/isdu/state_reg[4:0] == 5'd10) |-> (cpu/MAR/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0] + cpu/IR/Dout_reg[8:0])
10     A41   (cpu/isdu/state_reg[4:0] == 5'd10) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd24)
24     A42   (cpu/isdu/state_reg[4:0] == 5'd24) |-> (cpu/MDR/Dout_reg[15:0] == mem2io/hex_data_reg[15:0])
24     A43   (cpu/isdu/state_reg[4:0] == 5'd24) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd26)
2      A44   (cpu/isdu/state_reg[4:0] == 5'd2) |-> (cpu/MAR/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0] + cpu/IR/Dout_reg[8:0])
2      A45   (cpu/isdu/state_reg[4:0] == 5'd2) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd25)
26     A46   (cpu/isdu/state_reg[4:0] == 5'd26) |-> (cpu/MAR/Dout_reg[15:0] == cpu/MDR/Dout_reg[15:0])
26     A47   (cpu/isdu/state_reg[4:0] == 5'd26) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd25)
25     A48   (cpu/isdu/state_reg[4:0] == 5'd25) |-> (cpu/MDR/Dout_reg[15:0] == mem2io/hex_data_reg[15:0])
25     A49   (cpu/isdu/state_reg[4:0] == 5'd25) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd27)
27     A50   (cpu/isdu/state_reg[4:0] == 5'd27) |-> (cpu/Registers/DR/Dout_reg[15:0] == cpu/MDR/Dout_reg[15:0])
27     A51   (cpu/isdu/state_reg[4:0] == 5'd27) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
6      A52   (cpu/isdu/state_reg[4:0] == 5'd6) |-> (cpu/MAR/Dout_reg[15:0] == cpu/Registers/DR/Dout_reg[15:0] + cpu/IR/Dout_reg[5:0])
6      A53   (cpu/isdu/state_reg[4:0] == 5'd6) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd25)
11     A54   (cpu/isdu/state_reg[4:0] == 5'd11) |-> (cpu/MAR/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0] + cpu/IR/Dout_reg[8:0])
11     A55   (cpu/isdu/state_reg[4:0] == 5'd11) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd29)
29     A56   (cpu/isdu/state_reg[4:0] == 5'd29) |-> (cpu/MDR/Dout_reg[15:0] == mem2io/hex_data_reg[15:0])
29     A57   (cpu/isdu/state_reg[4:0] == 5'd29) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd31)
7      A58   (cpu/isdu/state_reg[4:0] == 5'd7) |-> (cpu/MAR/Dout_reg[15:0] == cpu/Registers/DR/Dout_reg[15:0] + cpu/IR/Dout_reg[5:0])
7      A59   (cpu/isdu/state_reg[4:0] == 5'd7) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd23)
31     A60   (cpu/isdu/state_reg[4:0] == 5'd31) |-> (cpu/MAR/Dout_reg[15:0] == cpu/MDR/Dout_reg[15:0])
31     A61   (cpu/isdu/state_reg[4:0] == 5'd31) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd23)
3      A62   (cpu/isdu/state_reg[4:0] == 5'd3) |-> (cpu/MAR/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0] + cpu/IR/Dout_reg[8:0])
3      A63   (cpu/isdu/state_reg[4:0] == 5'd3) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd23)
23     A64   (cpu/isdu/state_reg[4:0] == 5'd23) |-> (cpu/MDR/Dout_reg[15:0] == cpu/Registers/SR/Dout_reg[15:0])
23     A65   (cpu/isdu/state_reg[4:0] == 5'd23) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd16)
16     A66   (cpu/isdu/state_reg[4:0] == 5'd16) |-> (mem2io/hex_data_reg[15:0] == cpu/MDR/Dout_reg[15:0])
16     A67   (cpu/isdu/state_reg[4:0] == 5'd16) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
0      A68   (cpu/isdu/state_reg[4:0] == 5'd0) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
22     A69   (cpu/isdu/state_reg[4:0] == 5'd22) |-> (cpu/PC/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0] + cpu/IR/Dout_reg[8:0])
22     A70   (cpu/isdu/state_reg[4:0] == 5'd22) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
12     A71   (cpu/isdu/state_reg[4:0] == 5'd12) |-> (cpu/PC/Dout_reg[15:0] == cpu/Registers/BaseR/Dout_reg[15:0])
12     A72   (cpu/isdu/state_reg[4:0] == 5'd12) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
4      A73   (cpu/isdu/state_reg[4:0] == 5'd4) |-> (cpu/Registers/R7/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0])
4      A74   (cpu/isdu/state_reg[4:0] == 5'd12 && cpu/IR/Dout_reg[11] == 1'd1) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd21)
4      A75   (cpu/isdu/state_reg[4:0] == 5'd12 && cpu/IR/Dout_reg[11] == 1'd0) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd20)
21     A76   (cpu/isdu/state_reg[4:0] == 5'd21) |-> (cpu/PC/Dout_reg[15:0] == cpu/PC/Dout_reg[15:0] + cpu/IR/Dout_reg[10:0])
21     A77   (cpu/isdu/state_reg[4:0] == 5'd21) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
20     A78   (cpu/isdu/state_reg[4:0] == 5'd20) |-> (cpu/PC/Dout_reg[15:0] == cpu/Registers/BaseR/Dout_reg[15:0])
20     A79   (cpu/isdu/state_reg[4:0] == 5'd20) |-> ##1 (cpu/isdu/state_reg[4:0] == 5'd18)
Table 5.3: Selected trace buffer signals by Basu
Name  Signal
S1    cpu/PC/Dout_reg[15:8][6][4:0]
S2    cpu/MDR/Dout_reg[15][13][6][3][0]
S3    cpu/IR/Dout_reg[15][3]
S4    cpu/Registers/R1/Dout_reg[0]
S5    cpu/Registers/R5/Dout_reg[3]
S6    cpu/Registers/R3/Dout_reg[15][3]
S7    cpu/Registers/R0/Dout_reg[13][3]
S8    cpu/Registers/R2/Dout_reg[13][3]
S9    cpu/Registers/R7/Dout_reg[6][4][3]
S10   cpu/Registers/R4/Dout_reg[13][3]
S11   cpu/Registers/R6/Dout_reg[15][0]
S12   mem2io/hex_data_reg[15][3]
S13   cpu/MAR/Dout_reg[15][3]
Table 5.4: Selected trace buffer signals by HITS
Name  Signal
S14   cpu/PC/Dout_reg[15:0]
S15   cpu/MDR/Dout_reg[15:0]
S16   cpu/isdu/state_reg[4:0]
S17   cpu/Registers/R0/Dout_reg[7]
S18   cpu/Registers/R1/Dout_reg[0]
S19   cpu/IR/Dout_reg[5]
loaded onto the on-chip trace buffer. Therefore, for each method, we selected
40 trace signals based on their importance evaluated in that specific method.
The selected trace signals of each method are presented in Tables 5.3, 5.4
and 5.5.
Functionality coverage for each method is summarized in Tables 5.6 and
5.7, and both restoration ratio and functionality coverage are compared and
plotted in Fig 5.7. From our experimental results, we can conclude that our
PageRank algorithm without restoration probability selects the best signals
in terms of both functionality coverage and restoration ratio; the HITS algo-
rithm with respect to restorability also selects trace signals with good quality
for debugging. However, the best existing method maximizing SRR selects
very sparse and incomplete trace signals compared to the other two methods
we implemented.
Table 5.5: Selected trace buffer signals by PageRank
Name  Signal
S14   cpu/PC/Dout_reg[15:0]
S16   cpu/isdu/state_reg[4:0]
S20   mem2io/hex_data_reg[15:0]
S21   cpu/Registers/R5/Dout_reg[15]
S22   cpu/Registers/R4/Dout_reg[15]
S23   cpu/MAR/Dout_reg[15]
1. The major drawback of Basu’s method is the incompleteness of the selected
registers. Unless a complete 16-bit PC or 16-bit IR is selected, we cannot
tell which instruction is currently being executed. Unless a complete
set of registers is selected, we cannot really test any functionality. No
matter how many additional trace signals we select, unless a complete
set of signals is selected, functionality coverage is always 0, as shown in
Fig 5.6.
2. Another major drawback is the diluted degree of control signals. As
discussed at the beginning of Chapter 4, the most important signals,
the isdu state signals, which denote the stage LC-3 is in, are severely
diluted since they are connected to many other combinational logic
elements. If these isdu state signals are not selected, 43 functionalities
(assertions) are not covered, since the most important control state
transition information is not tested (Table 5.6).
3. Although the HITS algorithm does not select the best set of trace
signals in terms of functionality coverage, it would become especially
useful when dynamic trace buffer signal selection is enabled, since we
can categorize sequential elements into hubs and authorities. If we detect
errors while observing authority nodes as trace buffer signals in
post-silicon debugging, we can dynamically switch the trace signals to
hub nodes to analyze or even localize the cause of the bugs.
4. The covered assertion distributions of each method are shown
in Fig 5.3, 5.4 and 5.5. In each pie chart, every sector represents an
assertion. Assertion A1 in Table 5.2 is at the 12 o’clock position of the
chart. The rest of the assertions, A2-A79, are located clockwise in the
Table 5.6: Functionality coverage comparison between each method.
Signals: selected trace buffer signals. Assertions: assertions covered by each
set of selected signals. Refer to Table 5.2, 5.3, 5.4 and 5.5 for the number of
Signals and Assertions.
Basu HITS PageRank
Signals Assertions Signals Assertions Signals Assertions
S1
S16
A3,A5,A7-A22
S16
A3,A5,A7-A22
S2 A25,A28,A30 A25,A28,A30
S3 A32,A35,A37 A32,A35,A37
S4 A39,A41,A43 A39,A41,A43
S5 A45,A47,A49 A45,A47,A49
S6 A51,A53,A55 A51,A53,A55
S7 A57,A59,A61 A57,A59,A61
S8 A63,A65,A67 A63,A65,A67
S9 A68,A70,A72 A68,A70,A72
S10 A77,A79 A77,A79
S11
S14,S16
A2,A69,A76
S14,S16
A2,A69,A76
S12 A74,A75 A74,A75
S13
S14-S16 S14,S16,S20
A36
A1,A4,A31
A36 A33,A40,A42
A54,A56,A58
A60,A62,A66
S17-19 S21-S23
chart. If an assertion is covered, its color is red; otherwise, its color is
grey. As shown in these three graphs, the assertions covered by HITS
and PageRank methods are evenly distributed among all stages of LC3
control unit finite state machine; that is to say, besides high function-
ality coverage, our methods also guarantee even coverage distribution,
which is another very important criterion for debugging.
Table 5.7: Functionality coverage of LC3
Method    Assertions Covered  % of Functionalities Covered
Basu      0                   0%
HITS      49                  62.0%
PageRank  62                  78.5%
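The percentages in Table 5.7 are simply the number of covered assertions divided by the 79 assertions of Table 5.2. A short Python check (the names COVERED and coverage_percent are illustrative):

```python
TOTAL_ASSERTIONS = 79  # assertions A1-A79 in Table 5.2
COVERED = {"Basu": 0, "HITS": 49, "PageRank": 62}


def coverage_percent(covered, total=TOTAL_ASSERTIONS):
    """Functionality coverage in percent, rounded to one decimal."""
    return round(100.0 * covered / total, 1)


for method, count in COVERED.items():
    print(method, coverage_percent(count))  # 0.0, 62.0, 78.5
```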
Figure 5.3: Assertion distribution of Basu’s method. If a sector is grey, that
assertion is not covered. If a sector is red, that assertion is covered. Basu’s
method covers no assertions.
5.2.2 Cache Coherence Controller
In a shared memory multiprocessor system with a separate cache memory for
each processor, it is possible to have many copies of any instruction operand:
one copy in the main memory and one in each cache memory. When one
copy of an operand is changed, the other copies of the operand must be
changed also. Cache coherence is the discipline that ensures that changes
in the values of shared operands are propagated throughout the system in a
timely fashion. If this coherency is not guaranteed, data inconsistency will
jeopardize the functionality of the multi-core system.
Many cache coherence protocols have been proposed to keep the caches
of different processors consistent; the cache coherence controller used in this
experiment is based on the MESI protocol. The MESI protocol is used when a
system has multiple CPUs, each of which has its own local caches;
the cache policy applied in this protocol is write-back. In this protocol, each
cache line is in one of four states: Modified, Exclusive, Shared or Invalid. State
Modified means the cache line is owned only by the current cache, and it is dirty
(modified). That is to say, the current data is local to the current cache; it is
Figure 5.4: Assertion distribution of HITS method. If a sector is grey, that
assertion is not covered. If a sector is red, that assertion is covered. HITS
method covers evenly distributed assertions among all stages of LC3 control
unit finite state machine; this guarantees all functionalities are tested in
post-silicon validation.
different from what is stored in main memory. State Exclusive means the
cache line is owned only by the current cache, but it is not dirty. That is to say,
the current data is local to the current cache; however, it is the same as the data
stored in main memory. State Shared means that not only the current cache but
also other caches hold the cache line, and it is not dirty. That is to say,
whenever the data is modified, the other caches sharing this data receive the
updated data. State Invalid means the cache line is invalid and does not
contain valid data.
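The four states described above can be summarized by three properties per state. The following is a minimal sketch, assuming only what the text states; the type name `Mesi`, the table `PROPERTIES` and the helper functions are hypothetical names introduced for illustration and do not come from the MESI ISC source code.

```python
from enum import Enum

class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

# Per state: (holds valid data, dirty with respect to main memory,
#             other caches may hold the same line)
PROPERTIES = {
    Mesi.MODIFIED:  (True,  True,  False),
    Mesi.EXCLUSIVE: (True,  False, False),
    Mesi.SHARED:    (True,  False, True),
    Mesi.INVALID:   (False, False, False),
}

def is_valid(state):
    return PROPERTIES[state][0]

def is_dirty(state):
    return PROPERTIES[state][1]

def may_be_shared(state):
    return PROPERTIES[state][2]

for state in Mesi:
    print(state.value, is_valid(state), is_dirty(state), may_be_shared(state))
```

Only the Modified state is dirty, so under the write-back policy it is the only state whose line must be written back to main memory before it is invalidated.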
The source code tested in this project implements a four-core system, where each
core has ten local caches, each of which stores 32-bit data. The legacy
structure of the system, which includes the memory matrix, the arbitration
and the memory, remains unchanged. A coherence system contains, in addition
to the legacy components, the MESI ISC, the coherency ports of the masters
and the coherency bus. The MESI ISC has two connection types. In one direction,
it connects to the main bus as a slave and receives the bus’s controls
and addresses. In the other direction, it connects to the coherency bus as a
Figure 5.5: Assertion distribution of PageRank method. If a sector is grey,
that assertion is not covered. If a sector is red, that assertion is covered.
PageRank method covers evenly distributed assertions among all stages of
LC3 control unit finite state machine; this guarantees all functionalities are
tested in post-silicon validation.
master. The MESI ISC has two ports for each system master: a main bus port
and a coherency bus port. The MESI ISC receives broadcast requests
from the system masters through the main bus. It sends the write snoop,
read snoop, write-enable and read-enable signals to the system masters through the
coherency bus. The MESI ISC splits each broadcast request sent by a master
(the initiator) into a separate snoop request for every master except the
initiator. Complete information regarding the implementation of the MESI ISC
can be downloaded from opencores.com; details about the implementation
will not be discussed further in this thesis.
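The fan-out of a broadcast request into per-master snoop requests can be sketched as follows. This is an illustrative model only, assuming four masters numbered 0 to 3; the record types `Broadcast` and `Snoop` and their field names are hypothetical and are not taken from the MESI ISC implementation.

```python
from typing import NamedTuple

class Broadcast(NamedTuple):
    initiator: int  # index of the master that sent the request
    kind: str       # "read" or "write"
    address: int

class Snoop(NamedTuple):
    target: int     # index of the master that receives the snoop
    kind: str
    address: int

def split_broadcast(request, num_masters=4):
    """Turn one broadcast into one snoop per master, skipping the initiator."""
    return [
        Snoop(master, request.kind, request.address)
        for master in range(num_masters)
        if master != request.initiator
    ]

snoops = split_broadcast(Broadcast(initiator=2, kind="write", address=0x40))
print([s.target for s in snoops])
```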
Not all write and read actions trigger cache coherency interactions;
only three scenarios trigger coherency interactions between CPUs. The
first is a read miss: when the current cache state is I and the system intends to
read data from this cache, no valid data is stored in the cache, so the
current CPU has to read the data from memory and update its state
to S. The second is a write miss: when the current cache state is I and the
system intends to write data to the cache, the current CPU first has to
read the data at this address from main memory and write it to its
Figure 5.6: Functionality coverage of three methods: Basu, HITS and
PageRank
local cache. Correspondingly, its cache state is updated to E; then, the
current CPU writes the new data into its local cache and updates its state to
M. The third is a write to a shared cache line: when the current cache state is S
and the current CPU wants to write data into its local cache, it
sends a write broadcast signal to all other CPUs and updates its cache state
to E; once all other CPUs have sent back acknowledge signals, the current
CPU resumes the write and writes to its local cache. Correspondingly,
its cache state is updated to M. These three scenarios are labeled in
Fig. 5.9.
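The three scenarios can be written down as a small lookup, seen from the initiating cache. This is a sketch that encodes only the state sequences stated in the text above; the scenario keys and function names are hypothetical labels introduced for illustration.

```python
# State sequence of the initiator's cache line in each scenario,
# as described in the text (I, S, E, M are MESI states).
SCENARIOS = {
    "read_miss":    ["I", "S"],
    "write_miss":   ["I", "E", "M"],
    "write_shared": ["S", "E", "M"],
}

def classify(state, operation):
    """Return the scenario that a read or write triggers, or None if the
    access causes no coherency interaction according to the text."""
    if state == "I" and operation == "read":
        return "read_miss"
    if state == "I" and operation == "write":
        return "write_miss"
    if state == "S" and operation == "write":
        return "write_shared"
    return None

def transitions(scenario):
    """List the individual state changes of a scenario as (from, to) pairs."""
    path = SCENARIOS[scenario]
    return list(zip(path, path[1:]))

for state in "MESI":
    for operation in ("read", "write"):
        print(state, operation, classify(state, operation))
```

Each pair returned by `transitions` corresponds to one state change that an assertion on the initiator could monitor.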
Detailed interactions in the system for the above three scenarios are tabulated
in Tables 5.8, 5.9 and 5.10 for clarity. The scenarios are write to shared
cache line, write miss and read miss. To test the functionality, we wrote
42 assertions monitoring the transactions described in these
tables. These assertions are not listed in this thesis; please
refer to the appendix of [79].
Overall, there are 748 registers and 1194 combinational elements in this
MESI ISC. Since the address width is 32 bits, we have to select at least 32
bits to get meaningful debugging information. We again select around 20% of the
registers as trace buffer signals, which amounts to 140 registers. The
trace signals selected by the existing method proposed by Basu [20] and by the
Table 5.8: Write to shared cache line
Stage  Source                Destination           Bus        Operation
1      Initiator             Coherency Controller  Main       Send write broadcast
2      Coherency Controller  Initiator             Main       Acknowledge write broadcast request
3      Coherency Controller  Snooper               Coherency  Write Snoop
4      Snooper               Internal              -          Invalidate valid line. Cache state: S → I
5      Snooper               Coherency Controller  Coherency  Acknowledge write snoop
6      Coherency Controller  Initiator             Coherency  Enable write
7      Snooper               Internal              -          Write to cache. Cache state: S → M
Table 5.9: Write miss
Stage  Source                Destination           Bus        Operation
1      Initiator             Coherency Controller  Main       Send write broadcast
2      Coherency Controller  Initiator             Main       Acknowledge write broadcast request
3      Coherency Controller  Snooper               Coherency  Write Snoop
4      Snooper               Internal              -          M: Evict a dirty line; E/S → I; I: do nothing
5      Snooper               Memory                Main       Write back line to memory. Cache state: M → I
6      Snooper               Coherency Controller  Coherency  Acknowledge write snoop
7      Coherency Controller  Initiator             Coherency  Enable write
8      Initiator             Memory                Main       Read line
9      Initiator             Internal              -          Cache state: I → E
10     Initiator             Internal              -          Write to cache. Cache state: E → M
Figure 5.7: Restoration ratio vs. functionality coverage of three methods:
Basu, HITS and PageRank. SRR: the highest SRR is set to 100%, the rest
of the SRRs are normalized with respect to the highest SRR. Functionality
coverage: percentage of covered assertions.
PageRank algorithm proposed in this thesis are presented in Tables 5.11
and 5.12. We compare only these two methods because, as discussed in
previous chapters, the HITS algorithm with restoration ratio is mainly useful
for a dynamic trace buffer. In terms of functionality coverage, its performance
is not as good as that of the pure PageRank algorithm, since it still takes restoration
ratio into consideration.
The functionality coverage of each method is summarized in Tables 5.13 and
5.14, and both restoration ratio and functionality coverage are compared and
plotted in Fig. 5.12. From our experimental results, we can conclude that our
PageRank algorithm without restoration probability selects the best signals
in terms of functionality coverage but not restoration ratio. In contrast, the best
existing method, which maximizes SRR, selects the best signals in terms of restoration
ratio but not functionality coverage, and its trace signals are very sparse and
incomplete compared to those of the PageRank method we implemented.
1. The major drawbacks of Basu’s method, namely the incompleteness of the
selected registers and the diluted degree of control signals, are again
reflected in this benchmark.
• mesi_isc_broad/broad_fifo/data_o_reg[40:0], which is the combined
signal of broad_snoop_address, broad_snoop_type,
Figure 5.8: Finite state machine of MESI protocol
broad_snoop_cpu_id and broad_snoop_id, indicating the status of
the coherency bus, is the most important signal in this mesi_isc.
Without selecting these signals, we have complete information
about neither the coherency bus nor the status and actions of all
CPUs.
• Based on the current cache status of the 4 CPUs,
mesi_isc_broad/broad_fifo/data_o_reg[40:0] decides the status
of their cache lines in the next cycle. Since this signal is computed
through many combinational logic gates, the restoration probabilities
of the edges connected to
mesi_isc_broad/broad_fifo/data_o_reg[40:0] are very low. Therefore,
mesi_isc_broad/broad_fifo/data_o_reg[40:0] has very low
priority in Basu’s method, and only a few bits are selected.
• This mesi_isc is implemented using a basic fifo structure. Within
each fifo, there are four fifo entries (registers) storing the data and
shifting the data out to be the fifo output. Since this is not a standard
fifo, fifo_entry_n becomes fifo_entry_{n−1} at every clock
cycle. Therefore, the more fifo entries we select, the more redundant
information we have. In addition, fifos are usually standard
Figure 5.9: Scenarios trigger cache coherence interactions among processors
IPs in industry; therefore, their reliability as design modules is
very high, since they have either been used for many years or
been formally verified. Hence, selecting different entries inside
a fifo is practically useless; the only useful signals of a fifo
are flag signals such as fifo full or fifo empty.
2. Based on our observation, restoration ratio and functionality coverage
have no direct relationship; they are neither positively nor negatively
correlated. Instead of maximizing the restoration ratio, which is an
irrelevant metric, we should get rid of it and try to select trace signals
covering more functionalities to help debugging.
3. The distribution of covered assertions for each method is shown
in Figs. 5.10 and 5.11. In each pie chart, every sector represents an
assertion. Assertion A1 in [79] is at the 12 o’clock position of the
chart. The rest of the assertions, A2-A42, are located clockwise in
the chart. If an assertion is covered, its sector is red; otherwise, it
is grey. As these two figures show, the assertions covered by the
PageRank method are evenly distributed among all cache coherence
transactions. That is, besides high functionality coverage, our
method also provides an even coverage distribution, which is another
Table 5.10: Read miss
Stage  Source                Destination           Bus        Operation
1      Initiator             Coherency Controller  Main       Send read broadcast
2      Coherency Controller  Initiator             Main       Acknowledge read broadcast request
3      Coherency Controller  Snooper               Coherency  Read Snoop
4      Snooper               Internal              -          M: Write back dirty line; E → S; S/I: do nothing
5      Snooper               -                     Main       Write back line to memory. Cache state: M → S
6      Snooper               Coherency Controller  Coherency  Acknowledge read snoop
7      Coherency Controller  Initiator             Coherency  Enable read
8      Initiator             Memory                Main       Read line
9      Initiator             -                     -          Cache state: I → S
very important criterion for debugging.
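The redundancy argument for fifo entries in item 1 can be checked on a toy model. The sketch below assumes a four-entry register chain that shifts on every clock cycle, as the text describes; it is a simplified illustration, not a model of the actual mesi_isc fifo, and the function and variable names are hypothetical.

```python
DEPTH = 4  # four fifo entries, as stated in the text

def simulate(inputs):
    """Return one snapshot of the entries per clock cycle.

    entries[0] is the fifo output; on every cycle entry n moves to
    entry n-1 and the new value enters at the last position.
    """
    entries = [None] * DEPTH
    history = []
    for value in inputs:
        entries = entries[1:] + [value]
        history.append(list(entries))
    return history

history = simulate(list(range(10, 20)))

# The value held in entry k at cycle t appears at the output k cycles later,
# so tracing the output alone already captures what the other entries held.
redundant = all(
    history[t][k] == history[t + k][0]
    for t in range(len(history) - DEPTH)
    for k in range(DEPTH)
)
print(redundant)
```

Under this shifting assumption, tracing every entry stores the same values several times, which is why only the output and the status flags carry new information.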
Table 5.11: Selected trace buffer signals by Basu
Name  Signal
S1    mesi_isc_breq_fifos/fifo_0/ptr_wr_reg[0]
S2    mesi_isc_broad/broad_fifo/ptr_wr_reg[0]
S3    mesi_isc_breq_fifos/fifo_0/entry_reg[0][17,33,35,36,37,40]
S4    mesi_isc_breq_fifos/fifo_0/entry_reg[1][2,8,13,16,19,27,33,35,36,37,40]
S5    mesi_isc_breq_fifos/fifo_0/entry_reg[3][36]
S6    mesi_isc_breq_fifos/fifo_1/entry_reg[0][33,35,36,37,40]
S7    mesi_isc_breq_fifos/fifo_1/entry_reg[1][2,33,35,36,37,40]
S8    mesi_isc_breq_fifos/fifo_2/entry_reg[0][33,35,36,37,40]
S9    mesi_isc_breq_fifos/fifo_2/entry_reg[1][33,35,36,37,40]
S10   mesi_isc_breq_fifos/fifo_3/entry_reg[0][2,33,35,36,37,40]
S11   mesi_isc_breq_fifos/fifo_3/entry_reg[1][33,35,36,37,40]
S12   mesi_isc_breq_fifos/fifo_0/data_o_reg[9,10,12,16,23,27,33,35,36,37,40]
S13   mesi_isc_breq_fifos/fifo_1/data_o_reg[9,10,12,16,23,27,33,36,37,40]
S14   mesi_isc_breq_fifos/fifo_2/data_o_reg[9,10,12,16,23,27,33,35,36,37,40]
S15   mesi_isc_breq_fifos/fifo_3/data_o_reg[9,10,12,16,23,27,33,35,36,37,40]
S16   mesi_isc_breq_fifos/mesi_isc_breq_fifo_cntl/breq_id_base_reg[0]
S17   mesi_isc_breq_fifos/mesi_isc_breq_fifo_cntl/breq_type_array_o_reg[1]
S18   mesi_isc_broad/broad_fifo/entry_reg[3][9,10,12,16,23,33,37]
S19   mesi_isc_broad/broad_fifo/entry_reg[2][9,10,12,16,23,33,36,37]
S20   mesi_isc_broad/broad_fifo/entry_reg[1][9,10,12,16,23,27,33,35,36,37,40]
S21   mesi_isc_broad/broad_fifo/entry_reg[0][9,10,12,16,23,35,36,37,40]
S22   mesi_isc_broad/broad_fifo/data_o_reg[0,9,10,16,36,37]
Table 5.12: Selected trace buffer signals by PageRank method
Name  Signal
S23   mesi_isc_broad/broad_fifo/data_o_reg[0:40]
S24   mesi_isc_broad/mesi_isc_broad_cntl/broad_fifo_rd_o_reg
S25   mesi_isc_breq_fifos/mesi_isc_breq_fifo_cntl/mbus_ack_array_reg[0:3]
S26   mesi_isc_breq_fifos/fifo_0/status_full_reg
S27   mesi_isc_breq_fifos/fifo_1/status_full_reg
S28   mesi_isc_breq_fifos/fifo_2/status_full_reg
S29   mesi_isc_breq_fifos/fifo_3/status_full_reg
S30   mesi_isc_breq_fifos/fifo_0/status_empty_reg
S31   mesi_isc_breq_fifos/fifo_1/status_empty_reg
S32   mesi_isc_breq_fifos/fifo_2/status_empty_reg
S33   mesi_isc_breq_fifos/fifo_3/status_empty_reg
S34   mesi_isc_broad/mesi_isc_broad_cntl/cbus_active_broad_array_reg[0:3]
S35   mesi_isc_breq_fifos/fifo_0/data_o_reg[31:39]
S36   mesi_isc_breq_fifos/fifo_1/data_o_reg[31:39]
S37   mesi_isc_breq_fifos/fifo_2/data_o_reg[31:39]
S38   mesi_isc_breq_fifos/fifo_3/data_o_reg[31:39]
S39   mesi_isc_broad/broad_fifo/entry_reg[0][31:40]
S40   mesi_isc_broad/broad_fifo/entry_reg[1][31:40]
S41   mesi_isc_broad/broad_fifo/entry_reg[2][31:40]
S42   mesi_isc_broad/broad_fifo/entry_reg[3][31:40]
S43   mesi_isc_broad/broad_fifo/status_full_reg
S44   mesi_isc_broad/broad_fifo/status_empty_reg
Table 5.13: Functionality coverage comparison between Basu’s method and
PageRank method. Signals: selected trace buffer signals. Assertions:
assertions covered by each set of selected signals. Refer to [79], Table 5.11
and Table 5.12 for the numbering of signals and assertions.
Basu                    PageRank
Signals   Assertions    Signals        Assertions
S1        -             S23, S25, S34  A3, A4, A6
S2        -                            A7, A8, A9
S3        -                            A13, A14, A17
S4        -                            A18, A19, A20
S5        -                            A26, A30, A31
S6        -             S24            -
S7        -             S26            A33
S8        -             S27            A34
S9        -             S28            A35
S10       -             S29            A36
S11       -             S30            A37
S12       -             S31            A38
S13       -             S32            A39
S14       -             S33            A40
S15       -             S35            -
S16       -             S36            -
S17       -             S37            -
S18       -             S38            -
S19       -             S39            -
S20       -             S40            -
S21       -             S41            -
S22       -             S42            -
                        S43            A41
                        S44            A42
Table 5.14: Functionality coverage of MESI ISC
Method     Assertions Covered   % of Functionalities Covered
Basu       0                    0%
PageRank   25                   59%
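The functionality coverage figures follow directly from the assertion counts. The sketch below recomputes them, assuming coverage is defined as covered assertions divided by total assertions (79 for LC3, 42 for MESI ISC); the helper names are hypothetical, and the covered-assertion list for PageRank is transcribed from Table 5.13.

```python
def expand(spec):
    """Expand a list such as 'A3,A7-A9' into the set {3, 7, 8, 9}."""
    ids = set()
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        first = int(low.lstrip("A"))
        last = int(high.lstrip("A")) if high else first
        ids.update(range(first, last + 1))
    return ids

def coverage(covered, total):
    """Functionality coverage in percent, rounded to one decimal place."""
    return round(100.0 * covered / total, 1)

# Assertions covered by the PageRank selection on MESI ISC (Table 5.13).
mesi_pagerank = expand(
    "A3,A4,A6,A7,A8,A9,A13,A14,A17,A18,A19,A20,A26,A30,A31,A33-A40,A41,A42"
)

print(len(mesi_pagerank), coverage(len(mesi_pagerank), 42))  # MESI ISC, PageRank
print(coverage(49, 79), coverage(62, 79))                    # LC3: HITS, PageRank
```

With these counts, the MESI ISC coverage of the PageRank selection is 25/42, or about 59.5%, which Table 5.14 reports as 59%; the LC3 values reproduce the 62.0% and 78.5% of Table 5.7.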
Figure 5.10: Assertion distribution of Basu’s method. If a sector is grey,
that assertion is not covered. If a sector is red, that assertion is covered.
Basu’s method covers no assertions.
Figure 5.11: Assertion distribution of PageRank method. If a sector is grey,
that assertion is not covered. If a sector is red, that assertion is covered.
PageRank method covers evenly distributed assertions; this guarantees all
functionalities are tested in post-silicon validation.
Figure 5.12: Restoration ratio vs. functionality coverage of two methods:
Basu and PageRank. SRR: the highest SRR is set to 100%; the rest of the
SRRs are normalized with respect to the highest SRR. Functionality
coverage: percentage of covered assertions.
CHAPTER 6
SUMMARY AND FUTURE WORK
6.1 Conclusion
In summary, we have formulated the classic trace signal selection problem in
post-silicon validation as a data-mining problem and solved it using both the HITS
algorithm and the PageRank algorithm. To the best of our knowledge, since the
signal restoration concept was introduced in 2009 [15], no one has tried
to solve the trace signal selection problem using a data-mining approach as we
have done. Furthermore, all previous works have focused on increasing
visibility, and none of them has explained how the selected trace signals
are used during debugging. The current best work was proposed in 2011 by
Basu et al. [20]; our method shows clear potential to outperform theirs in
terms of the quality of selected trace signals for debugging. The contributions
of the two proposed methods in this thesis are summarized as follows.
1. Represents design under test as a network (di-graph).
2. Formulates the trace signal selection problem as a data-mining problem
(ranking sequential components with respect to their importance).
3. Points out the intrinsic limitations of restoration probability and restora-
tion probability based methods.
4. Proposes a new metric, functionality coverage, to evaluate the quality
of selected trace signals.
5. Selects trace signals with much better quality in terms of functionality
coverage, which helps engineers with debugging in post-silicon valida-
tion.
6. With dynamic trace buffer infrastructure, enables the possibility to
dynamically select signals with different characteristics.
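Contributions 1 and 2 can be illustrated with a minimal PageRank ranking over a small directed graph. This is a sketch of the standard algorithm only, not the exact formulation used in this thesis; the graph, the node names, the damping factor of 0.85 and the edge direction (from a register to the register it feeds) are assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Standard PageRank by power iteration on (src, dst) edge pairs."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank

def select_trace_signals(rank, budget):
    """Pick the `budget` highest-ranked nodes as trace signals."""
    return sorted(rank, key=rank.get, reverse=True)[:budget]

# Hypothetical design graph: three registers feed "ctrl", which feeds "r0".
EDGES = [("r0", "ctrl"), ("r1", "ctrl"), ("r2", "ctrl"), ("ctrl", "r0")]
rank = pagerank(EDGES)
selected = select_trace_signals(rank, budget=2)
print(selected)
```

In this toy graph the node fed by the most registers ranks highest, so it is the first candidate for the trace buffer.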
6.2 Future Work
For future work, we need to optimize our algorithm to improve the runtime
of signal selection for large-scale circuits. Processing
a full-size industrial circuit takes several hours with the current implementation,
which leaves plenty of room for improvement. In addition, we will
further investigate the weight assignment for hub and authority scores and
conduct more experiments to determine the most effective weight assignment,
as well as other manipulations that break the clustering effect and
select different sets of signals for dynamic trace signal selection. Last but
not least, we might modify the initial weight of each node in the PageRank
algorithm to reflect the interests of design and verification engineers. In a
real-life scenario, it is reasonable to expect that a designer has determined
some important signals that must be traced. We can formulate a constrained
signal selection problem in which a set of trace signals is already provided by
the designer and the remaining signals have to be determined to improve
overall debugging performance.
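One possible reading of this proposal is sketched below. It assumes that designer interest is expressed as a per-node preference weight that biases the teleport distribution of PageRank (often called personalized PageRank), and that designer-mandated signals simply occupy part of the trace budget. Both choices are assumptions made for illustration; the graph, weights and function names are hypothetical.

```python
def personalized_pagerank(edges, preference, damping=0.85, iterations=100):
    """PageRank whose teleport distribution follows `preference` weights."""
    nodes = sorted({n for e in edges for n in e} | set(preference))
    total = sum(preference.get(n, 0.0) for n in nodes)
    teleport = {n: preference.get(n, 0.0) / total for n in nodes}
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Rank of nodes without outgoing edges returns via the teleport vector.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1.0 - damping + damping * dangling) * teleport[n] for n in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank

def constrained_selection(rank, required, budget):
    """Keep the designer's signals, then fill the budget by descending rank."""
    if len(required) > budget:
        raise ValueError("more required signals than trace buffer budget")
    chosen = list(required)
    for node in sorted(rank, key=rank.get, reverse=True):
        if len(chosen) >= budget:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen

EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
uniform = personalized_pagerank(EDGES, {"a": 1, "b": 1, "c": 1, "d": 1})
biased = personalized_pagerank(EDGES, {"a": 1, "b": 1, "c": 1, "d": 5})
chosen = constrained_selection(biased, required=["d"], budget=3)
print(chosen)
```

Raising the preference weight of a node raises its score even when no other node points to it, and the constrained selection always keeps the designer's signals before filling the remaining slots.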
REFERENCES
[1] I. Wagner and V. Bertacco, Post-Silicon and Runtime Verification for
Modern Processors. Springer, 2010.
[2] M. Abramovici, P. Bradley, K. Dwarakanath, P. Levin, G. Memmi,
and D. Miller, “A reconfigurable design-for-debug infrastructure for
SoCs,” in Proceedings of the 43rd annual Design Automation Confer-
ence. ACM, 2006, pp. 7–12.
[3] P. Patra, “On the cusp of a validation wall,” Design & Test of Comput-
ers, IEEE, vol. 24, no. 2, pp. 193–196, 2007.
[4] S. Yerramilli, “Addressing post-silicon validation challenge: Leverage
validation and test synergy,” in Keynote, Intl. Test Conf, 2006.
[5] S. Mitra, S. A. Seshia, and N. Nicolici, “Post-silicon validation opportu-
nities, challenges and recent advances,” in Design Automation Confer-
ence (DAC), 2010 47th ACM/IEEE. IEEE, 2010, pp. 12–17.
[6] D. Lin and S. Mitra, “QED post-silicon validation and debug: Frequently
asked questions.” in ASP-DAC, 2014, pp. 478–482.
[7] X. Liu and Q. Xu, “Interconnection fabric design for tracing signals
in post-silicon validation,” in Proceedings of the 46th Annual Design
Automation Conference. ACM, 2009, pp. 352–357.
[8] I. Wagner and V. Bertacco, “Reversi: Post-silicon validation system for
modern microprocessors,” in Computer Design, 2008. ICCD 2008. IEEE
International Conference on. IEEE, 2008, pp. 307–314.
[9] M. W. Heath, W. P. Burleson, and I. G. Harris, “Synchro-tokens: eliminating
nondeterminism to enable chip-level test of globally-asynchronous
locally-synchronous SoC’s,” in Proceedings of the Conference on Design,
Automation and Test in Europe-Volume 1. IEEE Computer Society,
2004, p. 10410.
[10] D. Josephson, “The good, the bad, and the ugly of silicon debug,” in
Proceedings of the 43rd Annual Design Automation Conference. ACM,
2006, pp. 3–6.
[11] I. Silas, I. Frumkin, E. Hazan, E. Mor, and G. Zobin, “System-level
validation of the Intel Pentium M processor,” Intel Technology Journal,
vol. 7, no. 2, pp. 37–43, 2003.
[12] T. J. Foster, D. L. Lastor, and P. Singh, “First silicon functional valida-
tion and debug of multicore microprocessors,” Very Large Scale Integra-
tion (VLSI) Systems, IEEE Transactions on, vol. 15, no. 5, pp. 495–504,
2007.
[13] P. Lisherness and K.-T. Cheng, “An instrumented observability coverage
method for system validation,” in High Level Design Validation and Test
Workshop, 2009. HLDVT 2009. IEEE International. IEEE, 2009, pp.
88–93.
[14] A. Adir, A. Nahir, A. Ziv, C. Meissner, and J. Schumann, “Reaching
coverage closure in post-silicon validation.” in Haifa Verification Con-
ference. Springer, 2010, pp. 60–75.
[15] H. F. Ko and N. Nicolici, “Algorithms for state restoration and trace-
signal selection for data acquisition in silicon debug,” Computer-Aided
Design of Integrated Circuits and Systems, IEEE Transactions on,
vol. 28, no. 2, pp. 285–297, 2009.
[16] X. Liu and Q. Xu, “Trace signal selection for visibility enhancement
in post-silicon validation,” in Proceedings of the Conference on Design,
Automation and Test in Europe. European Design and Automation
Association, 2009, pp. 1338–1343.
[17] X. Liu and Q. Xu, “On signal selection for visibility enhancement in
trace-based post-silicon validation,” Computer-Aided Design of Inte-
grated Circuits and Systems, IEEE Transactions on, vol. 31, no. 8, pp.
1263–1274, 2012.
[18] H. F. Ko and N. Nicolici, “Automated trace signals selection using the
RTL descriptions,” in Test Conference (ITC), 2010 IEEE International.
IEEE, 2010, pp. 1–10.
[19] K. Basu and P. Mishra, “RATS: restoration-aware trace signal selection
for post-silicon validation,” Very Large Scale Integration (VLSI) Systems,
IEEE Transactions on, vol. 21, no. 4, pp. 605–613, 2013.
[20] K. Basu and P. Mishra, “Efficient trace signal selection for post sili-
con validation and debug,” in VLSI Design (VLSI Design), 2011 24th
International Conference on. IEEE, 2011, pp. 352–357.
[21] D. Chatterjee, C. McCarter, and V. Bertacco, “Simulation-based signal
selection for state restoration in silicon debug,” in Computer-Aided De-
sign (ICCAD), 2011 IEEE/ACM International Conference on. IEEE,
2011, pp. 595–601.
[22] A. Nahir, A. Ziv, R. Galivanche, A. Hu, M. Abramovici, A. Camilleri,
B. Bentley, H. Foster, V. Bertacco, and S. Kapoor, “Bridging pre-silicon
verification and post-silicon validation,” in Proceedings of the 47th De-
sign Automation Conference. ACM, 2010, pp. 94–95.
[23] H. F. Ko and N. Nicolici, “Combining scan and trace buffers for enhanc-
ing real-time observability in post-silicon debugging.” in European Test
Symposium, 2010, pp. 62–67.
[24] K. Basu, P. Mishra, and P. Patra, “Constrained signal selection for
post-silicon validation,” in 2012 IEEE International High Level Design
Validation and Test Workshop (HLDVT). IEEE, 2012, pp. 71–75.
[25] K. Han, J.-S. Yang, and J. A. Abraham, “Dynamic trace signal selec-
tion for post-silicon validation,” in VLSI Design and 2013 12th Inter-
national Conference on Embedded Systems (VLSID), 2013 26th Inter-
national Conference on. IEEE, 2013, pp. 302–307.
[26] K. Rahmani and P. Mishra, “Efficient signal selection using fine-grained
combination of scan and trace buffers,” in VLSI Design and 2013 12th
International Conference on Embedded Systems (VLSID), 2013 26th In-
ternational Conference on. IEEE, 2013, pp. 308–313.
[27] K. Han, J.-S. Yang, and J. A. Abraham, “Enhanced algorithm of com-
bining trace and scan signals in post-silicon validation,” in VLSI Test
Symposium (VTS), 2013 IEEE 31st. IEEE, 2013, pp. 1–6.
[28] M. Li and A. Davoodi, “A hybrid approach for fast and accurate trace
signal selection for post-silicon debug,” in Proceedings of the Conference
on Design, Automation and Test in Europe. EDA Consortium, 2013,
pp. 485–490.
[29] A. DeOrio, J. Li, and V. Bertacco, “Bridging pre- and post-silicon
debugging with BiPeD,” in Computer-Aided Design (ICCAD), 2012
IEEE/ACM International Conference on. IEEE, 2012, pp. 95–100.
[30] M. Li and A. Davoodi, “Multi-mode trace signal selection for post-silicon
debug.” in ASP-DAC, 2014, pp. 640–645.
[31] S. Prabhakar and M. Hsiao, “Using non-trivial logic implications for
trace buffer-based silicon debug,” in Asian Test Symposium, 2009.
ATS’09. IEEE, 2009, pp. 131–136.
[32] K. Zhao and J. Bian, “Pruning-based trace signal selection algorithm,”
in Proceedings of the 16th Asia and South Pacific Design Automation
Conference. IEEE Press, 2011, pp. 639–644.
[33] K. Rahmani, P. Mishra, and S. Ray, “Scalable trace signal selection
using machine learning,” in Computer Design (ICCD), 2013 IEEE 31st
International Conference on. IEEE, 2013, pp. 384–389.
[34] H. Shojaei and A. Davoodi, “Trace signal selection to enhance timing
and logic visibility in post-silicon validation,” in Proceedings of the In-
ternational Conference on Computer-Aided Design. IEEE Press, 2010,
pp. 168–172.
[35] N. Nicolici and H. F. Ko, “Design-for-debug for post-silicon validation:
Can high-level descriptions help?” in High Level Design Validation and
Test Workshop, 2009. HLDVT 2009. IEEE International. IEEE, 2009,
pp. 172–175.
[36] K. Olukotun, M. Heinrich, and D. Ofelt, “Digital system simulation:
methodologies and examples,” in Proceedings of the 35th annual Design
Automation Conference. ACM, 1998, pp. 658–663.
[37] C. R. Ho, M. Theobald, B. Batson, J. Grossman, S. C. Wang,
J. Gagliardo, M. M. Deneroff, R. O. Dror, and D. E. Shaw, “Post-silicon
debug using formal verification waypoints,” in Design and Verification
Conf, 2009.
[38] T. Hong, Y. Li, S.-B. Park, D. Mui, D. Lin, Z. A. Kaleq, N. Hakim,
H. Naeimi, D. S. Gardner, and S. Mitra, “QED: Quick error detection
tests for effective post-silicon validation,” in Test Conference (ITC),
2010 IEEE International. IEEE, 2010, pp. 1–10.
[39] D. Lin, T. Hong, Y. Li, F. Fallah, D. S. Gardner, N. Hakim, and S. Mi-
tra, “Overcoming post-silicon validation challenges through quick error
detection (QED),” in Design, Automation & Test in Europe Conference
& Exhibition (DATE), 2013. IEEE, 2013, pp. 320–325.
[40] D. Lin, T. Hong, Y. Li, S. Kumar, F. Fallah, N. Hakim, D. Gardner,
S. Mitra et al., “Effective post-silicon validation of system-on-chips using
quick error detection,” Computer-Aided Design of Integrated Circuits
and Systems, IEEE Transactions on, vol. 33, no. 10, pp. 1573–1590,
2014.
[41] D. Lin, T. Hong, F. Fallah, N. Hakim, and S. Mitra, “Quick detection of
difficult bugs for effective post-silicon validation,” in Design Automation
Conference (DAC), 2012 49th ACM/EDAC/IEEE. IEEE, 2012, pp.
561–566.
[42] S.-B. Park and S. Mitra, “IFRA: instruction footprint recording and
analysis for post-silicon bug localization in processors,” in Design Au-
tomation Conference, 2008. DAC 2008. 45th ACM/IEEE. IEEE, 2008,
pp. 373–378.
[43] S.-B. Park, T. Hong, and S. Mitra, “Post-silicon bug localization in
processors using instruction footprint recording and analysis (IFRA),”
Computer-Aided Design of Integrated Circuits and Systems, IEEE
Transactions on, vol. 28, no. 10, pp. 1545–1558, 2009.
[44] S.-B. Park and S. Mitra, “IFRA: Post-silicon bug localization in
processors,” in High Level Design Validation and Test Workshop, 2009.
HLDVT 2009. IEEE International. IEEE, 2009, pp. 154–159.
[45] S.-B. Park, A. Bracy, H. Wang, and S. Mitra, “BLoG: Post-silicon bug
localization in processors using bug localization graphs,” in Proceedings
of the 47th Design Automation Conference. ACM, 2010, pp. 368–373.
[46] K.-h. Chang, I. L. Markov, and V. Bertacco, “Automating post-silicon
debugging and repair,” in Proceedings of the 2007 IEEE/ACM inter-
national conference on Computer-aided design. IEEE Press, 2007, pp.
91–98.
[47] A. Krstic, L.-C. Wang, K.-T. Cheng, T. Mak et al., “Diagnosis-based
post-silicon timing validation using statistical tools and methodologies.”
in ITC. Citeseer, 2003, pp. 339–348.
[48] F. M. De Paula, A. J. Hu, and A. Nahir, “NuTAB-backspace: rewriting
to normalize non-determinism in post-silicon debug traces,” in Computer
Aided Verification. Springer, 2012, pp. 513–531.
[49] F. M. De Paula, M. Gort, A. J. Hu, S. J. Wilton, and J. Yang,
“Backspace: formal analysis for post-silicon debug,” in Proceedings of
the 2008 International Conference on Formal Methods in Computer-
Aided Design. IEEE Press, 2008, p. 5.
[50] M. Gort, F. M. De Paula, J. J. Kuan, T. M. Aamodt, A. J. Hu,
S. J. Wilton, and J. Yang, “Formal-analysis-based trace computation
for post-silicon debug,” Very Large Scale Integration (VLSI) Systems,
IEEE Transactions on, vol. 20, no. 11, pp. 1997–2010, 2012.
[51] M. Bushnell and V. D. Agrawal, Essentials of electronic testing for dig-
ital, memory and mixed-signal VLSI circuits. Springer, 2000, vol. 17.
[52] X. Liu and Q. Xu, “On reusing test access mechanisms for debug data
transfer in SoC post-silicon validation,” in Asian Test Symposium, 2008.
ATS’08. 17th. IEEE, 2008, pp. 303–308.
[53] D. Josephson and B. Gottlieb, “The crazy mixed up world of silicon
debug,” in Custom integrated circuits conference, 2004, pp. 665–670.
[54] H. F. Ko and N. Nicolici, “Functional scan chain design at RTL for
skewed-load delay fault testing,” in Asian test symposium, 2004, pp. 454–459.
[55] G.-J. van Rootselaar and B. Vermeulen, “Silicon debug: scan chains
alone are not enough,” in Test Conference, 1999. Proceedings. Interna-
tional. IEEE, 1999, pp. 892–902.
[56] X. Liu and Q. Xu, “On multiplexed signal tracing for post-silicon val-
idation,” Computer-Aided Design of Integrated Circuits and Systems,
IEEE Transactions on, vol. 32, no. 5, pp. 748–759, 2013.
[57] E. Anis and N. Nicolici, “On using lossless compression of debug data
in embedded logic analysis,” in Test Conference, 2007. ITC 2007. IEEE
International. IEEE, 2007, pp. 1–10.
[58] G. Miller, B. Bhattarai, Y.-C. Hsu, J. Dutt, X. Chen, and G. Bakewell,
“A method to leverage pre-silicon collateral and analysis for post-silicon
testing and validation,” in Proceedings of the 48th Design Automation
Conference. ACM, 2011, pp. 575–578.
[59] A. Adir, S. Copty, S. Landa, A. Nahir, G. Shurek, A. Ziv, C. Meissner,
and J. Schumann, “A unified methodology for pre-silicon verification
and post-silicon validation,” in Design, Automation & Test in Europe
Conference & Exhibition (DATE), 2011. IEEE, 2011, pp. 1–6.
[60] A. Adir, A. Nahir, G. Shurek, A. Ziv, C. Meissner, and J. Schumann,
“Leveraging pre-silicon verification resources for the post-silicon
validation of the IBM POWER7 processor,” in Design Automation Conference
(DAC), 2011 48th ACM/EDAC/IEEE. IEEE, 2011, pp. 569–574.
[61] A. DeOrio, A. Bauserman, and V. Bertacco, “Post-silicon verification
for cache coherence,” in Computer Design, 2008. ICCD 2008. IEEE
International Conference on. IEEE, 2008, pp. 348–355.
[62] A. Chatterjee, S. Deyati, B. Muldrey, S. Devarakond, and A. Banerjee,
“Validation signature testing: a methodology for post-silicon valida-
tion of analog/mixed-signal circuits,” in Proceedings of the International
Conference on Computer-Aided Design. ACM, 2012, pp. 553–556.
[63] A. DeOrio, D. S. Khudia, and V. Bertacco, “Post-silicon bug diagnosis
with inconsistent executions,” in Proceedings of the International Con-
ference on Computer-Aided Design. IEEE Press, 2011, pp. 755–761.
[64] M. Abramovici, “In-system silicon validation using a reconfigurable
platform,” in High Level Design Validation and Test Workshop, 2008.
HLDVT’08. IEEE International. IEEE, 2008, pp. 73–73.
[65] M. Abramovici, “In-system silicon validation and debug,” Design & Test
of Computers, IEEE, vol. 25, no. 3, pp. 216–223, 2008.
[66] N. Grover and R. Wason, “Comparative analysis of PageRank and HITS
algorithms,” International Journal of Engineering Research and
Technology, vol. 1, no. 8 (October-2012). ESRSA Publications, 2012.
[67] H. Deng, M. R. Lyu, and I. King, “A generalized Co-HITS algorithm
and its application to bipartite graphs,” in Proceedings of the 15th
ACM SIGKDD international conference on Knowledge discovery and
data mining. ACM, 2009, pp. 239–248.
[68] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon, “PageRank, HITS
and a unified framework for link analysis,” in Proceedings of the 25th
annual international ACM SIGIR conference on Research and develop-
ment in information retrieval. ACM, 2002, pp. 353–354.
[69] M. Franceschet, “PageRank: Standing on the shoulders of giants,”
Communications of the ACM, vol. 54, no. 6, pp. 92–101, 2011.
[70] L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank citation
ranking: Bringing order to the web,” Stanford University, Technical
Report, 1998.
[71] K. Basu, P. Mishra, P. Patra, A. Nahir, and A. Adir, “Dynamic selec-
tion of trace signals for post-silicon debug,” in Microprocessor Test and
Verification (MTV), 2013 14th International Workshop on, Dec 2013,
pp. 62–67.
[72] S.-Y. Chen, M.-Y. Hsiao, W.-B. Jone, and T.-F. Chen, “A configurable
bus-tracer for error reproduction in post-silicon validation,” in VLSI
Design, Automation, and Test (VLSI-DAT), 2013 International Sympo-
sium on. IEEE, 2013, pp. 1–4.
[73] M. Boule, J.-S. Chenard, and Z. Zilic, “Assertion checkers in verification,
silicon debug and in-field diagnosis,” in Quality Electronic Design, 2007.
ISQED’07. 8th International Symposium on. IEEE, 2007, pp. 613–620.
[74] A. Gupta, “Assertion-based verification turns the corner,” IEEE Design
& Test of Computers, vol. 19, no. 4, pp. 131–132, 2002.
[75] T. Haveliwala, “Efficient computation of PageRank,” Stanford
University, Technical Report, 1999.
[76] L. Yan, Y. Wei, Z. Gui, and Y. Chen, “Research on PageRank and
hyperlink-induced topic search in web structure mining,” in Internet
Technology and Applications (iTAP), 2011 International Conference on.
IEEE, 2011, pp. 1–4.
[77] T. Haveliwala, S. Kamvar, and G. Jeh, “An analytical comparison of
approaches to personalizing PageRank,” Stanford University, Technical
Report, 2003.
[78] B. Frikh, B. Ouhbi, and A. Ameur, “A comparative study of link analysis
algorithms for information retrieval,” in Next Generation Networks and
Services (NGNS), 2012. IEEE, 2012, pp. 54–58.
[79] S. Ma, “Cache coherence verification of a MESI intersection controller,”
2013, unpublished technical report of a class project.
