Formal Reliability Analysis Using Theorem Proving by Hasan, Osman et al.
IEEE TRANSACTIONS ON COMPUTERS 1
Formal Reliability Analysis using Theorem
Proving
Osman Hasan, Member, IEEE, Sofiene Tahar, Senior Member, IEEE, and Naeem Abbasi, Student Member,
IEEE
Abstract—Reliability analysis has become a tool of fundamental importance to virtually all electrical and computer engineers because
of the extensive usage of hardware systems in safety and mission critical domains, such as medicine, military and transportation. Due
to the strong relationship between reliability theory and probabilistic notions, computer simulation techniques have been traditionally
used to perform reliability analysis. However, simulation provides less accurate results and cannot handle large-scale systems due to
its enormous CPU time requirements. To ensure accurate and complete reliability analysis and thus more reliable hardware designs,
we propose to conduct a formal reliability analysis of systems within the sound core of a higher-order-logic theorem prover (HOL). In
this paper, we present the higher-order-logic formalization of some fundamental reliability theory concepts, which can be built upon to
precisely analyze the reliability of various engineering systems. The proposed approach and formalization are then utilized to analyze
the repairability conditions for a reconfigurable memory array in the presence of stuck-at and coupling faults.
Index Terms—Formal models, Performance and Reliability, Theorem proving, Memory Structures.
1 INTRODUCTION
Reliability analysis involves the application of various
mathematical techniques for the prediction of reliability
related parameters, such as a system’s resistance to fail-
ure and its ability to perform a required function under
some given conditions. Reliability analysis relies heavily
on the concepts of probability and statistics due to the
enormous amount of random or unpredictable compo-
nents associated with the reliability parameters of a sys-
tem. Examples include failures due to fabrication faults
and electromigration phenomena in System-on-Chips
(SoCs). Moreover, these systems act upon and within
complex environments that themselves have certain el-
ements of unpredictability, such as corrosion, vibration
and temperature variations. Due to these random com-
ponents, establishing the reliability of a system under all
circumstances usually becomes impractically expensive.
The engineering approach to analyze the reliability of a
system with these kinds of unavoidable elements of ran-
domness and uncertainty is to use probabilistic analysis.
The main idea is to mathematically model the random
and unpredictable elements of the given system and
its environment by appropriate random variables. The
probabilistic and statistical properties of these random
variables are then used to judge the system’s reliability
regarding parameters of interest.
∙ The authors are with the Department of Electrical and Computer Engineering,
Concordia University, Montreal, QC H3G 1M8, Canada (e-mail:
{o hasan,tahar,n ab}@ece.concordia.ca).
Today, simulation is the most commonly used computer-based
reliability analysis technique. Most simulation software
packages provide a programming environment for
defining functions that approximate random variables
for probability distributions. The random elements in
a given system are modeled by these functions and
the system is analyzed using computer simulation
techniques [1], such as the Monte Carlo Method [2], where
the main idea is to approximately answer a query on a
probability distribution by analyzing a large number of
samples. Statistical quantities, such as average and vari-
ance, may then be calculated, based on the data collected
during the sampling process, using their mathematical
relations in a computer. Due to the inherent nature of
simulation, the reliability analysis results attained by this
technique can never be termed as 100% accurate. More-
over, simulation requires an enormous amount of CPU
time for attaining meaningful estimates. We generally
need to acquire hundreds of thousands of samples to
estimate the desired probabilistic quantities and this fact
makes the simulation approach impractical when each
sample acquisition step involves extensive computations.
The accuracy of hardware system reliability analysis
results has become imperative these days because of
the extensive usage of these systems in safety critical
areas. One of the unfortunate incidents, related to the
inaccurate reliability analysis of systems, is the loss
of the Mars Polar Lander [3] in December 1999. The
Mars Polar Lander, a $165 million NASA spacecraft
launched to survey Martian conditions, is believed to
have been lost mainly because its engine shut down while it
was still 40 meters above the Martian surface. The engine
shutdown happened due to the vibrations caused by the
deployment of the lander’s legs, i.e., a random behavior,
that gave false indication that the spacecraft had landed.
In order to avoid incidents like this, simulation should
not be relied upon for the reliability analysis of systems
that are supposed to be used in safety-critical domains.
Formal methods [4] are capable of conducting precise
system analysis and thus overcome the above mentioned
limitations of simulation. The main principle behind for-
mal analysis of a system is to construct a computer based
mathematical model of the given system and formally
verify, within a computer, that this model meets rigorous
specifications of intended behavior. Two of the most
commonly used formal verification methods are model
checking [5] and higher-order-logic theorem proving [6].
Model checking is an automatic verification approach for
systems that can be expressed as a finite-state machine.
Higher-order-logic theorem proving, on the other hand,
is an interactive approach but is more flexible in terms
of tackling a variety of systems.
Both model checking and theorem proving have been
successfully used for the precise functional correctness of
a broad range of hardware systems. On the other hand,
due to the strong relationship between reliability theory
and randomness, their usage for reliability analysis has
been somewhat limited. In the case of model checking,
the major limitations are the restricted system expressibility
and the inability to precisely reason about statistical
properties, such as variance and tail distribution bounds.
In the case of theorem proving, the main obstacle is the
fear of the huge proof effort involved in reasoning about
the random components of a system.
We believe that the recent developments in the formalization
of probability theory concepts in higher-order
logic [7], [8] can be built upon to conduct reliability
analysis in a higher-order-logic theorem prover. The
main objective of this paper is to minimize the interac-
tive proof efforts for conducting reliability analysis in a
theorem prover by presenting the higher-order-logic for-
malization of some core reliability theory concepts. More
specifically, we present a formal definition of reliability,
which can be used to formally express the reliability
characteristic of a system or component in higher-order-
logic, and the verification of Markov and Chebyshev’s
inequalities [9], which play a vital role in the formal esti-
mation of failure probabilities in the reliability analysis of
systems. These results can be built upon to reason about
the reliability characteristics of a system in a higher-
order-logic theorem prover and thus tend to minimize
the associated modeling and verification efforts.
The utilization and effectiveness of the proposed approach
for handling real-world reliability analysis problems
are demonstrated through the reliability analysis
of reconfigurable memory arrays in the presence of
stuck-at and coupling faults [10], which are two of the
most commonly found faults in memory arrays. A stuck-at
fault occurs when a memory cell never changes its
state, i.e., it is always stuck in one state, whereas a
coupling fault occurs when a write operation in one
cell changes the contents of another cell in the memory
array. In order to ensure reliable operation of memory
arrays, some redundancy is usually added to memory
arrays during the design phase. This way even after
fabrication, we can repair the memory faults by replacing
the rows or columns of the memory array containing
faulty memory cells with the available spare rows or
columns. Memories fabricated with these spare rows
and columns are usually termed as reconfigurable memory
arrays. This technique poses an interesting solution to the
memory faults problem but comes with a bigger design
challenge of estimating the right number of spare rows
and columns for meeting reliability specifications. If a
combination of spare rows and columns exists such that
all faults from the memory array can be eliminated then
such a combination of spare rows and columns is called
a repair solution, and the array is called repairable. The
repairability problem of a reconfigurable memory array
is similar to the vertex cover problem on a bipartite
graph and is known to be NP-complete
[11]. Thus, probabilistic techniques are usually utilized to
obtain reasonable solutions. In this paper, the proposed
reliability analysis approach is used for attaining precise
solutions to the above mentioned repairability problem.
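The notion of a repair solution can be made concrete with a small sketch. The Python code below is an illustrative brute-force check, not the method used in this paper: it assumes that each fault is recorded simply as a (row, column) cell position, and the function name is_repairable and its parameters are hypothetical. It enumerates the rows that could be replaced and then tests whether the remaining faulty cells fit within the spare columns; the exponential enumeration reflects why the general problem is hard.

```python
from itertools import combinations

def is_repairable(faults, spare_rows, spare_cols):
    """Brute-force check for a repair solution (illustrative assumption:
    a fault is just a (row, col) cell position)."""
    faults = set(faults)
    faulty_rows = sorted({r for r, _ in faults})
    max_rows = min(spare_rows, len(faulty_rows))
    # Try every way of spending k spare rows on rows that contain faults.
    for k in range(max_rows + 1):
        for rows in combinations(faulty_rows, k):
            # Faults not covered by the replaced rows must be covered by columns.
            remaining_cols = {c for r, c in faults if r not in rows}
            if len(remaining_cols) <= spare_cols:
                return True
    return False
```

For example, three faults on the diagonal of an array cannot be repaired with one spare row and one spare column, but can be repaired once a second spare row is available.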
The state-of-the-art reliability analysis approach for
the repairability problem of reconfigurable memory ar-
rays is simulation, which usually fails to produce precise
results due to its inherent limitations and the large ca-
pacities of memory arrays. Since reconfigurable memory
arrays are an integral component of essentially all SoC
designs these days and hence are quite frequently used
in safety critical areas, the inaccuracies and inadequacies
of simulation in this domain may even result in the loss
of human lives. Therefore, the precise solutions obtained
for the repairability problem, in this paper, not only
indicate the practical usefulness of our approach but also
address the above mentioned safety problem.
The work described in this paper is done using the
HOL theorem prover [12], which is an interactive higher-
order-logic theorem prover. The HOL core consists of
only 5 basic axioms and 8 primitive inference rules,
which are implemented as ML functions. Soundness is
assured as every new theorem must be verified by apply-
ing these basic axioms and primitive inference rules or
any other previously verified theorems/inference rules.
The main motivation behind choosing HOL for our work
is the availability of most of the mathematical theories
like Booleans, natural and real numbers, sets, measure
and probability.
The rest of the paper is organized as follows: Sec-
tion 2 provides some related work. Section 3 describes
the proposed theorem proving based reliability analysis
approach. The formalization of reliability theory funda-
mentals is given in Section 4. This is followed by the
reliability analysis of reconfigurable memory arrays in
Section 5. Finally, Section 6 concludes the paper.
2 RELATED WORK
The recently emerged CMOS or non-CMOS nano-
technologies, which are used to develop most of the
state-of-the-art electronics and computer related equip-
ment, are more prone to defects than their predeces-
sors. Therefore, reliability analysis of nano-scale devices
has become not only essential but, due to their large
gate counts, very challenging as well. Many researchers
around the world are trying to improve the quality of
computer based reliability methods. The ultimate goal
is to come up with a reliability analysis framework that
includes robust and accurate analysis methods, has the
ability to perform analysis for large-scale designs and
is easy to use. Some of the existing approaches that
allow us to tackle such reliability problems are presented
in the following. For instance, the probabilistic transfer
matrices (PTM) approach [13] fundamentally relies on
matrix arithmetic operations for each gate entity to as-
sess the reliability of the whole circuit. It involves the
complete enumeration of all possible input and output
combinations, which can be very expensive in terms
of computation when dealing with large designs. A
similar but independently proposed technique is based
on developing probabilistic models for unreliable logic
gates [14], called probabilistic gate models (PGMs), and
utilizing these individual models to analyze the reliabil-
ity of the circuit. In order to somewhat overcome the
runtime and scalability issues of the above mentioned
approaches, recently three algorithms, based on inde-
pendent gate failure assumption, have been proposed in
[15] and their effectiveness is illustrated by successfully
analyzing the reliability of a few benchmark circuits.
All of the above techniques are based on simulation
and thus cannot provide 100% precise results due to the
inherent nature of simulation as has been described in
the previous section.
Formal methods are capable of addressing the inac-
curacy issues of simulation and thus have also been
explored to conduct reliability analysis. Probabilistic
model checking has been applied to analyze the cir-
cuit reliability at the logic gate and block levels [16],
[17]. Like traditional model checking, probabilistic model
checking involves the construction of a precise state-
based mathematical model of the given random system,
which is then subjected to exhaustive analysis to verify
if it satisfies a set of reliability properties formally ex-
pressed in temporal logic. Besides the accuracy of the
results, other promising features of probabilistic model
checking include the ability to perform the analysis
automatically. On the other hand, probabilistic model
checking is limited to systems that can be expressed
as probabilistic finite state machines or Markov chains.
Another major limitation of probabilistic model checking
is state space explosion [18] due to which it is not
scalable to large designs. Similarly, to the best of our
knowledge, it has not been possible to precisely reason
about most of the statistical quantities, such as variance
and tail distribution bounds, using probabilistic model
checking so far.
The higher-order-logic theorem proving based reliability
analysis approach, utilized in this paper, tends
to overcome the limitations of both simulation and
model checking. Due to the formal nature of the models
and properties and the inherent soundness of the theo-
rem proving approach, reliability analysis carried out in
this way is free from any approximation and precision
issues. Similarly, the high expressibility of higher-order
logic allows us to analyze a wider range of systems
without modeling limitations, such as the inability to handle
infinite state spaces or the restriction to Markov chain models.
Hurd [7] developed a framework for the verification
of probabilistic algorithms in the HOL theorem prover.
Random variables are basically probabilistic algorithms
and thus can be formalized and verified, based on their
probability distribution properties, using the methodol-
ogy proposed in [7]. In fact, building upon Hurd’s for-
malization, most of the commonly used discrete [7] and
continuous [8] random variables have been formalized in
higher-order-logic and their corresponding probabilistic
and statistical [8] properties have been verified using
interactive theorem proving techniques. In this paper, we
utilize the above mentioned formalization, available in
the HOL theorem prover, to develop a generic theorem
proving based reliability analysis approach, a novelty
that to the best of our knowledge has not been presented
in the open literature so far.
We utilize the proposed approach for the reliability
analysis of memory arrays. Simulation techniques are
very commonly used for such analysis [19], [20]. When
memory sizes become large, analysis through simula-
tion very quickly becomes computationally difficult to
handle. Paper-and-pencil based analytical techniques have
been traditionally used for such cases. A memory array
probability model represents either the occurrence
of individual faults or the total number of faults as
a random variable and thus allows reasoning about
statistical properties of memory arrays. Questions, such
as “given a certain fault distribution and number of
faults can almost every reconfigurable memory array be
repaired”, or “with how many faults a memory array
can almost never be repaired”, are then answered [21],
[22], [23] based on analytical reasoning. To the best of
our knowledge, we were the first ones to utilize higher-
order-logic theorem proving for tackling the repairability
problem of stuck-at faults for reconfigurable memory
arrays in [24]. That analysis has been extended in the
current paper with the inclusion of coupling fault models
and several new probabilistic and statistical properties
and reliability conditions.
3 PROPOSED METHODOLOGY
A hypothetical model of the proposed reliability analysis
approach is given in Fig. 1, with some of its most
fundamental components depicted with shaded boxes.
Like all traditional analysis problems, the starting point
of reliability analysis is also a system description and
some intended system properties and the goal is to check
if the given system satisfies these given properties. For
simplicity, we have divided system reliability properties
into two categories, i.e., reliability properties related
to discrete random variables and reliability properties
related to continuous random variables.
Fig. 1. Theorem Proving based Reliability Analysis
The first step in the proposed approach is to construct
a model of the given system in higher-order-logic. For
this purpose, the foremost requirement is the availability
of infrastructures that allow us to formalize all kinds
of discrete and continuous random variables as higher-
order-logic functions, which in turn can be used to
represent the random components of the given system
in its higher-order-logic model. The second step is to
utilize the formal model of the system to express system
reliability characteristics as higher-order-logic theorems.
The prerequisite for this step is the ability to express
probabilistic and statistical properties related to both
discrete and continuous random variables in higher-
order-logic. All probabilistic properties of discrete and
continuous random variables can be expressed in terms
of their Probability Mass Functions (PMFs) and Cumula-
tive Distribution Functions (CDFs), respectively. Similarly,
most of the commonly used statistical properties can be
expressed in terms of the expectation and variance char-
acteristics of the corresponding random variable. Thus,
we require the formalization of mathematical definitions
of PMF, CDF, expectation and variance for both discrete
and continuous random variables in order to be able to
express the given system’s reliability characteristics as
higher-order-logic theorems. The third and the final step
for conducting reliability analysis in a theorem prover is
to formally verify the higher-order-logic theorems developed
in the previous step using a theorem prover. For
this verification, it would be quite handy to have access
to a library of pre-verified theorems corresponding
to commonly used properties of probability
distribution functions, expectation and variance, since
we can build upon such a library of theorems and
thus speed up the verification process. The formalization
details regarding the above mentioned steps are briefly
described now.
3.1 Discrete Random Variables
A random variable is called discrete if its range, i.e.,
the set of values that it can attain, is finite or at most
countably infinite [25]. Discrete random variables can be
completely characterized by their PMFs that return the
probability that a random variable X is equal to some
value x, i.e., 푃푟(푋 = 푥). Discrete random variables are
quite frequently used to model randomness in reliability
analysis. For example, the Bernoulli random variable is
widely used to model the fault occurrence in a compo-
nent and the Binomial random variable may be used to
represent the number of faulty components in a lot.
Discrete random variables can be formalized in higher-order
logic as deterministic functions with access to
an infinite Boolean sequence 𝔹^∞, i.e., an infinite source of
random bits with data type (natural → bool) [7]. These
deterministic functions make random choices based on
the result of popping the top most bit in the infinite
Boolean sequence and may pop as many random bits
as they need for their computation. When the functions
terminate, they return the result along with the remain-
ing portion of the infinite Boolean sequence to be used
by other functions. Thus, a random variable that takes
a parameter of type 훼 and ranges over values of type 훽
can be represented in HOL by the function
ℱ : α → 𝔹^∞ → β × 𝔹^∞
For example, a Bernoulli(1/2) random variable that
returns 1 or 0 with probability 1/2 can be modeled as
⊢ bit = λs. (if shd s then 1 else 0, stl s)
where the variable s, in the above definition, represents
the infinite Boolean sequence and the functions shd and
stl are the sequence equivalents of the list operations
'head' and 'tail'. A function of the form λx.t represents a
lambda abstraction function in HOL that maps x to t(x).
The function bit accepts the infinite Boolean sequence
and returns a pair with the first element equal to either 0
or 1 and the second element equal to the unused portion
of the infinite Boolean sequence.
The random variables can also be expressed in the
more general state-transforming monad where the states
are the infinite Boolean sequences.
⊢ ∀ a s.unit a s = (a,s)
⊢ ∀ f g s. bind f g s = g (fst (f s)) (snd (f s))
where the HOL functions fst and snd return the first
and second component of a pair, respectively. The unit
operator is used to lift values to the monad, and the
bind is the monadic analogue of function application.
All monad laws hold for this definition, and the notation
allows us to write functions without explicitly mention-
ing the sequence that is passed around, e.g., function 푏푖푡
can be defined as
⊢ bit_monad = bind sdest
(휆b. if b then unit 1 else unit 0)
where sdest s returns the pair (푠ℎ푑 s, 푠푡푙 s).
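The state-transforming monad described above can be mimicked outside HOL to build intuition. The following Python sketch is only an informal analogue under stated assumptions: an infinite Boolean sequence is modeled as a function from natural numbers to Booleans, and the helper from_bits, which pads a finite prefix with False, is hypothetical and introduced purely for experimentation.

```python
def shd(s):
    """Head of the sequence: its first bit."""
    return s(0)

def stl(s):
    """Tail of the sequence: the same sequence shifted by one position."""
    return lambda n: s(n + 1)

def sdest(s):
    """Return the pair (shd s, stl s)."""
    return (shd(s), stl(s))

def unit(a):
    """Lift a value into the monad without consuming any bits."""
    return lambda s: (a, s)

def bind(f, g):
    """Run f, then pass its result and the unused bits on to g."""
    def run(s):
        a, rest = f(s)
        return g(a)(rest)
    return run

# Direct definition of bit and its monadic counterpart.
bit = lambda s: (1 if shd(s) else 0, stl(s))
bit_monad = bind(sdest, lambda b: unit(1) if b else unit(0))

def from_bits(bits):
    """Hypothetical helper: a finite prefix followed by False forever."""
    return lambda n: bits[n] if n < len(bits) else False
```

With this encoding, bit_monad and bit agree on every sequence, and two consecutive calls chained through bind consume the first two bits in order without the sequence being mentioned explicitly.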
The work in [7] also presents the formalization of
some mathematical measure theory in HOL, which can
be used to define a probability function ℙ from sets
of infinite Boolean sequences to 푟푒푎푙 numbers between
0 and 1. The domain of ℙ is the set ℰ of events of
the probability. Both ℙ and ℰ are defined using
Carathéodory's extension theorem, which ensures that ℰ
is a σ-algebra, i.e., closed under complements and countable
unions. The formalized ℙ and ℰ can be used to derive
all the basic axioms of probability in the HOL theorem
prover. Similarly, they can also be used to prove proba-
bilistic properties for random variables. For example, we
can formally verify the following probabilistic property
for the function bit, defined above,
⊢ ℙ {s | fst (bit s) = 1} = 1/2
where {푥∣퐶(푥)} represents a set of all elements 푥 that
satisfy the condition 퐶 in HOL.
The above mentioned infrastructure can be utilized to
formalize most of the commonly used discrete random
variables and verify their corresponding PMF relations
[7]. For example, the Binomial random variable, which
models an experiment that counts the number of suc-
cesses in 푚 independent Bernoulli(푝) trials with a success
probability equal to 푝, can be formalized in higher-order-
logic as the following recursive function.
Definition 1: Binomial(m,p) Random Variable
⊢ (∀ p. prob_bino 0 p = unit 0) ∧
  (∀ p n. prob_bino (n + 1) p =
     bind (prob_bino n p)
       (λm. bind (prob_bern p)
         (λb. unit (if b then (m + 1) else m))))
where prob_bern is the higher-order-logic function for
the Bernoulli(푝) random variable [7] and the bind and
unit functions have already been defined above. We
were also able to verify the correctness of the above
definition by proving its PMF relation in HOL as follows.
Theorem 1: PMF of Binomial(m,p) Random Variable
⊢ ∀ m p n. 0 ≤ p ∧ p ≤ 1 ⇒
   ℙ {s | fst (prob_bino m p s) = n} =
   \binom{m}{n} p^n (1 - p)^{m-n}
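The recursion in Definition 1 and the PMF relation of Theorem 1 can be cross-checked numerically on small instances. The sketch below is an assumption-laden analogue rather than the HOL development: it replaces the Boolean-sequence monad by a finite-distribution monad, in which a random variable is a dictionary from values to exact probabilities, and it reuses the names unit, bind, prob_bern and prob_bino only to mirror the structure of the definition. The function binomial_pmf is a hypothetical helper for the closed form.

```python
from fractions import Fraction
from math import comb

def unit(a):
    """Distribution that returns a with probability 1."""
    return {a: Fraction(1)}

def bind(dist, g):
    """Sequence two probabilistic steps, summing over all outcomes."""
    out = {}
    for a, pa in dist.items():
        for b, pb in g(a).items():
            out[b] = out.get(b, Fraction(0)) + pa * pb
    return out

def prob_bern(p):
    """Bernoulli(p): True with probability p, False otherwise."""
    return {True: p, False: 1 - p}

def prob_bino(n, p):
    """Same recursion as Definition 1, over finite distributions."""
    if n == 0:
        return unit(0)
    return bind(prob_bino(n - 1, p),
                lambda m: bind(prob_bern(p),
                               lambda b: unit(m + 1 if b else m)))

def binomial_pmf(m, p, n):
    """Closed-form PMF of Binomial(m, p) at n."""
    return comb(m, n) * p**n * (1 - p)**(m - n)
```

Because the probabilities are exact rationals, agreement between prob_bino and binomial_pmf on a test instance is an equality and not an approximation; it is, of course, a check on particular values of m and p and not a proof for all of them.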
3.2 Continuous Random Variables
A random variable is called continuous if it ranges over
a continuous set of numbers [25]. A continuous set of
numbers, sometimes referred to as an interval, contains
all real numbers between two limits. Continuous random
variables can be completely characterized by their CDFs
that return the probability that a random variable X is
less than or equal to some value x, i.e., Pr(X ≤ x).
Continuous random variables are required to model
various phenomena in reliability analysis. For example,
the exponential random variable is quite frequently used
to model the time to failure of a component.
The sampling algorithms for continuous random vari-
ables are non-terminating and hence require a different
formalization approach than discrete random variables,
for which the sampling algorithms are either guaranteed
to terminate or satisfy probabilistic termination, meaning
that the probability that the algorithm terminates is 1.
One approach to address this issue is to utilize the con-
cept of the nonuniform random number generation [1],
which is the process of obtaining arbitrary continuous
random numbers using a Standard Uniform random
number generator. The main advantage of this approach
is that we only need to formalize one continuous random
variable from scratch, i.e., the Standard Uniform random
variable, which can be used to model other continu-
ous random variables by formalizing the corresponding
nonuniform random number generation method.
Based on the above approach, a methodology for the
formalization of all continuous random variables for
which the inverse of the CDF can be represented in a
closed mathematical form is presented in [8]. The first
step in this methodology is the formalization of the
Standard Uniform random variable. The Standard Uni-
form random variable can be formalized using Hurd’s
approach for the formalization of discrete random vari-
ables, described in the last section, and the formalization
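of the mathematical concept of limit of a real sequence:
$\lim_{n \to \infty} \left( \lambda n.\ \sum_{k=0}^{n-1} \left(\tfrac{1}{2}\right)^{k+1} X_k \right)$ (1)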
where 푋푘 denotes the outcome of the 푘푡ℎ random bit;
푇푟푢푒 or 퐹푎푙푠푒 represented as 1 or 0 respectively. The
formalization details are outlined in [8].
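A small sketch may help to visualize how a sequence of random bits determines a value in the unit interval. The Python function below, whose name std_unif_partial is hypothetical, computes the n-th partial sum of the weighted bits, where the k-th bit carries the weight (1/2)^(k+1); the Standard Uniform value is the limit of these partial sums as n grows. Exact rational arithmetic is used so that the partial sums can be compared without rounding.

```python
from fractions import Fraction

def std_unif_partial(bits, n):
    """n-th partial sum: the bit at index k contributes (1/2)**(k + 1)."""
    return sum(Fraction(1, 2) ** (k + 1) * (1 if bits(k) else 0)
               for k in range(n))
```

For the all-True bit sequence the partial sums are 1 - 2^(-n), which approach 1, and for the all-False sequence they are 0, so every limit lies in the interval [0, 1].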
The second step in the methodology for the formal-
ization of continuous probability distributions is the for-
malization of the CDF and the verification of its classical
properties. This is followed by the formal specification
of the mathematical concept of the inverse function of a
CDF. This formal specification, along with the formaliza-
tion of the Standard Uniform random variable and the
CDF properties, can be used to formally verify the
correctness of the Inverse Transform Method (ITM) [1]. The
ITM is a well-known technique for generating nonuniform
random variables for continuous probability distributions
for which the inverse of the CDF can be represented in a closed
mathematical form. Mathematically, it can be expressed
for a random variable X with CDF F using the Standard
Uniform random variable U as follows
$Pr(F^{-1}(U) \le x) = F(x)$ (2)
and its formal proof can be found in [8].
The formalized Standard Uniform random variable
can now be used to formally specify any continuous
random variable for which the inverse of the CDF can be
expressed in a closed mathematical form as $X = F^{-1}(U)$.
The CDF of this formally specified continuous
random variable, X, can then be verified, based on simple
arithmetic reasoning, using the formal proof of the ITM.
For illustration purposes, consider the example of the
exponential random variable, with the following CDF.
$Pr(X \le x) = \begin{cases} 0, & \text{for } x \le 0 \quad (3a) \\ 1 - e^{-mx}, & \text{for } 0 < x \quad (3b) \end{cases}$
It can be expressed, using the above methodology, as the
following higher-order-logic function
Definition 2: Exponential(m) Random Variable
⊢ ∀ m s. exp_rv m s =
   (λx. -(1/m) ln (1 - x)) (std_unif_cont s)
where the HOL functions (λx. -(1/m) ln (1 - x)) and
std_unif_cont represent the inverse CDF of the ex-
ponential random variable and the Standard Uniform
random variable, respectively. Now, the CDF of the
exponential random variable, given in Equation (3), can
be expressed as the following theorem.
Theorem 2: CDF for the Exponential Random Variable
⊢ ∀ m x. (0 < m) ⇒
cdf (휆s. exp_rv m s) x =
if x ≤ 0 then 0 else (1 - e^(-m x))
The verification of Theorem 2 is based on the above
methodology and requires very little user interaction,
since it is based on the formally verified ITM and
thus requires arithmetic reasoning only instead of the
relatively complex reasoning based on probability theory
principles.
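The relation between Definition 2 and Theorem 2 can also be illustrated by ordinary floating-point simulation, which, in line with the discussion in Section 1, provides only approximate evidence and no proof. In the sketch below, exp_inverse_cdf, exp_cdf, exp_rv and empirical_cdf are hypothetical names, and Python's pseudo-random generator stands in for the Standard Uniform random variable; this is an assumption made for illustration only.

```python
import math
import random

def exp_inverse_cdf(m, u):
    """Inverse CDF of Exponential(m), defined for 0 <= u < 1."""
    return -(1.0 / m) * math.log(1.0 - u)

def exp_cdf(m, x):
    """CDF of Exponential(m) as in Equation (3)."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-m * x)

def exp_rv(m, rng):
    """Sample Exponential(m) by the ITM; rng.random() lies in [0, 1)."""
    return exp_inverse_cdf(m, rng.random())

def empirical_cdf(samples, x):
    """Fraction of the samples that are less than or equal to x."""
    return sum(1 for v in samples if v <= x) / len(samples)
```

Applying exp_cdf to exp_inverse_cdf(m, u) returns u up to rounding error, which is the arithmetic fact underlying the proof of Theorem 2, while the empirical CDF of the generated samples only approaches the closed form as the number of samples grows.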
3.3 Statistical Properties
In reliability analysis, statistical characteristics play a
major role in decision making as they tend to summarize
the probability distribution characteristics of a random
variable in a single number. Due to their widespread
interest, the computation of statistical characteristics has
now become one of the core components of every mod-
ern reliability analysis framework.
Expectation provides the average of a random vari-
able, where each of the possible outcomes of this random
variable is weighted according to its probability [9]. The
expectation for a function of a discrete random variable,
which attains values in the positive integers only, is
$Ex[f(X)] = \sum_{n=0}^{\infty} f(n) Pr(X = n)$ (4)
where 푋 is the discrete random variable and 푓 repre-
sents a function of the random variable 푋 . The above
definition only holds if the associated summation is
convergent, i.e., $\sum_{n=0}^{\infty} f(n) Pr(X = n) < \infty$. The
expression of expectation, given in Equation (4), has been
formalized in [8] as a higher-order-logic function using
the formalization of the probability function ℙ, explained
in Section 3.1 of this paper, as follows.
Definition 3: Expectation of Function of a Discrete RV
⊢ ∀ f X. expec_fn f X =
   suminf (λn. (f n) ℙ{s | fst (X s) = n})
where the HOL function suminf [27] represents the
infinite summation of a real sequence.
The expected value of a discrete random variable can
now be defined as a special case of Definition 3 when 푓
is an identity function.
Definition 4: Expectation of a Discrete RV
⊢ ∀ X. expec X = expec_fn (λn. n) X
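In order to verify the correctness of the above definitions
of expectation, they are utilized in [8] to formally
verify the following classical expectation properties.
$Ex\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} Ex[X_i]$ (5)
$Ex[a + bX] = a + b\,Ex[X]$ (6)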
These properties not only verify the correctness of the
above definitions but also play a vital role in verify-
ing the expectation characteristics of discrete random
components of probabilistic systems, as will be seen in
Section 5 of this paper.
Variance of a random variable X describes the dif-
ference between X and its expected value and thus is
a measure of its dispersion. It is defined for a discrete
random variable, 푋 , as follows
$Var[X] = Ex[(X - Ex[X])^2]$ (7)
The above definition of variance can be formalized in
higher-order-logic by utilizing the formal definitions of
expectation, given in Definitions 3 and 4 as follows.
Definition 5: Variance of a Discrete Random Variable
⊢ ∀ X. variance X =
   expec_fn (λn. (n - expec X)^2) X
Like the expectation definition, this definition was also
formally verified to be correct by proving the following
classical variance properties for it.








The above mentioned formalization allows us to reason
about the expectation and variance properties of any
formalized discrete random variable that attains values
in the positive integers. For example, consider again the
Binomial(m, p) random variable, given in Definition 1.
Its expectation (mp) and variance (mp(1 − p)) can be
expressed as the following higher-order-logic theorems
and verified using the above mentioned infrastructure.
Theorem 3: Expectation of Binomial(m,p) Random Variable
⊢ ∀ m p. 0 ≤ p ∧ p ≤ 1 ⇒
expec(λs. prob_bino m p s) = m p
Theorem 4: Variance of Binomial(m,p) Random Variable
⊢ ∀ m p. 0 ≤ p ∧ p ≤ 1 ⇒
variance(λs. prob_bino m p s) = m p(1 - p)
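The statements of Theorems 3 and 4 can be checked numerically for a particular m and p. The following is only a sketch: it replaces the infinite sum of Definition 3 by a finite sum over an explicit probability mass function, and the names binomial_pmf, expec_fn, expec and variance are illustrative Python stand-ins, not the HOL functions themselves.

```python
from math import comb

def binomial_pmf(m, p):
    """Return [Pr(X = n) for n = 0..m] for a Binomial(m, p) variable."""
    return [comb(m, n) * p**n * (1 - p) ** (m - n) for n in range(m + 1)]

def expec_fn(f, pmf):
    """Finite analogue of Definition 3: sum of f(n) * Pr(X = n)."""
    return sum(f(n) * q for n, q in enumerate(pmf))

def expec(pmf):
    """Definition 4: expec_fn with the identity function."""
    return expec_fn(lambda n: n, pmf)

def variance(pmf):
    """Definition 5: expected squared deviation from the mean."""
    mu = expec(pmf)
    return expec_fn(lambda n: (n - mu) ** 2, pmf)

m, p = 20, 0.3
pmf = binomial_pmf(m, p)
print("expectation:", expec(pmf), "closed form:", m * p)
print("variance:   ", variance(pmf), "closed form:", m * p * (1 - p))
```

A numerical check of this kind covers one parameter choice only, whereas the HOL theorems hold for all m and all p in [0, 1].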
The formalization and verification, presented in this
section, can be utilized to formally reason about ex-
pectation and variance for positive integer valued dis-
crete random variables. On the other hand, to the best
of our knowledge, the formalization and verification
of statistical properties, like expectation and variance,
for continuous random variables is an open research
issue as of now. This step requires a higher-order-logic
formalization of an integration function that can also
handle functions with domains other than real numbers.
Lebesgue integration provides this feature and thus the
higher-order-logic formalization of some portions of the
Lebesgue integration theory [28] can be built upon for
formalizing the mathematical concepts of expectation
and variance for continuous random variables.
4 RELIABILITY THEORY FORMALIZATION
In this section, we present the higher-order-logic formal-
ization of some fundamental and widely used reliabil-
ity theory definitions and theorems. The main motiva-
tion behind this is to provide the users of the higher-
order-logic theorem proving based reliability analysis
approach with a generic infrastructure that can be built
upon and thus minimize their interactive proof efforts
associated with the third step of the proposed reliability
analysis approach, described in the previous section.
In engineering reliability theory, reliability R(t) of a
system or component is defined as the probability that
it performs its intended function until some time t [29].
$$R(t) = \Pr(X > t) \tag{10}$$
The random variable X, in the above definition, models
the time to failure of the system. Usually, this time to
failure is modeled by the exponential random variable
with parameter m that represents the failure rate of the
system. The value of m depends on the drift of the
system's characteristics with time. This drift may occur
because of several internal or external factors. For
example, in the case of analyzing an SoC or some other
hardware component, the value of m would be affected by
electromigration, environmental effects like corrosion,
vibration and temperature, and transient stresses such
as electrostatic discharge and lightning.
Now, based on Equation (10), we can formally define
the reliability of a system or component in HOL using
the formal definition of the exponential random variable,
given in Definition 2, as follows.
Definition 6: System Reliability
⊢ ∀ m t. rel_sys m t = ℙ{s | exp_rv m s > t}
The function rel_sys accepts two parameters of type
real, m and t, which represent the failure rate and time,
respectively. It returns the reliability of the given system
at any time t based on the mathematical expression of
Equation (10).
We now formally verify the following useful alternate
reliability expression [29].
$$R(t) = e^{-mt} \tag{11}$$
It can be formally expressed as follows
Theorem 5: Alternate System Reliability Expression
⊢ ∀ m t. (0 ≤ t) ⇒ rel_sys m t = e^(-m t)
where the assumption (0 ≤ t) ensures that time t can
never be negative. Theorem 5 was verified using the
complement law of probability (∀A. ℙ(Ā) = 1 − ℙ(A)) and
the CDF theorem for the exponential random variable.
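The two ingredients named above combine in a short chain of equalities. The sketch below assumes the standard form of the exponential CDF, Pr(X ≤ t) = 1 − e^{−mt} for t ≥ 0, which is taken here to be the content of the CDF theorem mentioned in the text.

```latex
\begin{align*}
R(t) &= \Pr(X > t) \\
     &= 1 - \Pr(X \le t) && \text{(complement law)} \\
     &= 1 - \left(1 - e^{-mt}\right) && \text{(exponential CDF, } t \ge 0\text{)} \\
     &= e^{-mt}
\end{align*}
```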
Equation (11) provides a very simple but useful means
to determine the reliability of a component or system and
is thus widely used in reliability analysis. For example,
it has been used to assess the reliability of reconfigurable
memory arrays in [30] and the reliability and fault toler-
ance of robotics in [31]. Thus, its verification in Theorem
5 allows us to tackle these kinds of reliability analysis
within the sound core of a theorem prover.
In reliability analysis, while looking at the failure rates
of a system, it is often the case that we are interested in
the probability that a random variable assumes values
that are far from its expectation. Instead of characterizing
this probability by a distribution function, it is a common
practice to rely upon bounds on this distribution, termed
as tail distribution bounds, which are usually calculated
using Markov and Chebyshev’s inequalities [9].
Markov's inequality gives an upper bound for the
probability that a non-negative random variable X is
greater than or equal to some positive constant a
$$\Pr(X \geq a) \leq \frac{Ex[X]}{a} \tag{12}$$
Markov's inequality can be expressed in HOL, using
the statistical properties related formalization presented
in Section 3.3, for a measurable discrete random variable
that attains values in positive integers only as follows.
Theorem 6: Markov's Inequality
⊢ ∀ X a. (0 < a) ∧
(summable(λn. n ℙ{s | fst (X s) = n}))
⇒ ℙ {s | fst (X s) ≥ a} ≤ (expec X) / a
where a represents a real number and the predicate
summable [27] returns True if the infinite summation
of its real sequence argument f converges, i.e., if there
exists some x such that $\sum_{n=0}^{\infty} f(n) = x$.
Thus, the summable assumption in the above theorem
states that the theorem is only valid for a random
variable X with well-defined expectation. The HOL proof
of Theorem 6 is based on some probability theory axioms
and arithmetic reasoning and more details can be found
in [8].
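As an illustration of what Theorem 6 asserts, the sketch below compares the exact tail probability of a Binomial(30, 0.2) variable with the Markov bound for several thresholds. The distribution and the helper names are arbitrary choices made for this example and do not come from the formalization.

```python
from math import comb

def binomial_pmf(m, p):
    """Return [Pr(X = n) for n = 0..m] for a Binomial(m, p) variable."""
    return [comb(m, n) * p**n * (1 - p) ** (m - n) for n in range(m + 1)]

def tail_prob(pmf, a):
    """Exact Pr(X >= a)."""
    return sum(q for n, q in enumerate(pmf) if n >= a)

def markov_bound(pmf, a):
    """Right-hand side of Equation (12): Ex[X] / a, for a > 0."""
    mean = sum(n * q for n, q in enumerate(pmf))
    return mean / a

pmf = binomial_pmf(30, 0.2)
for a in (3, 6, 9, 12):
    print(a, tail_prob(pmf, a), markov_bound(pmf, a))
```

The bound is loose for small a (it can exceed 1) and only becomes informative once a is well above the expectation.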
Markov’s inequality gives the best tail bound pos-
sible, for a nonnegative random variable, using the
expectation for the random variable only. This bound
can be improved upon if more information about the
distribution of the random variable is taken into account.
Chebyshev’s inequality is based on this principle and it
presents a significantly stronger tail bound in terms of
variance of the random variable
$$\Pr(|X - Ex[X]| \geq a) \leq \frac{Var[X]}{a^2} \tag{13}$$
The corresponding HOL theorem is as follows
Theorem 7: Chebyshev's Inequality
⊢ ∀ X a. (0 < a) ∧ (0 < variance X) ∧
(summable(λn. n ℙ{s | fst (X s) = n})) ∧
(summable(λn. n² ℙ{s | fst (X s) = n}))
⇒ ℙ {s | abs (fst (X s) - expec X) ≥ a}
≤ (variance X) / a²
The summable assumptions ensure that the theorem is
only valid for a random variable X with well-defined
expectation and second moment values. The HOL proof
of Theorem 7 is also based on some probability theory
axioms and the proof details can be found in [8].
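The same kind of numerical illustration applies to Theorem 7. The sketch below evaluates both sides of Equation (13) for a Binomial(30, 0.2) variable; the distribution, the thresholds and the helper names are assumptions made only for this example.

```python
from math import comb

def binomial_pmf(m, p):
    """Return [Pr(X = n) for n = 0..m] for a Binomial(m, p) variable."""
    return [comb(m, n) * p**n * (1 - p) ** (m - n) for n in range(m + 1)]

def mean(pmf):
    return sum(n * q for n, q in enumerate(pmf))

def var(pmf):
    mu = mean(pmf)
    return sum((n - mu) ** 2 * q for n, q in enumerate(pmf))

def deviation_prob(pmf, a):
    """Exact Pr(|X - Ex[X]| >= a)."""
    mu = mean(pmf)
    return sum(q for n, q in enumerate(pmf) if abs(n - mu) >= a)

def chebyshev_bound(pmf, a):
    """Right-hand side of Equation (13): Var[X] / a^2, for a > 0."""
    return var(pmf) / a**2

pmf = binomial_pmf(30, 0.2)
for a in (1.5, 2.5, 4.5, 6.5):
    print(a, deviation_prob(pmf, a), chebyshev_bound(pmf, a))
```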
Due to the widespread interest in failure probabilities
in reliability analysis, Markov and Chebyshev’s inequal-
ities are widely used in this domain. Thus, their formal
verification is a significant step towards the develop-
ment of a successful theorem proving based reliability
analysis framework. In fact, we will utilize them for the
repairability analysis presented in the next section.
5 RECONFIGURABLE MEMORY ARRAYS
Embedded memory is the most dominant component in
terms of silicon area of any SoC these days. Applications
such as mobile communication devices, signal process-
ing and computer networks all require large amounts of
memory. The extremely small size of memory cells and the
fact that a significant amount of the chip area is taken
by compact memories make them more prone to defects than
standard logic. The defects in a memory can render the
whole SoC useless. Even in mature fabrication processes,
where the defect densities tend to be small, throwing
away any chip is considered unacceptable because
of its adverse effect on yield. Moreover, the fabrication
defects that are not caught in the testing phase may also
lead to devastating situations when the corresponding
memories are used in safety critical SoCs for domains,
such as medicine, military and transportation.
Reconfigurable memory arrays tend to increase the
reliability of memory arrays in the presence of fabrica-
tion faults. The main idea is to add some redundancy
to memory arrays during the design phase. This way
even after fabrication, we can repair some of the memory
faults by replacing the rows or columns containing faulty
memory cells with the available spare rows or columns.
However, this solution comes with the bigger design
challenge of solving the repairability problem, i.e.,
estimating the right number of spare rows and columns for
meeting reliability specifications for a given memory array.
In this section, we analyze this repairability prob-
lem of reconfigurable memory arrays in the presence
of stuck-at and coupling faults, which are two of the
most commonly occurring fabrication faults, using the
proposed reliability analysis approach. Our analysis is
mainly inspired by the analytical model developed in
[22] for the paper-and-pencil based reliability analysis of
reconfigurable memory arrays. We proceed by formally
expressing a fault model for reconfigurable memory
arrays in higher-order logic. Our formalization utilizes
precise random variable functions to express the random
components in the model. This formalization is then
utilized to formally verify two significant results regard-
ing the repairability problem of reconfigurable memory
arrays. Firstly, we verify a relation that ascertains that a
large square memory array is almost always repairable
(with probability 1) if stuck-at and coupling faults are
independent and identically distributed with specific
probabilities. This condition is usually termed as the
repairability condition. Secondly, we verify a bound on
the stuck-at and coupling fault occurrence probabilities
that will make reconfiguration of a large square mem-
ory array almost impossible (with probability 0). This
condition, which is usually termed as the irrepairability
condition, allows us to determine how large the proba-
bility of defects must be in order to make reconfiguration
nearly impossible. Using the proposed approach, we are
able to accurately analyze both of the above mentioned
repairability and irrepairability properties without any
CPU time constraints, which clearly demonstrates its
effectiveness for real-world reliability analysis problems.
5.1 Formal Stuck-at and Coupling Fault Model
In order to illustrate the formal stuck-at and coupling
fault model, we first present a 6x6 memory array with
one stuck-at and two coupling faults example, shown in
Figure 2(a). The stuck-at fault is represented with a circle
and the coupling faults are represented using a pair of
squares. The two squares in the pair are connected by an
arrow. The direction of the arrow is from the coupling
cell (lightly shaded) to the coupled cell.
























































Fig. 2. Memory Array Model
A coupling fault in the memory array can be repaired
by replacing either the coupling cell or the coupled cell.
Disabling the coupling or the aggressor cell must be
done by replacing the row containing the coupling cell,
whereas the coupled cell can be repaired by replacing
either its row or column with a spare row or a spare
column [22]. For example, the coupling fault (6, 6) → (5, 5)
can be repaired by either replacing the 6th row with a
spare row, or by replacing the 5th row or the 5th column
with a spare row or a spare column, respectively (Figure
2(b)). The degree of this fault node in the graph is three,
since this fault node is connected to one column node
(5) and two row nodes (5, 6). The second coupling fault,
(3, 3) → (3, 4), on the other hand has a degree of two
since it is only connected to one row (3) and one column
(4). A stuck-at fault, on the other hand, can be repaired
by replacing either the row or column containing the
fault by a spare row or a spare column, respectively [32].
Thus, the stuck-at fault at location (5, 2) can be repaired
by either replacing the 5th row or 2nd column of the array
with a spare row or a spare column. In the graph model
for the repairability problem, each fault node can have
a degree of two or three for the coupling faults and a
degree of two for the stuck-at faults.
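The repair options of the example can be written down as a small bipartite structure in which each fault maps to the set of row and column nodes that can repair it. The sketch below is illustrative only: the tuple encoding of row and column nodes and the brute-force repairable check are assumptions made for this example and are not part of the HOL model.

```python
from itertools import combinations

def stuck_at_repairs(cell):
    """A stuck-at fault is repaired by replacing its row or its column."""
    row, col = cell
    return {("row", row), ("col", col)}

def coupling_repairs(coupling, coupled):
    """The coupling cell needs its row replaced; the coupled cell its row or column."""
    return {("row", coupling[0]), ("row", coupled[0]), ("col", coupled[1])}

# Faults of the 6x6 example described in the text.
faults = {
    "stuck-at (5,2)": stuck_at_repairs((5, 2)),
    "coupling (6,6)->(5,5)": coupling_repairs((6, 6), (5, 5)),
    "coupling (3,3)->(3,4)": coupling_repairs((3, 3), (3, 4)),
}

# Degree of a fault node = number of row/column nodes it is connected to.
degrees = {name: len(lines) for name, lines in faults.items()}

def repairable(faults, spare_rows, spare_cols):
    """Brute force: is there a choice of spare lines touching every fault?"""
    lines = set().union(*faults.values())
    rows = [l for l in lines if l[0] == "row"]
    cols = [l for l in lines if l[0] == "col"]
    for r in range(min(spare_rows, len(rows)) + 1):
        for c in range(min(spare_cols, len(cols)) + 1):
            for chosen_rows in combinations(rows, r):
                for chosen_cols in combinations(cols, c):
                    chosen = set(chosen_rows) | set(chosen_cols)
                    if all(chosen & fault for fault in faults.values()):
                        return True
    return False

print(degrees)
print(repairable(faults, spare_rows=1, spare_cols=1))
print(repairable(faults, spare_rows=1, spare_cols=0))
```

The exhaustive search is exponential in the number of lines and is meant only to make the notion of a repair solution concrete for this small array.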
We now generalize the above example. The reconfigurable
memory array can be modeled as a bipartite graph
(F, X, E). In this bipartite graph, F represents
the set of nodes corresponding to faults in the memory
array, X = R ∪ C is a set of nodes corresponding to
rows (R) and columns (C) in the memory array, and
E is the set of edges connecting various nodes of the
sets F and X based on how these faults can possibly be
repaired. It is important to note here that the number of
elements in the set F and their identities are random
quantities, as fault occurrence is an unpredictable event.
Therefore, the probability that a node will be included
in the set F depends on the probabilities p_s and p_c,
corresponding to the occurrence of stuck-at and coupling
faults, respectively. Also, the occurrence of a stuck-at or
coupling fault, and thus the inclusion of a node in the
set F, is assumed to be independent and identically
distributed in this model. The upper bound on the
cardinality of the set F is n³ + n² [22], where n represents
the number of rows or columns of an n x n memory
array. A repair solution exists if one can find a set of
nodes, say S, in the set X whose size is less than or equal
to the number of available spare rows (s_r) or columns (s_c).
The probability of repairability can now be defined as
$$\Pr(|F| \leq s_r + s_c) \tag{14}$$
where |F| represents the cardinality of the set F.
Equation (14) represents the probability of the event that
the number of stuck-at and coupling faults |F|, a random
quantity, is less than or equal to the total number of
spare rows and columns s_r + s_c. We can express Equation
(14) in terms of the number of rows or columns of a square
n x n reconfigurable memory array as
$$\Pr(|F| \leq (a + b)n) \tag{15}$$
where a = s_r / n and b = s_c / n. The values of a and b are
bounded in the real interval [0, 1], since the number of
spare rows and spare columns is usually a small fraction
of the total number of rows and columns in the array and
can never exceed it.
In this paper, our first goal is to formally verify
that if the probabilities of stuck-at and coupling fault

















where c1 + c2 = a + b and w(n) → ∞ as n → ∞, then
the memory array is almost always repairable. The term
almost always repairable in the above context means
that the probability of repairability (Pr(|F| ≤ s_r + s_c))
tends to 1 as n becomes very large. The above expressions
for the stuck-at and coupling fault occurrence
probabilities were initially proposed and analyzed
using informal techniques in [22]. Our contribution in
this paper is to formally verify the above argument using
the HOL theorem prover.
The first step in the proposed probabilistic analysis
approach is to construct a formal model of the system
in higher-order-logic while representing its random
components as formalized random variables. In the above
mentioned memory array model, our parameter of interest
is the number of faults. The behavior of a stuck-at
or coupling fault occurrence in the above model
can be formally represented as a Bernoulli(p) random
variable with p = p_s and p = p_c, respectively. Now,
under the assumption that the occurrences of stuck-at
and coupling faults are independent and identically
distributed, we can model the total number of these
faults as Binomial(m, p) random variables, which model
an experiment that counts the number of successes in
m independent Bernoulli(p) trials, with their respective
probabilities as follows.
Definition 7: Number of Stuck-at Faults
⊢ ∀ n c1 w. num_of_faults_stuck n c1 w







Definition 8: Number of Coupling Faults
⊢ ∀ n c2 w. num_of_faults_coupling n c2 w







The functions above accept three parameters each: the
number of rows or columns of a square reconfigurable
memory array as a natural number n, the real numbers
c1 or c2, respectively, that are related to the fractions of
spare rows and columns as c1 + c2 = a + b, and the real
sequence w with data type (natural → real). They utilize
the Binomial random variable function prob_bino,
given in Definition 1, to return the number of stuck-at
and coupling faults, respectively, for the specific case of
a square n x n memory array with the fault occurrence
probabilities equal to the expressions, given in Equations
(16) and (17), respectively.
Now, the total number of the faults in the memory
array can be formalized as the sum of the number of
stuck-at and coupling faults as follows.
Definition 9: Total Number of Faults
⊢ ∀ n c1 c2 w. num_of_faults n c1 c2 w =
bind (num_of_faults_stuck n c1 w)
(λx. bind (num_of_faults_coupling n c2 w)
(λy. unit (x + y)))
The above function accepts four parameters, n, c1, c2
and w, and returns the sum of stuck-at and coupling
faults, generated by the functions num_of_faults_stuck
and num_of_faults_coupling, respectively, using
the monadic functions bind and unit.
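The way bind and unit chain two random variables in Definition 9 can be mimicked in a few lines. The sketch below assumes, as the use of fst (X s) throughout the paper suggests, that a random variable maps a sequence of random bits s to a pair (value, remaining sequence). The function count_ones is a hypothetical stand-in for the two fault counters and is not the prob_bino function of Definition 1.

```python
def unit(value):
    """Random variable that consumes no bits and yields `value`."""
    return lambda s: (value, s)

def bind(rv, k):
    """Run `rv`, then run the random variable k(result) on the leftover bits."""
    def run(s):
        value, rest = rv(s)
        return k(value)(rest)
    return run

def count_ones(k):
    """Hypothetical counter: number of 1 bits among the next k bits."""
    return lambda s: (sum(s[:k]), s[k:])

def total(k1, k2):
    """Same shape as Definition 9: add the results of two counters."""
    return bind(count_ones(k1), lambda x:
           bind(count_ones(k2), lambda y:
           unit(x + y)))

bits = (1, 0, 1, 1, 0, 0, 1, 0)
value, rest = total(3, 4)(bits)
print(value, rest)
```

Because the second counter only sees the bits left over by the first, the two counts are computed from disjoint parts of the sequence, which is what makes them independent in this style of modeling.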
In the probabilistic analysis of very large memory
arrays, it is often required to find when repairing a
fault in a memory array becomes nearly impossible. In
order to be able to answer such questions, we verify an
irrepairability condition, according to which the memory
array is almost never repairable if the probabilities of










−[a ln a + (1 − a) ln(1 − a)] / ((1 − a)²(1 − b)) [22].
The term almost never repairable means that the
probability of having 1 or more repair solutions using
an rows tends to 0 as n becomes very large.
Our formal model of the memory array is also capable
of capturing the number of repair solutions and thus
the irrepairability condition. A repair solution ceases to
exist if no combination of an spare rows can be used to
repair all the stuck-at and coupling faults present in the
memory array. Under the assumption that the occurrences
of stuck-at and coupling faults are independent and
identically distributed, we can model the number of
repair solutions as a Binomial(m, p) random variable
with m being equal to the total number of possible






, and p being equal to the probability
that a specific choice of an rows constitutes a repair
solution. This probability can be expressed in terms of
the stuck-at and coupling fault probabilities, p_s and p_c, as
$$(1 - p_s)^{(n - an)(n - jn)}\,(1 - p_c)^{(n - an)^2 (n - jn)}$$
where j ≤ b [22].
Thus, the number of repair solutions can be formalized
in higher-order logic as follows:
Definition 10: Number of Repair Solutions












where ⌊x⌋ denotes the floor of x, i.e., the greatest
integer that is less than or equal to x. The floor function






the data type of both of its arguments m and n to be
positive integers. It is important to note that in the
paper-and-pencil analysis of the same problem [22], the
floor function was missing in the binomial expression.
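The role of the floor can be seen by counting row choices directly. The sketch below assumes that the quantity of interest is the number of ways to pick ⌊an⌋ rows out of n, with a given as a ratio a1/a2 of integers; the function name num_row_choices is hypothetical.

```python
from fractions import Fraction
from math import comb, floor

def num_row_choices(n, a1, a2):
    """Number of ways to choose floor((a1/a2) * n) rows out of n rows."""
    # Fraction keeps a * n exact, so the floor is not affected by rounding.
    spare_rows = floor(Fraction(a1, a2) * n)
    return comb(n, spare_rows)

print(num_row_choices(10, 1, 4))   # a * n = 2.5, so 2 rows are chosen
print(num_row_choices(8, 1, 2))    # a * n = 4 exactly
```

Without the floor, a product such as 2.5 is not a valid second argument of a binomial coefficient over the integers, which is the gap the formalization makes explicit.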
5.2 Repairability Condition
In this section, we utilize the function num_of_faults
to formally verify a couple of statistical properties
regarding the number of faults and the almost always
repairability condition for an n x n reconfigurable memory
array with stuck-at and coupling fault occurrence
probabilities given by Equations (16) and (17), respectively.
These verification results play a vital role in designing
reliable reconfigurable memory arrays.
For a memory array containing independent and
identically distributed stuck-at and coupling faults with
probabilities p_s and p_c, given by Equations (16) and (17),
respectively, the average number of faults is given by














This property can be formally expressed using the formal
definition of expectation and our formal definition of the
number of faults as follows.
Theorem 8: Average Number of Faults
⊢ ∀ n a b c1 c2 w.
(0 ≤ a) ∧ (a ≤ 1) ∧ (0 ≤ b) ∧ (b ≤ 1) ∧
(c1 + c2 = a + b) ∧ (1 < n) ∧
(∀ n. (0 < w(n)) ∧



















where the HOL function min returns the minimum value
of its two real arguments. The first four assumptions
in the above theorem ensure that the fractions a and b
are bounded by the interval [0, 1], as has been described
in the previous section. The relationship between c1, c2
and a, b is given in the fifth assumption. The
precondition 1 < n has been used in order to ensure
that the given memory array has more than one cell.
The last assumption is about the real sequence w and
basically provides its upper and lower bounds. These
bounds have been used in order to prevent the stuck-at
and coupling fault occurrence probabilities p_s and p_c,
given in Equations (16) and (17), from falling outside
their allowed interval [0, 1]. It is interesting to note that
no such restriction on the sequence w was imposed in
the paper-and-pencil based analysis of the repairability
problem given in [22]. This fact clearly demonstrates the
strength of the proposed approach, as it allowed us to
highlight this corner case, which, if ignored, could lead
to the invalidation of the whole analysis. The conclusion
of Theorem 8 presents the mathematical relation given
in Equation (20).
We proceed with the verification of Theorem 8 by
simplifying the left hand side (LHS) of its proof goal
using the linearity of expectation property, given in
Definition 4, as follows.
expec (λs. num_of_faults_stuck n c1 w s) +














Next, we verify the following two theorems for the
expectation of the number of stuck-at and coupling faults
based on their definitions, given in Definitions 7 and
8, respectively, and the expectation of Binomial random
variable, given in Theorem 3.
Theorem 9: Average Number of Stuck-at Faults
⊢ ∀ n c1 w. (1 < n) ∧
(∀ n. (0 < w(n)) ∧ (w(n) < c1√n)) ⇒








Theorem 10: Average Number of Coupling Faults
⊢ ∀ n c2 w. (1 < n) ∧
(∀ n. (0 < w(n)) ∧ (w(n) < c2√n)) ⇒








The above two theorems can now be used to conclude
the HOL proof of Theorem 8.
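The arithmetic behind this step can be sketched as follows. The fault probabilities of Equations (16) and (17) are taken here to be p_s = c_1/n − w(n)/(n√n) and p_c = c_2/n² − w(n)/(n²√n). This form is an assumption: it is inferred from the bounds on w in Theorems 9 and 10 and from the 2√n w(n) deviation that appears in the proof of Theorem 12, and should be checked against [22]. With n² candidate stuck-at locations and n³ candidate coupling pairs, the expectation of each Binomial count (Theorem 3) gives

```latex
\begin{align*}
Ex[|F|] &= n^2 p_s + n^3 p_c \\
        &= \left(c_1 n - \sqrt{n}\,w(n)\right) + \left(c_2 n - \sqrt{n}\,w(n)\right) \\
        &= (a + b)\,n - 2\sqrt{n}\,w(n)
\end{align*}
```

where the last line uses the assumption c_1 + c_2 = a + b.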
The variance of the total number of faults for an n x n
memory array, with the probabilities of stuck-at and
coupling fault occurrence given by Equations (16) and
(17), is given by
$$Var[|F|] = n^2 p_s (1 - p_s) + n^3 p_c (1 - p_c) \tag{21}$$
This property can be formally expressed using the formal
definition of variance and the formal definition of the
number of faults as follows.
Theorem 11: Variance of the total Number of Faults
⊢ ∀ n a b c1 c2 w s.
(0 ≤ a) ∧ (a ≤ 1) ∧ (0 ≤ b) ∧ (b ≤ 1) ∧
(c1 + c2 = a + b) ∧ (1 < n) ∧
(∀ n.(0 < w(n)) ∧
































The HOL verification of Theorem 11 is based on the
linearity of variance property, given in Equation (8),
and the variance characteristics of the Binomial random
variable. The proof steps are very similar to the ones for
Theorem 8.
A tail distribution bound of the number of faults for
an n x n memory array, with the probabilities of stuck-at
and coupling fault occurrence given by Equations (16)
and (17), can be expressed as follows.
$$\Pr(|F| \leq (a + b)n) \geq 1 - \frac{n^2 p_s (1 - p_s) + n^3 p_c (1 - p_c)}{4n\,(w(n))^2} \tag{22}$$
Whereas, the corresponding HOL theorem is as follows.
Theorem 12: Tail Distribution Bound for the num. of Faults
⊢ ∀ n a b c1 c2 w s.
(0 ≤ a) ∧ (a ≤ 1) ∧ (0 ≤ b) ∧ (b ≤ 1) ∧
(c1 + c2 = a + b) ∧ (1 < n) ∧
(∀ n.(0 < w(n)) ∧





(ℙ {s | (fst (num_of_faults n c1 c2 w s))
≤ (a + b) n} ≥
1 - (variance(λs. num_of_faults n c1 c2 w s) /
(4n (w(n))²)))
We proceed with the verification of this theorem by
splitting its proof goal into two subgoals using the less-
than-or-equal-to transitive property as follows.
ℙ {s | (fst(num_of_faults n c1 c2 w s) >
(a + b)n - 4√n w(n)) ∧
(fst(num_of_faults n c1 c2 w s) <
(a + b)n)} ≤
ℙ {s | (fst(num_of_faults n c1 c2 w s)) ≤
(a + b)n}

1 - (variance(λs. num_of_faults n c1 c2 w s) /
(4n (w(n))²)) ≤
ℙ {s | (fst(num_of_faults n c1 c2 w s) >
(a + b)n - 4√n w(n)) ∧
(fst(num_of_faults n c1 c2 w s) <
(a + b)n)}
The first subgoal can be verified using the basic
probability increasing axiom (∀A B. A ⊆ B ⇒ ℙ(A) ≤ ℙ(B)).
The second subgoal is handled by rewriting the two
inequalities in the argument of its probability function
using the absolute value theorem
((|y − x| < d) = (x − d < y < x + d))
and using Theorem 8 along with some arithmetic
reasoning, which gives:
1 - (variance(λs. num_of_faults n c1 c2 w s) /
(4n (w(n))²)) ≤
ℙ {s | |fst (num_of_faults n c1 c2 w s) -




Now using the complement probability law along with
Theorems 8 and 11, we can rewrite the above sub goal
as follows
(variance(λs. num_of_faults n c1 c2 w s) /
(4n (w(n))²)) ≥
ℙ {s | |fst (num_of_faults n c1 c2 w s) -
(expec(λs. num_of_faults n c1 c2 w s))|
≥ 2√n w(n)}
The above subgoal can now be discharged from the
HOL goal stack by using Chebyshev’s inequality, given
in Theorem 7, along with some arithmetic reasoning.
Finally, we use the statistical properties verified so far
to analyze the repairability problem, i.e., an n x n
reconfigurable memory array with the probabilities of
stuck-at and coupling fault occurrence, given by Equations
(16) and (17), is almost always repairable.
$$\lim_{n \to \infty} \Pr(|F| \leq (a + b)n) = 1 \tag{23}$$
The corresponding HOL theorem is as follows
Theorem 13: Repairability Condition
⊢ ∀ a b w.
(0 ≤ a) ∧ (a ≤ 1) ∧ (0 ≤ b) ∧ (b ≤ 1) ∧
(c1 + c2 = a + b) ∧ (1 < n) ∧
(∀ n.(0 < w(n)) ∧






w(n)) = 0) ⇒
(lim (λn.
ℙ {s | (fst (num_of_faults n c1 c2 w s))
≤ (a + b) n }) = 1))
where lim M represents the HOL formalization of the
limit of a real sequence M (i.e., lim M =
$\lim_{n \to \infty} M(n)$) [27].
A new assumption (lim(λn. 1 / w(n)) = 0) has been added
that formally represents the intrinsic characteristic of
the real sequence w that it tends to infinity as its natural
argument becomes very large.
We proceed with the HOL verification of Theorem 13
by first splitting its proof goal into the following two




ℙ{s|fst(num_of_faults n c1 c2 w s)




ℙ{s|fst(num_of_faults n c1 c2 w s)
≤ (a + b)n}]
The first subgoal can be verified using the basic
probability axiom (∀A. Pr(A) ≤ 1). Whereas, we utilize
Theorem 12 and the transitivity property of less-than-
or-equal-to for real numbers along with some arithmetic


































The expression on the RHS above can be rewritten as




1 - [ (c1/(2w(n)) − 1/(2√n)) (1/(2w(n)) − c1/(2n w(n)) + 1/(2n√n)) +
(c2/(2w(n)) − 1/(2√n)) (1/(2w(n)) − c2/(2n²w(n)) + 1/(2n²√n)) ]
This subgoal can now be verified as the limit value
of the expression on the RHS tends to 1, since all the
denominator terms in this expression tend to ∞ as n
becomes very large.
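The behavior of the lower bound in Equation (22) for growing n can also be observed numerically. The sketch below is only an illustration under stated assumptions: the form of the fault probabilities in fault_probabilities is inferred from the factorized expression above rather than quoted from Equations (16) and (17), and the choices c1 = c2 = 0.5 and w(n) = ln n are arbitrary values that satisfy 0 < w(n) < c√n on the sampled range.

```python
from math import log, sqrt

def fault_probabilities(n, c1, c2, w):
    """Assumed form of Equations (16) and (17); not quoted from the source."""
    p_s = c1 / n - w(n) / (n * sqrt(n))
    p_c = c2 / n**2 - w(n) / (n**2 * sqrt(n))
    return p_s, p_c

def repairability_lower_bound(n, c1, c2, w):
    """Right-hand side of Equation (22): 1 - Var[|F|] / (4 n w(n)^2)."""
    p_s, p_c = fault_probabilities(n, c1, c2, w)
    var = n**2 * p_s * (1 - p_s) + n**3 * p_c * (1 - p_c)
    return 1 - var / (4 * n * w(n) ** 2)

c1 = c2 = 0.5
sizes = (10**2, 10**4, 10**6, 10**8)
bounds = {n: repairability_lower_bound(n, c1, c2, log) for n in sizes}
for n, bound in bounds.items():
    print(n, bound)
```

Such a table covers a handful of array sizes and one sequence w only; the limit statement of Theorem 13 is what covers every admissible w.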
5.3 Irrepairability Condition
In this section, we utilize the function num_of_repsoln
to formally verify the irrepairability condition for an n x n
reconfigurable memory array. For a memory array
containing independent and identically distributed stuck-at
and coupling faults with probabilities p_s and p_c, given by
Equations (18) and (19), respectively, the average number













where U represents the Binomial random variable that
models the number of repair solutions. We formalized
this property using the definitions of the Binomial random
variable, expectation and the number of repair solutions.
Theorem 14: Average Number of Repair Solutions
⊢ ∀ n a a1 a2 j c1 c2. (a = a1 / a2) ∧
(0 ≤ a1) ∧ (a1 ≤ a2) ∧ (0 ≤ b) ∧ (b ≤ 1)
∧ (0 ≤ j) ∧ (j ≤ b) ∧ (0 < c1) ∧ (c1 <
(na2)) ∧ (0 < c2) ∧ (c2 < (na2)²) ⇒











The variable a, which is a constant and represents the
ratio of the spare rows or columns to the total number
of rows or columns in a square memory array, has been
declared as a ratio of two positive integers a1 and a2.
Similarly, we restrict the size of the square memory array,
i.e., the number of rows or columns, to be a multiple of
a2 and thus na2 has been used instead of n in the above
theorem. These preconditions ensure that the product of
a and the total number of rows or columns is always an
integer value that represents the number of spare rows.
Besides the bounds on a, b and j, bounds on the values of
c1 and c2 have also been assumed in the above theorem.
These bounds have been used in order to prevent the
stuck-at and coupling fault occurrence probabilities p_s
and p_c, given in Equations (18) and (19), from falling
outside their allowed interval [0, 1]. The conclusion of
Theorem 14 formally presents the expectation relation
of the number of repair solutions, given in Equation
(24). This theorem can be verified using the definition
of the function num_of_repsoln and the expectation
property for the Binomial random variable, given in
Theorem 3, along with the fact that the probability of
success for the Binomial random variable of the function
num_of_repsoln lies in the interval [0, 1].
In order to analyze the irrepairability condition, we
are interested in the probability that 1 or more repair
solutions exist. This probability has the following tail
distribution bound for the case of an n x n memory array
$$\Pr(U > 0) < \left(2^{H(a)}\, e^{-c_1 (1-a)^2 (1-b)}\, e^{-c_2 (1-a)^2 (1-b)}\right)^n \tag{25}$$
where H is the binary entropy function H(x) =
(−x ln(x) − (1 − x) ln(1 − x)) / ln 2 [22]. The
corresponding HOL theorem for the above tail
distribution bound is as follows.
Theorem 15: Tail Distribution Bound for Repair Solutions
⊢ ∀ n a a1 a2 j c1 c2. (a = a1 / a2) ∧
(0 ≤ a1) ∧ (a1 ≤ a2) ∧ (0 ≤ b) ∧ (b ≤ 1)
∧ (0 ≤ j) ∧ (j ≤ b) ∧ (0 < c1) ∧ (c1 <
n) ∧ (0 < c2) ∧ (c2 < (na2)²) ∧
c1 + c2 > −[a ln a + (1−a) ln(1−a)] / ((1−a)²(1−b)) ⇒





We proceed with the verification of this theorem by
splitting its proof goal into two subgoals as follows.
ℙ{s | fst(num_of_repsoln na2 a j c1 c2 s) > 0}
≤ expec(λs. num_of_repsoln na2 a j c1 c2 s)




The first subgoal can be verified using the
Markov’s inequality, verified in Theorem 6, as the set
{s|fst(num_of_repsoln na2 a j c1 c2 s)>0}
is equivalent to the set {s|fst(num_of_repsoln na2
a j c1 c2 s)≥1}. Whereas, the second subgoal can









In order to verify the above inequality, we verified the
following alternate definition of the exponential function
Lemma 1: Exponential Function
⊢ ∀ x. lim (λn. (1 + x/n)^n) = e^x
using the formalized power series based definition of the
exponential function, $e^x = \sum_{n=0}^{\infty} x^n / n!$,
in HOL [27].
The formal proof of Lemma 1 involves the verification
of L’Hopital’s rule and the formal definitions of limit of
a real sequence and limit of a function at a point along
with some rigorous arithmetic reasoning. Once verified,
Lemma 1 can be used along with the monotonically
increasing property of the real sequence (λn. (1 + x/n)^n),
when |x| < n, to prove the following useful result.
⊢ ∀ n x. |x| < n ⇒ (1 + x/n)^n ≤ e^x
The above relationship, along with some arithmetic rea-
soning, can now be used to discharge the remaining sub-
goal of Theorem 15. This also concludes the verification
of our desired tail distribution bound.
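The inequality (1 + x/n)^n ≤ e^x used above, together with the monotone growth of the sequence in n, can be observed numerically. The sketch below samples a few values with |x| < n; the function name compound and the sampled points are arbitrary choices for illustration.

```python
from math import exp

def compound(x, n):
    """The n-th term of the sequence (1 + x/n)^n."""
    return (1 + x / n) ** n

xs = (-2.0, -0.5, 0.5, 2.0)
ns = (3, 10, 100, 1000)   # every n satisfies |x| < n for the x values above

for x in xs:
    terms = [compound(x, n) for n in ns]
    print(x, terms, exp(x))
```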
Finally, we use the statistical properties verified so far
to analyze the irrepairability property, i.e., a reconfig-
urable memory array with the probabilities of stuck-at
and coupling fault occurrence, given by Equations (18)
and (19), is almost never repairable.
$$\lim_{n \to \infty} \Pr(U > 0) = 0 \tag{26}$$
The corresponding HOL theorem is as follows
Theorem 16: Irrepairability Condition
⊢ ∀ a a1 a2 j c1 c2. (a = a1 / a2) ∧
(0 ≤ a1) ∧ (a1 ≤ a2) ∧ (0 ≤ b) ∧ (b ≤ 1)
∧ (0 ≤ j) ∧ (j ≤ b) ∧ (0 < c1) ∧ (0 <
c2) ∧ c1 + c2 > −[a ln a + (1−a) ln(1−a)] / ((1−a)²(1−b)) ⇒
(lim (λn. ℙ {s | fst
(num_of_repsoln na2 a j c1 c2 s) > 0}) = 0)
The proof of Theorem 16 is very similar to that of
Theorem 13. In this case, however, we use the basic
probability axiom (∀A. 0 ≤ Pr(A)) and the fact that the
limit value of the tail distribution bound of Pr(U > 0),
given in Theorem 15, is 0, since the base of its n-th
power is less than 1.
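The link between the precondition on c1 + c2 and the vanishing bound can be made concrete. The sketch below reads the base of the n-th power from Equation (25), with both exponents taken as (1 − a)²(1 − b) as printed there, and uses the identity 2^{H(a)} = exp(−a ln a − (1 − a) ln(1 − a)). The function names and the sample values of a, b, c1 and c2 are assumptions made for illustration.

```python
from math import exp, log

def entropy_nats(a):
    """-[a ln a + (1 - a) ln(1 - a)] for 0 < a < 1; equals H(a) * ln 2."""
    return -(a * log(a) + (1 - a) * log(1 - a))

def threshold(a, b):
    """Lower limit on c1 + c2 in the precondition of Theorems 15 and 16."""
    return entropy_nats(a) / ((1 - a) ** 2 * (1 - b))

def bound_base(a, b, c1, c2):
    """Base of the n-th power in Equation (25)."""
    return exp(entropy_nats(a) - (c1 + c2) * (1 - a) ** 2 * (1 - b))

a, b = 0.1, 0.1
t = threshold(a, b)
above = bound_base(a, b, 0.6 * t, 0.6 * t)  # c1 + c2 is 20% above the threshold
below = bound_base(a, b, 0.4 * t, 0.4 * t)  # c1 + c2 is 20% below the threshold
print(t, above, below, above**1000)
```

When c1 + c2 exceeds the threshold the base drops below 1, so its n-th power shrinks geometrically; below the threshold the base exceeds 1 and the bound says nothing.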
The above results clearly demonstrate the effectiveness
of theorem proving based reliability analysis. Due to the
formal nature of the models, the high expressiveness
of higher-order logic, and the inherent soundness of
theorem proving, we have been able to verify generic
properties of interest that are valid for any given memory
array with 100% precision, which is not possible with
simulation. Similarly, we have been able to
formally analyze properties that cannot be handled by
model checking. The proposed approach is also superior
to paper-and-pencil proof methods, as the chances of
making human errors, missing critical assumptions and
proving wrongful statements are almost nil, since all
proof steps are applied within the sound core
of the HOL theorem prover. These additional benefits
come at the cost of the time and effort spent while
formalizing the memory array and formally reasoning
about its properties. However, the fact that we were
building on top of already verified probability and
reliability theory foundations, described in Sections 3
and 4, helped significantly in this regard, as the memory
analysis only consumed approximately 250 man-hours and
3500 lines of HOL code.
6 CONCLUSIONS
In this paper, we utilized the probability theory
formalized in higher-order logic to construct a formal
reliability analysis approach. The main idea behind this
approach is to use formalized random variables to model
systems and to verify the corresponding reliability
characteristics in a theorem prover. We also formalized
the definition of reliability and formally verified
Markov's and Chebyshev's inequalities, which play a vital
role in reliability analysis. Because of the formal nature
of the models, the proposed reliability analysis is free of
approximation and precision errors, and, due to the highly
expressive nature of higher-order logic, a wider range of
systems can be analyzed. This makes the proposed approach very
promising for the reliability analysis of safety critical and
highly sensitive engineering and scientific applications.
The proposed approach was used to analyze the re-
pairability problem of reconfigurable memory arrays in
the presence of stuck-at and coupling faults. We first
developed a higher-order-logic based formal stuck-at
and coupling fault model for reconfigurable memory
arrays, and based on this model we formally verified
some key statistical properties and the repairability and
irrepairability conditions. The formally verified
expectation and variance properties and Markov's and
Chebyshev's inequalities greatly helped us speed up the
analysis process. The results obtained are 100% precise
and confirm the results obtained via analytical
approaches. Another distinguishing feature of these
properties is their generic nature, i.e., they can be utilized
to assess the reliability of any reconfigurable memory
array. The successful handling of this real-world reliabil-
ity analysis problem by the proposed approach clearly
demonstrates its feasibility for other reliability analysis
issues. To the best of our knowledge, this is the first
study on using formal methods for the reliability analy-
sis of reconfigurable memory arrays with both stuck-at
and coupling faults.
The fundamentals associated with theorem proving
based reliability analysis, presented in this paper, can
certainly be applied to many other domains besides
the illustrative example of memory arrays. An ongoing
project in our research group is to utilize the formalized
probability theory to analyze the reliability of Boolean
logic circuits. The approach, mainly inspired by the
probability gate models (PGM) based reliability analysis
[14], utilizes the formalized Bernoulli random variables
to model the gate failure phenomena and the input
arrival patterns at the logic gates. The reliability of
a component can be formally defined in this case as
the probability of having a correct output. The goal of
this work is to formally verify probabilistic properties,
by building on top of the infrastructure presented in
Sections 3 and 4, associated with the reliability of com-
monly used logic circuits like decoders, multiplexors and
adders. The main benefits of conducting such analysis
using theorem proving include the accuracy of the re-
sults and the generic nature of the properties.
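To make the gate-level reliability notion concrete, the following is a minimal numerical sketch, not the HOL formalization. It assumes each gate independently inverts its output with probability eps (a Bernoulli fault), that the inputs are independent Bernoulli variables, and it uses a four-NAND XOR circuit chosen purely for illustration. All function names are hypothetical. Reliability is computed exactly, by enumeration, as the probability that the faulty circuit agrees with the fault-free one.

```python
from itertools import product


def nand(a, b):
    """Fault-free two-input NAND on bits 0/1."""
    return 1 - (a & b)


def circuit(x, y, flips):
    """XOR built from four NAND gates; flips[i] = 1 inverts gate i's output."""
    g0 = nand(x, y) ^ flips[0]
    g1 = nand(x, g0) ^ flips[1]
    g2 = nand(y, g0) ^ flips[2]
    return nand(g1, g2) ^ flips[3]


def bernoulli_weight(bit, p):
    """Probability that a Bernoulli(p) variable takes the value `bit`."""
    return p if bit else 1.0 - p


def reliability(eps, p_x=0.5, p_y=0.5):
    """Probability of a correct output, summed over inputs and fault patterns."""
    total = 0.0
    for x, y in product((0, 1), repeat=2):
        w_in = bernoulli_weight(x, p_x) * bernoulli_weight(y, p_y)
        golden = circuit(x, y, (0, 0, 0, 0))  # fault-free reference output
        for flips in product((0, 1), repeat=4):
            w_fault = 1.0
            for f in flips:
                w_fault *= bernoulli_weight(f, eps)
            if circuit(x, y, flips) == golden:
                total += w_in * w_fault
    return total


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.1, 0.5):
        print(eps, reliability(eps))
```

Such enumeration yields a number for one fixed circuit and one fixed eps, whereas a theorem-proving approach aims at properties that hold for all parameter values.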
The proposed approach is certainly not mature enough
to handle all kinds of reliability problems. We do not
have the formalization infrastructure to express and
reason about statistical properties related to continuous
random variables yet. Similarly, we also lack the for-
malization of other integration theory related reliability
characteristics, such as mean time to failure (MTTF) and
hazard rates [29]. However, the higher-order-logic
formalization of a domain-independent integration theory,
such as Lebesgue integration, can pave the way to
resolving these bottlenecks. A major limitation of our
approach is the associated user interaction, i.e., the user
needs to guide the proof tools manually since we are dealing
with higher-order logic, which is known to be undecidable.
This is one of the costs that the designers need to pay
for attaining 100% accurate results.
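For reference, the integration-dependent characteristics mentioned above have the following standard textbook forms for a component with a nonnegative continuous lifetime X. The symbols f (density), F (distribution function), R (reliability) and h (hazard rate) are notation assumed here for illustration and may differ from the notation of [29].

```latex
% X: nonnegative continuous lifetime with density f and CDF F
R(t) = \Pr(X > t) = 1 - F(t), \qquad
\mathrm{MTTF} = E[X] = \int_{0}^{\infty} R(t)\, dt, \qquad
h(t) = \frac{f(t)}{R(t)} .
```

Both MTTF and h(t) are defined through an integral or a density, which is why they need a formalized integration theory.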
REFERENCES
[1] L. Devroye, Non-Uniform Random Variate Generation. Springer-
Verlag, 1986.
[2] D. MacKay, Introduction to Monte Carlo Methods, in Learning
in Graphical Models, NATO Science Series. Kluwer Academic
Press, 1998, pp. 175-204.
[3] Mars Polar Lander, http://mpfwww.jpl.nasa.gov/msp98/, 2008.
[4] A. Gupta, Formal Hardware Verification Methods: A Survey, Formal
Methods in System Design, vol. 1, no. 2-3, pp. 151-238, 1992.
[5] E. Clarke, O. Grumberg, and D. Long, Verification Tools for Finite
State Concurrent Systems, in A Decade of Concurrency-Reflections
and Perspectives, ser. LNCS, vol. 803. Springer, 1993, pp. 124-175.
[6] M. Gordon, Mechanizing Programming Logics in Higher-Order Logic,
in Current Trends in Hardware Verification and Automated
Theorem Proving. Springer, 1989, pp. 387-439.
[7] J. Hurd, Formal Verification of Probabilistic Algorithms, PhD Thesis,
University of Cambridge, Cambridge, UK, 2002.
[8] O. Hasan, Formal Probabilistic Analysis using Theorem Proving, PhD
Thesis, Concordia University, Montreal, QC, Canada, 2008.
[9] P. Billingsley, Probability and Measure. John Wiley, 1995.
[10] A. Miczo, Digital Logic Testing and Simulation. Wiley Interscience,
2003.
[11] S. Kuo and W. Fuchs, Efficient Spare Allocation for Reconfigurable
Arrays, IEEE Design and Test of Computers, vol. 4, no. 1, pp.
24-31, 1987.
[12] M. Gordon and T. Melham, Introduction to HOL: A Theorem Prov-
ing Environment for Higher-Order Logic. Cambridge University
Press, 1993.
[13] S. Krishnaswamy, G. F. Viamontes, I. L. Markov, and J. P. Hayes,
Accurate Reliability Evaluation and Enhancement via Probabilistic
Transfer Matrices, in Proc. Design, Automation and Test in Eu-
rope, 2005, pp. 282–287.
[14] J. Han, E. Taylor, J. Gao, and J. Fortes, Faults, Error Bounds and
Reliability of Nanoelectronic Circuits, in Proc. Application Specific
System Architectures and Processors, 2005, pp. 247-253.
[15] M.R. Choudhury, K. Mohanram, Reliability Analysis of Logic
Circuits, IEEE Trans. on Computer-Aided Design of Integrated
Circuits and Systems, vol.28, no.3, pp. 392-405, 2009.
[16] D. Bhaduri and S. Shukla, NANOPRISM: A Tool for Evaluating
Granularity Versus Reliability Trade-offs in Nano Architectures, in
Proc. Great Lakes Symposium on VLSI, 2004, pp. 109-112.
[17] G. Norman, D. Parker, M. Kwiatkowska, and S. Shukla, Evaluat-
ing the Reliability of NAND Multiplexing with PRISM, IEEE Trans.
on Computer-Aided Design of Integrated Circuits and Systems,
vol. 24, no. 10, 2005, pp. 1629-1637.
[18] E. Clarke, O. Grumberg, and D. Peled, Model Checking. The MIT
Press, 2000.
[19] M. Nicolaidis, N. Achouri, and L. Anghel, A Diversified Memory
Built-in Self-repair Approach for Nanotechnologies, in Proceedings
of the 22nd IEEE VLSI Test Symposium, 2004, pp. 313-318.
[20] A. Sehgal, A. Dubey, E. Marinissen, C. Wouters, H. Vranken, and
K. Chakrabarty, Redundancy Modelling and Array Yield Analysis
for Repairable Embedded Memories, Computers and Digital Tech-
niques, IEE Proceedings, vol. 152, no. 1, pp. 97-106, 2005.
[21] W. Shi and W. K. Fuchs, Probabilistic Analysis and Algorithms
for Reconfiguration of Memory Arrays, IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, vol.
11, no. 9, pp. 1153-1160, 1992.
[22] C. P. Low and H. W. Leong, Probabilistic Analysis of Memory Re-
configuration in the Presence of Coupling Faults, IEEE International
Workshop on Defect and Fault Tolerance in VLSI Systems, pp.
157-166, 1992.
[23] D. Blough, Performance Evaluation of a Reconfiguration-Algorithm
for Memory Arrays containing Clustered Faults, IEEE Transactions
on Reliability, vol. 45, no. 2, pp. 274-284, 1996.
[24] O. Hasan, N. Abbasi, and S. Tahar, Formal Probabilistic Analysis
of Stuck-at Faults in Reconfigurable Memory Arrays, in Integrated
Formal Methods, ser. LNCS, vol. 5423. Springer, 2009, pp. 277-
291.
[25] R. Yates and D. Goodman, Probability and Stochastic Processes: A
Friendly Introduction for Electrical and Computer Engineers. Wiley,
2005.
[26] A. Levine, Theory of Probability. Addison-Wesley series in Behav-
ioral Science, Quantitative Methods, 1971.
[27] J. Harrison, Theorem Proving with the Real Numbers. Springer, 1998.
[28] S. Richter, Formalizing Integration Theory, with an Application to
Probabilistic Algorithms, Diploma Thesis, Technische Universität
München, Department of Informatics, Germany, 2003.
[29] K. Trivedi, Probability and Statistics with Reliability, Queuing and
Computer Science Applications. Wiley-Interscience, 2002.
[30] M. Choi, N. Park, and F. Lombardi, Hardware-Software Co-
Reliability in Field Reconfigurable Multi-Processor-Memory Systems,
in Proc. International Parallel and Distributed Processing Sym-
posium, 2002, IEEE Computer Society, pp. 170-184.
[31] J.R. Cavallaro and I.D. Walker, A Survey of NASA and Military
Standards on Fault Tolerance and Reliability Applied to Robotics, in
Proc. AIAA/NASA Conference on Intelligent Robots in Field,
Factory, Service, and Space, 1994, American Institute of Aero-
nautics and Astronautics, pp. 282-286.
[32] M. Chang, W. K. Fuchs and J.H. Patel, Diagnosis and Repair of
Memory with Coupling Faults, IEEE Transactions on Computers,
vol. 38, no. 4, pp. 493-500, 1989.
