A two phase approach for checking sequence generation by Dinçtürk, Emre Mustafa & Dincturk, Emre Mustafa
A TWO PHASE APPROACH FOR CHECKING SEQUENCE
GENERATION
by
MUSTAFA EMRE DİNÇTÜRK
Submitted to the Graduate School of Engineering and Natural Sciences
in partial fulfillment of
the requirements for the degree of
Master of Science
Sabancı University
August 2009
A TWO PHASE APPROACH FOR CHECKING SEQUENCE GENERATION
APPROVED BY:
Assist. Prof. Dr. Hüsnü Yenigün (Thesis Supervisor)
Prof. Dr. Kemal İnan
Assoc. Prof. Dr. Albert Levi
Assoc. Prof. Dr. Tonguç Ünlüyurt
Assoc. Prof. Dr. Berrin Yanıkoğlu
DATE OF APPROVAL:
© Mustafa Emre Dinçtürk 2009
All Rights Reserved
A TWO PHASE APPROACH FOR CHECKING SEQUENCE
GENERATION
Mustafa Emre Dinçtürk
Computer Science and Engineering, Master’s Thesis, 2009
Thesis Supervisor: Hüsnü Yenigün
Keywords: FSM based testing, Checking Sequence, Random FSM Generation
Abstract
A new method for constructing a checking sequence for finite state machine (FSM)
based testing is introduced. It is based on a recently suggested method that takes
quite a different approach from almost all the methods developed since the checking
sequence generation problem was introduced around half a century ago. Unlike its
predecessor, which aggressively tries to recognize the states by applying
identification sequences, our approach relies on yet-to-be-generated parts of the
sequence for this. The method may therefore terminate without producing a checking
sequence. For this reason, we also suggest a method to check whether a sequence is a
checking sequence. If the generated sequence turns out not to be a checking sequence,
a post-processing phase extends it further. We present the results of an
experimental study showing that our two-phase approach produces shorter checking
sequences than the previously published methods. The experimental study is performed
on FSMs that are randomly generated by a tool implemented within this work to
support this and other FSM based testing studies.
A TWO PHASE APPROACH FOR CHECKING SEQUENCE GENERATION
Mustafa Emre Dinçtürk
Computer Science and Engineering, Master’s Thesis, 2009
Thesis Supervisor: Hüsnü Yenigün
Keywords: FSM Based Testing, Checking Sequences, Random FSM Generation
Özet (Turkish abstract, in English translation)
In this work, a new checking sequence generation method for Finite State Machine
(FSM) based testing is given. The method builds on a recently proposed method whose
approach differs from all the methods used since the problem was posed about half a
century ago. As a novelty, instead of aggressively recognizing the states by means
of state identification sequences, it is anticipated that later additions to the
checking sequence will resolve this issue. However, this method may fail to produce
a checking sequence. For this reason, a method that checks whether a given sequence
is a checking sequence has also been developed within this work. If the generated
sequence turns out not to be a checking sequence, it is taken up again in a second
phase and turned into a checking sequence through additions. This work also presents
experimental studies showing that the new method produces shorter checking sequences
than the existing methods. The Finite State Machines used in these experiments were
generated with a random FSM generation tool that was also implemented during this
work.
Acknowledgments
I would like to state my gratitude to my supervisor, Hüsnü Yenigün, for everything
he has done for me, especially for his invaluable guidance, limitless support
and understanding.
I would like to thank Hasan Ural and Guy-Vincent Jourdan for supporting this
work with precious ideas and comments.
I would like to thank my family for never leaving me alone.
I would like to thank Gülden Sarıcalı and Birol Yüceoğlu for giving me
encouragement and motivation.
I would like to thank TÜBİTAK for the financial support provided.
Table of Contents
1 Introduction
2 Preliminaries
2.1 FSM Fundamentals
2.1.1 Extending Next State and Output Functions
2.1.2 Some Properties of FSMs
2.2 Representing an FSM by a Directed Graph
2.3 Distinguishing Sequences
2.3.1 Preset Distinguishing Sequence
2.3.2 Distinguishing Set (Adaptive Distinguishing Sequence)
2.4 Checking Sequences based on Distinguishing Sequences
3 Random FSM Generation
3.1 Component Graph
3.2 Free Edge and Set of Free Edges
3.2.1 Existence of a Free Edge in a Strongly Connected Graph
3.2.2 Existence of a Free Edge in a not Strongly Connected Graph
3.3 Forcing Strongly Connectedness
3.3.1 Finding a Set of Free Edges in a Component
3.3.2 Making a Graph Strongly Connected
3.4 Forcing Initial Reachability
3.4.1 Method 1: Using a Backbone Component Graph
3.4.2 Method 2: Generate an Initial Reachable Graph with Random Components
3.5 Shuffling
3.6 Providing Input/Output Probabilities
4 Checking if a Sequence is a Checking Sequence
4.1 Introduction
4.2 Uncertainty Automaton
4.3 State Recognition Using Uncertainty Automaton
4.3.1 Candidate Elimination Using Incompatible Sets
4.3.2 Candidate Elimination Using Candidate Trial
4.3.3 Using Candidate Elimination Methods Together
4.4 Thoughts on Uncertainty Automaton
5 Overview of Simão et al.’s Method
6 Our Checking Sequence Generation Method
6.1 Phase 1: Sequence Generation
6.2 Phase 2: Extending Sequence Q to a Checking Sequence
6.3 Experimental Results
6.3.1 Comparison with Simão et al.’s Method
6.3.2 Contributions of Phase 1 and Phase 2
6.3.3 Effect of Candidate Elimination Using a Set of Incompatible Nodes
7 Conclusion
List of Figures
2.1 FSM M1
4.1 Initial Uncertainty Automaton
4.2 Uncertainty Automaton after nodes merged
4.3 Copy Uncertainty Automaton
4.4 Uncertainty Automaton
4.5 Uncertainty Automaton
4.6 Final Uncertainty Automaton
6.1 FSM M2
6.2 Final Uncertainty Automaton for Q generated in Phase 1
6.3 Final Uncertainty Automaton for Q′ = Qbab
6.4 Average CS Lengths
6.5 Our Method’s CS Lengths as a Box Plot
6.6 Average Improvements Over Simão et al.’s Method
6.7 Improvements Over Simão et al.’s Method as a Box Plot
6.8 Average Method Execution Times
6.9 Contributions of Phase 1 and Phase 2 to CS Length
6.10 Percentage Contribution of Phase 2 CS Length
6.11 Distribution of Execution Time between Phase 1 and Phase 2
6.12 Effect of Candidate Elimination Using a Set of Incompatible Nodes on Length
6.13 Effect of Candidate Elimination Using a Set of Incompatible Nodes on Time
List of Tables
4.1 Candidate Sets For the Uncertainty Automaton in Figure 4.1 after d-recognition
4.2 Candidate Sets For the Uncertainty Automaton in Figure 4.2
4.3 Incompatible Sets For the Uncertainty Automaton in Figure 4.2
4.4 Candidate Sets for the Uncertainty Automaton in Figure 4.2
4.5 Candidate Sets For the Uncertainty Automaton in Figure 4.4
4.6 Incompatible Sets For the Uncertainty Automaton in Figure 4.4
4.7 Candidate Sets For the Uncertainty Automaton in Figure 4.5
4.8 Candidate Sets For the Uncertainty Automaton in Figure 4.6
6.1 Iteration 1
6.2 Iteration 2
6.3 Iteration 3
6.4 Iteration 4
6.5 Iteration 5
6.6 Candidate Sets For the Uncertainty Automaton in Figure 6.2
6.7 Candidate Sets For the Uncertainty Automaton in Figure 6.3
6.8 Average CS Lengths
6.9 Average Improvements Over Simão et al.’s Method
Chapter 1
Introduction
A Finite State Machine (FSM) is an abstract structure with a finite set of states
where the application of an input causes a state transition along with the
production of an output. FSMs are widely used to model systems in diverse areas such
as sequential circuits, communication and software protocols [4, 1, 7, 2, 21, 23, 18].
Many systems are implemented using FSM based models. As these systems grew larger
and more complex, research into techniques for ensuring their reliability gained
importance. FSM based testing is a research area that is motivated by these
reliability demands.
In conformance testing, the aim is to ensure that an implementation conforms
to its specification. In other words, conformance testing tries to answer the
question of whether an implementation that is intended to implement some
specification is a correct implementation of that specification. When the
specification of a system is modeled as an FSM $M$, the implementation can also be
considered as an FSM $N$, and the question becomes whether $N$ is equivalent to $M$.
Two FSMs are equivalent if, for every sequence of inputs that is defined in $M$, $N$
produces the same sequence of outputs as $M$. An Implementation Under Test (IUT) is
considered to be a black box. That is, the IUT is an FSM $N$ with unknown
transitions, but it is generally assumed to have at most as many states as $M$ and
to have the same input alphabet as $M$. Thus the approach used to test an FSM based
system is to apply some inputs and observe the outputs produced by the IUT. Using
only these observations, one tries to deduce whether the IUT functions correctly by
comparing the outputs produced by the IUT against the expected outputs produced by
the specification FSM $M$. An input sequence that can determine whether the IUT is a
correct or a faulty implementation of the specification $M$ is called a checking
sequence.
An important problem in conformance testing is state verification. That is, a
mechanism is needed to know which state the IUT is in. This is necessary since
a checking sequence has to verify every transition of the specification FSM, and
verifying a transition requires verifying both its initial and its final state.
That is, we need to know that the IUT is in the correct state before an input is
applied (so that we know which output to expect) and that it reaches the correct
state after the input is applied. The state verification problem can be solved
using a Preset Distinguishing Sequence (PDS) [9], a Unique Input Output (UIO)
sequence [22] or a Characterizing Set [9]. A PDS is an input sequence that produces
different outputs for different states. Therefore if the specification FSM has a
PDS, the state verification problem is solved easily by applying the PDS at the
state to be verified. However, not every minimal FSM has a PDS [15], and determining
whether an FSM has a PDS is a PSPACE-complete problem [16].
According to the survey in [17], the literature of conformance testing begins in
the 1950s. In 1956 Moore’s paper on the machine identification problem was
published [19]. In this paper, he studied the problem of obtaining the state diagram
of an unknown FSM with a given number of states by only observing its input output
behavior. He also stated the conformance testing problem. In 1964, Hennie proposed a
method that uses a PDS to generate a checking sequence whose length is polynomial in
the length of the PDS and the machine size [10]. Hennie’s PDS based method is called
the D-method. He also gave an algorithm that generates exponentially long checking
sequences for the case when a distinguishing sequence cannot be found. Later,
several other checking sequence generation methods based on UIO sequences,
characterizing sets and transition tours were proposed. These methods are called the
U-Method [22], the W-Method [4] and the T-Method [20] respectively.
Although there were some studies in the 70s and 80s, conformance testing became
a more active research area at the beginning of the 90s thanks to applications in
testing communication protocols. Distinguishing sequence based methods in particular
became popular. The studies focused on improving previous methods using global
optimization techniques. In [2], using a graph theoretical approach, the checking
sequence generation problem was modeled as a Rural Chinese Postman Problem. In
[14, 11] this optimization model was further improved. In addition, in [3] it is
shown that some transition verification sequences can be eliminated from the
optimization model, and in [26] the model is improved to produce shorter checking
sequences by making use of overlapping distinguishing sequences. In [24], Simão et
al. proposed an approach that differs from previous work. Instead of attempting
global optimization, they designed an algorithm that performs local optimization.
With this approach, they achieved better results than the global optimization
methods in most cases.
The contributions of this thesis to conformance testing are threefold. First,
we present the details of a tool that generates the random FSMs we require to
measure and compare the performance of checking sequence generation methods.
Second, we present a method that attempts to determine whether a given input
sequence is a distinguishing sequence based checking sequence. Lastly, we present
a method that generates distinguishing sequence based checking sequences. Our
method is essentially a modification of Simão et al.’s method. Experiments show that
our method achieves an average reduction of at least 7% in checking sequence length
compared to Simão et al.’s method.
The rest of this thesis is organized as follows. In Chapter 2, the basic
information on FSMs and conformance testing is provided. In Chapter 3, the details
of our random FSM generation tool are provided. In Chapter 4, our method to check
whether a given sequence is a DS based checking sequence is presented in detail. In
Chapter 5, an overview of Simão et al.’s checking sequence generation method from
[24] is provided. In Chapter 6, we present the details of our checking sequence
generation method together with experimental results. Finally, Chapter 7 contains
the concluding remarks.
Chapter 2
Preliminaries
2.1 FSM Fundamentals
An FSM (finite state machine) is specified by a tuple $M = (S, s_1, I, O, \delta, \lambda)$ where
• $S = \{s_1, s_2, \ldots, s_n\}$ is the finite set of states and $n$ is the number of states
• $s_1 \in S$ is the initial state
• $I$ is the finite set of inputs
• $O$ is the finite set of outputs
• $\delta : S \times I \to S$ is the next state function
• $\lambda : S \times I \to O$ is the output function
For two states $s_i$ and $s_j$, an input $x$ and an output $y$, if $\delta(s_i, x) = s_j$ and
$\lambda(s_i, x) = y$ then intuitively this means the machine $M$ performs a transition from
state $s_i$ to state $s_j$ when input $x$ is applied and it produces output $y$ as a response
to this input. We will also denote such a transition by $(s_i, s_j; x/y)$.
An input symbol $x \in I$ is defined at state $s$ if $\delta(s, x)$ and $\lambda(s, x)$ are defined.
2.1.1 Extending Next State and Output Functions
The next state function $\delta$ and the output function $\lambda$ can be extended to sequences
as follows. Let $x \in I$ be an input symbol and $X \in I^*$ be an input sequence, and
let $xX \in I^*$ denote the input sequence obtained by concatenating $x$ and $X$
(that is, juxtaposition of input (output) sequences and input (output) symbols means
concatenation). Then
• $\delta(s, xX) = \delta(\delta(s, x), X)$ and
• $\lambda(s, xX) = \lambda(s, x)\,\lambda(\delta(s, x), X)$
For the empty sequence $\varepsilon$ we define $\delta(s, \varepsilon) = s$ and $\lambda(s, \varepsilon) = \varepsilon$. An input
sequence $X = x_1 x_2 \ldots x_r \in I^*$ is defined at state $s$ if, for all $1 \le i \le r$,
$x_i$ is defined at $\delta(s, x_1 x_2 \ldots x_{i-1})$.
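The recursive definitions above translate directly into code. The following is a minimal Python sketch, assuming a hypothetical two-state machine that does not appear in the thesis and is stored as the dictionaries `DELTA` and `LAMBDA`; the function names `delta_ext` and `lambda_ext` are illustrative only.

```python
from typing import Dict, Tuple

# Hypothetical two-state machine used only for illustration (an assumption,
# not a machine from the thesis). Keys are (state, input) pairs.
DELTA: Dict[Tuple[str, str], str] = {
    ("s1", "a"): "s2", ("s1", "b"): "s1",
    ("s2", "a"): "s1", ("s2", "b"): "s2",
}
LAMBDA: Dict[Tuple[str, str], str] = {
    ("s1", "a"): "0", ("s1", "b"): "1",
    ("s2", "a"): "1", ("s2", "b"): "1",
}


def delta_ext(s: str, xs: str) -> str:
    """delta(s, eps) = s and delta(s, xX) = delta(delta(s, x), X)."""
    if not xs:
        return s
    return delta_ext(DELTA[(s, xs[0])], xs[1:])


def lambda_ext(s: str, xs: str) -> str:
    """lambda(s, eps) = eps and lambda(s, xX) = lambda(s, x) lambda(delta(s, x), X)."""
    if not xs:
        return ""
    return LAMBDA[(s, xs[0])] + lambda_ext(DELTA[(s, xs[0])], xs[1:])
```

For this machine, applying the input sequence ab at s1 produces the output sequence 01 and leaves the machine in s2. An iterative loop would serve equally well; recursion is used here only to mirror the definition.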
2.1.2 Some Properties of FSMs
An FSM $M$ is
• deterministic if for each state $s \in S$ and for each input symbol $x \in I$, $M$
has at most one transition with start state $s$ and input symbol $x$. Since
the transitions of an FSM are defined by a function, in our setting an FSM
is always deterministic. For nondeterministic machines, relations are used
instead of functions.
• completely specified if for each state $s \in S$ and for each input symbol $x \in I$,
$\delta(s, x)$ and $\lambda(s, x)$ are defined, that is, when $\delta$ and $\lambda$ are total functions.
• minimal if for any two different states $s_i, s_j \in S$, there is an input sequence
$X \in I^*$ such that $\lambda(s_i, X) \neq \lambda(s_j, X)$.
• initially reachable if for each $s_i \in S$ there exists some input sequence $X \in I^*$
such that $\delta(s_1, X) = s_i$ (i.e. each state $s_i \in S$ is reachable from the initial
state $s_1$).
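Two of these properties can be checked mechanically for a completely specified machine. The sketch below is an illustration under stated assumptions, not the tool described in this thesis: it tests initial reachability with a breadth-first search and minimality with the standard partition refinement on states. The machine and all function names are hypothetical.

```python
from collections import deque


def initially_reachable(states, inputs, delta, initial):
    """Breadth-first search from the initial state over the next state function."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for x in inputs:
            t = delta[(s, x)]
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen == set(states)


def is_minimal(states, inputs, delta, lam):
    """Partition refinement: minimal iff every state ends up in its own block."""
    sig = {s: tuple(lam[(s, x)] for x in inputs) for s in states}
    ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
    block = {s: ids[sig[s]] for s in states}
    while True:
        # Split blocks by the blocks of the successor states.
        sig = {s: (block[s],) + tuple(block[delta[(s, x)]] for x in inputs)
               for s in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        if len(ids) == len(set(block.values())):  # no block was split
            return len(ids) == len(states)
        block = {s: ids[sig[s]] for s in states}


# Hypothetical machine (an assumption for illustration only).
STATES, INPUTS = ["s1", "s2"], ["a", "b"]
DELTA = {("s1", "a"): "s2", ("s1", "b"): "s1",
         ("s2", "a"): "s1", ("s2", "b"): "s2"}
LAMBDA = {("s1", "a"): "0", ("s1", "b"): "1",
          ("s2", "a"): "1", ("s2", "b"): "1"}
```

Here both checks succeed: s2 is reached from s1 by input a, and the two states already differ in their response to a.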
2.2 Representing an FSM by a Directed Graph
An FSM $M$ can be represented by a directed graph $G = (V, E)$ with a set of vertices
$V$ and a set of directed edges $E$. In such a graph, each edge $e = (v_j, v_k; x/y) \in E$
with label $x/y$ represents a transition $t = (s_j, s_k; x/y)$ from $s_j$ to $s_k$ with input $x$
and output $y$. We will also use $(v_j, v_k)$ to denote an edge when the edge label is not
important. The vertices $v_j$ and $v_k$ of $e$ are called the start and the end of $e$
respectively, and it is said that $e$ leaves $v_j$ and enters $v_k$. Two edges $e_j$ and $e_k$
are called adjacent if the end of $e_j$ and the start of $e_k$ are the same.
Figure 2.1: FSM $M_1$ (states $s_1$, $s_2$, $s_3$; edge labels b/0, a/0, b/1, a/1, b/0, a/1)
Any sequence of adjacent edges (not necessarily distinct) is called a path. We will
denote a path $(n_1, n_2; x_1/y_1)(n_2, n_3; x_2/y_2) \ldots (n_r, n_{r+1}; x_r/y_r)$ as
$P = (n_1, n_{r+1}; X/Y)$ where $X = x_1 x_2 \ldots x_r$ and $Y = y_1 y_2 \ldots y_r$. The nodes $n_i$
correspond to vertices of $G$. Node $n_1$ is the start of $P$ and $n_{r+1}$ is the end of $P$.
The input output sequence $X/Y$ is called the label of $P$, and $X/Y$ is a transfer
sequence from $n_1$ to $n_{r+1}$. $X$ is the input portion and $Y$ is the output portion of
$X/Y$.
In graph $G$, a vertex $v_k$ is reachable from vertex $v_j$, represented as $v_j \leadsto v_k$, if
there exists a path $P$ such that the start of $P$ is $v_j$ and the end of $P$ is $v_k$. $G$ is
strongly connected if $v_j \leadsto v_k$ holds for all $v_j, v_k \in V$. An FSM is strongly
connected if the digraph representing it is strongly connected.
2.3 Distinguishing Sequences
The checking sequence generation methods that will be discussed in this thesis
require the existence of a distinguishing sequence. Distinguishing sequences are
special sequences used for state identification. Throughout this thesis the phrase
identification sequence always refers to a distinguishing sequence. There are two
types of distinguishing sequences, which are explained next.
2.3.1 Preset Distinguishing Sequence
A Preset Distinguishing Sequence (PDS) of an FSM $M$ is an input sequence $D$ in
response to which every state of $M$ gives a distinct output sequence.
For instance $ab$ is a PDS for FSM $M_1$ shown in Figure 2.1:
• $\lambda(s_1, ab) = 00$
• $\lambda(s_2, ab) = 11$
• $\lambda(s_3, ab) = 10$
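The PDS condition can be checked by collecting the responses of all states and testing that they are pairwise distinct. The sketch below assumes a hypothetical three-state machine whose responses to ab happen to be 00, 11 and 10; it is an illustration only and is not claimed to be the machine of Figure 2.1. The name `is_pds` is likewise an assumption.

```python
def response(delta, lam, state, seq):
    """Output sequence produced when seq is applied at state."""
    out = []
    for x in seq:
        out.append(lam[(state, x)])
        state = delta[(state, x)]
    return "".join(out)


def is_pds(states, delta, lam, seq):
    """seq is a PDS iff all states give pairwise distinct responses to it."""
    responses = {response(delta, lam, s, seq) for s in states}
    return len(responses) == len(states)


# Hypothetical three-state machine (an assumption for illustration only).
STATES = ["s1", "s2", "s3"]
DELTA = {("s1", "a"): "s2", ("s1", "b"): "s1",
         ("s2", "a"): "s3", ("s2", "b"): "s1",
         ("s3", "a"): "s1", ("s3", "b"): "s2"}
LAMBDA = {("s1", "a"): "0", ("s1", "b"): "0",
          ("s2", "a"): "1", ("s2", "b"): "0",
          ("s3", "a"): "1", ("s3", "b"): "1"}
```

For this machine ab is a PDS, while a alone is not, since s2 and s3 both answer 1 to a.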
2.3.2 Distinguishing Set (Adaptive Distinguishing Sequence)
A Distinguishing Set (or Adaptive Distinguishing Sequence, ADS) is a multiset of
input sequences $\bar{D} = \{D_{s_1}, D_{s_2}, \ldots, D_{s_n}\}$ such that for any pair $D_{s_i}, D_{s_j} \in \bar{D}$
with $i \neq j$ there exists a common prefix $\alpha$ of $D_{s_i}$ and $D_{s_j}$ such that
$\lambda(s_i, \alpha) \neq \lambda(s_j, \alpha)$. The sequence $D_{s_i}$ is called the ADS of state $s_i$.
For example, $\bar{D} = \{D_{s_1}, D_{s_2}, D_{s_3}\}$, where $D_{s_1} = a$ and $D_{s_2} = D_{s_3} = ab$, is a
distinguishing set for FSM $M_1$ in Figure 2.1.
Note that a PDS is a special case of an ADS where $D_{s_i} = D$ for all states. Therefore
every FSM which has a PDS also has a distinguishing set. However, the converse is
not true. That is, there exist FSMs with a distinguishing set but no PDS. Compared
to a PDS, distinguishing sets have some advantages. In particular, the existence of
a distinguishing set can be decided, and one can be found if it exists, in time
polynomial in the number of states and the number of inputs [16].
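The defining condition of a distinguishing set can also be tested directly. The sketch below uses the same kind of hypothetical three-state machine as an assumption (it is not claimed to be $M_1$) and relies on one observation: if any common prefix separates two states, then so does the longest common prefix, because output sequences that differ on a prefix also differ on every extension. The name `is_distinguishing_set` is illustrative.

```python
def response(delta, lam, state, seq):
    """Output sequence produced when seq is applied at state."""
    out = []
    for x in seq:
        out.append(lam[(state, x)])
        state = delta[(state, x)]
    return "".join(out)


def is_distinguishing_set(states, delta, lam, dset):
    """dset maps each state s to its sequence D_s."""
    for i, si in enumerate(states):
        for sj in states[i + 1:]:
            di, dj = dset[si], dset[sj]
            n = 0  # length of the longest common prefix of D_si and D_sj
            while n < min(len(di), len(dj)) and di[n] == dj[n]:
                n += 1
            alpha = di[:n]
            if response(delta, lam, si, alpha) == response(delta, lam, sj, alpha):
                return False
    return True


# Hypothetical three-state machine (an assumption for illustration only).
STATES = ["s1", "s2", "s3"]
DELTA = {("s1", "a"): "s2", ("s1", "b"): "s1",
         ("s2", "a"): "s3", ("s2", "b"): "s1",
         ("s3", "a"): "s1", ("s3", "b"): "s2"}
LAMBDA = {("s1", "a"): "0", ("s1", "b"): "0",
          ("s2", "a"): "1", ("s2", "b"): "0",
          ("s3", "a"): "1", ("s3", "b"): "1"}
```

With $D_{s_1} = a$ and $D_{s_2} = D_{s_3} = ab$ the check succeeds for this machine, whereas assigning a to every state fails because a does not separate s2 from s3.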
2.4 Checking Sequences based on Distinguishing
Sequences
Let $M$ be a completely specified, minimal, deterministic and strongly connected
FSM that is represented by the directed graph $G = (V, E)$. Also let $\Phi(M)$ be the set
of FSMs such that each FSM $N \in \Phi(M)$ has at most as many states as $M$ and has
the same input and output sets as $M$. FSMs $M$ and $N$ are said to be equivalent if
there does not exist an input sequence $X$ such that $\lambda(s_1^M, X) \neq \lambda(s_1^N, X)$, where
$s_1^M$ and $s_1^N$ are the initial states of $M$ and $N$ respectively. If such an input
sequence $X$ exists then $X$ is said to distinguish $M$ and $N$. A checking sequence of $M$
is an input sequence that distinguishes $M$ from every FSM $N \in \Phi(M)$ that is not
equivalent to $M$. Hence in the context of conformance testing, when a checking
sequence is applied to any faulty implementation $N$ in $\Phi(M)$, the output produced by
$N$ will be different from the output produced by the specification $M$.
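For very small machines this definition can be checked by brute force, which makes the quantification over $\Phi(M)$ concrete. The sketch below is an illustration under stated assumptions and not a method proposed in this thesis: states are numbered from 0 with 0 as the initial state, every candidate $N$ with exactly $n$ states is enumerated (a machine with fewer states behaves like one of these), and the specification is a hypothetical two-state machine. All names are illustrative.

```python
from itertools import product


def run(delta, lam, state, xs):
    """Output sequence produced by applying xs from state."""
    out = []
    for x in xs:
        out.append(lam[(state, x)])
        state = delta[(state, x)]
    return "".join(out)


def equivalent(m, n, inputs):
    """Search the product machine from the two initial states (both 0)."""
    (dm, lm), (dn, ln) = m, n
    seen, stack = {(0, 0)}, [(0, 0)]
    while stack:
        p, q = stack.pop()
        for x in inputs:
            if lm[(p, x)] != ln[(q, x)]:
                return False
            nxt = (dm[(p, x)], dn[(q, x)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


def all_machines(n, inputs, outputs):
    """Every completely specified deterministic FSM over states 0..n-1."""
    keys = [(s, x) for s in range(n) for x in inputs]
    choices = list(product(range(n), outputs))
    for combo in product(choices, repeat=len(keys)):
        yield ({k: c[0] for k, c in zip(keys, combo)},
               {k: c[1] for k, c in zip(keys, combo)})


def is_checking_sequence(m, n, inputs, outputs, xs):
    """xs must separate m from every non-equivalent candidate machine."""
    expected = run(m[0], m[1], 0, xs)
    for cand in all_machines(n, inputs, outputs):
        if run(cand[0], cand[1], 0, xs) == expected and not equivalent(m, cand, inputs):
            return False
    return True


# Hypothetical specification (an assumption for illustration only).
M = ({(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
     {(0, "a"): "0", (0, "b"): "1", (1, "a"): "1", (1, "b"): "1"})
```

For this specification the sequence aaababa is a checking sequence, while aaa is not, because aaa never exercises input b. The enumeration grows as $(n \cdot |O|)^{n \cdot |I|}$, so the sketch is only usable for toy machines.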
The main aspect of a checking sequence is that it defines a one to one and onto
function $f$ between the state set of the specification $M$ and the state set of the
implementation $N$, and tries to show that if $(s_j, s_k; x/y)$ is a transition in $M$ then
$N$ has a corresponding transition $(f(s_j), f(s_k); x/y)$. Thus testing with a checking
sequence requires the concepts of state recognition and transition verification to
be defined. We define these concepts using a distinguishing sequence of FSM $M$ as
follows.
Let $P = (n_1, n_2; x_1/y_1)(n_2, n_3; x_2/y_2) \ldots (n_r, n_{r+1}; x_r/y_r)$ be a path in $G$ from $n_1$
to $n_{r+1}$ with the label $X/Y = x_1 x_2 \ldots x_r / y_1 y_2 \ldots y_r$. Also let $\bar{D}$ be a
distinguishing set of $M$. There are two types of recognition that we will define
here, namely d-recognition and t-recognition [25]. A node in $P$ is said to be
recognized as some state of $M$ if it is either d-recognized or t-recognized, where
d-recognition and t-recognition are defined as follows:
• a node $n_i$ of $P$ is d-recognized as state $s$ of $M$ if $n_i$ is the start of a subpath
of $P$ with label $D_s/\lambda(s, D_s)$
• a node $n_i$ of $P$ is t-recognized as state $s$ of $M$ if there are two subpaths
$(n_q, n_i; X'/Y')$ and $(n_j, n_k; X'/Y')$ of $P$ such that $n_q$ and $n_j$ are recognized
as the same state $s'$ of $M$ and $n_k$ is recognized as state $s$ of $M$
In addition, transition verification is defined as follows. A transition
$t = (s, s'; x/y)$ of $M$ is verified (in $P$) if there is an edge $(n_i, n_{i+1}; x'/y')$ of $P$
such that nodes $n_i$ and $n_{i+1}$ are recognized as states $s$ and $s'$ of $M$ respectively
and $x'/y' = x/y$.
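The two recognition rules and the verification rule can be applied mechanically to a given label $X/Y$. The sketch below is an illustration under stated assumptions rather than the checking method developed in Chapter 4: it handles only the PDS case (every $D_s$ equals the same sequence `D`), applies d-recognition once, and then repeats t-recognition until nothing changes. The example label belongs to a hypothetical two-state machine in which s1 answers 0 and s2 answers 1 to the PDS a.

```python
def recognize(X, Y, D, responses):
    """rec[i] is the state node i is recognized as, or None.

    responses maps each state s to lambda(s, D). Nodes are numbered 0..len(X).
    """
    r, d = len(X), len(D)
    rec = [None] * (r + 1)
    # d-recognition: node i starts a subpath labelled D / lambda(s, D).
    for i in range(r - d + 1):
        if X[i:i + d] == D:
            for s, out in responses.items():
                if Y[i:i + d] == out:
                    rec[i] = s
    # t-recognition: copy the end state of a recognized subpath (j, k) to the
    # end of another subpath with the same label and the same start state.
    changed = True
    while changed:
        changed = False
        for j in range(r + 1):
            for k in range(j + 1, r + 1):
                if rec[j] is None or rec[k] is None:
                    continue
                length = k - j
                for q in range(r + 1 - length):
                    i = q + length
                    if (rec[q] == rec[j] and rec[i] is None
                            and X[q:i] == X[j:k] and Y[q:i] == Y[j:k]):
                        rec[i] = rec[k]
                        changed = True
    return rec


def verified_transitions(X, Y, rec):
    """Transitions (s, s', x, y) whose two end nodes are both recognized."""
    return {(rec[i], rec[i + 1], X[i], Y[i])
            for i in range(len(X))
            if rec[i] is not None and rec[i + 1] is not None}


# Hypothetical example: PDS "a", with s1 answering "0" and s2 answering "1".
X, Y = "aaababa", "0101110"
REC = recognize(X, Y, "a", {"s1": "0", "s2": "1"})
```

In this example nodes 3, 5 and 7 are not followed by the PDS and are recognized only through t-recognition; afterwards all four transitions of the hypothetical machine are verified.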
The following theorem from [25] (rephrased in our notation) states a sufficient
condition for a checking sequence.
Theorem 1. Let $X/Y$ be the label of a path $P$ of the directed graph $G$ (for FSM $M$)
such that every transition is verified in $P$. Then $X$ (i.e. the input portion of the
label of $P$) forms a checking sequence of $M$.
Chapter 3
Random FSM Generation
Measuring and comparing the performance of checking sequence generation
algorithms generally requires running the methods on a set of FSMs. All checking
sequence generation methods, including the methods discussed in this thesis,
require these FSMs to have certain properties. For example, a method may require
an FSM to be deterministic, completely specified, strongly connected and minimal,
and to have a preset distinguishing sequence. Since these FSMs will be used for
experimental purposes, it is also very important that their structure be as random
as possible, so that they fairly represent all possible FSMs with the desired
properties. For this reason, we developed a tool that can generate deterministic
and completely specified random FSMs with a given number of states, number of input
symbols and number of output symbols, and with any of the properties listed below:
• Being strongly connected (or not)
• Being initially reachable (or not)
• Being minimal (or not)
• Having a preset distinguishing sequence (or not)
• Having an adaptive distinguishing sequence (or not)
Among these properties, strong connectedness, initial reachability and having a
preset distinguishing sequence turned out to be very difficult to satisfy when left
to pure chance. In other words, assigning transitions randomly between states was
not an efficient way to generate FSMs with these properties. Thus, for these
properties, after the initial assignment of the transitions, the tool allows a post
processing step to be applied to the generated random FSM to force it to have the
desired property. In the following sections the details of these post processing
steps are explained for each property. Before examining post processing, however,
the initial assignment of transitions is given below as pseudo code.
Algorithm 1: Random Assignment of Transitions
Input: S, finite set of states
Input: I, finite set of input symbols
Input: O, finite set of output symbols
Output: T, list of transitions of a completely specified, deterministic FSM
with randomly assigned transitions
T = ∅;
foreach state s ∈ S do
    foreach input x ∈ I do
        choose a random output symbol y from O;
        choose a random destination state s′ from S;
        T = T ∪ {(s, s′; x/y)};
Since a new transition is created for each state and input symbol pair, the
complexity of random assignment of transitions is $O(np)$ where $|S| = n$ and $|I| = p$.
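Algorithm 1 can be rendered in a few lines of Python. This is a sketch under stated assumptions and not the code of the tool itself: transitions are represented as tuples `(s, s_next, x, y)`, choices are drawn uniformly, and the function name and the optional `rng` parameter (a `random.Random` instance, useful for reproducible experiments) are illustrative.

```python
import random


def random_transitions(states, inputs, outputs, rng=None):
    """Return one transition (s, s_next, x, y) per (state, input) pair."""
    rng = rng or random.Random()
    transitions = []
    for s in states:
        for x in inputs:
            y = rng.choice(outputs)       # random output symbol from O
            s_next = rng.choice(states)   # random destination state from S
            transitions.append((s, s_next, x, y))
    return transitions


# Example with hypothetical alphabets and a fixed seed.
T = random_transitions(["s1", "s2", "s3"], ["a", "b"], ["0", "1"],
                       rng=random.Random(2009))
```

Because exactly one transition is created per state and input pair, the result is deterministic and completely specified by construction, and the two nested loops reflect the $O(np)$ bound.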
3.1 Component Graph
The component graph, sometimes called the condensation, of a digraph $G$ is a
directed acyclic graph that has a vertex for each strongly connected component of
$G$, and whose edges represent the connectivity between these components.
A more formal definition is given below.
Definition 1. Assuming that there are $m$ strongly connected components of
$G = (V, E)$, the component graph of $G$ is defined as $\bar{G} = (\bar{V}, \bar{E})$ where
$\bar{V} = \{c_1, c_2, \ldots, c_m\}$ denotes the set of strongly connected components, such that
$\bar{V}$ is a partition of $V$, and $\bar{E}$ is defined as
$\bar{E} = \{(c_i, c_j) \mid c_i \neq c_j, \exists v_i \in c_i, v_j \in c_j \text{ s.t. } (v_i, v_j) \in E\}$.
In other words, each vertex in $\bar{G}$ corresponds to a subset of vertices in $G$, and
there is an edge in $\bar{G}$ from a vertex $c_i$ to another vertex $c_j$ if in $G$ there is an
edge from one of the vertices in $c_i$ to one of the vertices in $c_j$.
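A component graph can be computed with any strongly connected component algorithm. The sketch below uses Tarjan's algorithm as an assumed choice (the thesis does not state which algorithm the tool uses) and then builds $\bar{E}$ exactly as in Definition 1. Components are identified by integers, and all names are illustrative.

```python
def component_graph(vertices, edges):
    """Return (comp_of, comp_edges) for the digraph (vertices, edges).

    comp_of maps each vertex to an integer component id; comp_edges is the
    set of pairs (c_i, c_j), c_i != c_j, joined by at least one edge of G.
    """
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
    index, low, stack, on_stack, comp_of = {}, {}, [], set(), {}
    count = 0

    def visit(v):
        nonlocal count
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp_of[w] = count
                if w == v:
                    break
            count += 1

    for v in vertices:
        if v not in index:
            visit(v)
    comp_edges = {(comp_of[u], comp_of[v]) for u, v in edges
                  if comp_of[u] != comp_of[v]}
    return comp_of, comp_edges


# Example: two 2-cycles {1, 2} and {3, 4} joined by the edge (2, 3).
COMP_OF, COMP_EDGES = component_graph(
    [1, 2, 3, 4], [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)])
```

The recursive formulation is adequate for small graphs; for machines with thousands of states an iterative variant would avoid Python's recursion limit.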
3.2 Free Edge and Set of Free Edges
We define an edge $e = (v_i, v_j)$ of $G$, where $v_i \in c_i$, as a free edge if the
component $c_i$ remains strongly connected when $e$ is removed from $G$. Formally:
Definition 2. Let $G' = (V, E')$ with $E' = E \setminus \{e\}$, and let $\bar{G} = (\bar{V}, \bar{E})$ and
$\bar{G}' = (\bar{V}', \bar{E}')$ be the component graphs of $G$ and $G'$ respectively. Then $e$ is a free
edge in $G$ if $\bar{V} = \bar{V}'$.
In the following sections the set of free edges of graph $G$ will be denoted as $F$.
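Definition 2 suggests a direct, if inefficient, test: compute the partition of $V$ into strongly connected components before and after removing the edge and compare the two. The sketch below does this with plain reachability searches; it is a naive illustration, not the procedure of Section 3.3.1, and the function names are assumptions. Edges are kept in a list so that parallel edges can be represented.

```python
def reachable(adj, src):
    """Set of vertices reachable from src (including src itself)."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def scc_partition(vertices, edges):
    """Partition of the vertices into strongly connected components."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    reach = {v: reachable(adj, v) for v in vertices}
    return {frozenset(u for u in vertices if u in reach[v] and v in reach[u])
            for v in vertices}


def is_free_edge(vertices, edges, e):
    """e is free iff removing one copy of it leaves the partition unchanged."""
    remaining = list(edges)
    remaining.remove(e)
    return scc_partition(vertices, edges) == scc_partition(vertices, remaining)


# Example: the cycle 1 -> 2 -> 3 -> 1 with the extra chord 1 -> 3.
V = [1, 2, 3]
E = [(1, 2), (2, 3), (3, 1), (1, 3)]
```

In the example the chord (1, 3) is free, whereas (1, 2) is not, because without it vertex 2 can no longer be reached. An edge that leaves its component, such as the only edge of the graph 1 -> 2, is always free under this definition.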
3.2.1 Existence of a Free Edge in a Strongly Connected
Graph
Below we present a proof of the existence of at least one free edge in a strongly
connected graph $G = (V, E)$ where $|E| \ge 2 \times |V|$.
Definition 3. Let $G = (V, E)$ be a digraph. For a subset of the nodes $\Gamma$, the
$\Gamma$ contraction of $G$ is defined as $G(\Gamma) = (V', E')$ where $V' = (V \setminus \Gamma) \cup \{\gamma\}$ and
$E' = \{(u, v) \mid u, v \notin \Gamma, (u, v) \in E\} \cup \{(u, \gamma) \mid u \notin \Gamma, v \in \Gamma, (u, v) \in E\} \cup \{(\gamma, v) \mid u \in \Gamma, v \notin \Gamma, (u, v) \in E\}$
Intuitively, in $G(\Gamma)$ all the nodes in $\Gamma$ are removed and they are represented by
a fresh node $\gamma$. Those edges in $G$ that are not from or to a node in $\Gamma$ are
preserved in $G(\Gamma)$. The edges between two nodes in $\Gamma$ are removed in $G(\Gamma)$. An
edge between a node in $\Gamma$ and a node not in $\Gamma$ is replaced by an edge using the node
$\gamma$ instead of the node in $\Gamma$.
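The contraction of Definition 3 is straightforward to compute. The sketch below follows the three parts of $E'$ literally and, as an assumption that matches the set notation of the definition, stores $E'$ as a set, so edges that become identical after contraction are merged. The name `contract` and the default label of the fresh node are illustrative.

```python
def contract(vertices, edges, gamma_set, gamma="γ"):
    """Return (V', E') of the Γ contraction, with Γ given as gamma_set."""
    new_vertices = [v for v in vertices if v not in gamma_set] + [gamma]
    new_edges = set()
    for u, v in edges:
        if u not in gamma_set and v not in gamma_set:
            new_edges.add((u, v))        # edge untouched by Γ is preserved
        elif u not in gamma_set and v in gamma_set:
            new_edges.add((u, gamma))    # edge entering Γ now enters γ
        elif u in gamma_set and v not in gamma_set:
            new_edges.add((gamma, v))    # edge leaving Γ now leaves γ
        # edges with both ends in Γ are dropped
    return new_vertices, new_edges


# Example: contract the 2-cycle {1, 2} of a strongly connected graph.
V2, E2 = contract([1, 2, 3, 4],
                  [(1, 2), (2, 1), (2, 3), (3, 4), (4, 1)], {1, 2})
```

In the example the contracted graph is the cycle γ -> 3 -> 4 -> γ, which is again strongly connected, as Lemma 5 below states in general.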
Lemma 2. Let $G = (V, E)$ be a digraph and $\Gamma \subseteq V$ be a subset of $V$. For two nodes
$u, u' \in V \setminus \Gamma$, if there exists a path $u \leadsto u'$ in $G$, then there also exists a path
$u \leadsto u'$ in $G(\Gamma)$.
Proof. If the path $u \leadsto u'$ does not go through a node in $\Gamma$, then all the edges in
$u \leadsto u'$ also exist in $G(\Gamma)$. Otherwise let $u \leadsto v$ ($v' \leadsto u'$, resp.) be the shortest
prefix (the shortest suffix, resp.) of $u \leadsto u'$ such that $v \in \Gamma$ ($v' \in \Gamma$, resp.). By
using Lemma 3 (Lemma 4, resp.), there exists a path $u \leadsto \gamma$ ($\gamma \leadsto u'$, resp.) in
$G(\Gamma)$. Hence we have the path $u \leadsto \gamma \leadsto u'$ in $G(\Gamma)$.
Lemma 3. Let $G = (V, E)$ be a digraph and $\Gamma \subseteq V$ be a subset of $V$. For a node
$u \in V \setminus \Gamma$, if there exists a path $u \leadsto u'$ to a node $u' \in \Gamma$ in $G$, then there also
exists a path $u \leadsto \gamma$ in $G(\Gamma)$.
Proof. Consider the shortest prefix $u \leadsto v$ of the path $u \leadsto u'$ such that $v \in \Gamma$. Let
$u \leadsto v'$ be the path $u \leadsto v$ where the last edge $(v', v)$ is removed. Since none of the
nodes along the path $u \leadsto v'$ are in $\Gamma$, the edges on this path also exist in $G(\Gamma)$.
Therefore we have the path $u \leadsto v'$ also in $G(\Gamma)$. Since $v' \notin \Gamma$, $v \in \Gamma$ and
$(v', v) \in E$, we have the edge $(v', \gamma)$ in $G(\Gamma)$. Thus by combining the path $u \leadsto v'$
and the edge $(v', \gamma)$ in $G(\Gamma)$, the desired result is obtained.
Lemma 4. Let $G = (V, E)$ be a digraph and $\Gamma \subseteq V$ be a subset of $V$. For a node
$u \in V \setminus \Gamma$, if there exists a path $u' \leadsto u$ from a node $u' \in \Gamma$ in $G$, then there also
exists a path $\gamma \leadsto u$ in $G(\Gamma)$.
Proof. Consider the shortest suffix $v \leadsto u$ of the path $u' \leadsto u$ such that $v \in \Gamma$. Let
$v' \leadsto u$ be the path $v \leadsto u$ where the first edge $(v, v')$ is removed. Since none of the
nodes along the path $v' \leadsto u$ are in $\Gamma$, the edges on this path also exist in $G(\Gamma)$.
Therefore we have the path $v' \leadsto u$ also in $G(\Gamma)$. Since $v' \notin \Gamma$, $v \in \Gamma$ and
$(v, v') \in E$, we have the edge $(\gamma, v')$ in $G(\Gamma)$. Thus by combining the edge $(\gamma, v')$
and the path $v' \leadsto u$, the desired result is obtained.
Lemma 5. Let $G = (V, E)$ be a digraph and $\Gamma \subset V$ be a subset of $V$. If $G$ is
strongly connected then so is $G(\Gamma)$.
Proof. Consider two nodes $u, v \notin \Gamma$. Since $G$ is strongly connected, we have a path
$u \leadsto v$ in $G$. By using Lemma 2, we also have such a path in $G(\Gamma)$. Consider
now a node $u \notin \Gamma$. There must exist a path $u \leadsto \gamma$ in $G(\Gamma)$. To see this, consider a
node $v \in \Gamma$. Since $G$ is strongly connected, there is a path $u \leadsto v$ in $G$. By using
Lemma 3, the desired result is obtained. Finally, the existence of a path $\gamma \leadsto u$ can
be shown by using similar reasoning and Lemma 4.
Lemma 6. Let G = (V,E) be a strongly connected digraph with |E| ≥ 2 × |V|. Then
there exists at least one free edge in G.
Proof. The proof is by induction on |V|. For |V| = 1 it is trivial to see that the claim
holds. Let us consider the case |V| > 1. If G has a loop (that is, if (v, v) ∈ E for some
v ∈ V) or if G has parallel edges (that is, if there are multiple edges between the
same pair of nodes), then we can remove the loop or one of the parallel edges and the
graph will still be strongly connected, so the removed edge is a free edge. Suppose G
has no loops and no parallel edges. Let Γ = {v1, v2, . . . , vm} ⊆ V be the nodes of a
smallest cycle (i.e. a cycle with the smallest number of vertices) in G. As G has no
loops, m ≥ 2. Without loss of generality assume that ∀ 1 ≤ i < m, (vi, vi+1) ∈ E and
(vm, v1) ∈ E. Note that these edges must be the only edges between the nodes of Γ.
In other words, for three different nodes vi, vj, vk ∈ Γ it is not possible to have
(vi, vj), (vi, vk) ∈ E, since Γ would not be a smallest cycle otherwise. Therefore there
are exactly m edges between the vertices in Γ.
Let us now consider G(Γ). Since there are exactly m edges between the vertices
in Γ, there are |E| − m edges in G(Γ). The number of vertices in G(Γ) is |V| − m + 1.
First of all, the number of edges in G(Γ) is at least two times the number of
nodes in G(Γ), i.e. |E| − m ≥ 2 × (|V| − m + 1), since |E| ≥ 2 × |V| and m ≥ 2.
Furthermore, by using Lemma 5, it is known that G(Γ) is strongly connected as
well.
Finally, (|V| − m + 1) < |V| since m ≥ 2, and therefore by using the induction
hypothesis the proof is completed.
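Lemma 6 can be checked by brute force on small graphs. The following Python sketch assumes the definition of a free edge given earlier in this chapter, simplified for a strongly connected graph: an edge is treated as free if removing it alone leaves the graph strongly connected. The function names are illustrative and are not taken from the tool.

```python
def reach(start, adj):
    """Vertices reachable from start using the adjacency lists in adj."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def strongly_connected(vertices, edges):
    """True if the first vertex reaches, and is reached from, every vertex."""
    fwd = {v: [] for v in vertices}
    bwd = {v: [] for v in vertices}
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)
    root = vertices[0]
    return len(reach(root, fwd)) == len(vertices) == len(reach(root, bwd))


def individually_free_edges(vertices, edges):
    """Edges whose removal, one at a time, keeps the graph strongly connected."""
    return [e for k, e in enumerate(edges)
            if strongly_connected(vertices, edges[:k] + edges[k + 1:])]
```

On a three-node graph with all six directed edges between distinct nodes, |E| = 2 × |V| holds and the function returns a nonempty list, while a plain three-node cycle (|E| < 2 × |V|) has no free edge.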
3.2.2 Existence of a Free Edge in a not Strongly Connected
Graph
Below we show that there exists at least one free edge in a graph that is not strongly
connected if the nodes of the graph have outdegree greater than 1.
Theorem 7. Let G = (V,E) be a digraph where each node has the same outdegree
k ≥ 2 and let G′ = (V ′, E ′) be a strongly connected component of G. If G is not
strongly connected, then there exists at least one free edge (u, v) in G where u ∈ V ′.
Proof. If there exists an edge (u, v) ∈ E where u ∈ V ′ and v ∈ V \ V ′, then (u, v) is
a free edge. If there is no such edge, then |E ′| = k× |V ′| ≥ 2× |V ′|. In this case by
using Lemma 6, there is a free edge (u, v) in G′ where u, v ∈ V ′.
3.3 Forcing Strongly Connectedness
If the user wants the generated FSM to be strongly connected, the tool offers the
option of forcing strong connectedness by a post-processing step, rather than waiting
for a strongly connected FSM to be produced by random assignment of transitions
alone. If this option is enabled, the tool generates a random FSM by randomly
assigning transitions and checks whether it is strongly connected. If it is not, the
post-processing that makes the FSM strongly connected begins. The details of this
process are explained in this section. Since an FSM can be represented as a directed
graph, the process is explained as a graph algorithm on the underlying graph
representation of the FSM.
3.3.1 Finding a Set of Free Edges in a Component
The problem of finding as large a set of free edges as possible for a strongly connected
component is directly related to the Minimum Equivalent Graph (MEG) problem.
The MEG problem is defined as follows: given a directed graph G = (V,E), find the
smallest subset E′ of E such that E′ keeps the same reachability relations between
the vertices in V. When the MEG problem is restricted to strongly connected graphs,
it is called the minimum Strongly Connected Spanning Subgraph (SCSS) problem,
which is NP-hard [8]. If we can find a solution to the minimum SCSS problem for a
component ci, then we can find a set of free edges with maximum cardinality for ci,
and vice versa. That is because if E′ is the solution to the minimum SCSS problem
for a strongly connected component ci of G = (V,E) and Ei ⊂ E is defined as
Ei = {(vi, vj) | vi ∈ ci}, then (Ei \ E′) is a set of free edges with maximum
cardinality for ci.
Although finding a set of free edges with maximum cardinality for a strongly connected
component is NP-hard, we still want to find as many free edges as possible.
For this reason we use a very simple heuristic. To build the set F of free edges, we iterate over each
edge e = (vi, vj) ∈ E. If vi, vj ∈ ci, we remove e and check whether vj is still reachable
from vi. If it is reachable, then e is a free edge and is included in F; otherwise we put
e back. However, there are cases where the reachability check can be skipped and an
edge can be included in F directly. One such case is when vi = vj, that is, e is a
self-loop, which is guaranteed to be a free edge. Also, any edge e satisfying vi ∈ ci,
vj ∉ ci is directly included in F, since in that case e goes to a vertex outside ci and
does not affect the strong connectedness of ci. Algorithm 2 describes this process
formally. Note that, except in these two cases, if an edge e happens to be a free edge
and is thus included in F and removed from E, an edge e′ ≠ e which has not been
considered yet and was a free edge before the removal of e might not be a free edge
anymore. For that reason, the order in which the edges are considered for inclusion
in F is important. In our implementation, since we want to affect the randomness of
the generated FSM as little as possible, we consider edges in a random order for
inclusion in F.
Algorithm 2: Find Set of Free Edges
Input: G = (V,E) graph
Output: F set of free edges for G
F = ∅;
E′ = E;
foreach edge e = (vi, vj) ∈ E in some random order do
    Let ci and cj be the components in G s.t. vi ∈ ci and vj ∈ cj;
    if vi = vj OR ci ≠ cj OR vi ⇝ vj in G′ = (V, E′ \ {e}) then
        F = F ∪ {e};
        E′ = E′ \ {e};
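As an illustration, the heuristic of Algorithm 2 can be sketched in Python as follows. This is a minimal sketch and not the tool's actual code: the edge-list representation, the helper reachable and the parameter component_of (a map from each vertex to the id of its strongly connected component, assumed to be computed beforehand) are assumptions made for the example.

```python
import random
from collections import deque


def reachable(edges, src, dst):
    """Breadth-first search over a list of (u, v) edges."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def find_free_edges(edges, component_of, rng=None):
    """Return (free, remaining) following the greedy idea of Algorithm 2."""
    rng = rng or random.Random()
    remaining = list(edges)          # plays the role of E'
    free = []                        # plays the role of F
    order = list(edges)
    rng.shuffle(order)               # edges are visited in random order
    for u, v in order:
        trial = list(remaining)
        trial.remove((u, v))         # E' without one copy of e
        if (u == v                                   # self-loop
                or component_of[u] != component_of[v]  # leaves its component
                or reachable(trial, u, v)):          # v still reachable from u
            free.append((u, v))
            remaining = trial
    return free, remaining
```

For a three-node cycle with one extra chord and one self-loop, only the chord and the self-loop are reported as free, whatever the visiting order.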
The complexity of finding a set of free edges is analyzed as follows. Finding
a set of free edges in a graph is performed by removing an edge and checking the
reachability condition. After an edge e = (v, v′) is removed, checking whether v′ is
still reachable from v takes O(|V| + |E|) time using breadth-first search. In the worst
case the algorithm may try to remove all edges and check for reachability. Hence the
complexity is O((|V| + |E|) |E|). Since in our case the graph represents a completely
specified FSM with n states and p inputs, that is |V| = n and |E| = np, the
complexity is O((n + np)np) = O(n²p²).
3.3.2 Making a Graph Strongly Connected
Making a graph strongly connected is an iterative process in which, after each
iteration, the number of strongly connected components of the graph either decreases
or stays the same. The process terminates when the number of strongly connected
components reduces to 1 and the graph thus becomes strongly connected. To achieve
this, the aim in each iteration is to find a set of free edges of the current graph
and to assign a new destination to each of them, hoping that these new assignments
create new connections between components and reduce the number of strongly
connected components. Note that Theorem 7 guarantees that if G = (V,E) is not
strongly connected then Algorithm 2 will find at least one free edge in each and
every strongly connected component of G. Notice that by definition a free edge
has no effect on the strong connectedness of any component. Thus changing the
destinations of free edges never risks increasing the number of components.
To make the main idea clearer, a more formal description of the algorithm
is presented in Algorithm 3.
Algorithm 3: Make Graph Strongly Connected
Input: G = (V,E) not strongly connected graph
Output: G∗ = (V,E∗) strongly connected graph obtained by changing
destination vertices of some edges in G
G∗ = G;
while G∗ is not strongly connected do
    Ḡ∗ = (V̄∗, Ē∗) = component graph of G∗;
    find a set of free edges F of G∗;
    remove F from E∗;
    foreach edge (vi, vj) ∈ F do
        pick a random component c ∈ V̄∗;
        pick a random vertex v ∈ c;
        E∗ = E∗ ∪ {(vi, v)};
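The loop of Algorithm 3 can be sketched in Python as below. This is a simplified illustration under our own assumptions, not the tool's implementation: components are found by mutual reachability (simple rather than fast), free edges are recomputed from scratch in every iteration, and all function names are hypothetical.

```python
import random


def reach_set(succ, start):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def successors(vertices, edges):
    succ = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
    return succ


def components(vertices, edges):
    """Strongly connected components via mutual reachability."""
    succ = successors(vertices, edges)
    reach = {v: reach_set(succ, v) for v in vertices}
    comps, assigned = [], set()
    for v in vertices:
        if v not in assigned:
            comp = [w for w in vertices if w in reach[v] and v in reach[w]]
            comps.append(comp)
            assigned.update(comp)
    return comps


def free_edges(vertices, edges, comp_of, rng):
    """Greedy free-edge heuristic in the spirit of Algorithm 2."""
    remaining, free = list(edges), []
    for u, v in rng.sample(edges, len(edges)):
        trial = list(remaining)
        trial.remove((u, v))
        if (u == v or comp_of[u] != comp_of[v]
                or v in reach_set(successors(vertices, trial), u)):
            free.append((u, v))
            remaining = trial
    return free


def make_strongly_connected(vertices, edges, rng=None):
    rng = rng or random.Random()
    edges = list(edges)
    while True:
        comps = components(vertices, edges)
        if len(comps) == 1:
            return edges
        comp_of = {v: k for k, comp in enumerate(comps) for v in comp}
        for u, v in free_edges(vertices, edges, comp_of, rng):
            edges.remove((u, v))
            target = rng.choice(rng.choice(comps))  # component first, then vertex
            edges.append((u, target))
```

Only destinations change, so every vertex keeps its outdegree, which matches the requirement that the FSM stays completely specified.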
One important thing to notice is that the new destination for a free edge is
determined by first choosing a random component and then a random destination
vertex within that component, rather than by choosing a random vertex in the graph
directly. Also notice that we place no restriction on which component to choose,
so the new destination of the free edge might be in the same component as its
source. In such a case no connection is created between components, but allowing
it keeps the effect of the new assignments on the randomness of the graph small.
In addition, choosing the component of the destination vertex first increases the
chance of creating connections between components relative to the chance of
choosing a destination within the same component as the source of the free edge.
To see this, assume that a graph G with n vertices initially has m strongly connected
components V̄ = {c1, c2, ..., cm} and that some component ci satisfies
∀j ≠ i, n > |ci| ≫ |cj|. That is, ci is very large compared to all other components in
terms of the number of vertices it contains. Also assume that the component with
the smallest cardinality is cm, and consider the chance of assigning a free edge of ci
to cm. If we chose a vertex in the graph directly as the new destination, a free edge
whose source is in ci would be assigned a new destination in cm with probability
|cm|/n. Since n ≫ |cm|, the probability of creating a connection from the large
component ci to the smallest component cm would be very small. In our method,
by choosing the component of the destination first, the probability of a connection
from ci to cm becomes 1/m, which is in practice much greater than |cm|/n.
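The comparison can be made concrete with a small calculation. Since cm is the smallest of the m components, n ≥ m × |cm| always holds, so 1/m is never smaller than |cm|/n. The Python sketch below evaluates both probabilities for a hypothetical set of component sizes chosen only for illustration.

```python
from fractions import Fraction


def prob_vertex_first(sizes, target):
    """Probability that a uniformly chosen vertex lies in component `target`."""
    return Fraction(sizes[target], sum(sizes))


def prob_component_first(sizes, target):
    """Probability that a uniformly chosen component is `target`."""
    return Fraction(1, len(sizes))


# Hypothetical sizes: one large component and three small ones (n = 100, m = 4).
sizes = [96, 2, 1, 1]
smallest = 3
direct = prob_vertex_first(sizes, smallest)        # |cm| / n
two_step = prob_component_first(sizes, smallest)   # 1 / m
```

With these sizes the direct choice reaches the smallest component with probability 1/100, while the component-first choice reaches it with probability 1/4.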
Although Algorithm 3 gives the main idea of our implementation, it does not reflect
the details exactly. As written, the strongly connected components and the set of
free edges are computed from scratch for the graph G∗ = (V,E∗) in each iteration.
Computing these in each iteration can be very time consuming if G∗ is large. For
this reason, our implementation proceeds differently while doing essentially the
same thing. Instead of working on the original graph each time, we start from the
original graph and in each iteration work on the component graph of the previous
iteration. Thus we try to make the component graph strongly connected, which is
the same as making the original graph strongly connected.
After an iteration, if some components form a new strongly connected
component, the size of the graph we are working on reduces. However, working on
a new component graph in each iteration, instead of on the original graph, requires
us to remember the vertices within the components so that the changes made on
the graph of the current iteration can be mapped back to the graph of the
previous iteration. For this reason, we use a stack that stores the vertices within the
components and the free edges used in an iteration. When the last iteration finishes
and the graph reduces to a single component, we use the information stored in the
stack to change the edges of all previous iterations, including the initial graph,
so that the initial graph is now strongly connected.
In order to analyze the running time of Algorithm 3, we need to know how many
times the while loop iterates. We already know the running time of each step within
the while loop. The most expensive step is finding the free edges of a graph, which
has running time O(n²p²) and dominates the other steps. However, we do not know
exactly how many times the while loop will iterate, since the algorithm is
probabilistic. In theory the while loop may iterate infinitely many times; it iterates
until the number of strongly connected components reduces to 1. In the worst case
scenario, each vertex is initially a separate component, hence there can be at most
|V| = n components. Further, in the worst case scenario we assume that each
component has only one free edge. By assigning new destinations to free edges, the
algorithm tries to create a cycle in the component graph. When a cycle is formed,
the components on the cycle become connected and the number of strongly
connected components reduces. For the worst case scenario we can calculate the
probability of creating a cycle in the component graph and denote it as P. A rough
calculation shows that P > (n − 1)!(n − 1)/(2n^(n−1)). The expected worst case
running time E of the algorithm can be found using E = T/P, where T is the running
time of a single iteration. Hence the expected running time is
O(n²p²/((n − 1)!(n − 1)/(2n^(n−1)))), which is O(n^n). Although the worst case
expected running time of the algorithm is very large, note that this is a very loosely
calculated bound which considers a very extreme case. In practice the algorithm
terminates in feasible time (for instance, it takes approximately 1 second to generate
a strongly connected FSM with 10000 states, 5 inputs and 5 outputs).
3.4 Forcing Initial Reachability
Some checking sequence generation methods assume a reliable reset feature in the
implementation. This feature guarantees that, no matter which state the machine
is currently in, applying a special input, called the reset input, takes the machine to
the initial state.
Such a reset transition is modeled in a specification by a transition from each
state to the initial state. The existence of these reset transitions relaxes the
conditions on the other transitions. More explicitly, the machine still has to be
strongly connected; however, for it to be strongly connected it is now sufficient that
it is initially reachable, i.e. that all states are reachable from the initial state. This
condition, combined with the reset transitions from all the states back to the initial
state, guarantees that the machine is strongly connected. To support research on
checking sequence generation under the assumption of reliable reset transitions,
our random FSM generation tool also supports the generation of FSMs that are
initially reachable but not strongly connected.
If an initially reachable FSM is desired, the tool has two different methods of making
a graph initially reachable. Which method to use is selected by the user. Notice that a
strongly connected FSM is also initially reachable. Therefore making an FSM
initially reachable is only necessary when an FSM that is not strongly connected is
desired.
Before explaining the methods in detail, we need to establish an important property
of initially reachable graphs.
Theorem 8. The component graph Ḡ = (V̄, Ē) of an initially reachable graph
G = (V,E) has exactly one vertex with indegree 0, and that vertex contains the initial vertex.
Proof. Consider the component ci that contains the initial vertex. Since G is initially
reachable, all components in V̄ \ {ci} are reachable from ci. Firstly, notice that ci
cannot have an incoming edge, so its indegree is 0. This can be shown by a simple
contradiction. If there were an incoming edge (cj, ci), then that edge would form a
cycle in the component graph, since cj is reachable from ci. Since a component graph
is acyclic by definition, a contradiction is reached. Secondly, for all components
in V̄ \ {ci} to be reachable from ci, each one must have at least one incoming edge,
because a component with no incoming edge cannot be reached from another component.
These two facts prove that ci has indegree 0 and that all other vertices in V̄ have
indegree greater than 0.
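The property stated in Theorem 8 is easy to check on a concrete graph. The Python sketch below is an illustration only: it assumes the graph is given as adjacency lists and that a map comp_of from vertices to component ids has already been computed; both names are hypothetical.

```python
from collections import deque


def is_initially_reachable(succ, initial):
    """True if every vertex is reachable from `initial` (breadth-first search)."""
    seen, queue = {initial}, deque([initial])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(succ)


def zero_indegree_components(succ, comp_of):
    """Ids of components with no incoming edge from another component."""
    with_incoming = set()
    for u, targets in succ.items():
        for v in targets:
            if comp_of[u] != comp_of[v]:
                with_incoming.add(comp_of[v])
    return set(comp_of.values()) - with_incoming
```

For an initially reachable graph, zero_indegree_components returns a single id, namely that of the component containing the initial vertex.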
3.4.1 Method 1: Using a Backbone Component Graph
In this method, the user is given some control over the structure of the component
graph of the random graph that will be generated. Besides the other inputs (number
of states, number of input symbols and number of output symbols), the user can give
the number of strongly connected components and the number of vertices (states)
within each component as input. Then, according to this component structure, the
edges between the components are decided in a manner that makes the component
graph initially reachable. This component graph is called the backbone component
graph, since any graph which has the same connections between its components as
the backbone component graph is guaranteed to be initially reachable.
Notice that there can be many different backbone component graphs for a given
number of components. For this reason, the generation of a backbone component
graph is a process that results in one of the possible backbones through a random
selection of edges between components.
Backbone Generation Assume that the user wants m strongly connected components
denoted as V̄ = {c1, c2, ..., cm}. We first need to assign an order to the components.
Since we represent a component ci with an integer index i, let us use the natural
order of the integers as the order of the components. Then we assign the edges of the
backbone component graph Ḡ = (V̄, Ē) such that they satisfy the following conditions.
1. ∀j > 1 ∃i s.t. i < j and (ci, cj) ∈ Ē
2. ∀j ¬∃i s.t. i > j and (ci, cj) ∈ Ē
These conditions establish the following. Condition 1 states that every component
except c1 has at least one incoming edge from a component which is smaller in the
ordering of components. Hence all components are reachable from c1. Condition 2
states that there can be no edge from a component to a component with a smaller
order. This guarantees that
there is no cycle in the graph, as a component graph must be acyclic. The algorithm
for generating a backbone is given in Algorithm 4.
Algorithm 4: Generate Backbone Component Graph
Input: m number of strongly connected components
Output: Ḡ = (V̄, Ē) backbone component graph
V̄ = {c1, c2, ..., cm};
Ē = ∅;
for i = 2 to m do
    choose some nonempty random subset s of {1, ..., i − 1};
    foreach cj s.t. j ∈ s do
        Ē = Ē ∪ {(cj, ci)};
The complexity of generating a backbone component graph is O(m²), since for
each of the m components some edges are added from a subset of m components.
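A possible Python rendering of backbone generation is given below as a sketch. Components are represented by their indices 1..m, and the predecessors of component i are drawn only from indices smaller than i so that conditions 1 and 2 both hold. How the random subset is drawn (a uniform size followed by a uniform sample of that size) is our assumption; the text only requires the subset to be nonempty.

```python
import random


def generate_backbone(m, rng=None):
    """Return the edge set of a random backbone on components 1..m."""
    rng = rng or random.Random()
    edges = set()
    for i in range(2, m + 1):
        smaller = list(range(1, i))            # indices strictly below i
        size = rng.randint(1, len(smaller))    # nonempty subset
        for j in rng.sample(smaller, size):
            edges.add((j, i))                  # edge (cj, ci) with j < i
    return edges
```

Every edge goes from a smaller index to a larger one, so the result is acyclic, and every component other than c1 receives at least one incoming edge.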
Generating an Initially Reachable Graph Now we can present the generation
of an initially reachable graph using the generated backbone component graph.
Algorithm 5 describes this process.
Algorithm 5: Generate Initially Reachable Graph Using Backbone Component
Graph
Input: N = {n1, n2, ..., nm} component sizes
Output: G = (V,E) initially reachable graph with components V̄
1 generate a random graph G = (V,E) with |N| = m strongly connected
  components each containing nk vertices where 1 ≤ k ≤ m;
2 generate a backbone component graph Ḡ;
3 find a set of free edges F for G;
4 remove F from E;
5 foreach edge (ci, cj) ∈ Ē of Ḡ do
6     pick some random free edge (vi, vk) ∈ F s.t. vi ∈ ci;
7     pick some random vertex vj ∈ cj;
8     E = E ∪ {(vi, vj)};
9 add all remaining free edges to E after changing their destinations in a way
  that does not violate condition 2;
Here are some remarks about Algorithm 5.
• At line 1, the generation of a random graph with strongly connected components
{c1, c2, ..., cm}, with sizes as given in N = {n1, n2, ..., nm}, is achieved
as follows. First, for each ci a separate strongly connected graph with ni
vertices is generated using the tool. Then these m graphs are combined into
one graph that consists of these m individual graphs.
• In the for loop between lines 5-8, for each edge in the backbone graph, it is
made sure that the resulting graph has an edge between the corresponding
components. This is achieved by changing the destination vertex of a free
edge according to the edge in the backbone graph and putting it back into the
set of edges.
• At the last line, all remaining free edges are inserted back into the graph after
their destinations are changed. Destinations are changed in such a way that
condition 2 is not violated, that is, no cycle is introduced in the component
graph. Two approaches are implemented to achieve this. In the first approach,
all free edges are assigned destinations according to the backbone graph, whose
edges already satisfy condition 2. In the second approach, a free edge whose
source is in component ci is assigned to some random component cj such that
i < j. When the first approach is used, the component graph of the generated
random graph is the same as the backbone component graph. In the second
approach, however, the component graph of the generated random graph may
contain connections that do not exist in the backbone component graph.
The complexity of Algorithm 5 is dominated by generating m strongly connected
graphs in the first statement. Hence Algorithm 5 has the same complexity as
generating m strongly connected graphs.
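The for loop of Algorithm 5 (lines 5-8) can be sketched in Python as follows. The sketch assumes that the components, a set of free edges already removed from E, and the backbone edges are given; the data layout and the name wire_backbone are hypothetical. The last line of the algorithm is left to the caller, which receives the unused free edges back.

```python
import random


def wire_backbone(components, free_edges, backbone, rng=None):
    """Redirect one free edge per backbone edge.

    components: dict component id -> list of vertices
    free_edges: list of (source, old_destination) already removed from E
    backbone:   iterable of (ci, cj) edges of the backbone component graph
    Returns (new_edges, unused_free_edges).
    """
    rng = rng or random.Random()
    comp_of = {v: c for c, vs in components.items() for v in vs}
    pool = list(free_edges)
    new_edges = []
    for ci, cj in backbone:
        candidates = [e for e in pool if comp_of[e[0]] == ci]
        if not candidates:
            raise ValueError(f"no unused free edge with source in component {ci}")
        chosen = rng.choice(candidates)
        pool.remove(chosen)                       # each free edge is used once
        new_edges.append((chosen[0], rng.choice(components[cj])))
    return new_edges, pool
```

The sketch raises an error when a component runs out of free edges; how the tool handles that situation is not described in the text.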
3.4.2 Method 2: Generate an Initially Reachable Graph with
Random Components
When the user does not care about the number of strongly connected components and
the number of vertices in the components, and wants these parameters to be random
as well, the second method for generating an initially reachable random graph can be
used. In this method, a random graph that is not strongly connected is first obtained
by random assignment of edges. This graph is then forced into an initially reachable
graph if it is not initially reachable already.
Intuitively, the method works as follows. Let V̄0 ⊆ V̄ be the set of vertices
in the component graph with indegree 0. Initially there is always more than one
vertex with indegree 0 in the component graph, since otherwise the graph would
already be initially reachable. The main aim of the method is to reduce the cardinality
of V̄0 to one, thus making the graph initially reachable. In each iteration some
random vertex ci from V̄0 is chosen, and it is removed from V̄0 after its indegree is
increased. The indegree of ci is increased by using free edges of some randomly chosen
subset of the vertices which cannot be reached from ci. That is, new connections are
made to ci from vertices that are not reachable from ci. It is important to make these
new connections from vertices that are not reachable from ci, since this guarantees
that we do not create a cycle in the component graph. Although ci is removed from
V̄0 at the end of the iteration, this does not necessarily reduce the cardinality of
V̄0. This is because the edges between vertices of two different components are free
edges by definition, and since the destinations of free edges are changed in order to
make new connections to ci, a vertex in the component graph may lose its only
incoming edge. Its indegree then becomes 0 and it must be included in V̄0. For this
reason, V̄0 is updated along with the component graph Ḡ at the end of each iteration.
Even though the algorithm theoretically has no guarantee of termination, in practice
this does not seem to be a problem.
A more formal description of the algorithm is presented in Algorithm 6.
Algorithm 6 is a probabilistic algorithm with large complexity. Although in
theory it has no guarantee of termination, in practice it terminates quickly.
Algorithm 6: Make a graph initially reachable
Input: G = (V,E) graph to make initially reachable
Result: G is initially reachable
Ḡ = (V̄, Ē) = component graph of G;
find a set of free edges F for G;
V̄0 = vertices in Ḡ with 0 indegree;
while V̄0 has more than one element do
    pick some random ci ∈ V̄0;
    V̄i = set of components not reachable from ci;
    pick some random subset S of V̄i;
    foreach cs ∈ S do
        pick a free edge e = (vi, vj) such that vi ∈ cs;
        set destination of e to some randomly chosen vertex in ci;
    update Ḡ;
    update V̄0;
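The following Python sketch illustrates the loop of Algorithm 6 under simplifying assumptions of our own: the strongly connected components (comp_of) and a set of free edges (free_idx, given as indices into the edge list) are computed beforehand and passed in, and the component graph is rebuilt from scratch in every iteration. The names are hypothetical and the sketch is not the tool's implementation.

```python
import random


def make_initially_reachable(edges, comp_of, free_idx, rng=None):
    """edges: list of (u, v); only destinations of free edges are changed."""
    rng = rng or random.Random()
    edges = list(edges)
    members = {}
    for v, c in comp_of.items():
        members.setdefault(c, []).append(v)
    while True:
        # Rebuild the component graph and collect its zero-indegree vertices.
        succ = {c: set() for c in members}
        for u, v in edges:
            if comp_of[u] != comp_of[v]:
                succ[comp_of[u]].add(comp_of[v])
        with_incoming = set().union(*succ.values())
        zero = [c for c in members if c not in with_incoming]
        if len(zero) <= 1:
            return edges
        ci = rng.choice(zero)
        # Components reachable from ci in the component graph.
        seen, stack = {ci}, [ci]
        while stack:
            for d in succ[stack.pop()]:
                if d not in seen:
                    seen.add(d)
                    stack.append(d)
        unreachable = [c for c in members if c not in seen]
        subset = rng.sample(unreachable, rng.randint(1, len(unreachable)))
        for cs in subset:
            usable = [k for k in free_idx if comp_of[edges[k][0]] == cs]
            if usable:
                k = rng.choice(usable)
                # Redirect the free edge into ci; cs is not reachable from ci,
                # so no cycle appears in the component graph.
                edges[k] = (edges[k][0], rng.choice(members[ci]))
```

Because new edges always enter ci from components that ci cannot reach, the components themselves never change, which is why comp_of can be computed once.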
3.5 Shuffling
To decrease the time spent generating an FSM with a preset distinguishing sequence,
the tool contains an option called shuffle. When the user wants to generate a random
FSM with a preset distinguishing sequence, the tool generates an initial FSM and
checks if it has a preset distinguishing sequence. With this option, if the FSM does
not have a distinguishing sequence, then rather than creating a new FSM from
scratch, the tool randomly assigns new input and output symbols to each transition
and checks again for the existence of a distinguishing sequence. This operation is
called shuffling and takes less time than creating a new FSM from scratch. Notice
that during shuffling, the sources and destinations of transitions are not changed.
Thus properties such as strong connectedness and initial reachability are not affected
by shuffling. When this option is enabled, the user can also specify how many times
shuffling takes place before a new FSM is created from scratch, unless an FSM with
a preset distinguishing sequence is generated first.
The running time of a single shuffle operation is O(np) where n is the number of
states and p is the number of input symbols. That is because every transition is
considered once and there are np transitions in a completely specified, deterministic
FSM.
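A single shuffle step can be sketched in Python as below. The representation of the FSM as a dictionary of transition lists is an assumption made for the example, as is the way inputs are reassigned: the sketch permutes the input symbols among the outgoing transitions of each state so that the FSM stays deterministic and completely specified, and picks each output uniformly at random.

```python
import random


def shuffle_labels(transitions, inputs, outputs, rng=None):
    """Relabel transitions without touching their sources or destinations.

    transitions: dict state -> list of (input, output, next_state),
                 with exactly one entry per input symbol.
    """
    rng = rng or random.Random()
    shuffled = {}
    for state, outgoing in transitions.items():
        new_inputs = list(inputs)
        rng.shuffle(new_inputs)   # a permutation keeps the FSM deterministic
        shuffled[state] = [
            (x, rng.choice(outputs), nxt)
            for x, (_, _, nxt) in zip(new_inputs, outgoing)
        ]
    return shuffled
```

Each of the np transitions is visited once, in line with the O(np) running time stated above.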
3.6 Providing Input/Output Probabilities
Recall that the assignment of output symbols to the transitions is performed
randomly. For each input symbol x and output symbol y, the number of x/y transitions
seen in the FSMs randomly generated in this way turns out to be more or less the same.
To test a heuristic developed for generating UIO sequences based on the frequency
(how rare or how frequent) of transitions' I/O labels [5], our tool has an
option that allows the user to specify the probability for each I/O pair to be seen in
the FSM.
These probabilities are given in a regular text file that we call the i/o distribution
file. Each line of the file should be in the form i o p, where i is an input symbol, o is an
output symbol and p is a probability given as a percentage. Since the tool generates
completely specified FSMs, the number of transitions that have input symbol i is
always the same: it is exactly the number of states. That is because each state must
have a transition with input i in a completely specified FSM. On the other hand, no
restriction exists for output symbols. A line in the file then means that among all
transitions which have input symbol i, p percent should have output symbol o in the FSM.
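To illustrate the file format, the Python sketch below parses lines of the form i o p and then draws an output for every state and input pair. It treats p as a sampling weight, so the realized share of each output only approximates p; whether the tool enforces exact percentages is not stated here, and the function names are hypothetical.

```python
import random


def parse_io_distribution(text):
    """Parse lines of the form 'i o p' into {input: {output: percent}}."""
    dist = {}
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        i, o, p = line.split()
        dist.setdefault(i, {})[o] = float(p)
    return dist


def assign_outputs(dist, states, rng=None):
    """Pick an output for every (state, input) pair, weighted by the percentages."""
    rng = rng or random.Random()
    labels = {}
    for i, table in dist.items():
        outs = list(table)
        weights = [table[o] for o in outs]
        picks = rng.choices(outs, weights=weights, k=len(states))
        for state, o in zip(states, picks):
            labels[(state, i)] = o
    return labels
```

For example, the three lines "a 0 75", "a 1 25" and "b 0 100" ask for output 0 on about three quarters of the a-transitions and on every b-transition.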
Chapter 4
Checking if a Sequence is a
Checking Sequence
4.1 Introduction
Given any input output sequence X/Y of a specification FSM M, it is desirable to
know whether X is a checking sequence of M or not. Further, if it is known that X
is not a checking sequence of M, it seems beneficial to be able to get some information
about how close X is to a checking sequence of M. For example, during the operation
of a checking sequence generation algorithm, this information tells the algorithm
how close the current sequence is to a checking sequence, and can be used as a guide
in deciding how to extend the current sequence so that a checking sequence is
obtained. The checking sequence generation method explained in Chapter 6 uses such
an approach.
In this chapter, we propose a distinguishing sequence based method which checks
if the input portion X of an input output sequence X/Y is a DS based checking
sequence of the specification FSM M. If it is not, the method is still able to provide
some information about how close X is to a checking sequence.
4.2 Uncertainty Automaton
As explained in Section 2.4, a checking sequence for an FSM M distinguishes M
from all FSMs in the set Φ(M) that are not equivalent to M, where Φ(M) is the set of
FSMs with at most as many states as M and the same input and output sets. Hence,
to determine whether the input portion X of an input output sequence X/Y of M is a
checking sequence, we initially treat X/Y as an I/O sequence produced by an FSM in
Φ(M). That is, we initially assume only that X/Y is produced by some unknown
machine N ∈ Φ(M), and what we want to know is whether N is equivalent to M or
not. Since X/Y is an I/O sequence, it corresponds to some sequence of
transitions that visits a sequence of states of this unknown FSM N. Let us consider
the path P = (n1, nr+1; X/Y) where the nodes ni represent the states visited in N
when X is applied. If we can find a correspondence between the states of M and the
nodes in P and see that P verifies every transition of M, then we can say that X is a
checking sequence of M. To find this correspondence, we consider P as a graph and
call it the uncertainty automaton. It is called that way since initially we do not know
which node corresponds to which state of M, and a node ni could be any of the states
of M. Hence we associate each node ni with the set of states that it may correspond
to, and call that set the candidate set of node ni. While we process the uncertainty
automaton we try to reduce the number of states in the candidate set of each node.
Formally, given an input output sequence X/Y we consider a path P = (n1, nr+1;
X/Y). Then we represent P as a graph. We call this graph the uncertainty automaton
of P and represent it as GP = (VP, EP) where initially VP = {n1, n2, ..., nr+1} and
EP = {(ni, ni+1; x/y) | (ni, ni+1; x/y) in P}.
Furthermore, let us define C : VP → 2^S where S is the set of states of M. In other
words, C maps each node ni to a set of states of FSM M; C(ni) is called
the candidate set of ni and represents the set of states that ni can be recognized as.
For example, consider the I/O sequence X/Y = aabababbba/0101100100 and the FSM
M1 given in Figure 2.1. The initial uncertainty automaton is generated according
to the sequence X/Y as shown in Figure 4.1. Each node in the initial uncertainty
automaton has all states of M1 in its candidate set, i.e. ∀ 1 ≤ i ≤ 11, C(ni) =
{s1, s2, s3}. That is, each node can be recognized as either s1 or s2 or s3.
[Figure: n1 --a/0--> n2 --a/1--> n3 --b/0--> n4 --a/1--> n5 --b/1--> n6 --a/0--> n7 --b/0--> n8 --b/1--> n9 --b/0--> n10 --a/0--> n11]
Figure 4.1: Initial Uncertainty Automaton
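Constructing the initial uncertainty automaton is straightforward. The Python sketch below builds the node list, the labeled edges and the candidate sets from an I/O sequence; representing nodes by the integers 1..r+1 and edges by tuples (source, destination, input, output) is our own choice for illustration.

```python
def initial_uncertainty_automaton(inputs, outputs, states):
    """Build the path graph for X/Y with full candidate sets.

    inputs, outputs: sequences of equal length r
    states:          the states of the specification FSM M
    Returns (nodes, edges, candidates).
    """
    if len(inputs) != len(outputs):
        raise ValueError("X and Y must have the same length")
    r = len(inputs)
    nodes = list(range(1, r + 2))                       # n1 .. n(r+1)
    edges = [(i, i + 1, x, y)
             for i, (x, y) in enumerate(zip(inputs, outputs), start=1)]
    candidates = {n: set(states) for n in nodes}        # C(ni) = S initially
    return nodes, edges, candidates
```

Applied to X/Y = aabababbba/0101100100 with states s1, s2, s3, this yields the 11 nodes and 10 edges of Figure 4.1.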
The main aim of the method is to recognize each node in the uncertainty automaton.
Beginning with the initial uncertainty automaton, the method tries to eliminate
states from the candidate sets of nodes. We will propose several techniques to
eliminate states from the candidate sets. If, using these techniques, the candidate set
of a node becomes a singleton, then that node is recognized. That is, when the
candidate set of a node ni contains a single state, say s, ni is recognized as state
s of M, i.e. C(ni) = {s}.
4.3 State Recognition Using Uncertainty Automaton
Given an input output sequence X/Y, considering the path P = (n1, nr+1; X/Y)
with label X/Y, we form the initial uncertainty automaton GP as explained above.
GP is initialized such that for each node ni ∈ VP, C(ni) contains all the states of
FSM M. Later we try to recognize the nodes of GP by reducing the candidate sets
of the nodes. The uncertainty reduces as the candidate sets of the nodes get smaller.
One easy way of recognizing a node is to look for an occurrence of the ADS of a
state. That is, if the path P has a subpath (ni, nj; X′/Y′) such that X′ is the ADS of
a state s and λ(s, X′) = Y′, then the node ni cannot be any state other than s.
Therefore such nodes can easily be recognized as the corresponding states and the
candidate sets of those nodes can be updated accordingly.
For example, consider the distinguishing set D̄ = {Ds1, Ds2, Ds3} where Ds1 =
a/0, Ds2 = ab/11, Ds3 = ab/10 for the FSM M1 in Figure 2.1. Using D̄ in the initial
uncertainty automaton shown in Figure 4.1, we can d-recognize
• Nodes n1, n6 and n10 as state s1
• Node n4 as state s2
• Node n2 as state s3
and update the candidate sets as shown in Table 4.1.
C(n1) = {s1}          C(n2) = {s3}     C(n3) = {s1, s2, s3}    C(n4) = {s2}
C(n5) = {s1, s2, s3}  C(n6) = {s1}     C(n7) = {s1, s2, s3}    C(n8) = {s1, s2, s3}
C(n9) = {s1, s2, s3}  C(n10) = {s1}    C(n11) = {s1, s2, s3}
Table 4.1: Candidate Sets For the Uncertainty Automaton in Figure 4.1 after d-recognition
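The d-recognition step of this example can be reproduced with a short Python sketch. It scans the path for occurrences of the I/O label of each element of the distinguishing set and intersects the candidate set of the starting node accordingly. The tuple-based edge representation and the function name are assumptions made for the illustration.

```python
def d_recognize(edges, candidates, dist_set):
    """Shrink candidate sets using occurrences of distinguishing sequences.

    edges:      consecutive path edges (src, dst, x, y)
    candidates: dict node -> set of states (modified in place)
    dist_set:   dict state -> (inputs, outputs), e.g. {"s2": ("ab", "11")}
    """
    xs = [e[2] for e in edges]
    ys = [e[3] for e in edges]
    for k, (src, _, _, _) in enumerate(edges):
        for state, (dx, dy) in dist_set.items():
            end = k + len(dx)
            # The label of the subpath starting at src must match Ds exactly.
            if xs[k:end] == list(dx) and ys[k:end] == list(dy):
                candidates[src] &= {state}
    return candidates
```

Run on the path of Figure 4.1 with Ds1 = a/0, Ds2 = ab/11 and Ds3 = ab/10, the sketch produces the candidate sets of Table 4.1.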
Whenever we understand that two nodes of an uncertainty automaton corre-
spond to the same state of M , we merge those two nodes into one single node.
We can understand that two nodes ni and nj correspond to the same state in two
different ways.
• ni and nj are both recognized as the same state s of M , that is C(ni) =
C(nj) = {s}.
• there exist two subpaths (np, ni; X′/Y′) and (nq, nj; X′/Y′) with the same
label in GP where np and nq are understood to correspond to the same state
of M.
After we understand two nodes correspond to the same state, we merge them by
using the following merge operation.
Merging Nodes A node nj is merged into another node ni by
1. setting the start of each edge leaving nj as ni,
2. setting the end of each edge entering nj as ni,
3. updating the candidate set of ni as C(ni) = C(ni) ∩ C(nj).
Intuitively, as a result of steps 1 and 2 above, each edge that used to leave or enter nj
now leaves or enters the node ni. If step 1 creates a node ni that has two leaving
edges with the same label, then the end nodes of these edges are also understood
to correspond to the same state, hence they will be merged as well. For this
reason, the uncertainty automaton always stays deterministic at the end of merge
operations. In step 3, the candidate sets of the merging nodes are intersected because
the two nodes are understood to correspond to the same state. Although we may
not know which state they correspond to, it must be one of the
states in the intersection of the candidate sets of the two nodes.
[Figure 4.2: Uncertainty Automaton after nodes merged]
Also notice that, while merging two nodes ni and nj, if C(ni) ∩ C(nj) is a singleton,
say {s}, then the resulting node is recognized as state s of M.
In fact, the t-recognition explained in Section 2.4 is realized when merging two
nodes ni and nj where |C(ni)| = 1 and |C(nj)| > 1.
After nj is merged into ni, nj is removed from the uncertainty automaton, since
all the information that is stored in nj is now available in ni.
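The merge operation, including the follow-up merges that keep the automaton deterministic, can be sketched as below. The representation (edges as a set of (start, end, x, y) tuples, candidate sets as a dictionary) and the name merge are assumptions made for this illustration. Handling of contradictions, such as an empty intersection, is deliberately left out.

```python
def merge(edges, candidates, ni, nj):
    """Merge node nj into node ni and return the new edge set.

    edges:      set of (start, end, x, y) tuples (assumed representation).
    candidates: dict mapping each node to its candidate set; updated in place.
    """
    renamed = {}                      # removed node -> node it was merged into

    def current(n):
        while n in renamed:
            n = renamed[n]
        return n

    pending = [(ni, nj)]
    while pending:
        keep, drop = (current(n) for n in pending.pop())
        if keep == drop:
            continue
        # Steps 1 and 2: redirect every edge leaving or entering `drop`.
        edges = {(keep if s == drop else s, keep if e == drop else e, x, y)
                 for (s, e, x, y) in edges}
        # Step 3: intersect the candidate sets; `drop` is removed.
        candidates[keep] = candidates[keep] & candidates.pop(drop)
        renamed[drop] = keep
        # Two edges leaving `keep` with the same label: their end nodes
        # correspond to the same state, so schedule them for merging too.
        first_end = {}
        for (s, e, x, y) in sorted(edges):
            if s != keep:
                continue
            if (x, y) in first_end:
                pending.append((first_end[(x, y)], e))
            else:
                first_end[(x, y)] = e
    return edges
```

Because the edge set is rebuilt on every merge, the function returns it, while the candidate dictionary is modified in place. A production implementation would more likely keep adjacency lists per node, which is what the O(p + r) bound for redirecting edges assumes.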
For example, consider the uncertainty automaton in Figure 4.1 and the candidate
sets given in Table 4.1. Since nodes n1, n6 and n10 are recognized as state s1, they
must be merged into one node; let us merge them as n1. In addition, since
n1, n6 and n10 enter n2, n7 and n11 respectively with the label a/0, nodes n2, n7
and n11 must also be merged into one node; let us merge them as n2. At this point no
more merging is possible. The resulting uncertainty automaton is shown in Figure
4.2 and its candidate sets in Table 4.2.
C(n1) = {s1} C(n2) = {s3} C(n3) = {s1, s2, s3} C(n4) = {s2}
C(n5) = {s1, s2, s3} C(n8) = {s1, s2, s3} C(n9) = {s1, s2, s3}
Table 4.2: Candidate Sets For the Uncertainty Automaton in Figure 4.2
The running time for a single merge operation on an uncertainty automaton
is the sum of the time spent redirecting edges and the time spent intersecting the
candidate sets of the two nodes being merged. A single node can have at most p
outgoing edges (p is the number of input symbols) and at most r
incoming edges (r is the number of edges in the uncertainty automaton). Hence
the total time spent redirecting edges is O(p + r). In addition, since a
candidate set can have at most n elements (states), intersecting two candidate
sets takes O(n^2) time without a special data structure for set operations. Hence
the running time for a single merge operation is O(p + r + n^2). The running time
for all possible merge operations is the number of possible
merge operations multiplied by the time spent for a single merge. In an uncertainty
automaton the number of possible merges is bounded by the number of nodes, which
is at most r + 1, i.e. O(r). This bound is reached when a single node
is merged with all other nodes. Hence the running time for all merge operations is
O(r) × O(p + r + n^2) = O(pr + r^2 + n^2 r).
4.3.1 Candidate Elimination Using Incompatible Sets
With the techniques explained so far, all the state recognitions on the uncertainty
automaton can also be realized by using d- and t-recognition only. In this section a
technique that eliminates states from the candidate set of a node will be explained
which in turn allows the techniques explained above to recognize more states than
d- and t-recognitions alone can achieve. Before explaining this technique, we need
to define a compatibility relation between the nodes of an uncertainty automaton.
Compatibility of Nodes Two nodes ni and nj are defined to be compatible if
• C(ni) ∩ C(nj) ≠ ∅ and
• for any input sequence X ∈ I∗ that is defined for ni, either X is not defined
for nj, or if the paths (ni, nt; X/Yi) and (nj, nu; X/Yj) exist in GP then Yi = Yj
and nt and nu are compatible, and vice versa.
An important property of the compatibility relation is that it is symmetric, that is,
if ni is compatible with nj then nj is compatible with ni.
Incompatible Set of a Node In this candidate elimination technique, we need
to know, for each node ni, the set of nodes that ni is not compatible with. For this
reason, let us define N as N : VP ↦ 2^VP. N(ni) is called the incompatible set of ni
and contains the nodes that are not compatible with ni.
For the uncertainty automaton given in Figure 4.2, the incompatible sets are
shown in Table 4.3.
N(n1) = {n2, n4} N(n2) = {n1, n4, n5, n8} N(n3) = {n5, n8, n9}
N(n4) = {n1, n2} N(n5) = {n2, n3, n9} N(n8) = {n2, n3, n9}
N(n9) = {n3, n5, n8}
Table 4.3: Incompatible Sets For the Uncertainty Automaton in Figure 4.2
If we consider N(n3), nodes n5 and n8 are in N(n3) since, when
input b is applied, node n3 produces output 0 whereas n5 and n8 produce output
1. N(n3) also contains n9 since with b/0 n3 enters node n4 and n9 enters node n1,
but n1 and n4 are incompatible. n1 and n4 are incompatible because the intersection of
their candidate sets is empty.
Finding Incompatible Sets Algorithm 7 gives a formal description of finding the
incompatible sets of the nodes. Finding the incompatible set of each node in an
uncertainty automaton is a process with two phases. In the first phase (lines 1-6) each
pair of nodes is considered separately. For any pair of nodes, say ni and nj, if
C(ni) ∩ C(nj) is empty then these two nodes are incompatible. Otherwise the
compatibility of ni and nj is checked by taking into account only the edges that leave
ni and nj. That is, if both nodes have a leaving edge with input symbol x, say
(ni, nu; x/y) and (nj, nt; x/y′), then these nodes are incompatible if the outputs they
produce are different (y ≠ y′).
Since only single input symbols are considered in this first phase, it does not
give a final result about the compatibility of two nodes. It only tells whether the
incompatibility of two nodes can be deduced from an input sequence of length 1.
However, there may be cases where ni and nj are incompatible but this can only be
seen when longer input sequences defined at ni and nj are considered. The second
phase of the algorithm (lines 7-16) handles this case. In this phase, we iterate over a
list L of node pairs that are incompatible with each other. Initially this list contains
all pairs of nodes that are found to be incompatible in the first phase. The algorithm
iteratively removes a pair from the list, say ni and nj, and checks whether there are
two nodes nt and nu such that the edges (nt, ni; x/y) and (nu, nj; x/y) exist. If this is
the case then we can deduce that nt and nu are not compatible either, because nt and
nu enter incompatible nodes with the same label x/y. The new pair nt and nu is then
added to the list of incompatible nodes if it has not been considered before. This is
checked by keeping track of each pair considered in this second phase in a separate
list (H). Once all such pairs that reach ni and nj with the same label have been added
to the list, the processing of the pair ni, nj is complete. The algorithm terminates when
the list becomes empty, at which point all incompatible sets have been found.
Algorithm 7: Find Incompatible Sets
Input: GP = (VP, EP) uncertainty automaton
Result: N(ni) is found for each node ni ∈ VP
 1  L = ∅ ;  // List of node pairs to be processed
 2  foreach pair of nodes (ni, nj) in VP × VP do
 3      if C(ni) ∩ C(nj) = ∅ OR (∃x s.t. (ni, nu; x/yi), (nj, nt; x/yj) ∈ EP and yi ≠ yj) then
 4          L = L ∪ {(ni, nj)};
 5          N(ni) = N(ni) ∪ {nj};
 6          N(nj) = N(nj) ∪ {ni};
 7  H = ∅ ;  // List of incompatible node pairs processed
 8  while L is not empty do
 9      pick a pair (ni, nj) from L ;
10      L = L \ {(ni, nj)};
11      H = H ∪ {(ni, nj)};
12      foreach nt and nu s.t. (nt, ni; x/y) and (nu, nj; x/y) do
13          if (nt, nu) ∉ H then
14              L = L ∪ {(nt, nu)} ;
15              N(nt) = N(nt) ∪ {nu};
16              N(nu) = N(nu) ∪ {nt};
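A possible Python rendering of Algorithm 7 is sketched below. It is not the thesis implementation: the function name, the edge representation as (start, end, x, y) tuples and the dictionaries are assumptions. Instead of the list H, the sketch tests whether a pair is already recorded in N, and it assumes a deterministic automaton (at most one leaving edge per node and input symbol).

```python
from itertools import combinations


def find_incompatible_sets(nodes, edges, candidates):
    """Return a dict N mapping each node to its incompatible set."""
    N = {n: set() for n in nodes}
    out = {n: {} for n in nodes}        # out[n][x] = y for the edge n --x/y-->
    incoming = {n: [] for n in nodes}   # incoming[n] = [(start, x, y), ...]
    for s, e, x, y in edges:
        out[s][x] = y
        incoming[e].append((s, x, y))

    # Phase 1: disjoint candidate sets, or different outputs on one input.
    work = []
    for ni, nj in combinations(nodes, 2):
        disjoint = not (candidates[ni] & candidates[nj])
        differ = any(x in out[nj] and out[nj][x] != y
                     for x, y in out[ni].items())
        if disjoint or differ:
            N[ni].add(nj)
            N[nj].add(ni)
            work.append((ni, nj))

    # Phase 2: propagate incompatibility backwards over equally labelled edges.
    while work:
        ni, nj = work.pop()
        for nt, x, y in incoming[ni]:
            for nu, x2, y2 in incoming[nj]:
                if (x, y) == (x2, y2) and nt != nu and nu not in N[nt]:
                    N[nt].add(nu)
                    N[nu].add(nt)
                    work.append((nt, nu))
    return N
```

Using a set for each N(ni) makes the "already considered" test a constant-time lookup on average, so the O(r) cost of that check in the analysis is an upper bound for this sketch rather than its actual cost.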
The running time of finding incompatible sets as given in Algorithm 7 can
be analyzed as follows. In the first phase (the first loop), for each pair of nodes the
intersection of the candidate sets and the outgoing edges are considered. Finding the
intersection of two candidate sets takes O(n^2) time and there are O(p) outgoing edges
for any node. Since there are O(r^2) node pairs, the total time spent in the first phase is
O(n^2 r^2 + pr^2). The second phase may also consider all pairs of nodes, and the most
time consuming operation in this phase is to check whether a pair of nodes has been
considered before. Although the description given in Algorithm 7 uses the list H to
check this, in our implementation we check whether a pair of nodes was considered before
by looking at whether one of the nodes is already in the incompatible set of the other.
Since each incompatible set can have at most O(r) nodes, this check is done in O(r)
time. Considering all pairs of nodes, the running time of the second phase is O(r^3).
Hence the total running time of the algorithm is O(r^3 + n^2 r^2 + pr^2).
Candidate Elimination Using a Recognized Node Before explaining
candidate elimination using a set of incompatible nodes, we first consider
a special case of the technique, which we call candidate
elimination using a recognized node.
Consider a node ni that is recognized as state s of M and suppose that there
exists a node nj which has not been recognized yet. Assume that nj is known to be
incompatible with ni (i.e. nj ∈ N(ni)), and s ∈ C(nj). In other words, s is still a
candidate state for nj but we also know that nj corresponds to a different state than
ni, which is recognized as s. This proves that nj cannot be recognized as s,
and hence s can be removed from C(nj).
This elimination process is applied to all node pairs (ni, nj) such that C(ni) = {s}
and nj ∈ N(ni) by setting C(nj) = C(nj) \ {s}.
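One application of this elimination rule can be sketched as follows. The function name and the dictionary representation of C and N are assumptions made for illustration; the incompatible sets are taken as given and are not recomputed here.

```python
def eliminate_with_recognized(candidates, incompatible):
    """Apply C(nj) = C(nj) \\ {s} for every pair with C(ni) = {s}, nj in N(ni).

    candidates:   dict node -> candidate set (updated in place).
    incompatible: dict node -> incompatible set N(node).
    Returns True if at least one candidate was eliminated.
    """
    changed = False
    for ni, c in candidates.items():
        if len(c) != 1:
            continue                     # ni is not recognized
        (s,) = c
        for nj in incompatible[ni]:
            if s in candidates[nj]:
                candidates[nj].discard(s)
                changed = True
    return changed
```

The return value tells the caller whether another round (after recomputing the incompatible sets) is worthwhile.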
For example, considering the uncertainty automaton in Figure 4.2, the following
candidate eliminations using a recognized node are possible. Since node n2 is
recognized as state s3 and is incompatible with nodes n5 and n8, state s3 is removed
from C(n5) and C(n8). Table 4.4 shows the updated candidate sets after these
eliminations.
C(n1) = {s1} C(n2) = {s3} C(n3) = {s1, s2, s3} C(n4) = {s2}
C(n5) = {s1, s2} C(n8) = {s1, s2} C(n9) = {s1, s2, s3}
Table 4.4: Candidate Sets for the Uncertainty Automaton in Figure 4.2
Assuming the incompatible sets are given, the running time for a single application
of candidate elimination using recognized nodes is O(nr). That is because there can
be at most n recognized nodes and for each of them we can do eliminations on all
the nodes in its incompatible set, which can have at most O(r) nodes. If the
method achieves an elimination then it has to be applied again. The method can be
applied until all nodes are recognized or no more elimination is possible. However, before
each application of this candidate elimination method the incompatible sets have to be
recomputed, since incompatibility relations may change as a result of candidate
eliminations. For this reason we have to add the running time of finding incompatible sets
to the running time of the method. Hence the running time of a single application
of the method is O(nr) + O(r^3 + n^2 r^2 + pr^2) = O(r^3 + n^2 r^2 + pr^2). As we see, the
running time for finding incompatible sets dominates the running time for a single
application of the method, hence the first term O(nr) is dropped. In the worst case
the method has to be applied O(nr) times, because each application may eliminate
only a single element from the candidate set of a single node, and there are O(r) nodes
with at most n candidates each.
Hence in the worst case the running time of applying candidate elimination using a
recognized node is O(nr) × O(r^3 + n^2 r^2 + pr^2) = O(nr^4 + n^3 r^3 + pnr^3).
Candidate Elimination Using a Set of Incompatible Nodes Note that the
technique explained above cannot be applied when |C(ni)| > 1. Although for a node
nj ∈ N(ni) it is guaranteed that ni and nj correspond to two different states,
the state corresponding to ni is not known yet, so we cannot simply remove the
entire set of states in C(ni) from C(nj).
However, there is still a further chance for candidate elimination using a similar
idea. Let us start by considering such an elimination in a simple case. Assume that
there is a set consisting of two nodes ni and nj, both of which are not recognized yet
(i.e. |C(ni)| > 1 and |C(nj)| > 1) and are known to be incompatible (i.e. nj ∈ N(ni)
and ni ∈ N(nj)). If ni and nj have candidate sets such that |C(ni) ∪ C(nj)| = 2,
then there are 2 candidate states that ni and nj can be recognized as.
Since we also know that ni and nj are incompatible, ni and nj will be recognized as
different states in C(ni) ∪ C(nj). Further assume that there is a third node nu
that is also not recognized yet and is known to be incompatible with both ni and
nj. This incompatibility tells us that nu cannot be recognized as any of the 2 states
in C(ni) ∪ C(nj), thus the elimination C(nu) = C(nu) \ (C(ni) ∪ C(nj)) is a valid
operation.
As we have seen in the example above, there are cases where we can definitely
know that a node cannot be recognized as any of a set of states rather than a single state.
Thus elimination of multiple states from the candidate set of a node at once is
possible. When we generalize this idea, we come up with the following formulation.
Assume that for a set of k nodes, say K = {n1, n2, ..., nk} where |K| = k, the
following conditions hold
1. if k > 1 then ∀ni ∈ K, |C(ni)| > 1
2. if k > 1 then ∀ni, nj ∈ K if ni ≠ nj then nj ∈ N(ni)
3. if k > 1 then |⋃_{i=1}^{k} C(ni)| = k
4. if k = 1 then |C(n1)| = 1
In simple words, condition 1 states that each node in K has not been recognized yet.
Condition 2 states that each node in K is incompatible with every other node in K, that
is, no two nodes in K can be recognized as the same state. Condition 3 states that
there are k possible states that nodes in K can be recognized as. Hence, combining
conditions 2 and 3, each node in K will be recognized as one of the
k possible states and no other node in K will be recognized as that state. Although
we have no information about which node will be recognized as which state, this is
not necessary for elimination.
If there is any node, say nu, that is different from the nodes in K and is incompatible
with all the nodes in K, then we can do the following elimination:
C(nu) = C(nu) \ ⋃_{i=1}^{k} C(ni).
Notice that all conditions except the last one assume the case k > 1. That is because
the case k = 1 refers to the special case where the only node in K must already be
recognized, which is the same as the special case examined as candidate elimination
using a recognized node.
Although this candidate elimination method presents an important opportunity, finding
a set K satisfying the conditions becomes more expensive as k gets large. In fact,
finding a set K amounts to solving the well-known Clique problem on undirected graphs.
To see this, assume we have found a set of nodes U with |U| = m ≥ k
that satisfies conditions 1 and 3. Then we need to find a subset K ⊆ U such that
K satisfies condition 2. If we consider each node in U as a node of an undirected
graph and put an undirected edge between the nodes that are incompatible with
each other, then finding a clique of size k in this graph solves our problem of finding
a subset K. Since the Clique problem is NP-complete, no polynomial time algorithm is
known for it, and a straightforward search for K takes time exponential in k.
Considering the performance of the method, the maximum cardinality k of the set
of incompatible nodes can be given as a parameter. This provides a tuning
knob for the trade-off between running time and finer analysis. Searching for a
set of incompatible nodes K with large cardinality k may yield better results but
increases the running time of the analysis.
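A brute-force sketch of this elimination, with the maximum cardinality given as a parameter, is shown below. The names eliminate_with_incompatible_set and max_k, and the dictionary representation, are assumptions for illustration. The sketch simply enumerates all subsets of unrecognized nodes up to size max_k, which is exponential in max_k as discussed above; the case k = 1 is left to candidate elimination using a recognized node.

```python
from itertools import combinations


def eliminate_with_incompatible_set(candidates, incompatible, max_k):
    """Try every set K of 2..max_k unrecognized nodes (conditions 1 to 3).

    candidates:   dict node -> candidate set (updated in place).
    incompatible: dict node -> incompatible set N(node), assumed symmetric.
    Returns True if at least one candidate was eliminated.
    """
    changed = False
    for k in range(2, max_k + 1):
        unrecognized = [n for n in candidates if len(candidates[n]) > 1]
        for K in combinations(unrecognized, k):
            # Condition 1 (re-checked, since earlier eliminations may have
            # recognized a node in the meantime).
            if any(len(candidates[n]) <= 1 for n in K):
                continue
            # Condition 2: the nodes of K are pairwise incompatible.
            if any(b not in incompatible[a] for a, b in combinations(K, 2)):
                continue
            # Condition 3: exactly k candidate states in total.
            union = set().union(*(candidates[n] for n in K))
            if len(union) != k:
                continue
            for nu in candidates:
                if nu in K or not all(nu in incompatible[n] for n in K):
                    continue
                if candidates[nu] & union:
                    candidates[nu] = candidates[nu] - union
                    changed = True
    return changed
```

Smaller sets are tried first, so a cheap elimination found with k = 2 can already recognize a node before larger and more expensive sets are enumerated.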
4.3.2 Candidate Elimination Using Candidate Trial
Another method that allows elimination of candidates from the candidate set of
an unrecognized node is what we call candidate trial. In this method, for an
unrecognized node, say ni, a candidate state s ∈ C(ni) is chosen. Then a what-if
analysis is performed assuming that ni is recognized as s. In other words, a copy of
the current uncertainty automaton is created and ni is recognized as s on the copy
automaton. ni is recognized as s by simply merging it with the node that is already
recognized as s, if there is such a node. If there is not, that is, if ni is the first node that
will be recognized as s, then setting C(ni) = {s} is enough. After ni is recognized as
s on the copy automaton, the analysis continues using the methods explained before
and it is checked whether recognizing ni as s causes a contradiction at some
point. A contradiction is reached if, while recognitions and candidate eliminations are
performed as usual, two nodes, say nt and nu, need to be merged
but either C(nt) ∩ C(nu) is empty or, for some input symbol x, nt and nu produce
different outputs. If such a contradiction is reached, then it is certain
that ni should not be recognized as s. Hence s can be eliminated from C(ni) in
the current uncertainty automaton. If no contradiction is reached then we can only
conclude that, in the current state of the uncertainty automaton, there is still a chance
for ni to be recognized as s. Thus at this point elimination of s from C(ni) is not
possible.
Considering the performance of the method, we think that it is reasonable not to
use the candidate trial method in a nested fashion. Although it is possible to put a limit
on the depth of nested candidate trial calls, in our implementation we
do not call the candidate trial method within another candidate trial call. This
is simply because unlimited nesting of candidate trial calls means trying every
possibility for every unrecognized node, and that may increase the running time drastically.
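The what-if structure of candidate trial can be sketched as below. Everything here is a simplified assumption: the automaton is a plain dictionary with a "candidates" entry, the trial only sets C(ni) = {s} on the copy (the merge with an already recognized node is left to the supplied analysis), and analyse stands for the recognition and elimination methods explained before, which is expected to raise the hypothetical Contradiction exception.

```python
import copy


class Contradiction(Exception):
    """Raised by the analysis when two nodes that must be merged cannot be."""


def candidate_trial(automaton, node, state, analyse):
    """Try recognizing `node` as `state` on a copy of `automaton`.

    automaton: dict with at least a "candidates" entry (node -> set of states).
    analyse:   callable that continues the analysis on the copy and raises
               Contradiction if the assumption cannot hold (hypothetical hook).
    Returns True if `state` was eliminated from C(node) in the original.
    """
    trial = copy.deepcopy(automaton)          # the original stays untouched
    trial["candidates"][node] = {state}
    try:
        analyse(trial)                        # no nested candidate trials here
    except Contradiction:
        automaton["candidates"][node].discard(state)
        return True
    return False                              # nothing can be concluded
```

Working on a deep copy keeps every merge and elimination performed during the trial away from the current automaton, so only the single conclusion (s is not a candidate for ni) is carried back.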
Considering the uncertainty automaton in Figure 4.2, we can continue candidate
eliminations using the techniques explained above. When we want to use candidate
elimination using a set of incompatible nodes, there are two such sets satisfying the
conditions of the method. One is the set of nodes {n3, n5, n9} and the other is
{n3, n8, n9}. However, for neither set is there another node that is incompatible
with all of its nodes, thus no elimination is possible.
However, we can continue candidate eliminations using the candidate trial method.
Since, among the unrecognized nodes, node n5 has only two candidates remaining, we
consider candidate trial on node n5 first. So assume that node n5 is recognized
as state s1 on a copy of the current uncertainty automaton. This assumption leads
to the following merges and candidate eliminations on the copy uncertainty automaton.
• Node n5 is merged with node n1 (since we assume they are recognized as the same
state)
• State s1 is eliminated from C(n9) (since node n1 becomes incompatible with
node n9)
• State s1 is eliminated from C(n8) (since node n1 becomes incompatible with
node n8; thus node n8 is recognized as state s2)
• State s1 is eliminated from C(n3) (since node n1 becomes incompatible with
node n3)
• Node n8 is merged with node n4 (since they are both recognized as state s2)
• State s2 is eliminated from C(n9) (since node n9 becomes incompatible with
node n4; thus node n9 is recognized as state s3)
• State s2 is eliminated from C(n3) (since node n3 becomes incompatible with
node n4; thus node n3 is recognized as state s3)
After all of these merges and eliminations, the resulting copy uncertainty
automaton is shown in Figure 4.3.
[Figure 4.3: Copy Uncertainty Automaton]
Since nodes n3 and n9 are now recognized as state s3 and node n2 has already
been recognized as state s3, these three nodes have to be merged. However, node n9
and node n2 happen to be incompatible, since with b/0 they reach nodes that are
recognized as different states (node n2 reaches node n4, which has been recognized as
state s2, and node n9 reaches n1, which has been recognized as state s1). Thus there
is a conflict, which means our assumption of recognizing node n5 as state s1 was
false. So state s1 can be eliminated from C(n5). That leaves only state s2 in C(n5),
hence node n5 is now recognized as state s2 and must be merged with node n4.
Merging node n5 with node n4 makes the following candidate eliminations possible
via candidate elimination using a recognized node.
• State s2 is eliminated from C(n9) (since node n9 is incompatible with node
n4)
• State s2 is eliminated from C(n3) (since node n3 is incompatible with node
n4)
The resulting uncertainty automaton is shown in Figure 4.4, the corresponding
candidate sets in Table 4.5 and the incompatible sets in Table 4.6.
[Figure 4.4: Uncertainty Automaton]
C(n1) = {s1} C(n2) = {s3} C(n3) = {s1, s3}
C(n4) = {s2} C(n8) = {s1, s2} C(n9) = {s1, s3}
Table 4.5: Candidate Sets For the Uncertainty Automaton in Figure 4.4
N(n1) = {n2, n4} N(n2) = {n1, n4, n8} N(n3) = {n4, n8, n9}
N(n4) = {n1, n2, n3, n9} N(n8) = {n2, n3, n9} N(n9) = {n3, n4, n8}
Table 4.6: Incompatible Sets For the Uncertainty Automaton in Figure 4.4
Now there is a possibility for candidate elimination using a set of incompatible
nodes. Considering the set of nodes K = {n3, n9}, it can be seen that they form
a set that can be used for elimination: C(n3) ∪ C(n9) = {s1, s3}, so |K| =
|C(n3) ∪ C(n9)| = 2, and node n3 is incompatible with node n9. If we consider
node n8, which is incompatible with both n3 and n9, then state s1 can be eliminated
from C(n8). Then node n8 is recognized as state s2 and must be merged with node
n4. This merge leads to the merging of nodes n1 and n9. The resulting uncertainty
automaton is shown in Figure 4.5 with candidate sets shown in Table 4.7.
[Figure 4.5: Uncertainty Automaton]
C(n1) = {s1} C(n2) = {s3} C(n3) = {s1, s3} C(n4) = {s2}
Table 4.7: Candidate Sets For the Uncertainty Automaton in Figure 4.5
The only unrecognized node is node n3 with candidate set C(n3) = {s1, s3}.
By a candidate elimination using the recognized node n1, it is possible to eliminate
s1 from C(n3) and recognize n3 as state s3. Hence node n3 must be merged
with node n2. Now all nodes are recognized as some state of FSM M1. The final
automaton is shown in Figure 4.6 with candidate sets shown in Table 4.8. Notice
that the final automaton is now equivalent to FSM M1, with all nodes recognized
and all transitions of M1 verified. As a result it is concluded that X is a checking
sequence for FSM M1.
[Figure 4.6: Final Uncertainty Automaton]
C(n1) = {s1} C(n2) = {s3} C(n4) = {s2}
Table 4.8: Candidate Sets For the Uncertainty Automaton in Figure 4.6
4.3.3 Using Candidate Elimination Methods Together
Having seen the methods we use for candidate elimination, in this section we
present how we use these methods together. Our main consideration is to
use the cheaper methods as much as possible. Whenever a method fails to
update the uncertainty automaton, we proceed to the next method, which makes
a more expensive analysis than the current method. What we mean by an update of
the uncertainty automaton is an elimination of a candidate from the candidate set
of any node. In addition, whenever a method achieves an update, we break
the execution of the current method and continue with a cheaper method if possible.
Hence we run the methods in the following order until none of the methods is able
to update the uncertainty automaton or all nodes in the uncertainty automaton
have been recognized.
1. Candidate Elimination Using a Recognized Node
2. Candidate Elimination Using a Set Of Incompatible Nodes
3. Candidate Elimination Using Candidate Trial
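The control loop described above can be sketched as follows. The function name run_elimination and the convention that each method is a zero-argument callable returning True when it updated the uncertainty automaton are assumptions made for this illustration.

```python
def run_elimination(methods, all_recognized):
    """Run elimination methods, always preferring the cheapest one.

    methods:        callables ordered from cheapest to most expensive; each
                    returns True if it eliminated at least one candidate.
    all_recognized: callable returning True when every node is recognized.
    """
    i = 0
    while i < len(methods) and not all_recognized():
        if methods[i]():
            i = 0          # an update was made: go back to the cheapest method
        else:
            i += 1         # no update: fall through to the next, costlier one
```

The loop stops either when all nodes are recognized or when the most expensive method has also failed to update the automaton.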
4.4 Thoughts on Uncertainty Automaton
As explained above, given an FSM M, an I/O sequence X/Y (which can be inferred
from a given input sequence X by tracing it on M starting from s1), together with an
ADS D̄, induces an uncertainty automaton N for M by using the method
explained in this section. Although not proved formally, it is easy to see intuitively
that if the uncertainty automaton N has as many states as M and is equivalent to
M, then X is a checking sequence for M.
However, even if X is really a checking sequence for M, the method may not
produce an uncertainty automaton equivalent to M. This may happen since X might
actually be a checking sequence without using D̄, or there might be other recognitions
which cannot be performed by our state recognition techniques. Therefore,
when the final uncertainty automaton is not equivalent to M, we cannot be sure
whether X is a checking sequence or not.
We believe that even when a given input sequence X does not produce an uncertainty
automaton N equivalent to M, N can be used to decide how close X is
to being a checking sequence. The difference in the number of states and the sizes of
the candidate sets at the nodes of N can be used to produce such a metric.
Chapter 5
Overview of Simão et al.’s Method
In [24], Simão et al. present a constructive method for generating checking
sequences using distinguishing sets. In this section we provide a short description of
the algorithm using our own notation.
Given an FSM M and a distinguishing set D̄ = {Ds1, Ds2, ..., Dsn} of M, the
algorithm generates a checking sequence Q = x1x2...xk. For a sequence Q, let P(Q)
denote the path that starts from the initial state of M such that Q is the input
portion of the label of P(Q). That is, P(Q) = (n1, n2; x1/y1), (n2, n3; x2/y2), ...,
(nk, nk+1; xk/yk). The algorithm iteratively constructs the sequence Q such that in the
corresponding path P(Q) all transitions of M are verified. Thus when all transitions
are verified, Q becomes a checking sequence of M and the algorithm terminates. Let
Qi denote the prefix of the checking sequence Q that is obtained at the
end of the ith iteration. Similarly let P(Qi) be the corresponding path for Qi. At
iteration i, the algorithm produces the sequence Qi by extending the sequence Qi−1.
Initially it is supposed that the implementation is at the initial state s1, and in order
to recognize it Ds1 has to be applied. Thus the sequence Q1 is always Ds1. In
iteration i, how to extend the sequence Qi−1 is decided based on whether the end vertex
of P(Qi−1) is recognized or not. In this method, a vertex in P(Q) is recognized
when it is d-recognized or t-recognized as explained in Section 2.4. If the end vertex
of P(Qi−1) is recognized then a transition verification sequence is appended to the
current sequence; otherwise a state recognition is done by appending some
identification sequence. Notice that in each case the appended sequence always ends
with an identification sequence, hence when a state recognition is attempted the
longest possible overlap between the identification sequences is considered. A
more formal description of the method is given in Algorithm 8.
Algorithm 8: Simão et al.’s Checking Sequence Generation Method as in [24]
Input: D̄ = {Ds1, Ds2, ..., Dsn} a distinguishing set for an FSM M
Output: Q a checking sequence for M
Q0 is the empty sequence ;
i = 1;
while there are unverified transitions do
    let sk = δ(s1, Qi−1) ;
    if end of P(Qi−1) is recognized (as sk of M) then
        Find a shortest verified transfer sequence β from sk to some state sj,
        such that sj has some unverified transition (sj, su; x/y) ;
        Qi = Qi−1 β x Dsu ;
    else
        Find the longest suffix χ of Qi−1 such that Qi−1 = αχ and χ is also a
        prefix of Dsu, where su = δ(s1, α), and the end vertex of P(α) is not
        recognized ;
        Qi = Qi−1 φ where Dsu = χφ ;
    Update recognized vertices in P(Qi) ;
    Update verified transitions ;
Since the aim of the algorithm is to obtain a sequence that verifies every
transition, the set of verified transitions must be updated after each extension of the
current sequence. This is necessary in both cases. In case 1 (the if branch), when a
transition verification sequence is appended to the current sequence, it is obvious that
a transition will be verified, but note that a transition verification may lead to other
transition verifications via t-recognitions. Hence more than one transition can be
verified within an iteration. Likewise in case 2 (the else branch), that is, when a suffix
of an identification sequence is appended to the current sequence, a previously
unrecognized vertex will be d-recognized and that may also lead to transition verifications.
Chapter 6
Our Checking Sequence
Generation Method
In this section a new method to generate checking sequences using distinguishing sets
is presented. Similar to Simão et al.’s [24] method, this method also constructs
a checking sequence by extending the current sequence in each iteration. However,
unlike the method in [24], the method consists of two phases. In the first phase an
input sequence Q is generated, but Q is not guaranteed to be a checking sequence.
If it is not, then the method enters the second phase and does some post-processing. In this
post-processing phase, Q is further extended until it becomes a checking sequence.
6.1 Phase 1: Sequence Generation
In the first phase of the method, an input sequence Q, which may not be a checking
sequence, is constructed iteratively. In this method, recognition of a vertex in P(Q)
can be achieved with d- and t-recognitions as usual. However, in our method a vertex
can also be recognized conditionally. A conditional recognition of the start of an
edge (ni, ni+1; x/y) in P(Q) is possible if this edge corresponds to an invertible
transition (s, s′; x/y) in FSM M. An invertible transition is defined as follows.
Definition 4. A transition (s, s′; x/y) is invertible if ∀s′′ ∈ S such that s′′ ≠ s,
either δ(s′′, x) ≠ s′ or λ(s′′, x) ≠ y.
In simple words, a transition (s, s′; x/y) is invertible if it is the only transition
entering state s′ with input x and output y.
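Definition 4 translates directly into a small check. In the sketch below the transition and output functions δ and λ are represented as dictionaries keyed by (state, input); this representation and the function name is_invertible are assumptions for illustration, and M is assumed to be completely specified.

```python
def is_invertible(delta, lam, s, x):
    """Check whether the transition leaving state s on input x is invertible.

    delta: dict (state, input) -> next state   (the function δ)
    lam:   dict (state, input) -> output       (the function λ)
    """
    target, output = delta[(s, x)], lam[(s, x)]
    # No other state may enter `target` with the same input and output.
    return all(delta[(t, a)] != target or lam[(t, a)] != output
               for (t, a) in delta if a == x and t != s)
```

For a hypothetical two-state machine where both s1 and s2 enter s2 with a/0, neither of these transitions is invertible, whereas transitions that share a target but differ in output are.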
Although in a different context, the idea of using invertible transitions to reduce
the checking sequence length has been suggested before in [12, 13, 6].
If (s, s′; x/y) is an invertible transition of FSM M, the recognition of the start
vertex ni of the edge (ni, ni+1; x/y) as s in P(Q) is possible when the following
conditions are met.
• The end vertex ni+1 is recognized as state s′ of M, and
• For each state s′′ of M such that s′′ ≠ s, P(Q) contains an edge (nj, nj+1; x/y′)
such that nj is recognized as s′′ and either y′ ≠ y or nj+1 is recognized as some
state different than s′.
In simple words, recognized vertices of P(Q) have to provide enough evidence
that every state except s, when input x is applied, either produces an output different
than y or enters a state different than s′. Thus, if an unrecognized vertex ni in
P(Q) enters state s′ with input x and output y, then the only remaining state that
ni can be recognized as is s. Invertibility of the transition (s, s′; x/y) is crucial in such
conditional recognitions, since otherwise there would be at least one other state s′′
different than s that also enters s′ with input x and output y. In that case ni cannot
be recognized conditionally, since there is not enough evidence to know whether ni
should be recognized as s or s′′.
A conditional recognition is valid only when all conditions are satisfied. Hence it
should be checked that if the conditions are satisfied. However in this method, we do
not check if any of the conditions is satisfied in the first phase. Instead if there is an
edge (ni, ni+1;x/y) in P (Q) and it corresponds to an invertible transition (s, s
′;x/y),
then ni directly assumed to be recognized as s without considering if there is enough
evidence in P (Q) to validate this recognition. That is why the sequence generated
in the first phase of the algorithm is not guaranteed to be a checking sequence and
some post processing may become necessary to obtain a checking sequence.
A description of the first phase of the method is presented in Algorithm 9. At the
end of the first phase, the generated sequence Q is checked to see if it is a checking
sequence using the method described in Chapter 4. If Q is a checking sequence, the
algorithm terminates. Otherwise, Phase 2 of the algorithm is executed to extend Q
to a checking sequence.
Algorithm 9: Phase 1
Input: D̄ = {Ds1, Ds2, ..., Dsn}, a distinguishing set for an FSM M
Output: Q, a possible checking sequence for M
Q0 is the empty sequence;
i = 1;
while there are unverified transitions do
    let sk = δ(s1, Qi−1);
    if end vertex of P(Qi−1) is recognized (as sk of M) then
        Find a shortest verified transfer sequence β from sk to some state sj,
        such that sj has some unverified transition (sj, su; x/y);
        Qi = Qi−1 β x Dsu;
    else
        Qi = Qi−1 Dsk;
    Update recognized vertices in P(Qi);
    Update verified transitions;
    i = i + 1;
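The search for a shortest verified transfer sequence in the algorithm can be sketched as a breadth-first search over the verified transitions. The Python fragment below is a minimal sketch under assumptions: the function name, the dictionary encoding of the FSM and the set of verified (state, input) pairs are hypothetical, and ties between equally short transfer sequences are broken arbitrarily. The demonstration uses M1 and the situation of the fifth iteration of the example in this section, where only the transition (s1, s1; b/0) is still unverified and the current state is s3.

```python
from collections import deque

# M1 as read from the transitions named in the example of this section:
# (state, input) -> (next state, output)
M1 = {
    ("s1", "a"): ("s3", 0), ("s1", "b"): ("s1", 0),
    ("s2", "a"): ("s2", 1), ("s2", "b"): ("s1", 1),
    ("s3", "a"): ("s3", 1), ("s3", "b"): ("s2", 0),
}
D = {"s1": "a", "s2": "ab", "s3": "ab"}   # distinguishing set of the example

def shortest_transfer(fsm, verified, start):
    """Breadth-first search along verified transitions only.

    Returns (beta, state, x): beta leads from start to the nearest state
    that still has an unverified transition on input x, or None if every
    reachable transition is verified.
    """
    inputs = sorted({x for (_, x) in fsm})
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        state, beta = queue.popleft()
        for x in inputs:
            if (state, x) not in verified:
                return beta, state, x
        # All transitions of this state are verified, so they may be traversed.
        for x in inputs:
            nxt, _ = fsm[(state, x)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, beta + x))
    return None

verified = set(M1) - {("s1", "b")}
beta, state, x = shortest_transfer(M1, verified, "s3")
target, _ = M1[(state, x)]
extension = beta + x + D[target]          # beta, then x, then D of the target state
print(beta, state, x, extension)
```

For this input the transfer sequence is bb and the extension is bbba, which agrees with the fifth iteration of the example.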
The length of the sequence generated by Phase 1 of the algorithm can be analyzed
as follows. To be able to verify a transition, the last vertex of the sequence must
be recognized. In the worst case, the last vertex of the sequence is recognized after
an identification sequence is appended (n + 1) times. Let |D̄| denote the length of the
longest identification sequence in the distinguishing set D̄. Then the last vertex of
the sequence is recognized after at most (n + 1) × |D̄| input symbols are appended.
When the last vertex is recognized, we may need to append a transfer sequence. The
maximum length of a transfer sequence can be n in case all states must be visited. In
addition, since a transition verification sequence is the concatenation of an input
symbol and an identification sequence, its length can be at most |D̄| + 1. Summing all
these gives the length of the sequence required for a single transition verification.
Since there are np transitions, an upper bound for the length of the generated sequence
is np × ((n + 1)|D̄| + n + |D̄| + 1), which is O(n^2 p |D̄|).
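To make the bound concrete, the short Python sketch below evaluates it term by term. The function and parameter names are hypothetical; p is read here as the number of input symbols, which is what makes np the number of transitions of a completely specified FSM.

```python
def phase1_length_bound(n, p, d):
    """Upper bound on the length of the Phase 1 sequence.

    n: number of states, p: number of input symbols (assumed reading of p),
    d: length of the longest identification sequence in the distinguishing set.
    """
    recognize_end = (n + 1) * d   # identification sequences until the end vertex is recognized
    transfer = n                  # longest transfer sequence
    verify = d + 1                # one input symbol followed by an identification sequence
    return n * p * (recognize_end + transfer + verify)

# M1 of the example in this section: 3 states, inputs a and b, longest D is ab.
print(phase1_length_bound(3, 2, 2))   # prints 84
```

For M1 the bound evaluates to 84, whereas the sequence generated in the example of this section has only 10 input symbols, so the bound is far from tight on this instance.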
The running time of Phase 1 is dominated by updating recognized vertices in
the current sequence. Given a sequence of length l, updating the recognized vertices
in that sequence takes O(l^2) time. Since the length of the sequence generated in
Phase 1 is O(n^2 p |D̄|), a single iteration of the while loop takes O(n^4 p^2 |D̄|^2) time.
The while loop iterates at most np times, hence the running time of Phase 1 is
O(n^5 p^3 |D̄|^2).
Example
We will now illustrate the execution of Phase 1 on the FSM M1 shown in
Figure 2.1 with the distinguishing set D̄ = {Ds1, Ds2, Ds3} where Ds1 = a, Ds2 = ab,
Ds3 = ab. First note that all transitions in FSM M1 are invertible, so all nodes
can be conditionally recognized if needed. Initially we have the empty sequence Q0,
hence we extend it by Ds1 = a. Thus Q1 = a, P(Q1) = (n1, n2; a/0) and n1 is
d-recognized as state s1. From now on, we will show the iterations of the algorithm
using a table format. For the first iteration, Table 6.1 shows the current sequence Q1
in the first row. The second row shows the initial vertex of the corresponding path
P(Q1) for the input in the same column. The third row shows if the corresponding
vertex in the second row is recognized: D is used for d-recognition, T for t-recognition
and C for conditional recognition. The last row shows the corresponding states
of M1. That is, we can read Table 6.1 as follows: the current input sequence Q1 is a;
in P(Q1) vertex n1 enters vertex n2 with input a and n1 is d-recognized as state s1,
whereas n2 is not recognized yet.
Sequence      a
Vertex        n1   n2
Recognition   D
State         s1   s3
Table 6.1: Iteration 1
In the second iteration the end vertex in P(Q1), n2, is not yet recognized, thus
we extend the sequence by Ds3 = ab. With this extension n2 is d-recognized and n3
is conditionally recognized. Also the transitions (s1, s3; a/0) and (s3, s3; a/1) are
verified. Iteration 2 is shown in Table 6.2.
Sequence      a    a    b
Vertex        n1   n2   n3   n4
Recognition   D    D    C
State         s1   s3   s3   s2
Table 6.2: Iteration 2
In the third iteration the end vertex in P(Q2), n4, is not yet recognized, thus we
extend the sequence by Ds2 = ab as shown in Table 6.3. In this iteration, transitions
(s3, s2; b/0) and (s2, s2; a/1) are verified.
Sequence      a    a    b    a    b
Vertex        n1   n2   n3   n4   n5   n6
Recognition   D    D    C    D    C
State         s1   s3   s3   s2   s2   s1
Table 6.3: Iteration 3
In the fourth iteration the end vertex in P(Q3), n6, is not yet recognized, thus
we extend the sequence by Ds1 = a as shown in Table 6.4. In this iteration, transition
(s2, s1; b/1) is verified.
Sequence      a    a    b    a    b    a
Vertex        n1   n2   n3   n4   n5   n6   n7
Recognition   D    D    C    D    C    D    T
State         s1   s3   s3   s2   s2   s1   s3
Table 6.4: Iteration 4
In the fifth iteration, the end vertex in P(Q4), n7, is recognized as s3. Since
all transitions of s3 are verified, we need to transfer to a state with an unverified
transition. Actually the only transition that remains unverified is (s1, s1; b/0). Hence
we first transfer to s1 with the sequence bb and then verify the transition with bDs1 = ba.
Hence in this iteration the sequence is extended by bbba as shown in Table 6.5. As
every transition is now verified, sequence generation in Phase 1 is finished. If we
check whether the generated sequence Q = Q5 is a checking sequence using the
method in Chapter 4, we see that Q is a checking sequence. Actually the example
sequence used in Chapter 4 is the same sequence generated here, and it is shown
there that this sequence is a checking sequence for M1.
Sequence      a    a    b    a    b    a    b    b    b    a
Vertex        n1   n2   n3   n4   n5   n6   n7   n8   n9   n10  n11
Recognition   D    D    C    D    C    D    T    T    T    D    T
State         s1   s3   s3   s2   s2   s1   s3   s2   s1   s1   s3
Table 6.5: Iteration 5
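The state row of the iteration tables can be reproduced by replaying the generated sequence on M1. The Python fragment below is a minimal sketch, assuming a dictionary encoding of M1 built from the six transitions named in this example; the function names are hypothetical. It also checks the remark that every transition of M1 is invertible, that is, no two transitions share the same end state, input and output.

```python
# M1 as read from the transitions named in this example:
# (state, input) -> (next state, output)
M1 = {
    ("s1", "a"): ("s3", 0), ("s1", "b"): ("s1", 0),
    ("s2", "a"): ("s2", 1), ("s2", "b"): ("s1", 1),
    ("s3", "a"): ("s3", 1), ("s3", "b"): ("s2", 0),
}

def replay(fsm, start, inputs):
    """Apply the input sequence and return the visited states and the outputs."""
    states, outputs = [start], []
    for x in inputs:
        nxt, y = fsm[(states[-1], x)]
        states.append(nxt)
        outputs.append(y)
    return states, outputs

def all_invertible(fsm):
    """True if no two transitions enter the same state with the same input/output."""
    seen = set()
    for (_, x), (target, y) in fsm.items():
        if (target, x, y) in seen:
            return False
        seen.add((target, x, y))
    return True

states, outputs = replay(M1, "s1", "aabababbba")   # Q5 of the example
print(" ".join(states))                 # states of n1 ... n11
print("".join(str(y) for y in outputs))
print(all_invertible(M1))
```

The replay ends in s3, since the last input a is applied in state s1 and (s1, s3; a/0) is a transition of M1.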
6.2 Phase 2: Extending Sequence Q to a Checking Sequence
If the sequence generated in Phase 1, namely Q, is not a checking sequence, then
Phase 2 of the algorithm is executed. In this post processing phase, Q is extended
with some identification sequences. Just after Phase 1, the method in Chapter 4 is
used to check whether the sequence is a checking sequence or not. If Q cannot be
shown to be a checking sequence, then the final uncertainty automaton for Q has
some unrecognized nodes. Hence in Phase 2, what we simply do is find an
unrecognized node in the uncertainty automaton and extend the sequence with the
corresponding identification sequence for that unrecognized node. The recognition
of an unrecognized node may result in new recognitions via candidate eliminations
and merges on the uncertainty automaton. If there are still unrecognized nodes
remaining in the uncertainty automaton, then the sequence is further extended until
no unrecognized node remains in the uncertainty automaton, at which point the
sequence becomes a checking sequence.
Among the unrecognized nodes in the uncertainty automaton, the one to recognize
next is chosen using a very simple approach. Assume that in the uncertainty automaton
the current node is ni. From ni a shortest transfer sequence β (possibly empty) to
an unrecognized node nj is found in the uncertainty automaton. If nj has to be
recognized as s, then the sequence βDs, where Ds is the identification sequence of
s, is appended to the sequence.
However, there might be cases in which a transfer sequence to some unrecognized
node cannot be found in the uncertainty automaton from the current node ni. This
happens when every node that is reachable from ni is already recognized and
at least one of these nodes has an undefined transition. Then we find the shortest
transfer sequence β to such a node. Assume that nj is that node, which is recognized
as s and has some undefined transition, say on input x. Then the sequence is extended
by the transition verification sequence appended to β, namely βxDs′ where s′ = δ(s, x).
An application of this case can be seen in the example given in this section.
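The selection rule of Phase 2 can be sketched as a breadth-first search on the uncertainty automaton. The Python fragment below is a simplified sketch under assumptions, not the implementation used in the thesis: the encodings of the automaton (edges), of the candidate sets and of the specification (delta), as well as the function name, are hypothetical, and the state that a node is expected to be recognized as is obtained by tracing the specification along the transfer sequence. The demonstration models only the part of the example of this section that is stated in the text: the current node n13 is recognized as s4, its only edge is a self-loop (taken here to be on input a, since the b edge is the undefined one), δ(s4, b) = s1 and Ds1 = ab.

```python
from collections import deque

def phase2_extension(edges, candidates, delta, D, inputs, node, state):
    """Return the next extension of the sequence, or None if nothing is left to do.

    edges:      (node, input) -> node, the defined edges of the uncertainty automaton
    candidates: node -> set of candidate states (a singleton means recognized)
    delta:      (state, input) -> next state of the specification
    node/state: current node and the specification state it corresponds to
    """
    order = []                                    # reachable nodes in BFS order
    queue = deque([(node, state, "")])
    seen = {node}
    while queue:
        n, s, beta = queue.popleft()
        order.append((n, s, beta))
        for x in inputs:
            m = edges.get((n, x))
            if m is not None and m not in seen:
                seen.add(m)
                queue.append((m, delta[(s, x)], beta + x))
    # Case 1: transfer to the nearest unrecognized node and identify it.
    for n, s, beta in order:
        if len(candidates[n]) > 1:
            return beta + D[s]
    # Case 2: every reachable node is recognized; verify an undefined transition.
    for n, s, beta in order:
        for x in inputs:
            if (n, x) not in edges:
                return beta + x + D[delta[(s, x)]]
    return None

D = {"s1": "ab", "s2": "ab", "s3": "aa", "s4": "aa"}
edges = {("n13", "a"): "n13"}                     # assumed: the self-loop is on input a
candidates = {"n13": {"s4"}}
delta = {("s4", "a"): "s4", ("s4", "b"): "s1"}    # partial specification, enough for this case
print(phase2_extension(edges, candidates, delta, D, "ab", "n13", "s4"))
```

For this input the sketch returns bab, the extension bDs1 used in the example.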
Example
Now the execution of Phase 2 will be illustrated on an example. Consider the
FSM M2 given in Figure 6.1 with the distinguishing set D̄ = {Ds1, Ds2, Ds3, Ds4}
where Ds1 = ab, Ds2 = ab, Ds3 = aa and Ds4 = aa.
Figure 6.1: FSM M2 (state diagram with states s1, s2, s3 and s4)
Phase 1 generates the sequence Q = ababbababbaaaaa for the FSM M2 with the
given distinguishing set D̄. When it is checked whether Q is a checking sequence for
M2, the method described in Chapter 4 produces the final uncertainty automaton
shown in Figure 6.2 with the candidate sets given in Table 6.6. Since there are
still some unrecognized nodes in the uncertainty automaton, Q is not a checking
sequence. Thus in Phase 2, Q needs to be extended to become a checking sequence
for M2.
C(n1) = {s1} C(n2) = {s3, s4} C(n6) = {s2} C(n7) = {s3, s4}
C(n11) = {s3} C(n12) = {s1, s2} C(n13) = {s4}
Table 6.6: Candidate Sets For the Uncertainty Automaton in Figure 6.2
Notice that in the uncertainty automaton nodes n2, n7 and n12 are unrecognized,
each having multiple states in its candidate set. In order to extend sequence Q
into a checking sequence, we need to recognize all nodes in the uncertainty automaton.
Hence we would first like to transfer to one of these unrecognized nodes using the
shortest possible transfer sequence and recognize that node with the corresponding
identification sequence from D̄. However, when sequence Q is traced (starting from
the node that is recognized as the initial state s1, i.e. node n1) on the uncertainty
automaton, the last node happens to be n13. Note that node n13 is already recognized
as s4. As the only edge leaving node n13 is a self-loop, there is no way to
transfer to an unrecognized node. The reason is that node n13 has no defined
edge with input b. In other words, the b transition of s4 has not been verified yet,
possibly because some conditions required for a conditional recognition in Phase 1
remained unsatisfied. To resolve this situation, we extend Q to verify the b transition
of s4. Hence the sequence bDs1 = bab is appended to Q (note that δ(s4, b) = s1).
Figure 6.2: Final Uncertainty Automaton for Q generated in Phase 1 (nodes n1, n2, n6, n7, n11, n12 and n13)
When we check whether the extended sequence Q′ = Qbab is a checking sequence
for M2, the resulting uncertainty automaton happens to have all nodes recognized
and all transitions verified as shown in Figure 6.3 and candidate sets in Table 6.7.
Thus Q′ is a checking sequence for M2.
C(n1) = {s1} C(n6) = {s2} C(n11) = {s3} C(n13) = {s4}
Table 6.7: Candidate Sets For the Uncertainty Automaton in Figure 6.3
Figure 6.3: Final Uncertainty Automaton for Q′ = Qbab (nodes n1, n6, n11 and n13)
6.3 Experimental Results
In this section the experimental results for the checking sequence generation method
will be discussed. The methods have been implemented in Java and the experiments
have been executed on a machine with an Intel Xeon 2.33 GHz processor and 32 GB
of RAM.
The FSMs that are used in the experiments are generated using the random FSM
generation tool whose details are explained in Chapter 3. For the experiments 10 sets
of FSMs are used. Each set contains 200 FSMs with n states, where n ranges from
10 to 100 in steps of 10. Each FSM has 5 input symbols and 5 output symbols. Also
each FSM has a PDS. In this experimental setup a PDS is used instead of a
distinguishing set. That is because, for a given FSM, the tool we are using is biased
toward finding distinguishing sets that generally contain repetitions of the same input
symbol. We call such sequences uniform sequences. However, for finding a PDS there
is no such bias. If an identification sequence is uniform, then this increases the chances
of overlapping among identification sequences and causes biased results.
We are going to compare the performance of our method with Simão et al.’s
method given in [24]. The comparisons will be in terms of checking sequence length
and method execution time.
6.3.1 Comparison with Simão et al.’s Method
For the experimental results that will be presented in this section, our method uses
candidate elimination using a recognized node as the only candidate elimination
method while checking if the generated sequence is a checking sequence. That is,
candidate elimination using candidate trial and candidate elimination using a set of
incompatible nodes are not used due to their high computational costs. However, the
experiments show that even with this reduced recognition ability our method can
outperform Simão et al.’s method.
Number of States   Our Method   Simão et al.’s Method
10                 179          207
20                 451          528
30                 788          893
40                 1172         1320
50                 1476         1665
60                 1856         2043
70                 2267         2492
80                 2787         3046
90                 3269         3559
100                3644         3944
Table 6.8: Average CS Lengths
Table 6.8 shows, for each set of 200 FSMs with the number of states ranging from 10
to 100, the average checking sequence length of our method and Simão et al.’s method.
Figure 6.4 shows the data in Table 6.8 as a chart.
Figure 6.4: Average CS Lengths
Table 6.9 contains the average improvement over Simão et al.’s method in terms
of checking sequence length. The first column shows the average improvement
calculated for all FSMs. The second and third columns show the average improvement
when FSMs with non-uniform distinguishing sequences and FSMs with uniform
distinguishing sequences are considered separately. The values are calculated by
averaging the improvements of each FSM in a set. In Figure 6.6 the average
improvements are shown as a chart.
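Averaging the improvements of each FSM is not the same as computing the improvement of the average lengths, which is why the percentages in Table 6.9 cannot be recomputed directly from Table 6.8. The Python sketch below illustrates the difference. The improvement formula (relative reduction with respect to the other method's length) is an assumption, since the text does not spell it out, and the two per-FSM length pairs are made-up numbers used only for illustration.

```python
def improvement(ours, theirs):
    """Relative reduction in length, in percent (assumed definition)."""
    return 100.0 * (theirs - ours) / theirs

def mean_of_improvements(pairs):
    """Average of the per-FSM improvements, as described for Table 6.9."""
    values = [improvement(ours, theirs) for ours, theirs in pairs]
    return sum(values) / len(values)

def improvement_of_means(pairs):
    """Improvement computed from the two average lengths instead."""
    ours = sum(o for o, _ in pairs) / len(pairs)
    theirs = sum(t for _, t in pairs) / len(pairs)
    return improvement(ours, theirs)

# Made-up (our length, other length) pairs, for illustration only.
pairs = [(100, 200), (300, 300)]
print(mean_of_improvements(pairs))        # 25.0
print(improvement_of_means(pairs))        # 20.0

# Average lengths for n = 10 from Table 6.8.
print(round(improvement(179, 207), 2))    # 13.53
```

Under this assumed definition the n = 10 averages of Table 6.8 give 13.53%, while Table 6.9 reports 12.61% for the same set, which is consistent with the values being averaged per FSM.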
Considering the performance of our method only, Figure 6.5 shows the box plot
for CS lengths. A box plot is interpreted as follows. For each set of FSMs on the
horizontal axis, the plot contains a box on a vertical line segment. The ends of the line
segment mark the minimum and maximum values. The box contains the middle
50% of the values and its upper and lower edges are on the 75th percentile and 25th
percentile respectively. The line inside the box shows the median value.
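The five values that define one box of such a plot can be computed as in the Python sketch below. The function name is hypothetical, the sample lengths are made-up numbers, and the percentile convention (the "inclusive" method of the standard library) is an assumption, since the text does not say which convention the plots use.

```python
import statistics

def box_plot_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Made-up checking sequence lengths for one set of FSMs, for illustration only.
lengths = [150, 160, 170, 180, 190]
print(box_plot_summary(lengths))   # (150, 160.0, 170.0, 180.0, 190)
```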
Figure 6.5: Our Method’s CS Lengths as a Box Plot
When FSMs with non-uniform distinguishing sequences are considered, our method
on average produces significantly shorter checking sequences. The average improvement
varies between 7.95% and 15.23%. However, when FSMs with uniform distinguishing
sequences are considered, our method’s improvement is not significant and
both methods have more or less the same performance. This performance difference
between uniform and non-uniform distinguishing sequences stems from the fact that
Simão et al.’s method exploits overlapping between distinguishing sequences, whereas
our method has no such consideration. Our method does not search for overlaps
since, using conditional recognition, we are able to recognize nearly all nodes in the
sequence (conditional recognition fails only if a transition is not invertible). However,
when the distinguishing sequence is uniform, Simão et al.’s method can find many
overlaps, thus it can also recognize many nodes like our method. For non-uniform
distinguishing sequences the chances of overlap are greatly reduced, thus our
method creates a significant difference in checking sequence length.
In Figure 6.7, the improvement of our method over Simão et al.’s method in
checking sequence length is shown as a box plot. It can be seen that although for
some FSMs our method may perform worse than Simão et al.’s method, in the majority
of the cases it performs better and the improvement can be as high as 30%.
Number of States   All FSMs (%)   FSMs with non-uniform DS (%)   FSMs with uniform DS (%)
10                 12.61          12.99                          2.14
20                 14.22          15.23                          2.64
30                 11.50          12.78                          1.63
40                 11.08          11.65                          2.08
50                 11.32          11.55                          2.39
60                 9.06           9.34                           1.17
70                 8.88           9.20                           2.09
80                 8.36           8.80                           1.37
90                 8.07           8.34                           -2.73
100                7.46           7.95                           0.88
Table 6.9: Average Improvements Over Simão et al.’s Method
When method execution times are considered, Figure 6.8 shows the average running
times of our method and Simão et al.’s method together as a chart. The results
show that there is no significant difference in running time with this experimental
setup, where we did not use expensive candidate elimination techniques. However,
the length of the sequence is reduced by at least 7% on average in our experiments
on FSMs with up to 100 states.
6.3.2 Contributions of Phase 1 and Phase 2
In this section, we will analyze experimental results for our method only and present
the contributions of Phase 1 and Phase 2 to the average checking sequence length
and average execution times.
Figure 6.9 shows the contributions of Phase 1 and Phase 2 to the average checking
sequence lengths. In Figure 6.11 the average distribution of the execution time
between Phase 1 and Phase 2 is shown. The percentage contribution of Phase 2 to
the checking sequence length and time increases with the size of the FSM. Although
the time taken by Phase 2 seems to contribute more and more as the size of the
FSM grows, the percentage contribution of Phase 2 to the length of the checking
sequence seems to be saturating around 30%. The percentage contribution of Phase 2
to CS length is shown in Figure 6.10.
Figure 6.6: Average Improvements Over Simão et al.’s Method
6.3.3 Effect of Candidate Elimination Using a Set of Incompatible Nodes
When the method candidate elimination using a set of incompatible nodes is also
used, the average length of the checking sequences generated by our method is further
reduced, but the execution times increase as expected. Results will be given when the
maximum cardinality, k, of the incompatible set is set to 5 and 10 (k = 5 and k = 10).
Remember that k = 1 actually means using the method candidate elimination using
a recognized node only, and the results for this case were already given in the
previous sections.
Note that the value of k has no effect on the length of the sequence generated at
the end of Phase 1, since candidate elimination methods are not used during Phase 1.
As the value of k increases, a better analysis of the uncertainty automaton is
performed, and this is expected to reduce the length of the extension introduced
by Phase 2. Figure 6.12 shows the improvement on this extension length by giving
the ratio of the extension lengths when k = 5 and k = 10 to the extension length when
k = 1. The experiments show that as the size of the FSM gets bigger, although the
k = 5 and k = 10 cases perform a more complex analysis, their extension lengths quickly
approach the extension length of the case k = 1.
The value of k does not affect the time spent in Phase 1. As the value of k
increases, the running time of Phase 2 is expected to increase since a more complex
analysis is performed. Figure 6.13 shows the slowdown factor when k = 5 and
k = 10 compared to the case k = 1. The experiments show that the analysis required
by the k = 5 and k = 10 cases slows down the execution by a factor of more than 5.
Figure 6.7: Improvements Over Simão et al.’s Method as a Box Plot
Figure 6.8: Average Method Execution Times
Figure 6.9: Contributions of Phase 1 and Phase 2 to CS Length
Figure 6.10: Percentage Contribution of Phase 2 to CS Length
Figure 6.11: Distribution of Execution Time between Phase 1 and Phase 2
Figure 6.12: Effect of Candidate Elimination Using a Set of Incompatible Nodes on Length
Figure 6.13: Effect of Candidate Elimination Using a Set of Incompatible Nodes on Time
Chapter 7
Conclusion
In this thesis, three aspects of FSM based testing are addressed.
The absence of a benchmark set of FSMs makes it difficult to compare the
performance of different checking sequence generation methods. Although it is not
clear at all how well a randomly generated FSM will represent the features of a real
FSM specification, it seems that there is no easy way of obtaining an extensive set
of FSMs other than generating them randomly.
For this reason, we have implemented a random FSM generation tool in order to
satisfy an important need in FSM based testing. For FSMs with different properties
(such as being strongly connected or not, having a PDS/ADS or not, etc.) there
are different and specialized methods for generating test and checking sequences.
The tool not only generates random FSMs of the particular type required by
the checking sequence generation method developed within this thesis, but can also
generate a wide range of types of FSMs with the required properties to support
FSM based testing in general. Generating a random FSM with some of these properties
is difficult when it is left to chance. We developed some algorithms to obtain FSMs
with these properties. Our main consideration while designing these algorithms was
to affect the randomness of an FSM as little as possible. The thesis contains the
details of the algorithms we developed to create random FSMs with such properties.
One especially hard to satisfy property turns out to be the existence of a preset
distinguishing sequence (PDS). The existence check of a PDS (which is known to be
PSPACE-complete) is currently performed by an exponential analysis in the tool.
Therefore generating large FSMs with a PDS is quite difficult. Although we have
suggested the shuffling method to speed up regenerating candidate FSMs for this
method, it seems that an approach that generates a random FSM from a certain
class of FSMs guaranteed to have a PDS would be more effective.
The second contribution of the thesis is a method that can answer the following
question: Given an input output sequence X/Y and a distinguishing sequence D̄ for an
FSM M, is X/Y a checking sequence for M that is generated by using D̄? The
method uses state recognition techniques already existing in the literature, such as
d- and t-recognition. However, we also introduce some novel state recognition
methods. Although this increases the capability of the method to recognize a checking
sequence, it is still possible that a sequence is found not to be a checking sequence
even if it really is one, i.e. we may have false negatives.
However, it is not possible for us to have false positives. In other words, whenever a
sequence is found to be a checking sequence, it is really a checking sequence for M.
Although this check is used as a termination condition in our checking sequence
generation method, we believe the information provided by the uncertainty automaton
can be used in the context of passive testing as well. In passive testing, the tester
is nothing more than an observer of the implementation under test (IUT) in its normal
operation and cannot interfere with the operation of the system for testing purposes.
Being a passive observer, the tester can only declare the IUT to be faulty
whenever an unexpected behavior is seen. However, if the IUT keeps producing the
expected responses, there is no verdict for the passive testing activity. We conjecture
that the input output behavior X/Y of the IUT as observed by the tester can be used
to build an uncertainty automaton to see if the observation seen so far is a checking
sequence or not. Even if it does not turn out to be a checking sequence, the size of
the uncertainty automaton, probably together with the sizes of the candidate sets
of its unrecognized nodes, may provide a metric to judge how close X/Y is to
being a checking sequence.
The final and major contribution of the thesis is a new distinguishing sequence
based checking sequence generation algorithm. Our method is based on a recent
method that uses a local optimization. This local optimization based method in
most cases yields better results than the existing global optimization based methods.
Our method consists of two phases. In the first phase a sequence is generated with
little consideration of state recognition. If the sequence generated in the first phase is
not a checking sequence, then it is extended to a checking sequence in Phase 2. The
experimental results have shown that our method achieves, on average, at least a 7%
reduction in the length of the checking sequence over the method that it is based on.
We think that there is still room for further improvement using our method. The
experiments show that approximately 30% of the checking sequence length stems
from the extensions in Phase 2, and this extension length can be reduced. In Phase
2 of the algorithm we implemented a very simple idea to extend the sequence to a
checking sequence. However, a closer analysis of the final form of the uncertainty
automaton may actually yield shorter extensions. As future work, we
want to find some good heuristics that make these extensions more cleverly. It may
also be worthwhile to reconsider our eager and careless conditional state recognition
approach in Phase 1.
Another promising research direction seems to be the fusion of the new local
optimization based methods and the old global optimization based methods. Al-
though the new methods generate shorter sequences, they are based on very greedy
ideas. Hence bringing in some kind of global knowledge into the algorithms may
improve the performance even more.
Bibliography
[1] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles,
Techniques, and Tools. Addison-Wesley, 1986.
[2] A.V. Aho, A. T. Dahbura, D. Lee, and M.U. Uyar. An optimization technique
for protocol conformance test generation based on UIO sequences and rural
Chinese postman tours. IEEE Transactions on Communications, pages 1604–1615,
1991.
[3] Jessica Chen, Robert M. Hierons, Hasan Ural, and Hüsnü Yenigün. Eliminating
redundant tests in a checking sequence. In Ferhat Khendek and Rachida
Dssouli, editors, TestCom, volume 3502 of Lecture Notes in Computer Science,
pages 146–158. Springer, 2005.
[4] T. S. Chow. Testing software design modeled by finite-state machines. IEEE
Trans. Software Eng., pages 178–187, 1978.
[5] Karnig Derderian, Robert M. Hierons, Mark Harman, and Qiang Guo. Automated
unique input output sequence generation for conformance testing of
FSMs. Comput. J., 49(3):331–344, 2006.
[6] Lihua Duan and Jessica Chen. Reducing test sequence length using invertible
sequences. In Michael Butler, Michael G. Hinchey, and María M. Larrondo-Petrie,
editors, ICFEM, volume 4789 of Lecture Notes in Computer Science,
pages 171–190. Springer, 2007.
[7] A.D. Friedman and P.R. Menon. Fault Detection in Digital Circuits. Englewood
Cliffs, NJ: Prentice-Hall, 1971.
[8] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide
to the Theory of NP-Completeness. 1979.
[9] A. Gill. Introduction to the Theory of Finite-State Machines. New York:
McGraw-Hill, 1962.
[10] F. C. Hennie. Fault detecting experiments for sequential circuits. Proc. 5th
Ann. Symp. Switching Circuit Theory and Logical Design, pages 95–110, 1964.
[11] Rob M. Hierons and Hasan Ural. Optimizing the length of checking sequences.
IEEE Transactions on Computers, 55(5):618–629, 2006.
[12] Robert M. Hierons. Extending test sequence overlap by invertibility. Comput.
J., 39(4):325–330, 1996.
[13] Robert M. Hierons. Testing from a finite-state machine: Extending invertibility
to sequences. Comput. J., 40(4):220–230, 1997.
[14] Robert M. Hierons and Hasan Ural. Reduced length checking sequences. IEEE
Trans. Comput., 51(9):1111–1117, 2002.
[15] Z. Kohavi. Switching and Finite Automata Theory. New York: McGraw-Hill,
1978.
[16] D. Lee and M. Yannakakis. Testing finite-state machines: State identification
and verification. IEEE Transactions on Computers, 43(3):306–320, 1994.
[17] D. Lee and M. Yannakakis. Principles and methods of testing fsms - a survey.
Proceedings of the IEEE, 84:1090–1123, 1996.
[18] R.E. Miller and S. Paul. On the generation of minimal-length conformance
tests for communication protocols. IEEE/ACM Trans. Netw., pages 116–129,
1993.
[19] Edward F. Moore. Gedanken experiments on sequential machines. In Automata
Studies, pages 129–153. Princeton U., 1956.
[20] S. Naito and M. Tsunoyama. Fault detection for sequential machines by
transition tours. Proc. 11th IEEE Fault Tolerant Comput. Symp., pages 238–243,
1981.
[21] I. Pomeranz and S.M. Reddy. Functional test generation for full scan circuits.
Proc. Conf. on Design, Automation and Test in Europe, pages 396–403, 2000.
[22] K. K. Sabnani and A. T. Dahbura. A protocol test generation procedure.
Computer Networks and ISDN Syst., pages 285–297, 1988.
[23] D.P. Sidhu and T.K. Leung. Formal methods for protocol testing: a detailed
study. IEEE Trans. Softw. Eng., pages 413–426, 1989.
[24] Adenilso Simão and Alexandre Petrenko. Generating checking sequences for
partial reduced finite state machines. In TestCom ’08 / FATES ’08: Proceedings
of the 20th IFIP TC 6/WG 6.1 International Conference on Testing
of Software and Communicating Systems, pages 153–168, Berlin, Heidelberg,
2008. Springer-Verlag.
[25] Hasan Ural, Xiaolin Wu, and Fan Zhang. On minimizing the lengths of checking
sequences. IEEE Transactions on Computers, 46(1):93–99, 1997.
[26] Hasan Ural and Fan Zhang. Reducing the lengths of checking sequences by
overlapping. In M. Ümit Uyar, Ali Y. Duale, and Mariusz A. Fecko, editors,
TestCom, volume 3964 of Lecture Notes in Computer Science, pages 274–288.
Springer, 2006.
