Labelled-transition systems (LTS) are widely used by developers and testers to model software systems in terms of their sequential behaviour. They provide an overview of the behaviour of the system and their reaction to different inputs. LTS models are the foundation for various automated verification techniques such as model-checking and model-based testing. These techniques require up-to-date models to be meaningful. Unfortunately, software models are rare in practice. Due to the effort and time required to build these models manually, a software engineer would want to infer them automatically from traces (sequences of events or function calls).
Many techniques have focused on inferring LTS models from given traces of system execution, where these traces are produced by running a system on a series of tests. State-merging is the foundation of some of the most successful LTS inference techniques to construct LTS models. Passive inference approaches such as k-tail and Evidence-Driven State Merging (EDSM) can infer LTS models from these traces. Moreover, the best-performing methods of inferring LTS models rely on the availability of negatives, i.e. traces that are not permitted from specific states and such information is not usually available. The long-standing challenge for such inference approaches is constructing models well from very few traces and without negatives.
Active inference techniques such as Query-driven State Merging (QSM) can learn LTSs from traces by asking queries as tests to a system being learnt. It may lead to infer inaccurate LTSs since the performance of QSM relies on the availability of traces. The challenge for such inference approaches is inferring LTSs well from very few traces and with fewer queries asked.
In this thesis, investigations of the existing techniques are presented to the challenge of inferring LTS models from few positive traces. These techniques fail to find correct LTS models in cases of insufficient training data. This thesis focuses on finding better solutions to this problem by using evidence obtained from the Markov models to bias the EDSM learner towards merging states that are more likely to correspond to the same state in a model.
Markov models are used to capture the dependencies between event sequences in the collected traces. Those dependencies rely on whether elements of event permitted or prohibited to follow short sequences appear in the traces. This thesis proposed EDSM-Markov a passive inference technique that aimed to improve the existing ones in the absence of negative traces and to prevent the over-generalization problem. In this thesis, improvements obtained by the proposed learners are demonstrated by a series of experiments using randomly-generated labelled-transition systems and case studies. The results obtained from the conducted experiments showed that EDSM-Markov can infer better LTSs compared to other techniques.
This thesis also proposes modifications to the QSM learner to improve the accuracy of the inferred LTSs. This results in a new learner, which is named ModifiedQSM. This includes considering more tests to the system being inferred in order to avoid the over-generalization problem. It includes investigations of using Markov models to reduce the number of queries consumed by the ModifiedQSM learner. Hence, this thesis introduces a new LTS inference technique, which is called MarkovQSM. Moreover, enhancements of LTSs inferred by ModifiedQSM and MarkovQSM learners are demonstrated by a series of experiments. The results from the experiments demonstrate that ModifiedQSM can infer better LTSs compared to other techniques. Moreover, MarkovQSM has proven to significantly reduce the number of membership queries consumed compared to ModifiedQSM with a very small loss of accuracy