    Software quality and reliability prediction using Dempster-Shafer theory

    As software systems are increasingly deployed in mission-critical applications, accurate quality and reliability predictions are becoming a necessity. Most accurate prediction models require extensive testing effort, which increases cost and slows the development life cycle. We developed two novel statistical models based on Dempster-Shafer theory, which provide accurate predictions from relatively small data sets of direct and indirect software reliability and quality predictors. The models are flexible enough to incorporate information generated throughout the development life cycle to improve prediction accuracy.
    Our first contribution is an original algorithm for building Dempster-Shafer Belief Networks using prediction logic. This model has been applied to software quality prediction. We demonstrated that the prediction accuracy of Dempster-Shafer Belief Networks is higher than that achieved by logistic regression, discriminant analysis, random forests, and the algorithms in two machine learning software packages, See5 and WEKA. The performance advantage of Dempster-Shafer Belief Networks over the other methods is statistically significant.
    Our second contribution is also based on a practical extension of Dempster-Shafer theory. The major limitation of Dempster's rule and other known rules of evidence combination is their inability to handle information coming from correlated sources. Motivated by the inherently high correlations between early life-cycle predictors of software reliability, we extended Murphy's rule of combination to account for these correlations. When used as part of a methodology that fuses various software reliability prediction systems, this rule provided more accurate predictions than previously reported methods. In addition, we proposed an algorithm that defines the upper and lower bounds of the belief function of the combination results. To demonstrate its generality, we successfully applied it in the design of the Online Safety Monitor, which fuses multiple correlated, time-varying estimations of the convergence of neural network learning in an intelligent flight control system.
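
The two combination rules named in this abstract can be illustrated with a minimal Python sketch. It shows the standard Dempster's rule and the basic form of Murphy's rule (average the mass functions, then combine the average with itself n-1 times). It does not implement the correlation-aware extension the abstract describes, whose details are not given here. The function names, the two-hypothesis frame, and the mass values are assumptions chosen only for illustration.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions (frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses is treated as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to 1.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


def combine_murphy(masses):
    """Basic Murphy-style combination: average, then self-combine n-1 times."""
    n = len(masses)
    focal_sets = set().union(*masses)
    average = {h: sum(m.get(h, 0.0) for m in masses) / n for h in focal_sets}
    result = average
    for _ in range(n - 1):
        result = combine_dempster(result, average)
    return result


def belief(m, hypothesis):
    """Lower bound: total mass of all subsets of the hypothesis."""
    return sum(v for h, v in m.items() if h <= hypothesis)


def plausibility(m, hypothesis):
    """Upper bound: total mass of all sets that intersect the hypothesis."""
    return sum(v for h, v in m.items() if h & hypothesis)


# Hypothetical frame of discernment and evidence from two predictors.
FAULTY = frozenset({"faulty"})
CLEAN = frozenset({"clean"})
EITHER = FAULTY | CLEAN

m_complexity = {FAULTY: 0.6, EITHER: 0.4}
m_churn = {FAULTY: 0.5, CLEAN: 0.2, EITHER: 0.3}

fused = combine_dempster(m_complexity, m_churn)
fused_murphy = combine_murphy([m_complexity, m_churn])
print(belief(fused, FAULTY), plausibility(fused, FAULTY))
```

The belief and plausibility values bracket the support for a hypothesis, which is the sense in which the abstract speaks of upper and lower bounds on the combination result.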

    Predicting Fault-prone Software Module Using Data Mining Technique and Fuzzy Logic

    This paper discusses a new model for improving the reliability and quality of software systems by predicting fault-prone modules before testing. The model uses the classification capability of data mining techniques and the knowledge stored in software metrics to classify a software module as fault-prone or not fault-prone. A decision tree is constructed with the ID3 algorithm from existing project data in order to gain the information needed to decide whether a particular module is fault-prone. The gained information is converted into fuzzy rules and integrated with a fuzzy inference system to predict whether software modules in the target data are fault-prone. The model is also able to predict the degree of fault-proneness of a faulty module. The goal is to help software managers concentrate their testing efforts on fault-prone modules in order to improve the reliability and quality of the software system. We used NASA project data sets from the PROMISE repository to validate the predictive accuracy of the model.
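
ID3 grows its tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows only that selection step, under the assumption that metrics have already been discretised into categories. The attribute names, the six example modules, and the helper names are hypothetical and are not taken from the paper. The conversion of tree paths into fuzzy rules is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """ID3 selection step: pick the attribute with the largest gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical, already-discretised module metrics.
modules = [
    {"complexity": "high", "size": "large", "fault_prone": "yes"},
    {"complexity": "high", "size": "small", "fault_prone": "yes"},
    {"complexity": "low", "size": "large", "fault_prone": "no"},
    {"complexity": "low", "size": "small", "fault_prone": "no"},
    {"complexity": "high", "size": "large", "fault_prone": "yes"},
    {"complexity": "low", "size": "large", "fault_prone": "no"},
]

root = best_attribute(modules, ["complexity", "size"], "fault_prone")
print(root)
```

In this toy data the complexity attribute separates the classes perfectly, so it would become the root of the tree.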

    A fault detection strategy for software projects

    Existing software fault prediction models require metrics and fault data belonging to previous software versions or similar software projects. However, there are cases when previous fault data are not available, such as a software company's transition to a new project domain. In such situations, supervised learning methods that rely on fault labels cannot be applied, leading to the need for new techniques. We proposed a software fault prediction strategy that uses method-level metrics thresholds to predict the fault-proneness of unlabelled program modules. This technique was experimentally evaluated on the NASA datasets KC2 and JM1. Some existing approaches apply several clustering techniques to cluster modules, a process followed by an evaluation phase. This evaluation is performed by a software quality expert, who analyses the representative of each cluster and then labels the modules as fault-prone or not fault-prone. Our approach does not require a human expert during the prediction process. It is a fault prediction strategy that combines method-level metrics thresholds as a filtering mechanism with an OR operator as a composition mechanism.
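
The threshold-plus-OR strategy described above can be sketched in a few lines: each metric is compared against its threshold, and a module is flagged as soon as any one comparison fires. The metric names and threshold values below are hypothetical placeholders, not the values used in the paper.

```python
# Hypothetical method-level metric thresholds (illustrative values only).
THRESHOLDS = {
    "lines_of_code": 65,
    "cyclomatic_complexity": 10,
    "unique_operators": 25,
}


def is_fault_prone(metrics, thresholds=THRESHOLDS):
    """Flag a module if ANY metric reaches its threshold (OR composition)."""
    return any(metrics[name] >= limit for name, limit in thresholds.items())


# Two unlabelled modules; no fault labels or expert input are needed.
module_a = {"lines_of_code": 40, "cyclomatic_complexity": 14, "unique_operators": 12}
module_b = {"lines_of_code": 30, "cyclomatic_complexity": 4, "unique_operators": 9}

print(is_fault_prone(module_a), is_fault_prone(module_b))
```

Because the composition is an OR, a single metric over its threshold is enough to mark a module, which favours catching faulty modules over avoiding false alarms.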

    Predicting software faults in large space systems using machine learning techniques

    Machine learning (ML) algorithms have recently proven to be of great practical value in solving a variety of engineering problems, including the prediction of failure, fault, and defect-proneness as space system software becomes more complex. One of the most active areas of recent ML research has been the use of ensemble classifiers. This paper shows how ML techniques (classifiers) can be used to predict software faults in space systems, including many aerospace systems, and further combines individual classifiers into ensembles by having them vote for the most popular class to improve software fault-proneness prediction. Benchmarking results on four NASA public datasets show that the Naive Bayes classifier is the more robust software fault predictor, while most ensembles that include a decision tree classifier as one of their components achieve higher accuracy rates.
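
Voting for the most popular class can be sketched without any ML library, assuming each base classifier has already produced one label per module. The classifier names and predictions below are made up for illustration and do not reproduce the paper's experiments.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by picking the most common label.

    predictions: list of equally long label lists, one per classifier.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three base classifiers on four modules
# ("fp" = fault-prone, "nfp" = not fault-prone).
naive_bayes = ["fp", "nfp", "fp", "nfp"]
decision_tree = ["fp", "fp", "nfp", "nfp"]
nearest_neighbour = ["nfp", "fp", "fp", "nfp"]

ensemble = majority_vote([naive_bayes, decision_tree, nearest_neighbour])
print(ensemble)
```

Using an odd number of voters avoids ties for a two-class problem. With an even number, a tie-breaking policy would have to be chosen explicitly.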

    Temporospatial Context-Aware Vehicular Crash Risk Prediction

    With the demand for more vehicles increasing, road safety is becoming a growing concern. Traffic collisions take many lives and cost billions of dollars in losses. This explains the growing interest of governments, academic institutions and companies in road safety. The vastness and availability of road accident data have provided new opportunities for gaining a better understanding of accident risk factors and for developing more effective accident prediction and prevention regimes. Much of the empirical research on road safety and accident analysis uses statistical models that capture limited aspects of crashes. On the other hand, data mining has recently gained interest as a reliable approach for investigating road-accident data and for providing predictive insights. While some risk factors contribute more frequently to the occurrence of a road accident, the importance of driver behavior, temporospatial factors, and real-time traffic dynamics has been underestimated. This study proposes a framework for predicting crash risk based on historical accident data. The proposed framework incorporates machine learning and data analytics techniques to identify driving patterns and other risk factors associated with potential vehicle crashes. These techniques include clustering, association rule mining, information fusion, and Bayesian networks. Swarm-intelligence-based association rule mining is employed to uncover the underlying relationships and dependencies in collision databases. Data segmentation methods are employed to eliminate the effect of dependent variables. Extracted rules can be used along with real-time mobility data to predict crashes and their severity in real time. The national collision database of Canada (NCDB) is used in this research to generate association rules with crash-risk-oriented subsequents, and to compare the performance of the swarm-intelligence-based approach with that of other association rule miners.
    Many industry-demanding datasets, including road-accident datasets, are deficient in descriptive factors. This is a significant barrier to uncovering meaningful risk factor relationships. To resolve this issue, this study proposes a knowledgebase approximation framework that enhances crash risk analysis by integrating pieces of evidence discovered from disparate datasets capturing different aspects of mobility. Dempster-Shafer theory is used as a key element of this knowledgebase approximation. This method can integrate association rules with acceptable accuracy under certain circumstances that are discussed in this thesis. The proposed framework is tested on the lymphography dataset and the road-accident database of Great Britain. The derived insights are then used as the basis for constructing a Bayesian network that can estimate crash likelihood and risk levels so as to warn drivers and prevent accidents in real time. This Bayesian network approach offers a way to implement a naturalistic driving analysis process for predicting traffic collision risk based on the findings from the data-driven model. A traffic incident detection and localization method is also proposed as a component of the risk analysis model. Detecting and localizing traffic incidents enables timely response to accidents and facilitates effective and efficient traffic flow management. The results of the experimental work conducted on this component indicate that our Dempster-Shafer data-fusion-based incident detection method can overcome the challenges arising from erroneous and noisy sensor readings.
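
Association rules of the kind mined from collision records are usually ranked by support and confidence. The sketch below computes both for a single rule. It illustrates the general measures only, not the swarm-intelligence-based miner used in the thesis. The item names and the five records are invented for illustration and do not come from the NCDB.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    records: list of sets of items, one set per collision record.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical collision records expressed as item sets.
records = [
    {"night", "wet_road", "severe"},
    {"night", "wet_road", "severe"},
    {"night", "dry_road", "minor"},
    {"day", "wet_road", "minor"},
    {"night", "wet_road", "minor"},
]

support, confidence = rule_metrics(records, {"night", "wet_road"}, {"severe"})
print(support, confidence)
```

Here the rule holds in two of five records, and in two of the three records where its antecedent applies.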

    An Analysis and Reasoning Framework for Project Data Software Repositories

    As the requirements for software systems increase, their size, complexity and functionality consequently increase as well. This has a direct impact on the complexity of numerous artifacts related to the system, such as specification, design, implementation, and testing models. Furthermore, as the software market becomes more and more competitive, the need for software products that are of high quality and require the least monetary, time and human resources for their development and maintenance becomes evident. Therefore, it is important that project managers and software engineers are given the necessary tools to obtain a more holistic and accurate perspective of the status of their projects, so that they can identify early the potential risks, flaws, and quality issues that may arise during each stage of the software project life cycle. In this respect, practitioners and academics alike have recognized the significance of investigating new methods for supporting software management operations in large software projects. The main target of this M.A.Sc. thesis is the design of a framework in terms of, first, a reference architecture for mining and analyzing software project data repositories according to specific objectives and analytic knowledge; second, the techniques to model such analytic knowledge; and, third, a reasoning methodology for verifying or denying hypotheses related to analysis objectives. Such a framework could assist project managers, team leaders and development teams towards more accurate prediction of project traits such as quality analysis, risk assessment, cost estimation and progress evaluation. More specifically, the framework uses goal models to specify analysis objectives as well as possible ways by which these objectives can be achieved. Examples of such analysis objectives for a project could be to yield high code quality, achieve low production cost, or cope with tight delivery deadlines.
    Such goal models are then transformed into collections of Markov Logic Network rules, which are applied to the repository data in order to verify or deny, with a degree of probability, whether the particular project objectives can be met as the project evolves. The proposed framework has been applied, as a proof of concept, to a repository pertaining to three industrial projects with more than one hundred development tasks.
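
A Markov Logic Network attaches a weight to each logical rule and gives every possible world a probability proportional to the exponential of the summed weights of the rules it satisfies. The toy sketch below works on three propositional atoms and answers a query by brute-force enumeration. The atoms, the two rules, and their weights are hypothetical and are not taken from the thesis. A real system would use first-order rules and a dedicated inference engine.

```python
from itertools import product
from math import exp


def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b


ATOMS = ["high_test_coverage", "tight_deadline", "high_code_quality"]

# Hypothetical weighted rules: (weight, formula over a world dict).
RULES = [
    (1.5, lambda w: implies(w["high_test_coverage"], w["high_code_quality"])),
    (0.8, lambda w: implies(w["tight_deadline"], not w["high_code_quality"])),
]


def world_weight(world):
    """Unnormalised weight: exp of the summed weights of satisfied rules."""
    return exp(sum(weight for weight, formula in RULES if formula(world)))


def conditional_probability(query, evidence):
    """P(query is true | evidence), by enumerating all consistent worlds."""
    numerator = denominator = 0.0
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if any(world[atom] != value for atom, value in evidence.items()):
            continue
        weight = world_weight(world)
        denominator += weight
        if world[query]:
            numerator += weight
    return numerator / denominator


p_relaxed = conditional_probability(
    "high_code_quality", {"high_test_coverage": True, "tight_deadline": False}
)
p_tight = conditional_probability(
    "high_code_quality", {"high_test_coverage": True, "tight_deadline": True}
)
print(p_relaxed, p_tight)
```

In this toy model a tight deadline lowers the probability that the code-quality objective is met, which mirrors the idea of verifying or denying an objective with a degree of probability as evidence about the project changes.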