
    An implementation research on software defect prediction using machine learning techniques

    Software defect prediction is the process of improving the software testing process by identifying defects in the software. It is accomplished by using supervised machine learning with software metrics and defect data as variables. While the theory behind software defect prediction has been validated in previous studies, it has not been widely implemented in practice. In this thesis, a software defect prediction framework is implemented to improve testing resource allocation and to optimize software release timing at RELEX Solutions. For this purpose, code and change metrics are collected from RELEX software. The metrics are selected according to how frequently they are used in other software defect prediction studies and whether they are available in metric collection tools. In addition to metric data, defect data is collected from the issue tracker. Then, a framework for classifying the collected data is implemented and experimented on. The framework leverages existing machine learning algorithm libraries to provide classification functionality, using classifiers that have been found to perform well in similar software defect prediction experiments. The classification results are validated using commonly used classifier performance metrics, and the suitability of the predictions is additionally verified from a use case point of view. It is found that software defect prediction does work in practice, with the implementation achieving results comparable to those of similar studies when measured by classifier performance metrics. When validated against the defined use cases, the performance is found acceptable; however, it varies between data sets. It is thus concluded that while the results are tentatively positive, further monitoring with future software versions is needed to verify the performance and reliability of the framework.
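As an illustration of the "commonly used classifier performance metrics" mentioned in this abstract, the following is a minimal sketch of precision, recall and F1 for a binary defect classifier. The abstract does not name the metrics the thesis used, so this choice is an assumption, and the labels below are made-up illustrative values rather than data from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn), where label 1 means 'defective'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1); undefined ratios are reported as 0.0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for eight modules (1 = defective, 0 = clean).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

if __name__ == "__main__":
    print(confusion_counts(y_true, y_pred))
    print(precision_recall_f1(y_true, y_pred))
```

In a testing context, recall indicates how many truly defective modules were flagged, while precision indicates how much inspection effort would be spent on modules that turn out to be clean.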

    Predicting Software Fault Proneness Using Machine Learning

    Context: Continuous Integration (CI) is a DevOps technique which is widely used in practice. Studies show that its adoption rates will increase even further. At the same time, it is argued that maintaining product quality requires extensive and time-consuming testing and code reviews. In this context, if not done properly, shorter sprint cycles and agile practices entail a higher risk for the quality of the product. It has been reported in the literature [68] that lack of proper test strategies, poor test quality and team dependencies are some of the major challenges encountered in continuous integration and deployment. Objective: The objective of this thesis is to bridge the process discontinuity that exists between development teams and testing teams, due to continuous deployments and shorter sprint cycles, by providing a list of potentially buggy or high-risk files, which testers can use to prioritize code inspection and testing, thus reducing the time between development and release. Approach: Our approach is based on a five-step process. The first step is to select a set of systems, a set of code metrics, a set of repository metrics, and a set of machine learning techniques to consider for training and evaluation purposes. The second step is to devise appropriate client programs to extract and denote information obtained from GitHub repositories and source code analyzers. The third step is to use this information to train the models using the selected machine learning techniques. This step made it possible to identify the best performing machine learning techniques among those selected in the first step. The fourth step is to apply the models with a voting classifier (with equal weights) and provide answers to five research questions pertaining to the prediction capability and generality of the obtained fault-proneness prediction framework. The fifth step is to select the best performing predictors and apply them to two systems written in a completely different language (C++) in order to evaluate the performance of the predictors in a new environment. Obtained Results: The obtained results indicate that a) the best models were the ones applied to the same system as the one they were trained on; b) the models trained using repository metrics outperformed the ones trained using code metrics; c) the models trained using code metrics proved inadequate for predicting fault-prone modules; d) the use of machine learning as a tool for building fault-proneness prediction models is promising, but there is still work to be done, as the models show weak to moderate prediction capability. Conclusion: This thesis provides insights into how machine learning can be used to predict whether a source code file contains one or more faults that may contribute to a major system failure. The proposed approach utilizes information extracted both from the system’s source code, such as code metrics, and from a series of DevOps tools, such as bug repositories, version control systems, and testing automation frameworks. The study involved five Java and five Python systems and indicated that machine learning techniques have potential for building models that alert developers to failure-prone code.
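The fourth step of this abstract combines models through a voting classifier with equal weights. A minimal sketch of what such a combination could look like is given below. The abstract does not say whether hard (label) or soft (probability) voting was used, so hard majority voting is an assumption here, as are the tie-breaking rule and the three hypothetical model outputs.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Equal-weight hard voting over per-file 0/1 predictions.

    predictions_per_model holds one list of labels per model, and all
    lists cover the same files in the same order. A tie is resolved
    toward 1 (fault-prone), a conservative choice assumed for this sketch.
    """
    n_files = len(predictions_per_model[0])
    if any(len(preds) != n_files for preds in predictions_per_model):
        raise ValueError("every model must predict the same number of files")

    voted = []
    for i in range(n_files):
        votes = Counter(preds[i] for preds in predictions_per_model)
        voted.append(1 if votes[1] >= votes[0] else 0)
    return voted


# Hypothetical outputs of three trained models for four files.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]

if __name__ == "__main__":
    print(majority_vote([model_a, model_b, model_c]))
```

Files whose combined vote is 1 would form the list of potentially buggy files handed to testers for prioritized inspection.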

    EMPIRICAL CHARACTERIZATION OF SOFTWARE QUALITY

    The research topic focuses on the characterization of software quality considering the main software elements such as people, process and product. Many attributes (size, language, testing techniques etc.) could have an effect on the quality of software. In this thesis we aim to understand the impact of attributes of the three P’s (people, product, process) on the quality of software by empirical means. Software quality can be interpreted in many ways, such as customer satisfaction, stability and defects. In this thesis we adopt ‘defect density’ as a quality measure. Therefore, the research focuses on empirical evidence of the impact of attributes of the three P’s on software defect density. For this reason, empirical research methods (systematic literature reviews, case studies, and interviews) are utilized to collect empirical evidence. Each of these research methods helps to extract empirical evidence about the object under study, and statistical methods are used for data analysis. Considering the product attributes, we have studied the size, language, development mode, age, complexity, module structure, module dependency, and module quality and their impact on project quality. Considering the process attributes, we have studied the process maturity and structure, and their impact on project quality. Considering the people attributes, we have studied the experience and capability, and their impact on project quality. Moreover, in the process category, we have studied one testing approach called ‘exploratory testing’ and its impact on the quality of software. Exploratory testing is a widely used software-testing practice and means simultaneous learning, test design, and test execution. We have analyzed the weaknesses of exploratory testing and proposed a hybrid testing approach in an attempt to improve the quality. Concerning the product attributes, we found that there exists a significant difference in quality between open- and closed-source projects, Java and C projects, and large and small projects. Very small and defect-free modules have an impact on software quality. Different complexity metrics have different impacts on software quality considering the size. Product complexity as defined in Table 53 has a partial impact on software quality. However, software age and module dependencies are not factors that characterize software quality. Concerning the people attributes, we found that platform experience, application experience, and language and tool experience have a significant impact on software quality. Regarding capability, we found that programmer capability has a partial impact on software quality, whereas analyst capability has no impact. Concerning process attributes, we found that there is no difference in quality between projects developed under CMMI and those that are not. Regarding the CMMI levels, there is a difference in software quality, particularly between CMMI level 1 and CMMI level 3. Comparing different process types, we found that hybrid projects are of better quality than waterfall projects. Process maturity as defined by SEI-CMM has a partial impact on software quality. Concerning exploratory testing, we found that its weaknesses induce testing technical debt; therefore, a process is defined in conjunction with scripted testing in an attempt to reduce the technical debt associated with exploratory testing. The findings are useful for both researchers and practitioners in evaluating their projects.

    Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

    Software quality ensures that applications that are developed are failure free. Some modern systems are intricate, due to the complexity of their information processes. Software fault prediction is an important quality assurance activity, since it is a mechanism that predicts the defect proneness of modules and classifies them accordingly, which saves resources, time and developers’ effort. In this study, a model that selects relevant features that can be used in defect prediction was proposed. The literature was reviewed and it revealed that process metrics are better predictors of defects in version systems and are based on historic source code over time. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product line(s) (SPL), hence process metrics were chosen. Data sets that are used in defect prediction may contain non-significant and redundant attributes that may affect the accuracy of machine-learning algorithms. In order to improve the prediction accuracy of classification models, features that are significant in the defect prediction process are utilised. In machine learning, feature selection techniques are applied to identify the relevant data. Feature selection is a pre-processing step that helps to reduce the dimensionality of data in machine learning. Feature selection techniques include information theoretic methods that are based on the entropy concept. This study experimented with the efficiency of the feature selection techniques. It was found that software defect prediction using significant attributes improves the prediction accuracy. A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. The MICFastCR achieved the highest prediction accuracy as reported by various performance measures. School of Computing. Ph.D. (Computer Science)
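To make the entropy-based redundancy elimination mentioned in this abstract more concrete, the sketch below computes symmetric uncertainty, the correlation measure used by FCBF, and applies a simplified FCBF-style filter to discrete features. It is not the MICFastCR model: the MIC-based relevance step is not reproduced, and the feature names and values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2 * mutual_info / (hx + hy)


def fcbf_like_filter(features, label, threshold=0.0):
    """Keep relevant features that no already-kept feature makes redundant.

    A feature is dropped when its SU with a kept feature is at least as
    large as its SU with the label (the FCBF redundancy criterion).
    """
    relevance = {name: symmetric_uncertainty(col, label)
                 for name, col in features.items()}
    # Most relevant first; ties broken by name for a deterministic order.
    ranked = sorted(features, key=lambda name: (-relevance[name], name))
    selected = []
    for name in ranked:
        if relevance[name] <= threshold:
            continue  # not relevant enough to the defect label
        if all(symmetric_uncertainty(features[name], features[kept])
               < relevance[name] for kept in selected):
            selected.append(name)
    return selected


# Hypothetical binarized process metrics for eight modules.
label = [1, 1, 0, 0, 1, 0, 1, 0]
features = {
    "lines_added": [1, 1, 0, 0, 1, 0, 1, 0],       # informative
    "lines_added_copy": [1, 1, 0, 0, 1, 0, 1, 0],  # redundant duplicate
    "noise": [0, 1, 0, 1, 1, 0, 0, 1],             # independent of label
}

if __name__ == "__main__":
    print(fcbf_like_filter(features, label))
```

The duplicate column is discarded as redundant and the independent column as irrelevant, which mirrors the two goals the abstract describes: keeping significant attributes and eliminating redundant ones.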