484,152 research outputs found

    Software Reliability prediction using Ensemble Model

    Software reliability is a key factor in estimating and predicting software quality during the testing period. We implemented three models for software reliability prediction over five benchmark datasets: a Radial Basis Function Neural Network (RBFNN) model, an Ensemble model based on two types of Feed Forward Neural Networks and one Radial Basis Function Neural Network, and a Radial Basis Function Neural Network Ensembles (RNNE) model. We applied Bayesian regularization to all three models to avoid over-fitting and to improve the generalization of the neural networks. We used two performance measures, Relative Error (RE) and Average Error (AE), to assess software reliability prediction. The results of all three proposed models were compared with traditional models such as the Duane model and with artificial neural networks such as the Feed Forward Neural Network (FFNN) model. The experimental results show that the nonparametric growth model, the Ensemble model (multiple predictors), yields lower error than the parametric model. Finally, we observed that multiple predictors such as the Ensemble model always performed better than a single predictor such as an artificial neural network and some other traditional neural networks

    Ensemble missing data techniques for software effort prediction

    Constructing an accurate effort prediction model is a challenge in software engineering. The development and validation of models used for prediction tasks require good-quality data. Unfortunately, software engineering datasets tend to suffer from incompleteness, which can lead to inaccurate decision making, project management and implementation. Recently, machine learning algorithms, including ensemble (combining) classifiers, have proven to be of great practical value in solving a variety of software engineering problems, including software prediction. Research indicates that ensembles of individual classifiers lead to a significant improvement in classification performance by having the classifiers vote for the most popular class. This paper proposes a method for improving the software effort prediction accuracy of a decision tree learning algorithm by generating an ensemble that uses two imputation methods as elements. Benchmarking results on ten industrial datasets show that the proposed ensemble strategy has the potential to improve prediction accuracy compared to an individual imputation method, especially if multiple imputation is a component of the ensemble.

    Improving Defect Prediction Models by Combining Classifiers Predicting Different Defects

    Background: The software industry spends a lot of money on finding and fixing defects. It utilises software defect prediction models to identify code that is likely to be defective. Prediction models have, however, reached a performance bottleneck. Any improvement to prediction models would likely yield fewer defects, reducing costs for companies. Aim: In this dissertation I demonstrate that different families of classifiers find distinct subsets of defects. I show how this finding can be utilised to design ensemble models which outperform other state-of-the-art software defect prediction models. Method: This dissertation is supported by published work. In the first paper I explore the quality of data, which is a prerequisite for building reliable software defect prediction models. The second and third papers explore the ability of different software defect prediction models to find distinct subsets of defects. The fourth paper explores how software defect prediction models can be improved by combining a collection of classifiers that predict different defective components into ensembles. An additional, unpublished work presents a visual technique for the analysis of predictions made by individual classifiers and discusses some possible constraints for classifiers used in software defect prediction. Result: Software defect prediction models created by classifiers of different families predict distinct subsets of defects. Ensembles composed of classifiers belonging to different families outperform other ensemble and standalone models. Only a few highly diverse and accurate base models are needed to compose an effective ensemble. This ensemble can consistently predict a greater number of defects compared to the increase in incorrect predictions. Conclusion: Ensembles should not use majority-voting techniques to combine the decisions of classifiers in software defect prediction, as this will miss correct predictions of classifiers which uniquely identify defects. Some classifiers could be less successful for software defect prediction due to the complex decision boundaries of defect data. Stacking-based ensembles can outperform other ensemble and stand-alone techniques. I propose new possible avenues of research that could further improve the modelling of ensembles in software defect prediction. Data quality should be explicitly considered prior to experiments for researchers to establish reliable results.
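
The conclusion about majority voting can be illustrated with a small sketch. The predictions and labels below are hypothetical and are not taken from the dissertation; the function names are also illustrative. The "any vote" rule is shown only as a contrast to majority voting; it is not the stacking approach the dissertation recommends, and in general it would also let through more false positives.

```python
def majority_vote(predictions):
    """Flag a module as defective only if more than half of the classifiers do."""
    n_classifiers = len(predictions)
    return [int(2 * sum(votes) > n_classifiers) for votes in zip(*predictions)]


def any_vote(predictions):
    """Flag a module as defective if at least one classifier does."""
    return [int(any(votes)) for votes in zip(*predictions)]


def recall(predicted, actual):
    """Fraction of truly defective modules that were flagged."""
    found = sum(1 for p, a in zip(predicted, actual) if p and a)
    return found / sum(actual)


# Hypothetical ground truth for four modules (1 = defective).
actual = [1, 0, 1, 1]

# Hypothetical outputs of three classifiers from different families.
# The first two each identify one defect that no other classifier finds.
predictions = [
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

print(majority_vote(predictions), recall(majority_vote(predictions), actual))
print(any_vote(predictions), recall(any_vote(predictions), actual))
```

In this toy case majority voting discards the two defects that only a single classifier identified, which is the effect the abstract warns about.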

    A systematic literature review of machine learning techniques for software maintainability prediction

    Context: Software maintainability is one of the fundamental quality attributes of software engineering. The accurate prediction of software maintainability is a significant challenge for the effective management of the software maintenance process. Objective: The major aim of this paper is to present a systematic review of studies related to the prediction of maintainability of object-oriented software systems using machine learning techniques. This review identifies and investigates a number of research questions to comprehensively summarize, analyse and discuss various viewpoints concerning software maintainability measurements, metrics, datasets, evaluation measures, individual models and ensemble models. Method: The review uses the standard systematic literature review method applied to the most common computer science digital database libraries from January 1991 to July 2018. Results: We survey 56 relevant studies in 35 journals and 21 conference proceedings. The results indicate that there is relatively little activity in the area of software maintainability prediction compared with other software quality attributes. CHANGE maintenance effort and the maintainability index were the most commonly used software measurements (dependent variables) employed in the selected primary studies, and most made use of class-level product metrics as the independent variables. Several private datasets were used in the selected studies, and there is a growing demand to publish datasets publicly. Most studies focused on regression problems and performed k-fold cross-validation. Individual prediction models were employed in the majority of studies, while ensemble models were used relatively rarely. Conclusion: Based on the findings obtained in this systematic literature review, ensemble models demonstrated increased prediction accuracy over individual models, and have been shown to be useful models in predicting software maintainability. However, their application is relatively rare, and there is a need to apply these and other models to an extensive variety of datasets with the aim of improving the accuracy and consistency of results.

    Pinset: A DSL for extracting datasets from models for data mining-based quality analysis

    Data mining techniques have been successfully applied to software quality analysis and assurance, including quality of modeling artefacts. Before such techniques can be used, though, data under analysis commonly need to be formatted into two-dimensional tables. This constraint is imposed by data mining algorithms, which typically require a collection of records as input for their computations. The process of extracting data from the corresponding sources and formatting them properly can become error-prone and cumbersome. In the case of models, this process is mostly carried out through scripts written in a model management language, such as EOL or ATL. To improve this situation, we present Pinset, a domain-specific language devised for the extraction of tabular datasets from software models. Pinset offers a tailored syntax and built-in facilities for common activities in dataset extraction. For evaluation, Pinset has been used on UML class diagrams to calculate metrics that can be employed as input for several fault-prediction algorithms. The use of Pinset for these calculations led to more compact and high-level specifications when compared to equivalent scripts written in generic model management languages.

    Ensemble model-based method for time series sensors’ data validation and imputation applied to a real waste water treatment plant

    Intelligent Decision Support Systems (IDSSs) integrate different Artificial Intelligence (AI) techniques with the aim of taking or supporting human-like decisions. To this end, these techniques are based on the available data from the target process. This implies that invalid or missing data could trigger incorrect decisions and, therefore, undesirable situations in the supervised process. This is even more important in environmental systems, whose malfunction could jeopardise related ecosystems. In data-driven applications such as IDSSs, data quality is a fundamental problem that should be addressed for the sake of the overall system's performance. In this paper, a data validation and imputation methodology for time series is presented. This methodology is integrated in an IDSS software tool which generates suitable control set-points to control the process. The data validation and imputation approach presented here is focused on the imputation step, and it is based on an ensemble of different prediction models obtained for the sensors involved in the process. A Case-Based Reasoning (CBR) approach is used for data imputation, i.e., past situations similar to the current one can propose new values for the missing ones. The CBR model is complemented with other prediction models such as Auto Regressive (AR) models or Artificial Neural Network (ANN) models. Then, the different predictions obtained are combined in an ensemble to achieve better prediction performance than that obtained by each individual prediction model separately. Furthermore, the use of a meta-prediction model, trained using the predictions of all individual models as inputs, is proposed and compared with other ensemble methods to validate its performance. Finally, this approach is illustrated in a real Waste Water Treatment Plant (WWTP) case study using one of the most relevant measures for the correct operation of the WWTP's IDSS, i.e., the ammonia sensor, and considering real faults, showing promising results with improved performance when using the ensemble approach presented here compared against the prediction obtained by each individual model separately. The authors acknowledge the partial support of this work by the Industrial Doctorate Programme (2017DI-006) and the Research Consolidated Groups/Centres Grant (2017 SGR 574) from the Catalan Agency of University and Research Grants Management (AGAUR), from the Catalan Government. Peer Reviewed. Postprint (published version)
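
The idea of imputing a missing sensor value from an ensemble of predictors can be sketched as follows. This is a minimal illustration only: the three toy predictors (last value, moving average, linear trend) are assumptions that stand in for the CBR, AR and ANN models named in the abstract, the combination is a plain average, not the meta-prediction model the paper proposes, and the series is made up.

```python
from statistics import mean


def last_value(history):
    """Predict that the next reading equals the most recent one."""
    return history[-1]


def moving_average(history, window=3):
    """Predict the mean of the last `window` readings."""
    return mean(history[-window:])


def linear_trend(history):
    """Extrapolate the most recent step one sample ahead."""
    return history[-1] + (history[-1] - history[-2])


PREDICTORS = (last_value, moving_average, linear_trend)


def impute(series):
    """Replace each None with the average of the predictors' outputs.

    Assumes at least two valid readings precede the first gap.
    """
    filled = []
    for reading in series:
        if reading is None:
            reading = mean(predict(filled) for predict in PREDICTORS)
        filled.append(reading)
    return filled


# Hypothetical sensor readings with one missing sample.
readings = [1.0, 2.0, 3.0, None, 5.0]
print(impute(readings))
```

A stacked variant would replace the plain average with a model trained on the individual predictions, which is closer to the meta-prediction model described above.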

    Models for estimating leaf area in the ‘Palmer’ mango

    Techniques for measuring leaf area are basic for evaluating plant growth in the mango. As such, the aim of this study was to determine the leaf area of the ‘Palmer’ mango using mathematical models proposed by the present study, and compare the results of the proposed models with models available in the literature for other mango cultivars. The mango leaf was simulated as a function of leaf length (L) and width (W) using two distinct geometric models: an ellipse and a rosacea petal. Models found in the literature and determined for other cultivars were also tested. The values for leaf area were obtained using the ImageJ software and taken at their actual value; these were later compared with the values achieved by the geometric models. The models were tested for quality of prediction through cross-validation. The models proposed in the present study were not superior to the best models found in the literature. The model LA = 3.80 + 0.67 (LW) achieved the best performance, with a mean absolute percentage error (MAPE) of 3.78%. Using only length, the best model was LA = 0.0142C² + 6.1902C - 49.444, with a MAPE of 4.07%. The use of mathematical models proved to be a suitable option for estimating leaf area in the ‘Palmer’ mango. Moreover, the use of R² as the only form of model quality assessment can lead to errors in choosing the best model.
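
The two best-performing models quoted in the abstract, and the MAPE used to rank them, can be written out as a short sketch. The coefficients come from the abstract; the function names, the assumption that C in the second model denotes leaf length, and the sample measurements are illustrative and hypothetical, and the abstract does not state the measurement units.

```python
def leaf_area_lw(length, width):
    """LA = 3.80 + 0.67 (LW): leaf area from length times width."""
    return 3.80 + 0.67 * (length * width)


def leaf_area_length(length):
    """LA = 0.0142 C^2 + 6.1902 C - 49.444, taking C as leaf length (assumed)."""
    return 0.0142 * length**2 + 6.1902 * length - 49.444


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical (length, width, measured area) triples, not data from the study.
leaves = [(20.0, 5.0, 72.0), (25.0, 6.0, 101.0)]

measured = [area for _, _, area in leaves]
estimated = [leaf_area_lw(length, width) for length, width, _ in leaves]
print(round(mape(measured, estimated), 2))
```

Because MAPE is computed on held-out leaves during cross-validation, it measures prediction error directly, which is the reason the abstract cautions against relying on R² alone.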