10,885 research outputs found

    A comprehensible analysis of the efficacy of Ensemble Models for Bug Prediction

    The correctness of software systems is vital for their effective operation, which makes discovering and fixing software bugs an important development task. The increasing use of Artificial Intelligence (AI) techniques in Software Engineering has led to a number of techniques that can assist software developers in identifying potential bugs in code. In this paper, we present a comprehensible comparison and analysis of the efficacy of two AI-based approaches, namely single AI models and ensemble AI models, for predicting the probability of a Java class being buggy. We used two open-source Java components from the Apache Commons Project for training and evaluating the models. Our experimental findings indicate that an ensemble of AI models can outperform individual AI models. We also offer insight into the factors that contribute to the enhanced performance of the ensemble AI model. The presented results demonstrate the potential of using ensemble AI models to enhance bug prediction results, which could ultimately result in more reliable software systems.
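
    The abstract describes comparing single and ensemble AI models for class-level bug prediction. As an illustration only, here is a minimal scikit-learn sketch of that kind of comparison; the CSV file, the "buggy" label column, and the choice of base learners are hypothetical and not taken from the paper.

```python
# Illustrative sketch: single model vs. a soft-voting ensemble for predicting
# whether a Java class is buggy. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("commons_class_metrics.csv")    # hypothetical per-class metrics
X, y = df.drop(columns=["buggy"]), df["buggy"]   # "buggy" is a hypothetical 0/1 label

single = DecisionTreeClassifier(random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",  # average the predicted probabilities of being buggy
)

for name, model in [("single", single), ("ensemble", ensemble)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```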

    Parameter tuning in KNN for software defect prediction: an empirical analysis

    Software Defect Prediction (SDP) provides insights that can help software teams allocate their limited resources when developing software systems. It predicts likely defective modules and helps avoid pitfalls associated with such modules. However, these insights may be inaccurate and unreliable if the parameters of SDP models are not taken into consideration. In this study, the effect of parameter tuning on the k nearest neighbor (k-NN) algorithm in SDP was investigated; more specifically, the impact of varying and selecting an optimal k value, the influence of distance weighting, and the impact of the distance function on k-NN. An experiment was designed to investigate this problem in SDP over six software defect datasets. The experimental results revealed that the k value should be greater than the default of 1, as the average RMSE of k-NN when k > 1 (0.2727) is lower than when k = 1 (0.3296). In addition, the predictive performance of k-NN with distance weighting improved by 8.82% and 1.7% in terms of AUC and accuracy, respectively. In terms of the distance function, k-NN models based on the DILCA distance function performed better than those based on the default Euclidean distance function. Hence, we conclude that parameter tuning has a positive effect on the predictive performance of k-NN in SDP.
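
    The study tunes three k-NN parameters: k, distance weighting, and the distance function. A minimal sketch of such a tuning loop with scikit-learn is shown below; the dataset file and label column are hypothetical, and since DILCA is not available in scikit-learn, Euclidean and Manhattan distances stand in for the distance-function choice.

```python
# Illustrative sketch: grid search over k, distance weighting, and the
# distance function for k-NN. The dataset file and label column are hypothetical.
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("defect_dataset.csv")                  # hypothetical defect dataset
X, y = df.drop(columns=["defective"]), df["defective"]

param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9],        # the study argues k should exceed 1
    "weights": ["uniform", "distance"],    # distance weighting off / on
    "metric": ["euclidean", "manhattan"],  # stand-ins; DILCA is not in scikit-learn
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```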

    Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game

    Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal. We reveal that subtle signs of imminent betrayal are encoded in the conversational patterns of the dyad, even if the victim is not aware of the relationship's fate. In particular, we find that lasting friendships exhibit a form of balance that manifests itself through language. In contrast, sudden changes in the balance of certain conversational attributes, such as positive sentiment, politeness, or focus on future planning, signal impending betrayal.
    Comment: To appear at ACL 2015. 10 pp, 4 fig. Data and other info available at http://vene.ro/betrayal

    An extensive analysis of efficient bug prediction configurations

    Background: Bug prediction helps developers steer maintenance activities towards the buggy parts of a software system. There are many design aspects to a bug predictor, each of which has several options, i.e., software metrics, machine learning model, and response variable. Aims: These design decisions should be made judiciously because an improper choice in any of them might lead to wrong, misleading, or even useless results. We argue that bug prediction configurations are intertwined and thus need to be evaluated in their entirety, in contrast to the common practice in the field where each aspect is investigated in isolation. Method: We use a cost-aware evaluation scheme to evaluate 60 different bug prediction configuration combinations on five open source Java projects. Results: We find that the best choices for building a cost-effective bug predictor are change metrics mixed with source code metrics as independent variables, Random Forest as the machine learning model, and the number of bugs as the response variable. Combining these configuration options results in the most efficient bug predictor across all subject systems. Conclusions: We demonstrate strong evidence for the interplay among bug prediction configurations and provide concrete guidelines for researchers and practitioners on how to build and evaluate efficient bug predictors.
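
    The configuration the study identifies as most cost-effective is change metrics plus source code metrics as independent variables, Random Forest as the model, and the number of bugs as the response variable. A minimal sketch of that configuration follows; the CSV files and column names are hypothetical, and the cost-aware evaluation scheme itself is not reproduced here.

```python
# Illustrative sketch: change + source code metrics, Random Forest, and bug
# count (a regression target) as the response. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

changes = pd.read_csv("change_metrics.csv")   # hypothetical per-class change metrics
code = pd.read_csv("code_metrics.csv")        # hypothetical per-class source code metrics
df = changes.merge(code, on="class_name")

X = df.drop(columns=["class_name", "bug_count"])
y = df["bug_count"]                           # response: number of bugs, not a 0/1 label

model = RandomForestRegressor(n_estimators=100, random_state=0)
print(cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error").mean())
```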

    Customer retention

    A research report submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in partial fulfillment of the requirements for the degree of Master of Science in Engineering, Johannesburg, May 2018. The aim of this study is to model the probability that a customer will attrite/defect from a bank where, for example, the bank is not their preferred/primary bank for salary deposits. The termination of deposit inflow serves as the outcome parameter, and the random forest modelling technique was used to predict the outcome, with new data sources (transactional data) explored to add predictive power. The conventional logistic regression modelling technique was used to benchmark the random forest's results. It was found that the random forest model slightly overfits during the training process and loses predictive power on validation data and on data from outside the training period. The random forest model, however, remains predictive and performs better than logistic regression at a cut-off probability of 20%.
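
    The report benchmarks a random forest against logistic regression and applies a 20% cut-off probability. A minimal sketch of that comparison is given below; the feature file, the "attrited" label, and the evaluation split are hypothetical placeholders, not the report's actual data.

```python
# Illustrative sketch: random forest vs. logistic regression for attrition,
# labelling a customer as an attriter when the predicted probability >= 0.20.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("transactions.csv")              # hypothetical transactional features
X, y = df.drop(columns=["attrited"]), df["attrited"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

for name, model in [
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("logistic regression", LogisticRegression(max_iter=1000)),
]:
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    pred = (proba >= 0.20).astype(int)            # 20% cut-off from the study
    print(name, roc_auc_score(y_te, proba), recall_score(y_te, pred))
```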

    An implementation research on software defect prediction using machine learning techniques

    Software defect prediction is the process of improving the software testing process by identifying defects in the software. It is accomplished by using supervised machine learning with software metrics and defect data as variables. While the theory behind software defect prediction has been validated in previous studies, it has not been widely implemented in practice. In this thesis, a software defect prediction framework is implemented to improve testing process resource allocation and software release time optimization at RELEX Solutions. For this purpose, code and change metrics are collected from RELEX software. The metrics are selected based on how frequently they are used in other software defect prediction studies and on their availability in metric collection tools. In addition to metric data, defect data is collected from the issue tracker. Then, a framework for classifying the collected data is implemented and experimented on. The framework leverages existing machine learning algorithm libraries to provide classification functionality, using classifiers that have been found to perform well in similar software defect prediction experiments. The classification results are validated using commonly applied classifier performance metrics, and the suitability of the predictions is also verified from a use-case point of view. It is found that software defect prediction does work in practice, with the implementation achieving results comparable to other similar studies when measured by classifier performance metrics. When validated against the defined use cases, the performance is found to be acceptable; however, it varies between data sets. It is thus concluded that while the results are tentatively positive, further monitoring with future software versions is needed to verify the performance and reliability of the framework.
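
    The thesis validates predictions with commonly used classifier performance metrics. As a small illustration of what such a validation step might look like (not the thesis's actual code), the helper below summarises a defect classifier with the metrics typically reported in these studies.

```python
# Illustrative sketch: summarising a defect classifier with the performance
# metrics commonly reported in software defect prediction studies.
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)

def report(y_true, y_pred, y_proba):
    """Return the usual defect prediction metrics for one evaluated release."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_proba),
    }

# Example usage with toy values:
print(report([0, 1, 1, 0], [0, 1, 0, 0], [0.1, 0.9, 0.4, 0.2]))
```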

    Sampling imbalance dataset for software defect prediction using hybrid neuro-fuzzy systems with Naive Bayes classifier

    Software defect prediction (SDP) is a difficult task in software projects. The SDP process is useful for identifying and locating defects in modules. This task tends to become more costly with the addition of complex testing and evaluation mechanisms as the size of a software project's modules increases. Furthermore, measuring software in a consistent and disciplined manner offers several advantages, such as accuracy in estimating project costs and schedules and improved product and process quality. Detailed analysis of software metric data also gives significant clues about the locations of possible defects in a programming code. The main goal of this proposed work is to introduce software defect detection and prevention methods for identifying defects in software using machine learning approaches. The proposed work uses imbalanced datasets from NASA's Metrics Data Program (MDP), and the software metrics of the datasets are selected using a Genetic Algorithm with Ant Colony Optimization (GACO) method. A sampling process based on the semi-supervised Modified Co-Forest method generates balanced labelled datasets from the imbalanced ones, which are then used for an efficient software defect detection process with machine learning Hybrid Neuro-Fuzzy Systems and Naive Bayes methods. The experimental results of the proposed method show that this defect detection approach yields greater efficiency and better defect prediction performance than other available methods.
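
    The paper's pipeline combines GACO feature selection, Modified Co-Forest sampling, and a Hybrid Neuro-Fuzzy system with Naive Bayes, none of which has an off-the-shelf implementation here. As a much simpler stand-in that only illustrates the rebalancing idea, the sketch below oversamples the minority (defective) class before fitting a plain Naive Bayes classifier; the file and column names are hypothetical.

```python
# Illustrative stand-in (not the paper's GACO + Modified Co-Forest + Neuro-Fuzzy
# pipeline): oversample the minority class, then fit Gaussian Naive Bayes.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample

df = pd.read_csv("mdp_module_metrics.csv")       # hypothetical NASA MDP-style metrics
majority = df[df["defective"] == 0]
minority = df[df["defective"] == 1]
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up])    # for brevity, resampled once up front;
                                                 # in practice resample inside each fold

X = balanced.drop(columns=["defective"])
y = balanced["defective"]
print(cross_val_score(GaussianNB(), X, y, cv=5, scoring="roc_auc").mean())
```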