
    Combining a Class-Outlier-Data Decision Tree and a Normal-Data Decision Tree to Improve Software Defect Prediction Accuracy

    The presence of outliers in software defect prediction datasets raises two handling options: keep them while ignoring their status as outliers, or remove them from the test dataset. Ignoring outliers in the test set yields prediction accuracy that is significantly lower than when the outliers are removed. This paper proposes utilising class outliers in software defect prediction datasets to improve the performance of the defect prediction system. The proposed method first predicts contextual outliers in the test set using a single decision-tree-based prediction model. Defect class labels are then predicted using two single decision-tree-based prediction models, one responsible for predicting defects in the normal-data subset and the other in the outlier subset. Evaluation on five NASA datasets from the PROMISE repository shows that the proposed method achieves better accuracy than the J48 algorithm, both when outliers are ignored and when they are removed from the test sample.
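
    As a rough illustration of the two-stage scheme described above (not the authors' J48-based implementation), the sketch below uses scikit-learn decision trees: one tree predicts whether a test instance is a class outlier, and two further trees predict defects separately for the normal and outlier subsets. The outlier training labels and all names are assumptions.

```python
# Rough sketch of the two-stage idea: one decision tree flags class
# outliers, two more predict defects separately for the normal and
# outlier subsets. The paper uses J48 (Weka's C4.5); scikit-learn trees
# stand in here, and the outlier labels y_is_outlier are assumed to come
# from a prior class-outlier detection step that is not shown.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_models(X_train, y_defect, y_is_outlier):
    outlier_detector = DecisionTreeClassifier().fit(X_train, y_is_outlier)
    normal = y_is_outlier == 0
    normal_tree = DecisionTreeClassifier().fit(X_train[normal], y_defect[normal])
    outlier_tree = DecisionTreeClassifier().fit(X_train[~normal], y_defect[~normal])
    return outlier_detector, normal_tree, outlier_tree

def predict_defects(models, X_test):
    outlier_detector, normal_tree, outlier_tree = models
    is_outlier = outlier_detector.predict(X_test).astype(bool)
    y_pred = np.empty(len(X_test), dtype=int)
    if (~is_outlier).any():
        y_pred[~is_outlier] = normal_tree.predict(X_test[~is_outlier])
    if is_outlier.any():
        y_pred[is_outlier] = outlier_tree.predict(X_test[is_outlier])
    return y_pred
```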

    Predicting and Evaluating Software Model Growth in the Automotive Industry

    The size of a software artifact influences software quality and impacts the development process. In industry, when software size exceeds certain thresholds, memory errors accumulate and development tools may no longer cope, resulting in lengthy program start-up times, failing builds, or memory problems at unpredictable times. Thus, foreseeing critical growth in software modules meets a high demand in industrial practice. Predicting when the size grows to the level where maintenance is needed prevents unexpected effort and helps to spot problematic artifacts before they become critical. Although the number of prediction approaches in the literature is vast, it is unclear how well they fit the prerequisites and expectations of practice. In this paper, we perform an industrial case study at an automotive manufacturer to explore the applicability and usability of prediction approaches in practice. In a first step, we collect the most relevant prediction approaches from the literature, including both statistical and machine learning approaches. Furthermore, we elicit practitioners' expectations towards predictions using a survey and stakeholder workshops. At the same time, we measure the software size of 48 software artifacts by mining four years of revision history, resulting in 4,547 data points. In the last step, we assess the applicability of state-of-the-art prediction approaches on the collected data by systematically analyzing how well they fulfill the practitioners' expectations. Our main contribution is a comparison of commonly used prediction approaches in a real-world industrial setting that takes stakeholder expectations into account. We show that the approaches yield significantly different prediction accuracy and that the statistical approaches fit our data best.
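
    As a minimal example of the kind of statistical approach the study finds fits its data best, the sketch below extrapolates a linear trend fitted to a module's mined size history to estimate when a critical size threshold would be crossed; the threshold and sample figures are illustrative assumptions, not data from the paper.

```python
# Minimal statistical growth-prediction sketch: fit a linear trend to a
# module's size history (as mined from revision history) and extrapolate
# when it crosses a maintenance-critical threshold. Threshold and sample
# values below are illustrative assumptions.
import numpy as np

def days_until_threshold(days, sizes, threshold):
    """Fit size ~ a*day + b and return the day the trend reaches threshold."""
    a, b = np.polyfit(days, sizes, deg=1)   # least-squares linear trend
    if a <= 0:
        return None                          # no growth; threshold never reached
    return (threshold - b) / a

days = np.array([0, 30, 60, 90, 120])                 # observation days (assumed)
sizes = np.array([1200, 1350, 1480, 1650, 1800])      # size per revision (assumed)
print(days_until_threshold(days, sizes, threshold=2500))
```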

    Dynamic Detection of Software Defects Using Supervised Learning Techniques

    Software testing is the main step for detecting faults in software by executing it. It is therefore important to predict the faults that may occur while executing the software in order to keep it maintainable. Different artificial intelligence techniques are used to predict future defects, and machine learning is one of the most significant techniques for building prediction models. In this paper, we conduct a systematic review of the supervised machine learning techniques used for software defect prediction and evaluate their performance. Five state-of-the-art supervised machine learning classifiers are evaluated on several datasets to predict software faults, and the performance of these classifiers is compared under various parameter settings. We then run a series of experiments to improve defect prediction by modifying the classifiers' default parameters. The results show that supervised machine learning algorithms can classify modules as buggy or not buggy, and that supervised machine learning models predict software bugs better than traditional statistical models. Additionally, applying PCA has no noticeable impact on prediction performance, whereas modifying the default parameters positively impacts classifier results, especially for the Artificial Neural Network (ANN). The main finding of this paper is obtained through the application of ensemble learning methods: Bagging achieves 95.1% accuracy on the Mozilla dataset and Voting achieves 93.79% accuracy on the KC1 dataset.
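
    The ensemble setups reported as best performing (Bagging and Voting) could be reproduced along the lines of the scikit-learn sketch below; the choice of base classifiers, the train/test split, and all parameters are assumptions rather than the paper's exact configuration, and the Mozilla and KC1 datasets are not bundled with scikit-learn.

```python
# Sketch of Bagging and Voting ensembles over standard supervised
# classifiers, as in the abstract's headline results. Base learners,
# split, and parameters are assumptions, not the paper's setup.
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate_ensembles(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)
    voting = VotingClassifier(estimators=[
        ("tree", DecisionTreeClassifier()),
        ("nb", GaussianNB()),
        ("ann", MLPClassifier(max_iter=1000)),  # ANN, whose tuning the paper highlights
    ], voting="hard")
    results = {}
    for name, model in [("bagging", bagging), ("voting", voting)]:
        model.fit(X_tr, y_tr)
        results[name] = accuracy_score(y_te, model.predict(X_te))
    return results
```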

    A Review of Metrics and Modeling Techniques in Software Fault Prediction Model Development

    This paper surveys the software fault prediction work carried out with different data analytic techniques reported in the software engineering literature. The study is split into three broad areas: (a) a description of the software metrics suites reported and validated in the literature; (b) a brief outline of previous research on the development of software fault prediction models based on various analytic techniques, organized using a taxonomy of those techniques; and (c) a review of the advantages of using combinations of metrics. This last area, however, is comparatively new and needs more research effort.

    A study of subgroup discovery approaches for defect prediction

    Context: Although many papers have been published on software defect prediction techniques, machine learning approaches have yet to be fully explored. Objective: In this paper we suggest using a descriptive approach for defect prediction rather than the precise classification techniques that are usually adopted. This allows us to characterise defective modules with simple rules that can easily be applied by practitioners and deliver a practical (or engineering) approach rather than a highly accurate result. Method: We describe two well-known subgroup discovery algorithms, the SD algorithm and the CN2-SD algorithm, to obtain rules that identify defect-prone modules. The empirical work is performed with publicly available datasets from the Promise repository and object-oriented metrics from an Eclipse repository related to defect prediction. Subgroup discovery algorithms mitigate the characteristics of datasets that hinder the applicability of classification algorithms and so remove the need for preprocessing techniques. Results: The results show that the generated rules can be used to guide testing effort in order to improve the quality of software development projects. Such rules can indicate metrics, their threshold values and relationships between metrics of defective modules. Conclusions: The induced rules are simple to use and easy to understand as they provide a description rather than a complete classification of the whole dataset. Thus this paper represents an engineering approach to defect prediction, i.e., an approach which is useful in practice, easily understandable and can be applied by practitioners. (ICEBERG IAPP-2012-324356, MICINN TIN2011-28956-C02-0)
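
    The paper applies the SD and CN2-SD algorithms; as a minimal stand-in for the idea, the sketch below scores simple single-metric threshold rules by weighted relative accuracy (the quality measure CN2-SD typically optimises) and keeps the best ones as descriptions of defect-prone modules. Column names such as 'defective' are assumptions standing in for PROMISE-style data with a binary 0/1 label.

```python
# Toy subgroup discovery: rank "metric >= threshold" rules by weighted
# relative accuracy (WRAcc) and keep the best as descriptions of
# defect-prone modules. Not the SD/CN2-SD implementations from the paper;
# the 'defective' column is assumed to be a binary 0/1 label.
import pandas as pd

def wracc(subgroup_mask, y):
    """WRAcc = p(subgroup) * (p(defect | subgroup) - p(defect))."""
    n_sub = subgroup_mask.sum()
    if n_sub == 0:
        return 0.0
    return (n_sub / len(y)) * (y[subgroup_mask].mean() - y.mean())

def best_threshold_rules(df, target="defective", top_k=5):
    rules = []
    for col in df.columns.drop(target):
        for thr in df[col].quantile([0.25, 0.5, 0.75]).unique():
            mask = df[col] >= thr
            rules.append((wracc(mask, df[target]), f"{col} >= {thr:g}"))
    return sorted(rules, reverse=True)[:top_k]
```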

    Searching for rules to detect defective modules: A subgroup discovery approach

    Data mining methods in software engineering are becoming increasingly important as they can support several aspects of the software development life-cycle, such as quality. In this work, we present a data mining approach to induce rules, extracted from static software metrics, characterising fault-prone modules. Due to the special characteristics of defect prediction data (imbalance, inconsistency, redundancy), not all classification algorithms are capable of dealing with this task conveniently. To deal with these problems, Subgroup Discovery (SD) algorithms can be used to find groups of statistically different data given a property of interest. We propose EDER-SD (Evolutionary Decision Rules for Subgroup Discovery), an SD algorithm based on evolutionary computation that induces rules describing only fault-prone modules. Rules are a well-known model representation that can be easily understood and applied by project managers and quality engineers; thus, rules can help them to develop software systems that can be justifiably trusted. Contrary to other approaches in SD, our algorithm has the advantage of working with continuous variables, as the conditions of the rules are defined using intervals. We describe the rules obtained by applying our algorithm to seven publicly available datasets from the PROMISE repository, showing that they are capable of characterising subgroups of fault-prone modules. We also compare our results with three other well-known SD algorithms, and EDER-SD performs well in most cases. (Ministerio de Educación y Ciencia TIN2007-68084-C02-00, Ministerio de Educación y Ciencia TIN2010-21715-C02-0)
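
    EDER-SD itself is not publicly packaged; the toy sketch below only illustrates the ingredients named in the abstract: rules whose conditions are numeric intervals over continuous metrics, improved by a simple (1+1)-style mutate-and-select loop on a WRAcc-like fitness. All metric handling and parameters are assumptions, not the authors' algorithm.

```python
# Toy evolutionary search over interval rules for fault-prone modules:
# an individual is a set of per-metric intervals, its fitness is a
# WRAcc-style score, and a (1+1) mutate-and-select loop improves it.
# Illustrative only; not EDER-SD.
import random
import numpy as np

def fitness(rule, X, y):
    """WRAcc of the rule 'all metrics within their intervals'."""
    mask = np.ones(len(y), dtype=bool)
    for j, (lo, hi) in rule.items():
        mask &= (X[:, j] >= lo) & (X[:, j] <= hi)
    if not mask.any():
        return 0.0
    return mask.mean() * (y[mask].mean() - y.mean())

def mutate(rule, X, rng):
    new = dict(rule)
    j = rng.choice(list(new))
    lo, hi = new[j]
    span = np.ptp(X[:, j]) * 0.1 or 1.0          # perturbation scale
    new[j] = (lo + rng.uniform(-span, span), hi + rng.uniform(-span, span))
    return new

def evolve(X, y, n_generations=200, seed=0):
    rng = random.Random(seed)
    # start with one interval per metric covering the whole range
    best = {j: (X[:, j].min(), X[:, j].max()) for j in range(X.shape[1])}
    best_fit = fitness(best, X, y)
    for _ in range(n_generations):
        cand = mutate(best, X, rng)
        cand_fit = fitness(cand, X, y)
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit
```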

    Discrimination Analysis for Predicting Defect-Prone Software Modules

    Software defect prediction studies usually build models without analyzing the data used in the procedure. As a result, the same approach performs differently on different data sets. In this paper, we introduce discrimination analysis as a method to gain insight into the inherent properties of software data. Based on this analysis, we find that the data sets used in this field suffer from nonlinear separability and class imbalance. Unlike prior works, we exploit the kernel method to nonlinearly map the data into a high-dimensional feature space. By combating these two problems, we propose an algorithm based on kernel discrimination analysis, called KDC, to build a more effective prediction model. Experimental results on data sets from different organizations indicate that KDC is more accurate in terms of F-measure than the state-of-the-art methods. We are optimistic that our discrimination analysis method can guide more studies of data structure, which may derive useful knowledge from data science for building more accurate prediction models.
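
    KDC is not available as a packaged implementation; as a rough analogue of "kernel method plus discriminant analysis", the sketch below maps the metrics through an approximate RBF kernel feature map and applies linear discriminant analysis with equal class priors to soften the class imbalance, reporting F-measure as in the paper. All parameter choices are assumptions.

```python
# Rough analogue of kernel discriminant analysis for defect data:
# approximate RBF kernel feature map (Nystroem) followed by LDA with
# equal class priors, evaluated with F-measure. Not the KDC algorithm
# from the paper; parameters are assumptions.
from sklearn.pipeline import make_pipeline
from sklearn.kernel_approximation import Nystroem
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def kernel_discriminant_f1(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                               stratify=y, random_state=0)
    model = make_pipeline(
        Nystroem(kernel="rbf", n_components=100),       # nonlinear feature map
        LinearDiscriminantAnalysis(priors=[0.5, 0.5]),  # equal priors vs. imbalance
    )
    model.fit(X_tr, y_tr)
    return f1_score(y_te, model.predict(X_te))
```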

    Enhancing Software Project Outcomes: Using Machine Learning and Open Source Data to Employ Software Project Performance Determinants

    Many factors can influence the ongoing management and execution of technology projects. Some of these elements are known a priori during the project planning phase. Others require real-time data gathering and analysis throughout the lifetime of a project. These real-time project data elements are often neglected, misclassified, or otherwise misinterpreted during the project execution phase, resulting in an increased risk of delays, quality issues, and missed business opportunities. The overarching motivation for this research endeavor is to offer reliable improvements in software technology management and delivery. The primary purpose is to discover and analyze the impact, role, and level of influence of various project-related data on the ongoing management of technology projects. The study leverages open source data regarding software performance attributes. The goal is to temper the subjectivity currently used by project managers (PMs) with quantifiable measures when assessing project execution progress. Modern-day PMs who manage software development projects are charged with an arduous task. Often, they obtain their inputs from technical leads who tend to be significantly more technical. When assessing software projects, PMs perform their role subject to the limitations of their capabilities and competencies. PMs are required to contend with the stresses of the business environment, the policies and procedures dictated by their organizations, and resource constraints. The second purpose of this research study is to propose methods by which conventional project assessment processes can be enhanced using quantitative methods that utilize real-time project execution data. Transferability of academic research to industry application is specifically addressed vis-à-vis a delivery framework that provides meaningful data to industry practitioners.