5,398 research outputs found

    Software Effort Estimation Accuracy Prediction of Machine Learning Techniques: A Systematic Performance Evaluation

    Accurate software effort estimation is a key factor in effectively planning, controlling, and delivering a successful software project within budget and on schedule. Both overestimation and underestimation are key challenges for future software development; hence there is a continuing need for accuracy in software effort estimation (SEE). Researchers and practitioners are striving to identify which machine learning estimation technique gives more accurate results based on evaluation measures, datasets, and other relevant attributes. Authors of related research are generally not aware of previously published results on machine learning effort estimation techniques. The main aim of this study is to help researchers learn which machine learning technique yields promising effort estimation accuracy in software development. In this paper, the performance of machine learning ensemble techniques is compared with that of solo techniques based on the two most commonly used accuracy evaluation metrics. We used the systematic literature review methodology proposed by Kitchenham and Charters. This includes searching for the most relevant papers, applying quality assessment criteria, extracting data, and drawing results. We evaluated the state-of-the-art accuracy performance of 28 selected studies (14 ensemble, 14 solo) using Mean Magnitude of Relative Error (MMRE) and PRED(25) as a set of reliable accuracy metrics for comparing the two kinds of technique and answering the research questions stated in this study. We found that machine learning techniques are the most frequently used in the construction of ensemble effort estimation (EEE) techniques. The results of this study revealed that EEE techniques usually yield more promising estimation accuracy than solo techniques. Comment: Pages: 27 Figures: 15 Tables:
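    The two metrics named in this abstract, MMRE and PRED(25), can be illustrated with a minimal sketch. It assumes the usual definitions (relative error taken against the actual effort, and PRED(25) as the share of projects whose relative error is at most 0.25). The function names and the sample effort values are hypothetical and do not come from the study.

```python
from typing import Sequence


def mre(actual: float, predicted: float) -> float:
    """Magnitude of relative error for one project (actual effort must be > 0)."""
    return abs(actual - predicted) / actual


def mmre(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean magnitude of relative error over all projects."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals: Sequence[float], predictions: Sequence[float],
         level: float = 0.25) -> float:
    """Fraction of projects whose MRE is at most `level` (PRED(25) by default)."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    hits = sum(1 for e in errors if e <= level)
    return hits / len(errors)


# Hypothetical effort values (e.g. person-hours), for illustration only.
actual_effort = [100.0, 200.0, 400.0, 800.0]
estimated_effort = [110.0, 150.0, 600.0, 800.0]

mmre_value = mmre(actual_effort, estimated_effort)
pred25_value = pred(actual_effort, estimated_effort)
```

    Lower MMRE and higher PRED(25) both indicate better estimates, so an ensemble that beats a solo technique would typically show both movements on the same dataset.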

    Evaluating an automated procedure of machine learning parameter tuning for software effort estimation

    Software effort estimation requires accurate prediction models. Machine learning algorithms have been used to create more accurate estimation models. However, these algorithms are sensitive to factors such as the choice of hyper-parameters. To reduce this sensitivity, automated approaches for hyper-parameter tuning have been recently investigated. There is a need for further research on the effectiveness of such approaches in the context of software effort estimation. These evaluations could help understand which hyper-parameter settings can be adjusted to improve model accuracy, and in which specific contexts tuning can benefit model performance. The goal of this work is to develop an automated procedure for machine learning hyper-parameter tuning in the context of software effort estimation. The automated procedure builds and evaluates software effort estimation models to determine the most accurate evaluation schemes. The methodology followed in this work consists of first performing a systematic mapping study to characterize existing hyper-parameter tuning approaches in software effort estimation, developing the procedure to automate the evaluation of hyper-parameter tuning, and conducting controlled quasi experiments to evaluate the automated procedure. From the systematic literature mapping we discovered that effort estimation literature has favored the use of grid search. The results we obtained in our quasi experiments demonstrated that fast, less exhaustive tuners were viable in place of grid search. These results indicate that randomly evaluating 60 hyper-parameters can be as good as grid search, and that multiple state-of-the-art tuners were only more effective than this random search in 6% of the evaluated dataset-model combinations. 
We endorse random search, genetic algorithms, flash, differential evolution, and tabu and harmony search as effective tuners.
Escuela de Ciencias de la Computación e Informática; Centro de Investigaciones en Tecnologías de la Información y Comunicación; UCR::Vicerrectoría de Investigación::Sistema de Estudios de Posgrado::Ingeniería::Maestría Académica en Computación e Informátic
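    The contrast between grid search and a fixed budget of 60 random evaluations can be sketched as follows. This is an illustration under stated assumptions, not the procedure developed in the thesis: the abstract's "60 hyper-parameters" is read here as 60 randomly sampled configurations, and the search space, the parameter names `k` and `weight_power`, and the `toy_error` objective are hypothetical stand-ins for a real model's validation error.

```python
import random
from itertools import product


def random_search(space, objective, budget=60, seed=0):
    """Evaluate `budget` randomly sampled configurations and keep the best."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(budget):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def grid_search(space, objective):
    """Evaluate every combination in the space and keep the best."""
    names = list(space)
    best_cfg, best_score = None, float("inf")
    for combo in product(*(space[name] for name in names)):
        cfg = dict(zip(names, combo))
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_error(cfg):
    """Hypothetical validation error; a real study would train and score a model."""
    return (cfg["k"] - 7) ** 2 + (cfg["weight_power"] - 2.0) ** 2


# Hypothetical search space: 30 * 21 = 630 grid points.
space = {
    "k": list(range(1, 31)),
    "weight_power": [0.5 * i for i in range(21)],
}

grid_cfg, grid_score = grid_search(space, toy_error)
rand_cfg, rand_score = random_search(space, toy_error, budget=60)
```

    Grid search pays for all 630 evaluations here, while random search stops at 60. Whether the random result is close enough to the grid optimum depends on the dataset and model, which is the question the quasi experiments address.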

    Open Hybrid Model: A New Ensemble Model for Software Development Cost Estimation

    Depending on its features, a software project may face different administrative challenges that require sound decisions from software project managers. A major challenge is estimating software development cost, for which many researchers have proposed different methods. According to the literature, the capability of a proposed model or method is typically demonstrated on a specific set of software projects. Hence, the aim of this study is to present a model that takes advantage of the capabilities of various software development cost estimation models and methods simultaneously. For this purpose, a new model called the "open hybrid model" is proposed, based on the firefly algorithm. The proposed model includes an extensible bank of estimation methods. It also includes an extensible bank of rules that describe the relations between existing methods. Considering project conditions, the proposed model tries to find the best rule for combining estimation methods in the methods bank. Three datasets of real projects were used to evaluate the precision of the proposed model, and the results were compared with those of 11 other methods. The comparison was based on performance parameters widely used to show the accuracy and stability of estimation models. According to the results, the open hybrid model was able to select the most appropriate methods present in the methods bank.

    An Exploratory Survey of Phase-wise Project Cost Estimation Techniques

    This article explores a number of existing project cost estimation techniques to investigate how estimation can be done more accurately and effectively. The survey looks into various estimation models that draw on theoretical techniques such as statistics, fuzzy logic, case-based reasoning, analogies, and neural networks. Because the inaccuracy of conventional estimation stems from life cycle cost drivers that are unsuitable for use across the whole project life cycle, this study introduces a phase-wise estimation technique that posits some overhead and latency costs. Performance evaluation methods for the underlying phase-wise principle are also presented. The phase-wise approach is expected to improve estimation accuracy owing to lower latency cost and to increase project visibility, which in turn helps project managers better scrutinize and administer project activities.

    A Principled Methodology: A Dozen Principles of Software Effort Estimation

    Software effort estimation (SEE) is the activity of estimating the total effort required to complete a software project. Correctly estimating the effort required for a software project is of vital importance for the competitiveness of organizations. Both under- and over-estimation lead to undesirable consequences. Under-estimation may result in budget and schedule overruns, which in turn may cause projects to be cancelled, thereby wasting the entire effort spent up to that point. Over-estimation may cause promising projects not to be funded, harming organizational competitiveness. Due to the significant role of SEE for software organizations, considerable research effort has been invested in it. Thanks to the accumulation of decades of prior research, today we are able to identify the core issues and search for the right principles to tackle pressing questions. For example, despite decades of work, we still lack concrete answers to important questions such as: What is the best SEE method? The estimation methods introduced so far make use of local data, but not all companies have their own data, so: How can we handle the lack of local data? Common SEE methods take size attributes for granted, yet size attributes are costly and practitioners place very little trust in them. Hence, we ask: How can we avoid the use of size attributes? Collecting data, particularly dependent variable information (i.e. effort values), is costly: How can we find an essential subset of the SEE data sets? Finally, studies make use of sampling methods to justify a new method's performance on SEE data sets, yet the trade-off among different variants is ignored: How should we choose sampling methods for SEE experiments? This thesis is a rigorous investigation aimed at identifying and tackling the pressing issues in SEE. 
Our findings rely on extensive experimentation performed with a large corpus of estimation techniques on a large set of public and proprietary data sets. We summarize our findings and industrial experience in the form of 12 principles: 1) Know your domain 2) Let the Experts Talk 3) Suspect your data 4) Data Collection is Cyclic 5) Use a Ranking Stability Indicator 6) Assemble Superior Methods 7) Weighting Analogies is Over-elaboration 8) Use Easy-path Design 9) Use Relevancy Filtering 10) Use Outlier Pruning 11) Combine Outlier and Synonym Pruning 12) Be Aware of Sampling Method Trade-off

    An Intelligent Framework for Estimating Software Development Projects using Machine Learning

    The IT industry has faced many challenges related to software effort and cost estimation. A cost assessment is conducted after software effort estimation, which benefits customers as well as developers. The purpose of this paper is to discuss various methods for estimating software effort and cost in the context of software engineering, such as algorithmic methods, expert judgment methods, analogy-based estimation methods, and machine learning methods, as well as their different aspects. Despite these methods, estimation of the effort involved in software development is subject to uncertainty. Several methods have been developed in the literature for improving estimation accuracy, many of which involve the use of machine learning techniques. A machine learning framework is proposed in this paper to address this challenging problem. In addition to being completely independent of algorithmic models and estimation problems, this framework also features a modular architecture. It has high interpretability, learning capability, and robustness to imprecise and uncertain inputs.