
    Application of ensemble techniques in predicting object-oriented software maintainability

    While prior object-oriented software maintainability literature acknowledges the role of machine learning techniques as valuable predictors of potential change, the most suitable technique that achieves consistently high accuracy remains undetermined. With the objective of obtaining more consistent results, an ensemble technique is investigated to advance the performance of the individual models and increase their accuracy in predicting the maintainability of object-oriented systems. This paper describes the research plan for predicting object-oriented software maintainability using ensemble techniques. First, we present a brief overview of the main research background and its different components. Second, we explain the research methodology. Third, we provide the expected results. Finally, we conclude with a summary of the current status.
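
    To make the ensemble idea concrete, the sketch below combines heterogeneous regressors through a meta-learner to predict a maintainability indicator from object-oriented design metrics. This is only a minimal illustration of the general technique, not the authors' planned setup; the CSV path, the C&K metric columns, and the CHANGE target are hypothetical placeholders.

```python
# Minimal sketch of ensemble-based maintainability prediction (not the
# authors' planned setup). The CSV path, the C&K metric columns, and the
# CHANGE target are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

data = pd.read_csv("maintainability.csv")              # hypothetical dataset
X = data[["WMC", "DIT", "NOC", "CBO", "RFC", "LCOM"]]  # C&K design metrics
y = data["CHANGE"]                                     # maintenance change count

# Heterogeneous base regressors combined by a meta-learner (stacking).
ensemble = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("mlp", MLPRegressor(max_iter=2000, random_state=0)),
        ("ridge", Ridge()),
    ],
    final_estimator=Ridge(),
)

# Mean absolute error across 10 folds, a common evaluation choice.
mae = -cross_val_score(ensemble, X, y, cv=10,
                       scoring="neg_mean_absolute_error").mean()
print(f"MAE: {mae:.2f}")
```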

    Improving Defect Prediction Models by Combining Classifiers Predicting Different Defects

    Background: The software industry spends a lot of money on finding and fixing defects. It utilises software defect prediction models to identify code that is likely to be defective. Prediction models have, however, reached a performance bottleneck. Any improvement to prediction models would likely yield fewer defects, reducing costs for companies. Aim: In this dissertation I demonstrate that different families of classifiers find distinct subsets of defects. I show how this finding can be utilised to design ensemble models which outperform other state-of-the-art software defect prediction models. Method: This dissertation is supported by published work. In the first paper I explore the quality of data, which is a prerequisite for building reliable software defect prediction models. The second and third papers explore the ability of different software defect prediction models to find distinct subsets of defects. The fourth paper explores how software defect prediction models can be improved by combining classifiers that predict different defective components into ensembles. An additional, unpublished work presents a visual technique for the analysis of predictions made by individual classifiers and discusses some possible constraints for classifiers used in software defect prediction. Result: Software defect prediction models created by classifiers of different families predict distinct subsets of defects. Ensembles composed of classifiers belonging to different families outperform other ensemble and standalone models. Only a few highly diverse and accurate base models are needed to compose an effective ensemble. Such an ensemble consistently predicts more additional defects than the number of incorrect predictions it introduces. Conclusion: Ensembles should not use majority voting to combine classifier decisions in software defect prediction, as this misses the correct predictions of classifiers that uniquely identify defects. Some classifiers could be less successful for software defect prediction due to the complex decision boundaries of defect data. Stacking-based ensembles can outperform other ensemble and standalone techniques. I propose new avenues of research that could further improve the modelling of ensembles in software defect prediction. Data quality should be explicitly considered prior to experiments for researchers to establish reliable results.
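
    The dissertation's central contrast, majority voting versus stacking over classifiers from different families, can be sketched in a few lines of generic scikit-learn code. This is an illustrative sketch rather than the dissertation's exact pipeline; the defect dataset, its "defective" label column, and the particular base classifiers are assumptions.

```python
# Sketch contrasting majority voting with stacking over classifiers from
# different families. Not the dissertation's exact pipeline; the dataset
# and its "defective" label column are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

data = pd.read_csv("defects.csv")                  # hypothetical defect dataset
X, y = data.drop(columns=["defective"]), data["defective"]

# Base classifiers from different families, so they tend to misclassify
# different modules, which is the diversity the ensemble exploits.
bases = [
    ("rf", RandomForestClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("svm", SVC(probability=True, random_state=0)),
]

voting = VotingClassifier(estimators=bases, voting="hard")     # majority vote
stacking = StackingClassifier(estimators=bases,
                              final_estimator=LogisticRegression(max_iter=1000))

for name, model in [("majority voting", voting), ("stacking", stacking)]:
    f1 = cross_val_score(model, X, y, cv=10, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")
```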

    Use and misuse of the term "Experiment" in mining software repositories research

    The significant momentum and importance of Mining Software Repositories (MSR) in Software Engineering (SE) has fostered new opportunities and challenges for extensive empirical research. However, MSR researchers seem to struggle to characterize the empirical methods they use within the existing empirical SE body of knowledge. This is especially the case for MSR experiments. To provide evidence on the special characteristics of MSR experiments and their differences from experiments traditionally acknowledged in SE so far, we elicited the hallmarks that differentiate an experiment from other types of empirical studies and characterized the hallmarks and types of experiments in MSR. We analyzed MSR literature obtained from a small-scale systematic mapping study to assess the use of the term experiment in MSR. We found that 19% of the papers claiming to be an experiment are in fact not experiments at all but observational studies, so they use the term in a misleading way. Of the remaining 81% of the papers, only one refers to a genuine controlled experiment, while the others are experiments with limited control. MSR researchers tend to overlook such limitations, compromising the interpretation of the results of their studies. We provide recommendations and insights to support the improvement of MSR experiments. This work has been partially supported by the Spanish project MCI PID2020-117191RB-I00.

    A Systematic Mapping Study of Empirical Studies Performed with Collections of Software Projects

    Context: software projects are common resources in Software Engineering experiments, although they are often selected without following a specific strategy, which reduces the representativeness and replicability of the results. One option is the use of preserved collections of software projects, but these must be current and have explicit guidelines that guarantee their updating over time. Goal: to carry out a systematic secondary study of the strategies used to select software projects in empirical studies, in order to discover the guidelines taken into account, the degree of use of project collections, the metadata extracted, and the subsequent statistical analyses conducted. Method: a systematic mapping study was used to identify studies published from January 2013 to December 2020. Results: 122 studies were identified, of which 72% used their own guidelines for project selection and 27% used existing project collections. Likewise, no evidence was found of a standardized framework for the project selection process, nor of the application of statistical methods related to the sample collection strategy.

    Fault Driven Supervised Tie Breaking for Test Case Prioritization

    Regression test suites are an excellent tool to validate the existing functionality of an application during the development process. However, they can be large and time-consuming to execute, making them inefficient at finding faults. Test Case Prioritization is an area of study that looks to improve the fault detection rates of these test suites by re-ordering the execution sequence of the test cases. It attempts to execute first the test cases that have the highest probability of detecting faults. Most prioritization techniques base their decisions on the coverage information gathered from running the test cases. These coverage-based techniques, however, have a high probability of encountering ties in coverage between two or more test cases. Most studies employ random selection to break these ties despite it being considered a lower-bound method. This thesis designs and develops a framework that supervises the tie-breaking in coverage-based Test Case Prioritization using fault predictor models. Fault predictor models can assist in identifying the modules in the application that are most prone to containing faults. By selecting test cases that cover the modules most prone to faults, the fault detection rate of the test suite can be improved. The framework employs an ensemble learner that aggregates results from multiple predictors: to date, no single predictor has been found that performs consistently on all datasets, and numerous predictors require expert knowledge to make them performant. An ensemble learner is a reliable technique to mitigate the problems and bias faced by single predictors and to disregard results from poorly performing predictors. To evaluate the supervised tie-breaking, empirical studies were conducted on two large-scale applications, Cassandra and Tomcat. As part of the evaluation, real faults that existed in the applications during development were used instead of the hand-seeded or mutation faults used by many other studies. The data used for fault prediction were also not groomed or marked by experts, unlike in other studies. Results from the studies showed significant improvements in the fault detection rates for both case studies when using fault-driven supervision for tie-breaking.
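
    The tie-breaking idea can be illustrated with a small sketch of a greedy additional-coverage prioritizer in which ties on coverage gain are resolved by predicted fault-proneness rather than at random. The coverage map and fault probabilities below are invented for illustration and are not taken from the thesis.

```python
# Sketch of greedy additional-coverage test prioritization where ties are
# broken by predicted fault-proneness instead of at random. The coverage
# map and probabilities are illustrative, not taken from the thesis.
def prioritize(coverage, fault_prob):
    """coverage: test -> set of covered modules
       fault_prob: module -> predicted probability of containing a fault"""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        def gain(test):
            new = remaining[test] - covered
            # Primary key: number of newly covered modules (additional coverage).
            # Tie-breaker: total predicted fault-proneness of those modules.
            return (len(new), sum(fault_prob.get(m, 0.0) for m in new))
        best = max(remaining, key=gain)
        covered |= coverage[best]
        order.append(best)
        del remaining[best]
    return order

coverage = {"t1": {"A", "B"}, "t2": {"B", "C"}, "t3": {"C", "D"}}
fault_prob = {"A": 0.1, "B": 0.2, "C": 0.7, "D": 0.6}
print(prioritize(coverage, fault_prob))  # t3 first: C and D look fault-prone
```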

    Building an Ensemble for Software Defect Prediction Based on Diversity Selection

    Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single-classifier techniques. Almost all previous work using ensemble techniques in defect prediction relies on the majority voting scheme for combining prediction outputs, and on the implicit diversity among single classifiers. Aim: To investigate whether defect prediction can be improved using an explicit diversity technique with a stacking ensemble, given that different classifiers identify different sets of defects. Method: We used classifiers from four different families and the weighted accuracy diversity (WAD) technique to exploit diversity amongst classifiers. To combine individual predictions, we used the stacking ensemble technique. We used state-of-the-art knowledge in software defect prediction to build our ensemble models, and tested their prediction abilities against 8 publicly available data sets. Conclusion: The results show a performance improvement using stacking ensembles compared to other defect prediction models. Diversity amongst the classifiers used for building ensembles is essential to achieving these performance improvements.
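
    A rough sketch of the overall recipe, measuring diversity between candidate base classifiers and then stacking a small, diverse subset, is given below. The pairwise disagreement rate used here is only a simple diversity proxy standing in for the paper's weighted accuracy diversity (WAD) measure, and the synthetic data and chosen subset are illustrative assumptions.

```python
# Sketch of diversity-aware base-model selection followed by stacking.
# The pairwise disagreement measure is a simple diversity proxy, not the
# paper's WAD formula; data and the chosen subset are illustrative.
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

candidates = {
    "rf": RandomForestClassifier(random_state=0),
    "nb": GaussianNB(),
    "knn": KNeighborsClassifier(),
    "lr": LogisticRegression(max_iter=1000),
}
preds = {name: clf.fit(X_tr, y_tr).predict(X_val) for name, clf in candidates.items()}

# Disagreement rate between each pair of classifiers on the validation set.
for a, b in combinations(candidates, 2):
    print(f"{a} vs {b} disagreement: {np.mean(preds[a] != preds[b]):.3f}")

# Keep a small, diverse, accurate subset (chosen here by inspection) and stack it.
chosen = [("rf", candidates["rf"]), ("nb", candidates["nb"]), ("knn", candidates["knn"])]
stack = StackingClassifier(estimators=chosen,
                           final_estimator=LogisticRegression(max_iter=1000))
print(f"stacked accuracy: {stack.fit(X_tr, y_tr).score(X_val, y_val):.3f}")
```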