
3 research outputs found

A Systematic Mapping Study of Empirical Studies Performed with Collections of Software Projects

Author: Carruthers Juan Andrés
Diaz Pace Jorge Andres
Irrazábal Emanuel Agustín
Publication venue: Instituto Politécnico Nacional. Centro de Investigacion en Computación
Publication date: 01/12/2022

Context: software projects are common resources in Software Engineering experiments, although they are often selected without following a specific strategy, which reduces the representativeness and replicability of the results. One option is to use preserved collections of software projects, but these must be current and have explicit guidelines that guarantee they are updated over time. Goal: to carry out a systematic secondary study of the strategies used to select software projects in empirical studies, in order to discover the guidelines taken into account, the degree of use of project collections, the metadata extracted, and the subsequent statistical analyses conducted. Method: a systematic mapping study was used to identify studies published from January 2013 to December 2020. Results: 122 studies were identified, of which 72% used their own guidelines for project selection and 27% used existing project collections. Likewise, no evidence was found of a standardized framework for project selection, nor of the application of statistical methods related to the sample collection strategy.

Affiliation: Carruthers, Juan Andrés. Universidad Nacional del Nordeste. Facultad de Cs.exactas Naturales y Agrimensura. Departamento de Informatica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste; Argentina
Affiliation: Diaz Pace, Jorge Andres. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
Affiliation: Irrazábal, Emanuel Agustín. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste; Argentina. Universidad Nacional del Nordeste. Facultad de Cs.exactas Naturales y Agrimensura. Departamento de Informatica; Argentina

CONICET Digital

Identifying Thresholds for Software Faultiness via Optimistic and Pessimistic Estimations

Author: Lavazza Luigi
Morasca Sandro
Publication venue: Association for Computing Machinery (ACM)
Publication date: 26/09/2016

Background. When estimating whether a software module is faulty based on the value of a measure X for a software internal attribute (e.g., size, structural complexity, cohesion, coupling), it is sensible to set a threshold on fault-proneness first and then induce a threshold on X by using a fault-proneness model where X plays the role of independent variable. However, some modules cannot be estimated as either faulty or non-faulty with confidence: they belong to a “grey zone” and estimating them as either would be largely arbitrary and may result in several erroneous decisions. Objective. We propose and evaluate an approach to setting thresholds on X to identify which modules can be confidently estimated faulty or non-faulty, and which ones cannot be estimated either way. Method. Suppose that we do not know whether the modules belonging to a subset of a set of modules are faulty or not, as happens in practical cases with the modules whose faultiness needs to be estimated. We build two fault-proneness models by using the set of modules as the training set. The “pessimistic” model is built by assuming that all modules whose faultiness is unknown are actually faulty and the “optimistic” model by assuming that they are actually non-faulty. The optimistic and pessimistic models can be used to set two thresholds, an optimistic and a pessimistic one. A module is estimated faulty by the optimistic (resp., pessimistic) model with the optimistic (resp., pessimistic) threshold if its fault-proneness is above the threshold, and non-faulty otherwise. A module that is estimated faulty (resp., non-faulty) by both the optimistic model with the optimistic threshold and the pessimistic model with the pessimistic threshold is estimated faulty (resp., non-faulty). Modules for which the estimates of the two models with their associated thresholds conflict are in the “grey zone,” i.e., no reliable faultiness estimation can be made for them. Results. We applied our approach to datasets from the PROMISE repository, carried out cross-validations, and assessed accuracy via commonly used indicators. We also compared our results with those obtained with the conventional approach that uses one Binary Logistic Regression model. Our results show that our approach is effective in identifying the grey zone of values of X in which modules cannot be reliably estimated as either faulty or non-faulty and, conversely, the intervals in which modules can be estimated faulty or non-faulty. Our approach turns out to be more accurate, in terms of F-measure, than the conventional one in the majority of cases. In addition, it provides F-measure values that are very concentrated, i.e., it consistently identifies the intervals in which modules can be estimated faulty or non-faulty. Conclusions. Our method can be practically used for identifying “grey zones” in which it does not make much sense to estimate modules’ faultiness based on measure X and, therefore, the zones in which modules’ faultiness can be estimated with confidence.

Lavazza, Luigi Antonio; Morasca, Sandro
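
The decision rule described in the Method part of this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic form of the two models, the coefficient values, the 0.5 thresholds, and every name below (`fault_proneness`, `estimate_faultiness`, `classify`) are assumptions made up for illustration. Only the rule for combining the two estimates follows the abstract.

```python
import math
from enum import Enum


class Estimate(Enum):
    FAULTY = "faulty"
    NON_FAULTY = "non-faulty"
    GREY = "grey zone"


def fault_proneness(x, intercept, slope):
    """Univariate logistic model of fault-proneness (assumed model form)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def estimate_faultiness(p_optimistic, p_pessimistic, t_optimistic, t_pessimistic):
    """Combine the two models' estimates as described in the abstract."""
    # Each model estimates "faulty" only if fault-proneness is above its threshold.
    optimistic_faulty = p_optimistic > t_optimistic
    pessimistic_faulty = p_pessimistic > t_pessimistic
    if optimistic_faulty and pessimistic_faulty:
        return Estimate.FAULTY
    if not optimistic_faulty and not pessimistic_faulty:
        return Estimate.NON_FAULTY
    # The two estimates conflict: no reliable estimation can be made.
    return Estimate.GREY


# Hypothetical (intercept, slope) pairs; in practice they would be fitted on a
# training set in which unknown modules are labelled non-faulty (optimistic)
# or faulty (pessimistic).
OPTIMISTIC_COEFFS = (-4.0, 0.05)
PESSIMISTIC_COEFFS = (-2.0, 0.05)
# Hypothetical thresholds on fault-proneness.
T_OPTIMISTIC = 0.5
T_PESSIMISTIC = 0.5


def classify(x):
    """Estimate the faultiness of a module whose measure X has value x."""
    p_opt = fault_proneness(x, *OPTIMISTIC_COEFFS)
    p_pes = fault_proneness(x, *PESSIMISTIC_COEFFS)
    return estimate_faultiness(p_opt, p_pes, T_OPTIMISTIC, T_PESSIMISTIC)


if __name__ == "__main__":
    for x in (20, 60, 100):
        print(x, classify(x).value)
```

With these made-up coefficients, values of X between the points where each model crosses its threshold fall in the grey zone, and values outside that interval are estimated non-faulty or faulty by both models.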

Archivio istituzionale della ricerca - Università dell'Insubria

Recent researches on social sciences

Author: Arslan Hasan
Dorczak Roman
Musialik Rafał
Publication venue: Jagiellonian University Institute of Public Affairs
Publication date: 01/01/2018

Jagiellonian University Repository