Search CORE

8 research outputs found

Prediction for Resolution Time of Software Defect

Author: Wang Da
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2010

In practical software development projects, resolving test issues efficiently during the software development life cycle is critical to releasing software products on time. Different test environments, test resources and test requirements can lead to different outcomes. Accurate prediction of software defects' resolution time would therefore benefit practical projects. In our study, data mining techniques offer great promise for predicting software defects' resolution time. Our research is based on the NASA Metrics Data Program (MDP). We first calculate the resolution time for the available projects. Using unsupervised discretization methods, we split resolution time into intervals that serve as the response variable. Then, investigating the relationship between metric properties and time intervals, we fit a model that attempts to predict resolution time. Experiments and analysis demonstrate the feasibility of our approach.
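
The abstract does not say which unsupervised discretization methods were used. As a minimal sketch, assuming the two most common ones (equal-width and equal-frequency binning), resolution times could be turned into interval labels as follows. The function names and the sample data are hypothetical and are not taken from the thesis or from the NASA MDP data.

```python
from bisect import bisect_right

def equal_width_edges(values, bins):
    """Interior cut points that split [min, max] into bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]

def equal_frequency_edges(values, bins):
    """Interior cut points chosen so each bin holds roughly the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // bins] for i in range(1, bins)]

def to_interval(value, edges):
    """Index of the interval that contains value (0 is the shortest times)."""
    return bisect_right(edges, value)

# Hypothetical resolution times in days (illustrative only).
resolution_days = [1, 2, 2, 3, 5, 8, 13, 21, 34, 90]

width_edges = equal_width_edges(resolution_days, 3)
freq_edges = equal_frequency_edges(resolution_days, 3)

labels_width = [to_interval(d, width_edges) for d in resolution_days]
labels_freq = [to_interval(d, freq_edges) for d in resolution_days]

print(labels_width)
print(labels_freq)
```

On skewed data like this, equal-width binning puts almost every defect into the first interval, while equal-frequency binning spreads them more evenly. The resulting labels would then act as the response variable for a classifier.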

The Research Repository @ WVU (West Virginia University)

Evaluating Defect Prediction using a Massive Set of Metrics

Author: David LO
TIAN Yuan
XIA Xin
XUAN Xiao
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/04/2015

Crossref

Institutional Knowledge at Singapore Management University

Searching for rules to detect defective modules: A subgroup discovery approach

Author: Aguilar Ruiz Jesús Salvador
Riquelme Santos José Cristóbal
Rodríguez Daniel
Ruiz Roberto
Publication venue: Elsevier BV
Publication date: 01/01/2012

Data mining methods in software engineering are becoming increasingly important as they can support several aspects of the software development life-cycle such as quality. In this work, we present a data mining approach to induce rules extracted from static software metrics characterising fault-prone modules. Due to the special characteristics of defect prediction data (imbalance, inconsistency, redundancy), not all classification algorithms are capable of dealing with this task conveniently. To deal with these problems, Subgroup Discovery (SD) algorithms can be used to find groups of statistically different data given a property of interest. We propose EDER-SD (Evolutionary Decision Rules for Subgroup Discovery), an SD algorithm based on evolutionary computation that induces rules describing only fault-prone modules. Rules are a well-known model representation that can be easily understood and applied by project managers and quality engineers. Thus, rules can help them to develop software systems that can be justifiably trusted. Contrary to other approaches in SD, our algorithm has the advantage of working with continuous variables, as the conditions of the rules are defined using intervals. We describe the rules obtained by applying our algorithm to seven publicly available datasets from the PROMISE repository, showing that they are capable of characterising subgroups of fault-prone modules. We also compare our results with three other well-known SD algorithms; the EDER-SD algorithm performs well in most cases.

Funding: Ministerio de Educación y Ciencia TIN2007-68084-C02-00; Ministerio de Educación y Ciencia TIN2010-21715-C02-0
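
To illustrate what a rule with interval conditions over continuous metrics looks like, here is a minimal sketch that evaluates one hand-written rule against a toy set of modules. It does not implement EDER-SD or its evolutionary search. The metric names, thresholds, and module data are invented for illustration.

```python
def matches(rule, module):
    """True if every metric of the module lies inside the rule's interval."""
    return all(low <= module[metric] <= high
               for metric, (low, high) in rule.items())

def rule_quality(rule, modules):
    """Return (covered count, precision, recall) of a fault-prone rule."""
    covered = [m for m in modules if matches(rule, m)]
    true_pos = sum(1 for m in covered if m["defective"])
    total_defective = sum(1 for m in modules if m["defective"])
    precision = true_pos / len(covered) if covered else 0.0
    recall = true_pos / total_defective if total_defective else 0.0
    return len(covered), precision, recall

# Hypothetical modules described by two static metrics.
modules = [
    {"loc": 120, "cyclomatic": 14, "defective": True},
    {"loc": 300, "cyclomatic": 25, "defective": True},
    {"loc": 40, "cyclomatic": 3, "defective": False},
    {"loc": 150, "cyclomatic": 12, "defective": False},
    {"loc": 20, "cyclomatic": 2, "defective": False},
    {"loc": 90, "cyclomatic": 11, "defective": True},
]

# Hypothetical rule: IF 100 <= loc <= 400 AND 10 <= cyclomatic <= 30
# THEN the module is fault-prone.
rule = {"loc": (100, 400), "cyclomatic": (10, 30)}

covered, precision, recall = rule_quality(rule, modules)
print(covered, round(precision, 3), round(recall, 3))
```

A subgroup discovery algorithm would search over many such interval rules and keep those whose covered subgroup differs most from the overall population. The quality measures here are only simple stand-ins for that scoring.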

idUS. Depósito de Investigación Universidad de Sevilla

Using Hadoop MapReduce for parallel genetic algorithms: A comparison of the global, grid and island models

Author: Ferrucci F
Salza P
Sarro F
Publication date: 01/12/2018

The need to improve the scalability of Genetic Algorithms (GAs) has motivated research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Based on the fact that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, parallel GA solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model can be most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs’ computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the use of PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid models, also in terms of costs when executed on a commercial cloud provider.
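
The speedup figures in the abstract can be put in context with the standard definitions of speedup (sequential time divided by parallel time) and parallel efficiency (speedup divided by node count). The sketch below uses only the averages reported above. The efficiency values are derived here and are not stated in the abstract.

```python
# Average island-model speedups reported in the abstract, keyed by node count.
reported_speedup = {2: 1.8, 4: 3.4, 8: 7.0}

def speedup(sequential_time, parallel_time):
    """Standard definition: how many times faster the parallel run is."""
    return sequential_time / parallel_time

def efficiency(speedup_value, nodes):
    """Fraction of ideal linear speedup achieved per node."""
    return speedup_value / nodes

efficiencies = {n: efficiency(s, n) for n, s in reported_speedup.items()}

for nodes, eff in efficiencies.items():
    print(f"{nodes} nodes: speedup {reported_speedup[nodes]:.1f}, "
          f"efficiency {eff:.3f}")
```

Under these definitions, the efficiency stays between roughly 0.85 and 0.90 across the three cluster sizes. That matches the abstract's claim that the island model keeps Hadoop overhead low.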

UCL Discovery

An Analysis of Software Defect Prediction Studies through Reproducibility and Replication

Author: Mahmood Zaheed
Publication date: 06/09/2018

University of Hertfordshire Research Archive

Integrationstest: Testprozess, Testfokus und Integrationsreihenfolge [Integration Testing: Test Process, Test Focus, and Integration Order]

Author: Borner Lars
Publication date: 01/01/2009

Integration testing tests the dependencies between the components of a software system. The large number of dependencies in today's systems poses a major challenge for the roles involved in integration testing. This doctoral thesis presents new and innovative approaches to support these roles. The first part of the thesis defines a test process that takes the specific characteristics of integration testing into account. The defined integration test process focuses on the decisions to be made during the process. It describes which decisions are made in which order by which role, and how these decisions influence subsequent decisions. The remainder of the thesis presents new approaches that support two decisions in the integration test process: test focus selection and the integration order. Testing all dependencies is not possible in real software products because of resource constraints. The few available resources must therefore be spent on testing the error-prone dependencies. To identify the error-prone dependencies, and thus to select the test focus, the thesis presents a new approach. The approach uses information about the number of defects in components and the properties of dependencies from earlier versions of the software to be integrated in order to uncover statistically significant relationships between the properties and the number of defects. These relationships are exploited in the current version to select the test focus, i.e., the dependencies to be tested. In integration testing, components are assembled step by step into a complete system in order to make it easier to locate the cause when a failure occurs. The drawback of this stepwise procedure is that components that are not yet integrated but are needed to run the tests must be simulated. The goal is therefore to determine an integration order that entails minimal simulation effort. In addition, dependencies selected as the test focus should be integrated early so that any defects are uncovered early. This thesis developed the first approach for determining an integration order that takes both the test focus and the simulation effort into account. The approaches developed in the thesis were evaluated in case studies with several realistically sized software systems.

Heidelberger Dokumentenserver