A Framework for Genetic Algorithms Based on Hadoop
Genetic Algorithms (GAs) are powerful metaheuristic techniques used in many real-world applications. The sequential execution of GAs requires considerable computational power, both in time and resources. Nevertheless, GAs are naturally parallel, and access to a parallel platform such as the Cloud is easy and cheap. Apache Hadoop is one of the common services that can be used for parallel applications. However, using Hadoop to develop a parallel version of GAs is not straightforward without dealing with its inner workings. Even though some sequential frameworks for GAs already exist, there is no framework supporting the development of GA applications that can be executed in parallel. This paper describes a framework for parallel GAs on the Hadoop platform, following the MapReduce paradigm. The main purpose of this framework is to allow users to focus on the problem-specific aspects of the GA, while being confident that the application will be executed correctly and with good performance on the Cloud. The framework has also been used to develop an application for the Feature Subset Selection problem. A preliminary performance analysis of the resulting GA application, carried out on three datasets, has shown very promising results.
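To make the MapReduce split more concrete, here is a minimal single-process sketch of one way a GA generation for Feature Subset Selection could be divided into a "map" step (independent fitness evaluation of each individual) and a "reduce" step (selection and breeding). It is not the framework's API: the per-feature scores, the size penalty, the operators and all function names are hypothetical, and the built-in `map()` merely stands in for Hadoop mappers.

```python
import random
from typing import List, Tuple

Individual = Tuple[int, ...]  # bitstring: 1 = feature selected

# Hypothetical per-feature usefulness scores standing in for a real
# classifier-based evaluation of a feature subset.
FEATURE_SCORES = [0.9, 0.1, 0.7, 0.05, 0.6, 0.2]
SIZE_PENALTY = 0.3


def fitness(ind: Individual) -> float:
    """Toy fitness: reward useful features, penalise subset size."""
    gain = sum(s for s, bit in zip(FEATURE_SCORES, ind) if bit)
    return gain - SIZE_PENALTY * sum(ind)


def map_phase(population: List[Individual]) -> List[Tuple[Individual, float]]:
    # Each (individual, fitness) pair is independent of the others, so on
    # Hadoop this step could be spread across mappers; here it is a plain map().
    return list(map(lambda ind: (ind, fitness(ind)), population))


def reduce_phase(scored, rng: random.Random, size: int) -> List[Individual]:
    # Gather all scored individuals, keep the best half as parents, and breed
    # the next generation with one-point crossover and bit-flip mutation.
    ranked = [ind for ind, _ in sorted(scored, key=lambda p: p[1], reverse=True)]
    parents = ranked[: max(2, size // 2)]
    children = list(parents[:1])  # elitism: always keep the best individual
    while len(children) < size:
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))
        child = list(a[:cut] + b[cut:])
        i = rng.randrange(len(child))
        if rng.random() < 0.2:
            child[i] ^= 1
        children.append(tuple(child))
    return children


def run_ga(generations: int = 20, size: int = 8, seed: int = 0) -> Tuple[Individual, float]:
    rng = random.Random(seed)
    n = len(FEATURE_SCORES)
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(size)]
    for _ in range(generations):
        population = reduce_phase(map_phase(population), rng, size)
    return max(map_phase(population), key=lambda p: p[1])


best, score = run_ga()
print(best, round(score, 2))
```

Keeping fitness evaluation free of shared state is what makes the map step embarrassingly parallel; only the reduce step needs to see the whole population.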
Search-Based Software Engineering in the Era of Modern Software Systems
This short paper accompanies the keynote given by Federica Sarro at the 31st IEEE International Requirements Engineering Conference, Hanover, Germany, September 2023.
Did You Do Your Homework? Raising Awareness on Software Fairness and Discrimination
Machine Learning is a vital part of many modern decision-making software systems. At the same time, it has been shown to exhibit bias, which can cause unjust treatment of individuals and population groups. One way to achieve fairness in machine learning software is to provide individuals with the same degree of benefit regardless of sensitive attributes (e.g., students receive the same grade independent of their sex or race). However, there can be other attributes that one might want to discriminate against (e.g., students who did their homework should receive higher grades). We call such attributes anti-protected attributes. When reducing the bias of machine learning software, one risks losing the discriminatory behaviour with respect to anti-protected attributes. To address this, we use grid search to show that machine learning software can be debiased (e.g., gender bias can be reduced) while also improving its ability to discriminate against anti-protected attributes.
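As an illustration of the idea only, the sketch below grid-searches over a debiasing knob while tracking two quantities: the gap in pass rate between sexes (protected attribute, to be minimised) and the gap between students who did and did not do their homework (anti-protected attribute, to be kept large). The toy records and the choice of knob (group-specific decision thresholds, a simple post-processing technique) are hypothetical assumptions, not the method or data used in the paper.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical toy records: (sex, did_homework, model_score).
# 'sex' is the protected attribute; 'did_homework' is an anti-protected one.
RECORDS: List[Tuple[int, int, float]] = [
    (0, 1, 0.82), (0, 1, 0.74), (0, 0, 0.55), (0, 0, 0.48),
    (1, 1, 0.66), (1, 1, 0.61), (1, 0, 0.42), (1, 0, 0.35),
]


def pass_rate(preds: List[int], idx: int, value: int) -> float:
    """Fraction of positive predictions among records whose column `idx` equals `value`."""
    group = [p for p, r in zip(preds, RECORDS) if r[idx] == value]
    return sum(group) / len(group)


def evaluate(thresholds: Dict[int, float]) -> Tuple[float, float]:
    # Group-specific decision thresholds act as the hypothetical debiasing knob.
    preds = [int(score >= thresholds[sex]) for sex, _, score in RECORDS]
    sex_gap = abs(pass_rate(preds, 0, 0) - pass_rate(preds, 0, 1))
    homework_gap = pass_rate(preds, 1, 1) - pass_rate(preds, 1, 0)
    return sex_gap, homework_gap


def grid_search(grid: List[float]):
    best = None
    for t0, t1 in product(grid, grid):
        sex_gap, hw_gap = evaluate({0: t0, 1: t1})
        # Lexicographic objective: smallest sex gap first, then largest homework gap.
        key = (sex_gap, -hw_gap)
        if best is None or key < best[0]:
            best = (key, (t0, t1), sex_gap, hw_gap)
    return best


grid = [0.4, 0.5, 0.6, 0.7]
baseline = evaluate({0: 0.5, 1: 0.5})  # one shared threshold for everyone
_, thresholds, sex_gap, hw_gap = grid_search(grid)
print("baseline gaps:", baseline)
print("chosen thresholds:", thresholds, "gaps:", (sex_gap, hw_gap))
```

Because the baseline configuration is itself in the grid, the selected configuration can never have a larger protected-attribute gap than the baseline; whether the anti-protected gap also improves depends on the data.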
The Effect of Offspring Population Size on NSGA-II: A Preliminary Study
Non-dominated Sorting Genetic Algorithm II (NSGA-II) is one of the most popular Multi-Objective Evolutionary Algorithms (MOEAs) and has been applied to a wide range of problems. Previous studies have shown that parameter tuning can improve NSGA-II performance. However, the tuning of the offspring population size, which guides the exploration-exploitation trade-off in NSGA-II, has so far been overlooked: previous work has generally used the population size as the default offspring population size for NSGA-II.
We therefore investigate the impact of offspring population size on the performance of NSGA-II. We carry out an empirical study comparing the effectiveness of three configurations against the default NSGA-II configuration on six optimization problems, based on four Pareto front quality indicators and statistical tests.
Our findings show that the performance of NSGA-II can be improved by reducing the offspring population size and, in turn, increasing the number of generations. This leads to results that are similar to, or statistically significantly better than, those obtained with the default NSGA-II configuration in 92% of the experiments performed.
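The trade-off between offspring population size and number of generations follows from a fixed evaluation budget. The helper below is a small sketch of that arithmetic; the budget of 25,000 evaluations and the population size of 100 are illustrative assumptions, not the settings used in the study.

```python
def generations_for_budget(max_evaluations: int, population_size: int, offspring_size: int) -> int:
    """Number of full generations that fit in a fixed evaluation budget.

    Assumes the initial population costs `population_size` evaluations and
    every generation evaluates `offspring_size` new children; survivors are
    then selected from the merged parent + offspring pool, as in NSGA-II.
    """
    if offspring_size <= 0 or population_size <= 0:
        raise ValueError("sizes must be positive")
    remaining = max_evaluations - population_size
    return max(0, remaining // offspring_size)


# Default setting (offspring size == population size) versus smaller offspring sizes.
for lam in (100, 50, 10):
    print(lam, generations_for_budget(25_000, 100, lam))
```

Under these assumptions the loop prints 249, 498 and 2490 generations respectively: halving the offspring population doubles the number of selection rounds the same budget can buy.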
Multi-objective software effort estimation
We introduce a bi-objective effort estimation algorithm that combines Confidence Interval Analysis and assessment of Mean Absolute Error. We evaluate our proposed algorithm against three alternative formulations, baseline comparators and current state-of-the-art effort estimators, applied to five real-world datasets from the PROMISE repository involving 724 different software projects in total. The results reveal that our algorithm outperforms the baseline, the state of the art and all three alternative formulations, statistically significantly (p < 0.001) and with large effect size (A12 ≥ 0.9), over all five datasets. We also provide evidence that our algorithm establishes a new state of the art, which lies within currently claimed industrial human-expert-based thresholds, thereby demonstrating that our findings have actionable conclusions for practicing software engineers.
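The A12 threshold quoted above is the Vargha-Delaney effect size: the probability that a value drawn from one sample exceeds a value drawn from the other, with ties counted as one half. The sketch below computes it directly from that definition; the error values in the usage example are hypothetical, and the argument order shown (baseline first) is one convention for "lower error is better", which may differ from the paper's.

```python
from typing import Sequence


def vargha_delaney_a12(a: Sequence[float], b: Sequence[float]) -> float:
    """P(X > Y) + 0.5 * P(X == Y) for X drawn from `a` and Y drawn from `b`."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in a for y in b if x > y)
    ties = sum(1 for x in a for y in b if x == y)
    return (greater + 0.5 * ties) / (len(a) * len(b))


# Hypothetical absolute errors; lower is better, so a value near 1 means the
# proposed estimator's errors are almost always smaller than the baseline's.
baseline_errors = [10, 12, 15]
proposed_errors = [3, 4, 11]
print(round(vargha_delaney_a12(baseline_errors, proposed_errors), 3))  # → 0.889
```

A value of 0.5 means no difference; 0.9 or above is usually read as a large effect. The pairwise loop is O(m·n), which is fine for the sample sizes typical of such experiments.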
Agile Effort Estimation: Have We Solved the Problem Yet? Insights from the Replication of the GPT2SP Study
Replication studies in Software Engineering are indispensable for ensuring the reliability, generalizability, and transparency of research findings. They contribute to the cumulative growth of knowledge in the field and promote a scientific approach that benefits both researchers and practitioners. In this article, we report our experience replicating a recently published work proposing a Transformer-based approach for Agile Story Point Estimation, dubbed GPT2SP. GPT2SP was proposed with the intent of addressing three limitations of a previous Deep Learning-based approach dubbed Deep-SE, and the results reported in the original study established GPT2SP as the new state of the art. However, when we used the GPT2SP source code made publicly available by the authors of the original study, we found a bug in the computation of the evaluation measure and the reuse of erroneous results from previous work, which had unintentionally introduced biases into GPT2SP's performance evaluation. In this study, we report the results we obtained after fixing the issues present in the original study. These reveal that the original results were in fact unintentionally inflated by these issues and that, despite advancements, challenges remain in providing accurate effort estimates for agile software projects.
Predictive analytics for software testing: Keynote paper
This keynote discusses the use of Predictive Analytics for Software Engineering, in particular for Software Defect Prediction and Software Testing, by presenting the latest results achieved in these fields leveraging Artificial Intelligence, Search-Based and Machine Learning methods, and by giving some directions for future work.
Multi-Objective Software Effort Estimation: A Replication Study
Replication studies increase our confidence in previous results when the findings are similar each time, and help mature our knowledge by addressing both internal and external validity aspects. However, these studies are still rare in certain software engineering fields. In this paper, we replicate and extend a previous study which represents the current state of the art for multi-objective software effort estimation, namely CoGEE. We investigate the original research questions with an independent implementation and the inclusion of a more robust baseline (LP4EE), carried out by the first author, who was not involved in the original study. Through this replication, we strengthen both the internal and external validity of the original study. We also answer two new research questions investigating the effectiveness of CoGEE by using four additional evolutionary algorithms (i.e., IBEA, MOCell, NSGA-III, SPEA2) and a well-known Java framework for evolutionary computation, namely jMetal (rather than the previously used R software), which allows us to strengthen the external validity of the original study. The results of our replication confirm that: (1) CoGEE outperforms both baseline and state-of-the-art benchmarks statistically significantly (p < 0.001); (2) CoGEE's multi-objective nature is what enables it to reach such good performance; (3) CoGEE's estimation errors lie within claimed industrial human-expert-based thresholds. Moreover, our new results show that the effectiveness of CoGEE is generally neither limited to nor dependent on the choice of multi-objective algorithm. Using CoGEE with either NSGA-II, NSGA-III, or MOCell produces human-competitive results in less than a minute. The Java version of CoGEE reduces the running time by over 99.8% with respect to its R counterpart. We have made the Java code of CoGEE publicly available to ease its adoption, as well as the data used in this study, to allow for future replication and extension of our work.
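To illustrate what "multi-objective" means here, the sketch below scores candidate estimation models on two objectives to be minimised and keeps only the Pareto non-dominated ones. Several parts are assumptions: the one-parameter model `effort = weight * size`, the project data, and the use of a normal-approximation 95% confidence-interval half-width on the absolute errors as a stand-in for CoGEE's confidence-interval objective. CoGEE searches with MOEAs such as NSGA-II; the exhaustive scan over a small grid of weights is a deliberate simplification.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical (size, actual_effort) pairs; illustrative only, not PROMISE data.
PROJECTS: List[Tuple[float, float]] = [
    (10, 52.0), (20, 95.0), (35, 180.0), (50, 240.0), (65, 330.0),
]


def objectives(weight: float) -> Tuple[float, float]:
    """Both objectives are minimised for the toy model effort = weight * size.

    Assumed stand-ins: mean absolute error (MAE) and the half-width of a
    normal-approximation 95% confidence interval on the absolute errors.
    """
    errors = [abs(effort - weight * size) for size, effort in PROJECTS]
    n = len(errors)
    mae = sum(errors) / n
    sd = math.sqrt(sum((e - mae) ** 2 for e in errors) / (n - 1))
    return mae, 1.96 * sd / math.sqrt(n)


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Pareto dominance for minimisation: p is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))


def pareto_front(candidates: Sequence[float]) -> List[Tuple[float, Tuple[float, float]]]:
    # Score every candidate, then keep only those no other candidate dominates.
    scored = [(w, objectives(w)) for w in candidates]
    return [(w, f) for w, f in scored if not any(dominates(g, f) for _, g in scored)]


# Exhaustive scan over hypothetical weights, standing in for an evolutionary search.
CANDIDATES = [4.0 + 0.1 * i for i in range(21)]
for w, (mae, ci) in pareto_front(CANDIDATES):
    print(f"weight={w:.1f}  MAE={mae:.2f}  CI half-width={ci:.2f}")
```

Returning a front rather than a single best model is the key design choice: a practitioner can then pick the accuracy/uncertainty trade-off that suits the project, which a single-objective search would hide.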
- …