821 research outputs found

A Framework for Genetic Algorithms Based on Hadoop

Author: Ferrucci F, Kechadi M, Salza P, Sarro F, others
Publication date: 01/01/2013

Genetic Algorithms (GAs) are powerful metaheuristic techniques used in many real-world applications. The sequential execution of GAs requires considerable computational power in both time and resources. Nevertheless, GAs are naturally parallel, and accessing a parallel platform such as the Cloud is easy and cheap. Apache Hadoop is one of the common services that can be used for parallel applications. However, developing a parallel version of GAs on Hadoop is not simple without dealing with its inner workings. Even though some sequential frameworks for GAs already exist, there is no framework supporting the development of GA applications that can be executed in parallel. This paper describes a framework for parallel GAs on the Hadoop platform, following the MapReduce paradigm. The main purpose of this framework is to allow the user to focus on the aspects of the GA that are specific to the problem to be addressed, with the assurance that the task will be executed correctly on the Cloud with good performance. The framework has also been used to develop an application for the Feature Subset Selection problem. A preliminary analysis of the developed GA application, performed on three datasets, has shown very promising performance

arXiv.org e-Print Archive

UCL Discovery

Archivio della Ricerca - Università di Salerno

Search-Based Software Engineering in the Era of Modern Software Systems

Author: Sarro F
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2023

This short paper accompanies the keynote given by Federica Sarro at the 31st IEEE International Requirements Engineering Conference, Hanover, Germany, September 2023

UCL Discovery

Did You Do Your Homework? Raising Awareness on Software Fairness and Discrimination

Author: Hort M, Sarro F
Publication venue: Automated Software Engineering
Publication date: 20/11/2021

Machine Learning is a vital part of various modern-day decision-making software. At the same time, it has been shown to exhibit bias, which can cause unjust treatment of individuals and population groups. One method to achieve fairness in machine learning software is to provide individuals with the same degree of benefit, regardless of sensitive attributes (e.g., students receive the same grade, independent of their sex or race). However, there can be other attributes that one might want to discriminate against (e.g., students with homework should receive higher grades). We will call such attributes anti-protected attributes. When reducing the bias of machine learning software, one risks losing the discriminatory behaviour of anti-protected attributes. To combat this, we use grid search to show that machine learning software can be debiased (e.g., reduce gender bias) while also improving the ability to discriminate against anti-protected attributes

UCL Discovery

Guest editorial: Special section on Search-based Software Engineering track at GECCO 2018

Author: Antoniol G, Sarro F
Publication date: 01/02/2020

UCL Discovery

The Effect of Offspring Population Size on NSGA-II: A Preliminary Study

Author: Hort M, Sarro F
Publication venue: Genetic and Evolutionary Computation Conference
Publication date: 14/07/2021

Non-Dominated Sorting Genetic Algorithm (NSGA-II) is one of the most popular Multi-Objective Evolutionary Algorithms (MOEA) and has been applied to a large range of problems. Previous studies have shown that parameter tuning can improve NSGA-II performance. However, the tuning of the offspring population size, which guides the exploration-exploitation trade-off in NSGA-II, has been overlooked so far. Previous work has generally used the population size as the default offspring population size for NSGA-II. We therefore investigate the impact of offspring population size on the performance of NSGA-II. We carry out an empirical study by comparing the effectiveness of three configurations vs. the default NSGA-II configuration on six optimization problems based on four Pareto front quality indicators and statistical tests. Our findings show that the performance of NSGA-II can be improved by reducing the offspring population size and in turn increasing the number of generations. This leads to similar or statistically significant better results than those obtained by using the default NSGA-II configuration in 92% of the experiments performed
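
The trade-off described in this abstract can be illustrated with simple arithmetic: under a fixed budget of fitness evaluations, a smaller offspring population leaves room for more generations. The Python sketch below assumes a hypothetical budget of 10,000 evaluations and a parent population of 100. Neither value, nor the function name or the cost accounting, comes from the paper.

```python
def generations_for_budget(budget: int, population_size: int, offspring_size: int) -> int:
    """Number of generations affordable under a fixed evaluation budget.

    Assumes the initial population costs `population_size` evaluations and
    each generation costs `offspring_size` evaluations. This accounting is
    an illustrative assumption, not taken from the paper.
    """
    if population_size <= 0 or offspring_size <= 0:
        raise ValueError("sizes must be positive")
    remaining = budget - population_size
    return max(remaining, 0) // offspring_size


# Hypothetical settings: 10,000 evaluations in total, 100 parents.
for offspring in (100, 50, 25, 10):
    gens = generations_for_budget(10_000, 100, offspring)
    print(f"offspring={offspring:3d} -> generations={gens}")
```

Under these assumed numbers, halving the offspring population roughly doubles the number of generations that fit in the same budget.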

UCL Discovery

Multi-objective software effort estimation

Author: Harman M, Petrozziello A, Sarro F
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2016

We introduce a bi-objective effort estimation algorithm that combines Confidence Interval Analysis and assessment of Mean Absolute Error. We evaluate our proposed algorithm on three different alternative formulations, baseline comparators and current state-of-the-art effort estimators applied to five real-world datasets from the PROMISE repository, involving 724 different software projects in total. The results reveal that our algorithm outperforms the baseline, state-of-the-art and all three alternative formulations, statistically significantly (p < 0.001) and with large effect size (Â12 ≥ 0.9) over all five datasets. We also provide evidence that our algorithm creates a new state-of-the-art, which lies within currently claimed industrial human-expert-based thresholds, thereby demonstrating that our findings have actionable conclusions for practicing software engineers
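
As a rough illustration of the two ingredients a bi-objective estimator needs, the Python sketch below computes the Mean Absolute Error of a set of predictions and filters candidate solutions down to the non-dominated (Pareto) set when both objectives are minimised. It is a generic sketch, not the paper's algorithm: the function names, the effort values, and the second objective (a placeholder standing in for a confidence-interval measure) are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over all projects."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if `a` is no worse than `b` on both minimised objectives
    and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical effort values (e.g. person-hours) for three projects.
print(mean_absolute_error([10, 20, 30], [12, 18, 33]))

# Hypothetical candidates as (error objective, placeholder second objective).
candidates = [(2.0, 5.0), (3.0, 4.0), (3.0, 6.0), (1.5, 7.0)]
print(pareto_front(candidates))
```

In this made-up data, the candidate (3.0, 6.0) is dropped because (2.0, 5.0) is better on both objectives. The remaining three each represent a different trade-off.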

UCL Discovery

Agile Effort Estimation: Have We Solved the Problem Yet? Insights from the Replication of the GPT2SP Study

Author: Moussa R, Sarro F, Tawosi V
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/07/2024

Replication studies in Software Engineering are indispensable for ensuring the reliability, generalizability, and transparency of research findings. They contribute to the cumulative growth of knowledge in the field and promote a scientific approach that benefits both researchers and practitioners. In this article, we report our experience replicating a recently published work proposing a Transformer-based approach for Agile Story Point Estimation, dubbed GPT2SP. GPT2SP was proposed with the intent of addressing the three limitations of a previous Deep Learning-based approach dubbed Deep-SE, and the results reported in the original study set GPT2SP as the new state-of-the-art. However, when we used the GPT2SP source code made publicly available by the authors of the original study, we found a bug in the computation of the evaluation measure and the reuse of erroneous results from previous work, which had unintentionally introduced biases in GPT2SP's performance evaluation. In this study, we report on the results we obtained after fixing the issues present in the original study, which reveal that their results were in fact unintentionally inflated due to these issues and that, despite advancements, challenges remain in providing accurate effort estimations for agile software projects

UCL Discovery

Predictive analytics for software testing: Keynote paper

Author: Sarro F
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 28/05/2018

This keynote discusses the use of Predictive Analytics for Software Engineering, and in particular for Software Defect Prediction and Software Testing, by presenting the latest results achieved in these fields leveraging Artificial Intelligence, Search-based and Machine Learning methods, and by giving some directions for future work

UCL Discovery

Multi-Objective Software Effort Estimation: A Replication Study

Author: Harman M, Petrozziello A, Sarro F, Tawosi V
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2021

Replication studies increase our confidence in previous results when the findings are similar each time, and help mature our knowledge by addressing both internal and external validity aspects. However, these studies are still rare in certain software engineering fields. In this paper, we replicate and extend a previous study, which denotes the current state-of-the-art for multi-objective software effort estimation, namely CoGEE. We investigate the original research questions with an independent implementation and the inclusion of a more robust baseline (LP4EE), carried out by the first author, who was not involved in the original study. Through this replication, we strengthen both the internal and external validity of the original study. We also answer two new research questions investigating the effectiveness of CoGEE by using four additional evolutionary algorithms (i.e., IBEA, MOCell, NSGA-III, SPEA2) and a well-known Java framework for evolutionary computation, namely JMetal (rather than the previously used R software), which allows us to strengthen the external validity of the original study. The results of our replication confirm that: (1) CoGEE outperforms both baseline and state-of-the-art benchmarks statistically significantly (p < 0.001); (2) CoGEE’s multi-objective nature makes it able to reach such a good performance; (3) CoGEE’s estimation errors lie within claimed industrial human-expert-based thresholds. Moreover, our new results show that the effectiveness of CoGEE is generally not limited to nor dependent on the choice of the multi-objective algorithm. Using CoGEE with either NSGA-II, NSGA-III, or MOCell produces human-competitive results in less than a minute. The Java version of CoGEE has decreased the running time by over 99.8% with respect to its R counterpart. We have made the Java code of CoGEE publicly available to ease its adoption, as well as the data used in this study, to allow for future replication and extension of our work

UCL Discovery