Scaling Genetic Algorithms to Large Distributed Datasets

Authors: Laila Alterkawi
Publication date: 27 September 2022

Abstract

Analysing large-scale data promises new levels of scientific discovery and economic value. However, data at this volume is by its nature distributed, and computational methods must remain effective as data complexity and size change significantly; together these factors create the need for large-scale data analytics. Genetic algorithms (GAs) have proven their flexibility in many application areas, and substantial research has been dedicated to improving their performance through parallelisation. In contrast with most previous efforts, we reject approaches that centralise data in the main memory of a single node or that require remote access to shared/distributed memory. We focus instead on scenarios where data is partitioned across machines. In this partitioned scenario, we explore two parallelisation models: PDMS, inspired by the traditional master-slave model, and PDMD, based on island models. We adopt the two models to distribute BioHEL, a popular large-scale single-node GA classifier, using the Spark distributed data processing platform. We investigate the effect of GA control parameters (population size and migration frequency). We study the accuracy, time performance and scalability of the proposed models. Our results show that our distributed genetic algorithm design provides a good tradeoff between accuracy and time
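
As an illustration of the general master-slave idea over partitioned data, the following minimal Python sketch evaluates a GA population without moving any rows off their partition: each worker counts correct predictions on its local rows, and the master only sums those counts. This is not the thesis's PDMS implementation and does not use Spark or BioHEL; the threshold-rule individual, the function names partial_counts and evaluate, and the toy data are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical data layout: each partition is a list of (features, label) rows
# that would stay on its own machine; here partitions are simulated as local lists.
Row = Tuple[Tuple[float, ...], int]
# Hypothetical individual: a threshold rule "predict 1 if x[0] > t, else 0".
Individual = float


def partial_counts(partition: List[Row], population: List[Individual]) -> List[int]:
    """Worker side: count correct predictions per individual on local rows only."""
    counts = []
    for threshold in population:
        correct = sum(1 for x, y in partition if int(x[0] > threshold) == y)
        counts.append(correct)
    return counts


def evaluate(partitions: List[List[Row]], population: List[Individual]) -> List[float]:
    """Master side: sum the per-partition counts, then normalise to accuracy."""
    total_rows = sum(len(p) for p in partitions)
    totals = [0] * len(population)
    for part in partitions:  # on a cluster, each partition would be processed in parallel
        for i, c in enumerate(partial_counts(part, population)):
            totals[i] += c
    return [t / total_rows for t in totals]


# Toy data: two partitions, three candidate thresholds.
partitions = [
    [((0.1,), 0), ((0.9,), 1)],
    [((0.4,), 0), ((0.7,), 1), ((0.2,), 0)],
]
population = [0.5, 0.0, 0.8]
fitness = evaluate(partitions, population)
```

Only one integer per individual per partition crosses the network in this scheme, which is why the approach suits data that cannot be centralised. An island-style variant would instead evolve a separate sub-population on each partition and exchange individuals at a chosen migration frequency.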

Available Versions

Kent Academic Repository

oai:kar.kent.ac.uk:97569

Last time updated on 27/10/2022