6 research outputs found

Machine Learning-Based Surrogate Model for Genetic Algorithm with Aggressive Mutation for Feature Selection

Author: Chevallier Marc
Clairmont Charly
Publication venue: HAL CCSD
Publication date: 24/06/2024

The genetic algorithm with aggressive mutations (GAAM) is a specialised algorithm for feature selection. It is dedicated to selecting a small number of features and allows the user to specify the maximum number of features desired. A major obstacle to its use is its high computational cost, which increases significantly with the number of dimensions to be retained. To solve this problem, we introduce a surrogate model based on machine learning, which reduces the number of evaluations of the fitness function by an average of 48% on the datasets tested, using the standard parameters specified in the original paper. Additionally, we experimentally demonstrate that eliminating the crossover step in the original algorithm does not result in any visible changes in the algorithm’s results. We also demonstrate that the original algorithm uses an artificially complex mutation method that could be replaced by a simpler method without loss of efficiency. Together, these improvements resulted in an average reduction of 53% in the number of evaluations of the fitness function. Finally, we show that these outcomes apply to parameters beyond those used in the original paper, while still achieving a comparable decrease in the number of fitness function evaluations. Tests were conducted on 9 datasets of varying dimensions, using two different classifiers.

HAL-Paris 13
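
As a rough illustration of the surrogate idea in the GAAM abstract above, the sketch below lets a cheap predictor rank candidate feature subsets so that only the most promising ones reach the expensive fitness function. The abstract does not say which surrogate model the authors use; the nearest-neighbour predictor, the toy objective, and the names `true_fitness`, `surrogate_predict`, `evaluate_population` and `keep_ratio` are all assumptions made for this example.

```python
def true_fitness(features, counter):
    """Stand-in for an expensive classifier evaluation (hypothetical toy objective)."""
    counter[0] += 1  # count how many real evaluations are performed
    # Assumption: features 0, 1 and 2 are the informative ones.
    return len(set(features) & {0, 1, 2}) / 3


def surrogate_predict(features, archive):
    """1-nearest-neighbour surrogate: reuse the fitness of the most similar
    already-evaluated individual, using Jaccard similarity on feature sets."""
    fs = set(features)

    def similarity(item):
        other = set(item[0])
        return len(fs & other) / len(fs | other)

    return max(archive, key=similarity)[1]


def evaluate_population(population, archive, counter, keep_ratio=0.5):
    """Run the real fitness only on the candidates the surrogate ranks highest."""
    if not archive:
        chosen = list(population)  # nothing to learn from yet: evaluate everything
    else:
        ranked = sorted(
            population,
            key=lambda ind: surrogate_predict(ind, archive),
            reverse=True,
        )
        chosen = ranked[: max(1, int(len(ranked) * keep_ratio))]
    scored = [(ind, true_fitness(ind, counter)) for ind in chosen]
    archive.extend(scored)  # the archive is the surrogate's training data
    return scored
```

With `keep_ratio=0.5`, every generation after the first triggers half as many real evaluations. This is a property of the sketch and is unrelated to the 48% figure reported in the abstract.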

Detecting Near Duplicate Dataset

Author: Boufarès Faouzi
Chevallier Marc
Clairmont Charly
Grozavu Nistor
Rogovschi Nicoleta
Publication venue: Springer Science and Business Media LLC
Publication date: 15/12/2021

HAL-Paris 13

Detecting Near Duplicate Dataset with Machine Learning

Author: Boufarès Faouzi
Chevallier Marc
Clairmont Charly
Grozavu Nistor
Rogovschi Nicoleta
Publication venue: Machine Intelligence Research Labs (MIR Labs)
Publication date: 01/01/2022

This paper introduces the concept of a near-duplicate dataset, a quasi-duplicate version of a dataset. This version has undergone an unknown number of row and column insertions and deletions (modifications of schema and instance). The concept is of interest for data exploration, data integration and data quality. To formalise these insertions and deletions, two parameters are introduced. Our technique for detecting these quasi-duplicate datasets is based on feature extraction and machine learning. The method is original because it does not rely on classical column-to-column comparison techniques but on the comparison of metadata vectors summarising the datasets. In order to train these algorithms, we introduce a method to artificially generate training data. We perform several experiments to evaluate the best parameters to use when creating training data and the performance of several classifiers. In the cases studied, these experiments lead to an accuracy higher than 95%.

HAL-Paris 13
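
To make the "metadata vectors" idea from the abstract above more concrete, here is a minimal sketch that summarises a table as a short numeric vector and compares two tables through the element-wise difference of their vectors. The abstract does not list the metadata the authors extract; the four statistics below and the names `metadata_vector` and `pair_features` are assumptions chosen for illustration. In a full pipeline, the difference vector would be passed to a trained classifier, which is omitted here.

```python
def metadata_vector(table):
    """Summarise a table (dict: column name -> list of values) as a fixed-length vector.

    Hypothetical statistics: row count, column count, share of numeric columns,
    and mean ratio of distinct values per column.
    """
    columns = [col for col in table.values() if col]
    n_cols = len(table)
    n_rows = max((len(col) for col in table.values()), default=0)
    numeric = [c for c in columns if all(isinstance(x, (int, float)) for x in c)]
    distinct = [len(set(c)) / len(c) for c in columns]
    return [
        n_rows,
        n_cols,
        len(numeric) / n_cols if n_cols else 0.0,
        sum(distinct) / len(distinct) if distinct else 0.0,
    ]


def pair_features(table_a, table_b):
    """Features for a pair of tables: absolute difference of their metadata vectors."""
    va, vb = metadata_vector(table_a), metadata_vector(table_b)
    return [abs(x - y) for x, y in zip(va, vb)]
```

Because only the summary vectors are compared, the cost of comparing two datasets does not depend on matching individual columns against each other.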

Trade Between Population Size and Mutation Rate for GAAM (Genetic Algorithm with Aggressive Mutation) for Feature Selection

Author: Boufarès Faouzi
Chevallier Marc
Clairmont Charly
Grozavu Nistor
Rogovschi Nicoleta
Publication venue: Springer Science and Business Media LLC
Publication date: 17/06/2022

HAL-Paris 13

Seeding Initial Population, in Genetic Algorithm for Features Selection

Author: Boufarès Faouzi
Chevallier Marc
Clairmont Charly
Grozavu Nistor
Rogovschi Nicoleta
Publication venue: Springer Science and Business Media LLC
Publication date: 16/04/2021

HAL-Paris 13

Near duplicate column identification: a machine learning approach

Author: Boufarès Faouzi
Chevallier Marc
Clairmont Charly
Grozavu Nistor
Rogovschi Nicoleta
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 05/12/2021

HAL-Paris 13