Large-dimensionality small-instance set feature selection: a hybrid bio-inspired heuristic approach

Abstract

Selecting a representative set of features remains a crucial and challenging problem in machine learning. The problem becomes harder when either of the following situations occurs: a very large number of attributes (large dimensionality) or a very small number of instances or time points (small-instance set). The first situation is a problem for machine learning algorithms because the search space of relevant feature combinations becomes impossible to explore in a reasonable time and with reasonable computational resources. The second leaves insufficient data (too few examples) to learn from. In this work, we address both issues at the same time. The methods we propose are heuristics inspired by nature (in particular, by biology). We propose a hybrid of two methods that provides good learning from fewer examples and a fair selection of features from a very large set, all while ensuring high classification accuracy on the data. The methods used are antlion optimization (ALO), grey wolf optimization (GWO), and a combination of the two (ALO-GWO). We test their performance on datasets with almost 50,000 features and fewer than 200 instances. The results look promising when compared with other methods such as genetic algorithms (GA) and particle swarm optimization (PSO).
