Comparison of machine learning algorithms used to classify the asteroids observed by all-sky surveys

Carry, B.; Klimczak, H.; Kotlowski, W.; Kryszczynska, A.; Oszkiewicz, Dagmara; Penttilä, Antti; Wilawer, E.

Authors: B. Carry
H. Klimczak
W. Kotlowski
A. Kryszczynska
Dagmara Oszkiewicz
Antti Penttilä
E. Wilawer
Publication date: 31 October 2022

Abstract

Context. Multifilter photometry from large sky surveys is commonly used to assign asteroid taxonomic types and study various problems in planetary science. To maximize the science output of those surveys, it is important to use methods that best link the spectro-photometric measurements to asteroid taxonomy.

Aims. We aim to determine which machine learning methods are the most suitable for taxonomic classification in various sky surveys.

Methods. We used five supervised machine learning classifiers: logistic regression, naive Bayes, support vector machines (SVMs), gradient boosting, and multilayer perceptrons (MLPs). Those methods were found to reproduce the Bus-DeMeo taxonomy at various rates depending on the set of filters used by each survey. We report several evaluation metrics for a comprehensive comparison (prediction accuracy, balanced accuracy, F1 score, and the Matthews correlation coefficient) for 11 surveys and space missions.

Results. Among the methods analyzed, the multilayer perceptron and gradient boosting achieved the highest accuracy, and naive Bayes achieved the lowest accuracy, in taxonomic prediction across all surveys. We found that selecting the right machine learning algorithm can improve the success rate by a factor of >2. The best balanced accuracy (~85% for taxonomic type prediction) was found for the Visible and Infrared Survey Telescope for Astronomy (VISTA) and the ESA Euclid mission surveys, where broadband filters best map the 1 μm and 2 μm olivine and pyroxene absorption bands.

Conclusions. To achieve the highest accuracy in the taxonomic type prediction based on multifilter photometric measurements, we recommend the use of gradient boosting and MLP optimized for each survey. This can improve the overall success rate even when compared with naive Bayes. A merger of different datasets can further boost the prediction accuracy. For the combination of the Legacy Survey of Space and Time and VISTA survey, we achieved 90% for the taxonomic type prediction.

Peer reviewed
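The four evaluation metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function name `classification_metrics`, the toy labels, and the choice of macro averaging for the F1 score are assumptions made for illustration only (the abstract does not state which averaging was used). The Matthews correlation coefficient is computed with the usual multiclass generalization.

```python
import math
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy, balanced accuracy, macro F1 and multiclass MCC.

    Hypothetical helper for illustration; labels can be any hashable values
    (for example taxonomic type letters).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    true_counts = Counter(y_true)   # objects that really belong to each class
    pred_counts = Counter(y_pred)   # objects assigned to each class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    correct = sum(hits.values())

    # Plain accuracy: fraction of correct predictions.
    accuracy = correct / n

    # Balanced accuracy: mean per-class recall over the classes present in y_true,
    # so rare classes weigh as much as common ones.
    balanced = sum(hits[k] / true_counts[k] for k in true_counts) / len(true_counts)

    # Macro F1 (assumed averaging): unweighted mean of per-class F1 scores.
    labels = set(true_counts) | set(pred_counts)
    f1_values = []
    for k in labels:
        tp = hits[k]
        fp = pred_counts[k] - tp
        fn = true_counts[k] - tp
        denom = 2 * tp + fp + fn
        f1_values.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_values) / len(f1_values)

    # Multiclass Matthews correlation coefficient; defined as 0 when the
    # denominator vanishes (e.g. only one class is ever predicted).
    cov = correct * n - sum(pred_counts[k] * true_counts[k] for k in labels)
    var_pred = n * n - sum(v * v for v in pred_counts.values())
    var_true = n * n - sum(v * v for v in true_counts.values())
    denom = math.sqrt(var_pred * var_true)
    mcc = cov / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced,
        "macro_f1": macro_f1,
        "mcc": mcc,
    }


# Toy, made-up labels (not survey data): an imbalanced three-class sample.
y_true = ["S", "S", "S", "S", "C", "C", "V"]
y_pred = ["S", "S", "S", "C", "C", "C", "S"]
for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

On imbalanced samples such as this toy one, plain accuracy is dominated by the most common class, whereas balanced accuracy and the Matthews correlation coefficient penalize a classifier that never recovers a rare class. This is one reason for reporting several metrics side by side.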