Fermi LAT AGN classification using supervised machine learning
Classifying Active Galactic Nuclei (AGN) is a challenge, especially for BL
Lac Objects (BLLs), which are identified by their weak emission line spectra.
To address the problem of classification, we use data from the 4th Fermi
Catalog, Data Release 3. Missing data hinders the use of machine learning to
classify AGN. A previous paper found that Multiple Imputation by Chained
Equations (MICE) is useful for estimating missing values. Since many
AGN are missing values for redshift and highest energy, we impute these
variables with MICE and the k-nearest neighbor (kNN) algorithm.
Then, we classify AGN as either BLLs or Flat Spectrum Radio Quasars (FSRQs)
using the SuperLearner, an ensemble method that combines several classification
algorithms, including logistic regression, support vector classifiers, Random
Forests, Ranger Random Forests, multivariate adaptive regression splines (MARS),
Bayesian regression, and Extreme Gradient Boosting. We find that a SuperLearner
model combining MARS and Random Forests is 91.1% accurate for
kNN-imputed data and 91.2% for MICE-imputed data. Furthermore, the kNN-imputed
SuperLearner model predicts that 892 of the 1519 unclassified blazars are BLLs
and 627 are FSRQs, while the MICE-imputed
SuperLearner model predicts 890 BLLs and 629 FSRQs in the unclassified set.
Thus, we conclude that both imputation methods are efficient and yield
high accuracy, and that our methodology paves the way for using SuperLearner as
a novel classification method in the AGN community and in the astrophysics
community more broadly.
Comment: 15 pages, 8 figures, to be published in Monthly Notices of the Royal
Astronomical Society
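
The kNN imputation step described in the abstract can be illustrated with a
minimal sketch. This is not the authors' pipeline: the function name
`knn_impute`, the distance rule (root-mean-square difference over the columns
observed in both rows), and the toy values are all assumptions made for
illustration, and real catalog features would normally be rescaled first.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows.

    Hypothetical helper for illustration; rows is a list of equal-length
    lists of floats, with None marking a missing value.
    """
    def distance(a, b):
        # Compare only columns observed in both rows, and average so that
        # pairs sharing different numbers of columns stay comparable.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]  # copy, so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where column j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] < math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Toy values, not taken from the catalog.
# Columns: (feature_a, feature_b, redshift).
toy = [
    [1.0, 2.0, 0.10],
    [1.1, 2.1, 0.20],
    [5.0, 6.0, 1.50],
    [1.05, 2.05, None],  # redshift missing
]
completed = knn_impute(toy, k=2)
print(completed[3])
```

The missing redshift in the last row is filled from the two rows closest to it
in the observed features, so the distant third row does not contribute. The
completed table could then be passed to any classifier.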