Classifying Unidentified X-ray Sources in the Chandra Source Catalog Using a Multiwavelength Machine-learning Approach
The rapid increase in serendipitous X-ray source detections requires the
development of novel approaches to efficiently explore the nature of X-ray
sources. If even a fraction of these sources could be reliably classified, it
would enable population studies for various astrophysical source types on a
much larger scale than currently possible. Classification of large numbers of
sources from multiple classes characterized by multiple properties (features)
must be done automatically, and supervised machine learning (ML) seems to
provide the only feasible approach. We perform classification of Chandra Source
Catalog version 2.0 (CSCv2) sources to explore the potential of the ML approach
and identify various biases, limitations, and bottlenecks that present
themselves in these kinds of studies. We establish the framework and present a
flexible and expandable Python pipeline, which can be used and improved by
others. We also release the training data set of 2941 X-ray sources with
confidently established classes. In addition to providing probabilistic
classifications of 66,369 CSCv2 sources (21% of the entire CSCv2 catalog), we
perform several narrower-focused case studies (high-mass X-ray binary
candidates and X-ray sources within the extent of the H.E.S.S. TeV sources) to
demonstrate some possible applications of our ML approach. We also discuss
future possible modifications of the presented pipeline, which are expected to
lead to substantial improvements in classification confidences.
Comment: Published in ApJ 941, 104 (2022). The data and software used in this
paper are available at https://github.com/huiyang-astro/MUWCLASS_CSCv2 and
also on VizieR at https://cdsarc.cds.unistra.fr/viz-bin/cat/J/ApJ/941/104. The
interactive training dataset visualization is available at
https://home.gwu.edu/~kargaltsev/XCLASS