At present, the field of astronomical machine learning lacks widely used
benchmarking datasets; most research employs custom-made datasets that are
often not publicly released, making comparisons between models difficult. In
this paper we present CRUMB, a publicly available image dataset of
Fanaroff-Riley galaxies constructed from four "parent" datasets extant in the
literature. In addition to providing the largest image dataset of these
galaxies, CRUMB uses a two-tier labelling system: a "basic" label for
classification and a "complete" label which provides the original class labels
used in the four parent datasets. This preserves disagreements in an image's
class between different datasets and enables selective access to
sources from any desired combination of the parent datasets.

Comment: Accepted in Machine Learning and the Physical Sciences Workshop at
NeurIPS 2023; 6 pages, 1 figure, 1 table