The human gut microbiome is associated with a large number of disease
etiologies. As such, it is a natural candidate for machine-learning-based
biomarker development for multiple diseases and conditions. The microbiome is
often analyzed using 16S rRNA gene sequencing. However, several properties of
microbial 16S rRNA gene sequencing hinder machine learning, including
non-uniform representation, a small number of samples relative to the
dimension of each sample, and data sparsity, with most bacteria present in
only a small subset of samples. We propose two novel methods that use
bacterial taxonomy to combine information from different bacteria and improve
data representation for machine learning. iMic and gMic translate the
microbiome to images and graphs, respectively, and convolutional neural networks
are then applied to the graph or image. We show that both algorithms improve
the performance of static 16S rRNA gene sequence-based machine learning compared
with state-of-the-art methods. Furthermore, these methods ease the
interpretation of the classifiers. iMic is then extended to dynamic microbiome
samples, and an iMic explainable AI algorithm is proposed to detect bacteria
relevant to each condition.

Comment: 19 pages of manuscript, 3 figures, and 4 pages of Supp. Ma