Image and graph convolution networks improve microbiome-based machine learning accuracy

Abstract

The human gut microbiome is associated with a large number of disease etiologies. As such, it is a natural candidate for machine-learning-based biomarker development for multiple diseases and conditions. The microbiome is often analyzed using 16S rRNA gene sequencing. However, several properties of microbial 16S rRNA gene sequencing data hinder machine learning: non-uniform representation, a small number of samples compared with the dimension of each sample, and sparsity, with the majority of bacteria present in only a small subset of samples. We suggest two novel methods that use bacterial taxonomy to combine information from different bacteria and improve data representation for machine learning. iMic and gMic translate the microbiome to images and graphs, respectively, and convolutional neural networks are then applied to the image or graph. We show that both algorithms improve the performance of static 16S rRNA gene sequence-based machine learning compared to the best state-of-the-art methods. Furthermore, these methods ease the interpretation of the classifiers. iMic is then extended to dynamic microbiome samples, and an iMic explainable AI algorithm is proposed to detect bacteria relevant to each condition.

Comment: 19 pages of manuscript, 3 figures, and 4 pages of Supp. Ma
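The abstract does not spell out how a microbiome sample becomes an image, so the following is only a minimal sketch of one plausible taxonomy-based layout, not the authors' iMic implementation. It assumes that each row of the image is a taxonomic level, each column is a leaf taxon, and each cell holds the mean abundance of the leaves sharing that ancestor. The function name `taxonomy_to_image`, the averaging rule, and the toy abundances are all hypothetical.

```python
# Illustrative sketch only: the layout and averaging rule are assumptions,
# not the construction used in the paper.

# Hypothetical toy sample: relative abundance per taxon, keyed by a
# "phylum;class;genus" taxonomy path.
sample = {
    "Firmicutes;Clostridia;Faecalibacterium": 0.4,
    "Firmicutes;Clostridia;Roseburia": 0.2,
    "Firmicutes;Bacilli;Lactobacillus": 0.1,
    "Bacteroidetes;Bacteroidia;Bacteroides": 0.3,
}


def taxonomy_to_image(abundances, sep=";"):
    """Build a (levels x leaves) nested list from taxonomy paths.

    Cell [level][j] is the mean abundance of all leaves that share
    leaf j's ancestor at that taxonomic level.
    """
    # Sort by the split path so that sibling taxa occupy adjacent columns.
    paths = sorted(abundances, key=lambda p: p.split(sep))
    split = [tuple(p.split(sep)) for p in paths]
    depth = len(split[0])
    if any(len(s) != depth for s in split):
        raise ValueError("all taxonomy paths must have the same depth")

    values = [abundances[p] for p in paths]
    image = []
    for level in range(depth):
        # The ancestor of each leaf at this level is its path prefix.
        prefixes = [s[: level + 1] for s in split]
        row = []
        for prefix in prefixes:
            members = [v for v, q in zip(values, prefixes) if q == prefix]
            row.append(sum(members) / len(members))
        image.append(row)
    return image


image = taxonomy_to_image(sample)
for row in image:
    print([round(v, 3) for v in row])
```

Because neighbouring columns correspond to taxonomically related bacteria, a 2D convolution over such an array would pool information across relatives, which is the general idea the abstract describes. Sparse leaf-level values are also smoothed at the higher levels by the averaging.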
