31 research outputs found

    MPLasso: Inferring microbial association networks using prior microbial knowledge

    <div><p>Owing to recent advances in high-throughput sequencing technologies, it has become possible to analyze microbial communities in the human body and the environment directly. To understand how microbial communities adapt, develop, and interact with the human body and the surrounding environment, one of the fundamental challenges is to infer the interactions among different microbes. However, because microbial data are compositional and high-dimensional, conventional statistical inference cannot offer reliable results. Consequently, new approaches are needed that can accurately and robustly estimate the associations (putative interactions) among microbes from such data. We propose a novel framework called Microbial Prior Lasso (MPLasso), which integrates a graph learning algorithm with microbial co-occurrences and associations obtained from the scientific literature by automated text mining. We show that MPLasso outperforms existing models in terms of accuracy, microbial network recovery rate, and reproducibility. Furthermore, the association networks we obtain from the Human Microbiome Project datasets show credible results when compared against laboratory data.</p></div>

    Association network visualization of top degree nodes at different human body sites for different data types.

    <p>Nodes of the same color belong to the same community. For BucMuc: (a) HMASM, (b) HMMCP, and (c) HMQCP. For SupPla: (d) HMASM, (e) HMMCP, and (f) HMQCP. For TonDor: (g) HMASM, (h) HMMCP, and (i) HMQCP. As the species-level data (HMASM) show, phylogenetically related OTUs fall in the same community. Node size represents the relative node degree within the association network, and nodes are laid out counterclockwise. Each edge has the same color as its node; edge color carries no additional meaning. Abbreviations: BucMuc: Buccal mucosa, SupPla: Supragingival plaque, TonDor: Tongue dorsum.</p>

    Our proposed framework for inferring microbial association networks.

    <p>We conduct two sets of experiments, one on synthetic data and one on real data. For the synthetic experiments, we generate data based on different graph structures and evaluate the performance of our proposed algorithm using three performance metrics (i.e., <i>L</i><sub>1</sub>, ACC, and AUPR). For the real data experiments, the prior information is obtained through automated text mining. Since there is no “gold standard” network against which to evaluate performance, we evaluate the reproducibility of the inferred networks instead.</p>
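The three synthetic-data metrics named in this caption could be computed along the following lines. This is a minimal sketch, not the authors' code. The function names, the tolerance `tol`, and the exact definitions are all assumptions made for illustration: the <i>L</i><sub>1</sub> distance is taken entrywise between the estimated and true matrices, ACC is taken over off-diagonal node pairs, and AUPR is approximated by average precision.

```python
from itertools import combinations


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def l1_distance(estimated, truth):
    """Entrywise L1 distance between two matrices (assumed definition)."""
    return sum(
        abs(e - t)
        for row_e, row_t in zip(estimated, truth)
        for e, t in zip(row_e, row_t)
    )


def edge_accuracy(estimated, truth, tol=1e-8):
    """Fraction of node pairs whose edge / no-edge status is recovered."""
    est = [abs(v) > tol for v in upper_triangle(estimated)]
    tru = [abs(v) > tol for v in upper_triangle(truth)]
    return sum(e == t for e, t in zip(est, tru)) / len(tru)


def aupr(estimated, truth, tol=1e-8):
    """Average precision, used here as an approximation of the area under
    the precision-recall curve. Node pairs are ranked by |estimated entry|;
    tied scores keep their original order."""
    scores = [abs(v) for v in upper_triangle(estimated)]
    labels = [abs(v) > tol for v in upper_triangle(truth)]
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("the true network has no edges")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, is_edge) in enumerate(ranked, start=1):
        if is_edge:
            hits += 1
            total += hits / rank  # precision at each true edge
    return total / n_pos


# Hypothetical 3-node example: true edges are (0, 1) and (1, 2).
truth = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.0]]
estimated = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.0], [0.1, 0.0, 1.0]]

print(l1_distance(estimated, truth))
print(edge_accuracy(estimated, truth))
print(aupr(estimated, truth))
```

With no gold-standard network available for the real data, none of these metrics applies there, which is why the caption falls back on reproducibility.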

    Performance with different amounts of prior information on three different graph structures.

    <p>(a) <i>L</i><sub>1</sub> distance; (b) ACC; (c) AUPR.</p>

    Reproducibility for MPLasso, SPIEC (gl), and CCLasso at different body sites of different types of HMP datasets.


    Runtime and perplexity comparisons among deep-learning methods using the Twitter dataset.

    <p>Baseline: the convolutional autoencoder [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0118309#pone.0118309.ref010" target="_blank">10</a>]; LS: RCBM with step sizes determined by line search; Proposed: RCBM with step sizes determined by <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0118309#pone.0118309.e021" target="_blank">Equation 9</a>. On the left panel, the dashed and dotted lines mark the runtimes of one and three days, respectively.</p>

    Runtime and perplexity comparisons against non-deep-learning methods using the Twitter dataset.

    <p>SSM: State Space Model; SC: Sparse Coding; RCBM-FS: the proposed method. On the left panel, the dashed line marks the runtime of one day.</p>

    Forecasting error (via RMSE) of various models and features using the Twitter dataset.

    <p>The models are VARMA and SVR. The features are the raw social dynamics (Baseline), level-1 activation vectors (H1), level-2 activation vectors (H2), and both levels of activation vectors (H1 + H2) of a 2-level RCBM. The forecasting accuracy obtained with a 1-level RCBM that has twice as many filters is denoted (H1<sup>2</sup>).</p>

    Comparison of our proposed MPLasso and graphical Lasso (GLasso) on inferring a network from the same compositional data in a small example.

    <p>(a) The edges of the true network are shown as red lines. (b) The entries of the compositional data matrix; denser colors represent higher values. (c) Given a prior network in which blue edges are correct information and black edges are wrong information, MPLasso still estimates the graph structure accurately, with one missing edge and only one wrongly estimated edge (black edge). (d) GLasso estimates several edges wrongly and misses others.</p>

    CBM’s generation process.

    <p><i>W</i>: filter matrices; <i>h</i>: activation vectors; <i>X</i>: social dynamic. The filters <i>W</i><sub>1</sub> and <i>W</i><sub>2</sub> are activated differently depending on their corresponding activation vectors <i>h</i><sub>1</sub> and <i>h</i><sub>2</sub>, respectively.</p>