54 research outputs found

Comparison of the numbers of proteins in top 20 eggNOG families receiving the most hits of proteins identified in the SD3 sample by the graph-centric approach (Graph2Pro, blue) and the conventional approach (FragGeneScan, red).

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

The Francis Crick Institute

Summary of the assemblies for three data sets used in the benchmarking experiments.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

The Francis Crick Institute

Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences

Author: Dazhi Jiao (392777)
Haixu Tang (16054)
Yuzhen Ye (4283)
Publication date: 01/01/2013

Shotgun metagenomics has been applied to the studies of the functionality of various microbial communities. As a critical analysis step in these studies, biological pathways are reconstructed based on the genes predicted from metagenomic shotgun sequences. Pathway reconstruction provides insights into the functionality of a microbial community and can be used for comparing multiple microbial communities. The utilization of pathway reconstruction, however, can be jeopardized because of imperfect functional annotation of genes, and ambiguity in the assignment of predicted enzymes to biochemical reactions (e.g., some enzymes are involved in multiple biochemical reactions). Considering that metabolic functions in a microbial community are carried out by many enzymes in a collaborative manner, we present a probabilistic sampling approach to profiling functional content in a metagenomic dataset, by sampling functions of catalytically promiscuous enzymes within the context of the entire metabolic network defined by the annotated metagenome. We test our approach on metagenomic datasets from environmental and human-associated microbial communities. The results show that our approach provides a more accurate representation of the metabolic activities encoded in a metagenome, and thus improves the comparative analysis of multiple microbial communities. In addition, our approach reports likelihood scores of putative reactions, which can be used to identify important reactions and metabolic pathways that reflect the environmental adaptation of the microbial communities. Source code for sampling metabolic networks is available online at http://omics.informatics.indiana.edu/mg/MetaNetSam/.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute
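
The abstract of "Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences" describes sampling the functions of catalytically promiscuous enzymes in the context of the whole metabolic network. The following is a minimal sketch of that general idea, not the authors' method: it runs a Metropolis sampler over which candidate reactions of a promiscuous enzyme are active and reports how often each one is switched on. The toy data (REACTIONS, FIXED, PROMISCUOUS), the scoring rule in log_score, and the function names are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy data: reaction id -> metabolites it involves.
REACTIONS = {
    "R1": {"A", "B"},
    "R2": {"B", "C"},
    "R3": {"X", "Y"},   # shares no metabolite with the other reactions
    "R4": {"C", "D"},
}
FIXED = {"R1", "R4"}                 # reactions of unambiguous enzymes
PROMISCUOUS = {"E1": ["R2", "R3"]}   # enzyme -> candidate reactions


def log_score(active):
    """Toy score (an assumption): +1 per reaction that shares a metabolite
    with another active reaction, -1 per isolated reaction."""
    total = 0
    for r in active:
        linked = any(REACTIONS[r] & REACTIONS[o] for o in active if o != r)
        total += 1 if linked else -1
    return total


def sample_marginals(n_iter=20000, seed=0):
    """Estimate, for each candidate reaction, the fraction of sampled
    networks in which it is active."""
    rng = random.Random(seed)
    candidates = sorted({r for rs in PROMISCUOUS.values() for r in rs})
    state = {r: True for r in candidates}   # start with everything on
    counts = {r: 0 for r in candidates}

    def active(s):
        return FIXED | {r for r, on in s.items() if on}

    def valid(s):
        # every promiscuous enzyme must keep at least one reaction
        return all(any(s[r] for r in rs) for rs in PROMISCUOUS.values())

    current = log_score(active(state))
    for _ in range(n_iter):
        r = rng.choice(candidates)
        state[r] = not state[r]              # propose flipping one reaction
        accepted = False
        if valid(state):
            proposed = log_score(active(state))
            # Metropolis rule for a symmetric proposal
            if rng.random() < math.exp(min(0.0, proposed - current)):
                current = proposed
                accepted = True
        if not accepted:
            state[r] = not state[r]          # undo the flip
        for c in candidates:
            counts[c] += state[c]
    return {c: counts[c] / n_iter for c in candidates}


if __name__ == "__main__":
    print(sample_marginals())
```

In this toy network the reaction that connects to the rest of the network (R2) ends up active far more often than the isolated one (R3), which is the kind of context-dependent disambiguation the abstract refers to.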

A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)
Publication date: 01/12/2016

Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein-coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro.

Directory of Open Access Journals

The Francis Crick Institute
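
The abstract of "A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics" explains that protein sequences can be generated from the de Bruijn graph of an assembly rather than from isolated contigs. The sketch below illustrates why that can help, under simplifying assumptions: nodes are hypothetical sequence segments that overlap their successors by k-1 bases, only the forward strand is translated, and every open stretch between stop codons counts as a candidate. It is not the Graph2Pro algorithm, and the names spell_paths and candidate_proteins are invented for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate frame 0 of dna; stop codons become '*'."""
    return "".join(CODON[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def spell_paths(graph, seqs, k, start, max_nodes=4):
    """Yield the DNA spelled by each simple path that begins at `start`
    and visits at most `max_nodes` nodes."""
    stack = [(start, seqs[start], (start,))]
    while stack:
        node, dna, path = stack.pop()
        yield dna
        if len(path) < max_nodes:
            for nxt in graph.get(node, []):
                if nxt not in path:  # never revisit a node on the same path
                    # successor overlaps the current node by k-1 bases
                    stack.append((nxt, dna + seqs[nxt][k - 1:], path + (nxt,)))


def candidate_proteins(graph, seqs, k, min_len=3):
    """Collect stop-free peptide stretches from all paths and three frames."""
    proteins = set()
    for start in seqs:
        for dna in spell_paths(graph, seqs, k, start):
            for frame in range(3):
                for piece in translate(dna[frame:]).split("*"):
                    if len(piece) >= min_len:
                        proteins.add(piece)
    return proteins


# Hypothetical toy graph with k = 4: u1 branches into u2 and u3.
SEQS = {"u1": "ATGGCT", "u2": "GCTGAAAAA", "u3": "GCTTGGTAA"}
GRAPH = {"u1": ["u2", "u3"]}

if __name__ == "__main__":
    print(sorted(candidate_proteins(GRAPH, SEQS, k=4)))
```

With the edges present, sequences that span the branch (MAEK and MAW in this toy case) are recovered; translating each node on its own, by passing an empty graph, yields only their fragments.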

Network of reactions that are different in the two environments.

Author: Dazhi Jiao (392777)
Haixu Tang (16054)
Yuzhen Ye (4283)

Each vertex represents a reaction. An edge connects two vertices if the two reactions share one or more metabolites. Square-shaped vertices represent the reactions found to be different by a t-test on marginal probabilities but not by Fisher's test on the enzyme occurrences; circle-shaped vertices represent the reactions considered to be different in both statistical tests. (a) 327 reactions with higher marginal probabilities in Alaska permafrost samples; (b) 120 reactions with lower marginal probabilities in Alaska permafrost samples.

The Francis Crick Institute
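
The caption of "Network of reactions that are different in the two environments" defines the graph precisely: reactions are vertices, and two reactions are joined when they share at least one metabolite. A minimal sketch of that construction follows, together with the vertex-shape rule from the caption. The reaction and metabolite identifiers in the example are hypothetical, and the function names are chosen here for illustration.

```python
from itertools import combinations


def reaction_graph(reactions):
    """Build undirected edges between reactions that share a metabolite.

    reactions: dict mapping reaction id -> set of metabolite ids.
    Returns a set of (a, b) tuples with a < b.
    """
    return {
        (a, b)
        for a, b in combinations(sorted(reactions), 2)
        if reactions[a] & reactions[b]
    }


def vertex_shape(t_test_significant, fisher_significant):
    """Shape rule from the caption: square if only the t-test on marginal
    probabilities flags the reaction, circle if both tests do."""
    if t_test_significant and fisher_significant:
        return "circle"
    if t_test_significant:
        return "square"
    return None  # reaction is not drawn in this sketch


if __name__ == "__main__":
    toy = {  # hypothetical ids
        "Rx1": {"m1", "m2"},
        "Rx2": {"m2", "m3"},
        "Rx3": {"m9"},
    }
    print(reaction_graph(toy))
```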

An overview for protein identification using metaproteomic data, with metagenomic (MG) sequencing and metatranscriptomic (MT) data obtained from matched samples.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

We report two novel graph traversal algorithms (Graph2Pep and Graph2Pro, highlighted in red in the figure) to extract peptides and proteins, respectively, from the de Bruijn graph representation of metagenome/metatranscriptome assemblies. The same pipeline can be applied when only matched metagenomic or metatranscriptomic data (but not both) are available, in which case the graph algorithms are applied to the assembly graph of the metagenome (or metatranscriptome).

The Francis Crick Institute

Improvement of protein identification by using assembly graph.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

The Francis Crick Institute

A schematic illustration of the graph traversal algorithms for extracting tryptic peptides (Graph2Pep; A) and proteins (Graph2Pro; B) from the de Bruijn graph assembly.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

The Francis Crick Institute

Properties of the Markov chain.

Author: Dazhi Jiao (392777)
Haixu Tang (16054)
Yuzhen Ye (4283)

(a) Correlations of the probability of a reaction in consecutive subnetworks sampled from the Markov chain. As the batch size in subsampling increases, the correlation decreases and becomes insignificant (0.1) for most reactions when the batch size is set to 10,000. (b) Ergodic averages of the marginal probability for all reactions catalyzed by promiscuous enzymes in a metagenome (subsampling with batch size = 10,000). (c) Running time of the Markov chain on global metabolic networks of various sizes for 250 million iterations. Top: total numbers of reactions in each sample. Bottom: numbers of reactions catalyzed by catalytically promiscuous enzymes.

The Francis Crick Institute
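
The caption of "Properties of the Markov chain" refers to two standard diagnostics: the correlation between consecutive samples after subsampling, and the ergodic (running) average of a reaction's marginal probability. The sketch below shows both on a toy two-state chain. It assumes that "subsampling with batch size b" means keeping every b-th state, which the caption does not state explicitly; sticky_chain and the other function names are hypothetical stand-ins, not part of the authors' code.

```python
import random


def sticky_chain(n, stay=0.95, seed=0):
    """Toy chain: a 0/1 reaction indicator that keeps its value with
    probability `stay` at each step (an illustrative stand-in)."""
    rng = random.Random(seed)
    x, out = 1, []
    for _ in range(n):
        if rng.random() > stay:
            x = 1 - x
        out.append(x)
    return out


def thin(chain, batch_size):
    """Keep every batch_size-th state of the chain."""
    return chain[::batch_size]


def lag1_correlation(xs):
    """Correlation between consecutive values of xs."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


def ergodic_average(xs):
    """Running mean of xs; it should flatten out as the chain converges."""
    out, total = [], 0.0
    for i, x in enumerate(xs, 1):
        total += x
        out.append(total / i)
    return out


if __name__ == "__main__":
    chain = sticky_chain(200_000)
    for b in (1, 10, 50):
        print(b, round(lag1_correlation(thin(chain, b)), 3))
    print(round(ergodic_average(chain)[-1], 3))
```

Consecutive states of the raw chain are strongly correlated, while widely spaced states are nearly independent, which mirrors the trend described in panel (a).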

Probabilities of the 5 reactions catalyzed by muconate cycloisomerase (K01856).

Author: Dazhi Jiao (392777)
Haixu Tang (16054)
Yuzhen Ye (4283)

The difference in the probabilities of reaction R06989 between the two groups is more significant than that of the other reactions.

The Francis Crick Institute