47 research outputs found

Additional file 1 of Protein embedding based alignment

Author: Benjamin Giovanni Iovino (18067746)
Yuzhen Ye (4283)
Publication date: 28/02/2024

Additional file 1. Supplementary information (Table S1 and Table S2, Figure S1 and Figure S2)

FigShare

Additional file 6: of Not all predicted CRISPR–Cas systems are equal: isolated cas genes and classes of CRISPR like elements

Author: Quan Zhang (168081)
Yuzhen Ye (4283)

False-CRISPR elements found in Biswas’ collection. (DOCX 33 kb)

FigShare

Schematic illustration of the MinPath method.

Author: Thomas G. Doak (48083)
Yuzhen Ye (4283)

Assume 6 families (or orthologous groups, f1, …, f6) are identified from a given sample of genes (e.g., the genes could be from a genome, or sampled from a metagenome). The naïve mapping approach (shown on the left) will lead to a reconstruction with 4 pathways annotated (p1, p2, p3, and p4). Due to the overlapping nature of the biological pathways (see text for more details), pathway p3 shares function f3 with pathway p2. We claim that only three pathways, p1, p2, and p3, are sufficient to explain the existence of the 6 families annotated in the dataset, and a conservative reconstruction of pathways should have only 3 pathways (shown on the right). As we show in the paper, such a conservative estimation of pathways provides a more reliable estimation of the functional diversity of a sample.

FigShare
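
The parsimony idea in the MinPath caption above (keep the fewest pathways that still explain every observed family) can be illustrated as a minimum set cover. The sketch below is only an illustration under stated assumptions: the family-to-pathway mapping `toy_mapping` is hypothetical (the figure itself is not reproduced in this listing), the function names are invented for this example, and the exhaustive search is a stand-in that is practical only for small inputs, not a description of how MinPath itself is implemented.

```python
from itertools import combinations


def naive_pathways(family_to_pathways):
    """Naive mapping: report every pathway that any observed family maps to."""
    return sorted({p for ps in family_to_pathways.values() for p in ps})


def minimal_pathway_set(family_to_pathways):
    """Smallest set of pathways such that every family maps to at least one of them.

    Exhaustive search over subsets of increasing size (exponential in the
    number of pathways, so suitable only for toy examples).
    """
    pathways = naive_pathways(family_to_pathways)
    for size in range(1, len(pathways) + 1):
        for subset in combinations(pathways, size):
            chosen = set(subset)
            if all(chosen & set(ps) for ps in family_to_pathways.values()):
                return list(subset)
    return []


# Hypothetical mapping, chosen only so that p3 shares f3 with p2
# and p4 adds nothing that the other pathways do not already explain.
toy_mapping = {
    "f1": ["p1"],
    "f2": ["p1"],
    "f3": ["p2", "p3"],
    "f4": ["p2"],
    "f5": ["p3", "p4"],
    "f6": ["p3"],
}

print(naive_pathways(toy_mapping))
print(minimal_pathway_set(toy_mapping))
```

With this toy mapping the naive approach reports four pathways, while the parsimonious reconstruction keeps three, because every family that maps to p4 is already explained by a pathway that is required for other families.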

The ascorbate and aldarate metabolism pathway, eliminated by MinPath.

Author: Thomas G. Doak (48083)
Yuzhen Ye (4283)

The diagram was prepared based on the corresponding KEGG pathway (ID = 00053), and only part of the pathway is shown for clarity. The three enzymes that are annotated in the human genome are highlighted in green, even though none of these enzymes are unique to this pathway.

FigShare

Selected spurious pathways of the E. coli genome (collected in KEGG) eliminated by MinPath.

Author: Thomas G. Doak (48083)
Yuzhen Ye (4283)

Selected spurious pathways of the E. coli genome (collected in KEGG) eliminated by MinPath.

FigShare

Comparison of the number of pathways reconstructed for various genomes by different methods.

Author: Thomas G. Doak (48083)
Yuzhen Ye (4283)

The color scheme is as follows: MinPath (red triangles), naïve mapping approach (green), and the pathway annotation maintained in the KEGG database after human evaluation (blue).

FigShare

Comparison of biological pathway reconstruction based on MinPath and the naïve mapping approach for selected metagenomes (a).

Author: Thomas G. Doak (48083)
Yuzhen Ye (4283)

(a) Metagenomes sampled from different environments [17] (-Mic and -Vir are for microbial and viral metagenomes, respectively, as shown in the table).
(b) Microbial metagenomes sampled from coral, with the total number of sequencing datasets shown in the brackets.
(c) Based on the KEGG pathways (the KEGG database used in this study was downloaded in Dec. 2008 and has 345 pathways).
(d) Based on the SEED subsystems (we used FIGfams release 6, which has more subsystems than reported in [17]; the total number of subsystems included is 898).
(e) The two numbers give the total number of pathways (or subsystems) found in at least two of the datasets (e.g., two out of 7 for Coral-Mic) and in at least one of the datasets for each environmental location, respectively.

FigShare

A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)
Publication date: 01/12/2016

Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein-coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro.

Directory of Open Access Journals

FigShare

Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences

Author: Dazhi Jiao (392777)
Haixu Tang (16054)
Yuzhen Ye (4283)
Publication date: 01/01/2013

Shotgun metagenomics has been applied to the studies of the functionality of various microbial communities. As a critical analysis step in these studies, biological pathways are reconstructed based on the genes predicted from metagenomic shotgun sequences. Pathway reconstruction provides insights into the functionality of a microbial community and can be used for comparing multiple microbial communities. The utilization of pathway reconstruction, however, can be jeopardized because of imperfect functional annotation of genes, and ambiguity in the assignment of predicted enzymes to biochemical reactions (e.g., some enzymes are involved in multiple biochemical reactions). Considering that metabolic functions in a microbial community are carried out by many enzymes in a collaborative manner, we present a probabilistic sampling approach to profiling functional content in a metagenomic dataset, by sampling functions of catalytically promiscuous enzymes within the context of the entire metabolic network defined by the annotated metagenome. We test our approach on metagenomic datasets from environmental and human-associated microbial communities. The results show that our approach provides a more accurate representation of the metabolic activities encoded in a metagenome, and thus improves the comparative analysis of multiple microbial communities. In addition, our approach reports likelihood scores of putative reactions, which can be used to identify important reactions and metabolic pathways that reflect the environmental adaptation of the microbial communities. Source code for sampling metabolic networks is available online at http://omics.informatics.indiana.edu/mg/MetaNetSam/.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

FigShare

An overview for protein identification using metaproteomic data, with metagenomic (MG) sequencing and metatranscriptomic (MT) data obtained from matched samples.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

We report two novel graph traversal algorithms (Graph2Pep and Graph2Pro, highlighted in red in the figure) to extract peptides and proteins from the de Bruijn graph representation of metagenome/metatranscriptome assemblies, respectively. We note the same pipeline can be applied when only matched metagenomic or metatranscriptomic data (but not both) is available, in which the graph algorithms will be applied to the assembly graph of metagenome (or metatranscriptome).

FigShare
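
To make the idea of reading peptides off an assembly graph more concrete, here is a minimal sketch, not the Graph2Pep or Graph2Pro algorithms themselves. It assumes a toy de Bruijn graph built directly from short reads, walks simple paths that start at nodes with no incoming edge, and translates each spelled sequence in a single forward frame with the standard genetic code. The function names, the reads, and the choice of k = 4 are all hypothetical, and real tools must also handle cycles, both strands, all reading frames, and stop codons.

```python
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def spell_paths(graph):
    """DNA strings spelled by maximal simple paths starting at source nodes."""
    targets = {v for vs in graph.values() for v in vs}
    sources = sorted(n for n in graph if n not in targets)
    sequences = []

    def walk(node, seq, seen):
        nexts = [n for n in sorted(graph.get(node, ())) if n not in seen]
        if not nexts:  # dead end: record the sequence spelled so far
            sequences.append(seq)
            return
        for n in nexts:
            walk(n, seq + n[-1], seen | {n})

    for s in sources:
        walk(s, s, {s})
    return sequences


def translate(dna):
    """Translate in frame 0 on the forward strand, ignoring trailing bases."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Hypothetical reads: none of them spans a full path through the graph.
reads = ["ATGGCTA", "GGCTAAA", "GGCTGAA"]
paths = spell_paths(build_de_bruijn(reads, 4))
peptides = [translate(p) for p in paths]
print(paths, peptides)
```

In this toy case the graph branches after `GCT`, so the traversal yields two sequences that appear in no single read, which is the kind of extra candidate a graph-based database can contribute over per-fragment gene prediction.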
