11 research outputs found

An overview for protein identification using metaproteomic data, with metagenomic (MG) sequencing and metatranscriptomic (MT) data obtained from matched samples.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

We report two novel graph traversal algorithms (Graph2Pep and Graph2Pro, highlighted in red in the figure) to extract peptides and proteins, respectively, from the de Bruijn graph representation of metagenome/metatranscriptome assemblies. The same pipeline can be applied when only matched metagenomic or metatranscriptomic data (but not both) are available, in which case the graph algorithms are applied to the assembly graph of the metagenome (or metatranscriptome).
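To make the term "de Bruijn graph representation" concrete, here is a minimal Python sketch of how such a graph can be built from short reads. It is a textbook construction under simplifying assumptions (error-free reads, a single strand, a tiny `k`), not the assembler used in the work described above; the function name `de_bruijn` and the toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: its prefix -> its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy reads (hypothetical): the last two share a prefix and then diverge.
reads = ["ATGGCG", "GGCGTA", "GGCGAA"]
graph = de_bruijn(reads, k=4)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy graph the node `GCG` has two successors. Branches of this kind are where a linear contig would stop, while a traversal of the graph can still follow either continuation.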

FigShare

A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)
Publication date: 01/12/2016

Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro.
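The following Python sketch illustrates why traversing an assembly graph can recover peptides that per-contig gene prediction misses. It is a toy illustration, not the Graph2Pep or Graph2Pro implementation. It assumes, purely for brevity, that graph nodes already carry non-overlapping amino-acid labels and that trypsin cleaves after K or R unless the next residue is P; all names (`tryptic_pieces`, `enumerate_paths`, `peptides_from_graph`, the node labels) are hypothetical.

```python
def tryptic_pieces(seq):
    """Split a protein string after K or R, except when the next residue is P."""
    pieces, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        pieces.append(seq[start:])
    return pieces


def enumerate_paths(graph, max_nodes=3):
    """Yield every simple path that visits at most max_nodes nodes."""
    def extend(path):
        yield path
        if len(path) < max_nodes:
            for nxt in graph.get(path[-1], []):
                if nxt not in path:
                    yield from extend(path + [nxt])
    for node in graph:
        yield from extend([node])


def peptides_from_graph(labels, graph, max_nodes=3, min_len=6):
    """Digest the concatenated label of every bounded path; keep long pieces."""
    found = set()
    for path in enumerate_paths(graph, max_nodes):
        seq = "".join(labels[n] for n in path)
        found.update(p for p in tryptic_pieces(seq) if len(p) >= min_len)
    return found


# Hypothetical toy graph: node n1 branches into n2 and n3.
labels = {"n1": "MSTAKLLGE", "n2": "VDSWRGGHK", "n3": "ADSWRTTK"}
graph = {"n1": ["n2", "n3"], "n2": [], "n3": []}

# Digesting each node on its own finds no peptide of length >= 6.
per_node = {p for s in labels.values() for p in tryptic_pieces(s) if len(p) >= 6}
# Following edges recovers the peptides that span a node boundary.
via_graph = peptides_from_graph(labels, graph)
print(sorted(per_node), sorted(via_graph))
```

Bounding the path length keeps the enumeration finite on branching graphs. A real nucleotide-level graph would additionally need translation in the correct reading frame and handling of the k-1 overlap between adjacent nodes, both of which this sketch leaves out.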

Directory of Open Access Journals

FigShare

A schematic illustration of the graph traversal algorithms for extracting tryptic peptides (Graph2Pep; A) and proteins (Graph2Pro; B) from the de Bruijn graph assembly.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

FigShare

Improvement of protein identification by using assembly graph.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

FigShare

The 2-chlorobenzoate degradation pathway.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

Circles represent compounds, and boxes (with EC numbers) represent enzymes. Enzymes with MS/MS data support are highlighted in purple. The figure was prepared using PathVisio [54] based on MetaCyc's diagrams of pathways PWY-6221 and P183-PWY.

FigShare

The number of identified enzymes involved in the Rubisco shunt.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

FigShare

Summary of peptide identification in wastewater datasets based on the assembly of combined metagenomic and metatranscriptomic data.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

FigShare

Comparison of the numbers of proteins in top 20 eggNOG families receiving the most hits of proteins identified in the SD3 sample by the graph-centric approach (Graph2Pro, blue) and the conventional approach (FragGeneScan, red).

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

FigShare

Summary of the assemblies for three data sets used in the benchmarking experiments.

Author: Haixu Tang (16054)
Sujun Li (2234731)
Yuzhen Ye (4283)

FigShare

XLSearch: a Probabilistic Database Search Algorithm for Identifying Cross-Linked Peptides

Author: Chao Ji (276390)
Haixu Tang (16054)
James P. Reilly (1531357)
Predrag Radivojac (19890)
Sujun Li (2234731)

Chemical cross-linking combined with mass spectrometric analysis has become an important technique for probing protein three-dimensional structure and protein–protein interactions. A key step in this process is the accurate identification and validation of cross-linked peptides from tandem mass spectra. The identification of cross-linked peptides, however, presents challenges related to the expanded nature of the search space (all pairs of peptides in a sequence database) and the fact that some peptide-spectrum matches (PSMs) contain one correct and one incorrect peptide but often receive scores that are comparable to those in which both peptides are correctly identified. To address these problems and improve detection of cross-linked peptides, we propose a new database search algorithm, XLSearch, for identifying cross-linked peptides. Our approach is based on a data-driven scoring scheme that independently estimates the probability of correctly identifying each individual peptide in the cross-link given knowledge of the correct or incorrect identification of the other peptide. These conditional probabilities are subsequently used to estimate the joint posterior probability that both peptides are correctly identified. Using the data from two previous cross-link studies, we show the effectiveness of this scoring scheme, particularly in distinguishing between true identifications and those containing one incorrect peptide. We also provide evidence that XLSearch achieves more identifications than two alternative methods at the same false discovery rate (availability: https://github.com/COL-IU/XLSearch)
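As an illustration of how conditional probabilities for the two peptides can be combined into a joint probability that both are correct, the Python sketch below applies the law of total probability to two binary events. This is a generic derivation and may differ from the estimator used in XLSearch; the function name and the numeric inputs are hypothetical, and the four conditionals are assumed to be mutually consistent.

```python
def joint_both_correct(a_given_b, a_given_not_b, b_given_a, b_given_not_a):
    """Return P(A and B) for binary events A, B from their four conditionals.

    A = "first peptide correct", B = "second peptide correct".
    Total probability gives two linear equations in P(A) and P(B):
        P(A) = a_given_not_b + (a_given_b - a_given_not_b) * P(B)
        P(B) = b_given_not_a + (b_given_a - b_given_not_a) * P(A)
    Solving for P(A) and multiplying by P(B | A) yields the joint.
    """
    da = a_given_b - a_given_not_b
    db = b_given_a - b_given_not_a
    denom = 1.0 - da * db
    if abs(denom) < 1e-12:
        raise ValueError("conditionals do not determine the marginals")
    p_a = (a_given_not_b + da * b_given_not_a) / denom
    return b_given_a * p_a


# Hypothetical conditionals derived from a joint table in which
# P(A, B) = 0.6, P(A, not B) = 0.1, P(not A, B) = 0.2, P(not A, not B) = 0.1.
p_joint = joint_both_correct(0.75, 0.5, 0.6 / 0.7, 0.2 / 0.3)
print(round(p_joint, 6))
```

With consistent inputs the function recovers the joint mass of the "both correct" cell (0.6 in this example, up to floating-point rounding). With conditionals estimated independently from data, the two possible factorizations P(A | B) P(B) and P(B | A) P(A) need not agree exactly, so a practical scoring scheme has to decide how to reconcile them.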

FigShare