Implementing the MSFragger Search Engine as a Node in Proteome Discoverer
Here, we describe the implementation of the fast proteomics search
engine MSFragger as a processing node in the widely used Proteome
Discoverer (PD) software platform. PeptideProphet (via the Philosopher
toolkit) is also implemented as an additional PD node to allow validation
of MSFragger open (mass-tolerant) search results. These two nodes,
along with the existing Percolator validation module, allow users
to employ different search strategies and conveniently inspect search
results through PD. Our results demonstrate that MSFragger coupled with
Percolator identifies more PSMs, peptides, and proteins, and searches
significantly faster, than the conventional SEQUEST/Percolator PD
workflows. The MSFragger-PD node is available
at https://github.com/nesvilab/PD-Nodes/releases/
iMet-Q: A User-Friendly Tool for Label-Free Metabolomics Quantitation Using Dynamic Peak-Width Determination
Efficient and accurate quantitation of metabolites from LC-MS data has become an important topic. Here we present an automated tool, called iMet-Q (intelligent Metabolomic Quantitation), for label-free metabolomics quantitation from high-throughput MS1 data. By performing peak detection and peak alignment, iMet-Q provides a summary of quantitation results and reports ion abundance at both the replicate level and the sample level. Furthermore, it gives the charge states and isotope ratios of detected metabolite peaks to facilitate metabolite identification. An in-house standard mixture and a public Arabidopsis metabolome data set were analyzed by iMet-Q. Three public quantitation tools, XCMS, MetAlign, and MZmine 2, were used for performance comparison. From the mixture data set, seven standard metabolites were detected by all four quantitation tools, for which iMet-Q had a smaller quantitation error of 12% in both profile and centroid data sets. Our tool also correctly determined the charge states of the seven standard metabolites. By searching the mass values of those standard metabolites against the Human Metabolome Database, we obtained a total of 183 metabolite candidates. With the isotope ratios calculated by iMet-Q, 49% (89 out of 183) of the metabolite candidates were filtered out. From the public Arabidopsis data set reported with two internal standards and 167 elucidated metabolites, iMet-Q detected all of the peaks corresponding to the internal standards and the 167 metabolites. Meanwhile, our tool had small abundance variation (≤0.19) when quantifying the two internal standards and had higher abundance correlation (≥0.92) when quantifying the 167 metabolites. iMet-Q provides user-friendly interfaces and is publicly available for download at http://ms.iis.sinica.edu.tw/comics/Software_iMet-Q.html.
The box plot of abundance correlation of 167 elucidated metabolites across replicates in the public Arabidopsis data detected by the four quantitation tools.
The reproducibility (Rep.) and normalized abundance (Abund.) of two internal standards detected by four quantitation tools.
Hierarchical clustering by using the quantitation results of iMet-Q, XCMS, MetAlign, and MZmine 2.
Each leaf of a dendrogram represents a replicate. For each tool, we first combined its quantitation results from the positive- and negative-ion modes. Colors were assigned to each replicate in the combined quantitation results according to the plant class from which the replicate originated, as follows: orange for cotyledon, red for stem, green for leaf, blue for flower, light blue for shoot apex, yellow for root, pink for seed, and gray for silique. Next, the figure was produced using the MATLAB dendrogram function with PMMCC as the abundance correlation measure between any two replicates in the combined quantitation results.
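As a rough illustration of the correlation measure named in this caption, the sketch below computes pairwise correlations between replicate abundance vectors and converts them to distances of the kind a hierarchical clustering routine consumes. It assumes that PMMCC denotes the Pearson product-moment correlation coefficient and that distance is taken as 1 minus the correlation; the function names and the toy data are hypothetical, and this is a Python sketch rather than the MATLAB code used for the figure.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length abundance vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def correlation_distance_matrix(replicates):
    """Return (names, matrix) where matrix[i][j] = 1 - Pearson r of replicates i and j.

    `replicates` maps a replicate name to its abundance vector (hypothetical layout).
    """
    names = sorted(replicates)
    matrix = [
        [1.0 - pearson(replicates[a], replicates[b]) for b in names]
        for a in names
    ]
    return names, matrix


if __name__ == "__main__":
    toy = {
        "leaf_rep1": [10.0, 20.0, 30.0, 40.0],
        "leaf_rep2": [11.0, 19.0, 31.0, 42.0],
        "root_rep1": [40.0, 30.0, 20.0, 10.0],
    }
    names, dist = correlation_distance_matrix(toy)
    for name, row in zip(names, dist):
        print(name, [round(d, 3) for d in row])
```

Replicates from the same plant class would be expected to have small distances to each other, which is what groups them in a dendrogram.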
The quantitation error (%) of seven standard metabolites calculated by iMet-Q, XCMS, MetAlign, and MZmine 2.
Schematic depiction of iMet-Q workflow for peak detection and peak alignment.
A cartoon for the illustration of constructing extracted ion chromatograms.
The blue straight lines represent the clustered signals; w and t are the FWHM and retention time of , respectively. Signals A and B are determined as the boundaries of the EIC, and the area in light blue is the abundance.
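The idea of taking the area between two boundary signals as the abundance can be sketched as a simple numerical integration. The example below assumes trapezoidal integration over (retention time, intensity) points between two boundary indices; the caption does not state how iMet-Q computes the area, and the function name and data are hypothetical.

```python
def eic_abundance(times, intensities, start, end):
    """Trapezoidal area under an extracted ion chromatogram.

    `start` and `end` are the indices of the boundary signals (A and B in the
    caption), both inclusive. Trapezoidal integration is an assumption here.
    """
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    if not 0 <= start < end < len(times):
        raise ValueError("boundaries must satisfy 0 <= start < end < len(times)")
    area = 0.0
    for i in range(start, end):
        width = times[i + 1] - times[i]
        area += 0.5 * (intensities[i] + intensities[i + 1]) * width
    return area


if __name__ == "__main__":
    rt = [0.0, 1.0, 2.0, 3.0, 4.0]          # retention times
    signal = [0.0, 2.0, 4.0, 2.0, 0.0]      # intensities of the clustered signals
    print(eic_abundance(rt, signal, 0, 4))  # area over the whole peak
```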
Informatics View on the Challenges of Identifying Missing Proteins from Shotgun Proteomics
Experimental evidence at the protein level from mass spectrometry
and antibody experiments is essential to characterize the human proteome.
neXtProt (2014-09 release) reported 20 055 human proteins,
including 16 491 proteins identified at protein level and 3564
proteins unidentified. Excluding 616 proteins at uncertain level,
2948 proteins were regarded as missing proteins. Missing proteins
were unidentified partially due to MS limitations and intrinsic properties
of proteins, for example, only appearing in specific diseases or tissues.
Despite such reasons, it is desirable to explore issues affecting
validation of missing proteins from an “ideal” shotgun
analysis of human proteome. We thus performed in silico digestions
on the human proteins to generate all in silico fully digested peptides.
With these presumed peptides, we investigated the identification of
proteins without any unique peptide, the effect of sequence variants
on protein identification, difficulties in identifying olfactory receptors,
and highly similar proteins. Among all proteins with evidence at transcript
level, G protein-coupled receptors and olfactory receptors, based
on InterPro classification, were the largest families of proteins
and exhibited more frequent variants. To identify missing proteins,
the above analyses suggested including sequence variants in protein
FASTA for database searching. Furthermore, evidence of unique peptides
identified from MS experiments would be crucial for experimentally
validating missing proteins.
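The in silico digestion and unique-peptide analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes trypsin specificity (cleavage after K or R unless the next residue is P), no missed cleavages, and a hypothetical minimum peptide length of 7; the abstract does not name the enzyme or any length filter. The function names and toy sequences are also hypothetical.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_length=7):
    """Fully digest a protein sequence with an assumed trypsin rule.

    Cleaves after K or R unless followed by P, with no missed cleavages.
    Peptides shorter than `min_length` (an assumed cutoff) are discarded.
    """
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_length]


def unique_peptides(proteins, min_length=7):
    """Map each protein name to the digested peptides found in no other protein.

    `proteins` maps a protein name to its amino acid sequence.
    """
    owners = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, min_length):
            owners[peptide].add(name)
    result = {name: set() for name in proteins}
    for peptide, names in owners.items():
        if len(names) == 1:
            result[next(iter(names))].add(peptide)
    return result


if __name__ == "__main__":
    toy = {
        "P1": "AAAAAAAKCCCCCCCR",
        "P2": "AAAAAAAKDDDDDDDR",
        "P3": "AAAAAAAK",  # shares its only peptide with P1 and P2
    }
    for name, peptides in unique_peptides(toy).items():
        print(name, sorted(peptides) or "no unique peptide")
```

In this toy input, P3 ends up with no unique peptide, which mirrors the case the abstract highlights: a protein whose fully digested peptides all occur in other proteins cannot be validated by unique-peptide evidence alone.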