    Implementing the MSFragger Search Engine as a Node in Proteome Discoverer

    Here, we describe the implementation of the fast proteomics search engine MSFragger as a processing node in the widely used Proteome Discoverer (PD) software platform. PeptideProphet (via the Philosopher tool kit) is also implemented as an additional PD node to allow validation of MSFragger open (mass-tolerant) search results. These two nodes, along with the existing Percolator validation module, allow users to employ different search strategies and conveniently inspect search results through PD. Our results have demonstrated the improved numbers of PSMs, peptides, and proteins identified by MSFragger coupled with Percolator and significantly faster search speed compared to the conventional SEQUEST/Percolator PD workflows. The MSFragger-PD node is available at https://github.com/nesvilab/PD-Nodes/releases/

    iMet-Q: A User-Friendly Tool for Label-Free Metabolomics Quantitation Using Dynamic Peak-Width Determination

    Efficient and accurate quantitation of metabolites from LC-MS data has become an important topic. Here we present an automated tool, called iMet-Q (intelligent Metabolomic Quantitation), for label-free metabolomics quantitation from high-throughput MS1 data. By performing peak detection and peak alignment, iMet-Q provides a summary of quantitation results and reports ion abundance at both the replicate level and the sample level. Furthermore, it gives the charge states and isotope ratios of detected metabolite peaks to facilitate metabolite identification. An in-house standard mixture and a public Arabidopsis metabolome data set were analyzed by iMet-Q. Three public quantitation tools, XCMS, MetAlign, and MZmine 2, were used for performance comparison. From the mixture data set, seven standard metabolites were detected by the four quantitation tools, for which iMet-Q had a smaller quantitation error of 12% in both profile and centroid data sets. Our tool also correctly determined the charge states of the seven standard metabolites. By searching the mass values for those standard metabolites against the Human Metabolome Database, we obtained a total of 183 metabolite candidates. With the isotope ratios calculated by iMet-Q, 49% (89 out of 183) of the metabolite candidates were filtered out. From the public Arabidopsis data set reported with two internal standards and 167 elucidated metabolites, iMet-Q detected all of the peaks corresponding to the internal standards and the 167 metabolites. Meanwhile, our tool had small abundance variation (≤0.19) when quantifying the two internal standards and had higher abundance correlation (≥0.92) when quantifying the 167 metabolites. iMet-Q provides user-friendly interfaces and is publicly available for download at http://ms.iis.sinica.edu.tw/comics/Software_iMet-Q.html.

    Hierarchical clustering by using the quantitation results of iMet-Q, XCMS, MetAlign, and MZmine 2.

    Each leaf of a dendrogram represents a replicate. For each tool, we first combined its quantitation results from the positive- and negative-ion modes. Each replicate in the combined quantitation results was then colored according to the plant class from which it originated: orange for cotyledon, red for stem, green for leaf, blue for flower, light blue for shoot apex, yellow for root, pink for seed, and gray for silique. Next, the figure was produced using the MATLAB dendrogram function with PMMCC as the abundance correlation measure between any two replicates in the combined quantitation results.
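The pairwise correlation step described in this caption can be sketched in a few lines. This is a minimal illustration only, not the authors' MATLAB workflow: it assumes PMMCC denotes the Pearson product-moment correlation coefficient, it assumes 1 - r as the distance passed to clustering, and the function names and replicate values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two abundance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_distances(replicates):
    """Map each unordered pair of replicate names to 1 - r.

    The 1 - r distance is an assumption made for this sketch; the caption
    only states that PMMCC was used as the correlation measure.
    """
    names = list(replicates)
    return {
        (a, b): 1.0 - pearson(replicates[a], replicates[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical abundance vectors (one value per aligned peak).
replicates = {
    "leaf_1": [10.0, 20.0, 30.0, 40.0],
    "leaf_2": [11.0, 19.0, 31.0, 41.0],
    "root_1": [40.0, 30.0, 20.0, 10.0],
}
distances = correlation_distances(replicates)
```

A hierarchical clustering routine would then merge the pairs with the smallest distances first, so replicates from the same plant class are expected to end up on neighboring leaves.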

    A cartoon for the illustration of constructing extracted ion chromatograms.

    The blue straight lines represent the clustered signals; w and t are the FWHM and retention time of , respectively. Signals A and B are taken as the boundaries of the EIC, and the area shown in light blue is the abundance.

    Informatics View on the Challenges of Identifying Missing Proteins from Shotgun Proteomics

    Experimental evidence at the protein level from mass spectrometry and antibody experiments is essential to characterize the human proteome. neXtProt (2014-09 release) reported 20 055 human proteins, including 16 491 proteins identified at the protein level and 3564 proteins not yet identified. Excluding 616 proteins at the uncertain level, 2948 proteins were regarded as missing proteins. Missing proteins remained unidentified partly because of MS limitations and intrinsic properties of the proteins, for example, appearing only in specific diseases or tissues. Despite such reasons, it is desirable to explore issues affecting the validation of missing proteins in an “ideal” shotgun analysis of the human proteome. We thus performed in silico digestions on the human proteins to generate all in silico fully digested peptides. With these presumed peptides, we investigated the identification of proteins without any unique peptide, the effect of sequence variants on protein identification, and the difficulties in identifying olfactory receptors and highly similar proteins. Among all proteins with evidence at the transcript level, G protein-coupled receptors and olfactory receptors, based on InterPro classification, were the largest protein families and exhibited more frequent variants. To identify missing proteins, the above analyses suggested including sequence variants in the protein FASTA used for database searching. Furthermore, evidence of unique peptides identified from MS experiments would be crucial for experimentally validating missing proteins.
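The in silico digestion and unique-peptide check mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline. It assumes trypsin as the enzyme (cleavage after K or R, but not before P) with no missed cleavages, and the length limits, function names, and toy sequences are hypothetical.

```python
from collections import defaultdict


def digest(sequence, min_len=7, max_len=30):
    """Fully digest a protein sequence under the assumed tryptic rule."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        # Cleave after K/R unless the next residue is P; always close at the end.
        if is_last or (residue in "KR" and sequence[i + 1] != "P"):
            peptide = sequence[start:i + 1]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def unique_peptides(proteins, **digest_kwargs):
    """Return, per protein, the peptides produced by that protein only."""
    owners = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in digest(sequence, **digest_kwargs):
            owners[peptide].add(protein_id)
    unique = {protein_id: set() for protein_id in proteins}
    for peptide, ids in owners.items():
        if len(ids) == 1:
            unique[next(iter(ids))].add(peptide)
    return unique


# Toy sequences (hypothetical); min_len is lowered so the short peptides survive.
proteins = {"P1": "AAAKCCCR", "P2": "AAAKDDDR", "P3": "AAAK"}
result = unique_peptides(proteins, min_len=3)
```

In this toy set, P3 yields only a peptide shared with P1 and P2, so it ends up with no unique peptide, which is the kind of protein the abstract flags as hard to validate.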