Improving Collision Induced Dissociation (CID), High Energy Collision Dissociation (HCD), and Electron Transfer Dissociation (ETD) Fourier Transform MS/MS Degradome–Peptidome Identifications Using High Accuracy Mass Information
MS dissociation methods, including collision induced dissociation (CID), high energy collision dissociation (HCD), and electron transfer dissociation (ETD), can each contribute distinct peptidome identifications using conventional peptide identification methods (Shen et al. J. Proteome Res. 2011), but such samples still pose significant informatics challenges. In this work, we explored the use of high accuracy fragment ion mass measurements, in this case provided by Fourier transform MS/MS, to improve the size and consistency of peptidome data sets relative to conventional descriptive and probabilistic scoring methods. For example, using high accuracy fragment ion information at the same false discovery rate (FDR), we identified 20–40% more peptides from CID, HCD, and ETD spectra than the SEQUEST, Mascot, and MS_GF scoring methods. The identified species covered >90% of the collective identifications obtained using various conventional peptide identification methods, which largely addresses the common issue of different data analysis methods generating different peptide data sets. The choice among the peptide dissociation and high-precision measurement-based identification methods presently available for degradomic–peptidomic analyses should be based on the coverage and confidence (or specificity) afforded by each method, as well as on practical issues (e.g., throughput). By using accurate fragment information, >1000 peptidome components can be identified from a single human blood plasma analysis with low peptide-level FDRs (e.g., 0.6%), providing an improved basis for investigating potential disease-related peptidome components.
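The benefit of accurate fragment masses can be illustrated with a minimal sketch that is not the scoring method used in this work: it counts how many theoretical fragment m/z values have an observed peak within a ppm tolerance. The function names and tolerance values are hypothetical.

```python
from bisect import bisect_left

def ppm_error(observed, theoretical):
    """Mass error of an observed m/z relative to a theoretical m/z, in ppm."""
    return (observed - theoretical) / theoretical * 1e6

def count_matches(theoretical, observed, tol_ppm):
    """Count theoretical fragment m/z values that have an observed peak within tol_ppm."""
    peaks = sorted(observed)
    hits = 0
    for mz in theoretical:
        window = mz * tol_ppm * 1e-6  # tolerance converted from ppm to m/z units
        i = bisect_left(peaks, mz - window)  # first peak at or above the lower bound
        if i < len(peaks) and peaks[i] <= mz + window:
            hits += 1
    return hits
```

With a 10 ppm window only the peak at 500.002 matches a theoretical m/z of 500.0 (a 4 ppm error). A 1000 ppm window (0.8 Da at m/z 800) also accepts a peak at 800.4 for a theoretical m/z of 800.0, the kind of chance match that tighter, accurate-mass tolerances help exclude.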
Identification of Ultramodified Proteins Using Top-Down Tandem Mass Spectra
Post-translational
modifications (PTMs) play an important role
in various biological processes through changing protein structure
and function. Some ultramodified proteins (like histones) have multiple
PTMs forming PTM patterns that define the functionality of a protein.
While bottom-up mass spectrometry (MS) has been successful in identifying
individual PTMs within short peptides, it is unable to identify PTM
patterns spreading along entire proteins in a coordinated fashion.
In contrast, top-down MS analyzes intact proteins and reveals PTM
patterns along entire proteins. However, while recent advances
in instrumentation have made top-down MS accessible to many laboratories,
most computational tools for top-down MS focus on proteins with few
PTMs and are unable to identify complex PTM patterns. We propose a
new algorithm, MS-Align-E, that identifies both expected and unexpected
PTMs in ultramodified proteins. We demonstrate that MS-Align-E identifies
many proteoforms of histone H4 and benchmark it against the currently
accepted software tools.
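One reason ultramodified proteins are hard to identify is that a single mass shift can often be explained by several PTM combinations. The brute-force sketch below is not MS-Align-E; it only enumerates combinations of three common PTMs whose summed monoisotopic mass matches an observed shift. The PTM list, tolerance, and the `explain_shift` helper are hypothetical choices.

```python
from itertools import product

# Monoisotopic mass shifts (Da) for a few common histone PTMs.
PTM_MASSES = {
    "acetyl": 42.010565,
    "methyl": 14.015650,
    "phospho": 79.966331,
}

def explain_shift(shift, tol=0.01, max_each=6):
    """Return every PTM count combination whose total mass is within tol Da of shift."""
    names = sorted(PTM_MASSES)
    hits = []
    # Try every count vector, e.g. (acetyl, methyl, phospho) = (1, 0, 0).
    for counts in product(range(max_each + 1), repeat=len(names)):
        total = sum(c * PTM_MASSES[n] for c, n in zip(counts, names))
        if abs(total - shift) <= tol:
            hits.append({n: c for n, c in zip(names, counts) if c})
    return hits
```

With the default 0.01 Da tolerance, a shift of 42.0106 Da is explained only by one acetylation; widening the tolerance to 0.05 Da also admits three methylations (42.0470 Da), the near-isobaric pair that makes histone PTM assignment difficult at lower mass accuracy. Localizing the modifications along the sequence still requires fragment ions.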
Moving beyond the van Krevelen Diagram: A New Stoichiometric Approach for Compound Classification in Organisms
van Krevelen diagrams (O/C vs H/C ratios of elemental formulas)
have been widely used to estimate the main
compound categories present in environmental samples. However, the
limits defining a specific compound category based solely on O/C and
H/C ratios of elemental formulas have never been accurately listed
or proposed to classify metabolites in biological samples. Furthermore,
while O/C vs H/C ratios of elemental formulas can provide an overview
of the compound categories, such classification is inefficient because
of the large overlap among different compound categories along both
axes. We propose a more accurate compound classification for biological
samples analyzed by high-resolution mass spectrometry based on an
assessment of the C/H/O/N/P stoichiometric ratios of over 130 000
elemental formulas of compounds classified in 6 main categories: lipids,
peptides, amino sugars, carbohydrates, nucleotides, and phytochemical
compounds (oxy-aromatic compounds). Our multidimensional stoichiometric
compound classification (MSCC) constraints showed a highly accurate
categorization of elemental formulas to the main compound categories
in biological samples with over 98% accuracy, representing a substantial
improvement over any classification based on the classic van Krevelen
diagram. This method represents a significant step forward in environmental
research, especially ecological stoichiometry and eco-metabolomics
studies, by providing a novel and robust tool to improve our understanding
of the ecosystem structure and function through the chemical characterization
of biological samples.
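The classification starts from elemental ratios. A minimal sketch that parses a neutral formula string and computes the O/C, H/C, N/C, and P/C ratios; the MSCC category boundaries themselves are not reproduced here, and the hypothetical parser assumes simple formulas without parentheses, isotopes, or charges.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Parse a simple formula such as 'C6H12O6' into element counts."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1  # a missing digit means one atom
    return counts

def stoichiometric_ratios(formula):
    """Return the element-to-carbon ratios used in van Krevelen-style plots."""
    c = parse_formula(formula)
    if c["C"] == 0:
        raise ValueError("formula must contain carbon")
    return {f"{e}/C": c[e] / c["C"] for e in ("O", "H", "N", "P")}
```

For glucose (C6H12O6) this gives O/C = 1.0 and H/C = 2.0, with N/C and P/C equal to zero.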
Advanced Solvent Based Methods for Molecular Characterization of Soil Organic Matter by High-Resolution Mass Spectrometry
Soil organic matter (SOM), a complex, heterogeneous mixture of
above- and belowground plant litter and animal and microbial residues
at various degrees of decomposition, is a key reservoir for carbon
(C) and nutrient biogeochemical cycling in soil-based ecosystems.
A limited understanding of the molecular composition of SOM constrains
the ability to routinely decipher chemical processes within soil and
accurately predict how terrestrial carbon fluxes will respond to changing
climatic conditions and land use. To elucidate the molecular-level
structure of SOM, we selectively extracted a broad range of intact
SOM compounds by a combination of different organic solvents from
soils with a wide range of C content. Our use of electrospray ionization
(ESI) coupled with Fourier transform ion cyclotron resonance mass
spectrometry (FTICR MS) and a suite of solvents with varying polarity
significantly expands the inventory of the types of organic molecules
present in soils. Specifically, we found that hexane is selective
for lipid-like compounds with very low O/C ratios (<0.1); water
(H₂O) is selective for carbohydrates with high O/C ratios;
acetonitrile (ACN) preferentially extracts lignin, condensed structures,
and tannin polyphenolic compounds with O/C > 0.5; methanol (MeOH)
has higher selectivity toward compounds with low O/C (< 0.5); and
the combination of hexane, MeOH, ACN, and H₂O solvents increases
the number and types of organic molecules extracted from soil for
a broader range of chemically diverse soil types. Our study of SOM
molecules by ESI FTICR MS revealed new insight into the molecular-level
complexity of organics contained in soils. We present the first comparative
study of the molecular composition of SOM from different ecosystems
using ultrahigh-resolution mass spectrometry.
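The solvent selectivities above are expressed as O/C ranges. A minimal sketch that tallies assigned formulas, given as hypothetical (C, O) atom-count pairs, into the O/C ranges quoted in the text (<0.1, 0.1 to 0.5, and >0.5); the bin labels and input format are assumptions, and the text does not say where an O/C of exactly 0.5 belongs.

```python
from collections import Counter

def oc_bin(c, o):
    """Label a formula by its O/C ratio, using the ranges quoted in the text."""
    ratio = o / c
    if ratio < 0.1:
        return "very low (<0.1)"
    if ratio < 0.5:
        return "low (0.1-0.5)"
    return "high (>=0.5)"  # an O/C of exactly 0.5 is placed here by assumption

def oc_profile(formulas):
    """Count (C, O) atom-count pairs per O/C range, e.g., for one solvent extract."""
    return Counter(oc_bin(c, o) for c, o in formulas)
```

Comparing such profiles across hexane, H₂O, ACN, and MeOH extracts is one simple way to summarize which O/C ranges each solvent favors.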
Formularity: Software for Automated Formula Assignment of Natural and Other Organic Matter from Ultrahigh-Resolution Mass Spectra
Ultrahigh-resolution mass spectrometry, such as Fourier transform
ion cyclotron resonance mass spectrometry (FT ICR MS), can resolve
thousands of molecular ions in complex organic matrices. A Compound
Identification Algorithm (CIA) was previously developed for automated
elemental formula assignment for natural organic matter (NOM). In
this work, we describe the software Formularity, which provides a user-friendly interface
for the CIA function and a newly developed search function, the Isotopic Pattern
Algorithm (IPA). While CIA assigns elemental formulas for compounds
containing C, H, O, N, S, and P, IPA is capable of assigning formulas
for compounds containing other elements. We used halogenated organic
compounds (HOC), a chemical class that is ubiquitous in nature as
well as anthropogenic systems, as an example to demonstrate the capability
of Formularity with IPA. A HOC standard mix was used to evaluate the
identification confidence of IPA. Tap water and Suwannee River NOM spiked
with HOCs were used to assess HOC identification in complex environmental
samples. Strategies for reconciling CIA and IPA assignments
are discussed. Software and sample databases with documentation are
freely available.
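The core task behind CIA-style assignment, finding elemental formulas whose monoisotopic mass lies within a tolerance of a measured mass, can be sketched by brute force. This is not the CIA or IPA implementation: it assumes a neutral mass (no adduct or charge correction), restricts the search to C, H, N, O, and at most one S, uses a hypothetical 1 ppm default tolerance, and applies only one chemical filter (a non-negative, integer double-bond equivalent).

```python
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}

def hill(c, h, n, o, s):
    """Write a formula in Hill order, omitting zero counts and the digit 1."""
    parts = []
    for sym, k in (("C", c), ("H", h), ("N", n), ("O", o), ("S", s)):
        if k:
            parts.append(sym if k == 1 else f"{sym}{k}")
    return "".join(parts)

def assign_formulas(neutral_mass, tol_ppm=1.0, max_c=40, max_o=20, max_n=3, max_s=1):
    """Return (formula, ppm error) pairs for CHONS formulas matching neutral_mass."""
    results = []
    for c, o, n, s in product(range(1, max_c + 1), range(max_o + 1),
                              range(max_n + 1), range(max_s + 1)):
        heavy = c * MONO["C"] + o * MONO["O"] + n * MONO["N"] + s * MONO["S"]
        # Choose the hydrogen count that best fills the remaining mass.
        h = round((neutral_mass - heavy) / MONO["H"])
        if h < 1:
            continue
        # Twice the double-bond equivalent must be even and non-negative.
        dbe2 = 2 * c - h + n + 2
        if dbe2 < 0 or dbe2 % 2:
            continue
        err = (heavy + h * MONO["H"] - neutral_mass) / neutral_mass * 1e6
        if abs(err) <= tol_ppm:
            results.append((hill(c, h, n, o, s), err))
    return results
```

For a glucose-like neutral mass of about 180.0634 Da the candidate list includes C6H12O6. At larger masses or looser tolerances such a search typically returns many candidates, which is why assignment tools add further chemical rules and isotope checks.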
De Novo Sequencing of Peptides from Top-Down Tandem Mass Spectra
De novo sequencing of proteins and peptides is one of the most
important problems in mass spectrometry-driven proteomics. A variety
of methods have been developed to accomplish this task from a set
of bottom-up tandem (MS/MS) mass spectra. However, the more recently
emerged top-down technology, which is steadily gaining popularity,
opens new perspectives for protein analysis and characterization,
implying a need for efficient algorithms to process this kind of MS/MS
data. Here, we describe a method for retrieving long and accurate
sequence fragments of the proteins contained in a sample from a set
of top-down MS/MS spectra. To this end, we outline a
strategy for generating high-quality sequence tags from top-down spectra,
and introduce the concept of a T-Bruijn graph by
adapting to sequence tags the notion of an A-Bruijn
graph widely used in genomics. The output of the proposed approach
is the set of amino acid strings spelled out by optimal paths
in the connected components of a T-Bruijn graph.
We illustrate its performance on top-down data sets acquired from
carbonic anhydrase 2 (CAH2) and the Fab region of alemtuzumab.
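The T-Bruijn graph adapts A-Bruijn graphs to sequence tags. As a much simpler stand-in, the sketch below builds a classical de Bruijn graph from overlapping amino acid tags, where (k-1)-mers are nodes and each distinct k-mer in a tag is an edge, and then spells the maximal non-branching paths. It does not reproduce the T-Bruijn construction or its optimal-path scoring; the tags, k, and helper names are hypothetical.

```python
from collections import defaultdict

def de_bruijn(tags, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for tag in tags:
        for i in range(len(tag) - k + 1):
            kmer = tag[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # shared k-mers collapse into one edge
    return graph

def contigs(graph):
    """Spell maximal non-branching paths (unitigs) of the graph."""
    indeg = defaultdict(int)
    for src in graph:
        for dst in graph[src]:
            indeg[dst] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(v):
        return indeg[v] == 1 and len(graph.get(v, ())) == 1

    paths = []
    for v in nodes:
        if one_in_one_out(v):
            continue  # interior nodes are covered by walks from branch points
        for w in graph.get(v, ()):
            path = v + w[-1]
            while one_in_one_out(w):
                w = next(iter(graph[w]))
                path += w[-1]
            paths.append(path)
    return sorted(paths)
```

For the tags PEPTID and PTIDES with k = 3, the graph is a single non-branching path and the sketch spells PEPTIDES. Cycles made only of one-in, one-out nodes are not reported, a known limitation of this simple walk.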
De Novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra
There are two approaches for de novo protein sequencing:
Edman degradation and mass spectrometry (MS). Existing MS-based methods
characterize a novel protein by assembling tandem mass spectra of
overlapping peptides generated from multiple proteolytic digestions
of the protein. Because each tandem mass spectrum covers only a short
peptide of the target protein, the key to high-coverage protein sequencing
is to find spectral pairs from overlapping peptides in order to assemble
tandem mass spectra into longer ones. However, overlapping regions of
peptides may be too short to be confidently identified. High-resolution
mass spectrometers have become accessible to many laboratories. These
mass spectrometers are capable of analyzing molecules with large masses,
boosting the development of top-down MS. Top-down tandem mass
spectra cover whole proteins. However, top-down tandem mass spectra,
even combined, rarely provide full ion fragmentation coverage of a
protein. We propose an algorithm, TBNovo, for de novo protein sequencing by combining top-down and bottom-up MS. In TBNovo,
a top-down tandem mass spectrum is utilized as a scaffold, and bottom-up
tandem mass spectra are aligned to the scaffold to increase sequence
coverage. Experiments on data sets of two proteins showed that TBNovo
achieved high sequence coverage and high sequence accuracy.
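To make the scaffold idea concrete, here is a minimal sketch under strong simplifying assumptions: the top-down spectrum has already been reduced to a sorted list of prefix residue masses, and each bottom-up peptide is represented only by its total residue mass. The hypothetical `place_on_scaffold` function reports scaffold intervals whose mass difference matches the peptide within a tolerance. This is not TBNovo's alignment, which works with spectra rather than single masses.

```python
from bisect import bisect_left

def place_on_scaffold(prefix_masses, peptide_mass, tol=0.02):
    """Return (i, j) index pairs where masses[j] - masses[i] matches peptide_mass within tol Da."""
    masses = sorted(prefix_masses)
    hits = []
    for i, start in enumerate(masses):
        # Scan only scaffold masses inside the tolerance window after start.
        j = bisect_left(masses, start + peptide_mass - tol)
        while j < len(masses) and masses[j] <= start + peptide_mass + tol:
            hits.append((i, j))
            j += 1
    return hits
```

For a scaffold built from the residues G, A, S, P, V (prefix masses 0, 57.021, 128.059, 215.091, 312.143, 411.212 Da), a peptide with the residue mass of ASP (255.122 Da) lands uniquely on the interval (1, 4), which is how an aligned bottom-up peptide can fill in a stretch of the scaffold sequence.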