6 research outputs found
DBDigger: Reorganized Proteomic Database Identification That Improves Flexibility and Speed
Database search identification algorithms, such as Sequest and Mascot, are powerful enablers of proteomic tandem mass spectrometry. We introduce DBDigger, an algorithm that reorganizes the database identification process to remove a problematic bottleneck. Typically, such algorithms determine which candidate sequences can be compared to each spectrum. Instead, DBDigger determines which spectra can be compared to each candidate sequence, enabling the software to generate candidate sequences only once for each HPLC separation rather than once for each spectrum. This reorganization also reduces the number of times a spectrum must be predicted for a particular candidate sequence and charge state. As a result, DBDigger can accelerate some database searches by more than an order of magnitude. In addition, the software offers features that reduce the performance degradation introduced by posttranslational modification (PTM) searching. DBDigger allows researchers to specify the sequence context in which each PTM is possible. In CNBr digests, for example, modified methionine residues can be restricted to the C-termini of peptides. This “context-dependent” PTM searching reduces the performance penalty relative to traditional PTM searching. We characterize the performance possible with DBDigger, showcasing MASPIC, a new statistical scorer. We describe the implementation of these innovations in the hope that other researchers will employ them for rapid and highly flexible proteomic database searches.
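The loop inversion described above can be sketched as follows. This is a minimal illustration, not DBDigger's code: it assumes that "can be compared" means the spectrum's neutral precursor mass lies within a fixed tolerance of the candidate's mass, uses fully tryptic digestion without missed cleavages, and omits fragment scoring entirely. The names `Spectrum`, `tryptic_peptides`, `sequence_centric_search`, and the 1.5 Da tolerance are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


@dataclass
class Spectrum:  # hypothetical minimal record; fragment peaks omitted
    scan: int
    precursor_mz: float
    charge: int

    @property
    def neutral_mass(self) -> float:
        # [M + zH]^z+ precursor m/z -> neutral peptide mass M
        return (self.precursor_mz - PROTON) * self.charge


def peptide_mass(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def tryptic_peptides(protein: str, min_len: int = 5):
    """Yield fully tryptic peptides (cleave after K/R, not before P; no missed cleavages)."""
    start = 0
    for i, aa in enumerate(protein):
        last = i + 1 == len(protein)
        if last or (aa in "KR" and protein[i + 1] != "P"):
            if i + 1 - start >= min_len:
                yield protein[start:i + 1]
            start = i + 1


def sequence_centric_search(proteins, spectra, tol: float = 1.5):
    """For each candidate peptide, find the spectra whose precursor mass is within tol.

    Hypothetical sketch of the sequence-centric loop; not DBDigger's implementation.
    """
    ordered = sorted(spectra, key=lambda s: s.neutral_mass)  # sort spectra once
    masses = [s.neutral_mass for s in ordered]
    matches = {}
    for protein in proteins:
        for pep in tryptic_peptides(protein):  # generated once per run, not once per spectrum
            m = peptide_mass(pep)
            lo, hi = bisect_left(masses, m - tol), bisect_right(masses, m + tol)
            for spectrum in ordered[lo:hi]:
                # a real engine would score pep against this spectrum here
                matches.setdefault(spectrum.scan, []).append(pep)
    return matches
```

Sorting the spectra once lets each candidate find its compatible spectra with two binary searches, and `tryptic_peptides` runs once per protein per run, so the cost of candidate generation no longer grows with the number of spectra.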
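Context-dependent PTM searching can be illustrated in the same hedged spirit. The sketch below assumes a rule is just a residue, a mass shift, and an allowed position (anywhere, peptide N-terminus, or peptide C-terminus); `PtmRule`, `modified_variants`, and `max_mods` are hypothetical names, and the -48.0034 Da shift (the value usually quoted for methionine converted to homoserine lactone) is illustrative only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PtmRule:  # hypothetical rule format, not DBDigger's configuration syntax
    residue: str          # one-letter amino acid code the modification applies to
    delta: float          # mass shift in daltons
    context: str = "any"  # "any", "n_term", or "c_term"

    def allowed_at(self, peptide: str, i: int) -> bool:
        if peptide[i] != self.residue:
            return False
        if self.context == "c_term":
            return i == len(peptide) - 1
        if self.context == "n_term":
            return i == 0
        return True


def modified_variants(peptide: str, rules: list[PtmRule], max_mods: int = 2):
    """Return every allowed modification pattern as a tuple of (position, delta) pairs."""
    # Only positions whose sequence context satisfies a rule become candidate sites.
    sites = [(i, r.delta) for i in range(len(peptide)) for r in rules if r.allowed_at(peptide, i)]
    variants = [()]  # the unmodified peptide
    for k in range(1, max_mods + 1):
        for combo in combinations(sites, k):
            if len({pos for pos, _ in combo}) == k:  # at most one modification per residue
                variants.append(combo)
    return variants


c_term_met = PtmRule("M", -48.0034, "c_term")  # illustrative CNBr-style rule
print(modified_variants("AMAMAM", [c_term_met]))
```

For the peptide AMAMAM with `max_mods=2`, an unrestricted methionine rule yields seven modification patterns (including the unmodified form), whereas the C-terminal rule yields two; every pattern avoided is one fewer candidate to compare against spectra.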
MASPIC: Intensity-Based Tandem Mass Spectrometry Scoring Scheme That Improves Peptide Identification at High Confidence
Algorithmic search engines bridge the gap between large tandem mass spectrometry data sets and the identification of proteins associated with biological samples. Improvements in these tools can greatly enhance biological discovery. We present a new scoring scheme for comparing tandem mass spectra with a protein sequence database. The MASPIC (Multinomial Algorithm for Spectral Profile-based Intensity Comparison) scorer converts an experimental tandem mass spectrum into an m/z profile of probability and then scores peak lists from potential candidate peptides using a multinomial distribution model. The MASPIC scoring scheme incorporates intensity, spectral peak density variations, and the m/z error distribution associated with peak matches into a multinomial distribution. The scoring scheme was validated on two standard protein mixtures and on an additional set of spectra collected from a complex ribosomal protein mixture from Rhodopseudomonas palustris. The results indicate a 5–15% improvement over Sequest for high-confidence identifications, and the performance gap grows as the sequence database size increases. Additional tests on spectra from proteinase K digests showed similar performance improvements, demonstrating the advantages of using MASPIC to study proteins digested with less specific proteases. All these investigations show MASPIC to be a versatile and reliable system for peptide tandem mass spectral identification.
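A toy version of profile-based multinomial scoring may help make the idea concrete. It is a sketch under stated assumptions, not MASPIC's published formula: the m/z axis is cut into fixed-width bins, a Gaussian kernel stands in for the m/z error distribution, a uniform background fraction keeps every bin probability positive, and peak-density variation is not modeled. All function names and default parameters (`bin_width`, `sigma_bins`, `background`) are hypothetical.

```python
import math
from collections import Counter


def intensity_profile(peaks, mz_min, mz_max, bin_width=1.0, sigma_bins=0.5, background=0.1):
    """Turn (m/z, intensity) peaks into a probability for each m/z bin.

    Hypothetical sketch: a fraction `background` (must be > 0) is spread uniformly
    so no bin has zero probability; the rest follows peak intensity, with each peak
    smeared over neighbouring bins by a Gaussian kernel as a crude m/z-error model.
    """
    n_bins = math.ceil((mz_max - mz_min) / bin_width)
    profile = [background / n_bins] * n_bins
    total = sum(inten for _, inten in peaks)
    for mz, inten in peaks:
        if not mz_min <= mz < mz_max:
            continue  # ignore peaks outside the profiled range
        center = (mz - mz_min) / bin_width
        lo, hi = max(0, int(center) - 3), min(n_bins, int(center) + 4)
        kernel = [math.exp(-0.5 * ((b + 0.5 - center) / sigma_bins) ** 2) for b in range(lo, hi)]
        k_sum = sum(kernel)
        for b, k in zip(range(lo, hi), kernel):
            profile[b] += (1.0 - background) * (inten / total) * k / k_sum
    norm = sum(profile)
    return [p / norm for p in profile]


def multinomial_log_prob(profile, counts):
    """log P(counts | profile) under a multinomial with bin probabilities `profile`."""
    n = sum(counts.values())
    log_p = math.lgamma(n + 1)  # log n!
    for b, x in counts.items():
        log_p += x * math.log(profile[b]) - math.lgamma(x + 1)
    return log_p


def score_candidate(profile, predicted_mz, mz_min, bin_width=1.0):
    """Log-likelihood ratio of a candidate's predicted fragments: profile vs. uniform bins."""
    counts = Counter()
    for mz in predicted_mz:
        b = int((mz - mz_min) // bin_width)
        if 0 <= b < len(profile):
            counts[b] += 1
    uniform = [1.0 / len(profile)] * len(profile)
    return multinomial_log_prob(profile, counts) - multinomial_log_prob(uniform, counts)
```

Because the multinomial coefficient depends only on the counts, it cancels in the ratio and the score reduces to the sum of x_b log(p_b × n_bins) over occupied bins; the full form is kept here so the multinomial model stays visible. A candidate whose predicted fragments fall on intense observed peaks scores above zero, while one whose fragments land in empty regions scores below zero.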
