Proceedings of the EuBIC Winter School 2019
The 2019 European Bioinformatics Community (EuBIC) Winter School was held from January 15th to January 18th, 2019, in Zakopane, Poland. This year’s meeting was the third of its kind and gathered international researchers in the field of (computational) proteomics to discuss (mainly) challenges in proteomics quantification and data-independent acquisition (DIA). Here, we present an overview of the scientific program of the 2019 EuBIC Winter School. Furthermore, we give a brief outlook on the upcoming EuBIC 2020 Developer’s Meeting.
Improved homology-driven computational validation of protein-protein interactions motivated by the evolutionary gene duplication and divergence hypothesis
Background: Protein-protein interaction (PPI) data sets generated by high-throughput experiments are contaminated by large numbers of erroneous PPIs. Therefore, computational methods for PPI validation are necessary to improve the quality of such data sets. Against the background of the theory that most extant PPIs arose as a consequence of gene duplication, the sensitive search for homologous PPIs, i.e. for PPIs descending from a common ancestral PPI, should be a successful strategy for PPI validation. Results: To validate an experimentally observed PPI, we combine FASTA and PSI-BLAST to perform a sensitive sequence-based search for pairs of interacting homologous proteins within a large, integrated PPI database. A novel scoring scheme that incorporates both quality and quantity of all observed matches allows us (1) to also consider tentative paralogs and orthologs in this analysis and (2) to combine search results from more than one homology detection method. ROC curves illustrate the high efficacy of this approach and its improvement over other homology-based validation methods. Conclusion: New PPIs are primarily derived from preexisting PPIs and not invented de novo. Thus, the hallmark of true PPIs is the existence of homologous PPIs. The sensitive search for homologous PPIs within a large body of known PPIs is an efficient strategy to separate biologically relevant PPIs from the many spurious PPIs reported by high-throughput experiments.
Measurement of the Splitting Function in pp and Pb-Pb Collisions at √s_NN = 5.02 TeV
Data from heavy ion collisions suggest that the evolution of a parton shower is modified by interactions with the color charges in the dense partonic medium created in these collisions, but it is not known where in the shower evolution the modifications occur. The momentum ratio of the two leading partons, resolved as subjets, provides information about the parton shower evolution. This substructure observable, known as the splitting function, reflects the process of a parton splitting into two other partons and has been measured for jets with transverse momentum between 140 and 500 GeV, in pp and PbPb collisions at a center-of-mass energy of 5.02 TeV per nucleon pair. In central PbPb collisions, the splitting function indicates a more unbalanced momentum ratio, compared to peripheral PbPb and pp collisions. The measurements are compared to various predictions from event generators and analytical calculations.
Inclusive Search for a Highly Boosted Higgs Boson Decaying to a Bottom Quark-Antiquark Pair
© 2018 CERN. An inclusive search for the standard model Higgs boson (H) produced with large transverse momentum (pT) and decaying to a bottom quark-antiquark pair (bb̄) is performed using a data set of pp collisions at √s = 13 TeV collected with the CMS experiment at the LHC. The data sample corresponds to an integrated luminosity of 35.9 fb⁻¹. A highly Lorentz-boosted Higgs boson decaying to bb̄ is reconstructed as a single, large-radius jet, and it is identified using jet substructure and dedicated b tagging techniques. The method is validated with Z→bb̄ decays. The Z→bb̄ process is observed for the first time in the single-jet topology with a local significance of 5.1 standard deviations (5.8 expected). For a Higgs boson mass of 125 GeV, an excess of events above the expected background is observed (expected) with a local significance of 1.5 (0.7) standard deviations. The measured cross section times branching fraction for production via gluon fusion of H→bb̄ with reconstructed pT > 450 GeV and in the pseudorapidity range -2.5 < η < 2.5 is 74 ± 48 (stat) +17/−10 (syst) fb, which is consistent within uncertainties with the standard model prediction.
Identification of Peptides and Proteins in High Resolution Tandem Mass Spectrometry Data
Mass spectrometry has emerged as the leading technology for the identification and quantification of proteins in biological samples, playing an indispensable role in proteomic research. Even the first steps towards clinical routine, especially in terms of personalized medicine, have been taken. Due to the complexity and amount of the generated data, specific software solutions are necessary to analyze them. In particular, the identification of proteins and peptides from mass spectrometry data, one of the first but also one of the most important steps, is a challenging task. This doctoral thesis describes several algorithms for the analysis of such data sets, specifically designed to exploit the power of recent developments in instrument design, which yield high-resolution and high-accuracy data sets. A main part of this thesis is a new algorithm for database search, MS Amanda, capable of identifying peptides in mass spectrometry data. Applying MS Amanda leads to a higher number of identified peptides at the same false discovery rate compared to established software solutions. Additionally, algorithms have been designed for the identification of chimeric spectra (spectra carrying more than a single peptide), revealing a potential that otherwise remains unexploited. Even for data sets with instrument settings chosen to avoid the occurrence of chimeric spectra, up to 30% of such spectra are measured, rising to 60% for complex samples. Up to 50% additional unique peptides that would otherwise have remained unidentified can be found at no extra measurement time by applying a chimeric search. All results of this doctoral thesis have been disseminated through publication in internationally renowned journals and presentations at various conferences relevant for the proteomics community. All algorithms have been made available free of charge and are integrated in various software packages, enabling further downstream analyses of the identified spectra.
These efforts had a great impact on the international awareness of the algorithms presented in this thesis, as also reflected by the number of citations and the number of downloads of the software. Submitted by DI(FH) Viktoria Dorfer MSc. Universität Linz, Dissertation, 2019. OeBB. (VLID)356411
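The comparison "at the same false discovery rate" in the abstract above is usually made with a target-decoy estimate. The following is a minimal sketch of that general technique, not MS Amanda's implementation: the function name `threshold_at_fdr`, the `(score, is_decoy)` tuple layout, and the simple decoys/targets estimator are assumptions made for illustration, and scores are assumed to be distinct.

```python
from typing import List, Optional, Tuple


def threshold_at_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    Each PSM is a hypothetical (score, is_decoy) pair; higher scores are better.
    The FDR above a cutoff is estimated as (#decoys / #targets), a common
    target-decoy approximation. Returns None if no cutoff satisfies max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted set while the estimate stays within bounds.
        if targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold


if __name__ == "__main__":
    toy_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    print(threshold_at_fdr(toy_psms, max_fdr=0.25))
```

Two search engines can then be compared by counting the target PSMs each one reports above its own cutoff at the same `max_fdr`.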
PhoStar: Identifying Tandem Mass Spectra of Phosphorylated Peptides before Database Search
Standard proteomics workflows use tandem mass spectrometry followed by sequence database search to analyze complex biological samples. The identification of proteins carrying post-translational modifications, for example, phosphorylation, is typically addressed by allowing variable modifications in the searched sequences. Accounting for these variations exponentially increases the combinatorial space in the database, which leads to increased processing times and more false positive identifications. The tool presented here, PhoStar, identifies spectra that originate from phosphorylated peptides before database search using a supervised machine learning approach. The model for the prediction of phosphorylation was trained and validated with an accuracy of 97.6% on a large set of high-confidence spectra collected from publicly available experimental data. Its power was further validated by predicting phosphorylation in the complete NIST human and mouse high collision-dissociation spectral libraries, achieving an accuracy of 98.2 and 97.9%, respectively. We demonstrate the application of PhoStar by using it for spectra filtering before database search. In database search of HeLa samples, the peptide search space was reduced by 27–66% while finding at least 97% of total peptide identifications (at 1% FDR) compared with a standard workflow.
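The filtering step described above can be pictured as splitting spectra by a classifier's prediction before any database search is run. The sketch below is only an illustration of that idea and does not reproduce PhoStar's model or interface: `split_by_prediction`, the `toy_score` field, and the 0.5 cutoff are hypothetical stand-ins.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical spectrum record; a real one would carry peaks and precursor data.
Spectrum = Dict[str, object]


def split_by_prediction(
    spectra: Iterable[Spectrum],
    predict_phospho_probability: Callable[[Spectrum], float],
    cutoff: float = 0.5,
) -> Tuple[List[Spectrum], List[Spectrum]]:
    """Partition spectra into (predicted phosphorylated, predicted unmodified)."""
    phospho: List[Spectrum] = []
    other: List[Spectrum] = []
    for spectrum in spectra:
        if predict_phospho_probability(spectrum) >= cutoff:
            phospho.append(spectrum)
        else:
            other.append(spectrum)
    return phospho, other


if __name__ == "__main__":
    # Stand-in classifier: reads a precomputed toy score instead of a trained model.
    def toy_model(spectrum: Spectrum) -> float:
        return float(spectrum["toy_score"])

    toy_spectra = [
        {"id": "a", "toy_score": 0.9},
        {"id": "b", "toy_score": 0.2},
    ]
    phospho, other = split_by_prediction(toy_spectra, toy_model)
    print([s["id"] for s in phospho], [s["id"] for s in other])
```

One plausible use, assumed here rather than taken from the abstract, is to allow the variable phosphorylation modification only when searching the first group, which is where a reduction in search space would come from.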
Data for "MS²Rescore 3.0 is a modular, flexible, and user-friendly platform to boost peptide identifications, as showcased with MS Amanda 3.0"
This data set contains reanalyzed data from the "MS2 500 ms" and "MS2 750 ms" workflows originally published by Furtwängler et al. 2022 (https://doi.org/10.1016/j.mcpro.2022.100219). The full data set is available on ProteomeXchange via the PRIDE partner repository using ID PXD029320 (https://www.ebi.ac.uk/pride/archive/projects/PXD029320). For both workflows, this data set contains: mzML files; output files from analysis with MS Amanda, plus settings file; output files from analysis with MS Amanda in combination with Percolator, plus settings file; Percolator pin files; and output files from analysis with MS²Rescore.
MS²Rescore 3.0 is a modular, flexible, and user-friendly platform to boost peptide identifications, as showcased with MS Amanda 3.0
Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We here introduce MS²Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS²Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation.
To showcase this new version, we connected MS²Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations on automatic rescoring. Among new features, MS Amanda now contains additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when applied on challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, thereby demonstrating that MS²Rescore offers a powerful solution to boost peptide identifications.
MS²Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS²Rescore to be a valuable tool for the wider proteomics community. MS²Rescore is available at https://github.com/compomics/ms2rescore