
    Computational Methods for Mass Spectrometry-based Study of Protein-RNA or Protein-DNA Complexes and Quantitative Metaproteomics

    In the last decade, the use of high-throughput methods has become increasingly popular in various fields of the life sciences. Today, a wide range of technologies exists that allows detailed quantitative insights into biological systems to be gathered. With improved instrumentation and technological advances, a massive growth in data volume from these techniques has been observed. Bioinformatics copes with these large volumes of data by providing computational methods that process raw data to extract biological knowledge. Computational mass spectrometry is a research field in bioinformatics that collects and analyzes data from mass-spectrometric high-throughput experiments. In this thesis, we present two new methods as well as a new data format for computational mass spectrometry. The first method addresses a problem from the field of structural biology: determining spatial interactions between proteins and nucleic acids. For this purpose, we develop experimental protocols, programs, and analysis workflows that allow the identification of UV-induced cross-links in (ribo-)nucleoprotein complexes from mass spectrometry data. An outstanding feature of our method is its ability to exactly localize the amino acids and (ribo-)nucleotides in contact with each other. Applied to data from yeast and human, we identify new interaction partners at, to date, unmatched resolution. The second method applies to metaproteomic studies of complex communities of microorganisms. In vast numbers, bacteria, simple fungi, and plants populate the most varied habitats. They engage in numerous symbiotic or parasitic relationships that serve predominantly for the uptake of nutrients. Organisms differ in their biochemical repertoire, allowing them to decompose a wide range of substrates. Remarkably, this enables functional groups of soil bacteria to nourish themselves even on environmental toxins.
We present a method from the field of metaproteomics that identifies the organisms involved in substrate degradation and groups them according to their function in the degradation process. To this end, we use substrates labeled with stable isotopes, which are metabolized by the organisms. The isotope abundance in proteins serves as an indicator of substrate conversion. This abundance is automatically determined by our novel computational method and assigned to the individual organisms. The automation of this process reduces the manual work from several months to a few minutes and thus enables large-scale studies. The third part of this work contributes to better communication and processing of results from metabolomics and proteomics studies. We present mzTab, a tabular, standardized, human-readable, and machine-processable data format that complements existing data formats. We provide software components for processing the format and demonstrate how it can be integrated into complex proteomic and metabolomic workflows. The recent acceptance of mzTab by the largest proteomics data repositories represents a significant success. We also see widespread adoption by academic software developers and first support by a commercial software vendor. Our novel format facilitates meta-analyses and makes research results from the fields of proteomics and metabolomics available to scientists from other research areas.
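To give a sense of mzTab's tabular structure, the following is a minimal, illustrative reader for two of its section types: metadata (MTD) lines and the protein table (PRH header row, PRT data rows). The full specification defines further sections (peptides, PSMs, small molecules) that follow the same header/row pattern; this sketch is not the reference implementation.

```python
def parse_mztab(lines):
    """Minimal, illustrative mzTab reader: collects metadata (MTD)
    key/value pairs and the protein table (PRH header row, PRT data
    rows). Lines are tab-separated; the first field names the section."""
    metadata, header, proteins = {}, None, []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix == "PRH":
            header = fields[1:]  # column names for subsequent PRT rows
        elif prefix == "PRT" and header:
            proteins.append(dict(zip(header, fields[1:])))
    return metadata, proteins
```

Because every row carries its section prefix, different sections can be freely interleaved in one file and still be parsed line by line, which is what makes the format easy to both read by eye and process by machine.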

    LFQ-Based Peptide and Protein Intensity Differential Expression Analysis

    Testing for significant differences in quantities at the protein level is a common goal of many LFQ-based mass spectrometry proteomics experiments. Starting from a table of protein and/or peptide quantities produced by a given proteomics quantification software, many tools and R packages exist to perform the final tasks of imputation, summarization, normalization, and statistical testing. To evaluate how these packages and the settings of their substeps affect the final list of significant proteins, we studied several packages on three public data sets with known expected protein fold changes. We found that the results between packages, and even across different parameters of the same package, can vary significantly. In addition to usability aspects and feature/compatibility lists of different packages, this paper highlights sensitivity and specificity trade-offs that come with specific packages and settings.
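The substeps named above can be sketched as a minimal pipeline. The shift/scale imputation parameters and the use of a median shift for normalization are illustrative choices, not the defaults of any of the packages studied; summarization (e.g., taking the median of a protein's peptide values) is analogous and omitted for brevity.

```python
import math
import random

def impute_missing(values, shift=1.8, scale=0.3):
    """Impute missing log2 intensities (None) by drawing from a
    down-shifted normal distribution (illustrative parameters)."""
    observed = [v for v in values if v is not None]
    mu = sum(observed) / len(observed)
    sd = math.sqrt(sum((v - mu) ** 2 for v in observed) / (len(observed) - 1))
    rng = random.Random(0)  # fixed seed so imputation is reproducible
    return [v if v is not None else rng.gauss(mu - shift * sd, scale * sd)
            for v in values]

def median_normalize(sample_columns):
    """Subtract each sample's median so all samples are centered alike."""
    normalized = []
    for column in sample_columns:
        median = sorted(column)[len(column) // 2]
        normalized.append([v - median for v in column])
    return normalized

def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with unequal variances."""
    ma = sum(group_a) / len(group_a)
    mb = sum(group_b) / len(group_b)
    va = sum((x - ma) ** 2 for x in group_a) / (len(group_a) - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (len(group_b) - 1)
    return (ma - mb) / math.sqrt(va / len(group_a) + vb / len(group_b))
```

Each of these steps has several defensible variants (k-nearest-neighbor imputation, quantile normalization, moderated tests), which is exactly why the final significant-protein lists diverge between packages.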

    Tissue-based absolute quantification using large-scale TMT and LFQ experiments

    Relative and absolute intensity-based protein quantification across cell lines, tissue atlases and tumour datasets is increasingly available in public datasets. These atlases enable researchers to explore fundamental biological questions, such as protein existence, expression location, quantity and correlation with RNA expression. Most studies provide MS1 feature-based label-free quantification (LFQ) datasets; however, a growing number of isobaric tandem mass tag (TMT) datasets remain unexplored. Here, we compare traditional intensity-based absolute quantification (iBAQ) proteome abundance ranking to an analogous method using reporter ion proteome abundance ranking, with data from an experiment in which LFQ and TMT were measured on the same samples. This new TMT method substitutes reporter ion intensities for MS1 feature intensities in the iBAQ framework. Additionally, we compared LFQ-iBAQ values to TMT-iBAQ values from two independent large-scale tissue atlas datasets (one LFQ and one TMT) using robust bottom-up proteomic identification, normalisation and quantitation workflows.
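The iBAQ value divides a protein's summed intensity (MS1 feature intensities in LFQ, or reporter ion intensities in the TMT analogue described above) by its number of theoretically observable peptides. A minimal sketch follows; the tryptic-digest rule and the 7-30 residue observability window are common illustrative assumptions, not this study's exact settings.

```python
def tryptic_peptides(sequence):
    """In-silico tryptic digest: cleave after K or R, but not before P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR" and (i + 1 == len(sequence) or sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides

def ibaq(summed_intensity, sequence, min_len=7, max_len=30):
    """iBAQ: summed intensity divided by the number of theoretically
    observable peptides (here: tryptic peptides of 7-30 residues, an
    illustrative observability filter)."""
    observable = [p for p in tryptic_peptides(sequence)
                  if min_len <= len(p) <= max_len]
    return summed_intensity / len(observable) if observable else 0.0
```

Dividing by the observable peptide count corrects for protein length, which is what lets iBAQ values be ranked as approximate absolute abundances across proteins.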

    Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides

    We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. The workflow also includes exonic out-of-frame translation of otherwise canonical protein-coding mRNAs. Moreover, the tool enables the generation of variant protein sequences from multiple sources of genomic variants, including COSMIC, cBioPortal, gnomAD and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling, including optimized target/decoy generation by the algorithm DecoyPyrat. Finally, we have reanalyzed six public datasets in PRIDE by generating cell-type-specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to >5% of the total number of peptides identified.
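A three-frame translation of the forward strand, as used to turn non-canonical transcripts into searchable protein sequences, can be sketched as follows. This uses the standard genetic code; emitting stop codons as '*' and unknown codons as 'X' are illustrative conventions, not pypgatk's exact behavior.

```python
BASES = "TCAG"
# Standard genetic code, indexed as 16*i + 4*j + k over base order TCAG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
         for i, b1 in enumerate(BASES)
         for j, b2 in enumerate(BASES)
         for k, b3 in enumerate(BASES)}

def three_frame_translate(transcript):
    """Translate a transcript in the three forward reading frames,
    returning one amino acid string per frame ('X' for unknown codons)."""
    frames = []
    for offset in range(3):
        protein = []
        for i in range(offset, len(transcript) - 2, 3):
            protein.append(CODON.get(transcript[i:i + 3].upper(), "X"))
        frames.append("".join(protein))
    return frames
```

In a database-generation setting, each frame's translation would then typically be split at stop codons and the resulting open reading frames above a minimum length added to the search database.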

    Differential Enzymatic ¹⁶O/¹⁸O Labeling for the Detection of Cross-Linked Nucleic Acid-Protein Heteroconjugates

    Cross-linking of nucleic acids to proteins in combination with mass spectrometry permits the precise identification of interacting residues in nucleic acid-protein complexes. However, the mass spectrometric identification and characterization of cross-linked nucleic acid-protein heteroconjugates within a complex sample is challenging. Here we establish a novel enzymatic differential ¹⁶O/¹⁸O labeling approach, which uniquely labels heteroconjugates. We have developed an automated data analysis workflow based on OpenMS for the identification of differentially isotopically labeled heteroconjugates against a complex background. We validated our method using synthetic model DNA oligonucleotide-peptide heteroconjugates, which were subjected to the labeling reaction and analyzed by high-resolution FTICR mass spectrometry.
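Because differential ¹⁸O labeling shifts each labeled species by a multiple of the ¹⁸O/¹⁶O mass difference (about 2.0042 Da), candidate heteroconjugates can be screened as pairs of precursor masses separated by that shift. The sketch below illustrates the pairing idea only; the label count of two and the 10 ppm tolerance are assumptions, not the published workflow's settings.

```python
O18_SHIFT = 2.004246  # approximate 18O minus 16O mass difference in Da

def find_labeled_pairs(masses, n_labels=2, tol_ppm=10.0):
    """Return (light, heavy) precursor mass pairs separated by n_labels
    18O/16O exchanges, the signature of a differentially labeled species.
    n_labels and tol_ppm are illustrative assumptions."""
    expected = n_labels * O18_SHIFT
    pairs = []
    for i, light in enumerate(masses):
        for heavy in masses[i + 1:]:
            delta = abs(heavy - light)
            # compare against the expected shift within a ppm window
            if abs(delta - expected) <= tol_ppm * 1e-6 * max(light, heavy):
                pairs.append((light, heavy))
    return pairs
```

In practice such pairing would be applied per charge state and retention-time window before the paired spectra are handed to identification; unpaired masses can then be discarded as background.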

    Ten Simple Rules for Taking Advantage of Git and GitHub

    Bioinformatics is a broad discipline in which one common denominator is the need to produce and/or use software that can be applied to biological data in different contexts. To enable and ensure the replicability and traceability of scientific claims, it is essential that the scientific publication, the corresponding datasets, and the data analysis are made publicly available [1,2]. All software used for the analysis should be either carefully documented (e.g., for commercial software) or, better yet, openly shared and directly accessible to others [3,4]. The rise of openly available software and source code alongside concomitant collaborative development is facilitated by the existence of several code repository services such as SourceForge, Bitbucket, GitLab, and GitHub, among others. These resources are also essential for collaborative software projects because they enable the organization and sharing of programming tasks between different remote contributors. Here, we introduce the main features of GitHub, a popular web-based platform that offers a free and integrated environment for hosting the source code, documentation, and project-related web content for open-source projects. GitHub also offers paid plans for private repositories (see Box 1) for individuals and businesses as well as free plans including private repositories for research and educational use.
    Biotechnology and Biological Sciences Research Council
    This is the final version of the article. It first appeared from Public Library of Science via https://doi.org/10.1371/journal.pcbi.1004947

    BioContainers: An open-source and community-driven framework for software standardization

    Motivation: BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform-independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software, and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source Docker and rkt frameworks, which allow software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage, and distribute bioinformatics containers, with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments, or HPC clusters). Availability and implementation: The software is freely available at github.com/BioContainers/.