    Finding approximate gene clusters with GECKO 3

    Winter S, Jahn K, Wehner S, et al. Finding approximate gene clusters with GECKO 3. Nucleic Acids Research. 2016;44(20):9600-9610.
    Gene-order-based comparison of multiple genomes provides signals for the functional analysis of genes and for studying the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on the genomes of different species. The rapid increase in sequenced genomes calls for bioinformatics tools that can find gene clusters in hundreds of genomes. Existing tools are often restricted to a few (in many cases, only two) genomes and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, open-source software for finding gene clusters in hundreds of bacterial genomes that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations, and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters by reviewing the literature and comparing them to a database of operons; we also detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in less than 40 min.
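    To make the idea of an "approximate" gene cluster concrete, the sketch below encodes each genome as a sequence of gene-family IDs and looks for a contiguous window that contains most of a cluster's families, tolerating a few missing families (e.g. misannotations) and a few interspersed foreign genes, while ignoring gene order. This is a simplified toy model under our own assumptions: the function names and the parameters `max_missing` and `max_extra` are hypothetical, and this is not the gene cluster model, scoring, or search algorithm implemented in Gecko 3.

```python
from typing import Dict, Hashable, Optional, Sequence, Tuple


def find_approx_occurrence(
    genome: Sequence[Hashable],
    cluster: Sequence[Hashable],
    max_missing: int = 1,
    max_extra: int = 1,
) -> Optional[Tuple[int, int]]:
    """Return the half-open window (start, end) of `genome` that best matches
    `cluster`, or None if no window matches.

    Toy model (hypothetical): a window matches if it lacks at most
    `max_missing` of the cluster's gene families and contains at most
    `max_extra` genes from other families. Gene order inside the window is
    ignored. Among matches, prefer the window covering the most families,
    then the shortest one, then the leftmost one.
    """
    families = set(cluster)
    best_key, best_window = None, None
    for start, first in enumerate(genome):
        if first not in families:
            continue  # windows start on a cluster gene
        found, extra = set(), 0
        for end in range(start, len(genome)):
            gene = genome[end]
            if gene not in families:
                extra += 1
                if extra > max_extra:
                    break  # too many interspersed non-cluster genes
                continue  # windows also end on a cluster gene
            found.add(gene)
            if len(families) - len(found) <= max_missing:
                key = (len(found), -(end - start + 1), -start)
                if best_key is None or key > best_key:
                    best_key, best_window = key, (start, end + 1)
    return best_window


def scan_genomes(
    genomes: Dict[str, Sequence[Hashable]],
    cluster: Sequence[Hashable],
    max_missing: int = 1,
    max_extra: int = 1,
) -> Dict[str, Optional[Tuple[int, int]]]:
    """Apply find_approx_occurrence to every genome in a {name: genes} dict."""
    return {
        name: find_approx_occurrence(genes, cluster, max_missing, max_extra)
        for name, genes in genomes.items()
    }


if __name__ == "__main__":
    # Made-up genomes; letters stand for gene families.
    genomes = {
        "ref": ["a", "b", "c", "d", "x"],
        "g1": ["y", "a", "c", "z", "b", "d"],  # reordered, one inserted gene
        "g2": ["a", "q", "r", "b"],            # too degenerate to match
        "g3": ["b", "c", "d", "w"],            # one family missing
    }
    print(scan_genomes(genomes, ["a", "b", "c", "d"]))
```

    Scanning every window is quadratic per genome, which is fine for illustration but far from what is needed for hundreds of genomes; a dedicated tool also has to decide which windows are statistically significant rather than merely present.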

    HLA Ligand Atlas: a benign reference of HLA-presented peptides to improve T-cell-based cancer immunotherapy

    BACKGROUND The human leucocyte antigen (HLA) complex controls adaptive immunity by presenting defined fractions of the intracellular and extracellular protein content to immune cells. Understanding the benign HLA ligand repertoire is a prerequisite for defining safe T-cell-based immunotherapies against cancer. Because benign tissues are scarce, normal tissue adjacent to the tumor has been used, where available, as a benign surrogate when defining tumor-associated antigens. However, this comparison has proven insufficient and has even resulted in lethal outcomes. To match the tumor immunopeptidome with an equivalent counterpart, we created the HLA Ligand Atlas, the first extensive collection of paired HLA-I and HLA-II immunopeptidomes from 227 benign human tissue samples. This dataset facilitates a balanced comparison between tumor and benign tissues at the HLA ligand level.
    METHODS Human tissue samples were obtained from 16 subjects at autopsy; in addition, five thymus samples and two ovary samples originated from living donors. HLA ligands were isolated via immunoaffinity purification and analyzed in over 1200 liquid chromatography-mass spectrometry runs. Experimentally and computationally reproducible protocols were employed for data acquisition and processing.
    RESULTS The initial release covers 51 HLA-I and 86 HLA-II allotypes presenting 90,428 HLA-I and 142,625 HLA-II ligands. The HLA allotypes are representative of the world population. We observe that immunopeptidomes differ considerably between tissues and individuals at both the source protein and the HLA ligand level. Moreover, we discover 1407 HLA-I ligands from non-canonical genomic regions. Such peptides have previously been described in tumors, peripheral blood mononuclear cells (PBMCs), healthy lung tissues and cell lines. In a case study in glioblastoma, we show that potential on-target off-tumor adverse events in immunotherapy can be avoided by comparing tumor immunopeptidomes to the provided multi-tissue reference.
    CONCLUSION Given that T-cell-based immunotherapies, such as CAR-T cells, affinity-enhanced T-cell transfer, cancer vaccines and immune checkpoint inhibition, have significant side effects, the HLA Ligand Atlas is the first step toward defining tumor-associated targets with an improved safety profile. The resource provides insights into basic and applied immune-related questions in the context of cancer immunotherapy, infection, transplantation, allergy and autoimmunity. It is publicly available and can be browsed in an easy-to-use web interface at https://hla-ligand-atlas.org
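    The core comparison described above, checking tumor HLA ligands against a multi-tissue benign reference, can be sketched as a set operation. In this minimal sketch, assuming both sides are plain collections of peptide sequences, tumor ligands are split into tumor-exclusive peptides and peptides also presented in at least one benign tissue (the latter being candidates for on-target off-tumor toxicity). The function name `compare_to_benign`, the input layout, and the peptide strings are hypothetical; this is not the Atlas's download format or web API, and a real analysis would also need to consider HLA allotypes and data quality.

```python
from typing import Dict, Iterable, Set, Tuple


def compare_to_benign(
    tumor_peptides: Iterable[str],
    benign_reference: Dict[str, Iterable[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split tumor HLA ligands into tumor-exclusive ones and ones also
    presented in at least one benign tissue.

    `benign_reference` maps a tissue name to the peptides observed in it
    (hypothetical layout). Returns (exclusive, shared), where `shared` maps
    each shared peptide to the benign tissues in which it was observed.
    Peptides are compared by exact sequence match.
    """
    tumor = set(tumor_peptides)
    shared: Dict[str, Set[str]] = {}
    for tissue, peptides in benign_reference.items():
        for peptide in tumor.intersection(peptides):
            shared.setdefault(peptide, set()).add(tissue)
    exclusive = tumor - set(shared)
    return exclusive, shared


if __name__ == "__main__":
    # Made-up peptide sequences, for illustration only.
    tumor = ["AAAKLLLLV", "GGGRVVVVL", "SSSKYYYYL"]
    benign = {
        "liver": ["AAAKLLLLV"],
        "lung": ["AAAKLLLLV", "TTTKPPPPL"],
        "thymus": [],
    }
    exclusive, shared = compare_to_benign(tumor, benign)
    print(sorted(exclusive))
    print(shared)
```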

    Analysis of Antigen Receptor Repertoires Captured by High-Throughput Sequencing

    In vertebrates, the main mechanisms of defence against pathogens are divided between the innate and the adaptive immune system. While the former relies on generic mechanisms, for example to detect the presence of bacterial cells, the latter allows the individual to acquire defences against specific, potentially novel features of pathogens and to maintain them throughout life. In simplified terms, the adaptive immune system continuously and randomly generates new defences against all kinds of structures, carefully selecting them so that they do not react against the host's own cells. The underlying generative mechanism is a unique somatic recombination process that modifies the genes encoding the proteins responsible for recognizing such foreign structures, the so-called antigen receptors. With the advances in high-throughput DNA sequencing, we can now capture the repertoire of antigen receptor genes that an individual has acquired by selectively sequencing the recombined loci from a cell sample. This enables us to examine and explore the development and behaviour of the adaptive immune system in a new way, with a variety of potential medical applications. The main focus of this thesis is on two computational problems related to immune repertoire sequencing. First, we developed a method to accurately annotate the raw sequencing data generated in such experiments, taking into account various sources of bias and error that either occur generally in DNA sequencing or are specific to immune repertoire sequencing experiments. We describe the algorithmic details of this method and then demonstrate its superiority over previously published methods on various datasets. Second, we developed a machine-learning-based workflow to interpret these data by functionally classifying recombined genes with a previously trained model. We implemented alternative models within this workflow, which we first describe formally and then evaluate on real data for a binary functional feature of T cells, namely whether they have differentiated into cytotoxic or helper T cells.
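    To illustrate the general shape of the classification step (a trained model assigning a recombined receptor sequence to one of two functional classes), the sketch below trains a multinomial naive Bayes classifier on overlapping amino acid k-mers. The choice of model, the k-mer features, the class labels "helper" and "cytotoxic", and the toy training sequences are all our assumptions for illustration; the thesis abstract does not say which models were implemented, and the made-up sequences below do not reflect real biological motifs.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def kmers(seq: str, k: int) -> List[str]:
    """Overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over overlapping k-mers, with Laplace smoothing."""

    def __init__(self, k: int = 3, alpha: float = 1.0):
        self.k = k
        self.alpha = alpha

    def fit(self, sequences: Sequence[str], labels: Sequence[str]) -> "KmerNaiveBayes":
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        self.totals: Counter = Counter()
        self.vocab = set()
        for seq, label in zip(sequences, labels):
            features = kmers(seq, self.k)
            self.counts[label].update(features)
            self.totals[label] += len(features)
            self.vocab.update(features)
        return self

    def log_scores(self, seq: str) -> Dict[str, float]:
        """Unnormalized log posterior per class; k-mers unseen in training are skipped."""
        features = [f for f in kmers(seq, self.k) if f in self.vocab]
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.alpha * len(self.vocab)
            score = math.log(self.priors[c] / self.n)
            for f in features:
                score += math.log((self.counts[c][f] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, seq: str) -> str:
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up training sequences, for illustration only.
    model = KmerNaiveBayes(k=3).fit(
        ["CASSQDRGF", "CASSQDTQY", "CAWRPGLTF", "CAWRPNEQF"],
        ["helper", "helper", "cytotoxic", "cytotoxic"],
    )
    print(model.predict("CASSQDSYF"), model.predict("CAWRPGEQF"))
```

    Naive Bayes is used here only because it fits in a few lines without external libraries; with real repertoire data, the feature representation and model choice would need proper validation.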

    DIAproteomics: A Multifunctional Data Analysis Pipeline for Data-Independent Acquisition Proteomics and Peptidomics

    Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. Its main advantages over data-dependent acquisition (DDA) include greater reproducibility and sensitivity and a greater dynamic range. However, the data analysis is complex and often requires expert knowledge when dealing with large-scale data sets. Here we present DIAproteomics, a multifunctional, automated, high-throughput pipeline implemented in the Nextflow workflow management system that makes it easy to process proteomics and peptidomics DIA data sets on diverse compute infrastructures. Its central components are well-established tools such as the OpenSwathWorkflow for the DIA spectral library search and PyProphet for false discovery rate assessment. In addition, it provides options to generate spectral libraries from existing DDA data and to carry out retention time and chromatogram alignment. The output includes annotated tables and diagnostic visualizations from the statistical post-processing and from the computation of fold-changes across pairwise conditions predefined in an experimental design. DIAproteomics is well-documented open-source software and is available under a permissive license to the scientific community at https://www.openms.de/diaproteomics/
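    The fold-change step mentioned above can be illustrated with a minimal sketch, assuming an experimental design given as a mapping from sample to condition and a table of per-sample intensities for each protein. For every pair of conditions it reports log2(mean intensity in the second condition / mean intensity in the first). The function name, input layout, and numbers are hypothetical; this is not DIAproteomics' statistical post-processing, which may aggregate and test intensities quite differently.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, Optional, Tuple


def pairwise_log2_fold_changes(
    intensities: Dict[str, Dict[str, float]],
    design: Dict[str, str],
) -> Dict[Tuple[str, str, str], float]:
    """Compute log2 fold-changes for every protein and pair of conditions.

    `intensities` maps protein -> {sample: intensity}; `design` maps
    sample -> condition (hypothetical layouts). Missing or non-positive
    intensities are ignored. Keys of the result are (protein, cond_a, cond_b)
    with cond_a < cond_b, and values are log2(mean_b / mean_a).
    """
    conditions = sorted(set(design.values()))
    result: Dict[Tuple[str, str, str], float] = {}
    for protein, by_sample in intensities.items():
        cond_means: Dict[str, Optional[float]] = {}
        for cond in conditions:
            values = [
                by_sample[s]
                for s, c in design.items()
                if c == cond and by_sample.get(s, 0) > 0
            ]
            cond_means[cond] = mean(values) if values else None
        for a, b in combinations(conditions, 2):
            mean_a, mean_b = cond_means[a], cond_means[b]
            if mean_a is not None and mean_b is not None:
                result[(protein, a, b)] = math.log2(mean_b / mean_a)
    return result


if __name__ == "__main__":
    # Made-up design and intensities.
    design = {"s1": "control", "s2": "control", "s3": "treated", "s4": "treated"}
    intensities = {
        "P1": {"s1": 100, "s2": 100, "s3": 400, "s4": 400},
        "P2": {"s1": 50, "s2": 150, "s3": 100, "s4": 100},
    }
    print(pairwise_log2_fold_changes(intensities, design))
```

    Averaging raw intensities before taking the logarithm is one design choice among several; averaging log-transformed intensities instead is common and less sensitive to outliers.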

    Comment on "Tracking donor-reactive T cells: Evidence for clonal deletion in tolerant kidney transplant patients"

    Difficulties in tracking bona fide alloreactive clones may limit our understanding of the mechanisms of spontaneous tolerance.

    MHCquant: Automated and Reproducible Data Analysis for Immunopeptidomics

    Personalized multipeptide vaccines are currently being discussed intensively for tumor immunotherapy. To identify epitopes (short, immunogenic peptides) suitable for eliciting a tumor-specific immune response, human leukocyte antigen-presented peptides are isolated by immunoaffinity purification from cancer tissue samples and analyzed by liquid chromatography-coupled tandem mass spectrometry (LC-MS/MS). Here, we present MHCquant, a fully automated, portable computational pipeline that processes LC-MS/MS data and generates annotated, false discovery rate-controlled lists of (neo-)epitopes with associated relative quantification information. We show that MHCquant achieves higher sensitivity than established methods. While MHCquant identifies the highest number of unique peptides, its rate of predicted MHC binders remains comparable to that of other tools. Reprocessing the data from a previously published study resulted in the identification of several neoepitopes not detected by the previously applied methods. MHCquant integrates tailor-made pipeline components with existing open-source software into a coherent processing workflow. Container-based virtualization permits execution of this workflow without complex software installation, execution on cluster/cloud infrastructures, and full reproducibility of the results. Integration with the data analysis workbench KNIME enables easy mining of large-scale immunopeptidomics data sets. MHCquant is available as open-source software, along with accompanying documentation, on our website at https://www.openms.de/mhcquant/
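    False discovery rate control in mass-spectrometry identification is commonly based on a target-decoy strategy. As a minimal sketch, assuming each peptide-spectrum match (PSM) is a (peptide, score, is_decoy) tuple where a higher score is better, the code below estimates the FDR at each score threshold as decoys / targets, converts it into q-values (the lowest FDR at which a PSM would still be accepted) and keeps target PSMs below a chosen cutoff. The function names, tuple layout, and toy PSMs are hypothetical; this is the simple decoy-counting estimate, not necessarily the rescoring and FDR procedure that MHCquant uses internally.

```python
from typing import List, Sequence, Tuple

PSM = Tuple[str, float, bool]  # (peptide, score, is_decoy); higher score is better


def target_decoy_qvalues(psms: Sequence[PSM]) -> List[Tuple[str, float, bool, float]]:
    """Return PSMs sorted by descending score, each with its q-value."""
    ranked = sorted(psms, key=lambda p: p[1], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything scored at or above this PSM.
        fdrs.append(min(1.0, decoys / targets) if targets else 1.0)
    # q-value: minimum estimated FDR over this threshold and all lower ones.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(pep, score, dec, q) for (pep, score, dec), q in zip(ranked, qvalues)]


def accepted_targets(psms: Sequence[PSM], fdr: float = 0.05) -> List[str]:
    """Target peptides whose q-value is at or below the FDR cutoff."""
    return [pep for pep, _, dec, q in target_decoy_qvalues(psms) if not dec and q <= fdr]


if __name__ == "__main__":
    # Made-up PSMs, for illustration only.
    psms = [
        ("PEP1", 9.0, False), ("PEP2", 8.5, False), ("DEC1", 8.0, True),
        ("PEP3", 7.5, False), ("PEP4", 7.0, False), ("DEC2", 6.0, True),
        ("PEP5", 5.5, False),
    ]
    print(accepted_targets(psms, fdr=0.05))
    print(accepted_targets(psms, fdr=0.3))
```

    Tied scores are not handled specially here; production tools typically also combine several scores (for example by semi-supervised rescoring) before estimating the FDR.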