301 research outputs found

Digestiflow: from BCL to FASTQ with ease

Authors: Beule D., Holtgrewe M., Messerschmidt C., Nieminen M.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 15/03/2020

Management of raw sequencing data and its pre-processing (conversion into sequences and demultiplexing) remains a challenging topic for groups running sequencing devices. Existing solutions range from manual management of spreadsheets to very complex, customized laboratory information management systems that handle much more than raw sequencing data. In this article, we describe the software package DigestiFlow, which focuses on the management of Illumina flow cell sample sheets and raw data. It allows for automated extraction of information from flow cell data and management of sample sheets. Furthermore, it allows for the automated and reproducible conversion of Illumina base calls to sequences and their demultiplexing using bcl2fastq and Picard Tools, followed by quality control report generation. Availability and implementation: The software is available under the MIT license at https://github.com/bihealth/digestiflow-server. The client software components are available via Bioconda.

Crossref

MDC Repository

AltamISA: a Python API for ISA-Tab files

Authors: Beule D., Holtgrewe M., Kirwan J., Kuhring M., Nieminen M.
Publication venue: 'The Open Journal'
Publication date: 20/08/2019

MDC Repository

SCelVis: Powerful explorative single cell data analysis on the desktop and in the cloud

Authors: Beule D., Holtgrewe M., Messerschmidt C., Nieminen M., Obermayer B.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 24/07/2019

Background: Single-cell omics technologies present unique opportunities for biomedical and life sciences from lab to clinic, but the high-dimensional nature of such data poses challenges for computational analysis and interpretation. Furthermore, FAIR data management as well as data privacy and security become crucial when working with clinical data, especially in cross-institutional and translational settings. Existing solutions are either bound to the desktop of one researcher or come with dependencies on vendor-specific technology for cloud storage or user authentication. Results: To facilitate analysis and interpretation of single-cell data by users without bioinformatics expertise, we present SCelVis, a flexible, interactive and user-friendly app for web-based visualization of pre-processed single-cell data. Users can survey multiple interactive visualizations of their single-cell expression data and cell annotation, and download raw or processed data for further offline analysis. SCelVis can be run both on the desktop and on cloud systems, accepts input from local and various remote sources using standard and open protocols, and allows for hosting data in the cloud and locally. Methods: SCelVis is implemented in Python using Dash by Plotly. It is available as a standalone application, as a Python package, via Conda/Bioconda, and as a Docker image. All components are available as open source under the permissive MIT license and are based on open standards and interfaces, enabling further development and integration with third-party pipelines and analysis components. The GitHub repository is https://github.com/bihealth/scelvis

MDC Repository

SCelVis: exploratory single cell data analysis on the desktop and in the cloud

Authors: Beule D., Holtgrewe M., Messerschmidt C., Nieminen M., Obermayer B.
Publication venue: 'PeerJ'
Publication date: 19/02/2020

BACKGROUND: Single-cell omics technologies present unique opportunities for biomedical and life sciences from lab to clinic, but the high-dimensional nature of such data poses challenges for computational analysis and interpretation. Furthermore, FAIR data management as well as data privacy and security become crucial when working with clinical data, especially in cross-institutional and translational settings. Existing solutions are either bound to the desktop of one researcher or come with dependencies on vendor-specific technology for cloud storage or user authentication. RESULTS: To facilitate analysis and interpretation of single-cell data by users without bioinformatics expertise, we present SCelVis, a flexible, interactive and user-friendly app for web-based visualization of pre-processed single-cell data. Users can survey multiple interactive visualizations of their single-cell expression data and cell annotation, define cell groups by filtering or manual selection and perform differential gene expression analysis, and download raw or processed data for further offline analysis. SCelVis can be run both on the desktop and on cloud systems, accepts input from local and various remote sources using standard and open protocols, and allows for hosting data in the cloud and locally. We test and validate our visualization using publicly available scRNA-seq data. METHODS: SCelVis is implemented in Python using Dash by Plotly. It is available as a standalone application, as a Python package, via Conda/Bioconda, and as a Docker image. All components are available as open source under the permissive MIT license and are based on open standards and interfaces, enabling further development and integration with third-party pipelines and analysis components. The GitHub repository is https://github.com/bihealth/scelvis

MDC Repository

SigsPack, a package for cancer mutational signatures

Authors: Beule D., Blanc E., Blankenstein T., Busse A., Messerschmidt C., Schumann F.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/09/2019

BACKGROUND: Mutational signatures are specific patterns of somatic mutations introduced into the genome by oncogenic processes. Several mutational signatures have been identified and quantified from multiple cancer studies, and some of them have been linked to known oncogenic processes. Identifying the processes that contribute to the mutations observed in a sample is potentially informative for understanding cancer etiology. RESULTS: We present here SigsPack, a Bioconductor package to estimate a sample's exposure to mutational processes described by a set of mutational signatures. The package also provides functions to estimate the stability of these exposures using bootstrapping. The performance of the exposure and exposure stability estimations has been validated using synthetic and real data. Finally, the package provides tools to normalize the mutation frequencies with respect to the tri-nucleotide content of the regions probed in the experiment. The importance of this effect is illustrated in an example. CONCLUSION: SigsPack provides a complete set of tools for individual sample exposure estimation, and for normalization of mutation catalogues and mutational signatures.
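
The tri-nucleotide normalization mentioned in the abstract can be illustrated with a minimal Python sketch. This is not SigsPack code (SigsPack is an R/Bioconductor package), and the abstract does not give the formula. The function name, the three contexts, and all numbers below are hypothetical. The sketch assumes that each context's mutation count is rescaled by the ratio of that context's frequency in a reference (for example the whole genome) to its frequency in the probed regions (for example an exome), and that the result is renormalized to sum to 1. A real catalogue has 96 mutation types rather than three contexts.

```python
def normalize_catalogue(counts, probed_freq, reference_freq):
    """Rescale mutation counts from probed-region context frequencies
    to reference context frequencies, then return proportions.

    Hypothetical helper for illustration; not part of SigsPack.
    """
    scaled = {
        ctx: n * reference_freq[ctx] / probed_freq[ctx]
        for ctx, n in counts.items()
    }
    total = sum(scaled.values())
    return {ctx: value / total for ctx, value in scaled.items()}


# Hypothetical toy data: three tri-nucleotide contexts only.
counts = {"ACA": 30, "CCG": 50, "TTT": 20}
probed_freq = {"ACA": 0.2, "CCG": 0.5, "TTT": 0.3}     # assumed exome content
reference_freq = {"ACA": 0.3, "CCG": 0.2, "TTT": 0.5}  # assumed genome content

profile = normalize_catalogue(counts, probed_freq, reference_freq)
for ctx, share in profile.items():
    print(f"{ctx}: {share:.3f}")
```

In this toy input, CCG has the most raw mutations but is over-represented in the probed regions, so its share drops after normalization. This is the kind of effect the abstract refers to.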

MDC Repository

Deep learning-assisted peak curation for large-scale LC-MS metabolomics

Authors: Beule D., Gloaguen Y., Kirwan J.A.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 29/03/2022

Available automated methods for peak detection in untargeted metabolomics suffer from poor precision. We present NeatMS, which uses machine learning based on a convolutional neural network to reduce the number and fraction of false peaks. NeatMS comes with a pre-trained model representing expert knowledge in the differentiation of true chemical signal from noise. Furthermore, it provides all necessary functions to easily train new models or improve existing ones by transfer learning. Thus, the tool improves peak curation and contributes to the robust and scalable analysis of large-scale experiments. We show how to integrate it into different liquid chromatography–mass spectrometry (LC-MS) analysis workflows, quantify its performance, and compare it to various other approaches. NeatMS software is available as open source on GitHub under the permissive MIT license and is also provided as easy-to-install PyPI and Bioconda packages.

PubMed Central

MDC Repository

Identification and ranking of recurrent neo-epitopes in cancer

Authors: Beule D., Blanc E., Blankenstein T., Dhamodaran A., Holtgrewe M., Messerschmidt C., Willimsky G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/11/2019

BACKGROUND: Immune escape is one of the hallmarks of cancer and several new treatment approaches attempt to modulate and restore the immune system’s capability to target cancer cells. At the heart of the immune recognition process lies antigen presentation from somatic mutations. These neo-epitopes are emerging as attractive targets for cancer immunotherapy and new strategies for rapid identification of relevant candidates have become a priority. METHODS: We carefully screen TCGA data sets for recurrent somatic amino acid exchanges and apply MHC class I binding predictions. RESULTS: We propose a method for in silico selection and prioritization of candidates which have a high potential for neo-antigen generation and are likely to appear in multiple patients. While the percentage of patients carrying a specific neo-epitope and HLA-type combination is relatively small, the sheer number of new patients leads to surprisingly high recurrence numbers. We identify 769 epitopes which are expected to occur in 77,629 patients per year. CONCLUSION: While our candidate list will definitely contain false positives, the results provide an objective order for wet-lab testing of reusable neo-epitopes. Thus, recurrent neo-epitopes may be suitable to supplement existing personalized T cell treatment approaches with precision treatment options.
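
The argument that a small carrier percentage can still yield many patients per year can be made concrete with a short Python sketch. The abstract does not state the formula the authors used, so this is only an assumed model: the expected count is the number of new cases per year times the fraction carrying the variant times the fraction carrying a matching HLA type, treating the two fractions as independent. The function name, candidate names, and all numbers are hypothetical and are not taken from the paper.

```python
def expected_patients_per_year(new_cases, variant_fraction, hla_fraction):
    """Expected yearly number of patients carrying both the variant and
    a matching HLA type, assuming the two are independent (an assumption)."""
    return new_cases * variant_fraction * hla_fraction


# Hypothetical candidates:
# (name, new cases per year, variant fraction, HLA-type fraction)
candidates = [
    ("variant_A", 100_000, 0.02, 0.25),
    ("variant_B", 40_000, 0.10, 0.05),
    ("variant_C", 250_000, 0.004, 0.40),
]

# Rank candidates by expected yearly patient count, highest first.
ranked = sorted(
    ((name, expected_patients_per_year(n, v, h)) for name, n, v, h in candidates),
    key=lambda item: item[1],
    reverse=True,
)

for name, count in ranked:
    print(f"{name}: about {count:.0f} patients per year")
```

In this toy input, variant_A occurs with a matching HLA type in only 0.5% of new cases, yet with 100,000 new cases per year that still corresponds to roughly 500 patients.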

MDC Repository

Identification and ranking of recurrent neo-epitopes in cancer

Authors: Beule D., Blanc E., Blankenstein T., Dhamodaran A., Holtgrewe M., Messerschmidt C., Willimsky G.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 10/08/2018

Neo-epitopes are emerging as attractive targets for cancer immunotherapy and new strategies for rapid identification of relevant candidates have become a priority. We propose a method for in silico selection of candidates which have a high potential for neo-antigen generation and are likely to appear in multiple patients. This is achieved by carefully screening 33 TCGA data sets for recurrent somatic amino acid exchanges and, for the 1,055 resulting recurrent variants, applying MHC class I binding prediction algorithms. A preliminary confirmation of epitope binding and recognition by CD8 T cells has been carried out for a couple of candidates in humanized mice. Recurrent neo-epitopes may be suitable to supplement existing personalized T cell treatment approaches with precision treatment options.

MDC Repository

Deep learning assisted peak curation for large scale LC-MS Metabolomics

Authors: Beule D., Gloaguen Y., Kirwan J.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 10/08/2020

Available automated methods for peak detection in untargeted metabolomics suffer from poor precision. We present NeatMS, which uses machine learning to replace peak curation by human experts. We show how to integrate our open-source module into different LC-MS analysis workflows and quantify its performance. NeatMS is designed to be suitable for large-scale studies and improves the robustness of the final peak list.

MDC Repository