24 research outputs found

Comparison of normalization methods for the analysis of metagenomic gene abundance data

Author: Buongermino Pereira Mariana
Jonsson Viktor
Kristiansson Erik
Wallroth Mikael
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Background: In shotgun metagenomics, microbial communities are studied through direct sequencing of DNA without any prior cultivation. By comparing gene abundances estimated from the generated sequencing reads, functional differences between the communities can be identified. However, gene abundance data is affected by high levels of systematic variability, which can greatly reduce the statistical power and introduce false positives. Normalization, the process in which systematic variability is identified and removed, is therefore a vital part of the data analysis. A wide range of normalization methods for high-dimensional count data has been proposed, but their performance on the analysis of shotgun metagenomic data has not been evaluated. Results: Here, we present a systematic evaluation of nine normalization methods for gene abundance data. The methods were evaluated through resampling of three comprehensive datasets, creating a realistic setting that preserved the unique characteristics of metagenomic data. Performance was measured in terms of the methods' ability to identify differentially abundant genes (DAGs), correctly calculate unbiased p-values and control the false discovery rate (FDR). Our results showed that the choice of normalization method has a large impact on the end results. When the DAGs were asymmetrically present between the experimental conditions, many normalization methods had a reduced true positive rate (TPR) and a high false positive rate (FPR). The methods trimmed mean of M-values (TMM) and relative log expression (RLE) had the overall highest performance and are therefore recommended for the analysis of gene abundance data. For larger sample sizes, cumulative sum scaling (CSS) also showed satisfactory performance. Conclusions: This study emphasizes the importance of selecting a suitable normalization method in the analysis of data from shotgun metagenomics. Our results also demonstrate that improper methods may result in unacceptably high levels of false positives, which in turn may lead to incorrect or obfuscated biological interpretation.
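
The abstract above recommends RLE without describing it. As an illustration only, here is a minimal sketch of RLE as it is commonly described (a median-of-ratios scaling factor per sample, computed against a per-gene geometric-mean reference). This is not the authors' implementation; the function names `rle_size_factors` and `normalize` and the toy count matrix are assumptions made for the example.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Return one RLE scaling factor per sample.

    counts: list of rows (one per gene), each a list of per-sample counts.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n_samples = len(counts[0])
    # Use only genes observed in every sample, so the log geometric mean exists.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in all samples")
    # Reference profile: log of the geometric mean of each gene across samples.
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene in sample j to the reference profile.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        # The median ratio is robust to a minority of differentially abundant genes.
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide every count by the scaling factor of its sample."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (invented): sample 2 is sequenced exactly twice as deeply as sample 1,
# except for the last gene, which is absent from sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
toy_factors = rle_size_factors(toy_counts)
toy_normalized = normalize(toy_counts, toy_factors)
```

In the toy data the second factor comes out twice the first, so the first three genes have equal normalized abundances in both samples, while the gene absent from sample 1 does not influence the factors.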

Directory of Open Access Journals

Chalmers Research

A comprehensive survey of integron-associated genes present in metagenomes

Author: Österlund Tobias
Axelson-Fisk Marina
Backhaus Thomas
Buongermino Pereira Mariana
Eriksson Martin
Kristiansson Erik
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Background: Integrons are genomic elements that mediate horizontal gene transfer by inserting and removing genetic material using site-specific recombination. Integrons are commonly found in bacterial genomes, where they maintain a large and diverse set of genes that play an important role in adaptation and evolution. Previous studies have started to characterize the wide range of biological functions present in integrons. However, the efforts have so far mainly been limited to genomes from cultivable bacteria and amplicons generated by PCR, thus targeting only a small part of the total integron diversity. Metagenomic data, generated by direct sequencing of environmental and clinical samples, provides a more holistic and unbiased analysis of integron-associated genes. However, the fragmented nature of metagenomic data has previously made such analysis highly challenging. Results: Here, we present a systematic survey of integron-associated genes in metagenomic data. The analysis was based on a newly developed computational method where integron-associated genes were identified by detecting their associated recombination sites. By processing contiguous sequences assembled from more than 10 terabases of metagenomic data, we were able to identify 13,397 unique integron-associated genes. Metagenomes from marine microbial communities had the highest occurrence of integron-associated genes, with levels more than 100-fold higher than in the human microbiome. The identified genes had a large functional diversity spanning several functional classes. Genes associated with defense mechanisms and mobility facilitators were most overrepresented and more than five times as common in integrons compared to other bacterial genes. As many as two thirds of the genes were found to encode proteins of unknown function. Less than 1% of the genes were associated with antibiotic resistance, of which several were novel, previously undescribed resistance gene variants. Conclusions: Our results highlight the large functional diversity maintained by integrons present in unculturable bacteria and significantly expand the number of described integron-associated genes.

Chalmers Research

Prognosis and oncogenomic profiling of patients with tropomyosin receptor kinase fusion cancer in the 100,000 genomes project

Author: Bazhenova Lyudmila
Bridgewater John
Bruno Amanda
Fellous Marc
Flach Clare
Jiao Xiaolong
Kamburov Atanas
Keating Karen
Parimi Mounika
Pereira Mariana Buongermino
Reeves John A
Schmitz Arndt A
Stratford Jeran
Zong Jihong
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022

INTRODUCTION: Neurotrophic tyrosine receptor kinase (NTRK) gene fusions are oncogenic drivers in various tumor types. Limited data exist on the overall survival (OS) of patients with tumors with NTRK gene fusions and on the co-occurrence of NTRK fusions with other oncogenic drivers. MATERIALS AND METHODS: This retrospective study included patients enrolled in the Genomics England 100,000 Genomes Project who had linked clinical data from UK databases. Patients who had undergone tumor whole genome sequencing between March 2016 and July 2019 were included. Patients with and without NTRK fusions were matched. OS was analyzed along with oncogenic alterations in ALK, BRAF, EGFR, ERBB2, KRAS, and ROS1, and tumor mutation burden (TMB) and microsatellite instability (MSI). RESULTS: Of 15,223 patients analyzed, 38 (0.25%) had NTRK gene fusions in 11 tumor types; the most common were breast cancer, colorectal cancer (CRC), and sarcoma. Median OS was not reached in either the NTRK gene fusion-positive or the fusion-negative group (hazard ratio 1.47, 95% CI 0.39-5.57, P = 0.572). A KRAS mutation was identified in two (5%) patients with NTRK gene fusions, and both had hepatobiliary cancer. High TMB and MSI were both more common in patients with NTRK gene fusions, due to the CRC subset. While there was a higher risk of death in patients with NTRK gene fusions compared to those without, the difference was not statistically significant. CONCLUSION: This study supports the hypothesis that NTRK gene fusions are primary oncogenic drivers and that the co-occurrence of NTRK gene fusions with other oncogenic alterations is rare.

UCL Discovery

Spectrum of mutational signatures in T-cell lymphoma reveals a key role for UV radiation in cutaneous T-cell lymphoma

Author: Amarante Tauanne D.
Ambrose John C.
Arumugam Prabhu
Baple Emma L.
Bleda Marta
Boardman-Pretty Freya
Boissiere Jeanne M.
Boustred Christopher R.
Brittain Helen
Caulfield Mark J.
Chan Georgia C.
Craig Clare E. H.
Daugherty Louise C.
de Burca Anna
Degasperi Andrea
Devereau Andrew
Elgar Greg
Foulger Rebecca E.
Fowler Tom
Furió-Tarí Pedro
Giess Adam
Grandi Vieri
Hackett Joanne M.
Halai Dina
Hamblin Angela
Henderson Shirley
Holman James E.
Hubbard Tim J. P.
Ibáñez Kristina
Jackson Rob
Jones Christine L.
Jones Louise J.
Kasperaviciute Dalia
Kayikci Melis
Kousathanas Athanasios
Lahnstein Lea
Lawson Kay
Leigh Sarah E. A.
Leong Ivonne U. S.
Lopez Javier F.
Maleady-Crowe Fiona
Mason Joanne
McDonagh Ellen M.
Mitchell Tracey J.
Moutsianas Loukas
Mueller Michael
Murugaesu Nirupa
Need Anna C.
Nik-Zainal Serena
Odhams Chris A.
Orioli Andrea
O’Donovan Peter
Patch Christine
Pereira Mariana Buongermino
Perez-Gil Daniel
Polychronopoulos Dimitris
Pullinger John
Rahim Tahrima
Rendon Augusto
Riesgo-Ferreiro Pablo
Rogers Tim
Ryten Mina
Savage Kevin
Sawant Kushmita
Scott Richard H.
Siddiq Afshan
Sieghart Alexander
Smedley Damian
Smith Katherine R.
Smith Samuel C.
Sosinsky Alona
Spooner William
Stevens Helen E.
Stuckey Alexander
Sultana Razvan
Tanguy Mélanie
Thomas Ellen R. A.
Thompson Simon R.
Tregidgo Carolyn
Tucci Arianna
Walsh Emma
Watters Sarah A.
Welland Matthew J.
Whittaker Sean J.
Williams Eleanor
Witkowska Katarzyna
Wood Suzanne M.
Zarowiecki Magdalena
Publication venue: Scientific Reports
Publication date: 17/02/2021

Funder: Galderma; doi: http://dx.doi.org/10.13039/501100009754
Funder: NIHR-BRC Cambridge core grant
Funder: National Institute for Health Research; doi: http://dx.doi.org/10.13039/501100000272
Funder: NHS England

Abstract: T-cell non-Hodgkin’s lymphomas develop following transformation of tissue-resident T-cells. We performed a meta-analysis of whole exome sequencing data from 403 patients with eight subtypes of T-cell non-Hodgkin’s lymphoma to identify mutational signatures and associated recurrent gene mutations. Signature 1, indicative of age-related deamination, was prevalent across all T-cell lymphomas, reflecting the derivation of these malignancies from memory T-cells. Adult T-cell leukemia-lymphoma was specifically associated with signature 17, which was found to correlate with the IRF4 K59R mutation that is exclusive to adult T-cell leukemia-lymphoma. Signature 7, implicating UV exposure, was uniquely identified in cutaneous T-cell lymphoma (CTCL), contributing 52% of the mutational burden in mycosis fungoides and 23% in Sézary syndrome. Importantly, this UV signature was observed in CD4+ T-cells isolated from the blood of Sézary syndrome patients, suggesting extensive re-circulation of these T-cells through skin and blood. Analysis of non-Hodgkin’s T-cell lymphoma cases submitted to the national 100,000 WGS project confirmed that signature 7 was only identified in CTCL, strongly implicating UV radiation in the pathogenesis of cutaneous T-cell lymphoma.

Apollo (Cambridge)

Modeling of bacterial DNA patterns important in horizontal gene transfer using stochastic grammars

Author: Buongermino Pereira Mariana
Publication date: 01/01/2015

DNA contains genes which carry the blueprints for all processes necessary to maintain life. In addition to genes, DNA also contains a wide range of functional patterns, which govern many of these processes. These functional patterns typically have a high variability, both within and between species, which makes them hard to detect. Stochastic models, such as hidden Markov models and conditional random fields, offer flexible frameworks that can be used to describe these patterns, their variability and dependencies. In this thesis, we describe two such models for identification of attC sites, patterns necessary for the sharing of genes between bacteria, in a process known as horizontal gene transfer. Acquired genes causing bacteria to become resistant to antibiotics are often associated with attC sites, which makes their identification highly relevant. In the first paper we develop a stochastic regular grammar defined by an eight-state generalized hidden Markov model that describes the sequence conservation and length distribution of the different parts of an attC site. The different model assumptions were evaluated and improved using cross-validation experiments, which resulted in a high sensitivity in detecting attC sites. The model was applied to a real dataset in the form of a well-studied plasmid and was able to find the majority of the attC sites present. In addition, six metagenomic samples from polluted and pristine environments were analysed. The model predicted a 15-fold higher abundance of attC sites in the polluted environments compared to the pristine ones. The model implementation, HattCI, was done in C and is freely available at http://bioinformatics.math.chalmers.se/HattCI. AttC sites fold into a three-dimensional structure that is crucial for the horizontal transfer of genes. In the second paper, we extend our previous model to include specific information about this folding. We develop a stochastic context-free grammar, which is suited to describe the nested dependencies induced by the structure. The grammar includes features that describe thermodynamic properties of the folding. The model is formulated in the framework of conditional random fields, with parameter estimation done numerically using structured support vector machines. A first implementation of the model has been completed; further experiments, such as evaluation of the performance using cross-validation, are planned. This thesis demonstrates the flexibility of stochastic grammars for modelling the variability and dependencies in DNA patterns. It also emphasizes the value of stochastic methods in the field of microbiology and infectious diseases.

Chalmers Research

Chalmers Publication Library

Statistical modelling and analyses of DNA sequence data with applications to metagenomics

Author: Buongermino Pereira Mariana
Publication date: 01/01/2017

Microorganisms are organised in complex communities and are ubiquitous in all ecosystems, including natural environments and the human gut. Metagenomics, which is the direct sequencing of DNA from a sample, enables the study of the collective genomes of the organisms present there. However, the resulting data is highly variable, and statistical models are therefore necessary to assure correct biological interpretations. This thesis aims to develop statistical models that provide an increased understanding of metagenomics data. In Paper I, we develop, implement and evaluate HattCI, which is a high-performance generalised hidden Markov model for the identification of integron-associated attC sites in DNA sequence data. In Paper II, we implement HattCI and other bioinformatics tools into a computational method to identify and characterise the biological functions of integron-mediated genes. The method is used to identify 13,397 integron-mediated genes present in metagenomic data. In Paper III, we provide a conceptual overview of the computational and statistical challenges involved in analysing gene abundance data. In Paper IV, we perform a comprehensive evaluation of nine normalisation methods for metagenomic gene abundance data. Our results highlight the importance of using a suitable method to avoid introducing an unacceptably high rate of false positives. The methods presented in this thesis improve the analysis of metagenomic data and thereby the understanding of microbial communities. Specifically, this thesis highlights the importance of statistical modelling in addressing the large variability of high-dimensional biological data and ensuring its sound interpretation.

Chalmers Research

Chalmers Publication Library

Computational and Statistical Considerations in the Analysis of Metagenomic Data

Author: Boulund Fredrik
Buongermino Pereira Mariana
Jonsson Viktor
Kristiansson Erik
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

In shotgun metagenomics, microbial communities are studied through random DNA fragments sequenced directly from environmental and clinical samples. The resulting data is massive, potentially consisting of billions of sequence reads describing millions of microbial genes. The data interpretation is therefore nontrivial and dependent on dedicated computational and statistical methods. In this chapter we discuss the many challenges associated with the analysis of shotgun metagenomic data. First, we address computational issues related to the quantification of genes in metagenomes. We describe algorithms for efficient sequence comparisons, recommended practices for setting up data workflows and modern high-performance computer resources that can be used to perform the analysis. Next, we outline the statistical aspects, including removal of systematic errors and how to identify differences between microbial communities from different experimental conditions. We conclude by underlining the increasing importance of efficient and reliable computational and statistical solutions in the analysis of large metagenomic datasets.

Chalmers Research

HattCI: Fast and Accurate attC site Identification Using Hidden Markov Models.

Author: Axelson-Fisk Marina
Buongermino Pereira Mariana
Kristiansson Erik
Wallroth Mikael
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2016

Integrons are genetic elements that facilitate the horizontal gene transfer in bacteria and are known to harbor genes associated with antibiotic resistance. The gene mobility in the integrons is governed by the presence of attC sites, which are 55 to 141-nucleotide-long imperfect inverted repeats. Here we present HattCI, a new method for fast and accurate identification of attC sites in large DNA data sets. The method is based on a generalized hidden Markov model that describes each core component of an attC site individually. Using twofold cross-validation experiments on a manually curated reference data set of 231 attC sites from class 1 and 2 integrons, HattCI showed high sensitivities of up to 91.9% while maintaining satisfactory false-positive rates. When applied to a metagenomic data set of 35 microbial communities from different environments, HattCI found a substantially higher number of attC sites in the samples that are known to contain more horizontally transferred elements. HattCI will significantly increase the ability to identify attC sites and thus integron-mediated genes in genomic and metagenomic data. HattCI is implemented in C and is freely available at http://bioinformatics.math.chalmers.se/HattCI

Crossref

Chalmers Research

Chalmers Publication Library

A novel method to discover fluoroquinolone antibiotic resistance (qnr) genes in fragmented nucleotide sequences

Author: Boulund Fredrik
Johnning Anna
Kristiansson Erik
Larsson DG Joakim
Pereira Mariana Buongermino
Publication venue: BMC
Publication date: 01/01/2012

Background: Broad-spectrum fluoroquinolone antibiotics are central in modern health care and are used to treat and prevent a wide range of bacterial infections. The recently discovered qnr genes provide a mechanism of resistance with the potential to rapidly spread between bacteria using horizontal gene transfer. As for many antibiotic resistance genes present in pathogens today, qnr genes are hypothesized to originate from environmental bacteria. The vast amount of data generated by shotgun metagenomics can therefore be used to explore the diversity of qnr genes in more detail. Results: In this paper we describe a new method to identify qnr genes in nucleotide sequence data. We show, using cross-validation, that the method has a high statistical power of correctly classifying sequences from novel classes of qnr genes, even for fragments as short as 100 nucleotides. Based on sequences from public repositories, the method was able to identify all previously reported plasmid-mediated qnr genes. In addition, several fragments from novel putative qnr genes were identified in metagenomes. The method was also able to annotate 39 chromosomal variants, of which 11 have not previously been reported in the literature. Conclusions: The method described in this paper significantly improves the sensitivity and specificity of identification and annotation of qnr genes in nucleotide sequence data. The predicted novel putative qnr genes in the metagenomic data support the hypothesis of a large and uncharacterized diversity within this family of resistance genes in environmental bacterial communities. An implementation of the method is freely available at http://bioinformatics.math.chalmers.se/qnr/

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Chalmers Research