421 research outputs found

    A Hidden Markov Model for identifying essential and growth-defect regions in bacterial genomes from transposon insertion sequencing data

    BACKGROUND: Knowledge of which genes are essential to the survival of an organism is critical to understanding the function of genes, and for the identification of potential drug targets for antimicrobial treatment. Previous statistical methods for assessing essentiality based on sequencing of transposon libraries have usually limited their assessment to strict 'essential' or 'non-essential' categories. However, this binary view of essentiality does not accurately represent the more nuanced ways the growth of an organism might be affected by the disruption of its genes. In addition, these methods often limit their analysis to open reading frames. We propose a novel method for analyzing sequence data from transposon mutant libraries using a Hidden Markov Model (HMM), along with formulas to adapt the parameters of the model to different datasets for robustness. This approach allows for the clustering of insertion sites into distinct regions of essentiality across the entire genome in a statistically rigorous manner, while also allowing for the detection of growth-defect and growth-advantage regions. RESULTS: We evaluate the performance of a 4-state HMM on a sequence dataset of M. tuberculosis transposon mutants. We also test the HMM on several synthetic datasets representing different levels of transposon insertion density and sequence coverage. We show that the HMM produces results that are highly correlated with previous assignments of essentiality for this organism. We also show that it detects growth-defect and growth-advantage genes previously shown to impair or enhance growth when disrupted. CONCLUSIONS: A 4-state HMM provides an improved way of analyzing Tn-seq data and assessing different levels of essentiality that enables not only the characterization of essential and non-essential genes, but also genes whose disruption leads to impairment (or enhancement) of growth.
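The idea of labeling every insertion site with one of four essentiality states can be illustrated with a small Viterbi decoder. This is only a sketch under stated assumptions, not the authors' implementation: the geometric emission distribution, the state means, the uniform start distribution, and the `stay` probability are all illustrative choices that do not come from the abstract, and the names `EMIT_P`, `geometric_logpmf`, and `viterbi` are hypothetical.

```python
import math

# Four states: essential, growth-defect, non-essential, growth-advantage.
STATES = ["ES", "GD", "NE", "GA"]

# ASSUMPTION: read counts per site follow a geometric distribution on
# 0, 1, 2, ... with mean m, so p = 1 / (1 + m). The means are made up.
EMIT_P = [0.99, 1 / (1 + 5), 1 / (1 + 50), 1 / (1 + 500)]


def geometric_logpmf(count, p):
    """log P(X = count) where P(X = k) = p * (1 - p)**k."""
    return math.log(p) + count * math.log(1.0 - p)


def viterbi(counts, emit_p, stay=0.9):
    """Return the most likely state label for each site in `counts`."""
    n = len(STATES)
    # ASSUMPTION: remain in a state with probability `stay`, otherwise
    # switch to any of the other states with equal probability.
    switch = (1.0 - stay) / (n - 1)
    log_trans = [
        [math.log(stay if i == j else switch) for j in range(n)]
        for i in range(n)
    ]
    # Uniform start distribution over the four states.
    score = [
        math.log(1.0 / n) + geometric_logpmf(counts[0], emit_p[s])
        for s in range(n)
    ]
    back = []  # back[t][j] = best predecessor of state j at site t + 1
    for c in counts[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(
                score[best_i] + log_trans[best_i][j]
                + geometric_logpmf(c, emit_p[j])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


# Toy data: ten empty sites followed by ten well-covered sites.
toy_counts = [0] * 10 + [50, 40, 60, 55, 45, 70, 30, 50, 65, 48]
labels = viterbi(toy_counts, EMIT_P)
```

Because the transition matrix favors staying in the same state, isolated empty sites inside a well-covered region are not automatically called essential; a run of consecutive empty sites is needed before the decoder switches state.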

    TRANSIT - A Software Tool for Himar1 TnSeq Analysis

    TnSeq has become a popular technique for determining the essentiality of genomic regions in bacterial organisms. Several methods have been developed to analyze the wealth of data that has been obtained through TnSeq experiments. We developed a tool for analyzing Himar1 TnSeq data called TRANSIT. TRANSIT provides a graphical interface to three different statistical methods for analyzing TnSeq data. These methods cover a variety of approaches capable of identifying essential genes in individual datasets as well as comparative analysis between conditions. We demonstrate the utility of this software by analyzing TnSeq datasets of M. tuberculosis grown on glycerol and cholesterol. We show that TRANSIT can be used to discover genes that have previously been implicated in growth on these carbon sources. TRANSIT is written in Python, and thus can be run on Windows, OSX and Linux platforms. The source code is distributed under the GNU GPL v3 license and can be obtained from the following GitHub repository: https://github.com/mad-lab/transit

    ARTIST: High-Resolution Genome-Wide Assessment of Fitness Using Transposon-Insertion Sequencing

    Transposon-insertion sequencing (TIS) is a powerful approach for deciphering genetic requirements for bacterial growth in different conditions, as it enables simultaneous genome-wide analysis of the fitness of thousands of mutants. However, current methods for comparative analysis of TIS data do not adjust for stochastic experimental variation between datasets and are limited to interrogation of annotated genomic elements. Here, we present ARTIST, an accessible TIS analysis pipeline for identifying essential regions that are required for growth under optimal conditions as well as conditionally essential loci that participate in survival only under specific conditions. ARTIST uses simulation-based normalization to model and compensate for experimental noise, and thereby enhances the statistical power in conditional TIS analyses. ARTIST also employs a novel adaptation of the hidden Markov model to generate statistically robust, high-resolution, annotation-independent maps of fitness-linked loci across the entire genome. Using ARTIST, we sensitively and comprehensively define Mycobacterium tuberculosis and Vibrio cholerae loci required for host infection while limiting inclusion of false positive loci. ARTIST is applicable to a broad range of organisms and will facilitate TIS-based dissection of pathways required for microbial growth and survival under a multitude of conditions

    In Vivo Gene Essentiality and Metabolism in Bordetella pertussis

    Bordetella pertussis is the causative agent of whooping cough, a serious respiratory illness affecting children and adults, associated with prolonged cough and potential mortality. Whooping cough has reemerged in recent years, emphasizing a need for increased knowledge of basic mechanisms of B. pertussis growth and pathogenicity. While previous studies have provided insight into in vitro gene essentiality of this organism, very little is known about in vivo gene essentiality, a critical gap in knowledge, since B. pertussis has no previously identified environmental reservoir and is isolated from human respiratory tract samples. We hypothesize that the metabolic capabilities of B. pertussis are especially tailored to the respiratory tract and that many of the genes involved in B. pertussis metabolism would be required to establish infection in vivo. In this study, we generated a diverse library of transposon mutants and then used it to probe gene essentiality in vivo in a murine model of infection. Using the CON-ARTIST pipeline, 117 genes were identified as conditionally essential at 1 day postinfection, and 169 genes were identified as conditionally essential at 3 days postinfection. Most of the identified genes were associated with metabolism, and we utilized two existing genome-scale metabolic network reconstructions to probe the effects of individual essential genes on biomass synthesis. This analysis suggested a critical role for glucose metabolism and lipooligosaccharide biosynthesis in vivo. This is the first genome-wide evaluation of in vivo gene essentiality in B. pertussis and provides tools for future exploration. IMPORTANCE Our study describes the first in vivo transposon sequencing (Tn-seq) analysis of B. pertussis and identifies genes predicted to be essential for in vivo growth in a murine model of intranasal infection, generating key resources for future investigations into B. pertussis pathogenesis and vaccine design

    Comprehensive Essentiality Analysis of the Mycobacterium tuberculosis Genome via Saturating Transposon Mutagenesis

    For decades, identifying the regions of a bacterial chromosome that are necessary for viability has relied on mapping integration sites in libraries of random transposon mutants to find loci that are unable to sustain insertion. To date, these studies have analyzed subsaturated libraries, necessitating the application of statistical methods to estimate the likelihood that a gap in transposon coverage is the result of biological selection and not the stochasticity of insertion. As a result, the essentiality of many genomic features, particularly small ones, could not be reliably assessed. We sought to overcome this limitation by creating a completely saturated transposon library in Mycobacterium tuberculosis. In assessing the composition of this highly saturated library by deep sequencing, we discovered that a previously unknown sequence bias of the Himar1 element rendered approximately 9% of potential TA dinucleotide insertion sites less permissible for insertion. We used a hidden Markov model of essentiality that accounted for this unanticipated bias, allowing us to confidently evaluate the essentiality of features that contained as few as 2 TA sites, including open reading frames (ORF), experimentally identified noncoding RNAs, methylation sites, and promoters. In addition, several essential regions that did not correspond to known features were identified, suggesting uncharacterized functions that are necessary for growth. This work provides an authoritative catalog of essential regions of the M. tuberculosis genome and a statistical framework for applying saturating mutagenesis to other bacteria

    A Noise Trimming and Positional Significance of Transposon Insertion System to Identify Essential Genes in Yersinia pestis

    This is the final version of the article. Available from Springer Nature via the DOI in this record. Massively parallel sequencing technology coupled with saturation mutagenesis has provided new and global insights into gene functions and roles. At a simplistic level, the frequency of mutations within genes can indicate the degree of essentiality. However, this approach neglects to take account of the positional significance of mutations - the function of a gene is less likely to be disrupted by a mutation close to the distal ends. Therefore, a systematic bioinformatics approach to improve the reliability of essential gene identification is desirable. We report here a parametric model which introduces a novel mutation feature together with a noise trimming approach to predict the biological significance of Tn5 mutations. We show improved performance of essential gene prediction in the bacterium Yersinia pestis, the causative agent of plague. This method would have broad applicability to other organisms and to the identification of genes which are essential for competitiveness or survival under a broad range of stresses. This work was supported by the Defence Science and Technology Laboratory under contract DSTLX-1000060221 (WP1).

    Statistical Analysis of Transposon Sequencing Data to Determine Essential Genes

    Transposon Sequencing (TnSeq) has become a popular biological tool for assessing the phenotypes of large libraries of bacterial mutants at the same time. This allows for high-throughput identification of genes which are essential for growth, thus providing valuable information about the function of those genes and the discovery of potential drug targets that could lead to treatments. However, analysis of data obtained from TnSeq is challenging, as it requires estimating unknown parameters from data that are often noisy and likely to come from a mixture of different phenotypes. In addition, the classification of essentiality is not known a priori, requiring unsupervised methods capable of identifying key features in the data to confidently determine essentiality. We present several models capable of identifying essential genes while overcoming the difficulties that are present in analyzing TnSeq data. Together, these methods provide ways to analyze TnSeq data in one or multiple conditions, confined within gene boundaries or across the entire genome, while reducing the impact of noise and outliers that are often present in this type of data.

    Analyzing TnSeq Data to Predict Insertion Counts in M. tuberculosis

    TnSeq is a genetic method used to evaluate the essentiality of genes in bacteria, such as Mycobacterium tuberculosis. It uses random insertions by the Himar1 transposon and high-throughput sequencing to determine the most essential genes. The Himar1 transposon only inserts at TA dinucleotide sites in the genome, and it was thought that the surrounding sequence did not affect its insertion preferences. However, recent studies have shown that the sequence surrounding the TA site does affect how likely Himar1 is to insert there. Our goal was to determine whether a model could be created that predicts the insertion count of a TA site in the M. tuberculosis genome given its surrounding nucleotide sequence. To do this, machine learning algorithms, including artificial neural networks and naïve Bayes classifiers, were tuned and tested to make the most accurate predictions. The input and output encodings were also adjusted, and supplemental information was added to increase the accuracy of the predictions. In the end, by considering the relative difference between the mean insertion counts of each TA site and the expected counts of surrounding TA sites, in addition to the surrounding sequence itself, we were able to use simple linear regression to create a model that has predictive power. We achieved an R^2 value of 0.28, and the scatter plot of the predicted and actual insertion counts showed a linear trend. Our model used the novel approach of considering the context of the surrounding TA sites to generate a more accurate prediction. The model can help scientists better interpret the results of TnSeq experiments. This bioinformatic analysis can help us learn more about bacterial evolution and could help us find essential genes to target when developing drugs to treat tuberculosis.
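The role of neighboring TA sites and the meaning of the reported R^2 can be sketched with a one-feature least-squares fit. This is a simplified illustration, not the model from the abstract: it uses only the mean count of nearby sites as a predictor and omits the sequence features entirely. The helper names, the `window` parameter, and the toy counts are hypothetical.

```python
def neighbor_mean(counts, i, window=2):
    """Mean count of the sites within `window` positions of site i,
    excluding site i itself."""
    lo, hi = max(0, i - window), min(len(counts), i + window + 1)
    neighbors = [counts[j] for j in range(lo, hi) if j != i]
    return sum(neighbors) / len(neighbors)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up insertion counts at consecutive TA sites (not real data).
toy_counts = [12, 15, 9, 0, 1, 40, 35, 50, 42, 3, 5, 2]
features = [neighbor_mean(toy_counts, i) for i in range(len(toy_counts))]
slope, intercept = fit_line(features, toy_counts)
score = r_squared(features, toy_counts, slope, intercept)
```

An R^2 of 0.28, as reported in the abstract, would mean that such a fit accounts for roughly 28% of the variance in the observed counts.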