129 research outputs found

    A Hidden Markov Model for identifying essential and growth-defect regions in bacterial genomes from transposon insertion sequencing data

    BACKGROUND: Knowledge of which genes are essential to the survival of an organism is critical to understanding the function of genes, and for the identification of potential drug targets for antimicrobial treatment. Previous statistical methods for assessing essentiality based on sequencing of transposon libraries have usually limited their assessment to strict 'essential' or 'non-essential' categories. However, this binary view of essentiality does not accurately represent the more nuanced ways the growth of an organism might be affected by the disruption of its genes. In addition, these methods often limit their analysis to open-reading frames. We propose a novel method for analyzing sequence data from transposon mutant libraries using a Hidden Markov Model (HMM), along with formulas to adapt the parameters of the model to different datasets for robustness. This approach allows for the clustering of insertion sites into distinct regions of essentiality across the entire genome in a statistically rigorous manner, while also allowing for the detection of growth-defect and growth-advantage regions. RESULTS: We evaluate the performance of a 4-state HMM on a sequence dataset of M. tuberculosis transposon mutants. We also test the HMM on several synthetic datasets representing different levels of transposon insertion density and sequence coverage. We show that the HMM produces results that are highly correlated with previous assignments of essentiality for this organism. We also show that it detects growth-defect and growth-advantage genes previously shown to impair or enhance growth when disrupted. CONCLUSIONS: A 4-state HMM provides an improved way of analyzing Tn-seq data and assessing different levels of essentiality that enables not only the characterization of essential and non-essential genes, but also genes whose disruption leads to impairment (or enhancement) of growth

    Bayesian Analysis of Transposon Mutagenesis Data

    Determining which genes are essential for growth of a bacterial organism is an important question to answer as it is useful for the discovery of drugs that inhibit critical biological functions of a pathogen. To evaluate essentiality, biologists often use transposon mutagenesis to disrupt genomic regions within an organism, revealing which genes are able to withstand disruption and are therefore not required for growth. The development of next-generation sequencing technology augments transposon mutagenesis by providing high-resolution sequence data that identifies the exact location of transposon insertions in the genome. Although this high-resolution information has already been used to assess essentiality at a genome-wide scale, no formal statistical model has been developed capable of quantifying significance. This thesis presents a formal Bayesian framework for analyzing sequence information obtained from transposon mutagenesis experiments. Our method assesses the statistical significance of gaps in transposon coverage that are indicative of essential regions through a Gumbel distribution, and utilizes a Metropolis-Hastings sampling procedure to obtain posterior estimates of the probability of essentiality for each gene. We apply our method to libraries of M. tuberculosis transposon mutants, to identify genes essential for growth in vitro, and show concordance with previous essentiality results based on hybridization. Furthermore, we show how our method is capable of identifying essential domains within genes, by detecting significant sub-regions of open-reading frames unable to withstand disruption. We show that several genes involved in PG biosynthesis have essential domains

    Statistical analysis of genetic interactions in Tn-Seq data

    Tn-Seq is an experimental method for probing the functions of genes through construction of complex random transposon insertion libraries and quantification of each mutant's abundance using next-generation sequencing. An important emerging application of Tn-Seq is for identifying genetic interactions, which involves comparing Tn mutant libraries generated in different genetic backgrounds (e.g. wild-type strain versus knockout strain). Several analytical methods have been proposed for analyzing Tn-Seq data to identify genetic interactions, including estimating relative fitness ratios and fitting a generalized linear model. However, these have limitations which necessitate an improved approach. We present a hierarchical Bayesian method for identifying genetic interactions through quantifying the statistical significance of changes in enrichment. The analysis involves a four-way comparison of insertion counts across datasets to identify transposon mutants that differentially affect bacterial fitness depending on genetic background. Our approach was applied to Tn-Seq libraries made in isogenic strains of Mycobacterium tuberculosis lacking three different genes of unknown function previously shown to be necessary for optimal fitness during infection. By analyzing the libraries subjected to selection in mice, we were able to distinguish several distinct classes of genetic interactions for each target gene that shed light on their functions and roles during infection

    Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression

    BACKGROUND: Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions varies between two conditions. However, for large-scale experiments involving many conditions, a method is needed for identifying genes that exhibit significant variability in insertions across multiple conditions. RESULTS: In this paper, we introduce a novel statistical method for identifying genes with significant variability of insertion counts across multiple conditions based on Zero-Inflated Negative Binomial (ZINB) regression. Using likelihood ratio tests, we show that the ZINB distribution fits TnSeq data better than either ANOVA or a Negative Binomial (in a generalized linear model). We use ZINB regression to identify genes required for infection of M. tuberculosis H37Rv in C57BL/6 mice. We also use ZINB to perform an analysis of genes conditionally essential in H37Rv cultures exposed to multiple antibiotics. CONCLUSIONS: Our results show that, not only does ZINB generally identify most of the genes found by pairwise resampling (and vastly outperform ANOVA), but it also identifies additional genes where variability is detectable only when the magnitudes of insertion counts are treated separately from local differences in saturation, as in the ZINB model

    TRANSIT - A Software Tool for Himar1 TnSeq Analysis

    TnSeq has become a popular technique for determining the essentiality of genomic regions in bacterial organisms. Several methods have been developed to analyze the wealth of data that has been obtained through TnSeq experiments. We developed a tool for analyzing Himar1 TnSeq data called TRANSIT. TRANSIT provides a graphical interface to three different statistical methods for analyzing TnSeq data. These methods cover a variety of approaches capable of identifying essential genes in individual datasets as well as comparative analysis between conditions. We demonstrate the utility of this software by analyzing TnSeq datasets of M. tuberculosis grown on glycerol and cholesterol. We show that TRANSIT can be used to discover genes which have been previously implicated for growth on these carbon sources. TRANSIT is written in Python, and thus can be run on Windows, OSX and Linux platforms. The source code is distributed under the GNU GPL v3 license and can be obtained from the following GitHub repository: https://github.com/mad-lab/transit

    The Mycobacterium tuberculosis transposon sequencing database (MtbTnDB): a large-scale guide to genetic conditional essentiality [preprint]

    Characterization of gene essentiality across different conditions is a useful approach for predicting gene function. Transposon sequencing (TnSeq) is a powerful means of generating genome-wide profiles of essentiality and has been used extensively in Mycobacterium tuberculosis (Mtb) genetic research. Over the past two decades, dozens of TnSeq screens have been published, yielding valuable insights into the biology of Mtb in vitro, inside macrophages, and in model host organisms. However, these Mtb TnSeq profiles are distributed across dozens of research papers within supplementary materials, which makes querying them cumbersome and assembling a complete and consistent synthesis of existing data challenging. Here, we address this problem by building a central repository of publicly available TnSeq screens performed in M. tuberculosis, which we call the Mtb transposon sequencing database (MtbTnDB). The MtbTnDB encompasses 64 published and unpublished TnSeq screens, and is standardized, open-access, and allows users easy access to data, visualizations, and functional predictions through an interactive web-app (www.mtbtndb.app). We also present evidence that (i) genes in the same genomic neighborhood tend to have similar TnSeq profiles, and (ii) clusters of genes with similar TnSeq profiles tend to be enriched for genes belonging to the same functional categories. Finally, we test and evaluate machine learning models trained on TnSeq profiles to guide functional annotation of orphan genes in Mtb. In addition to facilitating the exploration of conditional genetic essentiality in this important human pathogen via a centralized TnSeq data repository, the MtbTnDB will enable hypothesis generation and the extraction of meaningful patterns by facilitating the comparison of datasets across conditions. This will provide a basis for insights into the functional organization of Mtb genes as well as gene function prediction

    Cataclysmic Variables in the First Year of the Zwicky Transient Facility

    Using selection criteria based on amplitude, time, and color, we have identified 329 objects as known or candidate cataclysmic variables (CVs) during the first year of testing and operation of the Zwicky Transient Facility. Of these, 90 are previously confirmed CVs, 218 are strong candidates based on the shape and color of their light curves obtained during 3–562 days of observation, and the remaining 21 are possible CVs but with too few data points to be listed as good candidates. Almost half of the strong candidates are within 10 deg of the galactic plane, in contrast to most other large surveys that have avoided crowded fields. The available Gaia parallaxes are consistent with sampling the low mass transfer CVs, as predicted by population models. Our follow-up spectra have confirmed Balmer/helium emission lines in 27 objects, with four showing high-excitation He II emission, including candidates for an AM CVn, a polar, and an intermediate polar. Our results demonstrate that a complete survey of the Galactic plane is needed to accomplish an accurate determination of the number of CVs existing in the Milky Way

    CRISPR-Cas9 screens in human cells and primary neurons identify modifiers of C9ORF72 dipeptide-repeat-protein toxicity.

    Hexanucleotide-repeat expansions in the C9ORF72 gene are the most common cause of amyotrophic lateral sclerosis and frontotemporal dementia (c9ALS/FTD). The nucleotide-repeat expansions are translated into dipeptide-repeat (DPR) proteins, which are aggregation prone and may contribute to neurodegeneration. We used the CRISPR-Cas9 system to perform genome-wide gene-knockout screens for suppressors and enhancers of C9ORF72 DPR toxicity in human cells. We validated hits by performing secondary CRISPR-Cas9 screens in primary mouse neurons. We uncovered potent modifiers of DPR toxicity whose gene products function in nucleocytoplasmic transport, the endoplasmic reticulum (ER), proteasome, RNA-processing pathways, and chromatin modification. One modifier, TMX2, modulated the ER-stress signature elicited by C9ORF72 DPRs in neurons and improved survival of human induced motor neurons from patients with C9ORF72 ALS. Together, our results demonstrate the promise of CRISPR-Cas9 screens in defining mechanisms of neurodegenerative diseases

    Abnormal Expression Of Homeobox Genes And Transthyretin In C9Orf72 Expansion Carriers

    Objective: We performed a genome-wide brain expression study to reveal the underpinnings of diseases linked to a repeat expansion in chromosome 9 open reading frame 72 (C9ORF72). Methods: The genome-wide expression profile was investigated in brain tissue obtained from C9ORF72 expansion carriers (n = 32), patients without this expansion (n = 30), and controls (n = 20). Using quantitative real-time PCR, findings were confirmed in our entire pathologic cohort of expansion carriers (n = 56) as well as nonexpansion carriers (n = 31) and controls (n = 20). Results: Our findings were most profound in the cerebellum, where we identified 40 differentially expressed genes, when comparing expansion carriers to patients without this expansion, including 22 genes that have a homeobox (e.g., HOX genes) and/or are located within the HOX gene cluster (top hit: homeobox A5 [HOXA5]). In addition to the upregulation of multiple homeobox genes that play a vital role in neuronal development, we noticed an upregulation of transthyretin (TTR), an extracellular protein that is thought to be involved in neuroprotection. Pathway analysis aligned with these findings and revealed enrichment for gene ontology processes involved in (anatomic) development (e.g., organ morphogenesis). Additional analyses uncovered that HOXA5 and TTR levels are associated with C9ORF72 variant 2 levels as well as with intron-containing transcript levels, and thus, disease-related changes in those transcripts may have triggered the upregulation of HOXA5 and TTR. Conclusions: In conclusion, our identification of genes involved in developmental processes and neuroprotection sheds light on potential compensatory mechanisms influencing the occurrence, presentation, and/or progression of C9ORF72-related diseases