
    A Hidden Markov Model for identifying essential and growth-defect regions in bacterial genomes from transposon insertion sequencing data

    BACKGROUND: Knowledge of which genes are essential to the survival of an organism is critical to understanding the function of genes, and for the identification of potential drug targets for antimicrobial treatment. Previous statistical methods for assessing essentiality based on sequencing of transposon libraries have usually limited their assessment to strict 'essential' or 'non-essential' categories. However, this binary view of essentiality does not accurately represent the more nuanced ways the growth of an organism might be affected by the disruption of its genes. In addition, these methods often limit their analysis to open-reading frames. We propose a novel method for analyzing sequence data from transposon mutant libraries using a Hidden Markov Model (HMM), along with formulas to adapt the parameters of the model to different datasets for robustness. This approach allows for the clustering of insertion sites into distinct regions of essentiality across the entire genome in a statistically rigorous manner, while also allowing for the detection of growth-defect and growth-advantage regions. RESULTS: We evaluate the performance of a 4-state HMM on a sequence dataset of M. tuberculosis transposon mutants. We also test the HMM on several synthetic datasets representing different levels of transposon insertion density and sequence coverage. We show that the HMM produces results that are highly correlated with previous assignments of essentiality for this organism. We also show that it detects growth-defect and growth-advantage genes previously shown to impair or enhance growth when disrupted. CONCLUSIONS: A 4-state HMM provides an improved way of analyzing Tn-seq data and assessing different levels of essentiality that enables not only the characterization of essential and non-essential genes, but also genes whose disruption leads to impairment (or enhancement) of growth.
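
The idea of labeling insertion sites with a 4-state HMM can be sketched as follows. This is a minimal illustration, not the authors' implementation: the state names (ES, GD, NE, GA), the geometric emission model, the per-state mean read counts, and the `stay` probability are all assumptions chosen for the example, and the abstract's formulas for adapting parameters to a dataset are not reproduced here.

```python
import math

# Hypothetical states: essential, growth-defect, non-essential, growth-advantage.
STATES = ["ES", "GD", "NE", "GA"]
# Hypothetical mean read counts per state (illustrative values only).
MEANS = {"ES": 0.01, "GD": 5.0, "NE": 50.0, "GA": 250.0}


def log_geometric(count, mean):
    """Log-probability of `count` under a geometric distribution on 0, 1, 2, ...
    with the given mean: P(k) = p * (1 - p)**k, where p = 1 / (1 + mean)."""
    p = 1.0 / (1.0 + mean)
    return math.log(p) + count * math.log(1.0 - p)


def viterbi(counts, stay=0.99):
    """Return the most likely state label for each site's read count."""
    n = len(STATES)
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (n - 1))

    def trans(i, j):
        return log_stay if i == j else log_switch

    # Uniform prior over states at the first site.
    score = [math.log(1.0 / n) + log_geometric(counts[0], MEANS[s]) for s in STATES]
    back = []  # back[t][j] = best previous state for state j at site t + 1
    for c in counts[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(STATES):
            best_i = max(range(n), key=lambda i: score[i] + trans(i, j))
            new_score.append(score[best_i] + trans(best_i, j) + log_geometric(c, MEANS[s]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[i] for i in path]


# A stretch of empty sites flanked by well-covered sites.
counts = [40, 60, 55, 30] * 5 + [0] * 30 + [45, 70, 20, 50] * 5
labels = viterbi(counts)
```

Because neighboring sites are tied together through the transition probabilities, the labels form contiguous regions rather than per-site calls, which is what lets this kind of model segment the whole genome and not only annotated open-reading frames.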

    Bayesian Analysis of Transposon Mutagenesis Data

    Determining which genes are essential for growth of a bacterial organism is an important question to answer as it is useful for the discovery of drugs that inhibit critical biological functions of a pathogen. To evaluate essentiality, biologists often use transposon mutagenesis to disrupt genomic regions within an organism, revealing which genes are able to withstand disruption and are therefore not required for growth. The development of next-generation sequencing technology augments transposon mutagenesis by providing high-resolution sequence data that identifies the exact location of transposon insertions in the genome. Although this high-resolution information has already been used to assess essentiality at a genome-wide scale, no formal statistical model has been developed capable of quantifying significance. This thesis presents a formal Bayesian framework for analyzing sequence information obtained from transposon mutagenesis experiments. Our method assesses the statistical significance of gaps in transposon coverage that are indicative of essential regions through a Gumbel distribution, and utilizes a Metropolis-Hastings sampling procedure to obtain posterior estimates of the probability of essentiality for each gene. We apply our method to libraries of M. tuberculosis transposon mutants, to identify genes essential for growth in vitro, and show concordance with previous essentiality results based on hybridization. Furthermore, we show how our method is capable of identifying essential domains within genes, by detecting significant sub-regions of open-reading frames unable to withstand disruption. We show that several genes involved in PG biosynthesis have essential domains
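
To make the gap-significance step concrete, here is a minimal sketch that scores the longest run of sites without insertions using the standard extreme-value approximation for runs in Bernoulli trials. It is an assumption that this matches the thesis's exact parameterization; the function names and the `p_empty` parameter are hypothetical, and the Metropolis-Hastings sampling of per-gene essentiality is not shown.

```python
import math


def gap_pvalue(run_length, n_sites, p_empty):
    """Approximate P(longest run of empty sites >= run_length).

    Uses the classical approximation
        P(longest run < r) ~ exp(-n * (1 - p) * p**r),
    where p is the probability that a site carries no insertion.
    """
    expected_runs = n_sites * (1.0 - p_empty) * p_empty ** run_length
    return -math.expm1(-expected_runs)  # 1 - exp(-x), stable for small x


def gumbel_parameters(n_sites, p_empty):
    """Location and scale of the Gumbel distribution implied by the formula above."""
    scale = 1.0 / math.log(1.0 / p_empty)
    location = math.log(n_sites * (1.0 - p_empty)) * scale
    return location, scale


# Example: 1000 sites, half of them empty on average.
short_gap = gap_pvalue(5, 1000, 0.5)   # short runs are expected by chance
long_gap = gap_pvalue(20, 1000, 0.5)   # a run of 20 empty sites is unusual
location, scale = gumbel_parameters(1000, 0.5)
```

With these inputs the Gumbel location is about 9 sites, so a run of 5 empty sites is unremarkable while a run of 20 is very unlikely to arise by chance, which is the sense in which a long gap indicates an essential region.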

    Chitosanase-based method for RNA isolation from cells transfected with chitosan/siRNA nanocomplexes for real-time RT-PCR in gene silencing

    Chitosan, a well known natural cationic polysaccharide, has been successfully implemented in vitro and in vivo as a nonviral delivery system for both plasmid DNA and siRNA. While using chitosan/siRNA polyplexes to knock down specific targets, we have underestimated the effect of nucleic acids binding to chitosan when extracting RNA for subsequent quantitative PCR evaluation of silencing. In vitro transfection using chitosan/siRNA-based polyplexes reveals a very poor recovery of total RNA especially when using low cell numbers in 96 well plates. Here, we describe a method that dramatically enhances RNA extraction from chitosan/siRNA-treated cells by using an enzymatic treatment with a type III chitosanase. We show that chitosanase treatment prior to RNA extraction greatly enhances the yield and the integrity of extracted RNA. This method will therefore eliminate the bias associated with lower RNA yield and integrity when quantifying gene silencing of chitosan-based systems using quantitative real time PCR

    Low molecular weight chitosan nanoparticulate system at low N:P ratio for nontoxic polynucleotide delivery

    Chitosan, a natural polymer, is a promising system for the therapeutic delivery of both plasmid DNA and synthetic small interfering RNA. Reports attempting to identify the optimal parameters of chitosan for synthetic small interfering RNA delivery were inconclusive with high molecular weight at high amine-to-phosphate (N:P) ratios apparently required for efficient transfection. Here we show, for the first time, that low molecular weight chitosan (LMW-CS) formulations at low N:P ratios are suitable for the in vitro delivery of small interfering RNA. LMW-CS nanoparticles at low N:P ratios were positively charged (ζ-potential ~20 mV) with an average size below 100 nm as demonstrated by dynamic light scattering and environmental scanning electron microscopy, respectively. Nanoparticles were spherical, a shape promoting decreased cytotoxicity and enhanced cellular uptake. Nanoparticle stability was effective for at least 20 hours at N:P ratios above two in a slightly acidic pH of 6.5. At a higher basic pH of 8, these nanoparticles were unravelled due to chitosan neutralization, exposing their polynucleotide cargo. Cellular uptake ranged from 50% to 95% in six different cell lines as measured by cytometry. Increasing chitosan molecular weight improved nanoparticle stability as well as the ability of nanoparticles to protect the oligonucleotide cargo from nucleases at supraphysiological concentrations. The highest knockdown efficiency was obtained with the specific formulation 92-10-5 that combines sufficient nuclease protection with effective intracellular release. This system attained >70% knockdown of the messenger RNA, similar to commercially available lipoplexes, without apparent cytotoxicity. Contrary to previous reports, our data demonstrate that LMW-CS at low N:P ratios are efficient and nontoxic polynucleotide delivery systems capable of transfecting a plethora of cell lines

    Statistical analysis of genetic interactions in Tn-Seq data

    Tn-Seq is an experimental method for probing the functions of genes through construction of complex random transposon insertion libraries and quantification of each mutant's abundance using next-generation sequencing. An important emerging application of Tn-Seq is for identifying genetic interactions, which involves comparing Tn mutant libraries generated in different genetic backgrounds (e.g. wild-type strain versus knockout strain). Several analytical methods have been proposed for analyzing Tn-Seq data to identify genetic interactions, including estimating relative fitness ratios and fitting a generalized linear model. However, these have limitations which necessitate an improved approach. We present a hierarchical Bayesian method for identifying genetic interactions through quantifying the statistical significance of changes in enrichment. The analysis involves a four-way comparison of insertion counts across datasets to identify transposon mutants that differentially affect bacterial fitness depending on genetic background. Our approach was applied to Tn-Seq libraries made in isogenic strains of Mycobacterium tuberculosis lacking three different genes of unknown function previously shown to be necessary for optimal fitness during infection. By analyzing the libraries subjected to selection in mice, we were able to distinguish several distinct classes of genetic interactions for each target gene that shed light on their functions and roles during infection.

    Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression

    BACKGROUND: Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions varies between two conditions. However, for large-scale experiments involving many conditions, a method is needed for identifying genes that exhibit significant variability in insertions across multiple conditions. RESULTS: In this paper, we introduce a novel statistical method for identifying genes with significant variability of insertion counts across multiple conditions based on Zero-Inflated Negative Binomial (ZINB) regression. Using likelihood ratio tests, we show that the ZINB distribution fits TnSeq data better than either ANOVA or a Negative Binomial (in a generalized linear model). We use ZINB regression to identify genes required for infection of M. tuberculosis H37Rv in C57BL/6 mice. We also use ZINB to perform an analysis of genes conditionally essential in H37Rv cultures exposed to multiple antibiotics. CONCLUSIONS: Our results show that ZINB not only identifies most of the genes found by pairwise resampling (and vastly outperforms ANOVA), but also identifies additional genes where variability is detectable only when the magnitudes of insertion counts are treated separately from local differences in saturation, as in the ZINB model.
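
The separation of count magnitude from saturation can be illustrated with the ZINB probability mass function itself. The sketch below is an assumption-laden illustration, not the paper's regression code: it uses the common mean/dispersion parameterization of the negative binomial, and the names `mu`, `r`, and `pi_zero` are chosen for this example.

```python
import math


def nb_log_pmf(k, mu, r):
    """Log-pmf of a negative binomial with mean `mu` and dispersion (size) `r`."""
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def zinb_log_pmf(k, mu, r, pi_zero):
    """Log-pmf of a zero-inflated negative binomial.

    With probability `pi_zero` a site is a structural zero (no insertion);
    otherwise its count is drawn from the negative binomial.
    """
    if k == 0:
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(nb_log_pmf(0, mu, r)))
    return math.log(1.0 - pi_zero) + nb_log_pmf(k, mu, r)


def zinb_log_likelihood(counts, mu, r, pi_zero):
    """Total log-likelihood of a list of insertion counts."""
    return sum(zinb_log_pmf(k, mu, r, pi_zero) for k in counts)


# Example: counts with many empty sites are better explained once zeros get their own term.
counts = [0, 0, 0, 0, 35, 60, 0, 48, 0, 72]
ll_zinb = zinb_log_likelihood(counts, mu=50.0, r=2.0, pi_zero=0.5)
ll_nb = zinb_log_likelihood(counts, mu=50.0, r=2.0, pi_zero=0.0)
```

In this parameterization `pi_zero` absorbs the fraction of empty sites while `mu` describes the size of the non-zero counts, so two conditions can differ in one without differing in the other.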

    The Mycobacterium tuberculosis transposon sequencing database (MtbTnDB): a large-scale guide to genetic conditional essentiality [preprint]

    Characterization of gene essentiality across different conditions is a useful approach for predicting gene function. Transposon sequencing (TnSeq) is a powerful means of generating genome-wide profiles of essentiality and has been used extensively in Mycobacterium tuberculosis (Mtb) genetic research. Over the past two decades, dozens of TnSeq screens have been published, yielding valuable insights into the biology of Mtb in vitro, inside macrophages, and in model host organisms. However, these Mtb TnSeq profiles are distributed across dozens of research papers within supplementary materials, which makes querying them cumbersome and assembling a complete and consistent synthesis of existing data challenging. Here, we address this problem by building a central repository of publicly available TnSeq screens performed in M. tuberculosis, which we call the Mtb transposon sequencing database (MtbTnDB). The MtbTnDB encompasses 64 published and unpublished TnSeq screens, and is standardized, open-access, and allows users easy access to data, visualizations, and functional predictions through an interactive web-app (www.mtbtndb.app). We also present evidence that (i) genes in the same genomic neighborhood tend to have similar TnSeq profiles, and (ii) clusters of genes with similar TnSeq profiles tend to be enriched for genes belonging to the same functional categories. Finally, we test and evaluate machine learning models trained on TnSeq profiles to guide functional annotation of orphan genes in Mtb. In addition to facilitating the exploration of conditional genetic essentiality in this important human pathogen via a centralized TnSeq data repository, the MtbTnDB will enable hypothesis generation and the extraction of meaningful patterns by facilitating the comparison of datasets across conditions. This will provide a basis for insights into the functional organization of Mtb genes as well as gene function prediction

    TRANSIT - A Software Tool for Himar1 TnSeq Analysis

    TnSeq has become a popular technique for determining the essentiality of genomic regions in bacterial organisms. Several methods have been developed to analyze the wealth of data that has been obtained through TnSeq experiments. We developed a tool for analyzing Himar1 TnSeq data called TRANSIT. TRANSIT provides a graphical interface to three different statistical methods for analyzing TnSeq data. These methods cover a variety of approaches capable of identifying essential genes in individual datasets as well as comparative analysis between conditions. We demonstrate the utility of this software by analyzing TnSeq datasets of M. tuberculosis grown on glycerol and cholesterol. We show that TRANSIT can be used to discover genes which have been previously implicated for growth on these carbon sources. TRANSIT is written in Python, and thus can be run on Windows, OSX and Linux platforms. The source code is distributed under the GNU GPL v3 license and can be obtained from the following GitHub repository: https://github.com/mad-lab/transit

    Cataclysmic Variables in the First Year of the Zwicky Transient Facility

    Using selection criteria based on amplitude, time, and color, we have identified 329 objects as known or candidate cataclysmic variables (CVs) during the first year of testing and operation of the Zwicky Transient Facility. Of these, 90 are previously confirmed CVs, 218 are strong candidates based on the shape and color of their light curves obtained during 3–562 days of observation, and the remaining 21 are possible CVs but with too few data points to be listed as good candidates. Almost half of the strong candidates are within 10 deg of the galactic plane, in contrast to most other large surveys that have avoided crowded fields. The available Gaia parallaxes are consistent with sampling the low mass transfer CVs, as predicted by population models. Our follow-up spectra have confirmed Balmer/helium emission lines in 27 objects, with four showing high-excitation He ii emission, including candidates for an AM CVn, a polar, and an intermediate polar. Our results demonstrate that a complete survey of the Galactic plane is needed to accomplish an accurate determination of the number of CVs existing in the Milky Way