90 research outputs found

    Design of Machine Learning Models for the Prediction of Transcription Factor Binding Regions in Bacterial DNA

    Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021. [Abstract] Transcription factors (TFs) are proteins that regulate the expression of genes by binding to their promoter regions. There is great interest in understanding which regions of an organism's DNA sequence TFs will bind to, and the possible genetic implications of that binding. Occasionally, the sequence patterns (motifs) that a TF binds are not well defined. In this work, machine learning (ML) models were applied to TF binding data from ChIP-seq experiments. The objective was to detect patterns in TF binding regions involving structural (DNAShapeR) and compositional (k-mer) characteristics of the DNA sequence. After applying random forest and Glmnet ML techniques with both internal and external validation, two types of generated descriptors (HelT and tetramers) were found to perform significantly better than the others in prediction, achieving values above 90%. This work has received financial support from the Xunta de Galicia and the European Union (European Social Fund, ESF). This project was also supported by the General Directorate of Culture, Education and University Management of Xunta de Galicia (Ref. ED431G/01, ED431D 2017/16).
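
The compositional descriptors mentioned above (k-mers, with tetramers among the best performers) can be illustrated with a minimal sketch that turns a DNA sequence into a 256-dimensional tetramer frequency vector, the kind of feature vector a random forest or Glmnet model could take as input. The function name and the choice to normalize counts to relative frequencies are assumptions for illustration; the abstract does not say exactly how the authors encoded k-mers.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 4) -> dict[str, float]:
    """Return the relative frequency of every possible DNA k-mer in seq.

    Hypothetical helper: one plausible k-mer encoding, not the authors' code.
    """
    seq = seq.upper()
    # All 4**k possible k-mers, in a fixed order so feature vectors line up.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other ambiguity codes are skipped
            counts[word] += 1
            total += 1
    return {w: (c / total if total else 0.0) for w, c in counts.items()}


# Usage: one feature vector per sequence; stacking these gives a model's input matrix.
freqs = kmer_frequencies("ACGTACGTAA")
```

Fixing the k-mer order (via `itertools.product`) matters more than the counting itself: every sequence must map to the same 256 columns for the vectors to be comparable across samples.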

    Comparative analysis of Ralstonia solanacearum methylomes

    Ralstonia solanacearum is an important soil-borne plant pathogen with broad geographical distribution and the ability to cause wilt disease in many agriculturally important crops. Genome sequencing of multiple R. solanacearum strains has identified both unique and shared genetic traits influencing their evolution and ability to colonize plant hosts. Previous research has shown that DNA methylation can drive speciation and modulate virulence in bacteria, but the impact of epigenetic modifications on the diversification and pathogenesis of R. solanacearum is unknown. Sequencing of R. solanacearum strains GMI1000 and UY031 using Single Molecule Real-Time technology allowed us to perform a comparative analysis of R. solanacearum methylomes. Our analysis identified a novel methylation motif associated with a DNA methylase that is conserved in all complete Ralstonia spp. genomes and across the Burkholderiaceae, as well as a methylation motif associated with a phage-borne methylase unique to R. solanacearum UY031. Comparative analysis of the conserved methylation motif revealed that it is most prevalent in gene promoter regions, where it displays a high degree of conservation detectable through phylogenetic footprinting. Analysis of hyper- and hypo-methylated loci identified several genes involved in global and virulence regulatory functions whose expression may be modulated by DNA methylation. Analysis of genome-wide modification patterns identified a significant correlation between DNA modification and transposase genes in R. solanacearum UY031, driven by the presence of a high copy number of ISrso3 insertion sequences in this genome and pointing to a novel mechanism for regulation of transposition. These results set a firm foundation for experimental investigations into the role of DNA methylation in R. solanacearum evolution and its adaptation to different plants.
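
Saying that a motif is "most prevalent in gene promoter regions" amounts to comparing motif density across region classes. A minimal sketch of that comparison, assuming the caller already has promoter and non-promoter sequences extracted, counts IUPAC-coded motif hits on both strands and normalizes per kilobase. The abstract does not give the novel motif's sequence, so `GATC` and `GANTC` below are placeholders only, and all function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    return motif.upper().translate(COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) motif hits on both strands of seq."""
    seq = seq.upper()
    # A set avoids double-counting palindromic motifs such as GATC.
    patterns = {motif.upper(), reverse_complement(motif)}
    total = 0
    for p in patterns:
        regex = "".join(IUPAC[b] for b in p)
        # The zero-width lookahead lets overlapping matches be counted.
        total += len(re.findall(f"(?=({regex}))", seq))
    return total


def hits_per_kb(regions: list[str], motif: str) -> float:
    """Motif density (hits per 1,000 bp) over a set of regions."""
    length = sum(len(r) for r in regions)
    hits = sum(count_motif(r, motif) for r in regions)
    return 1000 * hits / length if length else 0.0
```

Comparing `hits_per_kb(promoters, motif)` with `hits_per_kb(other_regions, motif)` gives the raw enrichment; a real analysis would also need a background model or permutation test before calling the difference significant.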

    RegPrecise: a database of curated genomic inferences of transcriptional regulatory interactions in prokaryotes

    The RegPrecise database (http://regprecise.lbl.gov) was developed for capturing, visualizing and analyzing predicted transcription factor regulons in prokaryotes that were reconstructed and manually curated using the comparative genomic approach. A significant number of high-quality inferences of transcriptional regulatory interactions have already been accumulated for diverse taxonomic groups of bacteria. The reconstructed regulons include transcription factors, their cognate DNA motifs and the regulated genes/operons linked to the candidate transcription factor binding sites. RegPrecise allows browsing the regulon collections for: (i) conservation of DNA binding sites and regulated genes for a particular regulon across diverse taxonomic lineages; (ii) sets of regulons for a family of transcription factors; (iii) the repertoire of regulons in a particular taxonomic group of species; (iv) regulons associated with a metabolic pathway or a biological process in various genomes. The initial release of the database includes ∼11 500 candidate binding sites for ∼400 orthologous groups of transcription factors from over 350 prokaryotic genomes. The majority of these data are represented by genome-wide regulon reconstructions in the Shewanella and Streptococcus genera and a large-scale prediction of regulons for the LacI family of transcription factors. Another section of the database presents the results of accurate regulon propagation to closely related genomes.
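
Candidate binding sites linked to a cognate DNA motif are commonly found by scoring sequence windows against a position weight matrix (PWM). The sketch below shows that generic technique: it builds a base-2 log-odds PWM from aligned sites against a uniform background and scans the forward strand for windows above a threshold. It is an assumption-labeled illustration, not RegPrecise's actual pipeline or API; the function names, pseudocount, toy sites and threshold are all hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Log-odds PWM (base 2) from aligned, equal-length sites vs. a uniform background."""
    denom = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i].upper() for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / denom) / 0.25)
            for b in BASES
        })
    return pwm


def scan(seq: str, pwm: list[dict[str, float]], threshold: float) -> list[tuple[int, float]]:
    """Return (start, score) for each forward-strand window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with ambiguous bases
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Usage with toy sites (hypothetical, not RegPrecise data).
pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
hits = scan("TTACGTTT", pwm, threshold=0.0)
```

The pseudocount keeps unseen bases from producing `log2(0)`; the threshold is a tunable trade-off between sensitivity and false positives, and a full scan would also score the reverse complement.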

    Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO).

    Experimental data about gene functions curated from the primary literature are of enormous value to research scientists seeking to understand biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. The unprecedented expansion of the scientific literature and the validation of predicted proteins have increased both the value of these data and the challenge of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. Through the Community Assessment of Community Annotation with Ontologies (CACAO), an intercollegiate competition judged by experienced biocurators, we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model for promoting the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of the primary literature and to the acquisition of marketable skills.

    A reexamination of information theory-based methods for DNA-binding site identification

    Abstract. Background: Searching for transcription factor binding sites in genome sequences is still an open problem in bioinformatics. Despite substantial progress, search methods based on information theory remain a standard in the field, even though the full validity of their underlying assumptions has only been tested in artificial settings. Here we use newly available data on transcription factors from different bacterial genomes to make a more thorough assessment of information theory-based search methods. Results: Our results reveal that conventional benchmarking against artificial sequence data frequently leads to overestimation of search efficiency. In addition, we find that sequence information by itself is often inadequate in real genomes and must therefore be complemented by other cues, such as curvature. Furthermore, results on skewed genomes show that methods integrating skew information, such as Relative Entropy, are not effective because their assumptions may not hold in real genomes. The evidence suggests that binding sites tend to evolve towards genomic skew, rather than against it, and to maintain their information content through increased conservation. Based on these results, we identify several misconceptions about information theory as applied to binding sites, such as negative entropy, and we propose a revised paradigm to explain the observed results. Conclusion: We conclude that, among information theory-based methods, the most unassuming search methods perform, on average, better than any other alternatives, since heuristic corrections to these methods are prone to fail when working on real data. A reexamination of information content in binding sites reveals that information content is a compound measure of search and binding affinity requirements, a fact that has important repercussions for our understanding of binding site evolution.
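
The two measures contrasted above can be written down directly. The "unassuming" measure sums, over positions of aligned binding sites, 2 − H_i bits (H_i being the Shannon entropy of the base distribution at position i), which implicitly assumes a uniform background; Relative Entropy instead sums f_b·log2(f_b/q_b) against genomic base frequencies q_b, so it rewards sites that run against a skewed genome. The sketch below is a minimal illustration under stated assumptions: it applies no small-sample correction, and the toy sites and the AT-rich background frequencies are hypothetical.

```python
import math
from collections import Counter


def column_freqs(sites: list[str]) -> list[dict[str, float]]:
    """Observed base frequencies at each position of aligned, equal-length sites."""
    n = len(sites)
    cols = []
    for i in range(len(sites[0])):
        counts = Counter(s[i].upper() for s in sites)
        cols.append({b: counts[b] / n for b in "ACGT"})
    return cols


def information_content(sites: list[str]) -> float:
    """Sum over positions of (2 - H_i) bits, i.e. a uniform-background measure."""
    total = 0.0
    for f in column_freqs(sites):
        h = -sum(p * math.log2(p) for p in f.values() if p > 0)
        total += 2.0 - h
    return total


def relative_entropy(sites: list[str], background: dict[str, float]) -> float:
    """Sum over positions of sum_b f_b * log2(f_b / q_b) for a background q."""
    return sum(
        p * math.log2(p / background[b])
        for f in column_freqs(sites)
        for b, p in f.items()
        if p > 0
    )


sites = ["ACGT", "AGGT", "ACCT", "TCGA"]  # toy aligned sites
uniform = {b: 0.25 for b in "ACGT"}
at_rich = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}  # hypothetical skewed genome
```

With a uniform background the two functions agree exactly, since Σ f_b·log2(f_b/0.25) = 2 − H_i. With `at_rich`, C and G positions gain extra bits, which is the kind of skew correction the abstract reports as unreliable on real genomes.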

    Evidence for Induction of Integron-Based Antibiotic Resistance by the SOS Response in a Clinical Setting

    Bacterial resistance to β-lactams may rely on acquired β-lactamases encoded by class 1 integron-borne genes. Rearrangement of integron cassette arrays is mediated by the integrase IntI1. It has been previously established that integrase expression can be activated by the SOS response in vitro, leading to speculation that this is an important clinical mechanism of acquiring resistance. Here we report the first in vivo evidence of the impact of the SOS response, activated by the antibiotic treatment given to a patient, on the development of resistance. We identified a new mechanism of modulation of antibiotic resistance in integrons, based on the insertion of a genetic element, the gcuF1 cassette, upstream of the integron-borne cassette blaOXA-28, which encodes an extended-spectrum β-lactamase. This insertion creates the fused protein GCUF1-OXA-28 and modulates the transcription, translation, and secretion of the β-lactamase in a Pseudomonas aeruginosa isolate (S-Pae) susceptible to the third-generation cephalosporin ceftazidime. We found that metronidazole, a non-antipseudomonal antibiotic given to the first patient infected with S-Pae, triggered the SOS response that subsequently activated expression of the integrase IntI1. This resulted in rearrangement of the integron gene cassette array through excision of the gcuF1 cassette, and in full expression of the β-lactamase in an isolate (R-Pae) that was highly resistant to ceftazidime and further spread to other patients within our hospital. Our results demonstrate that in human hosts, the antibiotic-induced SOS response in pathogens could play a pivotal role in the adaptation process of bacteria.

    Evolution of a Bacterial Regulon Controlling Virulence and Mg2+ Homeostasis

    Related organisms typically rely on orthologous regulatory proteins to respond to a given signal. However, the extent to which (or even whether) the targets of shared regulatory proteins are maintained across species has remained largely unknown. This question is of particular significance in bacteria due to the widespread effects of horizontal gene transfer. Here, we address this question by investigating the regulons controlled by the DNA-binding PhoP protein, which governs virulence and Mg2+ homeostasis in several bacterial species. We establish that the ancestral PhoP protein directs largely different gene sets in ten analyzed species of the family Enterobacteriaceae, reflecting both regulation of species-specific targets and transcriptional rewiring of shared genes. The two targets directly activated by PhoP in all ten species (the most distant of which diverged >200 million years ago), which also code for the most conserved proteins, are the phoPQ operon itself and the lipoprotein-encoding slyB gene, which decreases PhoP protein activity. In most analyzed species, the Mg2+-responsive PhoP protein dictates expression of Mg2+ transporters and of enzymes that modify Mg2+-binding sites in the cell envelope. In contrast to the core PhoP regulon, which determines the amount of active PhoP and copes with low Mg2+ stress, the variable members of the regulon contribute species-specific traits, a property shared with regulons controlled by dissimilar regulatory proteins and responding to different signals.
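
The distinction between a core regulon and its variable members can be made concrete with set operations over per-species target lists: the core is the intersection across all species, and each species' variable members are its targets outside that core. The sketch below is a simplification that treats regulation as present or absent. The species keys and `gene_*` names are hypothetical toy data; only phoPQ and slyB come from the abstract.

```python
def split_regulon(
    targets_by_species: dict[str, set[str]],
) -> tuple[set[str], dict[str, set[str]]]:
    """Split per-species target sets into a shared core and species-specific variable members."""
    if not targets_by_species:
        raise ValueError("need at least one species")
    # Core regulon: targets regulated in every species analyzed.
    core = set.intersection(*targets_by_species.values())
    # Variable members: each species' targets outside the core.
    variable = {sp: genes - core for sp, genes in targets_by_species.items()}
    return core, variable


# Toy membership sets (hypothetical); only phoPQ and slyB come from the abstract.
targets = {
    "species_1": {"phoPQ", "slyB", "gene_a", "gene_b"},
    "species_2": {"phoPQ", "slyB", "gene_b", "gene_c"},
    "species_3": {"phoPQ", "slyB", "gene_d"},
}
core, variable = split_regulon(targets)
```

A strict intersection is the simplest definition of "core"; it does not capture transcriptional rewiring, where an orthologous gene remains a target but its mode of regulation changes between species.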

    Key Role of Mfd in the Development of Fluoroquinolone Resistance in Campylobacter jejuni

    Campylobacter jejuni is a major food-borne pathogen and a common causative agent of human enterocolitis. Fluoroquinolones are a key class of antibiotics prescribed for the clinical treatment of enteric infections, including campylobacteriosis, but fluoroquinolone-resistant Campylobacter readily emerges under antibiotic selection pressure. To understand the mechanisms involved in the development of fluoroquinolone-resistant Campylobacter, we compared the gene expression profiles of C. jejuni in the presence and absence of ciprofloxacin using DNA microarrays. Our analysis revealed that multiple genes showed significant changes in expression in the presence of a suprainhibitory concentration of ciprofloxacin. Most importantly, ciprofloxacin induced the expression of mfd, which encodes a transcription-repair coupling factor involved in strand-specific DNA repair. Mutation of the mfd gene resulted in an approximately 100-fold reduction in the rate of spontaneous mutation to ciprofloxacin resistance, while overexpression of mfd elevated the mutation frequency. In addition, loss of mfd in C. jejuni significantly reduced the development of fluoroquinolone-resistant Campylobacter in culture media and in chickens treated with fluoroquinolones. These findings indicate that Mfd is important for the development of fluoroquinolone resistance in Campylobacter, reveal a previously unrecognized function of Mfd in promoting mutation frequencies, and identify a potential molecular target for reducing the emergence of fluoroquinolone-resistant Campylobacter.
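
A fold reduction like the one reported for the mfd mutant is a ratio of two mutation frequencies. One common way to estimate each frequency is to divide resistant colonies on selective plates by total viable colonies. The sketch below shows only that arithmetic. The counts are hypothetical toy values chosen to give a round number, not data from the study, and a true mutation rate would require fluctuation analysis, which this sketch does not attempt.

```python
def mutation_frequency(resistant_cfu: int, total_cfu: int) -> float:
    """Resistant colonies on selective plates divided by total viable colonies."""
    if total_cfu <= 0:
        raise ValueError("total_cfu must be positive")
    return resistant_cfu / total_cfu


# Toy counts (hypothetical, not data from the study).
wild_type = mutation_frequency(resistant_cfu=50, total_cfu=10**9)
mfd_mutant = mutation_frequency(resistant_cfu=5, total_cfu=10**10)
fold_reduction = wild_type / mfd_mutant
print(f"{wild_type:.1e} vs {mfd_mutant:.1e}: {fold_reduction:.0f}-fold reduction")
```

Because the two frequencies can differ by orders of magnitude, they are usually reported in scientific notation and compared as a ratio rather than a difference.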
