
    Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes

    Abstract. Background: Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes. Results: We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I σ70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns, consisting of an UP-element, required for interaction with the α subunit, and then optimally separated patterns of -35 and -10 boxes, required for interaction with the σ70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions. Conclusion: The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. This computational tool opens new prospects for investigating global gene expression and individual strong promoters in bacteria of medical and/or economic significance.
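The consecutive three-pattern match described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hexamers `TTGACA` and `TATAAT` and the spacer of about 17 bp are the textbook E. coli σ70 consensus values, while the UP-element window length, the A+T threshold, the mismatch tolerance, and all function names are hypothetical choices made for this example.

```python
def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def at_fraction(seq):
    """Fraction of A/T bases in seq (0.0 for an empty string)."""
    return sum(base in "AT" for base in seq) / len(seq) if seq else 0.0


def find_triads(seq, box35="TTGACA", box10="TATAAT", spacers=(16, 17, 18),
                max_mismatch=1, up_len=20, up_min_at=0.8):
    """Return (up_start, box35_start, box10_start) for every triad hit.

    All thresholds are assumptions for illustration, not values from the paper.
    """
    seq = seq.upper()
    hits = []
    # Start at up_len so a full UP-element window fits upstream of the -35 box.
    for i in range(up_len, len(seq) - len(box35) + 1):
        # Pattern 2: -35 box, allowing a small number of mismatches.
        if mismatches(seq[i:i + len(box35)], box35) > max_mismatch:
            continue
        # Pattern 1: A+T-rich window immediately upstream (stand-in for the UP-element).
        if at_fraction(seq[i - up_len:i]) < up_min_at:
            continue
        # Pattern 3: -10 box at one of the allowed spacer distances.
        for gap in spacers:
            j = i + len(box35) + gap
            window = seq[j:j + len(box10)]
            if len(window) == len(box10) and mismatches(window, box10) <= max_mismatch:
                hits.append((i - up_len, i, j))
    return hits
```

A call such as `find_triads("A" * 10 + "T" * 10 + "TTGACA" + "G" * 17 + "TATAAT" + "GGGG")` reports one candidate; replacing the A+T-rich prefix with a G-only prefix removes it, which mirrors the dependence on A+T content noted in the abstract.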

    PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes

    Abstract. Background: As more and more genomes are being sequenced, an overview of their genomic features and annotation of their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings: Relative stability of DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of the genome GC-content, the relative stability based promoter prediction method has already been proven to be robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase either by specifying the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest) based on the difference between its average free energy and that of the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome and various DNA sequence dependent structural properties (average free energy, curvature and bendability) in the vicinity of all annotated translation start sites (TLS). Conclusion: PromBase is a database, which contains predicted promoter regions and detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics study and help the experimentalist to rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible for academic and non-academic users via the worldwide web at http://nucleix.mbu.iisc.ernet.in/prombase/.
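The free-energy difference that underlies the reliability classes can be illustrated with a small sketch. It is not PromBase's code: the dinucleotide energies below are approximate nearest-neighbour values of the kind found in unified DNA thermodynamic tables and should be treated as placeholders, and the window sizes and function names are hypothetical.

```python
# Approximate nearest-neighbour free energies in kcal/mol (illustrative
# placeholders; not necessarily the parameter set used by PromBase).
NN_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def mean_free_energy(seq):
    """Average free energy per dinucleotide step (sequence must be A/C/G/T only)."""
    seq = seq.upper()
    steps = [NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1)]
    return sum(steps) / len(steps)


def stability_difference(seq, position, upstream=80, downstream=100):
    """Mean free energy upstream of `position` minus the mean downstream.

    A positive value means the upstream window is less stable than the
    downstream window. Window sizes are assumptions for this example.
    """
    up = seq[max(0, position - upstream):position]
    down = seq[position:position + downstream]
    return mean_free_energy(up) - mean_free_energy(down)
```

Binning this difference into classes such as low to highest would require thresholds that the abstract does not give, so the sketch stops at the raw difference.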

    PromoterPredict: sequence-based modelling of Escherichia coli σ70 promoter strength yields logarithmic dependence between promoter strength and sequence

    We present PromoterPredict, a dynamic multiple regression approach to predict the strength of Escherichia coli promoters binding the σ70 factor of RNA polymerase. σ70 promoters are ubiquitously used in recombinant DNA technology, but characterizing their strength is demanding in terms of both time and money. We parsed a comprehensive database of bacterial promoters for the −35 and −10 hexamer regions of σ70-binding promoters and used these sequences to construct the respective position weight matrices (PWM). Next, we used a well-characterized set of promoters to train a multivariate linear regression model and learn the mapping between PWM scores of the −35 and −10 hexamers and the promoter strength. We found that the log of the promoter strength is significantly linearly associated with a weighted sum of the −10 and −35 sequence profile scores. We applied our model to 100 sets of 100 randomly generated promoter sequences to generate a sampling distribution of mean strengths of random promoter sequences and obtained a mean of 6E-4 ± 1E-7. Our model was further validated by cross-validation and on independent datasets of characterized promoters. PromoterPredict accepts −10 and −35 hexamer sequences and returns the predicted promoter strength. It is capable of dynamic learning from user-supplied data to refine the model construction and yield more robust estimates of promoter strength. PromoterPredict is available as both a web service (https://promoterpredict.com) and standalone tool (https://github.com/PromoterPredict). Our work presents an intuitive generalization applicable to modelling the strength of other promoter classes.
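The modelling idea in this abstract (PWM scores of the −35 and −10 hexamers feeding a linear model of log strength) can be sketched as below. This is an illustration under stated assumptions rather than the PromoterPredict code: the pseudocount, the uniform background, the use of the natural logarithm, and the function names are all choices made for this example, and any coefficients passed to `predict_strength` would have to come from a fitted model.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in "ACGT"})
    return pwm


def pwm_score(pwm, hexamer):
    """Sum the per-position log-odds scores for one candidate hexamer."""
    return sum(column[base] for column, base in zip(pwm, hexamer.upper()))


def predict_strength(score35, score10, b0, b35, b10):
    """Model form: log(strength) = b0 + b35 * score35 + b10 * score10.

    The coefficients are supplied by the caller; none are taken from the paper.
    """
    return math.exp(b0 + b35 * score35 + b10 * score10)
```

With a matrix built from a few hexamers close to `TTGACA`, the consensus itself scores higher than an unrelated hexamer such as `GGGGGG`, and with positive coefficients that difference translates into a higher predicted strength.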

    Characterization of labrenzin biosynthesis in marine alphaproteobacterium Labrenzia sp. PHM005

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología. Date of defense: 02-03-202

    Characterization of the Small RNA Transcriptome of the Marine Coccolithophorid, Emiliania huxleyi

    Small RNAs (smRNAs) control a variety of cellular processes by silencing target genes at the transcriptional or post-transcriptional level. While extensively studied in plants, relatively little is known about smRNAs and their targets in marine phytoplankton, such as Emiliania huxleyi (E. huxleyi). Deep sequencing was performed on smRNAs extracted at different time points as E. huxleyi cells transition from logarithmic to stationary phase growth in batch culture. Computational analyses predicted 18 E. huxleyi-specific miRNAs. The 18 miRNA candidates and their precursors vary in length (18-24 nt and 71-252 nt, respectively), genome copy number (3-1,459), and the number of genes targeted (2-107). Stem-loop real-time reverse transcriptase (RT) PCR was used to validate miRNA expression, which varied by nearly three orders of magnitude when growth slows and cells enter stationary phase. Stem-loop RT PCR was also used to examine the expression profiles of miRNA in calcifying and non-calcifying cultures, and a small subset was found to be differentially expressed when nutrients become limiting and calcification is enhanced. In addition to miRNAs, endogenous small RNAs such as ra-siRNAs, ta-siRNAs, nat-siRNAs, and piwiRNAs were predicted along with the machinery for the biogenesis and processing of siRNAs. This study is the first genome-wide investigation of smRNA pathways in E. huxleyi. Results provide new insights into the importance of smRNAs in regulating aspects of physiological growth and adaptation in marine phytoplankton and further challenge the notion that smRNAs evolved with multicellularity, expanding our perspective of these ancient regulatory pathways.

    Role of mobile genetic elements in the global network of bacterial horizontal gene transfer

    Many bacteria can exchange genetic material through horizontal gene transfer (HGT) mediated by plasmids and plasmid-borne transposable elements. One grave consequence of this exchange is the rapid spread of antibiotic resistance determinants among bacterial communities across the world. In this thesis, I make use of large datasets of publicly available bacterial genomes and various analytical approaches to improve our understanding of the nature and the impact of HGT at a global scale. In the first part, I study the population structure and dynamics of over 10,000 bacterial plasmids. By reconstructing and analysing a network of plasmids based on their shared k-mer content, I was able to sort them into biologically meaningful clusters. This network-based analysis allowed me to make further inferences about the global network of HGT and opened up the prospect of a natural and exhaustive classification framework for bacterial plasmids. The second part focuses on the global spread of blaNDM, an important antibiotic resistance gene. To this end, I compiled a dataset of over 6000 bacterial genomes harbouring this element and developed a novel computational approach to track structural variants surrounding blaNDM across bacterial genomes. This facilitated identification of the prevalent genomic contexts of blaNDM and reconstruction of the key mobile genetic elements and events that led to its global dissemination. Taken together, my results highlight transposable elements as the main drivers of HGT at broad phylogenetic and geographical scales, with plasmid exchange being much more spatially restricted due to adaptation to specific bacterial hosts and evolutionary pressures.
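A network of plasmids built from shared k-mer content, as described in the first part of this thesis abstract, might look roughly like the sketch below. The abstract does not say which similarity measure or clustering method was used, so the Jaccard index, the similarity threshold, the value of `k`, and the use of connected components as clusters are all assumptions made for this example, as are the function names.

```python
from itertools import combinations


def kmer_set(seq, k=4):
    """Set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_plasmids(plasmids, k=4, threshold=0.5):
    """Group plasmids (dict of name -> sequence) into connected components.

    Two plasmids are linked when the Jaccard index of their k-mer sets
    reaches `threshold`; both parameters are hypothetical.
    """
    kmers = {name: kmer_set(seq, k) for name, seq in plasmids.items()}
    neighbours = {name: set() for name in plasmids}
    for a, b in combinations(plasmids, 2):
        if jaccard(kmers[a], kmers[b]) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for name in plasmids:
        if name in seen:
            continue
        # Depth-first traversal collects everything reachable from `name`.
        stack, component = [name], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

At the scale of more than 10,000 plasmids, an all-against-all comparison like this one would likely be replaced by sketching or indexing methods; the pairwise loop is kept here only for clarity.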

    Synthesis of Biological and Mathematical Methods for Gene Network Control

    Synthetic biology is an emerging field which melds genetics, molecular biology, network theory, and mathematical systems to understand, build, and predict gene network behavior. As an engineering discipline, developing a mathematical understanding of the genetic circuits being studied is of fundamental importance. In this dissertation, mathematical concepts for understanding, predicting, and controlling gene transcriptional networks are presented and applied to two synthetic gene network contexts. First, this engineering approach is used to improve the function of guide ribonucleic acid (gRNA)-targeted, dCas9-regulated transcriptional cascades through analysis and targeted modification of the RNA transcript. In so doing, a fluorescent guide RNA (fgRNA) is developed to more clearly observe gRNA dynamics and aid design. It is shown that through careful optimization, RNA Polymerase II (Pol II) driven gRNA transcripts can be strong enough to exhibit measurable cascading behavior, previously only shown in RNA Polymerase III (Pol III) circuits. Second, inherent gene expression noise is used to achieve precise fractional differentiation of a population. Mathematical methods are employed to predict and understand the observed behavior, and metrics for analyzing and quantifying similar differentiation kinetics are presented. Through careful mathematical analysis and simulation, coupled with experimental data, two methods for achieving ratio control are presented, with the optimal schema for any application being dependent on the noisiness of the system under study. Together, these studies push the boundaries of gene network control, with potential applications in stem cell differentiation, therapeutics, and bio-production. Dissertation/Thesis. Doctoral Dissertation, Biomedical Engineering, 201

    Breakdown of keratin-laden biomass waste by the thermophilic bacterium Fervidobacterium pennivorans strain T

    Developing a more sustainable agro-industry has become a necessity in light of the current environmental crisis. Biocatalysts are already adopted in many industrial applications and have quickly optimized, and in some cases replaced, existing biochemical reactions within the modern agro-industry. Extremozymes, in particular, are valuable tools for processes requiring harsh industrial conditions where, for example, increased temperature may be beneficial for the bioavailability and solubility of organic compounds as well as for improved degradation of substrates. In this regard, alternatives to landfill disposal or incineration of keratinous materials such as feathers, wool, hides, and hair are emerging, and efforts to exploit thermostable keratinolytic biocatalysts have been made. Nonetheless, keratin degradation remains a complex and poorly understood process, which limits the current toolbox of useful enzymes and organisms needed to meet all demands. In this study, a newly isolated strain of an anaerobic, thermophilic microorganism belonging to the Thermotogae phylum, Fervidobacterium pennivorans strain T, was assessed for its capability of degrading native chicken feathers. By following a multiomics approach, its proteolytic system was explored in the attempt to isolate new keratinase candidates. First, the physiology of F. pennivorans strain T was further investigated in batch cultures and the first growth curve of an organism of this species was described, showing a generation time of 150 minutes and a long stationary phase. Then, the complete genome of the organism was sequenced and analysed, revealing interesting molecular features, such as inverted genomic blocks, when compared to its most closely related organisms: F. pennivorans DSM9078T and F. islandicum AW-1. The strain T genome was slightly shorter (2,002,515 base pairs) and had ANI values of 97.65% and 80.90% to the compared organisms, respectively, but the same number of predicted protease-encoding genes (55) was found by gene mining analysis. Next, feather degradation by the organism was up-scaled using a bioreactor to further evaluate its potential in industrial applications, and cells were sampled for transcriptomics purposes. F. pennivorans strain T performed only moderately well in the fermenter, and RNA extraction was not successful. From secretomics analysis of growing cultures, an extracellular serine protease named Peg_1025 was identified, showing high sequence conservation with the subtilisin-type proteases, especially with subtilisin Ak1 from Geobacillus stearothermophilus strain AK1. By multiple sequence alignment, the catalytic triad His, Asp, Ser, as well as a signal peptide and a propeptide domain, were predicted. Three-dimensional structural modelling using subtilisin Ak1 as template showed Peg_1025 to possess several insertions of unknown function compared to subtilisin Ak1, only one conserved Ca2+ binding site, and no disulphide bond in the active cleft. Nonetheless, important structural motifs remained conserved. The enzyme was successfully expressed in E. coli using N- and C-terminal His-tags, and soluble proteins were active at 70°C in proteolytic activity assays that used casein as substrate. Phylogenetic analyses revealed that Peg_1025 belongs to a distinct clade of Thermotogae peptidases separated from fervidolysin and Ak1, and as such, it represents the first characterized member of this phylogenetic group. Although the specific role of the serine protease in feather degradation remains unclear, the general results from this study confirm that F. pennivorans strain T possesses a complex machinery with keratinolytic power. The biology of this extremophile remains an intriguing field of exploration, further encouraged by its biotechnological potential that is still left to unfold. Master's Thesis in Biology. BIO399. MAMN-BI

    Study of the complete genome sequence of Streptomyces scabies (or scabiei) 87.22

    A study of the complete genome sequence of Streptomyces scabies 87.22, a common causative agent of scab disease of tubers including potato (Solanum tuberosum), is described. This work includes annotation of the genome and in-depth description of gene clusters likely to encode biosynthetic pathways for complex natural products and not also found in either “Streptomyces coelicolor” A3(2) or Streptomyces avermitilis MA-4680. Twenty-eight gene clusters were identified as likely to encode enzymes for the biosynthesis of complex natural products. Substances predicted by this work, not previously known to be made by S. scabies 87.22, were confirmed by collaborators as products - desferrioxamines, germicidins, and hopene. Of the clusters identified, fourteen gene clusters are not conserved in the other two streptomycete genome sequences for which comparisons have been undertaken. The Streptomyces genus is a reservoir of producer organisms from which many complex natural products of therapeutic importance have been isolated. These findings suggest that the cargo of cryptic and silent gene clusters amongst other members of this genus may add significantly to previous estimates of undiscovered bioactive natural products. Methods developed in this work could enable other researchers to rapidly identify gene clusters likely to encode enzymes involved in biosynthesis of complex natural products from complete genome sequences. De-replication is a problem for approaches to drug discovery based on activity screening and isolation of wild producer organisms. Computational methods in this work allow rapid de-replication of gene clusters following sequencing which may lead to discovery of many new natural products with therapeutic benefit. Sequences predicted to be involved in scab disease pathogenicity are not found in only one ‘pathogenicity island’ location as expected, but at several loci. 
Two possible mechanisms that could be involved in regulating pathogenicity traits were identified from sequence data: an MbtH-like protein family and an iron box sequence likely to be triggered in response to low-iron conditions.