80 research outputs found

    Riboswitch Detection Using Profile Hidden Markov Models

    Background: Riboswitches are a type of noncoding RNA that regulate gene expression by switching from one structural conformation to another on ligand binding. The various classes of riboswitches discovered so far are differentiated by the ligand, which on binding induces a conformational switch. Every class of riboswitch is characterized by an aptamer domain, which provides the site for ligand binding, and an expression platform that undergoes conformational change on ligand binding. The sequence and structure of the aptamer domain are highly conserved in riboswitches belonging to the same class. We propose a method for fast and accurate identification of riboswitches using profile Hidden Markov Models (pHMM). Our method exploits the high degree of sequence conservation that characterizes the aptamer domain. Results: Our method can detect riboswitches in genomic databases rapidly and accurately. Its sensitivity is comparable to the method based on the Covariance Model (CM). For six out of ten riboswitch classes, our method detects more than 99.5% of the candidates identified by the much slower CM method while being several hundred times faster. For three riboswitch classes, our method detects 97-99% of the candidates relative to the CM method. Our method works very well for those classes of riboswitches that are characterized by distinct and conserved sequence motifs. Conclusion: Riboswitches play a crucial role in controlling the expression of several prokaryotic genes involved in metabolism and transport processes. As more and more new classes of riboswitches are being discovered, it is important to understand the patterns of their intra- and inter-genomic distribution. Understanding such patterns will enable us to better understand the evolutionary history of these genetic regulatory elements. However, a complete picture of the distribution pattern of riboswitches will emerge only after accurate identification of riboswitches across genomes. We believe that the riboswitch detection method developed in this paper will aid in that process. The significant speed advantage of our pHMM-based approach over the CM-based method allows us to scan entire databases (rather than 5'UTRs only) in a relatively short period of time in order to accurately identify riboswitch candidates.

    RegRNA: an integrated web server for identifying regulatory RNA motifs and elements

    Numerous regulatory structural motifs have been identified as playing essential roles in transcriptional and post-transcriptional regulation of gene expression. RegRNA is an integrated web server for identifying the homologs of regulatory RNA motifs and elements against an input mRNA sequence. Both sequence homologs and structural homologs of regulatory RNA motifs can be recognized. The regulatory RNA motifs supported in RegRNA are categorized into several classes: (i) motifs in mRNA 5′-untranslated region (5′-UTR) and 3′-UTR; (ii) motifs involved in mRNA splicing; (iii) motifs involved in transcriptional regulation; (iv) riboswitches; (v) splicing donor/acceptor sites; (vi) inverted repeats; and (vii) miRNA target sites. The experimentally validated regulatory RNA motifs are extracted from literature survey and several regulatory RNA motif databases, such as UTRdb, TRANSFAC, alternative splicing database (ASD) and miRBase. A variety of computational programs are integrated for identifying the homologs of the regulatory RNA motifs. An intuitive user interface is designed to facilitate the comprehensive annotation of user-submitted mRNA sequences. The RegRNA web server is now available at

    Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome

    The central dogma of molecular biology states that the genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in the cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein coding information, and proteins are responsible for most of the important biological functions in all cells. In the meanwhile, the importance of RNAs has remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare. However, this view has experienced a dramatic change during the last decade, as systematic screening of various genomes identified myriads of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17]. The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], just to name a few. 
These days, it is even claimed that ncRNAs dominate the genomic output of the higher organisms such as mammals, and it is being suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is paid to ncRNAs, which have been neglected for a long time. Researchers began to realize that the vast majority of the genome that was regarded as “junk,” mainly because it was not well understood, may indeed hold the key for the best kept secrets in life, such as the mechanism of alternative splicing, the control of epigenetic variations and so forth [27]. The complete range and extent of the role of ncRNAs are not so obvious at this point, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47]

    RibEx: a web server for locating riboswitches and other conserved bacterial regulatory elements

    We present RibEx (riboswitch explorer), a web server capable of searching any sequence for known riboswitches as well as other predicted, but highly conserved, bacterial regulatory elements. It allows the visual inspection of the identified motifs in relation to attenuators and open reading frames (ORFs). The sequence of any ORF or regulatory element can be obtained with a click and submitted to NCBI's BLAST. Alternatively, the genome context of all other genes regulated by the same element can be explored with our genome context tool (GeConT). RibEx is available at

    Maximum expected accuracy structural neighbors of an RNA secondary structure

    BACKGROUND: Since RNA molecules regulate genes and control alternative splicing by allostery, it is important to develop algorithms to predict RNA conformational switches. Some tools, such as paRNAss, RNAshapes and RNAbor, can be used to predict potential conformational switches; nevertheless, no existing tool can detect general (i.e., not family-specific) entire riboswitches (both aptamer and expression platform) with accuracy. Thus, the development of additional algorithms to detect conformational switches seems important, especially since the difference in free energy between the two metastable secondary structures may be as large as 15-20 kcal/mol. It has recently emerged that RNA secondary structure can be more accurately predicted by computing the maximum expected accuracy (MEA) structure, rather than the minimum free energy (MFE) structure. RESULTS: Given an arbitrary RNA secondary structure S₀ for an RNA nucleotide sequence a = a₁, ..., aₙ, we say that another secondary structure S of a is a k-neighbor of S₀ if the base pair distance between S₀ and S is k. In this paper, we prove that the Boltzmann probability of all k-neighbors of the minimum free energy structure S₀ can be approximated with accuracy ε and confidence 1 - p, simultaneously for all 0 ≤ k < K, by a relative frequency count over N sampled structures, provided that N > N(ε,p,K) = Φ⁻¹(p/2K)²/4ε², where Φ(z) is the cumulative distribution function (CDF) for the standard normal distribution. We go on to describe the algorithm RNAborMEA, which for an arbitrary initial structure S₀ and for all values 0 ≤ k < K, computes the secondary structure MEA(k), having maximum expected accuracy over all k-neighbors of S₀. Computation time is O(n³ * K²), and memory requirements are O(n² * K). We analyze a sample TPP riboswitch, and apply our algorithm to the class of purine riboswitches.
CONCLUSIONS: The approximation of RNAbor by sampling, with rigorous bound on accuracy, together with the computation of maximum expected accuracy k-neighbors by RNAborMEA, provide additional tools toward conformational switch detection. Results from RNAborMEA are quite distinct from other tools, such as RNAbor, RNAshapes and paRNAss, hence may provide orthogonal information when looking for suboptimal structures or conformational switches. Source code for RNAborMEA can be downloaded from http://sourceforge.net/projects/rnabormea/ or http://bioinformatics.bc.edu/clotelab/RNAborMEA/
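The two quantities in the RNAborMEA abstract above, the base pair distance that defines a k-neighbor and the sample-size bound N(ε, p, K), can be illustrated with a short sketch. This is not the RNAborMEA source code. It assumes the usual definition of base pair distance (the number of base pairs found in exactly one of the two structures), dot-bracket input without pseudoknots, and reads the bound as (Φ⁻¹(p/(2K)))² / (4ε²). The function names are hypothetical.

```python
from statistics import NormalDist


def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def base_pair_distance(s0, s):
    """Assumed definition: pairs present in exactly one of the two structures."""
    if len(s0) != len(s):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s0) ^ base_pairs(s))


def sample_bound(eps, p, K):
    """Assumed reading of the bound: (Phi^-1(p / (2K)))^2 / (4 * eps^2)."""
    z = NormalDist().inv_cdf(p / (2 * K))
    return z * z / (4 * eps * eps)


if __name__ == "__main__":
    # "(....)" keeps one of the two pairs of "((..))", so it is a 1-neighbor.
    print(base_pair_distance("((..))", "(....)"))
    # Samples needed for accuracy 0.05 and confidence 0.95 over K = 10 values of k.
    print(sample_bound(0.05, 0.05, 10))
```

Under this reading, the bound grows only slowly with K, because K enters through the normal quantile, while halving ε quadruples the required number of samples.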

    Investigating the mechanism of a novel glycine-dependent riboswitch and a putative non-coding regulatory RNA in Streptococcus pyogenes

    We investigated gene expression regulation by a putative glycine (Gly) riboswitch located in the 5′-UTR of a SAF protein gene in S. pyogenes. Gly-dependency was studied using a luciferase reporter gene system. Maximal reporter gene expression occurred in the presence of low Gly concentrations. RT-qPCR showed that in the presence of Gly (≥1 mM), expression of the gene was downregulated. Growth in the presence of 0.1 mM Gly led to the production of a full-length transcript. We conclude that the Gly riboswitch in S. pyogenes represses gene expression in the presence of high Gly concentrations

    Detection of small RNAs in Bordetella pertussis and identification of a novel repeated genetic element

    Background: Small bacterial RNAs (sRNAs) have been shown to participate in the regulation of gene expression and have been identified in numerous prokaryotic species. Some of them are involved in the regulation of virulence in pathogenic bacteria. So far, little is known about sRNAs in Bordetella, and only very few sRNAs have been identified in the genome of Bordetella pertussis, the causative agent of whooping cough. Results: An in silico approach was used to predict sRNA genes in intergenic regions of the B. pertussis genome. The genome sequences of B. pertussis, Bordetella parapertussis, Bordetella bronchiseptica and Bordetella avium were compared using BLAST, and significant hits were analyzed using RNAz. Twenty-three candidate regions were obtained, including regions encoding the already documented 6S RNA, and the GCVT and FMN riboswitches. The existence of sRNAs was verified by Northern blot analyses, and transcripts were detected for 13 out of the 20 additional candidates. These new sRNAs were named Bordetella pertussis RNAs, bpr. The expression of 4 of them differed between the early, exponential and late growth phases, and one of them, bprJ2, was found to be under the control of the BvgA/BvgS two-component regulatory system of Bordetella virulence. A phylogenetic study of the bprJ sequence revealed a novel, so far undocumented repeat of ~90 bp, found in numerous copies in the Bordetella genomes and in that of other Betaproteobacteria. This repeat exhibits certain features of mobil

    Subtle genetic changes enhance virulence of methicillin resistant and sensitive Staphylococcus aureus

    Background: Community-acquired (CA) methicillin-resistant Staphylococcus aureus (MRSA) increasingly causes disease worldwide. USA300 has emerged as the predominant clone causing superficial and invasive infections in children and adults in the USA. Epidemiological studies suggest that USA300 is more virulent than other CA-MRSA. The genetic determinants that render virulence and dominance to USA300 remain unclear. Results: We sequenced the genomes of two pediatric USA300 isolates: one CA-MRSA and one CA-methicillin susceptible (MSSA), isolated at Texas Children's Hospital in Houston. DNA sequencing was performed by Sanger dideoxy whole genome shotgun (WGS) and 454 Life Sciences pyrosequencing strategies. The sequence of the USA300 MRSA strain was rigorously annotated. In USA300-MRSA, 2658 chromosomal open reading frames were predicted, and 3.1 and 27 kilobase (kb) plasmids were identified. USA300-MSSA contained a 20 kb plasmid with some homology to the 27 kb plasmid found in USA300-MRSA. Two regions found in USA300-MRSA were absent in USA300-MSSA. One of these carried the arginine deiminase operon that appears to have been acquired from S. epidermidis. The USA300 sequence was aligned with other sequenced S. aureus genomes and regions unique to USA300 MRSA were identified. Conclusion: USA300-MRSA is highly similar to other MRSA strains based on whole genome alignments and gene content, indicating that the differences in pathogenesis are due to subtle changes rather than to large-scale acquisition of virulence factor genes. The USA300 Houston isolate differs from another sequenced USA300 strain isolate, derived from a patient in San Francisco, in plasmid content and a number of sequence polymorphisms. Such differences will provide new insights into the evolution of pathogens.

    Genome-wide transcription start site profiling in biofilm-grown Burkholderia cenocepacia J2315

    Background: Burkholderia cenocepacia is a soil-dwelling Gram-negative Betaproteobacterium with an important role as an opportunistic pathogen in humans. Infections with B. cenocepacia are very difficult to treat due to their high intrinsic resistance to most antibiotics. Biofilm formation further adds to their antibiotic resistance. B. cenocepacia harbours a large, multi-replicon genome with a high GC-content; the reference genome of strain J2315 includes 7374 annotated genes. This study aims to annotate transcription start sites and identify novel transcripts on a whole genome scale. Methods: RNA extracted from B. cenocepacia J2315 biofilms was analysed by differential RNA-sequencing and the resulting dataset compared to data derived from conventional, global RNA-sequencing. Transcription start sites were annotated and further analysed according to their position relative to annotated genes. Results: Four thousand ten transcription start sites were mapped over the whole B. cenocepacia genome and the primary transcription start sites of 2089 genes expressed in B. cenocepacia biofilms were defined. For 64 genes a start codon alternative to the annotated one was proposed. Substantial antisense transcription for 105 genes and two novel protein coding sequences were identified. The distribution of internal transcription start sites can be used to identify genomic islands in B. cenocepacia. A potassium pump strongly induced only under biofilm conditions was found and 15 non-coding small RNAs highly expressed in biofilms were discovered. Conclusions: Mapping transcription start sites across the B. cenocepacia genome added relevant information to the J2315 annotation. Genes and novel regulatory RNAs putatively involved in B. cenocepacia biofilm formation were identified. These findings will help in understanding regulation of B. cenocepacia biofilm formation