4,188 research outputs found

    Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA

    Get PDF
    BACKGROUND: RNA editing is the process whereby an RNA sequence is modified from the sequence of the corresponding DNA template. In the mitochondria of land plants, some cytidines are converted to uridines before translation. Despite substantial study, the molecular biological mechanism by which C-to-U RNA editing proceeds remains relatively obscure, although several experimental studies have implicated a role for cis-recognition. A highly non-random distribution of nucleotides is observed in the immediate vicinity of edited sites (within 20 nucleotides 5' and 3'), but no precise consensus motif has been identified. RESULTS: Data for analysis were derived from the the complete mitochondrial genomes of Arabidopsis thaliana, Brassica napus, and Oryza sativa; additionally, a combined data set of observations across all three genomes was generated. We selected datasets based on the 20 nucleotides 5' and the 20 nucleotides 3' of edited sites and an equivalently sized and appropriately constructed null-set of non-edited sites. We used tree-based statistical methods and random forests to generate models of C-to-U RNA editing based on the nucleotides surrounding the edited/non-edited sites and on the estimated folding energies of those regions. Tree-based statistical methods based on primary sequence data surrounding edited/non-edited sites and estimates of free energy of folding yield models with optimistic re-substitution-based estimates of ~0.71 accuracy, ~0.64 sensitivity, and ~0.88 specificity. Random forest analysis yielded better models and more exact performance estimates with ~0.74 accuracy, ~0.72 sensitivity, and ~0.81 specificity for the combined observations. CONCLUSIONS: Simple models do moderately well in predicting which cytidines will be edited to uridines, and provide the first quantitative predictive models for RNA edited sites in plant mitochondria. Our analysis shows that the identity of the nucleotide -1 to the edited C and the estimated free energy of folding for a 41 nt region surrounding the edited C are the most important variables that distinguish most edited from non-edited sites. However, the results suggest that primary sequence data and simple free energy of folding calculations alone are insufficient to make highly accurate predictions

    PREP-Mt: predictive RNA editor for plant mitochondrial genes

    Get PDF
    BACKGROUND: In plants, RNA editing is a process that converts specific cytidines to uridines and uridines to cytidines in transcripts from virtually all mitochondrial protein-coding genes. There are thousands of plant mitochondrial genes in the sequence databases, but sites of RNA editing have not been determined for most. Accurate methods of RNA editing site prediction will be important in filling in this information gap and could reduce or even eliminate the need for experimental determination of editing sites for many sequences. Because RNA editing tends to increase protein conservation across species by "correcting" codons that specify unconserved amino acids, this principle can be used to predict editing sites by identifying positions where an RNA editing event would increase the conservation of a protein to homologues from other plants. PREP-Mt takes this approach to predict editing sites for any protein-coding gene in plant mitochondria. RESULTS: To test the general applicability of the PREP-Mt methodology, RNA editing sites were predicted for 370 full-length or nearly full-length DNA sequences and then compared to the known sites of RNA editing for these sequences. Of 60,263 cytidines in this test set, PREP-Mt correctly classified 58,994 as either an edited or unedited site (accuracy = 97.9%). PREP-Mt properly identified 3,038 of the 3,698 known sites of RNA editing (sensitivity = 82.2%) and 55,956 of the 56,565 known unedited sites (specificity = 98.9%). Accuracy and sensitivity increased to 98.7% and 94.7%, respectively, after excluding the 489 silent editing sites (which have no effect on protein sequence or function) from the test set. CONCLUSION: These results indicate that PREP-Mt is effective at identifying C to U RNA editing sites in plant mitochondrial protein-coding genes. Thus, PREP-Mt should be useful in predicting protein sequences for use in molecular, biochemical, and phylogenetic analyses. In addition, PREP-Mt could be used to determine functionality of a mitochondrial gene or to identify particular sequences with unusual editing properties. The PREP-Mt methodology should be applicable to any system where RNA editing increases protein conservation across species

    Is plant mitochondrial RNA editing a source of phylogenetic incongruence? An answer from in silico and in vivo data sets

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>In plant mitochondria, the post-transcriptional RNA editing process converts C to U at a number of specific sites of the mRNA sequence and usually restores phylogenetically conserved codons and the encoded amino acid residues. Sites undergoing RNA editing evolve at a higher rate than sites not modified by the process. As a result, editing sites strongly affect the evolution of plant mitochondrial genomes, representing an important source of sequence variability and potentially informative characters.</p> <p>To date no clear and convincing evidence has established whether or not editing sites really affect the topology of reconstructed phylogenetic trees. For this reason, we investigated here the effect of RNA editing on the tree building process of twenty different plant mitochondrial gene sequences and by means of computer simulations.</p> <p>Results</p> <p>Based on our simulation study we suggest that the editing ‘noise’ in tree topology inference is mainly manifested at the cDNA level. In particular, editing sites tend to confuse tree topologies when artificial genomic and cDNA sequences are generated shorter than 500 bp and with an editing percentage higher than 5.0%. Similar results have been also obtained with genuine plant mitochondrial genes. In this latter instance, indeed, the topology incongruence increases when the editing percentage goes up from about 3.0 to 14.0%. However, when the average gene length is higher than 1,000 bp (<it>rps3</it>, <it>matR</it> and <it>atp1</it>) no differences in the comparison between inferred genomic and cDNA topologies could be detected.</p> <p>Conclusions</p> <p>Our findings by the here reported <it>in silico</it> and <it>in vivo</it> computer simulation system seem to strongly suggest that editing sites contribute in the generation of misleading phylogenetic trees if the analyzed mitochondrial gene sequence is highly edited (higher than 3.0%) and reduced in length (shorter than 500 bp).</p> <p>In the current lack of direct experimental evidence the results presented here encourage, thus, the use of genomic mitochondrial rather than cDNA sequences for reconstructing phylogenetic events in land plants.</p

    Characteristics and Prediction of RNA Editing Sites in Transcripts of the Moss Takakia lepidozioides Chloroplast

    Get PDF
    RNA editing in land plant organelles is a process primarily involving the conversion of cytidine to uridine in pre-mRNAs. The process is required for gene expression in plant organelles, because this conversion alters the encoded amino acid residues and improves the sequence identity to homologous proteins. A recent study uncovered that proteins encoded in the nuclear genome are essential for editing site recognition in chloroplasts; the mechanisms by which this recognition occurs remain unclear. To understand these mechanisms, we determined the genomic and cDNA sequences of moss Takakia lepidozioides chloroplast genes, then computationally analyzed the sequences within −30 to +10 nucleotides of RNA editing sites (neighbor sequences) likely to be recognized by trans-factors. As the T. lepidozioides chloroplast has many RNA editing sites, the analysis of these sequences provides a unique opportunity to perform statistical analyses of chloroplast RNA editing sites. We divided the 302 obtained neighbor sequences into eight groups based on sequence similarity to identify group-specific patterns. The patterns were then applied to predict novel RNA editing sites in T. lepidozioides transcripts; ∼60% of these predicted sites are true editing sites. The success of this prediction algorithm suggests that the obtained patterns are indicative of key sites recognized by trans-factors around editing sites of T. lepidozioides chloroplast genes

    Genetic algorithm learning as a robust approach to RNA editing site prediction

    Get PDF
    BACKGROUND: RNA editing is one of several post-transcriptional modifications that may contribute to organismal complexity in the face of limited gene complement in a genome. One form, known as C → U editing, appears to exist in a wide range of organisms, but most instances of this form of RNA editing have been discovered serendipitously. With the large amount of genomic and transcriptomic data now available, a computational analysis could provide a more rapid means of identifying novel sites of C → U RNA editing. Previous efforts have had some success but also some limitations. We present a computational method for identifying C → U RNA editing sites in genomic sequences that is both robust and generalizable. We evaluate its potential use on the best data set available for these purposes: C → U editing sites in plant mitochondrial genomes. RESULTS: Our method is derived from a machine learning approach known as a genetic algorithm. REGAL (RNA Editing site prediction by Genetic Algorithm Learning) is 87% accurate when tested on three mitochondrial genomes, with an overall sensitivity of 82% and an overall specificity of 91%. REGAL's performance significantly improves on other ab initio approaches to predicting RNA editing sites in this data set. REGAL has a comparable sensitivity and higher specificity than approaches which rely on sequence homology, and it has the advantage that strong sequence conservation is not required for reliable prediction of edit sites. CONCLUSION: Our results suggest that ab initio methods can generate robust classifiers of putative edit sites, and we highlight the value of combinatorial approaches as embodied by genetic algorithms. We present REGAL as one approach with the potential to be generalized to other organisms exhibiting C → U RNA editing

    The PREP suite: predictive RNA editors for plant mitochondrial genes, chloroplast genes and user-defined alignments

    Get PDF
    RNA editing alters plant mitochondrial and chloroplast transcripts by converting specific cytidines to uridines, which usually results in a change in the amino acid sequence of the translated protein. Systematic studies have experimentally identified sites of RNA editing in organellar transcriptomes from several species, but these analyses have not kept pace with rate of genome sequencing. The PREP (predictive RNA editors for plants) suite was developed to computationally predict sites of RNA editing based on the well-known principle that editing in plant organelles increases the conservation of proteins across species. The PREP suite provides predictive RNA editors for plant mitochondrial genes (PREP-Mt), for chloroplast genes (PREP-Cp), and for alignments submitted by the user (PREP-Aln). These servers require minimal input, are very fast, and are highly accurate on all seed plants examined to date. PREP-Mt has proved useful in several research studies and the newly developed PREP-Cp and PREP-Aln servers should be of further assistance for analyses that require knowledge of the location of sites of RNA editing. The PREP suite is freely available at http://prep.unl.edu/

    Editing site analysis in a gymnosperm mitochondrial genome reveals similarities with angiosperm mitochondrial genomes

    Get PDF
    Sequence analysis of organelle genomes and comprehensive analysis of C-to-U editing sites from flowering and non-flowering plants have provided extensive sequence information from diverse taxa. This study includes the first comprehensive analysis of RNA editing sites from a gymnosperm mitochondrial genome, and utilizes informatics analyses to determine conserved features in the RNA sequence context around editing sites. We have identified 565 editing sites in 21 full-length and 4 partial cDNAs of the 39 protein-coding genes identified from the mitochondrial genome of Cycas taitungensis. The information profiles and RNA sequence context of C-to-U editing sites in the Cycas genome exhibit similarity in the immediate flanking nucleotides. Relative entropy analyses indicate that similar regions in the 5′ flanking 20 nucleotides have information content compared to angiosperm mitochondrial genomes. These results suggest that evolutionary constraints exist on the nucleotide sequences immediately adjacent to C-to-U editing sites, and similar regions are utilized in editing site recognition

    A population study of the minicircles in Trypanosoma cruzi: predicting guide RNAs in the absence of empirical RNA editing

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The structurally complex network of minicircles and maxicircles comprising the mitochondrial DNA of kinetoplastids mirrors the complexity of the RNA editing process that is required for faithful expression of encrypted maxicircle genes. Although a few of the guide RNAs that direct this editing process have been discovered on maxicircles, guide RNAs are mostly found on the minicircles. The nuclear and maxicircle genomes have been sequenced and assembled for <it>Trypanosoma cruzi</it>, the causative agent of Chagas disease, however the complement of 1.4-kb minicircles, carrying four guide RNA genes per molecule in this parasite, has been less thoroughly characterised.</p> <p>Results</p> <p>Fifty-four CL Brener and 53 Esmeraldo strain minicircle sequence reads were extracted from <it>T. cruzi </it>whole genome shotgun sequencing data. With these sequences and all published <it>T. cruzi </it>minicircle sequences, 108 unique guide RNAs from all known <it>T. cruzi </it>minicircle sequences and two guide RNAs from the CL Brener maxicircle were predicted using a local alignment algorithm and mapped onto predicted or experimentally determined sequences of edited maxicircle open reading frames. For half of the sequences no statistically significant guide RNA could be assigned. Likely positions of these unidentified gRNAs in <it>T. cruzi </it>minicircle sequences are estimated using a simple Hidden Markov Model. With the local alignment predictions as a standard, the HMM had an ~85% chance of correctly identifying at least 20 nucleotides of guide RNA from a given minicircle sequence. Inter-minicircle recombination was documented. Variable regions contain species-specific areas of distinct nucleotide preference. Two maxicircle guide RNA genes were found.</p> <p>Conclusion</p> <p>The identification of new minicircle sequences and the further characterization of all published minicircles are presented, including the first observation of recombination between minicircles. Extrapolation suggests a level of 4% recombinants in the population, supporting a relatively high recombination rate that may serve to minimize the persistence of gRNA pseudogenes. Characteristic nucleotide preferences observed within variable regions provide potential clues regarding the transcription and maturation of <it>T. cruzi </it>guide RNAs. Based on these preferences, a method of predicting <it>T. cruzi </it>guide RNAs using only primary minicircle sequence data was created.</p
    corecore