BACKGROUND: For the purposes of finding and aligning noncoding RNA gene- and cis-regulatory elements in multiple-genome datasets, it is useful to be able to derive multi-sequence stochastic grammars (and hence multiple alignment algorithms) systematically, starting from hypotheses about the various kinds of random mutation event and their rates.

RESULTS: Here, we consider a highly simplified evolutionary model for RNA, called "The TKF91 Structure Tree" (following Thorne, Kishino and Felsenstein's 1991 model of sequence evolution with indels), which we have implemented for pairwise alignment as proof of principle for such an approach. The model, its strengths and its weaknesses are discussed with reference to four examples of functional ncRNA sequences: a riboswitch (guanine), a zipcode (nanos), a splicing factor (U4) and a ribozyme (RNase P). As shown by our visualisations of posterior probability matrices, the selected examples illustrate three different signatures of natural selection that are highly characteristic of ncRNA: (i) co-ordinated basepair substitutions, (ii) co-ordinated basepair indels and (iii) whole-stem indels.

CONCLUSIONS: Although all three types of mutation "event" are built into our model, events of type (i) and (ii) are found to be better modeled than events of type (iii). Nevertheless, we hypothesise from the model's performance on pairwise alignments that it would form an adequate basis for a prototype multiple alignment and genefinding tool.

Holmes, Ian

BMC Bioinformatics

English

PubMed

Directory of Open Access Journals

A probabilistic model for the evolution of RNA structure

eScholarship - University of California

Springer - Publisher Connector

https://escholarship.org/uc/item/8m45x32w
