78 research outputs found

    Automated alignment of RNA sequences to pseudoknotted structures

    Get PDF
    Abstract Seq7 is a new program for generating multiple structure-based alignments of RNA sequences. By using a variant of Dijkstra's algorithm to find the shortest path through a specially constructed graph, Seq7 is able to align RNA sequences to pseudoknotted structures in polynomial time. In this paper, we describe the operation of Seq7 and demonstrate the program's abilities. We also describe the use of Sex/7 in an Expectation-Maximization procedure that automates the process of structural modeling and alignment of RNA sequences

    High sensitivity RNA pseudoknot prediction

    Get PDF
    Most ab initio pseudoknot predicting methods provide very few folding scenarios for a given RNA sequence and have low sensitivities. RNA researchers, in many cases, would rather sacrifice the specificity for a much higher sensitivity for pseudoknot detection. In this study, we introduce the Pseudoknot Local Motif Model and Dynamic Partner Sequence Stacking (PLMM_DPSS) algorithm which predicts all PLM model pseudoknots within an RNA sequence in a neighboring-region-interference-free fashion. The PLM model is derived from the existing Pseudobase entries. The innovative DPSS approach calculates the optimally lowest stacking energy between two partner sequences. Combined with the Mfold, PLMM_DPSS can also be used in predicting complicated pseudoknots. The test results of PLMM_DPSS, PKNOTS, iterated loop matching, pknotsRG and HotKnots with Pseudobase sequences have shown that PLMM_DPSS is the most sensitive among the five methods. PLMM_DPSS also provides manageable pseudoknot folding scenarios for further structure determination

    Impact Of The Energy Model On The Complexity Of RNA Folding With Pseudoknots

    Get PDF
    International audiencePredicting the folding of an RNA sequence, while allowing general pseudoknots (PK), consists in finding a minimal free-energy matching of its nn positions. Assuming independently contributing base-pairs, the problem can be solved in Θ(n3)\Theta(n^3)-time using a variant of the maximal weighted matching. By contrast, the problem was previously proven NP-Hard in the more realistic nearest-neighbor energy model. In this work, we consider an intermediate model, called the stacking-pairs energy model. We extend a result by Lyngs\o, showing that RNA folding with PK is NP-Hard within a large class of parametrization for the model. We also show the approximability of the problem, by giving a practical Θ(n3)\Theta(n^3) algorithm that achieves at least a 55-approximation for any parametrization of the stacking model. This contrasts nicely with the nearest-neighbor version of the problem, which we prove cannot be approximated within any positive ratio, unless P=NPP=NP.La prédiction du repliement, avec pseudonoeuds généraux, d'une séquence d'ARN de taille nn est équivalent à la recherche d'un couplage d'énergie libre minimale. Dans un modèle d'énergie simple, où chaque paire de base contribue indépendamment à l'énergie, ce problème peut être résolu en temps Θ(n3)\Theta(n^3) grâce à une variante d'un algorithme de couplage pondéré maximal. Cependant, le même problème a été démontré NP-difficile dans le modèle d'énergie dit des plus proches voisins. Dans ce travail, nous étudions les propriétés du problème sous un modèle d'empilements, constituant un modèle intermédiaire entre ceux d'appariement et des plus proches voisins. Nous démontrons tout d'abord que le repliement avec pseudo-noeuds de l'ARN reste NP-difficile dans de nombreuses valuations du modèle d'énergie. . Par ailleurs, nous montrons que ce problème est approximable, en proposant un algorithme polynomial garantissant une 1/51/5-approximation. Ce résultat illustre une différence essentielle entre ce modèle et celui des plus proches voisins, pour lequel nous montrons qu'il ne peut être approché à aucun ratio positif par un algorithme en temps polynomial sauf si N=NPN=NP

    Sequence determinants in human polyadenylation site selection

    Get PDF
    BACKGROUND: Differential polyadenylation is a widespread mechanism in higher eukaryotes producing mRNAs with different 3' ends in different contexts. This involves several alternative polyadenylation sites in the 3' UTR, each with its specific strength. Here, we analyze the vicinity of human polyadenylation signals in search of patterns that would help discriminate strong and weak polyadenylation sites, or true sites from randomly occurring signals. RESULTS: We used human genomic sequences to retrieve the region downstream of polyadenylation signals, usually absent from cDNA or mRNA databases. Analyzing 4956 EST-validated polyadenylation sites and their -300/+300 nt flanking regions, we clearly visualized the upstream (USE) and downstream (DSE) sequence elements, both characterized by U-rich (not GU-rich) segments. The presence of a USE and a DSE is the main feature distinguishing true polyadenylation sites from randomly occurring A(A/U)UAAA hexamers. While USEs are indifferently associated with strong and weak poly(A) sites, DSEs are more conspicuous near strong poly(A) sites. We then used the region encompassing the hexamer and DSE as a training set for poly(A) site identification by the ERPIN program and achieved a prediction specificity of 69 to 85% for a sensitivity of 56%. CONCLUSION: The availability of complete genomes and large EST sequence databases now permit large-scale observation of polyadenylation sites. Both U-rich sequences flanking both sides of poly(A) signals contribute to the definition of "true" sites. However, the downstream U-rich sequences may also play an enhancing role. Based on this information, poly(A) site prediction accuracy was moderately but consistently improved compared to the best previously available algorithm

    RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences

    Get PDF
    Motivation: RNA secondary structure plays an important role in the function of many RNAs, and structural features are often key to their interaction with other cellular components. Thus, there has been considerable interest in the prediction of secondary structures for RNA families. In this article, we present a new global structural alignment algorithm, RNAG, to predict consensus secondary structures for unaligned sequences. It uses a blocked Gibbs sampling algorithm, which has a theoretical advantage in convergence time. This algorithm iteratively samples from the conditional probability distributions P(Structure | Alignment) and P(Alignment | Structure). Not surprisingly, there is considerable uncertainly in the high-dimensional space of this difficult problem, which has so far received limited attention in this field. We show how the samples drawn from this algorithm can be used to more fully characterize the posterior space and to assess the uncertainty of predictions

    Thermodynamics of RNA structures by Wang–Landau sampling

    Get PDF
    Motivation: Thermodynamics-based dynamic programming RNA secondary structure algorithms have been of immense importance in molecular biology, where applications range from the detection of novel selenoproteins using expressed sequence tag (EST) data, to the determination of microRNA genes and their targets. Dynamic programming algorithms have been developed to compute the minimum free energy secondary structure and partition function of a given RNA sequence, the minimum free-energy and partition function for the hybridization of two RNA molecules, etc. However, the applicability of dynamic programming methods depends on disallowing certain types of interactions (pseudoknots, zig-zags, etc.), as their inclusion renders structure prediction an nondeterministic polynomial time (NP)-complete problem. Nevertheless, such interactions have been observed in X-ray structures

    MicroRNA-mediated up-regulation of an alternatively polyadenylated variant of the mouse cytoplasmic β-actin gene

    Get PDF
    Actin is a major cytoskeletal protein in eukaryotes. Recent studies suggest more diverse functional roles for this protein. Actin mRNA is known to be localized to neuronal synapses and undergoes rapid deadenylation during early developmental stages. However, its 3′-untranslated region (UTR) is not characterized and there are no experimentally determined polyadenylation (polyA) sites in actin mRNA. We have found that the cytoplasmic β-actin (Actb) gene generates two alternative transcripts terminated at tandem polyA sites. We used 3′-RACE, EST end analysis and in situ hybridization to unambiguously establish the existence of two 3′-UTRs of varying length in Actb transcript in mouse neuronal cells. Further analyses showed that these two tandem polyA sites are used in a tissue-specific manner. Although the longer 3′-UTR was expressed at a relatively lower level, it conferred higher translational efficiency to the transcript. The longer transcript harbours a conserved mmu-miR-34a/34b-5p target site. Sequence-specific anti-miRNA molecule, mutations of the miRNA target region in the 3′-UTR resulted in reduced expression. The expression was restored by a mutant miRNA complementary to the mutated target region implying that miR-34 binding to Actb 3′-UTR up-regulates target gene expression. Heterogeneity of the Actb 3′-UTR could shed light on the mechanism of miRNA-mediated regulation of messages in neuronal cells

    Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints

    Get PDF
    BACKGROUND: We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm. RESULTS: We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment. CONCLUSION: Pairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses

    Phylogenetic analysis of mRNA polyadenylation sites reveals a role of transposable elements in evolution of the 3′-end of genes

    Get PDF
    mRNA polyadenylation is an essential step for the maturation of almost all eukaryotic mRNAs, and is tightly coupled with termination of transcription in defining the 3′-end of genes. Large numbers of human and mouse genes harbor alternative polyadenylation sites [poly(A) sites] that lead to mRNA variants containing different 3′-untranslated regions (UTRs) and/or encoding distinct protein sequences. Here, we examined the conservation and divergence of different types of alternative poly(A) sites across human, mouse, rat and chicken. We found that the 3′-most poly(A) sites tend to be more conserved than upstream ones, whereas poly(A) sites located upstream of the 3′-most exon, also termed intronic poly(A) sites, tend to be much less conserved. Genes with longer evolutionary history are more likely to have alternative polyadenylation, suggesting gain of poly(A) sites through evolution. We also found that nonconserved poly(A) sites are associated with transposable elements (TEs) to a much greater extent than conserved ones, albeit less frequently utilized. Different classes of TEs have different characteristics in their association with poly(A) sites via exaptation of TE sequences into polyadenylation elements. Our results establish a conservation pattern for alternative poly(A) sites in several vertebrate species, and indicate that the 3′-end of genes can be dynamically modified by TEs through evolution

    Why Does the Giant Panda Eat Bamboo? A Comparative Analysis of Appetite-Reward-Related Genes among Mammals

    Get PDF
    Background: The giant panda has an interesting bamboo diet unlike the other species in the order of Carnivora. The umami taste receptor gene T1R1 has been identified as a pseudogene during its genome sequencing project and confirmed using a different giant panda sample. The estimated mutation time for this gene is about 4.2 Myr. Such mutation coincided with the giant panda’s dietary change and also reinforced its herbivorous life style. However, as this gene is preserved in herbivores such as cow and horse, we need to look for other reasons behind the giant panda’s diet switch. Methodology/Principal Findings: Since taste is part of the reward properties of food related to its energy and nutrition contents, we did a systematic analysis on those genes involved in the appetite-reward system for the giant panda. We extracted the giant panda sequence information for those genes and compared with the human sequence first and then with seven other species including chimpanzee, mouse, rat, dog, cat, horse, and cow. Orthologs in panda were further analyzed based on the coding region, Kozak consensus sequence, and potential microRNA binding of those genes. Conclusions/Significance: Our results revealed an interesting dopamine metabolic involvement in the panda’s food choice
    corecore