Derivation of Context-free Stochastic L-Grammar Rules for Promoter Sequence Modeling Using Support Vector Machine

Author: Damaševičius Robertas
Publication venue: Institute of Information Theories and Applications FOI ITHEA
Publication date: 01/01/2008

Formal grammars can be used for describing complex repeatable structures such as DNA sequences. In this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar. L-grammars are a special class of parallel grammars that can model the growth of living organisms, e.g. plant development, and the morphology of a variety of organisms. We believe that parallel grammars can also be used for modeling genetic mechanisms and sequences such as promoters. Promoters are short regulatory DNA sequences located upstream of a gene. Detection of promoters in DNA sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions, which makes promoter recognition a complex problem. We replace the problem of promoter recognition with the induction of context-free stochastic L-grammar rules, which are later used for the structural analysis of promoter sequences. L-grammar rules are derived automatically from the Drosophila and vertebrate promoter datasets using a genetic programming technique, and their fitness is evaluated using a Support Vector Machine (SVM) classifier. The artificial promoter sequences generated using the derived L-grammar rules are analyzed and compared with natural promoter sequences
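
As a rough illustration of what a context-free stochastic L-grammar does, the sketch below rewrites every symbol of a DNA string in parallel, choosing each replacement at random according to rule probabilities. The rules, probabilities, and function names are hypothetical and are not taken from the paper; the genetic programming search and the SVM fitness evaluation are not shown.

```python
import random

# Hypothetical rules for illustration only; they are not taken from the paper.
# Each nucleotide maps to a list of (probability, replacement) pairs.
RULES = {
    "A": [(0.5, "AT"), (0.5, "A")],
    "T": [(0.7, "T"), (0.3, "TA")],
    "C": [(1.0, "C")],
    "G": [(0.6, "G"), (0.4, "GC")],
}


def rewrite_symbol(symbol, rng):
    """Pick one replacement for a symbol according to its rule probabilities."""
    options = RULES.get(symbol)
    if options is None:
        return symbol  # symbols without a rule are copied unchanged
    weights = [p for p, _ in options]
    replacements = [r for _, r in options]
    return rng.choices(replacements, weights=weights, k=1)[0]


def derive(axiom, steps, seed=0):
    """Apply the stochastic rules for a number of parallel rewriting steps."""
    rng = random.Random(seed)
    sequence = axiom
    for _ in range(steps):
        # Parallel rewriting: every symbol of the current string is
        # replaced within the same derivation step.
        sequence = "".join(rewrite_symbol(s, rng) for s in sequence)
    return sequence


if __name__ == "__main__":
    print(derive("ACGT", steps=4, seed=1))
```

Each rule depends only on the symbol being rewritten, which is what makes the grammar context-free; the parallel update of all symbols is what distinguishes an L-grammar from a sequential grammar.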

Bulgarian Digital Mathematics Library at IMI-BAS

RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences

Author: Bernhart
Bindewald
Carvalho
Cary
Charles E. Lawrence
Chenna
Ding
Ding
Do
Do
Do
Donglai Wei
Eddy
Gardner
Geman
Giegerich
Griffiths-Jones
Gutell
Hamada
Hamada
Hofacker
Hofacker
Ji
Kiryu
Kiryu
Knudsen
Lauren V. Alpert
Lindgreen
Liu
Mathews
Mathews
Meyer
Nawrocki
Nawrocki
Newberg
Sakakibara
Sankoff
Seemann
Siebert
Steffen
Tabaska
Torarinsson
Webb
Webb-Robertson
Will
Xing
Yao
Zuker
Publication venue: Oxford University Press
Publication date: 01/01/2011

Motivation: RNA secondary structure plays an important role in the function of many RNAs, and structural features are often key to their interaction with other cellular components. Thus, there has been considerable interest in the prediction of secondary structures for RNA families. In this article, we present a new global structural alignment algorithm, RNAG, to predict consensus secondary structures for unaligned sequences. It uses a blocked Gibbs sampling algorithm, which has a theoretical advantage in convergence time. This algorithm iteratively samples from the conditional probability distributions P(Structure | Alignment) and P(Alignment | Structure). Not surprisingly, there is considerable uncertainty in the high-dimensional space of this difficult problem, which has so far received limited attention in this field. We show how the samples drawn from this algorithm can be used to more fully characterize the posterior space and to assess the uncertainty of predictions
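
The alternation between the two conditional distributions can be sketched on a toy problem. The example below is not RNAG: it assumes a made-up joint distribution over two discrete variables that merely stand in for "structure" and "alignment", and all names and numbers are hypothetical. It only shows the two-block Gibbs pattern of sampling each block from its conditional given the other.

```python
import random

# Toy joint distribution over two discrete variables, standing in for
# (structure, alignment). The numbers are made up for illustration.
JOINT = {
    ("s0", "a0"): 0.4,
    ("s0", "a1"): 0.1,
    ("s1", "a0"): 0.1,
    ("s1", "a1"): 0.4,
}
STRUCTURES = ["s0", "s1"]
ALIGNMENTS = ["a0", "a1"]


def sample_structure(alignment, rng):
    """Draw from P(structure | alignment); weights are proportional to the joint."""
    weights = [JOINT[(s, alignment)] for s in STRUCTURES]
    return rng.choices(STRUCTURES, weights=weights, k=1)[0]


def sample_alignment(structure, rng):
    """Draw from P(alignment | structure); weights are proportional to the joint."""
    weights = [JOINT[(structure, a)] for a in ALIGNMENTS]
    return rng.choices(ALIGNMENTS, weights=weights, k=1)[0]


def gibbs(n_samples, burn_in=100, seed=0):
    """Alternate the two conditional draws and keep the post-burn-in states."""
    rng = random.Random(seed)
    structure, alignment = "s0", "a1"  # arbitrary starting state
    samples = []
    for i in range(burn_in + n_samples):
        structure = sample_structure(alignment, rng)
        alignment = sample_alignment(structure, rng)
        if i >= burn_in:
            samples.append((structure, alignment))
    return samples


if __name__ == "__main__":
    draws = gibbs(20000, seed=1)
    for state in JOINT:
        print(state, draws.count(state) / len(draws))
```

The empirical frequencies of the retained states approximate the joint distribution, and their spread is what allows the uncertainty of a prediction to be assessed rather than reporting a single best answer.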

CiteSeerX

Crossref

PubMed Central

Foundations of statistical methods for multiple sequence alignment and structure prediction

Author: Lawrence C.
Publication venue: Stanford University Press
Publication date: 31/12/1995

Statistical algorithms have proven to be useful in computational molecular biology. Many statistical problems are most easily addressed by pretending that critical missing data are available. For some problems, statistical inference is facilitated by creating a set of latent variables, none of whose variables are observed. A key observation is that conditional probabilities for the values of the missing data can be inferred by application of Bayes' theorem to the observed data. The statistical framework described in this paper employs Boltzmann-like models, permutated data likelihood, EM, and Gibbs sampler algorithms. This tutorial reviews the common statistical framework behind all of these algorithms largely in tabular or graphical terms, illustrates its application, and describes the biological underpinnings of the models used

UNT Digital Library

Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction

Author: Dowell Robin D
Eddy Sean R
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: RNA secondary structure prediction methods based on probabilistic modeling can be developed using stochastic context-free grammars (SCFGs). Such methods can readily combine different sources of information that can be expressed probabilistically, such as an evolutionary model of comparative RNA sequence analysis and a biophysical model of structure plausibility. However, the number of free parameters in an integrated model for consensus RNA structure prediction can become untenable if the underlying SCFG design is too complex. Thus a key question is, what small, simple SCFG designs perform best for RNA secondary structure prediction? RESULTS: Nine different small SCFGs were implemented to explore the tradeoffs between model complexity and prediction accuracy. Each model was tested for single sequence structure prediction accuracy on a benchmark set of RNA secondary structures. CONCLUSIONS: Four SCFG designs had prediction accuracies near the performance of current energy minimization programs. One of these designs, introduced by Knudsen and Hein in their PFOLD algorithm, has only 21 free parameters and is significantly simpler than the others
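
To make the idea of a small SCFG concrete, the sketch below samples a sequence and its dot-bracket structure from a grammar of the shape commonly attributed to Knudsen and Hein (S → LS | L, L → s | dFd, F → dFd | LS). The rule probabilities, the uniform emission choices, and all function names are assumptions made for illustration; they are not the trained parameters evaluated in the paper, and structure prediction by parsing is not shown.

```python
import random

# Assumed emission choices: unpaired bases and base pairs are drawn uniformly.
BASES = "ACGU"
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]

# Assumed rule probabilities, chosen only so that sampling terminates quickly.
P_S_EXTEND = 0.5  # S -> L S   (otherwise S -> L)
P_L_SINGLE = 0.8  # L -> s     (otherwise L -> d F d)
P_F_STACK = 0.5   # F -> d F d (otherwise F -> L S)


def wrap_pair(inner, rng):
    """Surround an inner (sequence, structure) with one sampled base pair."""
    five, three = rng.choice(PAIRS)
    return five + inner[0] + three, "(" + inner[1] + ")"


def sample_S(rng):
    """S -> L S | L"""
    if rng.random() < P_S_EXTEND:
        left = sample_L(rng)
        right = sample_S(rng)
        return left[0] + right[0], left[1] + right[1]
    return sample_L(rng)


def sample_L(rng):
    """L -> s | d F d"""
    if rng.random() < P_L_SINGLE:
        return rng.choice(BASES), "."
    return wrap_pair(sample_F(rng), rng)


def sample_F(rng):
    """F -> d F d | L S"""
    if rng.random() < P_F_STACK:
        return wrap_pair(sample_F(rng), rng)
    left = sample_L(rng)
    right = sample_S(rng)
    return left[0] + right[0], left[1] + right[1]


if __name__ == "__main__":
    sequence, structure = sample_S(random.Random(3))
    print(sequence)
    print(structure)
```

Because every derivation pairs the two `d` symbols of a rule, each sampled structure is a properly nested dot-bracket string of the same length as its sequence.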

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

SimulFold: Simultaneously Inferring RNA Structures Including Pseudoknots, Alignments, and Trees Using a Bayesian MCMC Framework

Author: Meyer Irmtraud M
Miklós István
Publication venue: Public Library of Science
Publication date: 01/08/2007

Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses

Public Library of Science (PLOS)

SZTAKI Publication Repository

Directory of Open Access Journals

PubMed Central

MDC Repository

XRate: a fast prototyping, training and annotation tool for phylo-grammars

Author: Bendaña Yuri R
Bradley Robert K
Chao Sharon
Goldman Nick
Holmes Ian
Klosterman Peter S
Kosiol Carolin
Uzilov Andrew V
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists. RESULTS: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures. CONCLUSION: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Tools for simulating evolution of aligned genomic regions with integrated parameter estimation

Author: Avinash Varadarajan
Ian H Holmes
Robert K Bradley
Publication venue: Springer Nature
Publication date

Springer - Publisher Connector

Learning the Language of Biological Sequences

Author: Coste François
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Learning the language of biological sequences is an appealing challenge for the grammatical inference research field. While some first successes have already been recorded, such as the inference of profile hidden Markov models or stochastic context-free grammars, which are now part of the classical bioinformatics toolbox, it is still a source of open and inspiring problems for grammatical inference, enabling us to confront our ideas with real fundamental applications. As an introduction to this field, we survey here the main ideas and concepts behind the approaches developed in pattern/motif discovery and grammatical inference to characterize successfully the biological sequences with their specificities

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1