RNA secondary structure prediction from multi-aligned sequences
It has been well accepted that the RNA secondary structures of most
functional non-coding RNAs (ncRNAs) are closely related to their functions and
are conserved during evolution. Hence, prediction of conserved secondary
structures from evolutionarily related sequences is an important task in RNA
bioinformatics; the methods are useful not only to further functional analyses
of ncRNAs but also to improve the accuracy of secondary structure predictions
and to find novel functional RNAs from the genome. In this review, I focus on
common secondary structure prediction from a given aligned RNA sequence, in
which one secondary structure whose length is equal to that of the input
alignment is predicted. I systematically review and classify existing tools and
algorithms for the problem, by utilizing the information employed in the tools
and by adopting a unified viewpoint based on maximum expected gain (MEG)
estimators. I believe that this classification will allow a deeper
understanding of each tool and provide users with useful information for
selecting tools for common secondary structure predictions. Comment: A preprint of an invited review manuscript that will be published in
a chapter of the book `Methods in Molecular Biology'. Note that this version
of the manuscript may differ from the published version.
Kernel methods in genomics and computational biology
Support vector machines and kernel methods are increasingly popular in
genomics and computational biology, due to their good performance in real-world
applications and strong modularity that makes them suitable to a wide range of
problems, from the classification of tumors to the automatic annotation of
proteins. Their ability to work in high dimension, to process non-vectorial
data, and the natural framework they provide to integrate heterogeneous data
are particularly relevant to various problems arising in computational biology.
In this chapter we survey some of the most prominent applications published so
far, highlighting the particular developments in kernel methods triggered by
problems in biology, and mention a few promising research directions likely to
expand in the future.
RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences
Motivation: RNA secondary structure plays an important role in the function of many RNAs, and structural features are often key to their interaction with other cellular components. Thus, there has been considerable interest in the prediction of secondary structures for RNA families. In this article, we present a new global structural alignment algorithm, RNAG, to predict consensus secondary structures for unaligned sequences. It uses a blocked Gibbs sampling algorithm, which has a theoretical advantage in convergence time. This algorithm iteratively samples from the conditional probability distributions P(Structure | Alignment) and P(Alignment | Structure). Not surprisingly, there is considerable uncertainty in the high-dimensional space of this difficult problem, which has so far received limited attention in this field. We show how the samples drawn from this algorithm can be used to more fully characterize the posterior space and to assess the uncertainty of predictions.
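The alternating scheme described in this abstract can be illustrated with a minimal two-block Gibbs sampler on a toy problem. The sketch below is not RNAG itself: the labels `A1`, `A2`, `S1`, `S2`, the joint table `JOINT`, and the function names are hypothetical, chosen only to show how repeated draws from P(Structure | Alignment) and P(Alignment | Structure) yield samples whose frequencies approximate the joint distribution.

```python
import random

# Hypothetical toy joint distribution over (alignment, structure) labels.
# A real problem has an astronomically larger state space.
JOINT = {
    ("A1", "S1"): 0.40,
    ("A1", "S2"): 0.10,
    ("A2", "S1"): 0.15,
    ("A2", "S2"): 0.35,
}
ALIGNMENTS = ["A1", "A2"]
STRUCTURES = ["S1", "S2"]


def sample_conditional(candidates, weight_of, rng):
    """Draw one candidate with probability proportional to its weight."""
    weights = [weight_of(c) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def gibbs(n_samples, seed=0, burn_in=100):
    """Alternate between the two conditionals and return sample frequencies."""
    rng = random.Random(seed)
    alignment, structure = "A1", "S1"  # arbitrary starting state
    counts = {key: 0 for key in JOINT}
    for step in range(burn_in + n_samples):
        # Block 1: draw a structure from P(Structure | Alignment).
        structure = sample_conditional(
            STRUCTURES, lambda s: JOINT[(alignment, s)], rng
        )
        # Block 2: draw an alignment from P(Alignment | Structure).
        alignment = sample_conditional(
            ALIGNMENTS, lambda a: JOINT[(a, structure)], rng
        )
        if step >= burn_in:
            counts[(alignment, structure)] += 1
    return {key: c / n_samples for key, c in counts.items()}
```

Calling `gibbs(20000)` returns empirical frequencies that should lie close to the values in `JOINT`. The spread of such samples is the kind of information the abstract proposes to use for assessing the uncertainty of predictions.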
Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species
Antigenic variation enables pathogens to avoid the host immune response by continual switching of surface proteins. The protozoan blood parasite Trypanosoma brucei causes human African trypanosomiasis ("sleeping sickness") across sub-Saharan Africa and is a model system for antigenic variation, surviving by periodically replacing a monolayer of variant surface glycoproteins (VSG) that covers its cell surface. We compared the genome of Trypanosoma brucei with two closely related parasites, Trypanosoma congolense and Trypanosoma vivax, to reveal how the variant antigen repertoire has evolved and how it might affect contemporary antigenic diversity. We reconstruct VSG diversification showing that Trypanosoma congolense uses variant antigens derived from multiple ancestral VSG lineages, whereas in Trypanosoma brucei VSG have recent origins, and ancestral gene lineages have been repeatedly co-opted to novel functions. These historical differences are reflected in fundamental differences between species in the scale and mechanism of recombination. Using phylogenetic incompatibility as a metric for genetic exchange, we show that the frequency of recombination is comparable between Trypanosoma congolense and Trypanosoma brucei but is much lower in Trypanosoma vivax. Furthermore, in showing that the C-terminal domain of Trypanosoma brucei VSG plays a crucial role in facilitating exchange, we reveal substantial species differences in the mechanism of VSG diversification. Our results demonstrate how past VSG evolution indirectly determines the ability of contemporary parasites to generate novel variant antigens through recombination and suggest that the current model for antigenic variation in Trypanosoma brucei is only one means by which these parasites maintain chronic infections.
Circular RNAs in Clear Cell Renal Cell Carcinoma: Their Microarray-Based Identification, Analytical Validation, and Potential Use in a Clinico-Genomic Model to Improve Prognostic Accuracy
Circular RNAs (circRNAs) may act as novel cancer biomarkers. However, a genome-wide evaluation of circRNAs in clear cell renal cell carcinoma (ccRCC) has yet to be conducted. Therefore, the objective of this study was to identify and validate circRNAs in ccRCC tissue with a focus on evaluating their potential as prognostic biomarkers. A genome-wide identification of circRNAs in total RNA extracted from ccRCC tissue samples was performed using microarray analysis. Three relevant differentially expressed circRNAs were selected (circEGLN3, circNOX4, and circRHOBTB3), their circular nature was experimentally confirmed, and their expression, along with that of their linear counterparts, was measured in 99 malignant and 85 adjacent normal tissue samples using specifically established RT-qPCR assays. The capacity of circRNAs to discriminate between malignant and adjacent normal tissue samples and their prognostic potential (with the endpoints cancer-specific, recurrence-free, and overall survival) after surgery were estimated by C-statistics, the Kaplan-Meier method, univariate and multivariate Cox regression analysis, decision curve analysis, and Akaike and Bayesian information criteria. CircEGLN3 discriminated malignant from normal tissue with 97% accuracy. We generated a prognostic signature for the three endpoints by multivariate Cox regression analysis that included circEGLN3, circRHOBTB3 and linRHOBTB3. The predictive outcome accuracy of the clinical models based on clinicopathological factors was improved in combination with this circRNA-based signature. Bootstrapping as well as Akaike and Bayesian information criteria confirmed the statistical significance and robustness of the combined models. Limitations of this study include its retrospective nature and the lack of external validation. The study demonstrated the promising potential of circRNAs as diagnostic and particularly prognostic biomarkers in ccRCC patients.
XRate: a fast prototyping, training and annotation tool for phylo-grammars
BACKGROUND: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists. RESULTS: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures. CONCLUSION: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.
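As a small illustration of the continuous-time Markov chain component of a phylo-grammar, the sketch below computes substitution probabilities under the Jukes-Cantor model, one of the simplest reversible substitution models. It is an example under stated assumptions, not xrate code and not its configuration format: the function names and the `rate` parameter are hypothetical, and the abstract describes support for far more general models.

```python
import math

NUCLEOTIDES = "ACGU"


def jukes_cantor_matrix(rate, time):
    """Transition probabilities P(y | x, time) when every off-diagonal
    entry of the rate matrix equals `rate` (assumed parameterization)."""
    decay = math.exp(-4.0 * rate * time)
    same = 0.25 + 0.75 * decay   # probability of ending in the starting state
    diff = 0.25 - 0.25 * decay   # probability of ending in any one other state
    return {
        x: {y: (same if x == y else diff) for y in NUCLEOTIDES}
        for x in NUCLEOTIDES
    }


def compose(first, second):
    """Chain two branches: sum over the unobserved intermediate nucleotide."""
    return {
        x: {
            y: sum(first[x][k] * second[k][y] for k in NUCLEOTIDES)
            for y in NUCLEOTIDES
        }
        for x in NUCLEOTIDES
    }
```

Composing the matrices for two consecutive branches gives the matrix for their combined length, which is the Markov property that lets likelihoods be accumulated along a phylogenetic tree.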
Developing and applying heterogeneous phylogenetic models with XRate
Modeling sequence evolution on phylogenetic trees is a useful technique in
computational biology. Especially powerful are models which take account of the
heterogeneous nature of sequence evolution according to the "grammar" of the
encoded gene features. However, beyond a modest level of model complexity,
manual coding of models becomes prohibitively labor-intensive. We demonstrate,
via a set of case studies, the new built-in model-prototyping capabilities of
XRate (macros and Scheme extensions). These features allow rapid implementation
of phylogenetic models which would have previously been far more
labor-intensive. XRate's new capabilities for lineage-specific models,
ancestral sequence reconstruction, and improved annotation output are also
discussed. XRate's flexible model-specification capabilities and computational
efficiency make it well-suited to developing and prototyping phylogenetic
grammar models. XRate is available as part of the DART software package:
http://biowiki.org/DART . Comment: 34 pages, 3 figures, glossary of XRate model terminology.