
2,602 research outputs found

Computational Molecular Coevolution

Author: Dickson Russell J
Publication venue: Scholarship@Western
Publication date: 13/12/2013
Field of study

A major goal in computational biochemistry is to obtain three-dimensional structure information from protein sequence. Coevolution represents a biological mechanism through which structural information can be obtained from a family of protein sequences. Evolutionary relationships within a family of protein sequences are revealed through sequence alignment. Statistical analyses of these sequence alignments reveal positions in the protein family that covary, and thus appear to be dependent on one another throughout the evolution of the protein family. These covarying positions are inferred to be coevolving via one of two biological mechanisms, both of which imply that coevolution is facilitated by inter-residue contact. Thus, high-quality multiple sequence alignments and robust coevolution-inferring statistics can produce structural information from sequence alone. This work characterizes the relationship between coevolution statistics and sequence alignments and highlights the implicit assumptions and caveats associated with coevolutionary inference. An investigation of sequence alignment quality and coevolutionary-inference methods revealed that such methods are very sensitive to the systematic misalignments discovered in public databases. However, repairing the misalignments in such alignments restores the predictive power of coevolution statistics. To overcome the sensitivity to misalignments, two novel coevolution-inferring statistics were developed that show increased contact prediction accuracy, especially in alignments that contain misalignments. These new statistics were developed into a suite of coevolution tools, the MIpToolset. Because systematic misalignments produce a distinctive pattern when analyzed by coevolution-inferring statistics, a new method for detecting systematic misalignments was created to exploit this phenomenon.
This new method, called "local covariation", was used to analyze publicly available multiple sequence alignment databases. Local covariation detected putative misalignments in a database designed to benchmark sequence alignment software accuracy. Local covariation was incorporated into a new software tool, LoCo, which displays regions of potential misalignment during alignment editing and assists in their correction. This work represents advances in multiple sequence alignment creation and coevolutionary inference.

Scholarship@Western

COMPASS: A Tool for Comparison of Multiple Protein Alignments with Assessment of Statistical Significance

Author: Altschul
Altschul
Altschul
Bandyopadhyay
Bateman
Berg
Dayhoff
Dembo
Dietmann
Dodd
Doolittle
Doolittle
Durbin
Eddy
Eskin
Gnedenko
Gotoh
Gounari
Gribskov
Grishin
Gronostajski
Gumbel
Heger
Heldin
Henikoff
Henikoff
Henikoff
Holm
Holm
Holm
Karlin
Karlin
Karplus
Kim
Kraulis
Krogh
Kunin
Lang
Lawrence
Letunic
Luthy
Massague
Massague
McCullagh
Mermod
Moustakas
Murzin
Pei
Pietrokovski
Rychlewski
Sauder
Schaffer
Schaffer
Schneider
Shi
Sjolander
Smith
Staden
Stormo
Sunyaev
Tatusov
Vogt
Yona
Publication venue: Elsevier BV
Publication date
Field of study

Crossref

Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization

Author: Bauer Markus
Klau Gunnar W
Reinert Knut
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Background: The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. Results: We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. Conclusions: The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of inpu

CiteSeerX

Institutional Repository of the Freie Universität Berlin

Springer - Publisher Connector

Directory of Open Access Journals

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

ShapeSorter: a fully probabilistic method for detecting conserved RNA structure features supported by SHAPE evidence

Author: Meyer Irmtraud M.
Tsybulskyi Volodymyr
Publication venue
Publication date: 01/01/2022
Field of study

There is an increased interest in the determination of RNA structures in vivo as it is now possible to probe them in a high-throughput manner, e.g. using SHAPE protocols. By now, there exists a range of computational methods that integrate experimental SHAPE-probing evidence into computational RNA secondary structure prediction. The state of the art in this field is currently provided by computational methods that employ the minimum-free energy strategy for predicting RNA secondary structures with SHAPE-probing evidence. These methods, however, rely on the assumption that transcripts in vivo fold into the thermodynamically most stable configuration and ignore evolutionary evidence for conserved RNA structure features. We here present a new computational method, ShapeSorter, that predicts RNA structure features without employing the thermodynamic strategy. Instead, ShapeSorter employs a fully probabilistic framework to identify RNA structure features that are supported by evolutionary and SHAPE-probing evidence. Our method can capture RNA structure heterogeneity, pseudo-knotted RNA structures as well as transient and mutually exclusive RNA structure features. Moreover, it estimates P-values for the predicted RNA structure features, which allows for easy filtering and ranking. We investigate the merits of our method in a comprehensive performance benchmarking and conclude that ShapeSorter has a significantly superior performance for predicting base-pairs than the existing state-of-the-art methods.

Institutional Repository of the Freie Universität Berlin

PubMed Central

MDC Repository

Dinucleotide controlled null models for comparative RNA gene prediction

Author: A Coventry
A Rambaut
A Siepel
AM Pedersen
AV Uzilov
C del Val
C Lanave
C Weile
C Workman
D Karolchik
D Metzler
D Rose
DM Robinson
DR Forsdyke
E Rivas
E Torarinsson
G Lunter
I Miklós
IL Hofacker
J Felsenstein
J Jensen
J Thorne
J Thorne
K Missal
K Missal
L Duret
M Blanchette
M Hasegawa
M Schöniger
M Schöniger
O Gascuel
OF Christensen
P Clote
PF Arndt
R Backofen
R Fleißner
S Griffiths-Jones
S Griffiths-Jones
S Guindon
S Tavaré
S Washietl
S Washietl
S Washietl
S Washietl
S Washietl
SF Altschul
Stefan Washietl
T Babak
T Gesell
T Mourier
T Sandmann
Tanja Gesell
YVan de Peer
Z Yao
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Background: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. Results: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by the dinucleotide content. Conclusion: SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as a standalone RNA gene finding program.
Other applications in comparative genomics that require randomization of multiple alignments can be considered. Availability: SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz
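
As a rough illustration of the quantity this null model is meant to preserve, the sketch below computes dinucleotide frequencies for a single ungapped sequence. It is not SISSIz and does not simulate alignments; the function name `dinucleotide_frequencies` is a hypothetical helper introduced only for this example.

```python
from collections import Counter


def dinucleotide_frequencies(seq):
    """Return relative frequencies of overlapping dinucleotides in seq.

    Hypothetical helper for illustration only: assumes a single,
    ungapped nucleotide sequence given as a string.
    """
    seq = seq.upper()
    # Overlapping windows of width 2: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    if total == 0:
        return {}
    counts = Counter(pairs)
    return {pair: count / total for pair, count in counts.items()}


freqs = dinucleotide_frequencies("ACGCG")
```

Comparing such frequencies between an original sequence and a randomized control is one simple way to check whether a shuffling procedure kept dinucleotide content roughly intact.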

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

AB INITIO PROTEIN STRUCTURE PREDICTION ALGORITHMS

Author: Kicinski Maciej
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2011
Field of study

Genes that encode novel proteins are constantly being discovered and added to databases, but the speed with which their structures are being determined is not keeping up with this rate of discovery. Currently, homology and threading methods perform the best for protein structure prediction, but they are not appropriate to use for all proteins. Still, the best way to determine a protein's structure is through biological experimentation. This research looks into possible methods and relations that pertain to ab initio protein structure prediction. The study includes the use of positional and transitional probabilities of amino acids obtained from a non-redundant set of proteins created by Jpred for training computational methods. The methods this study focuses on are Hidden Markov Models and incorporating neighboring amino acids in the primary structure of proteins with the above-mentioned probabilities. The methods are presented to predict the secondary structure of amino acids without relying on the existence of a homolog. The main goal of this research is to be able to obtain information from an amino acid sequence that could be used for all future predictions of protein structures. Further, analysis of the performance of the methods is presented for explanation of how they could be incorporated in current and future work.

SJSU ScholarWorks

Identification of Coevolving Residues and Coevolution Potentials Emphasizing Structure, Bond Formation and Catalytic Coordination in Protein Evolution

Author: AA Fodor
BG Giraud
BT Korber
CH Yeang
CS Miller
CT Porter
D Juan
Daniel Y. Little
EF Pettersen
EN Baker
ER Tillier
F Pazos
F Pazos
G Shackelford
GB Gloor
H Berman
HJ Ahn
HM Berman
I Kass
JL King
KA Buss
KK Kim
KR Wollenberg
KY Yip
L Burger
LC Martin
Lu Chen
M Crisma
M Kimura
NJ Skelton
O Olmea
P Fariselli
R Gouveia-Oliveira
RD Finn
RD Finn
S Miyazawa
SA Travers
SD Dunn
Shin-Han Shiu
U Gobel
WM Fitch
Z Wang
ZO Wang
Publication venue: Public Library of Science
Publication date: 10/03/2009
Field of study

The structure and function of a protein are dependent on coordinated interactions between its residues. The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of interacting sites. Mutual information has previously been applied to multiple sequence alignments as a means of detecting coevolutionary interactions. Here, we introduce a refinement of the mutual information method that: 1) removes a significant, non-coevolutionary bias and 2) accounts for heteroscedasticity. Using a large, non-overlapping database of protein alignments, we demonstrate that predicted coevolving residue-pairs tend to lie in close physical proximity. We introduce coevolution potentials as a novel measure of the propensity for the 20 amino acids to pair amongst predicted coevolutionary interactions. Ionic, hydrogen, and disulfide bond-forming pairs exhibited the highest potentials. Finally, we demonstrate that pairs of catalytic residues have a significantly increased likelihood to be identified as coevolving. These correlations to distinct protein features verify the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions act to shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.
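
For readers unfamiliar with the baseline statistic, the following is a minimal sketch of plain mutual information between two columns of a multiple sequence alignment. It deliberately omits the bias removal and heteroscedasticity correction described in the abstract; the function names and the toy alignments are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Extract column i from an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Plain mutual information (in bits) between two alignment columns.

    Illustrative only: no gap handling, sequence weighting, or bias
    correction is applied.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
        mi += (c / n) * log2(c * n / (count_a[a] * count_b[b]))
    return mi


covarying = ["AC", "AC", "GT", "GT"]    # columns change together
independent = ["AC", "AT", "GC", "GT"]  # columns vary independently

mi_cov = mutual_information(column(covarying, 0), column(covarying, 1))
mi_ind = mutual_information(column(independent, 0), column(independent, 1))
```

In the toy data, the perfectly covarying pair of columns scores higher than the independent pair, which is the signal that coevolution statistics build on.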

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

An analysis of the Sargasso Sea resource and the consequences for database composition

Author: Cozzetto D
Tramontano A
Tress ML
Valencia A
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006
Field of study

Background: The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains very many fragments of whole protein sequences, a side effect of the shotgun sequencing method. These sequences form a significant addendum to the current searchable databases but also present us with some intrinsic difficulties. While it is important to know whether it is possible to assign function to these sequences with the current methods and whether they will increase our capacity to explore sequence space, it is also interesting to know how current bioinformatics techniques will deal with the new sequences in the resource. Results: The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular the high proportion of sequence fragments in the resource seems to result in poor quality multiple alignments. Conclusion: These observations suggest that the new sequences should be used with care, especially if the information is to be used in large scale analyses. On a positive note, the results may just spark improvements in computational and experimental methods to take into account the fragments generated by environmental sequencing techniques.

Springer - Publisher Connector

Directory of Open Access Journals

UCL Discovery

PubMed Central

Digital.CSIC

Discovering Sequence Motifs with Arbitrary Insertions and Deletions

Author: A Bahr
A Bairoch
A Hansson
A Reményi
A Sandelin
AF Neuwald
AF Neuwald
AF Neuwald
B Kobe
Bostjan Kobe
C Grasso
CB Do
CC Yap
CE Lawrence
D Caffrey
E de Castro
F Diella
FP Roth
G Pavesi
Gary Stormo
I Jonassen
IA Wadman
J van Helden
J van Helden
J Zhu
JG Henikoff
JJ Welch
JS Liu
JS Mattick
K Karplus
K Karplus
K Shida
K Sjölander
L Vitelli
M Ashburner
Martin C. Frith
MC Frith
MS Waterman
N Hulo
Neil F. W. Saunders
NK Kim
P Puntervoll
P Vyas
R Amanchy
R Durbin
R Hughey
R Lahlil
RC Edgar
RM Böhmer
S Sinha
SA Johnson
SR Eddy
T Beissbarth
T Lassmann
T Yada
TD Schneider
Timothy L. Bailey
TK Attwood
TL Bailey
TL Bailey
V Deleuze
V Matys
WW Wasserman
X Liu
XS Liu
Y Makita
Publication venue: Public Library of Science
Publication date: 01/01/2008
Field of study

Biology is encoded in molecular sequences: deciphering this encoding remains a grand scientific challenge. Functional regions of DNA, RNA, and protein sequences often exhibit characteristic but subtle motifs; thus, computational discovery of motifs in sequences is a fundamental and much-studied problem. However, most current algorithms do not allow for insertions or deletions (indels) within motifs, and the few that do have other limitations. We present a method, GLAM2 (Gapped Local Alignment of Motifs), for discovering motifs allowing indels in a fully general manner, and a companion method GLAM2SCAN for searching sequence databases using such motifs. GLAM2 is a generalization of the gapless Gibbs sampling algorithm. It re-discovers variable-width protein motifs from the PROSITE database significantly more accurately than the alternative methods PRATT and SAM-T2K. Furthermore, it usefully refines protein motifs from the ELM database: in some cases, the refined motifs make orders of magnitude fewer overpredictions than the original ELM regular expressions. GLAM2 performs respectably on the BAliBASE multiple alignment benchmark, and may be superior to leading multiple alignment methods for “motif-like” alignments with N- and C-terminal extensions. Finally, we demonstrate the use of GLAM2 to discover protein kinase substrate motifs and a gapped DNA motif for the LIM-only transcriptional regulatory complex: using GLAM2SCAN, we identify promising targets for the latter. GLAM2 is especially promising for short protein motifs, and it should improve our ability to identify the protein cleavage sites, interaction sites, post-translational modification attachment sites, etc., that underlie much of biology. It may be equally useful for arbitrarily gapped motifs in DNA and RNA, although fewer examples of such motifs are known at present. GLAM2 is public domain software, available for download at http://bioinformatics.org.au/glam2

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

Role of 3′UTRs in the Translation of mRNAs Regulated by Oncogenic eIF4E—A Computational Inference

Author: A Eulalio
A Stark
AC Gingras
AC Gingras
AE Koromilas
Arti N. Santhanam
Bruce A. Shapiro
CC Chang
DH Mathews
DP Bartel
E Bindewald
Eckart Bindewald
G Mathonnet
H Hirling
IL Hofacker
Jason E. Stajich
JB Cowland
JR Graff
JW Hershey
L He
LM Shantz
LS Hon
M Harvey
M Kertesz
M Metzler
MS Kumar
Nahum Sonenberg
Nancy H. Colburn
NK Gray
O Larsson
O Larsson
O Meyuhas
Ola Larsson
PM Voorhoeve
R Sandberg
RC Gentleman
S Griffiths-Jones
S Griffiths-Jones
S Nottrott
SA Shabalina
SF Altschul
SG Zimmer
SH Bernhart
TG Lawson
TJ Macke
VA Polunovsky
Vinagolu K. Rajasekhar
VK Rajasekhar
W Filipowicz
WHWWA Kruskall
Y Akao
Y Mamane
Y Tsuchiya
Z Wu
Z Wu
Publication venue: Public Library of Science
Publication date: 01/01/2009
Field of study

Eukaryotic cap-dependent mRNA translation is mediated by the initiation factor eIF4E, which binds mRNAs and stimulates efficient translation initiation. eIF4E is often overexpressed in human cancers. To elucidate the molecular signature of eIF4E target mRNAs, we analyzed sequence and structural properties of two independently derived polyribosome recruited mRNA datasets. These datasets originate from studies of mRNAs that are actively being translated in response to cells over-expressing eIF4E or cells with an activated oncogenic AKT: eIF4E signaling pathway, respectively. Comparison of eIF4E target mRNAs to mRNAs insensitive to eIF4E-regulation has revealed surprising features in mRNA secondary structure, length and microRNA-binding properties. Fold-changes (the relative change in recruitment of an mRNA to actively translating polyribosomal complexes in response to eIF4E overexpression or AKT upregulation) are positively correlated with mRNA G+C content and negatively correlated with total and 3′UTR length of the mRNAs. A machine learning approach for predicting the fold change was created. Interesting tendencies of secondary structure stability are found near the start codon and at the beginning of the 3′UTR region. Highly upregulated mRNAs show negative selection (site avoidance) for binding sites of several microRNAs. These results are consistent with the emerging model of regulation of mRNA translation through a dynamic balance between translation initiation at the 5′UTR and microRNA binding at the 3′UTR

Crossref

Directory of Open Access Journals

PubMed Central