Search CORE

The use of information theory in evolutionary biology

Author: Adami
Ash
Atchley
Ay
Balduzzi
Barrick
Basharin
Benner
Bialek
Billeter
Blount
Callahan
Carothers
Clarke
Cooper
Cover
da Silva
Darwin
Eddy
Edlund
Ewens
Federhen
Finn
Fletcher
Futuyma
Garcia-Horsman
Hartl
Iliopoulos
Jühling
Klyubin
Korber
Kryazhimskiy
Landauer
Lenski
Levy
Li
Linsker
Lungarella
Maynard Smith
McGill
Pauling
Polani
Queller
Rivoire
Robinson
Schneidman
Scott
Shannon
Sporns
Taanman
Thornton
Tononi
van der Graaff
Waddington
Wahl
Wang
Wiener
Woods
Zahedi
Publication venue: Wiley
Publication date: 16/12/2011

Information is a key concept in evolutionary biology. Information is stored in biological organisms' genomes and used to generate the organism as well as to maintain and control it. Information is also "that which evolves". When a population adapts to a local environment, information about this environment is fixed in a representative genome. However, when an environment changes, information can be lost. At the same time, animal brains process information in order to survive in complex environments, and the capacity for information processing also evolves. Here I review applications of information theory to the evolution of proteins as well as to the evolution of information processing in simulated agents that adapt to perform a complex task.
Comment: 25 pages, 7 figures. To appear in "The Year in Evolutionary Biology", Annals of the New York Academy of Sciences.

arXiv.org e-Print Archive

Crossref
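
For the Adami review above, a minimal sketch of one standard way to estimate the information stored in a family of sequences, assuming sites are independent and ignoring finite-sample bias: the information is the maximal entropy L·log2(D) for sequences of length L over an alphabet of D symbols, minus the sum of the observed per-site Shannon entropies H_i across an alignment, i.e. I = L·log2(D) − Σ H_i. The function names are hypothetical and are not taken from the review.

```python
import math
from collections import Counter

# Hypothetical helpers: information content as maximal minus observed entropy,
# assuming independent sites and no small-sample correction.

def site_entropy_bits(column):
    """Shannon entropy H_i (bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_content_bits(alignment, alphabet_size=20):
    """Estimate I = L*log2(D) - sum_i H_i for equal-length aligned sequences.

    alphabet_size: D, e.g. 20 for amino acids or 4 for nucleotides.
    """
    length = len(alignment[0])
    max_entropy = length * math.log2(alphabet_size)
    observed = sum(site_entropy_bits([seq[i] for seq in alignment])
                   for i in range(length))
    return max_entropy - observed
```

Fully conserved columns contribute their full log2(D) bits, while a column split evenly between two residues contributes one bit less.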

Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints

Author: Greener Joe G
Jones David T
Kandathil Shaun M
Publication venue: Springer Science and Business Media LLC
Publication date: 04/09/2019

The inapplicability of amino acid covariation methods to small protein families has limited their use for structural annotation of whole genomes. Recently, deep learning has shown promise in allowing accurate residue-residue contact prediction even for shallow sequence alignments. Here we introduce DMPfold, which uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles, which it uses to build models in an iterative fashion. DMPfold produces more accurate models than two popular methods for a test set of CASP12 domains, and works just as well for transmembrane proteins. Applied to all Pfam domains without known structures, DMPfold produced confident models for 25% of these so-called dark families in under a week on a small 200-core cluster. DMPfold provides models for 16% of the human proteome UniProt entries without structures, generates accurate models with fewer than 100 sequences in some cases, and is freely available.
Comment: JGG and SMK contributed equally to the work

arXiv.org e-Print Archive

UCL Discovery

Recent Developments in Deep Learning Applied to Protein Structure Prediction

Author: Greener JG
Jones DT
Kandathil SM
Publication date: 01/01/2019

Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end with some thoughts about how and why these types of models can be so effective, as well as a discussion of potential pitfalls.

UCL Discovery
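
The contact-prediction abstracts above do not spell out how contact accuracy is scored. Under a convention common in CASP-style assessments, assumed here rather than taken from either article, residues i and j are in contact when their Cβ atoms lie within 8 Å, and a predictor is judged by the precision of its top L/5 long-range pairs (sequence separation of at least 24) for a chain of length L. The helper names below are hypothetical.

```python
import math

# Hypothetical helpers illustrating a common CASP-style contact evaluation.

def contact_map(coords, threshold=8.0):
    """Boolean L x L matrix: True where two residues' (C-beta) coordinates
    are closer than `threshold` angstroms."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) < threshold for j in range(n)]
            for i in range(n)]

def top_precision(scores, true_contacts, min_separation=24, fraction=5):
    """Precision of the top L // fraction predicted pairs with j - i >= min_separation.

    scores: L x L matrix of predicted contact probabilities.
    true_contacts: L x L boolean matrix of observed contacts.
    """
    n = len(scores)
    # Collect each long-range pair once (upper triangle only).
    pairs = [(scores[i][j], i, j)
             for i in range(n) for j in range(i + min_separation, n)]
    pairs.sort(reverse=True)  # highest-scoring predictions first
    top = pairs[:max(1, n // fraction)]
    if not top:
        return 0.0  # chain too short to have any long-range pairs
    return sum(true_contacts[i][j] for _, i, j in top) / len(top)
```

Setting fraction=1 gives the stricter top-L precision; a distance-predicting method such as DMPfold could be scored the same way after thresholding its predicted distances.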

Conserved Geometrical Base-Pairing Patterns in RNA

Author: Leontis Neocles B.
Westhof Eric
Publication venue: ScholarWorks@BGSU
Publication date: 01/11/1998

RNA molecules fold into a bewildering variety of complex 3D structures. Almost every new RNA structure obtained at high resolution reveals new, unanticipated structural motifs, which we are rarely able to predict at the current stage of our theoretical understanding. Even at the most basic level of specific RNA interactions – base-to-base pairing – new interactions continue to be uncovered as new structures appear. Compilations of possible non-canonical base-pairing geometries have been presented in previous reviews and monographs (Saenger, 1984; Tinoco, 1993). In these compilations, the guiding principle applied was the optimization of hydrogen-bonding. All possible pairs with two standard H-bonds were presented and these were organized according to symmetry or base type. However, many of the features of RNA base-pairing interactions that have been revealed by high-resolution crystallographic analysis could not have been anticipated and therefore were not incorporated into these compilations. These will be described and classified in the present review. A recently presented approach for inferring basepair geometry from patterns of sequence variation (Gautheret & Gutell, 1997) relied on the 1984 compilation of basepairs (Saenger, 1984), and was extended to include all possible single H-bond combinations not subject to steric clashes. Another recent review may be consulted for a discussion of the NMR spectroscopy and thermodynamic effects of non-canonical (‘mismatched’) RNA basepairs on duplex stability (Limmer, 1997).

Crossref

Bowling Green State University: ScholarWorks@BGSU

Quantum Chemical Studies of Nucleic Acids: Can We Construct a Bridge to the RNA Structural Biology and Bioinformatics Communities?

Author: Leontis Neocles B.
Petrov Anton I.
Sponer Judit
Sponer Jiri
Publication venue: ScholarWorks@BGSU
Publication date: 01/12/2010

In this feature article we provide a side-by-side introduction to two research fields: quantum chemical calculations of molecular interactions in nucleic acids, and RNA structural bioinformatics. Our main aim is to demonstrate that these research areas, while largely separated in the contemporary literature, have substantial potential to complement each other, which could significantly contribute to our understanding of the exciting world of nucleic acids. We identify research questions amenable to the combined application of modern ab initio methods and bioinformatics analysis of experimental structures, while also assessing the limitations of these approaches. The ultimate aim is to attain valuable physicochemical insights into the nature of the fundamental molecular interactions and how they shape RNA structures, dynamics, function, and evolution.

PubMed Central

Bowling Green State University: ScholarWorks@BGSU

Application of coevolution-based methods and deep learning for structure prediction of protein complexes

Author: Desai Nikita
Publication venue: UCL (University College London)
Publication date: 28/07/2023

The three-dimensional structures of proteins play a critical role in determining their biological functions and interactions. Experimental determination of protein and protein complex structures can be expensive and difficult. Computational prediction of protein and protein complex structures has therefore been an open challenge for decades. Recent advances in computational structure prediction techniques have resulted in increasingly accurate protein structure predictions. These techniques include methods that leverage information about coevolving residues to predict residue interactions and that apply deep learning techniques to enable better prediction of residue contacts and protein structures. Prior to the work outlined in this thesis, coevolution-based methods and deep learning had been shown to improve the prediction of single protein domains or single protein chains. Most proteins in living organisms do not function on their own but interact with other proteins either through transient interactions or by forming stable protein complexes. Knowledge of protein complex structures can be useful for biological and disease research, drug discovery and protein engineering. Unfortunately, a large number of protein complexes do not have experimental structures or close homolog structures that can be used as templates. In this thesis, methods previously developed and applied to the de novo prediction of single protein domains or protein monomer chains were modified and leveraged for the prediction of protein heterodimer and homodimer complexes. A number of coevolution-based tools and deep learning methods are explored for the purpose of predicting inter-chain and intra-chain residue contacts in protein dimers. These contacts are combined with existing protein docking methods to explore the prediction of homodimers and heterodimers. 
Overall, the work in this thesis demonstrates the promise of leveraging coevolution and deep learning for the prediction of protein complexes, shows improvements in protein complex prediction tasks achieved using coevolution-based and deep learning methods, and highlights remaining challenges in protein complex prediction.

UCL Discovery
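
The thesis abstract above does not name the specific coevolution tools it uses. As a hedged illustration of the underlying idea only, the sketch below scores each inter-chain residue pair by the mutual information between the corresponding columns of a paired alignment, assuming row k of each chain's alignment comes from the same organism. Raw mutual information is the simplest coevolution signal; dedicated methods typically correct for phylogenetic bias and indirect couplings (for example, direct coupling analysis), which is not reproduced here. Function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers: raw column mutual information as a coevolution score.

def column_mutual_information(col_a, col_b):
    """Mutual information (bits) between two equal-length alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((pa[a] / n) * (pb[b] / n)))
    return mi

def interchain_mi(msa_a, msa_b):
    """Matrix of MI scores for every pair (i in chain A, j in chain B).

    msa_a, msa_b: paired alignments; row k of each comes from the same organism.
    """
    cols_a = [[seq[i] for seq in msa_a] for i in range(len(msa_a[0]))]
    cols_b = [[seq[j] for seq in msa_b] for j in range(len(msa_b[0]))]
    return [[column_mutual_information(ca, cb) for cb in cols_b] for ca in cols_a]
```

The highest-scoring inter-chain pairs would be candidate interface contacts, which could then be passed to a docking step as restraints.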

In silico identification of functional divergence between the multiple groEL gene paralogs in Chlamydiae

Author: Fares Mario A
McNally David
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Heat-shock proteins are specialized molecules that perform different and essential roles in the cell, including protein degradation, folding and trafficking. GroEL is a 60 kDa heat-shock protein that is ubiquitous in bacteria and has been regarded as an important molecule implicated in chronic inflammatory processes caused by Chlamydiae infections. GroEL was duplicated at the origin of the Chlamydiae lineage, giving rise to three distinct molecular chaperones: the original protein GroEL1 (Ct110) and its paralogs GroEL2 (Ct604) and GroEL3 (Ct755). These chaperones show differential and independent expression during the different stages of Chlamydiae infection and have been suggested to play different physiological and regulatory roles. Results: In this comprehensive in silico study we show that the GroEL paralogs have diverged functionally after the gene duplication events and that this divergence has occurred mainly between GroEL3 and GroEL1. GroEL2 shows an intermediate degree of functional divergence from GroEL1. Our results point to different protein-protein interaction patterns between the GroEL paralogs and known GroEL client proteins, supporting their functional divergence after groEL gene duplication. Analysis of selective constraints identifies periods of adaptive evolution after gene duplication that led to the fixation of amino acid replacements in GroEL protein domains involved in the interaction with client proteins. Conclusion: We demonstrate that the GroEL protein copies in Chlamydiae species have diverged functionally after the gene duplication events. We also show that functional divergence has occurred in important functional regions of these GroEL proteins and has very probably affected the ancestral GroEL regulatory role and the protein-protein interaction patterns with GroEL client proteins.
Most of the amino acid replacements that affected interaction with client proteins and that were responsible for the functional divergence between GroEL paralogs were fixed by adaptive evolution after the groEL gene duplication events.

Crossref

Directory of Open Access Journals

Irish Universities

PubMed Central

Holding it together: rapid evolution and positive selection in the synaptonemal complex of Drosophila

Author: A Civetta
A Erber
A Kouznetsova
A Lorenz
A Loytynoja
A Peter
AG Clark
AMA Aguinaldo
AT Carpenter
B Charlesworth
CM Lake
D Nurminsky
D Obeso
D von Wettstein
D Zickler
DG Torgerson
DJ Begun
DJ Wilson
DW Fawcett
E Baudry
EA Manheim
EF Joyce
EL Kurdzo
EM Gertz
F Tajima
HA Webber
HS Malik
J Fraune
J Huerta-Cepas
J Loidl
J Rozas
JA Anderson
JC Hall
JE Pool
JH McDonald
JH Thomas
JM Mason
JP Demuth
JR True
Justin P. Blumenstiel
K Katoh
KA Collins
L Chmatal
L Fishman
LK Anderson
Lucas W. Hemmer
LW Olson
M Egelmitani
M Kearse
M Nozawa
MA Miller
ME Zwick
MJ Moses
MS Barker
N Christophorou
NG Smith
NL Clark
NS Tanneti
NW Wolfe
P Andolfatto
P Duchen
PS Carlson
R Egea
R Egel
R Nielsen
R Yan
RC Edgar
RD Finn
RL Rogers
RM Waterhouse
RS Khetani
RV Samonte
S Beisswanger
S Henikoff
S Jagadeeshan
S Takeo
SE Bickel
SJ Marygold
SL Page
SL Pond
SW Rasmusse
T Cruickshank
T Haaf
T Tsubouchi
TF Mackay
TM Grishaeva
TT Hu
V Benassi
V Nolte
V Solovyev
W Haerty
WD Hamilton
WHE Day
WJ Swanson
WY Miyazaki
Y Brandvain
Y Costa
Z Yang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Background: The synaptonemal complex (SC) is a highly conserved meiotic structure that functions to pair homologs and facilitate meiotic recombination in most eukaryotes. Five Drosophila SC proteins have been identified and localized within the complex: C(3)G, C(2)M, CONA, ORD, and the newly identified Corolla. The SC is required for meiotic recombination in Drosophila and absence of these proteins leads to reduced crossing over and chromosomal nondisjunction. Despite the conserved nature of the SC and the key role that these five proteins have in meiosis in D. melanogaster, they display little apparent sequence conservation outside the genus. To identify factors that explain this lack of apparent conservation, we performed a molecular evolutionary analysis of these genes across the Drosophila genus. Results: For the five SC components, gene sequence similarity declines rapidly with increasing phylogenetic distance and only ORD and C(2)M are identifiable outside of the Drosophila genus. SC gene sequences have a higher dN/dS (ω) rate ratio than the genome-wide average and this can in part be explained by the action of positive selection in almost every SC component. Across the genus, there is significant variation in ω for each protein. It further appears that ω estimates for the five SC components are in accordance with their physical position within the SC. Components interacting with chromatin evolve slowest and components comprising the central elements evolve the most rapidly. Finally, using population genetic approaches, we demonstrate that positive selection on SC components is ongoing. Conclusions: SC components within Drosophila show little apparent sequence homology to those identified in other model organisms due to their rapid evolution. We propose that the Drosophila SC is evolving rapidly due to two combined effects. 
First, we propose that a high rate of evolution can be partly explained by weak purifying selection on protein components whose function is simply to hold chromosomes together. We also propose that positive selection in the SC is driven by its sex-specificity combined with its role in facilitating both recombination and centromere clustering in the face of recurrent bouts of drive in female meiosis.

Crossref

Springer - Publisher Connector

KU ScholarWorks

PubMed Central
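
The Drosophila SC abstract above reports ω = dN/dS without giving the estimation procedure. For illustration only, the sketch below uses a simple counting approach in the style of Nei and Gojobori: the proportions of synonymous and nonsynonymous differences per site are Jukes-Cantor corrected and their ratio is taken. The site and difference counts are assumed to be computed already, the function names are hypothetical, and the likelihood-based codon models widely used for such analyses are not shown. Values of ω below 1 indicate purifying selection, values near 1 neutral evolution, and values above 1 suggest positive selection.

```python
import math

# Hypothetical helpers: counting-based estimate of omega = dN / dS.

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor correction is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def omega(syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs):
    """omega = dN / dS from precomputed counts.

    syn_sites, nonsyn_sites: numbers of synonymous (S) and nonsynonymous (N) sites.
    syn_diffs, nonsyn_diffs: observed synonymous (Sd) and nonsynonymous (Nd) differences.
    """
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if d_s == 0.0:
        raise ValueError("dS is zero, so omega is undefined")
    return d_n / d_s
```

For example, equal per-site difference proportions on both site classes give ω = 1, while doubling only the nonsynonymous differences pushes ω above 1.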

Tertiary Alphabet for the Observable Protein Structural Universe

Author: Grigoryan Gevorg
Mackenzie Craig O
Zhou Jianfu
Publication venue: Dartmouth Digital Commons
Publication date: 03/11/2016

Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, rarer geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence—a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.

PubMed Central

Dartmouth Digital Commons (Dartmouth College)