3,270 research outputs found
The context-dependence of mutations: a linkage of formalisms
Defining the extent of epistasis - the non-independence of the effects of
mutations - is essential for understanding the relationship of genotype,
phenotype, and fitness in biological systems. The applications cover many areas
of biological research, including biochemistry, genomics, protein and systems
engineering, medicine, and evolutionary biology. However, the quantitative
definitions of epistasis vary among fields, and its analysis beyond just
pairwise effects remains obscure in general. Here, we show that different
definitions of epistasis are versions of a single mathematical formalism - the
weighted Walsh-Hadamard transform. We discuss that one of the definitions, the
background-averaged epistasis, is the most informative when the goal is to
uncover the general epistatic structure of a biological system, a description
that can be rather different from the local epistatic structure of specific
model systems. Key issues are the choice of effective ensembles for averaging
and the practical handling of the vast combinatorial complexity of mutations.
In this regard, we discuss possible approaches for optimally learning the
epistatic structure of biological systems.
Comment: 6 pages, 3 figures, supplementary informatio
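To make the Walsh-Hadamard view of epistasis concrete, here is a minimal sketch for binary genotypes. It is an illustration under stated assumptions, not the authors' implementation: the function name `background_averaged_epistasis`, the bit-mask encoding of genotypes, and the particular sign and normalisation convention are all choices made here so that each term equals the mutational effect (or interaction) averaged over every background at the remaining sites.

```python
def background_averaged_epistasis(y):
    """Background-averaged epistatic terms for n binary sites.

    Assumption (hypothetical encoding): y[g] is the phenotype of genotype g,
    where bit i of the integer g is 1 if site i is mutated.
    Returns a dict mapping each site subset k (as a bit mask) to its term.
    """
    if not y:
        raise ValueError("phenotype list must not be empty")
    n = len(y).bit_length() - 1
    if len(y) != 1 << n:
        raise ValueError("need exactly 2**n phenotypes")

    terms = {}
    for k in range(1 << n):
        order = bin(k).count("1")  # number of sites in the subset k
        # One row of the Walsh-Hadamard transform: sign is (-1)^popcount(g & k).
        total = sum((-1) ** bin(g & k).count("1") * y[g] for g in range(1 << n))
        # Weighting (assumed convention): fix the sign so that "mutant minus
        # wild type" is positive, and average over the 2**(n - order) backgrounds.
        terms[k] = (-1) ** order * total / 2 ** (n - order)
    return terms


# Two sites: y00 = 0, site 0 alone = 1, site 1 alone = 2, double mutant = 5.
terms = background_averaged_epistasis([0.0, 1.0, 2.0, 5.0])
print(terms)  # {0: 2.0, 1: 2.0, 2: 3.0, 3: 2.0}
```

In this example the term for mask 3 is the pairwise epistasis 5 - 2 - 1 + 0 = 2, and the terms for masks 1 and 2 are the single-site effects averaged over both states of the other site. The loop costs O(4^n) operations, so it is only practical for a handful of sites.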
Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily.
Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold
Variation of the adaptive substitution rate between species and within genomes
The importance of adaptive mutations in molecular evolution is extensively debated. Recent developments in population genomics allow inferring rates of adaptive mutations by fitting a distribution of fitness effects to the observed patterns of polymorphism and divergence at sites under selection and sites assumed to evolve neutrally. Here, we summarize the current state-of-the-art of these methods and review the factors that affect the molecular rate of adaptation. Several studies have reported extensive cross-species variation in the proportion of adaptive amino-acid substitutions (α) and predicted that species with larger effective population sizes undergo less genetic drift and higher rates of adaptation. Disentangling the rates of positive and negative selection, however, revealed that mutations with deleterious effects are the main driver of this population size effect and that adaptive substitution rates vary comparatively little across species. Conversely, rates of adaptive substitution have been documented to vary substantially within genomes. On a genome-wide scale, gene density, recombination and mutation rate were observed to play a role in shaping molecular rates of adaptation, as predicted under models of linked selection. At the gene level, it has been reported that the gene functional category and the macromolecular structure substantially impact the rate of adaptive mutations. Here, we deliver a comprehensive review of methods used to infer the molecular adaptive rate, the potential drivers of adaptive evolution and how positive selection shapes molecular evolution within genes, across genes within species and between species
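As a point of reference for the proportion of adaptive substitutions mentioned above, the sketch below computes the classical McDonald-Kreitman-style point estimate, 1 - (Ds × Pn) / (Dn × Ps), from counts of divergence and polymorphism. This is a deliberately simpler estimator than the distribution-of-fitness-effects methods the abstract reviews: it ignores slightly deleterious polymorphisms, which tend to bias the estimate downward. The function name and the example counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classical McDonald-Kreitman-style estimate of the proportion of
    adaptive non-synonymous substitutions.

    dn, ds: non-synonymous and synonymous divergence counts
    pn, ps: non-synonymous and synonymous polymorphism counts
    Note: this is NOT a DFE-based estimator; it assumes all segregating
    non-synonymous variants are neutral.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Made-up counts for illustration only.
print(mk_alpha(dn=50, ds=100, pn=20, ps=80))  # 0.5
```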
Protein function annotation using protein domain family resources
As a result of the genome sequencing and structural genomics initiatives, we have a wealth of protein sequence and structural data. However, only about 1% of these proteins have experimental functional annotations. As a result, computational approaches that can predict protein functions are essential in bridging this widening annotation gap. This article reviews the current approaches of protein function prediction using structure and sequence based classification of protein domain family resources with a special focus on functional families in the CATH-Gene3D resource
Unravelling the determinants of the rate of adaptive evolution at the molecular level
Ever since Darwin presented natural selection as a driver of evolution, evolutionary biologists have strived to understand how beneficial mutations shape species adaptation to their environment. Studying adaptation, however, requires an understanding of the complex dynamics between nucleotides, sequences, proteins, organisms, populations, and species. In other words, it requires assessing the interplay of evolutionary processes across systems. Here, I studied adaptation in such a way by exploring the frequency and nature of adaptive mutations within genes, within genomes, and between species.
At the intramolecular level, this project revealed that the residue's solvent accessibility acts as the primary determinant of rates of adaptive substitutions both in animals and in plants, where adaptive mutations are more frequent at the protein surface. These analyses further showed higher rates of adaptation for genes encoding proteins with central cellular functions, which are the ones usually targeted by pathogens during host infection. These findings, therefore, suggested that protein adaptive evolution proceeds through interactions between molecules, particularly at the interspecific level, where host-pathogen coevolution likely plays a central role.
By taking a step back and looking at adaptation at different time-scales within the genome, this thesis revealed the role of young genes in adaptive evolution. As these genes are further away from their fitness optimum, these findings suggested that proteins adapt in an "adaptive walk" manner. This project further highlighted that the distribution of adaptive mutations across time follows a pattern of diminishing returns.
Looking at an even broader scale by studying adaptation at the species level and considering the effect of intramolecular variation across several animal species, this thesis demonstrated a negative correlation between rates of adaptive substitutions and the effective population size (N_e). Despite the relatively weak signal, these findings contradict initial population genetics theory. Instead, they seem to agree with theoretical expectations at the phenotypic space. In turn, the results regarding negative selection confirm the N_e hypothesis, where the efficiency of selection is stronger in large-N_e species. This effect was well depicted in the differences of the distribution of fitness effects between buried and exposed residues, where the former accumulates comparatively more mild effect mutations in low-N_e species. This project further expanded our findings at the intramolecular level, by revealing the strong influence of the protein's macromolecular structure on rates of molecular adaptation across several taxa.
By assessing the interplay of adaptive mutations across distinct organizational levels, this thesis provided a more profound understanding of rates of adaptive evolution at the molecular level, thus delivering a comprehensive view of the molecular basis of adaptation
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.
Comment: Published in at http://dx.doi.org/10.1214/09-STS312 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Revealing evolutionary constraints on proteins through sequence analysis
Statistical analysis of alignments of large numbers of protein sequences has
revealed "sectors" of collectively coevolving amino acids in several protein
families. Here, we show that selection acting on any functional property of a
protein, represented by an additive trait, can give rise to such a sector. As
an illustration of a selected trait, we consider the elastic energy of an
important conformational change within an elastic network model, and we show
that selection acting on this energy leads to correlations among residues. For
this concrete example and more generally, we demonstrate that the main
signature of functional sectors lies in the small-eigenvalue modes of the
covariance matrix of the selected sequences. However, secondary signatures of
these functional sectors also exist in the extensively-studied large-eigenvalue
modes. Our simple, general model leads us to propose a principled method to
identify functional sectors, along with the magnitudes of mutational effects,
from sequence data. We further demonstrate the robustness of these functional
sectors to various forms of selection, and the robustness of our approach to
the identification of multiple selected traits.
Comment: 37 pages, 28 figure
Protein co-evolution, co-adaptation and interactions
Co-evolution has an important function in the evolution of species and it is clearly manifested in certain scenarios such as host–parasite and predator–prey interactions, symbiosis and mutualism. The extrapolation of the concepts and methodologies developed for the study of species co-evolution at the molecular level has prompted the development of a variety of computational methods able to predict protein interactions through the characteristics of co-evolution. Particularly successful have been those methods that predict interactions at the genomic level based on the detection of pairs of protein families with similar evolutionary histories (similarity of phylogenetic trees: mirrortree). Future advances in this field will require a better understanding of the molecular basis of the co-evolution of protein families. Thus, it will be important to decipher the molecular mechanisms underlying the similarity observed in phylogenetic trees of interacting proteins, distinguishing direct specific molecular interactions from other general functional constraints. In particular, it will be important to separate the effects of physical interactions within protein complexes ('co-adaptation') from other forces that, in a less specific way, can also create general patterns of co-evolution
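The tree-similarity idea behind mirrortree-style methods can be sketched as a correlation between two inter-species distance matrices, one per protein family. The code below is a simplified illustration, not any published tool's implementation: it assumes both matrices are symmetric, list the same species in the same order, and contain precomputed evolutionary distances. The function names are hypothetical.

```python
from math import sqrt


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def tree_similarity(dist_a, dist_b):
    """Pearson correlation between the off-diagonal distances of two
    families' distance matrices (same species, same order assumed)."""
    xs, ys = upper_triangle(dist_a), upper_triangle(dist_b)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("matrices must have equal size and at least 3 species")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("distances must not all be identical")
    return cov / sqrt(var_x * var_y)


# Toy distances for three species; family B evolves twice as fast as family A.
family_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
family_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(tree_similarity(family_a, family_b))  # 1.0
```

A high correlation alone does not separate direct physical interaction from shared background signal such as the underlying species phylogeny, which is the distinction the abstract says future methods need to address.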