11 research outputs found
Context dependent selection in molecular evolution
Epistasis, or genetic interactions between different mutations, is theoretically predicted to play a substantial role in evolutionary processes such as the emergence of sexual reproduction and recombination, speciation, and adaptive evolution. However, there is little experimental or statistical evidence of the ubiquity of epistatic interactions in nature. Here, we study long-term protein evolution and show that the constant independent selection model cannot describe the rates and patterns of protein divergence: protein sequences diverge beyond theoretical limits, and the rate of divergence is much slower than predicted. We show that protein evolution is best explained under the assumption of rapid turnover of the fitness values associated with individual amino acids. We further extend this computational study and build a theoretical model to capture the effect of non-constant selection on molecular evolution.
Stop codons in bacteria are not selectively equivalent
Background: The evolution and genomic frequencies of stop codons have not been rigorously studied, with the exception of the coding of non-canonical amino acids. Here we study the rate of evolution and the frequency distribution of stop codons in bacterial genomes. Results: We show that in bacteria stop codons evolve more slowly than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codons relative to genomic nucleotide content indicates that this selection regime is not straightforward. The frequencies of the TAA and TGA stop codons are GC-content dependent, with TAA decreasing and TGA increasing with GC content, while TAG frequency is independent of GC content. Applying a formal, analytical model to these data, we found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 16% TGA has a higher fitness than TAG. Conclusions: Our data indicate that the TAG codon is universally suboptimal in the bacterial lineage, such that TAA is likely to be the preferred stop codon at low GC content while TGA is the preferred stop codon at high GC content. The optimization of stop codon usage may therefore be useful in genome engineering or gene expression optimization applications. The work has been supported by a Plan Nacional grant from the Spanish Ministry of Science and Innovation, and by EMBO Young Investigator and Howard Hughes Medical Institute International Early Career Scientist awards.
Rate of sequence divergence under constant selection
BACKGROUND: Divergence of two independently evolving sequences that originated from a common ancestor can be described by two parameters: the asymptotic level of divergence E and the rate r at which this level of divergence is approached. Constant negative selection impedes allele replacements and is therefore routinely assumed to decelerate sequence divergence. However, its impact on E and on r has not been formally investigated. RESULTS: Strong selection that favors only one allele can make E arbitrarily small and r arbitrarily large. In contrast, in the case of 4 possible alleles and equal mutation rates, the lowest value of r, attained when two alleles confer equal fitnesses and the other two are strongly deleterious, is only two times lower than its value under selective neutrality. CONCLUSIONS: Constant selection can strongly constrain the level of sequence divergence, but cannot substantially reduce the rate at which this level is approached. In particular, under any constant selection, the divergence of sequences that have accumulated one substitution per neutral site since their origin from the common ancestor must already constitute at least one half of the asymptotic divergence at sites under such selection.
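The two-fold bound stated in this abstract can be checked numerically with a minimal sketch. It rests on assumptions that are not taken from the paper: a weak-mutation regime in which the substitution rate from allele i to allele j is the mutation rate times the relative fixation probability S / (1 - exp(-S)), with S the scaled fitness difference; E computed as one minus the sum of squared equilibrium frequencies; and r taken as twice the magnitude of the slowest nonzero eigenvalue of the rate matrix. The function names and the fitness values are hypothetical.

```python
import numpy as np


def rate_matrix(scaled_fitness, mu=1.0):
    """Substitution-rate matrix; entry [i, j] is the rate from allele i to j.

    Assumption: weak mutation, equal mutation rate mu between all allele pairs.
    """
    f = np.asarray(scaled_fitness, dtype=float)
    n = f.size
    Q = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            S = f[j] - f[i]  # scaled selection coefficient of j relative to i
            # Relative fixation probability S / (1 - exp(-S)); equals 1 at S = 0.
            rel = 1.0 if abs(S) < 1e-12 else S / -np.expm1(-S)
            Q[i, j] = mu * rel
        Q[i, i] = -Q[i].sum()  # rows of a rate matrix sum to zero
    return Q


def asymptotic_divergence(scaled_fitness):
    """E: probability that two independent equilibrium sequences differ at a site."""
    f = np.asarray(scaled_fitness, dtype=float)
    w = np.exp(f - f.max())  # equilibrium frequencies are proportional to exp(f)
    pi = w / w.sum()
    return 1.0 - np.sum(pi ** 2)


def approach_rate(scaled_fitness, mu=1.0):
    """r: twice the slowest nonzero relaxation rate (two lineages diverge in parallel)."""
    eig = np.linalg.eigvals(rate_matrix(scaled_fitness, mu))
    magnitudes = np.sort(np.abs(eig))  # magnitudes[0] is the zero eigenvalue
    return 2.0 * magnitudes[1]


neutral = [0.0, 0.0, 0.0, 0.0]        # all four alleles equally fit
two_fit = [0.0, 0.0, -40.0, -40.0]    # two equally fit, two strongly deleterious
one_fit = [40.0, 0.0, 0.0, 0.0]       # a single strongly favored allele

for name, f in [("neutral", neutral), ("two fit", two_fit), ("one fit", one_fit)]:
    print(name, asymptotic_divergence(f), approach_rate(f))
```

Under these assumptions the ratio `approach_rate(two_fit) / approach_rate(neutral)` comes out close to one half, while the single-favored-allele case gives a small E and a large r, in line with the results summarized above.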
The ctenophore genome and the evolutionary origins of neural systems
The origins of neural systems remain unresolved. In contrast to other basal metazoans, ctenophores (comb jellies) have both complex nervous and mesoderm-derived muscular systems. These holoplanktonic predators also have sophisticated ciliated locomotion, behaviour and distinct development. Here we present the draft genome of Pleurobrachia bachei, Pacific sea gooseberry, together with ten other ctenophore transcriptomes, and show that they are remarkably distinct from other animal genomes in their content of neurogenic, immune and developmental genes. Our integrative analyses place Ctenophora as the earliest lineage within Metazoa. This hypothesis is supported by comparative analysis of multiple gene families, including the apparent absence of HOX genes, canonical microRNA machinery, and reduced immune complement in ctenophores. Although two distinct nervous systems are well recognized in ctenophores, many bilaterian neuron-specific genes and genes of 'classical' neurotransmitter pathways either are absent or, if present, are not expressed in neurons. Our metabolomic and physiological data are consistent with the hypothesis that ctenophore neural systems, and possibly muscle specification, evolved independently from those in other animals. This work was supported by NSF (NSF-0744649 and NSF CNS-0821622 to L.L.M.; NSF CHE-1111705 to J.V.S.), NIH (1R01GM097502, R01MH097062, R21RR025699 and 5R21DA030118 to L.L.M.; P30 DA018310 to J.V.S.; R01 AG029360 and 1S10RR027052 to E.I.R.), NASA NNX13AJ31G (to K.M.H., L.L.M. and K.M.K.), NSERC 458115 and 211598 (J.P.R.), University of Florida Opportunity Funds/McKnight Brain Research and Florida Biodiversity Institute (L.L.M.), Rostock Inc./A.V. Chikunov (E.I.R.), grant from Russian Federation Government 14.B25.31.0033 (Resolution No. 220) (E.I.R.). F.A.K., I.S.P. and R.D. were supported by HHMI (55007424), EMBO and MINECO (BFU2012-31329 and Sev-2012-0208). Contributions of AU Marine Biology Program 117 and Molette laboratory 22.
Copy number variation underlies complex phenotypes in domestic dog breeds and other canids
Extreme phenotypic diversity, a history of artificial selection, and socioeconomic value make domestic dog breeds a compelling subject for genomic research. Copy number variation (CNV) is known to account for a significant part of inter-individual genomic diversity in other systems. However, a comprehensive genome-wide study of structural variation as it relates to breed-specific phenotypes is lacking. We have generated whole genome CNV maps for more than 300 canids. Our data set extends the canine structural variation landscape to more than 100 dog breeds, including novel variants that cannot be assessed using microarray technologies. We have taken advantage of this data set to perform the first CNV-based genome-wide association study (GWAS) in canids. We identify 96 loci that display copy number differences across breeds, which are statistically associated with a previously compiled set of breed-specific morphometrics and disease susceptibilities. Among these, we highlight the discovery of a long-range interaction involving a CNV near MED13L and TBX3, which could influence breed standard height. Integration of the CNVs with chromatin interactions, long noncoding RNA expression, and single nucleotide variation highlights a subset of specific loci and genes with potential functional relevance and the prospect to explain trait variation between dog breeds. J.P. and E.A.O. were funded by the Intramural Program of the National Human Genome Research Institute of the National Institutes of Health. T.M.-B. was funded by European Research Council ERC-CON-2019-864203, BFU2017-86471-P (MINECO/FEDER, UE), “Unidad de Excelencia María de Maeztu,” funded by the Agencia Estatal de Investigación (CEX2018-000792-M), Howard Hughes International Early Career, Obra Social “La Caixa” and Secretaria d'Universitats i Recerca and CERCA Programme del Departament d'Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880).
A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0)
The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly is highly fragmented. In the current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4), the sequence is scattered across more than 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. To this end, we show substantial improvements over the current release of the chimpanzee genome (Pan_tro_2.1.4) by several metrics, such as increased contiguity by >750% and 300% on contigs and scaffolds, respectively, and closure of 77% of the gaps in the Pan_tro_2.1.4 assembly, spanning >850 Kbp of novel coding sequence based on RNASeq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions relative to human in Pan_tro_2.1.4 and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset. J.G.G. is funded by the RED-BIO project of the Spanish National Bioinformatics Institute (INB) under grant number PT13/0001/0044. The INB is funded by the Spanish National Health Institute Carlos III (ISCIII) and the Spanish Ministry of Economy and Competitiveness (MINECO). L.F.K.K. is supported by an FPI fellowship associated with BFU2014-55090-P (FEDER); L.F. is supported by the Swedish Foundation for Strategic Research F06-0045 and the Swedish Research Council; E.E.E. is an investigator of the Howard Hughes Medical Institute. A.J.S. is supported by US National Institutes of Health (NIH) grants DA033660, HG006696, HD073731, and MH097018, and research grant 6-FY13-92 from the March of Dimes. This work was supported, in part, by grants from the NIH (grants R01HG002385 and U24HG009081 to E.E.E., HG007990 and HG007234 to B.P.). T.M.B. is supported by MINECO BFU2014-55090-P (FEDER), BFU2015-7116-ERC, and BFU2015-6215-ERC, Fundacio Zoo Barcelona and Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya.
An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape
Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily-relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Just 15% of amino acids found in yeast His3 orthologues were always neutral while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and negative effect in different genetic backgrounds. 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interaction of multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible. The work was supported by HHMI International Early Career Scientist Program (55007424), the MINECO (BFU2012-31329, BFU2012-37168, BFU2015-68351-P and BFU2015-68723-P), Spanish Ministry of Economy and Competitiveness Centro de Excelencia Severo Ochoa 2013-2017 grant (SEV-2012-0208), the Unidad de Excelencia María de Maeztu funded by the MINECO (MDM-2014-0370), Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat AGAUR program (2014 SGR 0974), the CERCA Programme of the Generalitat de Catalunya, Russian Foundation for Basic Research grant (18-04-01173), the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie programme (665385) and the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013, ERC grant agreement 335980_EinME and Synergy Grant 609989). KSS was supported by EMBO long-term fellowship (ALTF 107-2016). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Copy number variants and fixed duplications among 198 rhesus macaques (Macaca mulatta)
The rhesus macaque is an abundant species of Old World monkeys and a valuable model organism for biomedical research due to its close phylogenetic relationship to humans. Copy number variation is one of the main sources of genomic diversity within and between species and a widely recognized cause of inter-individual differences in disease risk. However, copy number differences among rhesus macaques and between the human and macaque genomes, as well as the relevance of this diversity to research involving this nonhuman primate, remain understudied. Here we present a high-resolution map of sequence copy number for the rhesus macaque genome constructed from a dataset of 198 individuals. Our results show that about one-eighth of the rhesus macaque reference genome is composed of recently duplicated regions, either copy number variable regions or fixed duplications. Comparison with human genomic copy number maps based on previously published data shows that, despite overall similarities in the genome-wide distribution of these regions, there are specific differences at the chromosome level. Some of these create differences in the copy number profile between human disease genes and their rhesus macaque orthologs. Our results highlight the importance of addressing the number of copies of target genes in the design of experiments and caution against human-centered assumptions in research conducted with model organisms. Overall, we present a genome-wide copy number map from a large sample of rhesus macaque individuals, representing an important novel contribution concerning the evolution of copy number in primate genomes. This work was supported in part by NIH grants R24-OD011173 to J.R., UM1-HG008898 to R.A.H., AGAUR (FI – DGR 2015) to M.B.-V., BFU2017-86471-P (MINECO/FEDER, UE), Howard Hughes International Early Career, Obra Social "La Caixa" and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880) to T.M.B. D.J. is supported by Juan de la Cierva fellowship (FJCI-2016-29558) from MICINN. In addition, we wish to acknowledge NIH grant R24-OD010962 to J. Capitanio which supported the development of the Biobehavioral Assessment resource and associated DNA samples from California NPRC rhesus macaques. We also acknowledge NIH grant support to specific National Primate Research Centers: California NPRC (OD011107) and Oregon NPRC (OD011092 and OD021324). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.