43 research outputs found

    Evolutionary trajectories of new duplicated and putative de novo genes

    The formation of new genes during evolution is an important motor of functional innovation, but the rate at which new genes originate and the likelihood that they persist over longer evolutionary periods are still poorly understood questions. Two important mechanisms by which new genes arise are gene duplication and de novo formation from a previously noncoding sequence. Does the mechanism of formation influence the evolutionary trajectories of the genes? Proteins arisen by gene duplication retain the sequence and structural properties of the parental protein, and thus they may be relatively stable. In contrast, de novo originated proteins are often species specific and thought to be more evolutionarily labile. Despite these differences, here we show that both types of genes share a number of similarities, including low sequence constraints in their initial evolutionary phases, high turnover rates at the species level, and comparable persistence rates in deeper branches, in both yeast and flies. In addition, we show that putative de novo proteins have an excess of substitutions between charged amino acids compared with the neutral expectation, which is reflected in the rapid loss of their initial highly basic character. The study supports high evolutionary dynamics of different kinds of new genes at the species level, in sharp contrast with the stability observed at later stages. We acknowledge funding from Ministerio de Ciencia e Innovación Agencia Estatal de Investigación grant PGC2018-094091-B-I00 (cofunded by Fondo Europeo de Desarrollo Regional), as well as grants PID2021-122726NB-I00 and PID2021-122830OB-C43 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF: A way of making Europe”, by the “European Union”. We also acknowledge funding from Generalitat de Catalunya, grant 2021SGR00042. The work was also funded by the European Union (ERC, NovoGenePop, project number 101052538). Peer Reviewed. Postprint (published version

    Conserved regions in long non-coding RNAs contain abundant translation and protein–RNA interaction signatures

    The mammalian transcriptome includes thousands of transcripts that do not correspond to annotated protein-coding genes and that are known as long non-coding RNAs (lncRNAs). A handful of lncRNAs have well-characterized regulatory functions but the biological significance of the majority of them is not well understood. LncRNAs that are conserved between mice and humans are likely to be enriched in functional sequences. Here, we investigate the presence of different types of ribosome profiling signatures in lncRNAs and how they relate to sequence conservation. We find that lncRNA-conserved regions contain three times more ORFs with translation evidence than non-conserved ones, and identify nine cases that display significant sequence constraints at the amino acid sequence level. The study also reveals that conserved regions in intergenic lncRNAs are significantly enriched in protein–RNA interaction signatures when compared to non-conserved ones; this includes sites in well-characterized lncRNAs, such as Cyrano, Malat1, Neat1 and Meg3, as well as in tens of lncRNAs of unknown function. This work illustrates how the analysis of ribosome profiling data coupled with evolutionary analysis provides new opportunities to explore the lncRNA functional landscape.

    Uncovering adaptive evolution in the human lineage

    Background: The recent increase in human polymorphism data, together with the availability of genome sequences from several primate species, provides an unprecedented opportunity to investigate how natural selection has shaped human evolution. Results: We compared human branch-specific substitutions with variation data in the current human population to measure the impact of adaptive evolution on human protein-coding genes. The use of single nucleotide polymorphisms (SNPs) with high derived allele frequencies (DAFs) minimized the influence of segregating slightly deleterious mutations and improved the estimation of the number of adaptive sites. Using DAF ≥ 60% we showed that the proportion of adaptive substitutions is 0.2% in the complete gene set. However, the percentage rose to 40% when we focused on genes that are specifically accelerated in the human branch with respect to the chimpanzee branch, or on genes that show signatures of adaptive selection at the codon level by the maximum-likelihood-based branch-site test. In general, neural genes are enriched in positive selection signatures. Genes with multiple lines of evidence of positive selection include taxilin beta, which is involved in motor nerve regeneration, and syntabulin, which is required for the formation of new presynaptic boutons. Conclusions: We combined several methods to detect adaptive evolution in human coding sequences at a genome-wide level. The use of variation data, in addition to sequence divergence information, uncovered previously undetected positive selection signatures in neural genes. This work was financially supported by the Ministerio de Economía y Competitividad from the Spanish Government (Plan Nacional project BFU2012-36820), and Institució Catalana de Recerca i Estudis Avançats (ICREA) from Generalitat de Cataluny

    Positional bias of general and tissue-specific regulatory motifs in mouse gene promoters

    Background: The arrangement of regulatory motifs in gene promoters, or promoter architecture, is the result of mutation and selection processes that have operated over many millions of years. In mammals, tissue-specific transcriptional regulation is related to the presence of specific protein-interacting DNA motifs in gene promoters. However, little is known about the relative location and spacing of these motifs. To fill this gap, we have performed a systematic search for motifs that show significant bias at specific promoter locations in a large collection of housekeeping and tissue-specific genes. Results: We observe that promoters driving housekeeping gene expression are enriched in particular motifs with strong positional bias, such as YY1, which are of little relevance in promoters driving tissue-specific expression. We also identify a large number of motifs that show positional bias in genes expressed in a highly tissue-specific manner. They include well-known tissue-specific motifs, such as HNF1 and HNF4 motifs in liver, kidney and small intestine, or RFX motifs in testis, as well as many potentially novel regulatory motifs. Based on this analysis, we provide predictions for 559 tissue-specific motifs in mouse gene promoters. Conclusion: The study shows that motif positional bias is an important feature of mammalian proximal promoters and that it affects both general and tissue-specific motifs. Motif positional constraints define very distinct promoter architectures depending on breadth of expression and type of tissue. We received financial support from Fundación Banco Bilbao Vizcaya Argentaria (FBBVA), Plan Nacional de I+D Ministerio de Educación y Ciencia (BFU2006-07120), Instituto Nacional de Bioinformática (INB), European Commission Infobiomed NoE, and Fundació ICREA

    Emergence of novel domains in proteins

    Proteins are composed of a combination of discrete, well-defined, sequence domains, associated with specific functions that have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation. But currently little is known about how novel domains arise and how they subsequently evolve. To gain insights into the impact of recently emerged domains in protein evolution we have identified all human young protein domains that have emerged in approximately the past 550 million years. We have classified them into vertebrate-specific and mammalian-specific groups, and compared them to older domains. We have found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. We have observed that 61.3% of them arose in newly formed genes, while the remaining 38.7% are found combined with older domains, and have very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, young domains show significantly higher non-synonymous to synonymous substitution rates than older domains using human and mouse orthologous sequence comparisons. This is also true when we compare young and old domains located in the same protein, suggesting that recently arisen domains tend to evolve in a less constrained manner than older domains. We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different age, and that the fastest evolving parts correspond to the domains that have been acquired more recently. We received financial support from Ministerio de Educación (FPU to M.T.-R.), Ministerio de Innovación y Tecnología grant BIO2009-08160, Ministerio de Economía y Competitividad grant BFU2012-36820, and Institució Catalana de Recerca i Estudis Avançats (ICREA contract to M.M.A.)

    Dissecting the role of low-complexity regions in the evolution of vertebrate proteins

    Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance, and their capacity to expand in relatively short periods of time through replication slippage, they can greatly contribute to increase protein sequence space and generate novel protein functions. However, little is known about the global impact of LCRs on protein evolution. We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H. sapiens, M. musculus, G. gallus, D. rerio and C. intestinalis. Transcription factors and other regulatory functions are overrepresented in proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion whereas the loss of LCRs is more often due to accumulation of amino acid substitutions as opposed to deletions. This dichotomy results in net protein sequence gain over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than for other LCR types. LCRs enriched in positively charged amino acids show the contrary pattern, indicating an important effect of purifying selection in their maintenance. We have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern. We received financial support from Fundación Javier Lamas (PhD fellowship to N.R-T.), Ministerio de Innovación y Tecnología at the Spanish Government (BIO2009-08160) and Institució Catalana de Recerca i Estudis Avançats (ICREA to M.M.A.)