4 research outputs found

    State aggregation for fast likelihood computations in molecular evolution.

    MOTIVATION: Codon models are widely used to identify the signature of selection at the molecular level and to test for changes in selective pressure during the evolution of genes encoding proteins. The large size of the state space of the Markov processes used to model codon evolution makes it difficult to use these models with large biological datasets. We propose here to use state aggregation to reduce the state space of codon models and, thus, improve the computational performance of likelihood estimation on these models. RESULTS: We show that this heuristic speeds up the computations of the M0 and branch-site models up to 6.8 times. We also show through simulations that state aggregation does not introduce a detectable bias. We analysed a real dataset and show that aggregation provides highly correlated predictions compared to the full likelihood computations. Finally, state aggregation is a very general approach and can be applied to any continuous-time Markov process-based model with large state space, such as amino acid and coevolution models. We therefore discuss different ways to apply state aggregation to Markov models used in phylogenetics. AVAILABILITY: The heuristic is implemented in the godon package (https://bitbucket.org/Davydov/godon) and in a version of FastCodeML (https://gitlab.isb-sib.ch/phylo/fastcodeml). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
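The idea of state aggregation summarised in this abstract can be illustrated with a small sketch. It is not taken from godon or FastCodeML: the function name `aggregate_rate_matrix`, the weighting by stationary frequencies, and the toy 4-state matrix are all assumptions chosen for illustration, and the paper's own aggregation scheme may differ. The sketch lumps the states of a continuous-time Markov rate matrix into groups, so that later likelihood computations work on a smaller matrix.

```python
def aggregate_rate_matrix(Q, pi, groups):
    """Lump the states of a rate matrix Q into the given groups.

    Q      : list of rows; Q[i][j] is the rate i -> j, each row sums to zero
    pi     : per-state weights (assumed here: stationary frequencies)
    groups : list of lists of state indices forming a partition of the states

    The rate from group A to group B is the pi-weighted average, over the
    states in A, of the total rate into B. This is one common lumping rule,
    used here as an assumption rather than as the published method.
    """
    k = len(groups)
    Q_agg = [[0.0] * k for _ in range(k)]
    for a, src in enumerate(groups):
        weight = sum(pi[i] for i in src)
        for b, dst in enumerate(groups):
            if a == b:
                continue
            flow = sum(pi[i] * Q[i][j] for i in src for j in dst)
            Q_agg[a][b] = flow / weight
        # The diagonal is still 0.0 here, so this makes the row sum to zero.
        Q_agg[a][a] = -sum(Q_agg[a])
    return Q_agg


# Toy example (hypothetical data): 4 states, with states 2 and 3 merged.
Q = [
    [-3.0, 1.0, 1.0, 1.0],
    [1.0, -3.0, 1.0, 1.0],
    [1.0, 1.0, -3.0, 1.0],
    [1.0, 1.0, 1.0, -3.0],
]
pi = [0.25, 0.25, 0.25, 0.25]
Q_small = aggregate_rate_matrix(Q, pi, [[0], [1], [2, 3]])
```

In a codon setting the same function would be called with a 61 x 61 matrix and a partition that keeps the codons observed at a site separate while merging the rest; that choice of partition is again an assumption of this sketch.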

    Molecular evolution of RuBisCO subunits

    The environmental conditions of our planet have been changing since its origin. For species' survival, adaptation to the environment is crucial, for example through the adaptive evolution of photosynthesis. The appearance of CO2-concentrating mechanisms has given some species a selective advantage under CO2-depleted conditions. C4 plants comprise one of the main groups of such species; they diverged from classical C3 plants and adapted to CO2 depletion by modifying their cellular structures and biochemical cascades. Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), the enzyme that catalyzes the first step of CO2 fixation, changed cellular location during C4 evolution. RuBisCO of C4 plants is surrounded by highly concentrated CO2, which prevents the loss of energy and CO2 caused by the affinity of the enzyme for both O2 and CO2. The intercellular gas composition surrounding RuBisCO directly influences the rate of photosynthesis because RuBisCO's slow turnover rate is often the limiting factor for the rate of photosynthesis in higher plants. Therefore, RuBisCO has been considered the determining factor of the photosynthetic rate and is thought to play an important role in plant adaptation to environmental conditions. Previous studies detected evidence of adaptive evolution of RuBisCO in the form of positive selection acting on the chloroplast rbcL gene, which encodes the large subunit of RuBisCO (RBCL), in independent C4 lineages. The other subunit of RuBisCO, the small subunit (RBCS), has been reported to influence the catalytic efficiency, CO2 specificity, assembly, activity, and stability of RuBisCO. However, the evolution of its encoding nuclear gene, rbcS, is still poorly studied. Therefore, I aimed to study the molecular evolution of rbcS in angiosperms. rbcS belongs to a multigene family, and the number of gene copies differs between species.
The phylogenetic tree of the rbcS gene reveals two lineages that may have originated from a duplication event before the divergence of land plants. Copies originating from ancient duplication events seem to have been removed, whereas copies from recent events appear to be retained. This explains the observation in the rbcS tree that gene copies of the same species are more closely related to each other than to copies from different species. I hypothesized that each rbcS gene copy of the same species may have different characteristics. I compared the interaction of the rbcS and rbcL genes, as well as the influence of the RBCS subunits encoded by different copies on the stability of RuBisCO, by testing coevolution between rbcS and each rbcL and by homology modelling of RuBisCO assembled with an RBCS encoded by different rbcS copies, respectively. The results suggested that the interaction between RBCS and RBCL, and the influence on the overall stability of the enzyme, are the same among different rbcS copies. Therefore, I assumed that the rbcS gene copies cannot diverge because they need to remain structurally compatible with RBCL. In general, when all the gene copies of a multigene family have the same characteristics, multiple gene copies exist in a species to maintain the number of transcripts at the same level as in a species carrying a single copy (dosage effect hypothesis). To test this hypothesis, I estimated the expression level of each gene copy using published transcriptome data. The results suggest that the gene expression level is similar between species carrying single and multiple copies. Therefore, the dosage effect hypothesis does not apply to the evolution of rbcS. The results also suggest that species carrying a higher gene copy number have a larger amount of RuBisCO. It has been reported that RuBisCO is degraded or downregulated under specific environmental stresses. Thus, I conclude that plants living under such environmental stress may need to synthesize more RuBisCO to prevent a shortage of the enzyme.
To better understand the role of RBCS in coping with environmental changes, I tested for positive selection on the rbcS gene in species of Poaceae that have different photosynthetic types. Positive selection was detected all over the tree and the signal was not C4-specific. This suggests that the positive selection acting on the rbcS gene has not led to the shift of photosynthetic types. I assume that RBCS is therefore not involved in the C3-to-C4 transition, but might be involved in the optimization of RuBisCO after the establishment of the C4 photosynthesis type or after migration to new habitats that require different catalytic properties.

    State aggregation for fast likelihood computations in molecular evolution

    MOTIVATION: Codon models are widely used to identify the signature of selection at the molecular level and to test for changes in selective pressure during the evolution of genes encoding proteins. The large size of the state space of the Markov processes used to model codon evolution makes it difficult to use these models with large biological datasets. We propose here to use state aggregation to reduce the state space of codon models and, thus, improve the computational performance of likelihood estimation on these models. RESULTS: We show that this heuristic speeds up the computations of the M0 and branch-site models up to 6.8 times. We also show through simulations that state aggregation does not introduce a detectable bias. We analyzed a real dataset and show that aggregation provides highly correlated predictions compared to the full likelihood computations. Finally, state aggregation is a very general approach and can be applied to any continuous-time Markov process-based model with large state space, such as amino acid and coevolution models. We therefore discuss different ways to apply state aggregation to Markov models used in phylogenetics. AVAILABILITY AND IMPLEMENTATION: The heuristic is implemented in the godon package (https://bitbucket.org/Davydov/godon) and in a version of FastCodeML (https://gitlab.isb-sib.ch/phylo/fastcodeml). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.