16 research outputs found

    Testing for Associations between Loci and Environmental Gradients Using Latent Factor Mixed Models

    Adaptation to local environments often occurs through natural selection acting on a large number of loci, each having a weak phenotypic effect. One way to detect these loci is to identify genetic polymorphisms that exhibit high correlation with environmental variables used as proxies for ecological pressures. Here, we propose new algorithms based on population genetics, ecological modeling, and statistical learning techniques to screen genomes for signatures of local adaptation. Implemented in the computer program "latent factor mixed model" (LFMM), these algorithms introduce population structure through unobserved (latent) variables. These fast algorithms detect correlations between environmental and genetic variation while simultaneously inferring background levels of population structure. Comparisons with related methods provide evidence that LFMM can efficiently estimate random effects due to population history and isolation-by-distance patterns when computing gene-environment correlations, and can decrease the number of false-positive associations in genome scans. We then apply these models to plant and human genetic data, identifying several genes with functions related to development that exhibit strong correlations with climatic gradients.

    Comment: 29 pages with 8 pages of Supplementary Material (V2: revised presentation and results part)
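The central idea above, testing genotype-environment correlation while accounting for population structure through unobserved variables, can be illustrated with a deliberately simplified sketch. It is not LFMM's algorithm: LFMM estimates latent factors and environmental effects jointly in a Bayesian mixed model, whereas this sketch swaps in a cruder two-step approach. It takes the leading principal axis of the column-centred genotype matrix (found by power iteration) as a single latent factor, then regresses each locus on the environment and that factor and reports the t-statistic of the environmental slope. All function names and the toy simulation (one hypothetical "adaptive" locus plus 19 neutral loci differentiated between two groups) are assumptions made for illustration.

```python
import math
import random


def center(values):
    """Return values minus their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def first_latent_factor(genotypes, iterations=100, seed=0):
    """Leading principal axis of the column-centred n x L genotype matrix (power iteration)."""
    n, n_loci = len(genotypes), len(genotypes[0])
    cols = [center([row[j] for row in genotypes]) for j in range(n_loci)]
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start (a constant vector would be degenerate)
    for _ in range(iterations):
        # One power-iteration step on G G^T: t = G^T u, then u = G t.
        t = [sum(col[i] * u[i] for i in range(n)) for col in cols]
        u = [sum(cols[j][i] * t[j] for j in range(n_loci)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def adjusted_z_scores(genotypes, env):
    """Per-locus t-statistic of the environmental slope, adjusting for one latent factor."""
    n = len(genotypes)
    x = center(env)
    u = center(first_latent_factor(genotypes))
    sxx = sum(a * a for a in x)
    suu = sum(a * a for a in u)
    sxu = sum(a * b for a, b in zip(x, u))
    det = sxx * suu - sxu * sxu
    scores = []
    for j in range(len(genotypes[0])):
        y = center([row[j] for row in genotypes])
        sxy = sum(a * b for a, b in zip(x, y))
        suy = sum(a * b for a, b in zip(u, y))
        # Closed-form least squares for y = b_x * x + b_u * u + error (all variables centred).
        b_x = (suu * sxy - sxu * suy) / det
        b_u = (sxx * suy - sxu * sxy) / det
        rss = sum((yi - b_x * xi - b_u * ui) ** 2 for yi, xi, ui in zip(y, x, u))
        se = math.sqrt(rss / (n - 3) * suu / det)  # n - 3: intercept plus two slopes
        scores.append(b_x / se if se > 0 else 0.0)  # 0.0 for degenerate (e.g. monomorphic) loci
    return scores


def simulate_toy_data(n=60, n_loci=20, seed=1):
    """Hypothetical toy data: locus 0 tracks the environment, the others track two groups."""
    rng = random.Random(seed)
    group = [i % 2 for i in range(n)]
    env = [rng.gauss(0.0, 1.0) for _ in range(n)]
    genotypes = []
    for i in range(n):
        row = []
        for j in range(n_loci):
            if j == 0:
                p = 0.9 if env[i] > 0 else 0.1  # "adaptive" locus
            else:
                p = 0.8 if group[i] else 0.2    # neutral locus, structured by group
            row.append(sum(rng.random() < p for _ in range(2)))  # diploid dosage 0/1/2
        genotypes.append(row)
    return genotypes, env


genotypes, env = simulate_toy_data()
z = adjusted_z_scores(genotypes, env)
print(max(range(len(z)), key=lambda j: abs(z[j])))  # index of the strongest association
```

In this sketch the latent factor is estimated from the genotypes alone, so when population structure and the environmental gradient are confounded it can absorb part of the environmental signal and reduce power.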

    Improved reference genome of Aedes aegypti informs arbovirus vector control

    Female Aedes aegypti mosquitoes infect more than 400 million people each year with dangerous viral pathogens including dengue, yellow fever, Zika and chikungunya. Progress in understanding the biology of mosquitoes and developing the tools to fight them has been slowed by the lack of a high-quality genome assembly. Here we combine diverse technologies to produce the markedly improved, fully re-annotated AaegL5 genome assembly, and demonstrate how it accelerates mosquito science. We anchored physical and cytogenetic maps, doubled the number of known chemosensory ionotropic receptors that guide mosquitoes to human hosts and egg-laying sites, provided further insight into the size and composition of the sex-determining M locus, and revealed copy-number variation among glutathione S-transferase genes that are important for insecticide resistance. Using high-resolution quantitative trait locus and population genomic analyses, we mapped new candidates for dengue vector competence and insecticide resistance. AaegL5 will catalyse new biological insights and intervention strategies to fight this deadly disease vector.

    Genome scan methods against more complex models: when and how much should we trust them?

    PdV was supported by a doctoral studentship from the French Ministère de la Recherche et de l'Enseignement Supérieur. OEG was supported by French ANR grant No 09 GENM 017 001 and by the Marine Alliance for Science and Technology for Scotland (MASTS). EF and OF were supported by a grant from the Région Rhône-Alpes. OF was further supported by Grenoble INP.

    The recent availability of next-generation sequencing (NGS) has made it possible to use dense genetic markers to identify regions of the genome that may be under selection. Several statistical methods have recently been developed for this purpose. Here, we present the results of an individual-based simulation study investigating the power and error rate of popular or recent genome scan methods: linear regression, Bayescan, BayEnv and LFMM. Unlike previous studies, we focus on complex, hierarchical population structure and on polygenic selection. Additionally, we use a false discovery rate (FDR)-based framework, which provides a unified testing framework across frequentist and Bayesian methods. Finally, we investigate the influence of specifying population allele frequencies versus individual genotype data for LFMM and the linear regression. The relative ranking of the methods changes when polygenic rather than monogenic selection is considered. For strongly hierarchical scenarios with confounding effects between demography and environmental variables, the power of the methods can be very low. Except for one scenario, Bayescan exhibited moderate power and error rate. BayEnv performed well under nonhierarchical scenarios, while LFMM provided the best compromise between power and error rate across scenarios. We found that error rates can be greatly reduced by considering the results of all three methods when identifying outlier loci.
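For methods that output z-scores, an FDR-based call of outlier loci can be sketched as follows: convert each z-score to a two-sided p-value and apply the Benjamini-Hochberg step-up procedure. This is a minimal sketch under that assumption; the paper's exact framework (in particular how Bayescan and BayEnv outputs are put on the same footing) may differ. The function names are hypothetical, and keeping only loci flagged by every method is just one simple way to "consider the results of all three methods".

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues, q=0.05):
    """Indices of loci called significant at FDR level q (Benjamini-Hochberg step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff = rank  # keep the largest rank that satisfies the step-up condition
    return sorted(order[:cutoff])


def consensus(*candidate_sets):
    """Loci flagged by every method (one possible way to combine genome scans)."""
    common = set(candidate_sets[0])
    for candidates in candidate_sets[1:]:
        common &= set(candidates)
    return sorted(common)


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))
```

The step-up rule compares the k-th smallest p-value with k·q/m and rejects every hypothesis up to the largest k that passes, so a later pass can rescue earlier ranks that failed their own threshold.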

    A short manual for LFMM (command-line version)

    We proposed an integrated framework based on population genetics, ecological modeling and machine learning techniques for screening genomes for signatures of local adaptation. We implemented fast algorithms using a hierarchical Bayesian mixed model based on a variant of principal component analysis in which residual population structure is introduced via unobserved factors. These algorithms can detect correlations between environmental and genetic variation while simultaneously inferring background levels of population structure. A description of the method is available in our paper: Eric Frichot, Sean Schoville, Guillaume Bouchard, Olivier François, 2013. Landscape genomic tests for associations between loci and environmental gradients. Molecular Biology and Evolution, in press.

    Installation

    We provide a set of R and Perl scripts to convert data to the LFMM format and to display Manhattan plots. R and Perl are therefore required for format conversion and plotting, but not for running LFMM. To install the command-line version of LFMM, run the install script (install.sh) located in the LFMM main directory: in a terminal shell, go to the LFMM main directory and type "./install.sh". If the script is not executable, type "chmod +x install.sh" and then "./install.sh". A binary called LFMM should be created in the LFMM main directory.

    Data format

    The input consists of two files: a genotype file and a variable file. The genotype file is a SNP matrix with n lines for n individuals and L columns for L loci. Each element can be 0, 1 or 2, and a missing element is indicated by the value 9 or -9. Elements are separated by one or more spaces. There should be no space after the last value of each line, and no line should contain only missing data (9 or -9). Below is an example of a genotype file for n = 3 individuals and L = 5 loci
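A small reader that enforces the genotype-file rules listed in the manual can catch formatting errors before running LFMM. This is a hypothetical helper, not part of the LFMM distribution. It checks only the stated rules (values 0, 1 or 2; missing values 9 or -9; values separated by spaces; no trailing space; no line made entirely of missing data) plus a consistent number of loci per line, which follows from the file being an n x L matrix. Skipping empty lines is an added assumption.

```python
VALID = {"0": 0, "1": 1, "2": 2}
MISSING = {"9", "-9"}


def read_lfmm_genotypes(text):
    """Parse and check genotype-file text (hypothetical helper); missing calls become None."""
    rows = []
    n_loci = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line == "":
            continue  # assumption: empty lines (e.g. a blank final line) are ignored
        if line != line.rstrip():
            raise ValueError(f"line {lineno}: space after the last value")
        row = []
        for token in line.split():  # one or several spaces separate values
            if token in VALID:
                row.append(VALID[token])
            elif token in MISSING:
                row.append(None)
            else:
                raise ValueError(f"line {lineno}: unexpected value {token!r}")
        if n_loci is None:
            n_loci = len(row)
        elif len(row) != n_loci:
            raise ValueError(f"line {lineno}: expected {n_loci} loci, found {len(row)}")
        if all(g is None for g in row):
            raise ValueError(f"line {lineno}: line contains only missing data")
        rows.append(row)
    return rows


example = "0 1 2 9 0\n1  1 -9 2 2\n2 0 0 1 1\n"  # hypothetical n = 3, L = 5 file
print(read_lfmm_genotypes(example))
```

Returning None for missing calls, rather than keeping 9 or -9, keeps missing data from being mistaken for a genotype value in later arithmetic.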

    Latent factor models for ecological association studies in population genetics

    We introduce a set of latent factor models dedicated to landscape genomics and ecological association tests. It includes statistical methods for correcting principal component maps for effects of spatial autocorrelation (spFA); methods for rapidly and efficiently estimating individual ancestry coefficients from large genotypic matrices and evaluating the number of ancestral populations (sNMF); and methods for identifying genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures (LFMM). We also developed a set of open-source software associated with these methods, based on optimized C programs that scale to very large data sets, to run analyses of population structure and genome scans for local adaptation.
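To give a feel for estimating ancestry coefficients by factorizing a genotype matrix, here is a sketch of plain non-negative matrix factorization with Lee-Seung multiplicative updates applied to genotypes rescaled to [0, 1]. It is a stand-in, not sNMF: sNMF adds a sparsity constraint, uses its own genotype encoding and optimization algorithm, and provides a way to evaluate the number of ancestral populations, none of which is reproduced here. The function name and the toy data (two perfectly differentiated groups of five individuals) are hypothetical.

```python
import random


def nmf_ancestry(genotypes, k, iterations=1000, seed=0, eps=1e-12):
    """Plain NMF, V ~= Q F, on allele dosages rescaled to [0, 1] (V = genotype / 2).

    Returns an n x k matrix Q whose rows are rescaled to sum to one.
    """
    v = [[g / 2.0 for g in row] for row in genotypes]
    n, m = len(v), len(v[0])
    rng = random.Random(seed)
    # Strictly positive random start; multiplicative updates keep every entry non-negative.
    q = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    f = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # F <- F * (Q^T V) / (Q^T Q F), elementwise.
        qtv = [[sum(q[i][a] * v[i][j] for i in range(n)) for j in range(m)] for a in range(k)]
        qtq = [[sum(q[i][a] * q[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
        f = [[f[a][j] * qtv[a][j] / (sum(qtq[a][b] * f[b][j] for b in range(k)) + eps)
              for j in range(m)] for a in range(k)]
        # Q <- Q * (V F^T) / (Q F F^T), elementwise.
        vft = [[sum(v[i][j] * f[a][j] for j in range(m)) for a in range(k)] for i in range(n)]
        fft = [[sum(f[a][j] * f[b][j] for j in range(m)) for b in range(k)] for a in range(k)]
        q = [[q[i][a] * vft[i][a] / (sum(q[i][b] * fft[b][a] for b in range(k)) + eps)
              for a in range(k)] for i in range(n)]
    # Rescale rows to proportions (an all-zero row stays at zero instead of dividing by zero).
    return [[x / (sum(row) or 1.0) for x in row] for row in q]


toy = [[2, 2, 2, 2, 0, 0, 0, 0]] * 5 + [[0, 0, 0, 0, 2, 2, 2, 2]] * 5
for row in nmf_ancestry(toy, k=2):
    print([round(x, 3) for x in row])
```

Here the number of ancestral populations k is fixed by the user; choosing it from the data would require an additional model-selection criterion.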

    Detecting adaptive evolution based on association with ecological gradients: orientation matters

    OEG was supported by the Marine Alliance for Science and Technology for Scotland (MASTS).

    Population genetic signatures of local adaptation are frequently investigated by identifying loci with allele frequencies that exhibit high correlation with ecological variables. One difficulty with this approach is that ecological associations might be confounded by geographic variation at selectively neutral loci. Here we consider populations that underwent spatial expansion from their original range, and in which geographic variation in adaptive allele frequencies coincides with habitat gradients. Using range expansion simulations, we asked whether our ability to detect genomic regions involved in adaptation could be affected by the orientation of the ecological gradients. For the three ecological association methods tested, we found, counterintuitively, fewer false-positive associations when ecological gradients were aligned with the main axis of expansion than when they were aligned along any other direction. This result has important consequences for the analysis of genomic data under non-equilibrium population genetic models. Alignment of gradients with expansion axes is likely to be common in scenarios in which expanding species track their ecological niche during climate change while adapting to changing environments at their rear edge.

    Latent factor models for ecological association studies in population genetics

    We introduce a set of latent factor models dedicated to landscape genomics and ecological association tests. It includes statistical methods for correcting principal component maps for effects of spatial autocorrelation (spFA); methods for estimating ancestry coefficients from large genotypic matrices and evaluating the number of ancestral populations (sNMF); and methods for identifying genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures (LFMM). We also developed a set of open-source software associated with these methods, based on optimized C programs that scale to very large data sets, to run analyses of population structure and genome scans for local adaptation.