    Applications of network theory to human population genetics: from pathways to genotype networks

    In this thesis we developed two approaches to study positive selection and genetic adaptation in the human genome. Both approaches are based on applications of network theory. In the first approach, we studied how signals of selection are distributed among the genes of a metabolic pathway. We used a network representation of the Asparagine N-Glycosylation pathway to determine whether particular positions in the network are more likely to be involved in selection events. We found a different distribution of signals between the upstream part of this pathway, which has a linear structure and is involved in a conserved process, and the downstream part, which has a complex network structure and is involved in adaptation to the environment. In the second approach, we applied a network representation of the set of genotypes observed in a population (a genotype network) to next-generation sequencing data. The main result is a genome-wide picture of how the populations of the 1000 Genomes dataset have explored genotype space. We found that the genotype networks of coding regions tend to be more connected and more expanded in genotype space than those of non-coding regions, and that simulated sweeps show patterns similar to those of simulated neutral regions.

    In this thesis we have developed two methods to study the patterns of positive selection and genetic adaptation in the human genome. Both methods are based on applications of network theory. In the first application we investigated how signals of selection are distributed along a metabolic pathway. We used a representation of the N-Glycosylation pathway to study whether certain positions are more likely to be involved in positive selection events. We compared the distribution of selection signals between the first part of the metabolic pathway, which has a very linear structure and is involved in a conserved process, and the second part of the pathway, which has a complex network structure and is involved in adaptation to the environment. In the second application we applied the concept of genotype networks (Genotype Networks) to next-generation sequencing data. The result is a comprehensive analysis of how the populations of the 1000 Genomes Project have explored genotype space. The genotype networks of coding regions tend to be more connected and more expanded than those of non-coding regions. In addition, using simulations we observed the patterns expected for positive selection events.

    The annotation and the usage of scientific databases could be improved with public issue tracker software

    Since the publication of their long-standing predecessor, the Atlas of Protein Sequence and Structure, by Margaret Dayhoff in 1965, scientific databases have become a key factor in the organization of modern science. The information and knowledge described in the new scientific literature is translated into entries in many different scientific databases, making it possible to obtain very accurate information on a biological entity such as a gene or a protein without having to review the literature on it manually. However, even in the databases with the finest annotation procedures, errors or unclear passages sometimes appear in the publicly released version and influence the research of scientists who are unaware of them. A researcher who finds an error in a database is often left uncertain about what to do, and frequently abandons the effort of reporting it because there is no standard procedure for doing so. In the present work, we propose that simply adopting a public issue tracker application, as many open-source software projects do, could improve the quality of the annotations in many databases and encourage feedback from the scientific community on publicly annotated data. To illustrate the situation, we describe a series of errors that we found, and helped to fix, in the genes of a very well-known pathway across several biomedically relevant databases. We aim to show that, even though most of the major scientific databases have procedures for reporting errors, these procedures are usually not publicly visible, which makes reporting errors time-consuming and of limited use. In addition, the effort of the user who reports an error often goes unacknowledged, which is discouraging.

    VCF2Networks: applying genotype networks to single-nucleotide variants data

    SUMMARY: A wealth of large-scale genome sequencing projects opens the door to new approaches for studying the relationship between genotype and phenotype. One such opportunity is the possibility of applying genotype network analysis to population genetics data. Genotype networks represent the set of genotypes associated with a single phenotype, and they allow one to estimate properties such as the robustness of the phenotype to mutations and the ability of its associated genotypes to evolve new adaptations. So far, however, genotype network analysis has rarely been applied to population genetics data. To help fill this gap, we present VCF2Networks, a tool to determine and study genotype network structure from single-nucleotide variant data. AVAILABILITY AND IMPLEMENTATION: VCF2Networks is available at https://bitbucket.org/dalloliogm/vcf2networks. CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    This study was made possible by grant BFU2013-43726-P awarded by the Ministerio de Economía y Competitividad (Spain) and with the support of the Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya (GRC 2014 SGR 866). GMD was supported by an FPI fellowship (BES-2009-017731). AW was supported by the Swiss National Science Foundation and by the URPP Evolutionary Biology at the University of Zurich.
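
    The genotype-network idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not VCF2Networks' implementation: it assumes phased haplotypes coded as strings of '0'/'1' alleles over the same biallelic SNV sites, treats each distinct haplotype as a node, and joins two nodes when they differ at exactly one site. The function names and summary keys (`nodes`, `edges`, `components`, `mean_degree`) are hypothetical choices for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of SNV sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def genotype_network(haplotypes):
    """Map each distinct haplotype to the set of haplotypes one mutation away."""
    nodes = sorted(set(haplotypes))  # duplicate haplotypes collapse into one node
    adjacency = {h: set() for h in nodes}
    for a, b in combinations(nodes, 2):
        if hamming(a, b) == 1:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def count_components(adjacency):
    """Count connected components with an iterative depth-first search."""
    seen, components = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            for neighbour in adjacency[stack.pop()]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def summarize(haplotypes):
    """Hypothetical summary statistics for a genotype network."""
    adjacency = genotype_network(haplotypes)
    n_nodes = len(adjacency)
    n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "components": count_components(adjacency),
        "mean_degree": 2 * n_edges / n_nodes if n_nodes else 0.0,
    }


# Hypothetical haplotypes over four SNV sites (0 = reference, 1 = alternate).
sample = ["0000", "0001", "0011", "0011", "1100", "1110"]
print(summarize(sample))
# {'nodes': 5, 'edges': 3, 'components': 2, 'mean_degree': 1.2}
```

    Because duplicate haplotypes collapse into one node, the network records which genotypes a population has visited rather than how often; in the toy sample, the two haplotype clusters form separate components.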

    Distribution of events of positive selection and population differentiation in a metabolic pathway: the case of asparagine N-glycosylation

    Asparagine N-Glycosylation is one of the most important forms of protein post-translational modification in eukaryotes. This metabolic pathway can be subdivided into two parts: an upstream sub-pathway required for the proper folding of most of the proteins synthesized in the secretory pathway, and a downstream sub-pathway that generates variability in trans-membrane proteins and is involved in adaptation to the environment and in innate immunity. Here we analyze the nucleotide variability of the genes of this pathway in human populations, identifying which genes show greater population differentiation and which show signatures of recent positive selection. We also compare how these signals are distributed between the upstream and downstream parts of the pathway, with the aim of exploring how the forces of population differentiation and positive selection vary among genes that belong to the same metabolic pathway but are subject to different functional constraints. Our results show that genes in the downstream part of the pathway are more likely to show a signature of population differentiation, while events of positive selection are equally distributed between the two parts of the pathway. Moreover, events of positive selection are frequent in genes that are known to lie at bifurcation points and that a network-level analysis identifies as occupying key positions, such as MGAT3 and GCS1. These findings indicate that the upstream part of the Asparagine N-Glycosylation pathway has lower diversity among populations, while the downstream part is freer to tolerate such diversity. Furthermore, the distribution of signatures of population differentiation and positive selection can change between parts of a pathway, especially between parts that are exposed to different functional constraints. Our results support the hypothesis that genes involved in constitutive processes can be expected to show lower population differentiation, while genes involved in environment-related traits should show higher variability. Taken together, this work broadens our knowledge of how events of population differentiation and positive selection are distributed among different parts of a metabolic pathway.

    This work was funded by grant BFU2010-19443 (subprogram BMC) awarded to JB by the Ministerio de Ciencia y Tecnología (Spain), and by the Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101). GMD is supported by an FPI fellowship (BES-2009-017731) from the Ministerio de Ciencia y Tecnología (Spain). PL is supported by a PhD fellowship from “Acción Estratégica de Salud, 2008-2011” from the Instituto de Salud Carlos III, and LM is supported by a postdoctoral fellowship from the Juan de la Cierva Program of the Spanish Ministry of Science and Innovation (MICINN).
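
    The notion of a gene at a bifurcation point can be made concrete on a directed pathway graph. The sketch below is illustrative only: the reactions, substrate names and gene labels are hypothetical placeholders rather than the real N-glycosylation topology, and it assumes a bifurcation point is simply a substrate consumed by more than one reaction. The network-level analysis in the study may rely on other measures.

```python
from collections import defaultdict

# Hypothetical toy pathway: each reaction converts a substrate into a product
# and is catalysed by one enzyme-coding gene. All names are placeholders.
REACTIONS = [
    ("S0", "S1", "GENE_A"),
    ("S1", "S2", "GENE_B"),
    ("S2", "S3", "GENE_C"),
    ("S2", "S4", "GENE_D"),
    ("S3", "S5", "GENE_E"),
    ("S3", "S6", "GENE_F"),
]


def bifurcation_genes(reactions):
    """Return {substrate: sorted genes competing for it} for every substrate
    consumed by more than one reaction (i.e. a branch point in the pathway)."""
    consumers = defaultdict(list)
    for substrate, _product, gene in reactions:
        consumers[substrate].append(gene)
    return {s: sorted(genes) for s, genes in consumers.items() if len(genes) > 1}


print(bifurcation_genes(REACTIONS))
# {'S2': ['GENE_C', 'GENE_D'], 'S3': ['GENE_E', 'GENE_F']}
```

    In a linear stretch of a pathway every substrate has a single consumer, so the function returns nothing; branch points appear only where the network becomes more complex.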

    Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations

    MOTIVATION: Detecting positive selection in genomic regions is a recurrent topic in population genetic studies of natural populations. However, there is little consistency among the regions detected in different genome-wide scans using different tests and/or populations. Furthermore, few methods address the challenge of classifying selective events according to specific features such as age, intensity or state (completeness). RESULTS: We have developed a machine-learning classification framework that exploits the combined ability of several selection tests to uncover the different polymorphism features expected under the hard sweep model, while controlling for population-specific demography. As a result, we achieve high sensitivity toward hard selective sweeps while also providing insight into their completeness (whether the selected variant is fixed or not) and age of onset. Our method also determines the relevance of the individual methods developed so far to detect positive selection under specific selective scenarios. We calibrated and applied the method to three reference human populations from the 1000 Genomes Project to generate a genome-wide classification map of hard selective sweeps. This study improves the detection of selective sweeps by moving beyond the classical selection versus no-selection classification strategy, and offers an explanation for the lack of consistency observed among selection tests when they are applied to real data. Very few signals were observed in the African population studied, even though our method has higher sensitivity under this population's demography.

    AVAILABILITY AND IMPLEMENTATION: The genome-wide results for three human populations from the 1000 Genomes Project and an R package implementing the 'Hierarchical Boosting' framework are available at http://hsb.upf.edu/.

    This work was supported by the Ministerio de Economía y Competitividad (Spain) [grants BFU2010-19443, BFU2013-43726-P] and the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya [GRC 2014 SGR 866] to J.B. M.P. and G.D. have been supported by a grant from the FPI program, Ministerio de Economía y Competitividad; P.L. by a grant from the Instituto de Salud Carlos III; J.E. was supported through a postdoctoral scholarship from the Volkswagenstiftung [Az: I/85 198].

    1000 Genomes Selection Browser 1.0: A genome browser dedicated to signatures of natural selection in modern humans

    Searching for Darwinian selection in natural populations has been the focus of a multitude of studies over the last decades. Here we present the 1000 Genomes Selection Browser 1.0 (http://hsb.upf.edu) as a resource for signatures of recent natural selection in modern humans. We have implemented a large number of neutrality tests, as well as summary statistics informative about the action of selection, such as Tajima’s D, CLR, Fay and Wu’s H, Fu and Li’s F* and D*, XPEHH, ΔiHH, iHS, FST, ΔDAF and XPCLR among others, and applied them to low-coverage sequencing data from the 1000 Genomes Project (Phase 1; release April 2012). We have implemented a publicly available genome-wide browser to communicate the results for three populations of West African, Northern European and East Asian ancestry (YRI, CEU, CHB). Information is provided in UCSC-style format to facilitate integration with the rich UCSC browser tracks, and an access page with instructions is provided for convenient visualization. We believe that this expandable resource will facilitate the interpretation of signals of selection on different temporal, geographical and genomic scales.

    Ministerio de Ciencia y Tecnología (Spain); Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101); Subprogram BMC [BFU2010-19443 awarded to J.B.]; Post-doctoral scholarship from the Volkswagenstiftung [Az: I/85 198 to J.E.]; Spanish government [BFU-2008-01046; SAF2011-29239]; Spanish government FPI scholarships [BES-2009-017731 and BES-2011-04502 to G.M.D. and M.P., respectively]; PhD fellowship from ‘Acción Estratégica de Salud, en el marco del Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica 2008-2011’ from Instituto de Salud Carlos III (to P.L.). Funding for open access charge: Prof. Jaume Bertranpetit.
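
    Among the statistics listed above, Tajima’s D is one of the simplest, and a short sketch shows how it contrasts two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by a1. This is a minimal, hypothetical illustration, not the browser's pipeline: it assumes phased haplotypes coded as '0'/'1' strings with no missing data, and uses the constants of Tajima's (1989) formula. Negative values indicate an excess of rare variants, as expected after a sweep or a population expansion; positive values indicate an excess of intermediate-frequency variants.

```python
import math
from math import comb


def tajimas_d(haplotypes):
    """Tajima's D for phased biallelic haplotypes coded as '0'/'1' strings.

    Compares pi (mean pairwise differences) with Watterson's estimate S / a1,
    normalised by the standard deviation implied by Tajima's constants.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than four sequences")
    pairs = comb(n, 2)
    s = 0  # number of segregating sites
    pi = 0.0  # mean number of pairwise differences
    for site in range(len(haplotypes[0])):
        k = sum(h[site] == "1" for h in haplotypes)  # alternate-allele count
        if 0 < k < n:
            s += 1
            pi += k * (n - k) / pairs  # fraction of pairs that differ at this site
    if s == 0:
        return math.nan  # undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples of four haplotypes over three sites.
print(tajimas_d(["000", "100", "010", "001"]))  # negative: only singletons
print(tajimas_d(["000", "000", "111", "111"]))  # positive: intermediate frequencies
```

    In a genome-wide scan the same computation would be repeated over sliding windows, and the resulting values compared against a neutral expectation rather than against zero alone.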

    Genomic analysis of Andamanese provides insights into ancient human migration into Asia and adaptation

    To shed light on the peopling of South Asia and the origins of the morphological adaptations found there, we analyzed whole-genome sequences from 10 Andamanese individuals and compared them with sequences from 60 individuals from mainland Indian populations with different ethnic histories and with publicly available data from other populations. We show that all Asian and Pacific populations share a single origin and expansion out of Africa, contradicting an earlier proposal of two independent waves of migration. We also show that populations from South and Southeast Asia harbor a small proportion of ancestry from an unknown extinct hominin, and that this ancestry is absent from Europeans and East Asians. The footprints of adaptive selection in the genomes of the Andamanese show that the distinctive phenotypes of this population (including very short stature) do not reflect an ancient African origin but instead result from strong natural selection on genes related to human body size.

    The main funding was provided by the joint Spain–India bilateral grant PRI-PIBIN-2011-0942 from the Ministerio de Economía y Competitividad (Spain). Complementary funding was provided by grant BFU2013-43726-P from the Ministerio de Economía y Competitividad (Spain), with the support of the Secretaria d'Universitats i Recerca, Departament d'Economia i Coneixement de la Generalitat de Catalunya (GRC 2014 SGR 866).

    Similarity in recombination rate estimates highly correlates with genetic differentiation in humans

    Recombination varies greatly among species, as illustrated by the poor conservation of the recombination landscape between humans and chimpanzees. Shorter evolutionary time frames are therefore needed to understand the evolution of recombination. Here, we analyze its recent evolution in humans. We calculated recombination rates between adjacent pairs of 636,933 common single-nucleotide polymorphism loci in 28 worldwide human populations and analyzed them in relation to the genetic distances between populations. We found a strong and highly significant correlation between the similarity in recombination rates, corrected for effective population size, and the genetic differentiation between populations. This correlation is observed at the genome-wide level, but also for each chromosome and when genetic distances and recombination similarities are calculated independently from different parts of the genome. Moreover, and more importantly, this relationship is robustly maintained when considering the presence or absence of recombination hotspots. Simulations show that this correlation cannot be explained by biases in the inference of recombination rates caused by haplotype sharing among similar populations. This result indicates a rapid pace of evolution of recombination within the time span of the differentiation of modern humans.

    This research was funded by grants BFU2007-63657, BFU2009-13409-C02-02 and SAF-2007-63171 awarded by the Ministerio de Educación y Ciencia (Spain), by the Direcció General de Recerca of the Generalitat de Catalunya (Grup de Recerca Consolidat 2005SGR/00608 and 2009 SGR 1101), and by the National Institute for Bioinformatics (www.inab.org), a platform of Genoma España. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
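
    The abstract does not name the statistical test behind this correlation, but a common way to correlate two matrices of pairwise population comparisons is a Mantel-style permutation test, because the pairwise entries are not independent of one another. The sketch below assumes that approach: it uses made-up values for four hypothetical populations, expresses recombination similarity as a dissimilarity (so a positive r means that genetically closer populations have more similar recombination maps), and omits the effective-population-size correction mentioned above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries, with rows/columns taken in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=1):
    """Observed Pearson r between two symmetric matrices and a one-sided
    permutation p-value obtained by shuffling the population labels of dist_b."""
    identity = list(range(len(dist_a)))
    x = upper_triangle(dist_a, identity)
    observed = pearson(x, upper_triangle(dist_b, identity))
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # same permutation applied to rows and columns
        if pearson(x, upper_triangle(dist_b, order)) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


# Hypothetical values for four populations: pairwise genetic distance and
# recombination-map dissimilarity (1 - similarity).
FST = [[0.00, 0.10, 0.20, 0.30],
       [0.10, 0.00, 0.15, 0.25],
       [0.20, 0.15, 0.00, 0.12],
       [0.30, 0.25, 0.12, 0.00]]
RECOMB_DISSIM = [[0.00, 0.05, 0.11, 0.16],
                 [0.05, 0.00, 0.09, 0.13],
                 [0.11, 0.09, 0.00, 0.07],
                 [0.16, 0.13, 0.07, 0.00]]
r, p = mantel(FST, RECOMB_DISSIM)
print(f"r = {r:.3f}, p = {p:.3f}")
```

    With only four populations there are just 24 distinct label permutations, so the smallest attainable p-value is roughly 1/24; a panel of 28 populations leaves far more room for a permutation test to resolve a highly significant correlation.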