34 research outputs found

    VCF2Networks: applying genotype networks to single-nucleotide variants data

    Summary: A wealth of large-scale genome sequencing projects opens the doors to new approaches to study the relationship between genotype and phenotype. One such opportunity is the possibility to apply genotype networks analysis to population genetics data. Genotype networks are a representation of the set of genotypes associated with a single phenotype, and they allow one to estimate properties such as the robustness of the phenotype to mutations, and the ability of its associated genotypes to evolve new adaptations. So far, though, genotype networks analysis has rarely been applied to population genetics data. To help fill this gap, here we present VCF2Networks, a tool to determine and study genotype network structure from single-nucleotide variant data.
    Availability and implementation: VCF2Networks is available at https://bitbucket.org/dalloliogm/vcf2networks.
    Contact: [email protected]
    Supplementary information: Supplementary data are available at Bioinformatics online.

    1000 Genomes Selection Browser 1.0: A genome browser dedicated to signatures of natural selection in modern humans

    Searching for Darwinian selection in natural populations has been the focus of a multitude of studies over the last decades. Here we present the 1000 Genomes Selection Browser 1.0 (http://hsb.upf.edu) as a resource for signatures of recent natural selection in modern humans. We have implemented and applied a large number of neutrality tests as well as summary statistics informative for the action of selection, such as Tajima's D, CLR, Fay and Wu's H, Fu and Li's F* and D*, XPEHH, ΔiHH, iHS, FST, ΔDAF and XPCLR among others, to low coverage sequencing data from the 1000 genomes project (Phase 1; release April 2012). We have implemented a publicly available genome-wide browser to communicate the results from three different populations of West African, Northern European and East Asian ancestry (YRI, CEU, CHB). Information is provided in UCSC-style format to facilitate the integration with the rich UCSC browser tracks, and an access page is provided with instructions for convenient visualization. We believe that this expandable resource will facilitate the interpretation of signals of selection on different temporal, geographical and genomic scales.
    © 2013 The Author(s). Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
    Ministerio de Ciencia y Tecnología (Spain); Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101); Subprogram BMC [BFU2010-19443 awarded to J.B.]; Post-doctoral scholarship from the Volkswagenstiftung [Az: I/85 198 to J.E.]; Spanish government [BFU-2008-01046; SAF2011-29239]; The Spanish government FPI scholarships [BES-2009-017731 and BES-2011-04502 to G.M.D. and M.P., respectively]; PhD fellowship from ‘Acción Estratégica de Salud, en el marco del Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica 2008-2011’ from Instituto de Salud Carlos III (to P.L.). Funding for open access charge: Prof. Jaume Bertranpetit.
    Peer reviewed.

    The annotation and the usage of scientific databases could be improved with public issue tracker software

    Since the publication of their longtime predecessor The Atlas of Protein Sequences and Structures in 1965 by Margaret Dayhoff, scientific databases have become a key factor in the organization of modern science. All the information and knowledge described in the novel scientific literature is translated into entries in many different scientific databases, making it possible to obtain very accurate information on a biological entity like genes or proteins without having to manually review the literature on it. However, even for the databases with the finest annotation procedures, errors or unclear parts sometimes appear in the publicly released version and influence the research of unaware scientists using them. The researcher who finds an error in a database is often left in an uncertain state, and often abandons the effort of reporting it because of a lack of a standard procedure to do so. In the present work, we propose that the simple adoption of a public error tracker application, as in many open software projects, could improve the quality of the annotations in many databases and encourage feedback from the scientific community on the data annotated publicly. In order to illustrate the situation, we describe a series of errors that we found and helped solve on the genes of a very well-known pathway in various biomedically relevant databases. We would like to show that, even if a majority of the most important scientific databases have procedures for reporting errors, these are usually not publicly visible, making the process of reporting errors time-consuming and not useful. Also, the effort made by the user who reports the error often goes unacknowledged, putting them in a discouraging position.

    BioStar: An Online Question & Answer Resource for the Bioinformatics Community

    Parnell, Laurence D. et al.
    Although the era of big data has produced many bioinformatics tools and databases, using them effectively often requires specialized knowledge. Many groups lack bioinformatics expertise, and frequently find that software documentation is inadequate while local colleagues may be overburdened or unfamiliar with specific applications. Too often, such problems create data analysis bottlenecks that hinder the progress of biological research. In order to help address this deficiency, we present BioStar, a forum based on the Stack Exchange platform where experts and those seeking solutions to problems of computational biology exchange ideas. The main strengths of BioStar are its large and active group of knowledgeable users, rapid response times, clear organization of questions and responses that limits discussion to the topic at hand, and ranking of questions and answers that helps identify their usefulness. These rankings, based on community votes, also contribute to a reputation score for each user, which serves to keep expert contributors engaged. The BioStar community has helped to answer over 2,300 questions from over 1,400 users (as of June 10, 2011), and has played a critical role in enabling and expediting many research projects. BioStar can be accessed at http://www.biostars.org/.
    This work was partially supported by NSF grants MCB-0618402 and CCF-0643529 (CAREER), NIH grants 1R55AI065507 – 01A2 and 1 R01 GM083113-01, NIH/NCRR grant number UL1RR033184, and FPI fellowship SAF-2007-63171/BES-2009-017731 from the Ministerio de Educación y Ciencia, Spain. These funders had no role in the design of BioStar, decision to publish, or preparation of the manuscript.
    Peer reviewed.

    Similarity in Recombination Rate Estimates Highly Correlates with Genetic Differentiation in Humans

    Recombination varies greatly among species, as illustrated by the poor conservation of the recombination landscape between humans and chimpanzees. Thus, shorter evolutionary time frames are needed to understand the evolution of recombination. Here, we analyze its recent evolution in humans. We calculated the recombination rates between adjacent pairs of 636,933 common single-nucleotide polymorphism loci in 28 worldwide human populations and analyzed them in relation to genetic distances between populations. We found a strong and highly significant correlation between similarity in the recombination rates corrected for effective population size and genetic differentiation between populations. This correlation is observed at the genome-wide level, but also for each chromosome and when genetic distances and recombination similarities are calculated independently from different parts of the genome. Moreover, and more relevant, this relationship is robustly maintained when considering presence/absence of recombination hotspots. Simulations show that this correlation cannot be explained by biases in the inference of recombination rates caused by haplotype sharing among similar populations. This result indicates a rapid pace of evolution of recombination, within the time span of differentiation of modern humans.

    Applications of network theory to human population genetics : from pathways to genotype networks

    In this thesis we developed two approaches to study positive selection and genetic adaptation in the human genome. Both approaches are based on applications of network theory. In the first approach, we studied how the signals of selection are distributed among the genes of a metabolic pathway. We used a network representation of the Asparagine N-Glycosylation pathway, and determined whether given positions are more likely to be involved in selection events. We found a different distribution of signals between the upstream part of this pathway, which has a linear structure and is involved in a conserved process, and the downstream part of the pathway, which has a complex network structure and is involved in adaptation to the environment. In the second approach, we applied a network representation of the set of genotypes observed in a population (Genotype Network) to next-generation sequencing data. The main result is a genome-wide picture of how the populations of the 1000 Genomes dataset have explored the genotype space. We found that the genotype networks of coding regions tend to be more connected and more expanded in the space than non-coding regions, and that simulated sweeps have similar patterns compared to simulated neutral regions.

    Genotype networks as a tool to understand human genetics

    Work presented at the 4th Meeting of the Spanish Society of the Evolutionary Biology (SESBE 2013), held in Barcelona from 27 to 29 November 2013.

    Correction: Human Genome Variation and the Concept of Genotype Networks

    Genotype networks are a concept used in systems biology to study sets of genotypes having the same phenotype, and the ability of these to bring forth novel phenotypes. In the past they have been applied to determine the genetic heterogeneity, and stability to mutations, of systems such as metabolic networks and RNA folds. Recently, they have been the base for reconciling the neutralist and selectionist views on evolution. Here, we adapted this concept to the study of population genetics data. Specifically, we applied genotype networks to the human 1000 genomes dataset, and analyzed networks composed of short haplotypes of Single Nucleotide Variants (SNV). The result is a scan of how properties related to genetic heterogeneity and stability to mutations are distributed along the human genome. We found that genes involved in acquired immunity, such as some HLA and MHC genes, tend to have the most heterogeneous and connected networks, and that coding regions tend to be more heterogeneous and stable to mutations than non-coding regions. We also found, using coalescent simulations, that regions under selection have more extended and connected networks. The application of the concept of genotype networks can provide a new opportunity to understand the evolutionary processes that shaped our genome. Learning how the genotype space of each region of our genome has been explored during the evolutionary history of the human species can lead to a better understanding of how selective pressures and neutral factors have shaped genetic diversity within populations and among individuals. Combined with the availability of larger datasets of sequencing data, genotype networks represent a new approach to the study of human genetic diversity that looks at the whole genome, and goes beyond the classical division between selection and neutrality methods.
    © 2014 Dall'Olio et al.
    This work was supported by grants BFU2010-19443 (subprogram BMC) awarded to JB by Ministerio de Ciencia y Tecnología (Spain) and by the Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101). GMD is supported by a FPI fellowship BES-2009-017731. AW would like to acknowledge support by the Swiss National Science Foundation and by the URPP Evolutionary Biology at the University of Zurich.
    Peer reviewed.

    Molecular Evolution and Network-Level Analysis of the N-Glycosylation Metabolic Pathway Across Primates

    11 pages, 4 figures, 4 tables.
    N-glycosylation is one of the most important forms of protein modification, serving key biological functions in multicellular organisms. N-glycans at the cell surface mediate the interaction between cells and the surrounding matrix and may act as pathogen receptors, making the genes responsible for their synthesis good candidates to show signatures of adaptation to different pathogen environments. Here, we study the forces that shaped the evolution of the genes involved in the synthesis of the N-glycans during the divergence of primates within the framework of their functional network. We have found that, despite their function of producing glycan repertoires capable of evading rapidly evolving pathogens, genes involved in the synthesis of the glycans are highly conserved, and no signals of positive selection have been detected within the time of divergence of primates. This suggests strong functional constraints as the main force driving their evolution. We studied the strength of the purifying selection acting on the genes in relation to the network structure considering the position of each gene along the pathway, its connectivity, and the rates of evolution in neighboring genes. We found a strong and highly significant negative correlation between the strength of purifying selection and the connectivity of each gene, indicating that genes encoding for highly connected enzymes evolve slower and thus are subject to stronger selective constraints. This result confirms that network topology does shape the evolution of the genes and that the connectivity within metabolic pathways and networks plays a major role in constraining evolutionary rates.
    This research was funded by grants SAF2007-63171 and BFU2010-19443 (subprogram BMC) awarded by Ministerio de Ciencia y Tecnología (Spain) and by the Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101).
    Peer reviewed.
