563 research outputs found

    Visualizing SNP statistics in the context of linkage disequilibrium using LD-Plus

    Summary: Often in human genetic analysis, multiple tables of single nucleotide polymorphism (SNP) statistics are shown alongside a Haploview style correlation plot. Readers are then asked to make inferences that incorporate knowledge across these multiple sets of results. To better facilitate a collective understanding of all available data, we developed a Ruby-based web application, LD-Plus, to generate figures that simultaneously display physical location of SNPs, binary SNP attributes (such as coding/non-coding or presence on genotyping platforms), common haplotypes and their frequencies, and continuously scaled values (such as Fst, minor allele frequency, genotyping efficiency or P-values), all in the context of the D′ and r² linkage disequilibrium structures. Combining these results into one comprehensive figure reduces dereferencing between figures and tables, and can provide unique insights into genetic features that are not clearly seen when results are partitioned across multiple figures and tables.
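For readers unfamiliar with the two linkage disequilibrium measures named in the summary above, the following is a minimal sketch of how D′ and r² are conventionally computed for a pair of biallelic SNPs from haplotype and allele frequencies. It is not taken from LD-Plus; the function name `ld_stats` and its parameters are hypothetical, and the signed form of D′ is used here (some tools report its absolute value).

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    Names are illustrative only (not from LD-Plus).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b

    # Largest magnitude D could take given the allele frequencies;
    # the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Example: allele A only ever occurs together with allele B,
# but the allele frequencies differ between the two loci.
d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.6)
print(f"D={d:.3f}  D'={d_prime:.3f}  r2={r2:.3f}")
```

In the example, D′ reaches its maximum while r² stays well below 1, which is one reason figures of this kind show both measures side by side.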

    Synthesis-View: visualization and interpretation of SNP association results for multi-cohort, multi-phenotype data and meta-analysis

    Background: Initial genome-wide association study (GWAS) discoveries are being further explored through the use of large cohorts across multiple and diverse populations involving meta-analyses within large consortia and networks. Many of the additional studies characterize fewer than 100 single nucleotide polymorphisms (SNPs), often include multiple and correlated phenotypic measurements, and can include data from multiple sites, multiple studies, and multiple races/ethnicities. New approaches for visualizing the resulting data are necessary in order to fully interpret results and obtain a broad view of the trends between DNA variation and phenotypes, as well as to provide information on specific SNP and phenotype relationships. Results: The Synthesis-View software tool was designed to visually synthesize the results of the aforementioned types of studies. Presented herein are multiple examples of the ways Synthesis-View can be used to report results from association studies of DNA variation and phenotypes, including the visual integration of p-values or other metrics of significance, allele frequencies, sample sizes, effect size, and direction of effect. Conclusions: Innovative views are needed to allow a user to visually integrate the multiple pieces of information typical of a genetic association study. As a result, we have created "Synthesis-View" software for the visualization of genotype-phenotype association data in multiple cohorts. Synthesis-View is freely available for non-commercial research institutions; for full details see https://chgr.mc.vanderbilt.edu/synthesisview.

    LocusZoom: regional visualization of genome-wide association scan results

    Summary: Genome-wide association studies (GWAS) have revealed hundreds of loci associated with common human genetic diseases and traits. We have developed a web-based plotting tool that provides fast visual display of GWAS results in a publication-ready format. LocusZoom visually displays regional information such as the strength and extent of the association signal relative to genomic position, local linkage disequilibrium (LD) and recombination patterns, and the positions of genes in the region.

    Visualization of Longitudinal Phenotypes in the Norwegian Mother and Child Cohort Study

    The Norwegian Mother and Child Cohort Study (MoBa) is a pregnancy cohort study with over 100,000 children enrolled. Data was gathered through questionnaires mailed to the mothers, but also in the form of biological samples where more than 15,000 trios (mother, father, and child) have been genotyped so far. Data collected by MoBa is sensitive and its access is therefore restricted to protect the privacy of the study participants. This can make it difficult (or even impossible) to access the data, not only for parents and the general public, but also for scientists and medical professionals. To solve this issue, it is necessary to provide access to the data in a manner that is high-resolution without compromising participant privacy. The MoBa data is multidimensional and contains longitudinal information on several phenotypes (such as height and weight) for the children, as well as data on certain variables for the parents. Based on the recorded variables, the MoBa cohort can be divided into various subgroups that can be studied separately or compared with each other. Furthermore, the genotyping data can be viewed at different scales: (i) genetic variants can be considered individually, (ii) in the context of their genomic location, or (iii) the entire genome can be considered as a whole. Finally, a good presentation of the data has to account for and take advantage of the complexity of the MoBa data. Hundreds of gigabytes of summary statistics can be generated from the genotyping data from MoBa. Depending on the use case, only a small subset of this data is relevant to present to a user at a given time point. In order to present these subsets to the user quickly upon request, a bioinformatics system that can find and dispatch data in a short amount of time must be implemented. 
This thesis demonstrates how the issues related to large-scale sensitive data access and dissemination can be solved through a publicly available web application able to handle the associated data volumes efficiently. Master's thesis in informatics.

    Genome-wide association study of receptive language ability of 12 year olds

    Purpose: We have previously shown that individual differences in measures of receptive language ability at age 12 are highly heritable. The current study attempted to identify some of the genes responsible for the heritability of receptive language ability using a genome-wide association (GWA) approach. Method: We administered four internet-based measures of receptive language (vocabulary, semantics, syntax, and pragmatics) to a sample of 2329 12-year-olds for whom DNA and genome-wide genotyping were available. Nearly 700,000 single-nucleotide polymorphisms (SNPs) and one million imputed SNPs were included in a GWA analysis of receptive language composite scores. Results: No SNP associations met the demanding criterion of genome-wide significance that corrects for multiple testing across the genome (p < 5 × 10⁻⁸). The strongest SNP association did not replicate in an additional sample of 2639 12-year-olds. Conclusion: These results indicate that individual differences in receptive language ability in the general population do not reflect common genetic variants that account for >3% of the phenotypic variance. The search for genetic variants associated with language skill will require larger samples and additional methods to identify and functionally characterize the full spectrum of risk variants.
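The genome-wide significance criterion quoted in the abstract above (p < 5 × 10⁻⁸) is conventionally motivated as a Bonferroni correction of a 0.05 family-wise error rate for roughly one million independent tests. The sketch below illustrates that arithmetic only; the figure of one million independent tests is the usual convention and an assumption here, not a number stated in the abstract, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value, alpha=0.05, n_tests=1_000_000):
    """True if p_value passes the Bonferroni-corrected threshold.

    The default n_tests of one million is the conventional assumption
    for common variants, not a value taken from the study.
    """
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 1_000_000)
print(threshold)                          # conventional genome-wide threshold
print(is_genome_wide_significant(3e-7))   # suggestive, but not genome-wide significant
```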

    From landraces to improved cultivars: Assessment of genetic diversity and population structure of Mediterranean wheat using SNP markers

    Assessment of genetic diversity and population structure in crops is essential for breeding and germplasm conservation. A collection of 354 bread wheat genotypes, including Mediterranean landraces and modern cultivars representative of the ones most widely grown in the Mediterranean Basin, were characterized with 11196 single nucleotide polymorphism (SNP) markers. Total genetic diversity (HT) and polymorphic information content (PIC) were 0.36 and 0.30 respectively for both landraces and modern cultivars. Linkage disequilibrium for the modern cultivars was higher than for the landraces (0.18 and 0.12, respectively). Analysis of the genetic structure showed a clear geographical pattern for the landraces, which were clustered into three subpopulations (SPs) representing the western, northern and eastern Mediterranean, whereas the modern cultivars were structured according to the breeding programmes that developed them: CIMMYT/ICARDA, France/Italy, and Balkan/eastern European countries. The modern cultivars showed higher genetic differentiation (GST) and lower gene flow (0.1673 and 2.49, respectively) than the landraces (0.1198 and 3.67, respectively), indicating a better distinction between subpopulations. The maximum gene flow was observed between landraces from the northern Mediterranean SPs and the modern cultivars released mainly by French and Italian breeding programmes.
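The paired GST and gene flow values in the abstract above are numerically consistent with the widely used indirect estimator Nm = 0.5 (1 − GST) / GST. The abstract does not say which estimator the authors used, so the sketch below is an assumption offered only to show how the two quantities relate; the function name is hypothetical.

```python
def gene_flow_from_gst(gst):
    """Indirect gene flow estimate Nm = 0.5 * (1 - GST) / GST.

    Assumed estimator: the abstract reports GST and gene flow
    but does not state the formula linking them.
    """
    if not 0 < gst < 1:
        raise ValueError("GST must lie strictly between 0 and 1")
    return 0.5 * (1 - gst) / gst


# GST values as reported in the abstract.
for label, gst in [("modern cultivars", 0.1673), ("landraces", 0.1198)]:
    print(f"{label}: GST={gst}, Nm={gene_flow_from_gst(gst):.2f}")
```

Under this estimator, higher differentiation necessarily implies lower estimated gene flow, which matches the direction of the comparison drawn in the abstract.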

    Haplotype structure and association to Crohn's disease of CARD15 mutations in two ethnically divergent populations

    Current debate focuses on the relevance of linkage disequilibrium (LD), ethnicity and underlying haplotype structure to the search for genes involved in complex disorders. The recently described association between single nucleotide polymorphisms (SNPs) of the CARD15 (NOD2) gene and Crohn's disease (CD) in populations of north-European descent provides a test case that we have subjected to detailed SNP haplotype based analyses. We examined 23 SNPs spanning 290 kb, including CARD15, in large North-European and Korean samples of patients with Crohn's disease and normal controls. In Europeans we confirmed that the three disease-associated SNPs occur independently but share a common background haplotype. This suggests a common origin and the possibility of an undiscovered more strongly predisposing mutation. Korean CD patients present a phenotype identical to the European patients and have not previously been screened for CARD15. The three disease-associated SNPs were absent and there was no evidence of association between CARD15 and CD. Consequently, the disease-associated mutations in the Europeans, which are rare, have arisen recently (after the Asian–European split). Our results highlight important issues relevant to mapping the genes that predispose to complex disorders. First, although ethnically divergent populations may present identical phenotypes they do not necessarily share the same set of predisposing genes. Second, although single-locus tests of association showed consistent association with markers throughout the gene, pair-wise LD between markers (r² and D′) yielded very little information about actual disease-association. Third, a population comparative approach allowed refining of the marker set through the examination of shared polymorphisms and common LD-groups. This approach, in conjunction with the examination of the mutational steps in a haplotype network, allows unambiguous identification of the potentially causative mutations.

    Merging genotyping-by-sequencing data from two ex situ collections provides insights on the pea evolutionary history

    Pea (Pisum sativum L. subsp. sativum) is one of the oldest domesticated species and a widely cultivated legume. In this study, we combined next generation sequencing (NGS) data referring to two genotyping-by-sequencing (GBS) libraries, each one prepared from a different Pisum germplasm collection. The selection of single nucleotide polymorphism (SNP) loci called in both germplasm collections caused some loss of information; however, this did not prevent the obtainment of one of the largest datasets ever used to explore pea biodiversity, consisting of 652 accessions and 22 127 markers. The analysis of population structure reflected genetic variation based on geographic patterns and allowed the definition of a model for the expansion of pea cultivation from the domestication centre to other regions of the world. In genetically distinct populations, the average decay of linkage disequilibrium (LD) ranged from a few bases to hundreds of kilobases, thus indicating different evolutionary histories leading to their diversification. Genome-wide scans resulted in the identification of putative selective sweeps associated with domestication and breeding, including genes known to regulate shoot branching, cotyledon colour and resistance to lodging, and the correct mapping of two Mendelian genes. In addition to providing information of major interest for fundamental and applied research on pea, our work describes the first successful example of integration of different GBS datasets generated from ex situ collections - a process of potential interest for a variety of purposes, including conservation genetics, genome-wide association studies, and breeding.

    Whole-genome re-sequencing provides key genomic insights in farmed Arctic charr (Salvelinus alpinus) populations of anadromous and landlocked origin from Scandinavia

    Arctic charr (Salvelinus alpinus) is a niche-market high-value species for Nordic aquaculture. Similar to other salmonids, both anadromous and landlocked populations are encountered. Whole-genome re-sequencing (22X coverage) was performed on two farmed populations of anadromous (Sigerfjord; n = 24) and landlocked (Arctic Superior; n = 24) origin from Norway and Sweden respectively. More than 5 million SNPs were used to study their genetic diversity and to scan for selection signatures. The two populations were clearly distinguished through principal component analysis, with the mean fixation index being approximately 0.12. Furthermore, the levels of genomic inbreeding estimated from runs of homozygosity were 6.23% and 8.66% for the Norwegian and the Swedish population respectively. Biological processes that could be linked to selection pressure associated primarily with the anadromous background and/or secondarily with domestication were suggested. Overall, our study provided insights regarding the genetic composition of two main strains of farmed Arctic charr from Scandinavia. At the same time, ample genomic resources were produced, on the order of millions of SNPs, that could assist the transition of Nordic Arctic charr farming into the genomics era.