130 research outputs found

    Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases

    Copy number variants (CNVs) account for more polymorphic base pairs in the human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass genes as well as noncoding DNA, making these polymorphisms good candidates for functional variation. Consequently, most modern genome-wide association studies test CNVs along with SNPs, after inferring copy number status from the data generated by high-throughput genotyping platforms. Here we give an overview of CNV genomics in humans, highlighting patterns that inform methods for identifying CNVs. We describe how genotyping signals are used to identify CNVs and provide an overview of existing statistical models and methods used to infer location and carrier status from such data, especially the most commonly used methods exploring hybridization intensity. We compare the power of such methods with the alternative method of using tag SNPs to identify CNV carriers. As such methods are only powerful when applied to common CNVs, we describe two alternative approaches that can be informative for identifying rare CNVs contributing to disease risk. We focus particularly on methods identifying de novo CNVs and show that such methods can be more powerful than case-control designs. Finally, we present some recommendations for identifying CNVs contributing to common complex disorders. Comment: Published at http://dx.doi.org/10.1214/09-STS304 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Mapping genes through the use of linkage disequilibrium generated by genetic drift: 'Drift mapping' in small populations with no demographic expansion

    Linkage disequilibrium has been a powerful tool in identifying rare disease alleles in human populations. To date, most research has been directed to isolated populations which have undergone a bottleneck followed by rapid exponential expansion. While this strategy works well for rare diseases in which all disease alleles in the population today are clonal copies of some common ancestral allele, for common disease genes with substantial allelic heterogeneity, this approach is not predicted to work. In this paper, we describe the dynamics of linkage disequilibrium in populations which have not undergone a demographic expansion. In these populations, it is shown that genetic drift creates disequilibrium over time, while in expanded populations, the disequilibrium decays with time. We propose that common disease alleles might be more efficiently identified by drift mapping - linkage disequilibrium mapping in small, old populations of constant size where the disequilibrium is the result of genetic drift, not founder effect. Theoretical models, empirical data, and simulated population models are presented as evidence for the utility of this approach
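
The contrast drawn in this abstract, disequilibrium decaying through recombination versus being maintained by drift in a small population of constant size, can be illustrated with two standard textbook approximations. The sketch below is not the authors' model: the function names are hypothetical, the decay formula assumes an effectively infinite population, and the drift formula is Sved's (1971) approximation for the expected r² at equilibrium in a population of constant effective size.

```python
def ld_after_recombination(d0: float, c: float, generations: int) -> float:
    """Deterministic decay of the disequilibrium coefficient D.

    Assumes no drift (very large population): D_t = D_0 * (1 - c)**t,
    where c is the recombination fraction between the two loci.
    """
    return d0 * (1.0 - c) ** generations


def expected_r2_under_drift(n_e: float, c: float) -> float:
    """Sved's approximation for the expected r^2 maintained by drift.

    Assumes a constant effective population size n_e at equilibrium:
    E[r^2] ~ 1 / (1 + 4 * n_e * c).
    """
    return 1.0 / (1.0 + 4.0 * n_e * c)


# Illustrative values only (assumed, not taken from the paper).
for n_e in (100, 1_000, 10_000):
    print(n_e, round(expected_r2_under_drift(n_e, c=0.001), 3))
```

Under these assumptions, the smaller the constant-size population, the more disequilibrium drift sustains between loci at a given recombination distance, which is the intuition behind mapping in small, old populations.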

    Helmsman: fast and efficient mutation signature analysis for massive sequencing datasets

    Background: The spectrum of somatic single-nucleotide variants in cancer genomes often reflects the signatures of multiple distinct mutational processes, which can provide clinically actionable insights into cancer etiology. Existing software tools for identifying and evaluating these mutational signatures do not scale to analyze large datasets containing thousands of individuals or millions of variants. Results: We introduce Helmsman, a program designed to perform mutation signature analysis on arbitrarily large sequencing datasets. Helmsman is up to 300 times faster than existing software. Helmsman’s memory usage is independent of the number of variants, resulting in a small enough memory footprint to analyze datasets that would otherwise exceed the memory limitations of other programs. Conclusions: Helmsman is a computationally efficient tool that enables users to evaluate mutational signatures in massive sequencing datasets that are otherwise intractable with existing software. Helmsman is freely available at https://github.com/carjed/helmsman. https://deepblue.lib.umich.edu/bitstream/2027.42/146537/1/12864_2018_Article_5264.pd
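
To make the idea of a mutation spectrum concrete, here is a minimal sketch of how single-nucleotide variants are commonly binned into 96 trinucleotide subtypes, with the reference base folded onto a pyrimidine (C or T). This is not Helmsman's code or API: the function names and the `(sample, context, alt)` input layout are assumptions made for illustration. Because the counter holds at most 96 entries per sample, it also hints at how memory use can stay independent of the number of variants when they are streamed one at a time.

```python
from collections import Counter
from typing import Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string of A/C/G/T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def snv_subtype(context: str, alt: str) -> str:
    """Label an SNV with its pyrimidine-folded trinucleotide subtype.

    context: reference trinucleotide centred on the variant site.
    alt: alternate base observed at the centre position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or any(b not in COMPLEMENT for b in context + alt):
        raise ValueError("expected a 3-base A/C/G/T context and one alt base")
    if alt == context[1]:
        raise ValueError("alt base must differ from the reference base")
    if context[1] in "AG":
        # Purine reference: describe the change on the opposite strand.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def mutation_spectrum(variants: Iterable[Tuple[str, str, str]]) -> Counter:
    """Stream (sample, context, alt) records into per-sample subtype counts."""
    counts: Counter = Counter()
    for sample, context, alt in variants:
        counts[(sample, snv_subtype(context, alt))] += 1
    return counts
```

For example, `snv_subtype("TGA", "A")` describes a G>A change on the forward strand and is reported as its reverse-strand equivalent, a C>T change in a TCA context.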

    Electronic reconstruction and charge transfer in strained Sr2CoIrO6 double perovskite

    The electronic, magnetic and optical properties of the double perovskite Sr$_2$CoIrO$_6$ (SCIO) under biaxial strain are explored in the framework of density functional theory (DFT) including a Hubbard $U$ term and spin-orbit coupling (SOC) in combination with absorption spectroscopy measurements on epitaxial thin films. While the end member SrIrO$_3$ is a semimetal with a quenched spin and orbital moment and bulk SrCoO$_3$ is a ferromagnetic (FM) metal with spin and orbital moment of 2.50 and 0.13 $\mu_{B}$, respectively, the double perovskite SCIO emerges as an antiferromagnetic Mott insulator with antiparallel alignment of Co, Ir planes along the [110]-direction. Co exhibits a spin and enhanced orbital moment of $\sim 2.35-2.45$ and $0.31-0.45$ $\mu_{B}$, respectively. Most remarkably, Ir acquires a significant spin and orbital moment of 1.21-1.25 and 0.13 $\mu_{B}$, respectively. Analysis of the orbital occupation indicates an electronic reconstruction due to a substantial charge transfer from minority to majority spin states in Ir and from Ir to Co, signaling an Ir$^{4+\delta}$, Co$^{4-\delta}$ configuration. Biaxial strain, varied from -1.02% ($a_{\rm NdGaO_3}$) through 0% ($a_{\rm SrTiO_3}$) to 1.53% ($a_{\rm GdScO_3}$), influences in particular the orbital polarization of the $t_{2g}$ states and leads to a nonmonotonic change of the band gap between 163 and 235 meV. The absorption coefficient reveals a two-plateau feature due to transitions from the valence to the lower lying narrow $t_{2g}$ and the higher lying broader $e_{g}$ bands. Inclusion of many body effects, in particular, excitonic effects by solving the Bethe-Salpeter equation (BSE), increases the band gap by $\sim 0.2$ and improves the agreement with the measured spectrum concerning the position of the second peak at $\sim 2.6$ eV. Comment: 11 pages, 10 figure

    Comparing variant calling algorithms for target-exon sequencing in a large sample

    Background: Sequencing studies of exonic regions aim to identify rare variants contributing to complex traits. With high coverage and large sample size, these studies tend to apply simple variant calling algorithms. However, coverage is often heterogeneous; sites with insufficient coverage may benefit from sophisticated calling algorithms used in low-coverage sequencing studies. We evaluate the potential benefits of different calling strategies by performing a comparative analysis of variant calling methods on exonic data from 202 genes sequenced at 24x in 7,842 individuals. We call variants using individual-based, population-based and linkage disequilibrium (LD)-aware methods with stringent quality control. We measure genotype accuracy by the concordance with on-target GWAS genotypes and between 80 pairs of sequencing replicates. We validate selected singleton variants using capillary sequencing. Results: Using these calling methods, we detected over 27,500 variants at the targeted exons; >57% were singletons. The singletons identified by individual-based analyses were of the highest quality. However, individual-based analyses generated more missing genotypes (4.72%) than population-based (0.47%) and LD-aware (0.17%) analyses. Moreover, individual-based genotypes were the least concordant with array-based genotypes and replicates. Population-based genotypes were less concordant than genotypes from LD-aware analyses with extended haplotypes. We reanalyzed the same dataset with a second set of callers and showed again that the individual-based caller identified more high-quality singletons than the population-based caller. We also replicated this result in a second dataset of 57 genes sequenced at 127.5x in 3,124 individuals. Conclusions: We recommend population-based analyses for high quality variant calls with few missing genotypes. With extended haplotypes, LD-aware methods generate the most accurate and complete genotypes. In addition, individual-based analyses should complement the above methods to obtain the most singleton variants. http://deepblue.lib.umich.edu/bitstream/2027.42/110906/1/12859_2015_Article_489.pd
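
The two quality measures compared in this abstract, the missing-genotype rate and genotype concordance, can be sketched as follows. The abstract does not spell out the exact definitions used, so this is one plausible reading rather than the authors' procedure: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for a missing call, concordance is taken over sites called in both sets, and the function names are hypothetical.

```python
from typing import Optional, Sequence

# Assumed coding: alternate-allele count 0/1/2, or None when no call was made.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of genotypes that are missing (None)."""
    if not calls:
        raise ValueError("no genotypes supplied")
    return sum(g is None for g in calls) / len(calls)


def concordance(calls_a: Sequence[Genotype], calls_b: Sequence[Genotype]) -> float:
    """Fraction of identical genotypes among sites called in both sets.

    calls_a and calls_b must list the same sites in the same order, e.g.
    sequencing calls versus array genotypes, or two sequencing replicates.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("genotype lists must have the same length")
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no sites called in both sets")
    return sum(a == b for a, b in pairs) / len(pairs)
```

Note that a caller can score well on one measure and poorly on the other, which is why the abstract reports both.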

    Myocardial T1-mapping at 3T using saturation-recovery: reference values, precision and comparison with MOLLI

    Background: Myocardial T1-mapping recently emerged as a promising quantitative method for non-invasive tissue characterization in numerous cardiomyopathies. Commonly performed with an inversion-recovery (IR) magnetization preparation at 1.5T, the application at 3T has gained interest due to increased quantification precision. Alternatively, saturation-recovery (SR) T1-mapping has recently been introduced at 1.5T for improved accuracy. Thus, the purpose of this study is to investigate the robustness and precision of SR T1-mapping at 3T and to establish accurate reference values for native T1-times and extracellular volume fraction (ECV) of healthy myocardium. Methods: Balanced Steady-State Free-Precession (bSSFP) Saturation-Pulse Prepared Heart-rate independent Inversion-REcovery (SAPPHIRE) and Saturation-recovery Single-SHot Acquisition (SASHA) T1-mapping were compared with the Modified Look-Locker inversion recovery (MOLLI) sequence at 3T. Accuracy and precision were studied in phantom. Native and post-contrast T1-times and regional ECV were determined in 20 healthy subjects (10 men, 27 ± 5 years). Subjective image quality, susceptibility artifact rating, in-vivo precision and reproducibility were analyzed. Results: SR T1-mapping showed  0.19; intra: p > 0.09) or consistency (inter: p > 0.07; intra: p > 0.17) between the three methods. Conclusions: Saturation-recovery T1-mapping at 3T yields higher accuracy, comparable inter-subject, inter- and intra-observer variability and less than 30 % precision-loss compared to MOLLI

    Genome-Wide Association of Bipolar Disorder Suggests an Enrichment of Replicable Associations in Regions near Genes

    Although bipolar disorder (BD) is a highly heritable and disabling disease, its genetic variants have been challenging to identify. We present new genotype data for 1,190 cases and 401 controls and perform a genome-wide association study including additional samples for a total of 2,191 cases and 1,434 controls. We do not detect genome-wide significant associations for individual loci; however, across all SNPs, we show an association between the power to detect effects calculated from a previous genome-wide association study and evidence for replication ($P = 1.5 \times 10^{-7}$). To demonstrate that this result is not likely to be a false positive, we analyze replication rates in a large meta-analysis of height and show that, in a large enough study, associations replicate as a function of power, approaching a linear relationship. Within BD, SNPs near exons exhibit a greater probability of replication, supporting an enrichment of reproducible associations near functional regions of genes. These results indicate that there is likely common genetic variation associated with BD near exons (±10 kb) that could be identified in larger studies and, further, provide a framework for assessing the potential for replication when combining results from multiple studies

    A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

    Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250 K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants. Results: In this study, we describe a combined imputation which uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach were demonstrated using a reference panel of 848 samples constructed using exome sequencing data from the T2D-GENES consortium and 5,349 sample genotype panels consisting of an exome chip and SNP chip. As a result, the combined approach increased imputation quality up to 11 %, and genomic coverage for rare variants up to 117.7 % (MAF < 1 %), compared to imputation using the SNP chip alone. Also, we investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. The best performing approach was the combination of the study specific reference panel and the genotype panel of combined data. Conclusions: Our study demonstrates that combined datasets, including SNP chips and exome chips, enhance both the imputation quality and genomic coverage of rare variants