212 research outputs found

    Comparing variant calling algorithms for target-exon sequencing in a large sample

    Background: Sequencing studies of exonic regions aim to identify rare variants contributing to complex traits. With high coverage and large sample size, these studies tend to apply simple variant calling algorithms. However, coverage is often heterogeneous; sites with insufficient coverage may benefit from sophisticated calling algorithms used in low-coverage sequencing studies. We evaluate the potential benefits of different calling strategies by performing a comparative analysis of variant calling methods on exonic data from 202 genes sequenced at 24x in 7,842 individuals. We call variants using individual-based, population-based and linkage disequilibrium (LD)-aware methods with stringent quality control. We measure genotype accuracy by the concordance with on-target GWAS genotypes and between 80 pairs of sequencing replicates. We validate selected singleton variants using capillary sequencing. Results: Using these calling methods, we detected over 27,500 variants at the targeted exons; >57% were singletons. The singletons identified by individual-based analyses were of the highest quality. However, individual-based analyses generated more missing genotypes (4.72%) than population-based (0.47%) and LD-aware (0.17%) analyses. Moreover, individual-based genotypes were the least concordant with array-based genotypes and replicates. Population-based genotypes were less concordant than genotypes from LD-aware analyses with extended haplotypes. We reanalyzed the same dataset with a second set of callers and showed again that the individual-based caller identified more high-quality singletons than the population-based caller. We also replicated this result in a second dataset of 57 genes sequenced at 127.5x in 3,124 individuals. Conclusions: We recommend population-based analyses for high quality variant calls with few missing genotypes. With extended haplotypes, LD-aware methods generate the most accurate and complete genotypes. In addition, individual-based analyses should complement the above methods to obtain the most singleton variants.
    http://deepblue.lib.umich.edu/bitstream/2027.42/110906/1/12859_2015_Article_489.pd
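The abstract above compares callers by missing-genotype rate and by genotype concordance between replicates. As a rough illustration only, the two quantities could be computed as below. The genotype encoding (alternate-allele counts 0/1/2, `None` for a missing call) and the function names `missing_rate` and `concordance` are assumptions made for this sketch; the paper may define concordance differently, for example by restricting to non-reference genotypes.

```python
from typing import Optional, Sequence

# Assumed encoding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of genotype calls that are missing."""
    if not calls:
        raise ValueError("no genotype calls supplied")
    return sum(g is None for g in calls) / len(calls)


def concordance(a: Sequence[Genotype], b: Sequence[Genotype]) -> Optional[float]:
    """Fraction of sites with identical genotypes, over sites called in both.

    Returns None when the two call sets share no non-missing site.
    """
    if len(a) != len(b):
        raise ValueError("call sets must cover the same sites")
    both_called = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not both_called:
        return None
    return sum(x == y for x, y in both_called) / len(both_called)


# Hypothetical replicate pair over five sites.
replicate_1 = [0, 1, 2, None, 1]
replicate_2 = [0, 1, 1, 2, None]
print(missing_rate(replicate_1))
print(concordance(replicate_1, replicate_2))
```

Because sites missing in either replicate are dropped from the denominator, a caller can look highly concordant while leaving many genotypes uncalled, which is why the two figures are best read together.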

    Genetic associations of nonsynonymous exonic variants with psychophysiological endophenotypes

    We mapped ∼85,000 rare nonsynonymous exonic single nucleotide polymorphisms (SNPs) to 17 psychophysiological endophenotypes in 4,905 individuals, including antisaccade eye movements, resting EEG, P300 amplitude, electrodermal activity, and affect-modulated startle eye blink. Nonsynonymous SNPs are predicted to directly change or disrupt proteins encoded by genes and are expected to have significant biological consequences. Most such variants are rare, and new technologies can efficiently assay them on a large scale. We assayed 247,870 mostly rare SNPs on an Illumina exome array. Approximately 85,000 of the SNPs were polymorphic, rare (MAF < .05), and nonsynonymous. Single variant association tests identified a SNP in the PARD3 gene associated with theta resting EEG power. The sequence kernel association test, a gene-based test, identified a gene, PNPLA7, associated with pleasant difference startle, the difference in startle magnitude between pleasant and neutral images. No other single nonsynonymous variant, or gene-based group of variants, was strongly associated with any endophenotype.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/109617/1/psyp12349.pd

    Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma

    Asthma is caused by a combination of poorly understood genetic and environmental factors (1,2). We have systematically mapped the effects of single nucleotide polymorphisms (SNPs) on the presence of childhood onset asthma by genome-wide association. We characterized more than 317,000 SNPs in DNA from 994 patients with childhood onset asthma and 1,243 non-asthmatics, using family and case-referent panels. Here we show multiple markers on chromosome 17q21 to be strongly and reproducibly associated with childhood onset asthma in family and case-referent panels with a combined P value of P < 10^-12. In independent replication studies the 17q21 locus showed strong association with diagnosis of childhood asthma in 2,320 subjects from a cohort of German children (P = 0.0003) and in 3,301 subjects from the British 1958 Birth Cohort (P = 0.0005). We systematically evaluated the relationships between markers of the 17q21 locus and transcript levels of genes in Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines from children in the asthma family panel used in our association study. The SNPs associated with childhood asthma were consistently and strongly associated (P < 10^-22) in cis with transcript levels of ORMDL3, a member of a gene family that encodes transmembrane proteins anchored in the endoplasmic reticulum (3). The results indicate that genetic variants regulating ORMDL3 expression are determinants of susceptibility to childhood asthma.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/62682/1/nature06014.pd

    Genome-Wide Mapping of Susceptibility to Coronary Artery Disease Identifies a Novel Replicated Locus on Chromosome 17

    Coronary artery disease (CAD) is a leading cause of death world-wide, and most cases have a complex, multifactorial aetiology that includes a substantial heritable component. Identification of new genes involved in CAD may inform pathogenesis and provide new therapeutic targets. The PROCARDIS study recruited 2,658 affected sibling pairs (ASPs) with onset of CAD before age 66 y from four European countries to map susceptibility loci for CAD. ASPs were defined as having CAD phenotype if both had CAD, or myocardial infarction (MI) phenotype if both had an MI. In a first study, involving a genome-wide linkage screen, tentative loci were mapped to Chromosomes 3 and 11 with the CAD phenotype (1,464 ASPs), and to Chromosome 17 with the MI phenotype (739 ASPs). In a second study, these loci were examined with a dense panel of grid-tightening markers in an independent set of families (1,194 CAD and 344 MI ASPs). This replication study showed a significant result on Chromosome 17 (MI phenotype; p = 0.009 after adjustment for three independent replication tests). An exclusion analysis suggests that further genes of effect size λ(sib) > 1.24 are unlikely to exist in these populations of European ancestry. To our knowledge, this is the first genome-wide linkage analysis to map, and replicate, a CAD locus. The region on Chromosome 17 provides a compelling target within which to identify novel genes underlying CAD. Understanding the genetic aetiology of CAD may lead to novel preventative and/or therapeutic strategies.

    Heterogeneity in Meta-Analyses of Genome-Wide Association Investigations

    BACKGROUND: Meta-analysis is the systematic and quantitative synthesis of effect sizes and the exploration of their diversity across different studies. Meta-analyses are increasingly applied to synthesize data from genome-wide association (GWA) studies and from other teams that try to replicate the genetic variants that emerge from such investigations. Between-study heterogeneity is important to document and may point to interesting leads. METHODOLOGY/PRINCIPAL FINDINGS: To exemplify these issues, we used data from three GWA studies on type 2 diabetes and their replication efforts where meta-analyses of all data using fixed effects methods (not incorporating between-study heterogeneity) have already been published. We considered 11 polymorphisms that at least one of the three teams has suggested as susceptibility loci for type 2 diabetes. The I² inconsistency metric (measuring the amount of heterogeneity not due to chance) was different from 0 (no detectable heterogeneity) for 6 of the 11 genetic variants; inconsistency was moderate to very large (I² = 32-77%) for 5 of them. For these 5 polymorphisms, random effects calculations incorporating between-study heterogeneity revealed more conservative p-values for the summary effects compared with the fixed effects calculations. These 5 associations were perused in detail to highlight potential explanations for between-study heterogeneity. These include identification of a marker for a correlated phenotype (e.g., FTO rs8050136 being associated with type 2 diabetes through its effect on obesity); differential linkage disequilibrium across studies of the identified genetic markers with the respective culprit polymorphisms (e.g., possibly the case for CDKAL1 polymorphisms or for rs9300039 and markers in linkage disequilibrium, as shown by additional studies); and potential bias. Results were largely similar when we treated the discovery and replication data from each GWA investigation as separate studies. SIGNIFICANCE: Between-study heterogeneity is useful to document in the synthesis of data from GWA investigations and can offer valuable insights for further clarification of gene-disease associations.
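The abstract above contrasts fixed effects and random effects summaries and reports the I² inconsistency metric. The sketch below shows one common way these quantities are computed: inverse-variance weighting, Cochran's Q, I² = max(0, (Q - df) / Q), and the DerSimonian-Laird estimate of between-study variance. It is an illustration under stated assumptions, not the authors' code: the abstract does not say which random effects estimator was used, the function name `meta_analysis` is hypothetical, and the input numbers are invented.

```python
import math
from typing import Dict, Sequence


def meta_analysis(effects: Sequence[float], std_errors: Sequence[float]) -> Dict[str, float]:
    """Fixed and random effects summaries with Cochran's Q and I^2.

    effects: per-study effect estimates (e.g. log odds ratios).
    std_errors: their standard errors, all strictly positive.
    The random effects model uses the DerSimonian-Laird estimator (an assumption).
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Inverse-variance weights for the fixed effects model.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed effects summary.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # I^2: percentage of variability beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau^2 to each study's variance.
    rw = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_rw = sum(rw)
    random_effect = sum(wi * yi for wi, yi in zip(rw, effects)) / sum_rw

    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random_effect,
        "random_se": math.sqrt(1.0 / sum_rw),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Invented log odds ratios from three hypothetical studies.
result = meta_analysis([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
print(result)
```

When tau² is greater than zero, every random effects weight shrinks, so the summary standard error grows. That is the mechanism behind the more conservative p-values the abstract reports for the heterogeneous polymorphisms.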

    Meta-analysis of genome-wide association studies from the CHARGE consortium identifies common variants associated with carotid intima media thickness and plaque

    Carotid intima media thickness (cIMT) and plaque determined by ultrasonography are established measures of subclinical atherosclerosis that each predicts future cardiovascular disease events. We conducted a meta-analysis of genome-wide association data in 31,211 participants of European ancestry from nine large studies in the setting of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. We then sought additional evidence to support our findings among 11,273 individuals using data from seven additional studies. In the combined meta-analysis, we identified three genomic regions associated with common carotid intima media thickness and two different regions associated with the presence of carotid plaque (P < 5 × 10^-8). The associated SNPs mapped in or near genes related to cellular signaling, lipid metabolism and blood pressure homeostasis, and two of the regions were associated with coronary artery disease (P < 0.006) in the Coronary Artery Disease Genome-Wide Replication and Meta-Analysis (CARDIoGRAM) consortium. Our findings may provide new insight into pathways leading to subclinical atherosclerosis and subsequent cardiovascular events.