374 research outputs found

    An evaluation of the replicate pool method: quick estimation of genome-wide linkage peak p -values

    The calculation of empirical p-values for genome-wide non-parametric linkage tests continues to present significant computational challenges for many complex disease mapping studies. The gold standard approach is to use gene dropping to simulate null genome scans. Unfortunately, this approach is too computationally expensive for many data sets of interest. An alternative, more efficient method for sampling null genome scans is to pre-calculate pools of family-specific statistics and then resample from these replicate pools to generate “pseudo-replicate” genome scans. In this study, we use simulations to explore properties of the replicate pool p-value estimator p̂_RP and show that it provides an excellent approximation to the traditional gene-dropping estimator for significantly less computational effort. While the computational efficiency of the replicate pool estimator is noticeable in almost all data sets, by applying the replicate pool method to several previously characterized data sets we show that savings in computational effort can be especially significant (on the order of 10,000-fold or more) when one or more large families are analyzed. We also estimate replicate pool p-values for the schizophrenia data described by Abecasis et al. and show that p̂_RP closely approximates gene-drop p-values for all linkage peaks reported for this study. Lastly, we expand upon Song et al.'s previous work by deriving a conservative estimator of the variance for p̂_RP that can easily be computed in practical settings. We have implemented the replicate pool method along with our variance estimator in a new program called Pseudo, which is the first widely available automated implementation of the replicate pool method. Genet. Epidemiol. 30, 2006. © 2006 Wiley-Liss, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/50657/1/20147_ftp.pd
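The resampling idea in this abstract can be illustrated with a minimal sketch. This is not the Pseudo program and not the authors' estimator: the function name, the `pools[family][replicate][position]` layout, and the plain sum used to combine families are all assumptions made for illustration only.

```python
import random


def replicate_pool_pvalue(pools, observed_peak, n_pseudo=1000, seed=0):
    """Estimate a genome-wide peak p-value by resampling replicate pools.

    pools[f][r][m] is a pre-computed null statistic for family f,
    pool replicate r, and map position m (hypothetical layout).
    """
    rng = random.Random(seed)
    n_positions = len(pools[0][0])
    hits = 0
    for _ in range(n_pseudo):
        # Draw one pre-computed null replicate per family, independently.
        draws = [rng.choice(family_pool) for family_pool in pools]
        # Combine families position by position. A plain sum is an
        # assumption; real linkage statistics are combined differently.
        scan = [sum(d[m] for d in draws) for m in range(n_positions)]
        # Count pseudo-replicate scans whose highest peak reaches the
        # observed peak.
        if max(scan) >= observed_peak:
            hits += 1
    return hits / n_pseudo
```

Because each family's null statistics are computed once and then reused, the cost of generating a pseudo-replicate scan is a handful of lookups and sums rather than a fresh gene-dropping simulation, which is consistent with the savings the abstract reports for large families.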

    Relative impact of indels versus SNPs on complex disease

    It is unclear whether insertions and deletions (indels) are more likely to influence complex traits than abundant single‐nucleotide polymorphisms (SNPs). We sought to understand which category of variation is more likely to impact health. Using the SardiNIA study as an exemplar, we characterized 478,876 common indels and 8,246,244 common SNPs in up to 5,949 well‐phenotyped individuals from an isolated valley in Sardinia. We assessed association between 120 traits, resulting in 89 nonoverlapping‐associated loci. We evaluated whether indels were enriched among credible sets of potential causal variants. These credible sets included 1,319 SNPs and 88 indels. We did not find indels to be significantly enriched. Indels were the most likely causal variant in seven loci, including one locus associated with monocyte count where an indel with causality and mechanism previously demonstrated (rs200748895:TGCTG/T) had a 0.999 posterior probability. Overall, our results show a very modest and nonsignificant enrichment for common indels in associated loci. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/147866/1/gepi22175_am.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/147866/2/gepi22175-sup-0001-Gagliano-Supplementary.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/147866/3/gepi22175.pd

    Comparing variant calling algorithms for target-exon sequencing in a large sample

    Background: Sequencing studies of exonic regions aim to identify rare variants contributing to complex traits. With high coverage and large sample size, these studies tend to apply simple variant calling algorithms. However, coverage is often heterogeneous; sites with insufficient coverage may benefit from sophisticated calling algorithms used in low-coverage sequencing studies. We evaluate the potential benefits of different calling strategies by performing a comparative analysis of variant calling methods on exonic data from 202 genes sequenced at 24x in 7,842 individuals. We call variants using individual-based, population-based and linkage disequilibrium (LD)-aware methods with stringent quality control. We measure genotype accuracy by the concordance with on-target GWAS genotypes and between 80 pairs of sequencing replicates. We validate selected singleton variants using capillary sequencing. Results: Using these calling methods, we detected over 27,500 variants at the targeted exons; >57% were singletons. The singletons identified by individual-based analyses were of the highest quality. However, individual-based analyses generated more missing genotypes (4.72%) than population-based (0.47%) and LD-aware (0.17%) analyses. Moreover, individual-based genotypes were the least concordant with array-based genotypes and replicates. Population-based genotypes were less concordant than genotypes from LD-aware analyses with extended haplotypes. We reanalyzed the same dataset with a second set of callers and showed again that the individual-based caller identified more high-quality singletons than the population-based caller. We also replicated this result in a second dataset of 57 genes sequenced at 127.5x in 3,124 individuals. Conclusions: We recommend population-based analyses for high quality variant calls with few missing genotypes. With extended haplotypes, LD-aware methods generate the most accurate and complete genotypes. In addition, individual-based analyses should complement the above methods to obtain the most singleton variants. http://deepblue.lib.umich.edu/bitstream/2027.42/110906/1/12859_2015_Article_489.pd

    Comparing variant calling algorithms for target-exon sequencing in a large sample

    Background: Sequencing studies of exonic regions aim to identify rare variants contributing to complex traits. With high coverage and large sample size, these studies tend to apply simple variant calling algorithms. However, coverage is often heterogeneous; sites with insufficient coverage may benefit from sophisticated calling algorithms used in low-coverage sequencing studies. We evaluate the potential benefits of different calling strategies by performing a comparative analysis of variant calling methods on exonic data from 202 genes sequenced at 24x in 7,842 individuals. We call variants using individual-based, population-based and linkage disequilibrium (LD)-aware methods with stringent quality control. We measure genotype accuracy by the concordance with on-target GWAS genotypes and between 80 pairs of sequencing replicates. We validate selected singleton variants using capillary sequencing. Results: Using these calling methods, we detected over 27,500 variants at the targeted exons; >57% were singletons. The singletons identified by individual-based analyses were of the highest quality. However, individual-based analyses generated more missing genotypes (4.72%) than population-based (0.47%) and LD-aware (0.17%) analyses. Moreover, individual-based genotypes were the least concordant with array-based genotypes and replicates. Population-based genotypes were less concordant than genotypes from LD-aware analyses with extended haplotypes. We reanalyzed the same dataset with a second set of callers and showed again that the individual-based caller identified more high-quality singletons than the population-based caller. We also replicated this result in a second dataset of 57 genes sequenced at 127.5x in 3,124 individuals. Conclusions: We recommend population-based analyses for high quality variant calls with few missing genotypes. With extended haplotypes, LD-aware methods generate the most accurate and complete genotypes. In addition, individual-based analyses should complement the above methods to obtain the most singleton variants. http://deepblue.lib.umich.edu/bitstream/2027.42/134735/1/12859_2015_Article_489.pd

    Genomic variation in a global village: Report of the 10th annual Human Genome Variation Meeting 2008

    The Centre for Applied Genomics of the Hospital for Sick Children and the University of Toronto hosted the 10th Human Genome Variation (HGV) Meeting in Toronto, Canada, in October 2008, welcoming about 240 registrants from 34 countries. During the 3 days of plenary workshops, keynote address, and poster sessions, a strong cross-disciplinary trend was evident, integrating expertise from technology and computation, through biology and medicine, to ethics and law. Single nucleotide polymorphisms (SNPs) as well as the larger copy number variants (CNVs) are recognized by ever-improving array and next-generation sequencing technologies, and the data are being incorporated into studies that are increasingly genome-wide as well as global in scope. A greater challenge is to convert data to information, through databases, and to use the information for greater understanding of human variation. In the wake of publications of the first individual genome sequences, an inaugural public forum provided the opportunity to debate whether we are ready for personalized medicine through direct-to-consumer testing. The HGV meetings foster collaboration, and fruits of the interactions from 2008 are anticipated for the 11th annual meeting in September 2009. Hum Mutat 30:1–5, 2009. © 2009 Wiley-Liss, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/63049/1/21008_ftp.pd

    AbCD: arbitrary coverage design for sequencing-based genetic studies

    Summary: Recent advances in sequencing technologies have revolutionized genetic studies. Although high-coverage sequencing can uncover most variants present in the sequenced sample, low-coverage sequencing is appealing for its cost effectiveness. Here, we present AbCD (arbitrary coverage design) to aid the design of sequencing-based studies. AbCD is a user-friendly interface providing pre-estimated effective sample sizes, specific to each minor allele frequency category, for designs with arbitrary coverage (0.5–30×) and sample size (20–10 000), and for four major ethnic groups (Europeans, Africans, Asians and African Americans). In addition, we also present two software tools: ShotGun and DesignPlanner, which were used to generate the estimates behind AbCD. ShotGun is a flexible short-read simulator for arbitrary user-specified read length and average depth, allowing cycle-specific sequencing error rates and realistic read depth distributions. DesignPlanner is a full pipeline that uses ShotGun to generate sequence data and performs initial SNP discovery, uses our previously presented linkage disequilibrium-aware method to call genotypes, and, finally, provides minor allele frequency-specific effective sample sizes. ShotGun plus DesignPlanner can accommodate effective sample size estimate for any combination of high-depth and low-depth data (for example, whole-genome low-depth plus exonic high-depth) or combination of sequence and genotype data [for example, whole-exome sequencing plus genotyping from existing Genomewide Association Study (GWAS)].

    Molecular Dissection of Psoriasis: Integrating Genetics and Biology

    Psoriasis is a common and debilitating disease of the skin, nails, and joints, with an acknowledged but complex genetic basis. Early genome-wide linkage studies of psoriasis focused on segregation of microsatellite markers in families; however, the only locus consistently identified resided in the major histocompatibility complex. Subsequently, several groups mapped this locus to the vicinity of HLA-C, and two groups have reported HLA-Cw6 itself to be the major susceptibility allele. More recently, the development of millions of single-nucleotide polymorphisms, coupled with the development of high-throughput genotyping platforms and a comprehensive map of human haplotypes, has made possible a genome-wide association approach using cases and controls rather than families. Taking advantage of these developments, we participated in a collaborative genome-wide association study of psoriasis involving thousands of cases and controls. Initial analysis of these data revealed and/or confirmed association between psoriasis and seven genetic loci (HLA-C, IL12B, IL23R, IL23A, IL4/IL13, TNFAIP3, and TNIP1), and ongoing studies are revealing additional loci. Here, we review the epidemiology, immunopathology, and genetics of psoriasis, and present a disease model integrating its genetics and immunology.