33 research outputs found

    Decoding of Superimposed Traces Produced by Direct Sequencing of Heterozygous Indels

    Direct Sanger sequencing of a diploid template containing a heterozygous insertion or deletion results in a difficult-to-interpret mixed trace formed by two allelic traces superimposed onto each other. Existing computational methods for deconvolution of such traces require knowledge of a reference sequence or the availability of both direct and reverse mixed sequences of the same template. We describe a simple yet accurate method, which uses dynamic programming optimization to predict superimposed allelic sequences solely from a string of letters representing peaks within an individual mixed trace. We used the method to decode 104 human traces (mean length 294 bp) containing heterozygous indels of 5 to 30 bp, with a mean of 99.1% of bases per allelic sequence reconstructed correctly and unambiguously. Simulations with artificial sequences have demonstrated that the method yields accurate reconstructions when (1) the allelic sequences forming the mixed trace are sufficiently similar, (2) the analyzed fragment is significantly longer than the indel, and (3) multiple indels, if present, are well-spaced. Because these conditions occur in most encountered DNA sequences, the method is widely applicable. It is available as a free Web application Indelligent at http://ctap.inhs.uiuc.edu/dmitriev/indel.asp
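
    To make the input to such a decoder concrete, the sketch below builds the "string of letters representing peaks" for a mixed trace by superimposing two allelic sequences and writing each two-base position with its standard IUPAC ambiguity code. It only illustrates how the mixed string arises; it is not the dynamic programming decoder described in the abstract. The function name `superimpose` and the example sequences are hypothetical.

```python
# IUPAC codes for the base (or pair of bases) seen at one trace position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
}


def superimpose(allele1: str, allele2: str) -> str:
    """Return the peak letters for two superimposed allelic sequences.

    Positions are paired from the start of the read. As a simplification,
    the result is truncated to the length of the shorter allele.
    """
    return "".join(IUPAC[frozenset((a, b))] for a, b in zip(allele1, allele2))


# Hypothetical example: allele2 lacks the 3-base segment "GGA" of allele1.
allele1 = "ACGTGGATTCAGC"
allele2 = "ACGTTTCAGC"
mixed = superimpose(allele1, allele2)
print(mixed)  # ACGTKKMWKC
```

    The first four letters are unambiguous because both alleles agree there; after the deletion point the two alleles are out of phase, so most positions show two bases at once.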

    A meta-analysis of genome-wide association studies identifies 17 new Parkinson's disease risk loci

    Common variant genome-wide association studies (GWASs) have, to date, identified >24 risk loci for Parkinson's disease (PD). To discover additional loci, we carried out a GWAS comparing 6,476 PD cases with 302,042 controls, followed by a meta-analysis with a recent study of over 13,000 PD cases and 95,000 controls at 9,830 overlapping variants. We then tested 35 loci (P < 1 × 10^−6) in a replication cohort of 5,851 cases and 5,866 controls. We identified 17 novel risk loci (P < 5 × 10^−8) in a joint analysis of 26,035 cases and 403,190 controls. We used a neurocentric strategy to assign candidate risk genes to the loci. We identified protein-altering or cis–expression quantitative trait locus (cis-eQTL) variants in linkage disequilibrium with the index variant in 29 of the 41 PD loci. These results indicate a key role for autophagy and lysosomal biology in PD risk, and suggest potential new drug targets for PD

    Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

    With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu
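
    The general idea behind MVN-based correction can be sketched as a plain Monte Carlo estimate: draw null z-scores from a multivariate normal with the markers' correlation matrix and count how often the strongest marker exceeds the pointwise threshold. This is not SLIDE itself: it uses no sliding window and no tail correction. The function names and the AR(1)-style correlation matrix are assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np


def mvn_corrected_pvalue(p_pointwise, corr, n_samples=20000, seed=0):
    """Estimate P(min p-value over all markers <= p_pointwise) under the null.

    Null z-scores are sampled from N(0, corr); the corrected p-value is the
    fraction of samples whose largest |z| reaches the two-sided threshold.
    """
    rng = np.random.default_rng(seed)
    z_cut = NormalDist().inv_cdf(1.0 - p_pointwise / 2.0)
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n_samples)
    return float(np.mean(np.abs(z).max(axis=1) >= z_cut))


def ar1_corr(n_markers, rho):
    """Toy correlation matrix (an assumption): corr[i, j] = rho ** |i - j|."""
    idx = np.arange(n_markers)
    return rho ** np.abs(idx[:, None] - idx[None, :])


# 20 markers, pointwise p = 0.01: independent versus strongly correlated.
p_indep = mvn_corrected_pvalue(0.01, ar1_corr(20, 0.0))
p_corr = mvn_corrected_pvalue(0.01, ar1_corr(20, 0.9))
print(p_indep, p_corr)
```

    With independent markers the estimate should sit near 1 - 0.99^20 (about 0.18), while correlated markers behave like fewer effective tests and give a smaller corrected p-value.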

    The Genetic Interpretation of Area under the ROC Curve in Genomic Profiling

    Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operator characteristic (ROC) curve is a well established measure for determining the efficacy of tests in correctly classifying diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the proportion of genetic variance explained by the test is 100%, there is a maximum value for AUC that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be reversed to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC = 0.75; this varied from 0.10 to 0.74. We provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles. We provide a strategy to estimate proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk) available as an online calculator
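
    The ceiling on AUC can be illustrated by simulation rather than by the paper's analytic expression, which is not reproduced here. The sketch assumes a standard liability threshold model (normally distributed liability, with disease when liability exceeds a threshold set by prevalence) and scores each individual by their true genetic value, so the predictor explains 100% of genetic variance. The function name and parameter values are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def simulate_auc(h2, prevalence, n=200_000, seed=0):
    """Simulated AUC of a perfect genetic predictor under a liability model.

    h2 is heritability on the liability scale; liability = g + e with
    Var(g) = h2 and Var(e) = 1 - h2.
    """
    rng = np.random.default_rng(seed)
    g = rng.normal(0.0, np.sqrt(h2), n)          # true genetic value
    e = rng.normal(0.0, np.sqrt(1.0 - h2), n)    # non-genetic component
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    affected = (g + e) > threshold

    # AUC from the rank-sum (Mann-Whitney U) statistic of g in cases.
    ranks = np.empty(n)
    ranks[np.argsort(g)] = np.arange(1, n + 1)
    n_case = int(affected.sum())
    n_ctrl = n - n_case
    u = ranks[affected].sum() - n_case * (n_case + 1) / 2.0
    return float(u / (n_case * n_ctrl))


auc_low = simulate_auc(h2=0.3, prevalence=0.01)
auc_high = simulate_auc(h2=0.8, prevalence=0.01)
print(auc_low, auc_high)
```

    Even though the predictor is the true genetic value, AUC stays below 1 whenever heritability is below 1, and it rises with heritability, in line with the abstract's statement that maximum AUC depends on heritability and prevalence.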

    Genomewide Association Study for Determinants of HIV-1 Acquisition and Viral Set Point in HIV-1 Serodiscordant Couples with Quantified Virus Exposure

    Host genetic factors may be important determinants of HIV-1 sexual acquisition. We performed a genome-wide association study (GWAS) for host genetic variants modifying HIV-1 acquisition and viral control in the context of a cohort of African HIV-1 serodiscordant heterosexual couples. To minimize misclassification of HIV-1 risk, we quantified HIV-1 exposure, using data including plasma HIV-1 concentrations, gender, and condom use. We matched couples without HIV-1 seroconversion to those with seroconversion by quantified HIV-1 exposure risk. Logistic regression of single nucleotide polymorphisms (SNPs) for 798 samples from 496 HIV-1 infected and 302 HIV-1 exposed, uninfected individuals was performed to identify factors associated with HIV-1 acquisition. In addition, a linear regression analysis was performed using SNP data from a subset (n = 403) of HIV-1 infected individuals to identify factors predicting plasma HIV-1 concentrations. After correcting for multiple comparisons, no SNPs were significantly associated with HIV-1 infection status or plasma HIV-1 concentrations. This GWAS controlling for HIV-1 exposure did not identify common host genotypes influencing HIV-1 acquisition. Alternative strategies, such as large-scale sequencing to identify low frequency variation, should be considered for identifying novel host genetic predictors of HIV-1 acquisition

    Predisposition to Cancer Caused by Genetic and Functional Defects of Mammalian Atad5

    ATAD5, the human ortholog of yeast Elg1, plays a role in PCNA deubiquitination. Since PCNA modification is important to regulate DNA damage bypass, ATAD5 may be important for suppression of genomic instability in mammals in vivo. To test this hypothesis, we generated heterozygous (Atad5+/m) mice that were haploinsufficient for Atad5. Atad5+/m mice displayed high levels of genomic instability in vivo, and Atad5+/m mouse embryonic fibroblasts (MEFs) exhibited molecular defects in PCNA deubiquitination in response to DNA damage, as well as DNA damage hypersensitivity and high levels of genomic instability, apoptosis, and aneuploidy. Importantly, 90% of haploinsufficient Atad5+/m mice developed tumors, including sarcomas, carcinomas, and adenocarcinomas, between 11 and 20 months of age. High levels of genomic alterations were evident in tumors that arose in the Atad5+/m mice. Consistent with a role for Atad5 in suppressing tumorigenesis, we also identified somatic mutations of ATAD5 in 4.6% of sporadic human endometrial tumors, including two nonsense mutations that resulted in loss of proper ATAD5 function. Taken together, our findings indicate that loss-of-function mutations in mammalian Atad5 are sufficient to cause genomic instability and tumorigenesis

    A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose.

    We report the first genome-wide association study (GWAS) whose sample size (1,053 Swedish subjects) is sufficiently powered to detect genome-wide significance (p < 1.5 × 10^-7) for polymorphisms that modestly alter therapeutic warfarin dose. The anticoagulant drug warfarin is widely prescribed for reducing the risk of stroke, thrombosis, pulmonary embolism, and coronary malfunction. However, Caucasians vary widely (20-fold) in the dose needed for therapeutic anticoagulation, and hence prescribed doses may be too low (risking serious illness) or too high (risking severe bleeding). Prior work established that approximately 30% of the dose variance is explained by single nucleotide polymorphisms (SNPs) in the warfarin drug target VKORC1 and another approximately 12% by two non-synonymous SNPs (*2, *3) in the cytochrome P450 warfarin-metabolizing gene CYP2C9. We initially tested each of 325,997 GWAS SNPs for association with warfarin dose by univariate regression and found the strongest statistical signals (p < 10^-78) at SNPs clustering near VKORC1 and the second lowest p-values (p < 10^-31) emanating from CYP2C9. No other SNPs approached genome-wide significance. To enhance detection of weaker effects, we conducted multiple regression adjusting for known influences on warfarin dose (VKORC1, CYP2C9, age, gender) and identified a single SNP (rs2108622) with genome-wide significance (p = 8.3 × 10^-10) that alters protein coding of the CYP4F2 gene. We confirmed this result in 588 additional Swedish patients (p < 0.0029) and, during our investigation, a second group provided independent confirmation from a scan of warfarin-metabolizing genes. We also thoroughly investigated copy number variations, haplotypes, and imputed SNPs, but found no additional highly significant warfarin associations. We present power analysis of our GWAS that is generalizable to other studies, and conclude we had 80% power to detect genome-wide significance for common causative variants or markers explaining at least 1.5% of dose variance. These GWAS results provide further impetus for conducting large-scale trials assessing patient benefit from genotype-based forecasting of warfarin dose
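
    The quoted genome-wide threshold is numerically close to a Bonferroni correction over the 325,997 tested SNPs. The abstract does not say how the threshold was derived, so the family-wise error rate of 0.05 used below is an assumption, and the helper name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests


n_snps = 325_997   # SNPs tested, from the abstract
alpha = 0.05       # assumed family-wise error rate (not stated in the abstract)

threshold = bonferroni_threshold(alpha, n_snps)
print(f"{threshold:.2e}")  # 1.53e-07

# The CYP4F2 SNP rs2108622 (p = 8.3e-10) clears this threshold.
cyp4f2_significant = 8.3e-10 < threshold
print(cyp4f2_significant)  # True
```

    Under that assumption the threshold rounds to the 1.5 × 10^-7 figure given in the abstract.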