190 research outputs found

    The Decay of Disease Association with Declining Linkage Disequilibrium: A Fine Mapping Theorem

    Several important and fundamental aspects of disease genetics models have yet to be described. One such property is the relationship between disease association statistics at a marker site and those at a closely linked disease-causing site. A complete description of this two-locus system is of particular importance to experimental efforts to fine map association signals for complex diseases. Here, we present a simple relationship between disease association statistics and the decline of linkage disequilibrium from a causal site. Specifically, the ratio of chi-square disease association statistics at a marker site and causal site is equivalent to the standard measure of pairwise linkage disequilibrium, r². A complete derivation of this relationship from a general disease model is shown. Quite interestingly, this relationship holds across all modes of inheritance. Extensive Monte Carlo simulations using a disease genetics model applied to chromosomes subjected to a standard model of recombination are employed to better understand the variation around this fine mapping theorem due to sampling effects. We also use this relationship to provide a framework for estimating properties of a non-interrogated causal site using data at closely linked markers. Lastly, we apply this way of examining association data from high-density genotyping in a large, publicly available data set investigating extreme BMI. We anticipate that understanding the patterns of disease association decay with declining linkage disequilibrium from a causal site will enable more powerful fine mapping methods and provide new avenues for identifying causal sites/genes from fine-mapping studies.
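The relationship stated in this abstract (marker chi-square divided by causal-site chi-square equals r²) can be illustrated with a minimal sketch. The function names below are hypothetical and not taken from the paper, and the arithmetic holds only in expectation: the abstract itself notes that sampling effects introduce variation around the relationship.

```python
def expected_marker_chi2(causal_chi2: float, r2: float) -> float:
    """Expected chi-square at a marker, given the causal-site chi-square
    and the pairwise linkage disequilibrium r^2 between the two sites."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    return causal_chi2 * r2


def implied_causal_chi2(marker_chi2: float, r2: float) -> float:
    """Chi-square implied at an untyped causal site, given the statistic
    observed at a linked marker. Undefined when r^2 is zero."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    return marker_chi2 / r2


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(expected_marker_chi2(40.0, 0.5))
    print(implied_causal_chi2(20.0, 0.5))
```

The second function corresponds loosely to the abstract's idea of estimating properties of a non-interrogated causal site from closely linked markers; a real analysis would also need to account for sampling variance.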

    Utilizing Genotype Imputation for the Augmentation of Sequence Data

    In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably, with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have resulted in the implication of over 1500 SNPs associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci. A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative option would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants using data from both a candidate gene sequencing study and the 1000 Genomes Project. Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines, which take advantage of the recent advances in genotype imputation methodology: select the largest and most diverse reference panel for sequencing and genotype as many "anchor" markers as possible.

    A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants

    Advanced age-related macular degeneration (AMD) is the leading cause of blindness in the elderly with limited therapeutic options. Here, we report on a study of >12 million variants including 163,714 directly genotyped, mostly rare, protein-altering variants. Analyzing 16,144 patients and 17,832 controls, we identify 52 independently associated common and rare variants (P < 5×10⁻⁸) distributed across 34 loci. While wet and dry AMD subtypes exhibit predominantly shared genetics, we identify the first signal specific to wet AMD, near MMP9 (difference-P = 4.1×10⁻¹⁰). Very rare coding variants (frequency < 0.1%) in CFH, CFI, and TIMP3 suggest causal roles for these genes, as does a splice variant in SLC16A8. Our results support the hypothesis that rare coding variants can pinpoint causal genes within known genetic loci and illustrate that applying the approach systematically to detect new loci requires extremely large sample sizes.

    Methods for meta‐analysis of multiple traits using GWAS summary statistics

    Genome‐wide association studies (GWAS) for complex diseases have focused primarily on single‐trait analyses for disease status and disease‐related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL‐cholesterol, HDL‐cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual‐level data. Here, we develop metaUSAT (where USAT is unified score‐based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual‐level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P‐value for association and is computationally efficient for implementation at a genome‐wide level. Simulation experiments show that metaUSAT maintains proper type‐I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. 
When applied to plasma lipids summary data from the METSIM and the T2D‐GENES studies, metaUSAT detected genome‐wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/142462/1/gepi22105_am.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/142462/2/gepi22105.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/142462/3/gepi22105-sup-0001-SuppMat.pd

    SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research

    OBJECTIVE: Unlocking the data contained within both structured and unstructured components of electronic health records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management, and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs. METHODS: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualized mentions of a wide range of biomedical concepts within EHRs. Natural language processing annotations are further assembled at the patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data are serviced via ontology-based search and analytics interfaces. RESULTS: SemEHR has been deployed at a number of UK hospitals, including the Clinical Record Interactive Search, an anonymized replica of the EHR of the UK South London and Maudsley National Health Service Foundation Trust, one of Europe's largest providers of mental health services. In 2 Clinical Record Interactive Search-based studies, SemEHR achieved 93% (hepatitis C) and 99% (HIV) F-measure results in identifying true positive patients. At King's College Hospital in London, as part of the CogStack program (github.com/cogstack), SemEHR is being used to recruit patients into the UK Department of Health 100 000 Genomes Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast at searching phenotypes; time for recruitment criteria checking was reduced from days to minutes. Validated on open intensive care EHR data, Medical Information Mart for Intensive Care III, the vital signs extracted by SemEHR can achieve around 97% accuracy. 
CONCLUSION: Results from the multiple case studies demonstrate SemEHR's efficiency: weeks or months of work can be done within hours or minutes in some cases. SemEHR provides a more comprehensive view of patients, bringing in more and unexpected insight compared to study-oriented bespoke IE systems. SemEHR is open source, available at https://github.com/CogStack/SemEHR

    A polygenic and phenotypic risk prediction for polycystic ovary syndrome evaluated by phenome-wide association studies

    Context: As many as 75% of patients with polycystic ovary syndrome (PCOS) are estimated to be unidentified in clinical practice. Objective: Utilizing polygenic risk prediction, we aim to identify the phenome-wide comorbidity patterns characteristic of PCOS to improve accurate diagnosis and preventive treatment. Design, Patients, and Methods: Leveraging the electronic health records (EHRs) of 124 852 individuals, we developed a PCOS risk prediction algorithm by combining polygenic risk scores (PRS) with PCOS component phenotypes into a polygenic and phenotypic risk score (PPRS). We evaluated its predictive capability across different ancestries and performed a PRS-based phenome-wide association study (PheWAS) to assess the phenomic expression of the heightened risk of PCOS. Results: The integrated polygenic prediction improved the average performance (pseudo-R2) for PCOS detection by 0.228 (61.5-fold), 0.224 (58.8-fold), 0.211 (57.0-fold) over the null model across European, African, and multi-ancestry participants respectively. The subsequent PRS-powered PheWAS identified a high level of shared biology between PCOS and a range of metabolic and endocrine outcomes, especially with obesity and diabetes: "morbid obesity", "type 2 diabetes", "hypercholesterolemia", "disorders of lipid metabolism", "hypertension", and "sleep apnea" reaching phenome-wide significance. Conclusions: Our study has expanded the methodological utility of PRS in patient stratification and risk prediction, especially in a multifactorial condition like PCOS, across different genetic origins. By utilizing the individual genome-phenome data available from the EHR, our approach also demonstrates that polygenic prediction by PRS can provide valuable opportunities to discover the pleiotropic phenomic network associated with PCOS pathogenesis. Abbreviations: AA, African ancestry; ANOVA, analysis of variance; BMI, body mass index; EA, European ancestry; EHR, electronic health records; eMERGE, electronic Medical Records and Genomics Network; GWAS, genome-wide association study; IBD, identity-by-descent; ICD-CM, International Classification of Diseases, Clinical Modification; LD, linkage disequilibrium; MA, multi-ancestry; MAF, minor allele frequency; NIH, National Institutes of Health; PCA, principal component analysis; PheWAS, phenome-wide association study; PCOS, polycystic ovary syndrome; PPRS, polygenic and phenotypic risk score; PRS, polygenic risk sc

    Genome-wide linkage analysis of 1,233 prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics using novel sumLINK and sumLOD analyses

    BACKGROUND Prostate cancer (PC) is generally believed to have a strong inherited component, but the search for susceptibility genes has been hindered by the effects of genetic heterogeneity. The recently developed sumLINK and sumLOD statistics are powerful tools for linkage analysis in the presence of heterogeneity. METHODS We performed a secondary analysis of 1,233 PC pedigrees from the International Consortium for Prostate Cancer Genetics (ICPCG) using two novel statistics, the sumLINK and sumLOD. For both statistics, dominant and recessive genetic models were considered. False discovery rate (FDR) analysis was conducted to assess the effects of multiple testing. RESULTS Our analysis identified significant linkage evidence at chromosome 22q12, confirming previous findings by the initial conventional analyses of the same ICPCG data. Twelve other regions were identified with genome-wide suggestive evidence for linkage. Seven regions (1q23, 5q11, 5q35, 6p21, 8q12, 11q13, 20p11–q11) are near loci previously identified in the initial ICPCG pooled data analysis or the subset of aggressive PC pedigrees. Three other regions (1p12, 8p23, 19q13) confirm loci reported by others, and two (2p24, 6q27) are novel susceptibility loci. FDR testing indicates that over 70% of these results are likely true positive findings. Statistical recombinant mapping narrowed regions to an average of 9 cM. CONCLUSIONS Our results represent genomic regions with the greatest consistency of positive linkage evidence across a very large collection of high-risk PC pedigrees using new statistical tests that deal powerfully with heterogeneity. These regions are excellent candidates for further study to identify PC predisposition genes. Prostate 70: 735–744, 2010. © 2010 Wiley-Liss, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/71371/1/21106_ftp.pd

    Associations between tamoxifen, estrogens, and FSH serum levels during steady state tamoxifen treatment of postmenopausal women with breast cancer

    Background: The cytochrome P450 (CYP) enzymes 2C19, 2D6, and 3A5 are responsible for converting the selective estrogen receptor modulator (SERM) tamoxifen to its active metabolites 4-hydroxy-tamoxifen (4OHtam) and 4-hydroxy-N-demethyltamoxifen (4OHNDtam, endoxifen). Inter-individual variations of the activity of these enzymes due to polymorphisms may be predictors of outcome of breast cancer patients during tamoxifen treatment. Since tamoxifen and estrogens are both partly metabolized by these enzymes, we hypothesize that a correlation between serum tamoxifen and estrogen levels exists, which in turn may interact with tamoxifen on treatment outcome. Here we examined relationships between the serum levels of tamoxifen, estrogens, and follicle-stimulating hormone (FSH), and also determined the genotypes of CYP2C19, 2D6, 3A5, and SULT1A1 in 90 postmenopausal breast cancer patients. Methods: Tamoxifen and its metabolites were measured by liquid chromatography-tandem mass spectrometry. Estrogen and FSH levels were determined using a sensitive radio- and chemiluminescent immunoassay, respectively. Results: We observed significant correlations between the serum concentrations of tamoxifen, N-dedimethyltamoxifen, and tamoxifen-N-oxide and estrogens (p < 0.05). The genotype-predicted CYP2C19 activity influenced the levels of both tamoxifen metabolites and E1. Conclusions: We have shown an association between tamoxifen and its metabolites and estrogen serum levels. An impact of CYP2C19 predicted activity on tamoxifen, as well as estrogen kinetics, may partly explain the observed association between tamoxifen and its metabolites and estrogen serum levels. Since the role of estrogen levels during tamoxifen therapy is still a matter of debate, further prospective studies to examine the effect of tamoxifen and estrogen kinetics on treatment outcome are warranted.