5,079 research outputs found

    Replication in Genome-Wide Association Studies

    Replication helps ensure that a genotype-phenotype association observed in a genome-wide association (GWA) study represents a credible association and is not a chance finding or an artifact due to uncontrolled biases. We discuss prerequisites for exact replication, issues of heterogeneity, advantages and disadvantages of different methods of data synthesis across multiple studies, frequentist vs. Bayesian inferences for replication, and challenges that arise from multi-team collaborations. While consistent replication can greatly improve the credibility of a genotype-phenotype association, it may not eliminate spurious associations due to biases shared by many studies. Conversely, lack of replication in well-powered follow-up studies usually invalidates the initially proposed association, although occasionally it may point to differences in linkage disequilibrium or effect modifiers across studies. Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/09-STS290 by the Institute of Mathematical Statistics (http://www.imstat.org
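    The heterogeneity and data-synthesis issues mentioned in this abstract are often examined with a fixed-effect summary plus Cochran's Q and the I² statistic. The following is a minimal sketch under that assumption, not the method of the paper itself; the function name `fixed_effect_summary` and the input values are hypothetical.

```python
import math


def fixed_effect_summary(betas, ses):
    """Pool per-study effect estimates (e.g. log odds ratios) with
    inverse-variance weights and report heterogeneity.

    Returns (pooled estimate, pooled standard error, Cochran's Q, I^2).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical discovery and replication estimates that disagree.
pooled, pooled_se, q, i2 = fixed_effect_summary([0.0, 0.4], [0.1, 0.1])
print(round(pooled, 3), round(q, 3), round(i2, 3))
```

    A large I² in a setting like this would suggest that the studies do not estimate a common effect, which is one of the situations where a simple pooled estimate may mislead.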

    Genome-wide association and expression quantitative trait loci studies for thyroid cancer and thyroid nodules

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Medicine, Department of Medicine, February 2020. Park Young Joo (박영주). Thyroid cancer is the most common endocrine cancer and thyroid nodules are the most common endocrine problem in Korea. Both phenotypes show a high degree of heritability. Several genome-wide association studies (GWAS) for thyroid cancer have been conducted in populations of European descent and have identified susceptibility loci for differentiated thyroid cancer (DTC). However, no GWAS for thyroid cancer has been conducted in an Asian population, and inherited genetic risk factors for thyroid nodules and their associations with thyroid cancer remain unknown. Here, a GWAS and a replication study were performed using a total of 1,085 DTC cases and 8,884 controls, all Korean, and the results were validated with an expression quantitative trait loci (eQTL) analysis and clinical phenotypes. The most robust association was observed in the NRG1 gene (rs6996585, P = 1.08 × 10⁻¹⁰), and this SNP was also associated with NRG1 expression in thyroid tissues. In addition, three previously reported loci (FOXE1, NKX2-1, and DIRC3) were confirmed and seven susceptibility loci (VAV3, PCNXL2, INSR, MRSB3, FHIT, SEPT11, and SLC24A6) associated with DTC were newly identified. I also identified specific DTC variants whose effects differ according to cancer type or ethnicity. In addition, a three-stage GWAS for thyroid nodules was performed. The discovery stage involved a genome-wide scan of 811 subjects with thyroid nodules and 691 subjects with a normal thyroid from a population-based cohort. Replication studies were conducted in an additional 1,981 cases and 3,100 controls drawn from health check-up participants. An eQTL analysis was also performed using public data. The most robust association was observed in TRPM3 (rs4745021) in the joint analysis (OR = 1.26, P = 6.12 × 10⁻⁸) and the meta-analysis (OR = 1.28, P = 2.11 × 10⁻⁸). Signals at MBIP/NKX2-1 were replicated but did not reach genome-wide significance in the joint analysis (rs2415317, P = 4.62 × 10⁻⁵; rs944289, P = 8.68 × 10⁻⁵). The eQTL analysis showed that TRPM3 expression was associated with the rs4745021 genotype in thyroid tissues. The results of the GWAS for DTC provide deeper insight into the genetic contribution to thyroid cancer in different populations, and the GWAS for thyroid nodules suggests that thyroid nodules have a genetic predisposition distinct from that of thyroid cancer.

    Table of contents:
    Introduction
        1. Epidemiology of thyroid cancer
        2. Risk factors of differentiated thyroid cancer
        3. Heritability of differentiated thyroid cancer
        4. Familial syndromes associated with thyroid cancer and germline mutation of differentiated thyroid cancer
        5. Epidemiology of thyroid nodule
        6. Clinical significance and heritability of thyroid nodule
        7. Genome-wide association study for differentiated thyroid cancer
        8. Genetic studies for thyroid nodule
        9. Hypothesis
        10. Aims of study
    Chapter I. Genome-wide association and expression quantitative trait loci studies for thyroid cancer
        Materials and methods
            Study participants for the Stage 1 genome scan
            Study participants for the Stage 2 follow-up
            Discovery SNP genotyping and imputation
            Replication SNP selection and genotyping
            RNA sequencing and eQTL analysis
            Statistical analysis
            Ethics statement
        Results
            Stage 1 genome scan
            Stage 2 follow-up and joint Stages 1 and 2 analyses
            Validation of the candidate SNPs with cis-eQTL and GSEA analyses
            Association between candidate SNPs and clinical phenotypes
            The most significantly associated variant in the NRG1 locus
            Other known associated variants in the NKX2-1, DIRC3, or FOXE1 loci
            Novel candidate variants in the VAV3, PCNXL2, INSR, MRSB3, FHIT or SEPT11 loci
            A comparison with the European GWAS results
    Chapter II. Genome-wide association and expression quantitative trait loci studies for thyroid nodule
        Materials and methods
            Discovery series and thyroid ultrasonography
            First replication series and ultrasonography
            Second replication
            Discovery GWAS and imputation
            Candidate SNP and genotyping of first replication
            Genotyping of second replication
            Comparison of allele frequencies between DTC, thyroid nodules, and normal thyroid
            Expression quantitative trait loci analysis
            Statistical analysis
            Ethics statement
        Results
            Discovery GWAS
            Replication studies, joint analysis and meta-analysis
            Comparison of allele frequencies between DTC, thyroid nodules, and normal thyroid
            Expression quantitative trait loci analysis
    Discussion
        GWAS for DTC
        GWAS for thyroid nodule
    Summary and conclusions
    References
    Abstract in Korean
    Docto

    The Role of Environmental Heterogeneity in Meta-Analysis of Gene–Environment Interactions With Quantitative Traits

    With challenges in data harmonization and environmental heterogeneity across various data sources, meta-analysis of gene–environment interaction studies can often involve subtle statistical issues. In this paper, we study the effect of environmental covariate heterogeneity (within and between cohorts) on two approaches for fixed-effect meta-analysis: the standard inverse-variance weighted meta-analysis and a meta-regression approach. Akin to the results in Simmonds and Higgins ( ), we obtain analytic efficiency results for both methods under certain assumptions. The relative efficiency of the two methods depends on the ratio of within versus between cohort variability of the environmental covariate. We propose to use an adaptively weighted estimator (AWE), between meta-analysis and meta-regression, for the interaction parameter. The AWE retains full efficiency of the joint analysis using individual level data under certain natural assumptions. Lin and Zeng (2010a, b) showed that a multivariate inverse-variance weighted estimator retains full efficiency as joint analysis using individual level data, if the estimates with full covariance matrices for all the common parameters are pooled across all studies. We show consistency of our work with Lin and Zeng (2010a, b). Without sacrificing much efficiency, the AWE uses only univariate summary statistics from each study, and bypasses issues with sharing individual level data or full covariance matrices across studies. We compare the performance of the methods both analytically and numerically. The methods are illustrated through meta-analysis of interaction between Single Nucleotide Polymorphisms in FTO gene and body mass index on high-density lipoprotein cholesterol data from a set of eight studies of type 2 diabetes. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/107543/1/gepi21810.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/107543/2/gepi21810-sup-0001-appendix.pd
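    The two fixed-effect approaches contrasted in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the inverse-variance approach pools per-cohort interaction estimates, and that the meta-regression approach regresses per-cohort genetic main-effect estimates on the cohort mean of the environmental covariate, taking the slope as the interaction estimate. The function names and all numbers are hypothetical, and the adaptively weighted estimator (AWE) is not reproduced because the abstract does not give its weights.

```python
import math


def ivw_pool(estimates, ses):
    """Inverse-variance weighted pooling of per-cohort interaction estimates."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def meta_regression_slope(main_effects, ses, cohort_means):
    """Weighted least-squares slope of per-cohort genetic main effects on the
    cohort mean of the environmental covariate (weights = 1 / se^2)."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, cohort_means)) / total
    y_bar = sum(w * y for w, y in zip(weights, main_effects)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, cohort_means))
    if sxx == 0:
        # No between-cohort variation: the slope cannot be estimated.
        raise ValueError("cohort means are identical")
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, cohort_means, main_effects)
    )
    return sxy / sxx, math.sqrt(1.0 / sxx)


# Hypothetical summary statistics from three cohorts.
print(ivw_pool([0.01, 0.03], [0.01, 0.01]))
print(meta_regression_slope([0.5, 0.6, 0.7], [0.05, 0.05, 0.05], [20.0, 25.0, 30.0]))
```

    In this sketch the meta-regression standard error shrinks as the cohort means spread out, while the pooled interaction estimate depends on within-cohort information, which is consistent with the abstract's remark that relative efficiency depends on within versus between cohort variability.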

    Pleiotropy Analysis of Quantitative Traits at Gene Level by Multivariate Functional Linear Models

    In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/111259/1/gepi21895.pd
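    For orientation, the three multivariate statistics named in this abstract have standard textbook definitions, given here as a sketch rather than in the paper's own notation. The symbols are assumptions introduced for illustration: H and E denote the hypothesis and error sums-of-squares-and-cross-products matrices of a multivariate linear model, and λ_1, ..., λ_s are the nonzero eigenvalues of E⁻¹H.

```latex
\begin{align*}
\text{Wilks's Lambda:} \quad
  \Lambda &= \frac{\det(E)}{\det(H + E)} = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i} \\
\text{Pillai--Bartlett trace:} \quad
  V &= \operatorname{tr}\!\left[ H (H + E)^{-1} \right] = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i} \\
\text{Hotelling--Lawley trace:} \quad
  U &= \operatorname{tr}\!\left( H E^{-1} \right) = \sum_{i=1}^{s} \lambda_i
\end{align*}
```

    Each statistic is then converted to an approximate F statistic; the specific conversions used for the functional linear models are not stated in the abstract and are not reproduced here.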

    Heritability enrichment of immunoglobulin G N-glycosylation in specific tissues

    Genome-wide association studies (GWAS) have identified over 60 genetic loci associated with immunoglobulin G (IgG) N-glycosylation; however, the causal genes and their abundance in relevant tissues are uncertain. Leveraging data from GWAS summary statistics for 8,090 Europeans, and large-scale expression quantitative trait loci (eQTL) data from the genotype-tissue expression of 53 types of tissues (GTEx v7), we derived a linkage disequilibrium score for the specific expression of genes (LDSC-SEG) and conducted a transcriptome-wide association study (TWAS). We identified 55 gene associations whose predicted levels of expression were significantly associated with IgG N-glycosylation in 14 tissues. Three working scenarios, i.e., tissue-specific, pleiotropic, and coassociated, were observed for candidate genetic predisposition affecting IgG N-glycosylation traits. Furthermore, pathway enrichment showed several IgG N-glycosylation-related pathways, such as asparagine N-linked glycosylation, N-glycan biosynthesis and transport to the Golgi and subsequent modification. Through phenome-wide association studies (PheWAS), most genetic variants underlying TWAS hits were found to be correlated with health measures (height, waist-hip ratio, systolic blood pressure) and diseases, such as systemic lupus erythematosus, inflammatory bowel disease, and Parkinson's disease, which are related to IgG N-glycosylation. Our study provides an atlas of genetic regulatory loci and their target genes within functionally relevant tissues, for further studies on the mechanisms of IgG N-glycosylation and its related diseases.

    Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO!

    Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, given the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes, the development and application of multivariate methods become crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.