Replication in Genome-Wide Association Studies
Replication helps ensure that a genotype-phenotype association observed in a
genome-wide association (GWA) study represents a credible association and is
not a chance finding or an artifact due to uncontrolled biases. We discuss
prerequisites for exact replication, issues of heterogeneity, advantages and
disadvantages of different methods of data synthesis across multiple studies,
frequentist vs. Bayesian inferences for replication, and challenges that arise
from multi-team collaborations. While consistent replication can greatly
improve the credibility of a genotype-phenotype association, it may not
eliminate spurious associations due to biases shared by many studies.
Conversely, lack of replication in well-powered follow-up studies usually
invalidates the initially proposed association, although occasionally it may
point to differences in linkage disequilibrium or effect modifiers across
studies.
Comment: Published at http://dx.doi.org/10.1214/09-STS290 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
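The abstract above weighs methods of data synthesis across discovery and replication studies. As a minimal sketch, assuming each study reports an effect estimate on an additive scale (for example a log odds ratio) with its standard error, a fixed-effect inverse-variance weighted meta-analysis with Cochran's Q and I² heterogeneity statistics could look like this. The function name `ivw_fixed_effect` and the example numbers are hypothetical, and this is one common synthesis method, not a procedure prescribed by the paper.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis (illustrative sketch).

    betas: per-study effect estimates on an additive scale (e.g. log odds ratios)
    ses:   their standard errors
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    # Cochran's Q and I^2 summarise between-study heterogeneity
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "Q": q, "I2": i2}


if __name__ == "__main__":
    # hypothetical discovery and replication log odds ratios for one SNP
    res = ivw_fixed_effect([0.25, 0.18], [0.05, 0.04])
    print(f"pooled OR = {math.exp(res['beta']):.3f}, P = {res['p']:.2e}, I2 = {res['I2']:.2f}")
```

A large I² flags the between-study heterogeneity the abstract discusses; a small one does not rule out biases shared by all studies, which pooling cannot remove.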
A study of genome-wide association and expression quantitative trait loci for thyroid cancer and nodules
Thesis (Ph.D.)--Seoul National University Graduate School: College of Medicine, Department of Medicine, February 2020. Park Young Joo.
Thyroid cancer is the most common endocrine cancer, and thyroid nodules are the most common endocrine problem, in Korea. Both phenotypes show a high degree of heritability. Several genome-wide association studies (GWAS) of thyroid cancer have been conducted in populations of European descent and have identified susceptibility loci for differentiated thyroid cancer (DTC). However, no GWAS of thyroid cancer has been conducted in an Asian population, and the inherited genetic risk factors for thyroid nodules, and their associations with thyroid cancer, remain unknown.
Here, a GWAS and a replication study were performed in a total of 1,085 Korean DTC cases and 8,884 Korean controls, and the results were validated with expression quantitative trait loci (eQTL) analysis and clinical phenotypes. The most robust association was observed in the NRG1 gene (rs6996585, P = 1.08 × 10^-10), and this SNP was also associated with NRG1 expression in thyroid tissues. In addition, three previously reported loci (FOXE1, NKX2-1, and DIRC3) were confirmed, and seven new DTC susceptibility loci (VAV3, PCNXL2, INSR, MRSB3, FHIT, SEPT11, and SLC24A6) were identified. I also identified DTC-associated variants whose effects differ by cancer type or ethnicity.
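Case-control results like these are reported as per-SNP odds ratios and P-values. A minimal sketch, assuming allele counts are available for cases and controls and using the allelic 2×2 test (one common choice; the abstract does not say which test or covariate-adjusted model was used, and covariate-adjusted logistic regression is more typical in practice), is below. `GENOME_WIDE_ALPHA = 5e-8` is the widely used genome-wide significance convention, assumed here; the function name is hypothetical.

```python
import math

# Conventional genome-wide significance threshold (assumption, not from the thesis abstract)
GENOME_WIDE_ALPHA = 5e-8


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic odds ratio, 95% Wald CI and 1-df Pearson chi-square p-value
    from a 2x2 table of allele counts (cases vs controls)."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    if min(a, b, c, d) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci95 = (math.exp(log_or - 1.96 * se_log_or), math.exp(log_or + 1.96 * se_log_or))
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return {"or": odds_ratio, "ci95": ci95, "chi2": chi2, "p": p,
            "genome_wide": p < GENOME_WIDE_ALPHA}


if __name__ == "__main__":
    # hypothetical allele counts for one SNP
    print(allelic_test(case_risk=1200, case_other=970, ctrl_risk=7300, ctrl_other=10468))
```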
A three-stage GWAS of thyroid nodules was also performed. The discovery stage was a genome-wide scan of 811 subjects with thyroid nodules and 691 subjects with a normal thyroid from a population-based cohort. Replication studies were conducted in an additional 1,981 cases and 3,100 controls drawn from health check-up participants. eQTL analysis was also performed using public data. The most robust association was observed at TRPM3 (rs4745021) in both the joint analysis (OR = 1.26, P = 6.12 × 10^-8) and the meta-analysis (OR = 1.28, P = 2.11 × 10^-8). Signals at MBIP/NKX2-1 were replicated but did not reach genome-wide significance in the joint analysis (rs2415317, P = 4.62 × 10^-5; rs944289, P = 8.68 × 10^-5). In the eQTL analysis, TRPM3 expression in thyroid tissues was associated with the rs4745021 genotype. The DTC GWAS results provide deeper insight into the genetic contribution to thyroid cancer across populations, and the thyroid nodule GWAS suggests that thyroid nodules have a genetic predisposition distinct from that of thyroid cancer.
Introduction
1. Epidemiology of thyroid cancer
2. Risk factors of differentiated thyroid cancer
3. Heritability of differentiated thyroid cancer
4. Familial syndromes associated with thyroid cancer and germline mutation of differentiated thyroid cancer
5. Epidemiology of thyroid nodule
6. Clinical significance and heritability of thyroid nodule
7. Genome-wide association study for differentiated thyroid cancer
8. Genetic studies for thyroid nodule
9. Hypothesis
10. Aims of study
Chapter I. Genome-wide association and expression quantitative trait loci studies for thyroid cancer
Materials and methods
Study participants for the Stage 1 genome scan
Study participants for the Stage 2 follow-up
Discovery SNP genotyping and imputation
Replication SNP selection and genotyping
RNA sequencing and eQTL analysis
Statistical analysis
Ethics statement
Results
Stage 1 genome scan
Stage 2 follow-up and joint Stages 1 and 2 analyses
Validation of the candidate SNPs with cis-eQTL and GSEA analyses
Association between candidate SNPs and clinical phenotypes
The most significantly associated variant in the NRG1 locus
Other known associated variants in the NKX2-1, DIRC3, or FOXE1 loci
Novel candidate variants in the VAV3, PCNXL2, INSR, MRSB3, FHIT or SEPT11 loci
A comparison with the European GWAS results
Chapter II. Genome-wide association and expression quantitative trait loci studies for thyroid nodule
Materials and methods
Discovery series and thyroid ultrasonography
First replication series and ultrasonography
Second replication
Discovery GWAS and Imputation
Candidate SNP and genotyping of first replication
Genotyping of second replication
Comparison of allele frequencies between DTC, thyroid nodules, and normal thyroid
Expression quantitative trait loci analysis
Statistical analysis
Ethics statement
Results
Discovery GWAS
Replication studies, joint analysis and meta-analysis
Comparison of allele frequencies between DTC, thyroid nodules, and normal thyroid
Expression quantitative trait loci analysis
Discussion
GWAS for DTC
GWAS for Thyroid nodule
Summary and conclusions
References
Abstract in Korean
Protein-coding variants implicate novel genes related to lipid homeostasis contributing to body-fat distribution.
Body-fat distribution is a risk factor for adverse cardiovascular health consequences. We analyzed the association of body-fat distribution, assessed by waist-to-hip ratio adjusted for body mass index, with 228,985 predicted coding and splice site variants available on exome arrays in up to 344,369 individuals from five major ancestries (discovery) and 132,177 European-ancestry individuals (validation). We identified 15 common (minor allele frequency, MAF ≥ 5%) and nine low-frequency or rare (MAF < 5%) novel coding variants. Pathway/gene set enrichment analyses identified lipid particle, adiponectin, abnormal white adipose tissue physiology, and bone development and morphology as important contributors to fat distribution, while cross-trait associations highlight cardiometabolic traits. In functional follow-up analyses, specifically Drosophila RNAi knockdowns, we observed a significant increase in total body triglyceride levels for two genes (DNAH10 and PLXND1). We implicate novel genes in fat distribution, stressing the importance of interrogating low-frequency and protein-coding variants.
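The abstract groups variants by a 5% minor allele frequency cut-off. A minimal sketch of that classification, assuming biallelic variants with genotype counts available (both function names are hypothetical), is:

```python
def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """MAF from biallelic genotype counts: reference homozygotes,
    heterozygotes and alternate homozygotes."""
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no genotyped individuals")
    alt_freq = (2 * n_alt_hom + n_het) / (2 * n)
    # the minor allele is whichever allele is rarer in this sample
    return min(alt_freq, 1 - alt_freq)


def frequency_class(maf, threshold=0.05):
    """Label a variant with the 5% MAF cut-off used in the abstract."""
    return "common" if maf >= threshold else "low-frequency or rare"


if __name__ == "__main__":
    # hypothetical genotype counts for two variants
    for counts in [(900, 90, 10), (980, 20, 0)]:
        maf = minor_allele_frequency(*counts)
        print(counts, round(maf, 3), frequency_class(maf))
```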
The Role of Environmental Heterogeneity in Meta-Analysis of Gene-Environment Interactions With Quantitative Traits
With challenges in data harmonization and environmental heterogeneity across various data sources, meta-analysis of gene-environment interaction studies can often involve subtle statistical issues. In this paper, we study the effect of environmental covariate heterogeneity (within and between cohorts) on two approaches for fixed-effect meta-analysis: the standard inverse-variance weighted meta-analysis and a meta-regression approach. Akin to the results in Simmonds and Higgins ( ), we obtain analytic efficiency results for both methods under certain assumptions. The relative efficiency of the two methods depends on the ratio of within- versus between-cohort variability of the environmental covariate. We propose an adaptively weighted estimator (AWE), intermediate between meta-analysis and meta-regression, for the interaction parameter. Under certain natural assumptions, the AWE retains the full efficiency of a joint analysis using individual-level data. Lin and Zeng (2010a, b) showed that a multivariate inverse-variance weighted estimator is as efficient as a joint analysis of individual-level data if the estimates of all common parameters, with their full covariance matrices, are pooled across studies. We show that our results are consistent with Lin and Zeng (2010a, b). Without sacrificing much efficiency, the AWE uses only univariate summary statistics from each study, and so avoids sharing individual-level data or full covariance matrices across studies. We compare the performance of the methods both analytically and numerically. The methods are illustrated through a meta-analysis of the interaction between single nucleotide polymorphisms in the FTO gene and body mass index on high-density lipoprotein cholesterol, using data from a set of eight studies of type 2 diabetes.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/107543/1/gepi21810.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/107543/2/gepi21810-sup-0001-appendix.pd
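To make the two fixed-effect approaches concrete, here is a minimal sketch under the assumption that each cohort reports (i) a within-cohort gene-environment interaction estimate and (ii) a genetic main-effect estimate together with the cohort mean of the environmental covariate. `ivw_pool` pools the within-cohort interaction estimates; `meta_regression_slope` estimates the interaction from between-cohort variation in the covariate by weighted least squares. Both names are hypothetical, and the paper's AWE, which combines the two sources of information, is not reproduced here.

```python
import math


def ivw_pool(estimates, ses):
    """Fixed-effect inverse-variance weighted pooling of per-cohort
    interaction estimates (uses within-cohort information only)."""
    w = [1 / s ** 2 for s in ses]
    est = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    return est, math.sqrt(1 / sum(w))


def meta_regression_slope(main_effects, ses, mean_env):
    """Weighted least-squares slope of per-cohort genetic main effects on the
    cohort mean of the environmental covariate (between-cohort information only)."""
    w = [1 / s ** 2 for s in ses]
    x_bar = sum(wi * x for wi, x in zip(w, mean_env)) / sum(w)
    y_bar = sum(wi * y for wi, y in zip(w, main_effects)) / sum(w)
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, mean_env))
    if sxx == 0:
        raise ValueError("cohort means do not vary; the slope is not identifiable")
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, mean_env, main_effects))
    # with known sampling variances, Var(slope) = 1 / sxx
    return sxy / sxx, math.sqrt(1 / sxx)
```

When the covariate varies mostly within cohorts, `sxx` is small and the between-cohort slope is imprecise; when cohort means differ widely, it carries more information. This mirrors the abstract's point that relative efficiency depends on within- versus between-cohort variability.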
Adjusting for genetic confounders in transcriptome-wide association studies improves discovery of risk genes of complex traits
Many methods have been developed to leverage expression quantitative trait loci (eQTL) data to nominate candidate genes from genome-wide association studies. These methods, including colocalization, transcriptome-wide association studies (TWAS), and Mendelian randomization-based methods, all suffer from a key problem: when the role of a gene in a trait is assessed through its eQTLs, nearby variants and the genetic components of other genes' expression may be correlated with these eQTLs and have direct effects on the trait, acting as potential confounders. Our extensive simulations showed that existing methods fail to account for these "genetic confounders", resulting in severe inflation of false positives. Our new method, causal-TWAS (cTWAS), borrows ideas from statistical fine-mapping and allows us to adjust for all genetic confounders. cTWAS showed calibrated false discovery rates in simulations, and its application to several common traits discovered new candidate genes. In conclusion, cTWAS provides a robust statistical framework for gene discovery.
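As a reference point for what cTWAS extends, the conventional summary-statistic TWAS test combines eQTL-based prediction weights `w`, GWAS z-scores `z` and an LD (correlation) matrix `R` for the same SNPs as z_TWAS = wᵀz / sqrt(wᵀRw). The sketch below computes that statistic under those assumptions; the function name is hypothetical, and it is not cTWAS: it makes no adjustment for the genetic confounders described above.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: w'z / sqrt(w' R w).

    weights: eQTL prediction weights for k SNPs
    gwas_z:  GWAS z-scores for the same k SNPs
    ld:      k x k SNP correlation (LD) matrix as nested lists
    """
    k = len(weights)
    if len(gwas_z) != k or len(ld) != k or any(len(row) != k for row in ld):
        raise ValueError("weights, z-scores and LD matrix must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))
    if variance <= 0:
        raise ValueError("weights have zero variance under the supplied LD matrix")
    return numerator / math.sqrt(variance)


if __name__ == "__main__":
    # hypothetical weights, z-scores and LD for three SNPs
    R = [[1.0, 0.6, 0.2], [0.6, 1.0, 0.3], [0.2, 0.3, 1.0]]
    print(twas_z([0.4, 0.1, -0.2], [5.1, 4.2, 0.5], R))
```

Because the statistic only aggregates signal through the eQTL weights, a nearby causal variant in LD with the weighted SNPs can inflate it even if the gene has no effect, which is the confounding that motivates cTWAS.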
Pleiotropy Analysis of Quantitative Traits at Gene Level by Multivariate Functional Linear Models
In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons, which may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits, adjusting for covariates, in a unified analysis. Three types of approximate F-distribution tests, based on the Pillai–Bartlett trace, the Hotelling–Lawley trace, and Wilks's Lambda, are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than F-tests of univariate analysis and the optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power of the proposed models and tests. We show that the approximate F-distribution tests control type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power compared with testing each trait individually. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts and (2) three biochemical traits in the Trinity Students Study. For the three biochemical traits, the approximate F-distribution tests provide much more significant results than F-tests of univariate analysis and SKAT-O. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of traditional multivariate linear models, which in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more associations than SKAT-O in the univariate case.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/111259/1/gepi21895.pd
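The three statistics named above have standard closed forms in terms of the hypothesis (H) and error (E) sums-of-squares-and-cross-products matrices of a multivariate linear model: Pillai–Bartlett trace tr(H(H+E)⁻¹), Hotelling–Lawley trace tr(HE⁻¹) and Wilks's Lambda det(E)/det(H+E). A minimal sketch, assuming ordinary (non-functional) multivariate regression of the traits on covariates plus a block of genotype columns, is below; the paper's functional linear models and its F approximations are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def multivariate_association_stats(Y, covariates, genotypes):
    """Pillai-Bartlett trace, Hotelling-Lawley trace and Wilks's Lambda for a
    block of genotype columns in a classical multivariate linear model.

    Y:          (n, p) matrix of quantitative traits
    covariates: (n, k) matrix of adjustment covariates (an intercept is added)
    genotypes:  (n, q) matrix of genetic predictors for one region
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    X0 = np.column_stack([np.ones(n), np.asarray(covariates, dtype=float)])
    X1 = np.column_stack([X0, np.asarray(genotypes, dtype=float)])

    def residual_sscp(X):
        # least-squares fit of all traits at once, then residual cross-products
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ coef
        return resid.T @ resid

    E = residual_sscp(X1)        # error SSCP under the full model
    H = residual_sscp(X0) - E    # extra SSCP explained by the genotype block
    return {
        "pillai": float(np.trace(H @ np.linalg.inv(H + E))),
        "hotelling_lawley": float(np.trace(H @ np.linalg.inv(E))),
        "wilks": float(np.linalg.det(E) / np.linalg.det(H + E)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 300
    covariates = rng.normal(size=(n, 2))          # simulated covariates
    genotypes = rng.integers(0, 3, size=(n, 2))   # two simulated SNPs coded 0/1/2
    traits = rng.normal(size=(n, 3)) + 0.2 * genotypes[:, [0]]
    print(multivariate_association_stats(traits, covariates, genotypes))
```

With a single genotype column H has rank one, so Pillai's trace and Wilks's Lambda sum to one; the three statistics differ only when the genetic block has several columns.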
Heritability enrichment of immunoglobulin G N-glycosylation in specific tissues
Genome-wide association studies (GWAS) have identified over 60 genetic loci associated with immunoglobulin G (IgG) N-glycosylation; however, the causal genes and their abundance in relevant tissues are uncertain. Leveraging data from GWAS summary statistics for 8,090 Europeans, and large-scale expression quantitative trait loci (eQTL) data from the genotype-tissue expression of 53 types of tissues (GTEx v7), we derived a linkage disequilibrium score for the specific expression of genes (LDSC-SEG) and conducted a transcriptome-wide association study (TWAS). We identified 55 gene associations whose predicted levels of expression were significantly associated with IgG N-glycosylation in 14 tissues. Three working scenarios, i.e., tissue-specific, pleiotropic, and coassociated, were observed for candidate genetic predisposition affecting IgG N-glycosylation traits. Furthermore, pathway enrichment showed several IgG N-glycosylation-related pathways, such as asparagine N-linked glycosylation, N-glycan biosynthesis and transport to the Golgi and subsequent modification. Through phenome-wide association studies (PheWAS), most genetic variants underlying TWAS hits were found to be correlated with health measures (height, waist-hip ratio, systolic blood pressure) and diseases, such as systemic lupus erythematosus, inflammatory bowel disease, and Parkinson's disease, which are related to IgG N-glycosylation. Our study provides an atlas of genetic regulatory loci and their target genes within functionally relevant tissues, for further studies on the mechanisms of IgG N-glycosylation and its related diseases.
Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO!
Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes makes the development and application of multivariate methods crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.