
    ์ข…์† ํ‘œ๋ณธ์— ๋Œ€ํ•œ ์ด๋ถ„ํ˜• ํ‘œํ˜„ํ˜•์˜ ์œ ์ „์ฒด ์—ฐ๊ด€์„ฑ ๋ถ„์„ ๋ฐฉ๋ฒ•์˜ ๊ฐœ๋ฐœ ๋ฐ ์œ ์ „์ž ๋ฐ์ดํ„ฐ์—์˜ ์ ์šฉ

    Thesis (Ph.D.), Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, February 2019. Advisor: Sungho Won (원성호).

Recent improvements in sequencing technology have enabled the investigation of so-called missing heritability, and a large number of affected subjects have been sequenced to detect significant associations between human diseases and genetic variants. However, genome sequencing is still expensive, so a statistically powerful strategy for selecting informative subjects would be useful. Numerous methods for estimating the heritability of dichotomous phenotypes have been proposed. However, unlike heritability estimation for quantitative phenotypes, estimation for dichotomous phenotypes is computationally and statistically complex, and such estimates are rarely used. In particular, heritability estimates often suffer from substantial bias caused by the sampling schemes of family-based studies. In family-based studies, families are often recruited through an affected proband, so the proportion of affected relatives is larger than the population prevalence. This is known as ascertainment bias, yet few studies have addressed how to adjust heritability estimates of dichotomous traits for it. In this study, I propose a new statistical method for selecting cases and controls for sequencing studies based on family history of disease, with the goal of improving the statistical power of genetic association studies. I assume that disease status is determined by an unobserved liability score. The liability threshold model assumes that dichotomous phenotypes are determined by unobserved, normally distributed latent variables, and the method consists of two steps: first, the conditional means of liability given the disease statuses of each individual and their relatives are estimated under the liability threshold model; second, informative subjects are selected using the estimated conditional means.
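The two-step procedure above, and the ascertainment effect that motivates it, can be illustrated with a small Monte Carlo sketch. This is a simplified stand-in, not the thesis's own computation, which handles general pedigrees. It assumes sib pairs only, standard-normal liabilities, and a purely additive model with no shared environment, so that sibling liabilities correlate at h²/2. The function names (`threshold`, `conditional_means`, `select_extremes`) and the parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def threshold(prevalence):
    """Liability threshold T with P(L > T) = prevalence, for L ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - prevalence)


def simulate_sib_pair(rho, rng):
    """Draw standard-normal liabilities (self, sib) with correlation rho."""
    z_self = rng.gauss(0.0, 1.0)
    z_other = rng.gauss(0.0, 1.0)
    return z_self, rho * z_self + math.sqrt(1.0 - rho * rho) * z_other


def conditional_means(prevalence, h2, n_draws=200_000, seed=1):
    """Step 1 (sketch): Monte Carlo estimate of E[L_self | status_self, status_sib].

    Assumption: full sibs under a purely additive model, so the liability
    correlation is h2 / 2. Keys are (self_affected, sib_affected).
    Also returns the number of simulated pairs in each status combination.
    """
    rng = random.Random(seed)
    t = threshold(prevalence)
    rho = h2 / 2.0
    sums, counts = {}, {}
    for _ in range(n_draws):
        l_self, l_sib = simulate_sib_pair(rho, rng)
        key = (l_self > t, l_sib > t)  # affected iff liability exceeds T
        sums[key] = sums.get(key, 0.0) + l_self
        counts[key] = counts.get(key, 0) + 1
    means = {key: sums[key] / counts[key] for key in counts}
    return means, counts


def select_extremes(subjects, n_cases, n_controls):
    """Step 2 (sketch): subjects are (subject_id, affected, conditional_mean).

    Affected subjects with the highest conditional means become cases;
    unaffected subjects with the lowest conditional means become controls.
    """
    affected = sorted((s for s in subjects if s[1]), key=lambda s: s[2], reverse=True)
    unaffected = sorted((s for s in subjects if not s[1]), key=lambda s: s[2])
    cases = [s[0] for s in affected[:n_cases]]
    controls = [s[0] for s in unaffected[:n_controls]]
    return cases, controls
```

For instance, with a hypothetical prevalence of 0.1 and h² = 0.6, `means, counts = conditional_means(0.1, 0.6)` should give an affected subject with an affected sib a larger conditional mean than one with an unaffected sib. The ratio `counts[(True, True)] / (counts[(True, True)] + counts[(True, False)])`, which is the rate of affected sibs among affected probands, should exceed 0.1. That excess is the reason proband-recruited families show more affected relatives than the population prevalence.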
Our simulation studies showed that statistical power is substantially affected by the subject selection strategy, and power is maximized when affected (unaffected) subjects with high (low) risks are selected as cases (controls). The proposed method was successfully applied to genome-wide association studies of type-2 diabetes, and the analysis results demonstrate its practical value. In addition, I developed a statistical method to estimate the heritability of dichotomous phenotypes using a liability threshold model in ascertained family-based samples. This method can be applied to general pedigree data. The proposed methods were applied to simulated data and to Korean family-based samples for type-2 diabetes, and the accuracy of their estimates was compared with that of established methods.

Korean Abstract

Recent advances in gene sequencing technology have made it possible to obtain large amounts of genetic information from people with disease, revealing associations between human diseases and genetic variants. However, although sequencing costs have fallen markedly, obtaining genetic information is still far from cheap, so selecting analysis subjects who yield the greatest efficiency under a limited budget is very important. Meanwhile, numerous methods for estimating the heritability of dichotomous phenotypes have been proposed, but unlike heritability estimation for continuous phenotypes, they are computationally and statistically very complex and have seen only limited use. This thesis therefore develops a new statistical method that selects cases and controls for sequencing on the basis of family history, in order to improve the statistical power of genome-wide association studies.
์งˆ๋ณ‘ ๋ชจํ˜•์€ ๊ด€์ธก๋˜์ง€ ์•Š์€ ์—ฐ์†ํ˜• ๋ณ€์ˆ˜์— ์˜ํ•ด ๊ฒฐ์ •๋œ๋‹ค๊ณ  ๊ฐ€์ •ํ•˜๋Š”๋ฐ, ์ด ์—ฐ์†ํ˜• ๋ณ€์ˆ˜๊ฐ€ ์งˆ๋ณ‘ ๊ณ ์œ ์˜ ํ•œ๊ณ„์ ๋ณด๋‹ค ํฐ ์‚ฌ๋žŒ์€ ์งˆ๋ณ‘์„ ์–ป๊ฒŒ ๋œ๋‹ค. ์ด ์—ฐ์†ํ˜• ๋ณ€์ˆ˜๋Š” ์ฑ…์ž„์ ์ˆ˜(Liability) ๋ผ๊ณ  ์ผ์ปซ๊ณ  ์ด ์งˆ๋ณ‘ ๋ชจํ˜•์„ ์ฑ…์ž„ํ•œ๊ณ„๋ชจํ˜•(Liability threshold model)์ด๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. ์ด ์งˆ๋ณ‘ ๋ชจํ˜•์„ ๋ฐ”ํƒ•์œผ๋กœ ๋ณธ ์—ฐ๊ตฌ์˜ ๋ฐฉ๋ฒ•์€ ๋‹ค์Œ์˜ ๋‘ ๋‹จ๊ณ„๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ๋‹ค. ์ฒซ์งธ๋กœ, ๊ฐ ๊ฐ€์กฑ ๋ณ„๋กœ ๊ฐ€์กฑ๋“ค์˜ ์งˆ๋ณ‘๋ ฅ์ด ์ฃผ์–ด์กŒ์„ ๋•Œ์˜ ์ฑ…์ž„์ ์ˆ˜์˜ ์กฐ๊ฑด๋ถ€ํ‰๊ท ์„ ๊ณ„์‚ฐํ•œ๋‹ค. ๊ทธ ๋‹ค์Œ์œผ๋กœ ์ด๋ ‡๊ฒŒ ๊ตฌํ•ด์ง„ ์กฐ๊ฑด๋ถ€ํ‰๊ท ์„ ๋ฐ”ํƒ•์œผ๋กœ ์‚ฌ๋ก€๊ตฐ๊ณผ ๋Œ€์กฐ๊ตฐ์„ ์„ ๋ณ„ํ•œ๋‹ค. ๋ชจ์˜์‹คํ—˜์„ ํ†ตํ•˜์—ฌ ์ „์žฅ์œ ์ „์ฒด์—ฐ๊ด€์„ฑ๋ถ„์„์˜ ํ†ต๊ณ„์  ๊ฒ€์ •๋ ฅ์€ ์–ด๋–ป๊ฒŒ ์‚ฌ๋ก€๊ตฐ๊ณผ ๋Œ€์กฐ๊ตฐ์„ ์„ ๋ณ„ํ•˜๋Š”์ง€์— ๋”ฐ๋ผ์„œ ์ค‘๋Œ€ํ•œ ์˜ํ–ฅ์„ ๋ฐ›๊ณ , ์กฐ๊ฑด๋ถ€ํ‰๊ท ์ด ํฐ ์งˆ๋ณ‘๊ตฐ์„ ์‚ฌ๋ก€๊ตฐ์œผ๋กœ, ์ž‘์€ ์ •์ƒ๊ตฐ์„ ๋Œ€์กฐ๊ตฐ์œผ๋กœ ์„ ๋ณ„ํ•˜์˜€์„ ๋•Œ ๊ฐ€์žฅ ๋†’์€ ๊ฒƒ์„ ํ™•์ธํ•˜์˜€๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ์ œ 2ํ˜• ๋‹น๋‡จ์˜ ์œ ์ „์ฒด ์—ฐ๊ด€์„ฑ ๋ถ„์„์— ์ ์šฉ๋˜์—ˆ๊ณ , ๋ฌด์ž‘์œ„๋กœ ๋ถ„์„๋Œ€์ƒ์„ ์ถ”์ถœํ•˜์˜€์„ ๋•Œ์™€ ๊ฒฐ๊ณผ์™€ ๋น„๊ตํ•˜์˜€์„ ๋•Œ, ํ›จ์”ฌ ๋” ํ–ฅ์ƒ๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•๊ณผ ๋”๋ถˆ์–ด, ๋‚˜๋Š” ์ด๋ถ„ํ˜• ํ‘œํ˜„ํ˜•์˜ ์œ ์ „์œจ ์ถ”์ •๋ฐฉ๋ฒ•์„ ๊ฐœ๋ฐœํ•˜์˜€๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ๊ฐ€์กฑ๋ ฅ์„ ๋ฐ”ํƒ•์œผ๋กœ ์ถ”์ •์ด ๋˜๊ณ , ๊ฐ€๊ณ„๋„์˜ ๊ตฌ์กฐ์— ๊ตฌ์•  ๋ฐ›์ง€ ์•Š๋Š”๋‹ค. ํŠนํžˆ ์ด ๋ฐฉ๋ฒ•์€ ๋ฌด์ž‘์œ„๋กœ ์„ ๋ณ„๋œ ๊ฐ€์กฑ์— ๋Œ€ํ•œ ์ถ”์ • ๋ฟ ์•„๋‹ˆ๋ผ, proband์˜ ์งˆ๋ณ‘๋ ฅ์œผ๋กœ ์ธํ•˜์—ฌ ๊ฐ€์กฑ์ด ๋ถ„์„์— ์ฐธ์—ฌํ•˜๊ฒŒ ๋œ ๊ฒฝ์šฐ์— ๋Œ€ํ•œ ์ถ”์ •๋„ ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ์žฅ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ๋‹ค์–‘ํ•œ ๋ชจ์˜์‹คํ—˜์„ ํ†ตํ•˜์—ฌ ์ด ๋ฐฉ๋ฒ•์˜ ์ •ํ™•์„ฑ์„ ํ‰๊ฐ€ํ•˜์˜€์œผ๋ฉฐ, ๊ธฐ ๊ฐœ๋ฐœ๋œ ์—ฐ๊ตฌ์˜ ๊ฒฐ๊ณผ์™€ ๋น„๊ต๋ฅผ ํ†ตํ•˜์—ฌ ์ถ”์ •์น˜์˜ ์ •ํ™•์„ฑ์˜ ํ–ฅ์ƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. 
๋˜ํ•œ ์ œ 2ํ˜• ๋‹น๋‡จ์˜ ํ•œ๊ตญ์ธ ๊ฐ€๊ณ„๋„ ๋ฐ์ดํ„ฐ์— ๋ณธ ๋ฐฉ๋ฒ•์„ ์ ์šฉํ•˜์—ฌ ์œ ์ „์œจ์„ ํ‰๊ฐ€ํ•˜์˜€๋‹ค.Abstract i Table of Contents iv Chapter 1 Introduction ๏ผ‘ 1.1 An Overview of Genetic Association Analyses of Dichotomous Phenotypes ๏ผ‘ 1.2 Heritability Estimation of Dichotomous Phenotypes 5 1.3 The Purpose of This Study 7 1.4 Outline of the thesis 9 Chapter 2 Application of Genome-wide Association Study and Fine-mapping for Independent Samples 10 2.1 Introduction 10 2.2 Materials and Methods 13 2.2.1 Discovery cohort 13 2.2.2 Quality control analyses of SNP genotype data 14 2.2.3 Replication data 17 2.2.4 Statistical analyses with genetic data 18 2.2.5 Genotype imputation and statistical analyses with imputed genotypes 20 2.2.6 Topologically associated domains (TADs) and chromatin interactions 21 2.2.7 Statistical analyses with RNA sequencing data 22 2.2.8 Immunohistochemistry analyses 23 2.3 Results 24 2.3.1 GWAS analysis of S-LAM identifies two intergenic SNPs on chromosome 15 24 2.3.2 Association of GWAS-significant SNPs with NR2F2 39 2.3.3 Analysis of NR2F2 in kidney angiomyolipoma and LAM 46 2.4 Discussion 52 Chapter 3 Selecting Cases and Controls for Genome-wide Association Studies Using Family Histories of Disease 56 3.1 Introduction 56 3.2 Methods 61 3.2.1 Notations and the disease model 61 3.2.2 Selection of samples with extreme phenotypes 65 3.2.3 Statistical power when the family history of disease is controlled 67 3.3 Simulation study 70 3.3.1 The simulation model 70 3.3.2 Evaluation of selection strategy with simulated data 74 3.3.3 Robustness of CE to choices of prevalence and heritability 85 3.4 Application to genome-wide association of type-2 diabetes 90 3.4.1 The KARE cohort 90 3.4.2 The SNUH data 92 3.4.3 Association analyses using the pooled data 93 3.4.4 Results 94 3.5 Discussion 101 3.6 Appendix 106 3.6.1 Calculation of the conditional expectation (CE) 106 3.6.2 Derivation of Fijx 109 Chapter 4 Heritability Estimation of 
Dichotomous Phenotypes Using a Liability Threshold Model on Ascertained Family-based Samples 111 4.1 Introduction 111 4.2 Materials and Methods 115 4.2 1 Notations and Disease Model 115 4.2.2 Heritability Estimation using the EM Algorithm 118 4.2.3 Lagrangian Multiplier and Karush-Kuhn-Tucker Condition 121 4.2.4 Ascertainment Bias-corrected Heritability Estimation 125 4.2.5 Conditional Expected Score Tests 128 4.2.6 Simulation studies 130 4.2.7 Application for Family-based Samples of Type-2 Diabetes 132 4.2.8 Application for GWAS of S-LAM 133 4.3 Results 135 4.3.1 Evaluations of simulated samples 135 4.3.2 Applications of LTMH and CEST to Type-2 Diabetes 144 4.3.2 Applications of CEST to S-LAM disease 148 4.4 Discussion 151 4.5 Appendix 154 4.5.1 Numerical analysis for optimization of the heritability in M-step of EM algorithm 154 4.5.2 Numerical analysis for maximizing the global lower bound 156 Chapter 5 Summary and Conclusions 158 Bibliography 162 ๊ตญ ๋ฌธ ์ดˆ ๋ก 184Docto