2,049 research outputs found

    Pathway analysis of common variants using a hierarchical structural model

    Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, August 2019. Taesung Park.
    Genome-wide association studies (GWAS) have been widely used to identify phenotype-related genetic variants with statistical methods such as logistic regression and linear regression. However, the SNPs identified at stringent statistical significance explain only a small portion of the overall estimated genetic heritability. To address this missing heritability issue, gene-based and pathway-based analyses have been developed in many studies, and biological mechanisms and related pathways have been reported using pathway-based methods on GWAS datasets. However, many of these methods neglect the correlation between genes and between pathways. Here, we construct a hierarchical component model that accounts for the correlation both between genes and between pathways. Based on this model, we propose a novel pathway analysis method for GWAS datasets, named Hierarchical structural Component Model for Pathway analysis of Common vAriants (HisCoM-PCA). HisCoM-PCA first summarizes the common variants in each gene into gene-level statistics and then analyzes all pathways simultaneously, applying ridge-type penalization to both gene and pathway effects on the phenotype. The statistical significance of the gene and pathway coefficients is examined by permutation tests. In a simulation study of both binary and continuous phenotypes using the GAW17 simulation dataset, HisCoM-PCA controlled the type I error well and showed higher empirical power than several comparison methods. In addition, we applied our method to the KARE SNP chip dataset for four human physiologic traits: (1) type 2 diabetes; (2) hypertension; (3) systolic blood pressure; and (4) diastolic blood pressure. The results showed that HisCoM-PCA could identify pathways that are both statistically significant and biologically meaningful. Our approach has the advantage of providing an intuitive biological interpretation of the association between common variants and phenotypes through pathway information.
    Contents: Introduction; Materials; Methodology; Results; Discussions; Bibliography; Abstract (Korean)
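The permutation-testing step mentioned in the abstract above can be illustrated with a minimal sketch. This is not HisCoM-PCA itself: the statistic here is a plain absolute covariance between one hypothetical gene-level score and a phenotype, standing in for the ridge-penalized coefficients the thesis describes, and the function names, data values, and defaults (`n_perm`, `seed`) are assumptions made for illustration only.

```python
import random


def abs_covariance(x, y):
    """Absolute sample covariance between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n)


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Empirical p-value: how often a shuffled phenotype gives a statistic
    at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs_covariance(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs_covariance(x, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical gene-level summary scores and a continuous phenotype.
gene_score = [0.1, 0.4, 0.35, 0.8, 0.9, 1.2, 1.1, 1.5]
phenotype = [1.0, 1.3, 1.2, 2.1, 2.0, 2.9, 2.6, 3.2]
p = permutation_p_value(gene_score, phenotype)
print(f"permutation p-value: {p:.3f}")
```

Shuffling the phenotype keeps the distribution of both variables intact while removing their pairing, which is why the shuffled statistics approximate the null distribution without any parametric assumption.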

    Identification of cis-regulatory sequence variations in individual genome sequences

    Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disrupting variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing.

    Statistical Methods for Genetic Prediction of Complex Traits in Single and Multiple Populations

    Genetic prediction of complex traits, also known as polygenic risk score (PRS), is constructed by combining the estimated effect sizes of genetic markers across the genome for an individual. PRS has shown great promise in biomedical and clinical research for disease prevention, monitoring and treatment. However, the development of accurate prediction models is challenging due to the wide diversity of genetic architecture, limited access to individual level data, and the demand for computational resources. The broader application of PRS to the general population is further hindered by the poor transferability of PRS developed in Europeans to non-European populations. In this thesis, we develop two statistical methods to help address these limitations. Chapter 1 includes a review of PRS from a statistical perspective. In Chapter 2, we present a summary statistics-based nonparametric method SDPR that is adaptive to different genetic architectures, statistically robust, and computationally efficient. The material is drawn from the manuscript “A fast and robust Bayesian nonparametric method for prediction of complex traits using summary statistics” with minor modification. In Chapter 3, we develop a statistical method called SDPRX that can effectively integrate genome wide association study summary statistics from different populations to improve the prediction accuracy in non-European populations. The material is drawn from the manuscript “SDPRX: A statistical method for cross-population prediction of complex traits” in preparation.
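As a rough illustration of how a polygenic risk score combines per-marker effect sizes for one individual, the sketch below computes a weighted sum of allele dosages. It shows only the final scoring step, not the SDPR or SDPRX estimation procedures; the function name, effect sizes, and genotypes are hypothetical values chosen for the example.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1, or 2 copies of the effect allele)."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical per-marker effect sizes, e.g. estimated from GWAS summary statistics.
effect_sizes = [0.5, -0.25, 1.0]

# Hypothetical genotypes for two individuals at the same three markers.
individuals = {"A": [2, 0, 1], "B": [0, 2, 0]}

scores = {name: polygenic_score(g, effect_sizes) for name, g in individuals.items()}
for name, score in scores.items():
    print(name, score)
```

In practice the quality of the score depends almost entirely on how the effect sizes are estimated, which is the part that methods such as those described in this thesis address.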

    Improved detection of global copy number variation using high density, non-polymorphic oligonucleotide probes

    Background: DNA sequence diversity within the human genome may be more greatly affected by copy number variations (CNVs) than single nucleotide polymorphisms (SNPs). Although the importance of CNVs in genome wide association studies (GWAS) is becoming widely accepted, the optimal methods for identifying these variants are still under evaluation. We have previously reported a comprehensive view of CNVs in the HapMap DNA collection using high density 500 K EA (Early Access) SNP genotyping arrays which revealed greater than 1,000 CNVs ranging in size from 1 kb to over 3 Mb. Although the arrays used most commonly for GWAS predominantly interrogate SNPs, CNV identification and detection does not necessarily require the use of DNA probes centered on polymorphic nucleotides and may even be hindered by the dependence on a successful SNP genotyping assay.
    Results: In this study, we have designed and evaluated a high density array predicated on the use of non-polymorphic oligonucleotide probes for CNV detection. This approach effectively uncouples copy number detection from SNP genotyping and thus has the potential to significantly improve probe coverage for genome-wide CNV identification. This array, in conjunction with PCR-based, complexity-reduced DNA target, queries over 1.3 M independent NspI restriction enzyme fragments in the 200 bp to 1100 bp size range, which is a several fold increase in marker density as compared to the 500 K EA array. In addition, a novel algorithm was developed and validated to extract CNV regions and boundaries.
    Conclusion: Using a well-characterized pair of DNA samples, close to 200 CNVs were identified, of which nearly 50% appear novel yet were independently validated using quantitative PCR. The results indicate that non-polymorphic probes provide a robust approach for CNV identification, and the increasing precision of CNV boundary delineation should allow a more complete analysis of their genomic organization.

    Alcohol addiction: a molecular biology perspective.

    Worldwide, alcohol misuse is an important risk factor for death and disability. Excessive alcohol consumption is widespread across ethnic groups, and alcohol use is part of the lifestyle of both young and old people. The genetic basis of alcohol dependence, as it concerns ethanol metabolism and the pathways of reward circuits, is well known. The role of genetic variants in the neurobiology of addiction, as well as in the response to medication in alcoholism therapy, remains an intriguing question that needs deeper analysis and explanation. A molecular approach to studying these aspects can be difficult because of the large number of genes and variations involved. Our work is intended to offer an overview of the genes and variants involved in alcohol addiction and pharmacogenetics. Our aim is to outline a molecular strategy for examining alcohol dependence from a genetic and applied point of view. The indications provided in this work should help those who wish to undertake a molecular study of this multifactorial disease.

    CoRE-ATAC: A deep learning model for the functional classification of regulatory elements from single cell and bulk ATAC-seq data.

    Cis-regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n = 6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n = 40 samples) that were not used in model training (mean average precision = 0.80, mean F1 score = 0.70). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.

    Reaping the Benefits of Next-generation Sequencing Technologies for Crop Improvement - Solanaceae

    Next-generation sequencing (NGS) technologies make it possible to sequence the whole genome of a species, decoding a complete gene catalogue and transcriptome and allowing the expression patterns of all genes to be studied. The large volume of data generated through whole-genome and transcriptome sequencing not only provides a basis for studying variation at the level of gene sequence (such as single-nucleotide polymorphisms and InDels) and expression, but also helps to clarify the evolutionary relationships between crop species. Furthermore, NGS technologies have made it possible to correlate phenotypes with genotypes quickly in different crop species, thereby increasing the precision of crop improvement. The Solanaceae family is the third most economically important plant family after grasses and legumes, owing to its high nutritional value. Current advances in NGS technology and their application to Solanaceae crops have led to progress in identifying genes responsible for economically important traits, developing molecular markers, and understanding genome organization and evolution in Solanaceae crops. Combining high-throughput NGS technologies with conventional crop breeding has shown promise in Solanaceae translational genomics research. As a result, NGS technologies have been adopted on a large scale to study the molecular basis of fruit and tuber development and disease resistance, and to increase the quantity and quality of crop production.