47 research outputs found
Improved Analysis of Large Genetic Association Studies Using Summary Statistics
Genome-wide association studies, which examine millions of genetic variants in thousands of individuals, have identified many complex trait associated loci. As sample sizes increase, particularly through meta-analysis, the number of disease associated loci has increased rapidly. The objective of this dissertation is to demonstrate the advantages of combining data across studies using summary statistics and to demonstrate methods that use publicly available information, such as functional annotation of the genome, to gain further insight into the genetics of human disease.
In the first project, we analyze data from 188,578 individuals using genome-wide and custom genotyping arrays to identify new loci and refine known loci for four lipid traits: low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, and total cholesterol. We identify and annotate 157 loci associated with lipid levels at P < 5 × 10^-8, including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian, and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipids are often associated with cardiovascular and metabolic traits including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio, and body mass index. Our results illustrate the value of genetic data from individuals of diverse ancestries and provide insights into biological mechanisms regulating blood lipids to guide future genetic, biological, and therapeutic research.
In the second project, we propose that causal variants for a trait may share certain genomic features. Importantly, we show that when these genomic features can be identified, we can use them to help pinpoint likely causal variants among many trait associated variants. We develop a model that identifies genomic features enriched among the associated loci and uses this information to prioritize likely functional variants in each locus leading to narrower sets of variants for follow-up. Our models work for both quantitative and case-control data and can be used with summary statistics, making it convenient to incorporate in ongoing meta-analysis of genome-wide association studies that can include 100,000s of individuals.
In the third project, we consider meta-analysis where studies may have overlapping sets of participants. In such scenarios, meta-analysis methods that do not account for overlap will perform poorly and have inflated Type I error. We propose a method to identify participant overlap between GWAS using only summary statistics, estimate the degree of overlap, and correctly meta-analyze studies taking into account the overlap. Our method builds upon and extends previous methods that allow meta-analysis of GWAS studies with known overlap proportions. We illustrate our method using simulations and artificially created overlapping samples using real GWAS data. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/143992/1/sebanti_1.pd
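As a rough illustration of combining studies from summary statistics alone, the sketch below implements the standard fixed-effects inverse-variance-weighted meta-analysis. It is not the dissertation's method: it assumes the studies share no participants, and the function name `ivw_meta` and the example numbers are hypothetical.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis.

    betas: per-study effect estimates for one variant
    ses:   matching standard errors
    Assumes independent studies (no shared participants).
    """
    weights = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)               # valid only under independence
    return beta, se, beta / se                       # pooled effect, SE, z-score

# Hypothetical summary statistics for one variant in two studies.
beta, se, z = ivw_meta([0.10, 0.20], [0.05, 0.05])
print(beta, se, z)
```

When studies do share participants, their estimates are positively correlated, so the true variance of the pooled estimate is larger than `1 / total_weight`. The naive z-score is then too large, which is the Type I error inflation the abstract describes.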
Relative impact of indels versus SNPs on complex disease
It is unclear whether insertions and deletions (indels) are more likely to influence complex traits than abundant single‐nucleotide polymorphisms (SNPs). We sought to understand which category of variation is more likely to impact health. Using the SardiNIA study as an exemplar, we characterized 478,876 common indels and 8,246,244 common SNPs in up to 5,949 well‐phenotyped individuals from an isolated valley in Sardinia. We assessed association between 120 traits, resulting in 89 nonoverlapping‐associated loci. We evaluated whether indels were enriched among credible sets of potential causal variants. These credible sets included 1,319 SNPs and 88 indels. We did not find indels to be significantly enriched. Indels were the most likely causal variant in seven loci, including one locus associated with monocyte count where an indel with causality and mechanism previously demonstrated (rs200748895:TGCTG/T) had a 0.999 posterior probability. Overall, our results show a very modest and nonsignificant enrichment for common indels in associated loci. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/147866/1/gepi22175_am.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/147866/2/gepi22175-sup-0001-Gagliano-Supplementary.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/147866/3/gepi22175.pd
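To make the notion of a credible set concrete, here is a minimal sketch of one common construction: rank variants by posterior probability and keep adding them until a coverage threshold is reached. The 95% threshold, the function name `credible_set`, and the example locus are assumptions for illustration, not details from the paper. The last lines compare the indel share in the credible sets with the indel share among all common variants, using the counts quoted in the abstract. This is a crude ratio only, not the paper's enrichment test.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    posteriors: dict mapping variant ID -> posterior probability of causality
    coverage:   target cumulative probability (0.95 is an assumed default)
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen, total

# Hypothetical locus with four candidate variants.
example = {"indel_A": 0.62, "snp_B": 0.25, "snp_C": 0.09, "snp_D": 0.04}
variants, mass = credible_set(example)
print(variants, round(mass, 2))

# Crude indel share comparison from the counts in the abstract.
share_in_sets = 88 / (88 + 1_319)                 # indels among credible-set variants
share_overall = 478_876 / (478_876 + 8_246_244)   # indels among all common variants
fold = share_in_sets / share_overall
print(round(fold, 2))
```

The ratio comes out a little above 1, in line with the abstract's description of a very modest enrichment.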
A transcriptome-wide association study based on 27 tissues identifies 106 genes potentially relevant for disease pathology in age-related macular degeneration
Genome-wide association studies (GWAS) for late-stage age-related macular degeneration (AMD) have identified 52 independent genetic variants with genome-wide significance at 34 genomic loci. Typically, such an approach rarely results in the identification of functional variants implicating a defined gene in the disease process. We performed a transcriptome-wide association study (TWAS) allowing the prediction of effects of AMD-associated genetic variants on gene expression. The TWAS was based on the genotypes of 16,144 late-stage AMD cases and 17,832 healthy controls, and gene expression was imputed for 27 different human tissues which were obtained from 134 to 421 individuals. A linear regression model including each individual's imputed gene expression data and the respective AMD status identified 106 genes significantly associated with AMD variants in at least one tissue (Q-value < 0.001). Gene enrichment analysis highlighted systemic rather than tissue- or cell-specific processes. Remarkably, 31 of the 106 genes overlapped with significant GWAS signals of other complex traits and diseases, such as neurological or autoimmune conditions. Taken together, our study highlights the fact that expression of genes associated with AMD is not restricted to retinal tissue, as might be expected for an eye disease of the posterior pole, but instead is rather ubiquitous, suggesting that the processes underlying AMD pathology are systemic in nature.
A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants
Advanced age-related macular degeneration (AMD) is the leading cause of blindness in the elderly, with limited therapeutic options. Here, we report on a study of >12 million variants including 163,714 directly genotyped, mostly rare, protein-altering variants. Analyzing 16,144 patients and 17,832 controls, we identify 52 independently associated common and rare variants (P < 5 × 10^-8) distributed across 34 loci. While wet and dry AMD subtypes exhibit predominantly shared genetics, we identify the first signal specific to wet AMD, near MMP9 (difference-P = 4.1 × 10^-10). Very rare coding variants (frequency < 0.1%) in CFH, CFI, and TIMP3 suggest causal roles for these genes, as does a splice variant in SLC16A8. Our results support the hypothesis that rare coding variants can pinpoint causal genes within known genetic loci and illustrate that applying the approach systematically to detect new loci requires extremely large sample sizes.
A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants.
This is the author accepted manuscript. The final version is available from Nature Publishing Group via http://dx.doi.org/10.1038/ng.3448 Advanced age-related macular degeneration (AMD) is the leading cause of blindness in the elderly, with limited therapeutic options. Here we report on a study of >12 million variants, including 163,714 directly genotyped, mostly rare, protein-altering variants. Analyzing 16,144 patients and 17,832 controls, we identify 52 independently associated common and rare variants (P < 5 × 10^-8) distributed across 34 loci. Although wet and dry AMD subtypes exhibit predominantly shared genetics, we identify the first genetic association signal specific to wet AMD, near MMP9 (difference P value = 4.1 × 10^-10). Very rare coding variants (frequency <0.1%) in CFH, CFI and TIMP3 suggest causal roles for these genes, as does a splice variant in SLC16A8. Our results support the hypothesis that rare coding variants can pinpoint causal genes within known genetic loci and illustrate that applying the approach systematically to detect new loci requires extremely large sample sizes. We thank all participants of all the studies included for enabling this research by their participation in these studies. Computer resources for this project have been provided by the high-performance computing centers of the University of Michigan and the University of Regensburg. Group-specific acknowledgments can be found in the Supplementary Note. The Center for Inherited Diseases Research (CIDR) Program contract number is HHSN268201200008I. This and the main consortium work were predominantly funded by 1X01HG006934-01 to G.R.A. and R01 EY022310 to J.L.H.
New genetic loci link adipose and insulin biology to body fat distribution.
Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip circumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures (P < 5 × 10^-8). In total, 20 of the 49 waist-to-hip ratio adjusted for BMI loci show significant sexual dimorphism, 19 of which display a stronger effect in women. The identified loci were enriched for genes expressed in adipose tissue and for putative regulatory elements in adipocytes. Pathway analyses implicated adipogenesis, angiogenesis, transcriptional regulation and insulin resistance as processes affecting fat distribution, providing insight into potential pathophysiological mechanisms.
Discovery and refinement of loci associated with lipid levels
Levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides and total cholesterol are heritable, modifiable risk factors for coronary artery disease. To identify new loci and refine known loci influencing these lipids, we examined 188,577 individuals using genome-wide and custom genotyping arrays. We identify and annotate 157 loci associated with lipid levels at P < 5 × 10^-8, including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipid levels are often associated with cardiovascular and metabolic traits, including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio and body mass index. Our results demonstrate the value of using genetic data from individuals of diverse ancestry and provide insights into the biological mechanisms regulating blood lipids to guide future genetic, biological and therapeutic research. © 2013 Nature America, Inc. All rights reserved.
The Type 2 Diabetes Knowledge Portal: an Open access Genetic Resource Dedicated to Type 2 Diabetes and Related Traits
Associations between human genetic variation and clinical phenotypes have become a foundation of biomedical research. Most repositories of these data seek to be disease-agnostic and therefore lack disease-focused views. The Type 2 Diabetes Knowledge Portal (T2DKP) is a public resource of genetic datasets and genomic annotations dedicated to type 2 diabetes (T2D) and related traits. Here, we seek to make the T2DKP more accessible to prospective users and more useful to existing users. First, we evaluate the T2DKP's comprehensiveness by comparing its datasets with those of other repositories. Second, we describe how researchers unfamiliar with human genetic data can begin using and correctly interpreting them via the T2DKP. Third, we describe how existing users can extend their current workflows to use the full suite of tools offered by the T2DKP. We finally discuss the lessons offered by the T2DKP toward the goal of democratizing access to complex disease genetic results.
