138 research outputs found

    Leveraging Summary Statistics to Make Inferences about Complex Phenotypes in Large Biobanks

    As genetic sequencing becomes less expensive and data sets linking genetic data and medical records (e.g., biobanks) become larger and more common, data privacy and computational challenges must increasingly be addressed in order to realize the benefits of these data sets. One way to alleviate these issues is to use already-computed summary statistics (e.g., slopes and standard errors from a regression model of a phenotype on a genotype). If groups share summary statistics from their analyses of biobanks, many of the privacy issues and computational challenges concerning access to these data could be bypassed. In this paper we explore the possibility of using summary statistics from simple linear models of phenotype on genotype to make inferences about more complex phenotypes (those that are derived from two or more simple phenotypes). We provide exact formulas for the slope, intercept, and standard error of the slope for linear regressions when combining phenotypes. The derived equations are validated via simulation and tested on a real data set exploring the genetics of fatty acids.
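
The abstract above does not reproduce the paper's formulas, so the following is only a minimal sketch of the simplest case, under stated assumptions: both phenotypes are regressed on the same genotype vector in the same individuals, and the derived phenotype is a weighted sum of the two. Under those assumptions, ordinary least squares is linear in the response, so the slope and intercept of the combined phenotype follow directly from the shared summary statistics. The function names (`ols`, `combine`), the weights, and the simulated data are hypothetical and are not taken from the paper. The sketch does not attempt the standard error, which in general also depends on the covariance between the two phenotypes.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares fit y ~ a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def combine(fit1, fit2, w1=1.0, w2=1.0):
    """Intercept and slope for w1*y1 + w2*y2 from the two separate fits.

    Assumes both fits used the same genotype vector (hypothetical helper).
    """
    a1, b1 = fit1
    a2, b2 = fit2
    return w1 * a1 + w2 * a2, w1 * b1 + w2 * b2


# Simulated data, for illustration only: genotype coded 0/1/2.
random.seed(0)
g = [random.choice([0, 1, 2]) for _ in range(500)]
y1 = [0.3 * gi + random.gauss(0, 1) for gi in g]
y2 = [-0.1 * gi + random.gauss(0, 1) for gi in g]

fit1 = ols(g, y1)
fit2 = ols(g, y2)

# Fit the summed phenotype directly, then rebuild the same fit from summaries.
direct = ols(g, [a + b for a, b in zip(y1, y2)])
combined = combine(fit1, fit2)

print("direct fit:   ", direct)
print("from summary: ", combined)
```

The two printed fits should agree up to floating-point rounding, which is the sense in which individual-level data are not needed for the slope and intercept of a summed phenotype.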

    Using population biobanks to understand complex traits, rare diseases, and their shared genetic architecture

    The study of the role of genetic variability in common traits has led to a growing number of studies aimed at representing whole populations. These studies gather multiple layers of information on healthy and non-healthy individuals at large scales, constituting what is known as population biobanks. In this thesis I took advantage of the potential of these population biobanks to measure the influence of genetic variation in common and rare traits. I explored the mechanisms behind these by examining their interaction with conditions, physiological measurements, and habits in the general and healthy population. First, I used the Lifelines cohort, which contains genetic information on the Dutch population. Here, my colleagues and I explored traits with different levels of genetic influence: we uncovered associations of both blood type and dairy consumption with human gut microbiome function and composition, and we identified a protective factor for a rare type of cardiomyopathy with potential use for diagnosis. Additionally, within a global collaboration across worldwide biobanks totaling > 2 million individuals, we demonstrated the robustness of the connections between genetic variation and 14 different diseases across the populations. We also provided methodological guidance for combining the effects of genetic variation to calculate the risk of disease in studies that include biobanks with populations of different ethnic backgrounds. Overall, my PhD research contributed to identifying and validating which factors are relevant for potential clinical applications, and provided guidelines to be used in future genetic studies on common traits and diseases at a global scale.

    Integration of evidence across human and model organism studies: A meeting report.

    The National Institute on Drug Abuse and Joint Institute for Biological Sciences at the Oak Ridge National Laboratory hosted a meeting attended by a diverse group of scientists with expertise in substance use disorders (SUDs), computational biology, and FAIR (Findability, Accessibility, Interoperability, and Reusability) data sharing. The meeting's objective was to discuss and evaluate better strategies to integrate genetic, epigenetic, and 'omics data across human and model organisms to achieve deeper mechanistic insight into SUDs. Specific topics were to (a) evaluate the current state of substance use genetics and genomics research and fundamental gaps, (b) identify opportunities and challenges of integration and sharing across species and data types, (c) identify current tools and resources for integration of genetic, epigenetic, and phenotypic data, (d) discuss steps and impediments related to data integration, and (e) outline future steps to support more effective collaboration, particularly between animal model research communities and human genetics and clinical research teams. This review summarizes key facets of this catalytic discussion with a focus on new opportunities and gaps in resources and knowledge on SUDs.

    Systems genetics approaches for understanding complex traits with relevance for human disease.

    Quantitative traits are often complex because of the contribution of many loci, with further complexity added by environmental factors. In medical research, systems genetics is a powerful approach for the study of complex traits, as it integrates intermediate phenotypes, such as RNA, protein, and metabolite levels, to understand molecular and physiological phenotypes linking discrete DNA sequence variation to complex clinical and physiological traits. The primary purpose of this review is to describe some of the resources and tools of systems genetics in humans and rodent models, so that researchers in many areas of biology and medicine can make use of the data.

    Longitudinal multi-dimensional investigation of metabolic and endocrine genetics

    Genome-wide association studies (GWASs) in recent decades have revealed the genetic landscape and shared aetiology of common, complex traits across the spectrum of human phenotypes. In this work, I develop and apply statistical tools to interrogate the genetic basis of, and relationships between, metabolic and endocrine traits. I demonstrate that under-explored primary care electronic health records (EHRs), linked to massive biobank projects across the globe, are a valuable source of longitudinal and rare biomarker data for genetics studies. Using EHRs, I find a common missense variant in the APOE gene that is associated with weight loss in adulthood, which replicates in three global biobanking cohorts of between 125,000 and 475,000 individuals each. While the heritability of weight-change is low ( 700,000 participants across seven global biobanks), to characterise the genetic contributions to these common but poorly understood phenotypes. I find 21 unique genetic loci for infertility, of which only six colocalise with reproductive hormone levels. While there is modest correlation between female infertility and heritable diseases of the reproductive tract, such as endometriosis (rG = 58%) and polycystic ovary syndrome (PCOS) (rG = 40%), I find no evidence for metabolic conditions such as obesity in the genetic aetiology of infertility. I explore these findings further through Mendelian Randomisation analyses to reveal heterogeneity in the genetically predicted causal effects of overall and central obesity on the risk of female reproductive conditions, including infertility, endometriosis, and PCOS, which may be partly genetically mediated by hormone levels. Through a range of genetics-based investigations, I outline the shared and distinct mechanisms of metabolic and endocrine disease in humans.

    Discovering Pleiotropy Across Circulatory System Diseases And Nervous System Disorders

    Pleiotropy is the phenomenon in which a gene or genetic variant affects more than one phenotype. This fundamental concept has been thought to play a critical role in genetics, medicine, evolutionary biology, molecular biology, and clinical research. With recent developments in sequencing technologies and statistical methods, pleiotropy can be characterized systematically in the human genome. Circulatory system diseases and nervous system disorders have a significant impact on mortality rates worldwide and frequently co-occur in patients. Thus, the field would benefit greatly from knowledge of the underlying genetic relationship between multiple diseases in these disease categories. In this dissertation, we aim to identify pleiotropy across a wide range of circulatory system diseases and nervous system disorders using large-scale electronic health record-linked biobank datasets. For common genetic variants, we applied an ensemble of methods including univariate, multivariate, and sequential multivariate association methods to characterize pleiotropy in the UK Biobank and the eMERGE network. Our results implicated five pleiotropic regions that help to explain the disease relationships across these disease categories. For rare variants, we performed univariate burden and dispersion tests using whole-exome sequencing data from the UK Biobank and characterized 143 Bonferroni-significant pleiotropic genes. Our analytical framework for both common and rare genetic variants offers novel insights into biology and provides a new perspective for studying pleiotropy in large-scale biobank datasets. Besides applying statistical methods to natural biomedical datasets, we also conducted simulation projects investigating the impact of sample size imbalance on the performance of the proposed statistical methods. Our simulation results can serve as a reference guideline to assist sample size design for association studies.

    Statistical Methods for Large Scale Genetic Analyses

    Population scale genomic analyses have informed the development of novel therapeutics, diagnostics, and understanding of disease etiology. Among the recent developments in human genetic association analyses, electronic health record (EHR) linked biobanks and population scale whole genome sequencing (WGS) have provided fertile ground for association discovery. In tandem with the emergence of these approaches, novel computational and statistical approaches are needed to address the methodological challenges of working with these data. In Chapter 2, I present study design recommendations and meta-analysis results for genetic association studies applied to clinical laboratory data in EHR linked biobanks. We conducted genome-wide association studies (GWAS) of 70 clinical lab traits from both the Michigan Genomics Initiative (MGI) and BioVU from the Vanderbilt University health system. In addition to the discovery of novel association results, we conducted systematic study design analyses in parallel across the two biobanks to inform recommendations for association studies of lab traits. In Chapter 3, I present a novel sparse Mendelian randomization (MR) method for causal inference. MR methods are an instrumental variable approach for inferring the causal effect of an exposure on an outcome using genetic variants as an instrument. Under settings where the proportion of genetic variants that are causal is low, current approaches that assume dense genetic architectures may have poor statistical power. Here, we present a novel Bayesian MR method using a horseshoe prior which can be applied to summary statistics. The horseshoe prior is a continuous-scale shrinkage prior which facilitates variable selection. We use simulations to evaluate the performance of the method across genetic architectures. We apply the method to lab trait GWAS summary statistics. In Chapter 4, I present a novel method for estimating the rate at which somatic clones are expanding in clonal hematopoiesis. Clonal hematopoiesis refers to a state of mosaicism in blood defined by the acquisition of oncogenic driver mutations at an appreciable clone size and can be identified using WGS. Previous approaches for describing the growth of these mutations have relied on longitudinal sequencing methods. Here, we develop a Bayesian hierarchical model for estimating the parameters that describe the expansion of driver variants. In contrast to previous reports, our method only requires a single draw of blood. We validate the method using simulations and longitudinal amplicon sequencing. We apply our method to ~5,000 samples with clonal hematopoiesis from the Trans-Omics for Precision Medicine (TOPMed) sequencing initiative, enabling association studies of the molecular determinants of clonal expansion.
    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169713/1/jweinstk_1.pd

    Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations

    Genome-wide association studies (GWASs) have focused primarily on populations of European descent, but it is essential that diverse populations become better represented. Increasing diversity among study participants will advance our understanding of genetic architecture in all populations and ensure that genetic research is broadly applicable. To facilitate and promote research in multi-ancestry and admixed cohorts, we outline key methodological considerations and highlight opportunities, challenges, solutions, and areas in need of development. Despite the perception that analyzing genetic data from diverse populations is difficult, it is scientifically and ethically imperative, and there is an expanding analytical toolbox to do it well

    Policy Issues Associated with Undertaking a New Large U.S. Population Cohort Study of Genes, Environment, and Disease

    This report describes the efforts of the Secretary’s Advisory Committee on Genetics, Health, and Society (SACGHS) to assess the need and readiness for a new large population study (LPS) in the United States and presents recommendations to the Secretary of the U.S. Department of Health and Human Services (HHS) so that this concept can be further explored. The HHS Secretary established SACGHS in 2002 as a public forum for deliberation on the broad range of human health and societal issues raised by advances in genetics and, as warranted, the development of advice on these issues. In a March 2004 priority-setting process, SACGHS identified 11 high-priority issues warranting its attention and analysis. One of those issues was the need for an analysis of the opportunities and challenges associated with conducting an LPS aimed at understanding the relationships between genes, environments, and their interactions and common complex diseases. Among the considerations that led the Committee to this decision was the fact that discussions were underway at the National Institutes of Health (NIH) about whether the United States should mount a new large population-based study. In June 2005, as SACGHS fact-finding efforts were beginning, NIH Director Dr. Elias A. Zerhouni requested that the Committee develop a report on the preliminary questions, steps, and strategies that would need to be addressed before considering the larger question of whether the United States should undertake a new LPS. Specifically, the Committee was asked to (1) delineate the questions that need to be addressed for policymakers to determine whether the U.S. Government should undertake a new LPS to elucidate the influences of genetic variations and environmental factors on common complex diseases; (2) explore the ways in which, or processes by which, the questions identified in step 1 can be addressed, including any intermediate research studies, pilot projects, or policy analysis efforts needed; and (3) determine the possible ways in which these questions could be addressed, taking into account the feasibility of those approaches expect the Committee to recommend solutions to the questions raised. The next section summarizes exploratory work by the National Human Genome Research Institute (NHGRI) and fact-finding and consultative efforts by SACGHS on this issue. Chapter II presents the scientific basis for an LPS. Chapter III outlines the key policy issues that SACGHS has identified as warranting further attention. Chapter IV discusses the critical role that public engagement must play in determining the willingness of U.S. citizens to support and participate in such an endeavor. In keeping with its agreed-upon charge, throughout this report the Committee explores the ways in which the identified policy issues could be addressed and describes possible approaches for the HHS Secretary’s consideration.
    http://oba.od.nih.gov/SACGHS/sacghs_focus_population.htm

    Human Genomics and Drug Development

    Insights into the genetic basis of human disease are helping to address some of the key challenges in new drug development including the very high rates of failure. Here we review the recent history of an emerging, genomics-assisted approach to pharmaceutical research and development, and its relationship to Mendelian randomization (MR), a well-established analytical approach to causal inference. We demonstrate how human genomic data linked to pharmaceutically relevant phenotypes can be used for (1) drug target identification (mapping relevant drug targets to diseases), (2) drug target validation (inferring the likely effects of drug target perturbation), (3) evaluation of the effectiveness and specificity of compound-target engagement (inferring the extent to which the effects of a compound are exclusive to the target and distinguishing between on-target and off-target compound effects), and (4) the selection of end points in clinical trials (the diseases or conditions to be evaluated as trial outcomes). We show how genomics can help identify indication expansion opportunities for licensed drugs and repurposing of compounds developed to clinical phase that proved safe but ineffective for the original intended indication. We outline statistical and biological considerations in using MR for drug target validation (drug target MR) and discuss the obstacles and challenges for scaled applications of these genomics-based approaches