Alzheimer’s disease heterogeneity assessment with MRI biomarkers and unsupervised statistical learning
Alzheimer’s disease (AD) is the most common cause of dementia. It is characterized by loss of
memory and other cognitive functions. Although it is a heterogeneous condition, it has been
studied as one disease for many decades. Neuropathological data and a large body of in vivo
neuroimaging literature challenge the hypothesis that AD is a single entity, supporting the
hypothesis of AD as a heterogeneous disease.
In this thesis, we set out to understand some aspects of the heterogeneity in AD and aging with
the help of atrophy and WM integrity markers from magnetic resonance imaging (MRI). The
main aim of the thesis was to investigate the potential use of statistical and machine learning
models for the assessment of heterogeneous conditions. In Study I, we utilized whole brain
atrophy markers and cross-sectional cluster analysis to characterize the neurodegeneration
variability in a large AD dementia cohort (299 amnestic AD patients). The clusters of patients
that we discovered presented with distinct atrophy patterns. Some of them exist due to disease
severity, but we identified topologically variable atrophy patterns too. Patients of the different
clusters had distinct cognitive symptoms and clinical progression.
Then, we designed a pipeline that will help us to assess heterogeneous populations when
longitudinal neuroimaging and clinical data are available (Study II). We tested this pipeline on
atrophy data from a small dataset of AD patients to assess its usefulness in MRI data and
heterogeneous conditions. The model fitted the data well and we concluded that it can be used
in larger scale analyses. Moreover, larger numbers of participants with long follow-up periods
should increase its freedom in searching for heterogeneity in longitudinal neuroimaging
trajectories. After this methodological study, we used a very large dataset that consisted of
neuroimaging, cerebrospinal fluid (CSF), and clinical data. We split our data into discovery and
prediction datasets. The discovery dataset included positive clinically diagnosed AD
dementia patients and negative cognitively unimpaired individuals (CU). Based on this
dataset (Study III), we aimed to understand whether the observed heterogeneity in AD is
caused by sampling patients’ data at different disease stages, or if it reflects distinct
neurodegeneration subtypes. We modelled longitudinal brain atrophy data anchored to the
clinical dementia onset. Our findings show that all the previously reported atrophy subtypes do
exist but some of them reflect disease stages rather than subtypes. Most importantly, our
modeling managed to summarize the observed heterogeneity in neurodegeneration with two
unique pathways (mediotemporal and cortical). These two pathways have distinct cognitive
signatures and were evaluated in a large independent AD dataset. Heterogeneity within the
pathways exists and is likely caused by a complex interaction between protective/risk factors
and concomitant non-AD pathologies.
Some findings indicate that WM changes may precede grey matter atrophy in AD. In Study
IV we investigated whether more than one WM profile exists in the aging population. We
wanted to understand their association with AD pathophysiological changes and relate them to
the risk of developing dementia. We discovered four distinct WM integrity patterns with different
spatial WM integrity distribution in aging. Those patterns were related to different
longitudinal cognitive profiles and specific white matter tracts informed about cluster
assignments.
In conclusion, heterogeneity can be observed not only in AD, but also in the population
including healthy individuals. In this thesis, we identified distinct pathways of brain atrophy
and WM integrity. Understanding the heterogeneous patterns of the different
pathophysiological markers during ageing and the course of AD will ultimately lead to the
development of disease-modifying (personalized) treatments.
Oxidative Modifications of Apolipoprotein(a): Implications for Proinflammatory and Prothrombotic Roles of Lipoprotein(a) in the Vasculature
Elevated plasma concentrations of lipoprotein(a) (Lp(a)) have been identified as a causal risk factor for calcific aortic valve disease (CAVD) and coronary heart disease (CHD). Genetic factors, such as single nucleotide polymorphisms (SNPs) in the LPA gene, specifically rs10455872 and rs3798220, have recently been correlated with increased Lp(a) plasma levels and risk of cardiovascular disease (CVD). Apo(a) bears striking homology with the zymogen plasminogen and possesses several similar structural features. A key feature shared between these proteins is the presence of multiple repeats of a kringle domain, which possesses the ability to bind to exposed lysine residues with high affinity. Apo(a) contains several copies of a plasminogen-like KIV domain, one of which, KIV10, has been implicated in many proinflammatory processes in vitro. It has been hypothesized that the proinflammatory potential of Lp(a)/apo(a) is derived from the ability to be covalently modified by an oxidized phosphatidylcholine (oxPC) moiety. The work in this dissertation assesses the mechanism by which the oxPC moiety on apo(a) stimulates interleukin-8 (IL-8) production in macrophages. Targeted mutagenesis was used to determine a role for the KIV10 strong lysine binding site (sLBS) in the covalent addition of the oxPC moiety on apo(a) and identified the site of covalent oxPC modification at the amino acid level. Furthermore, characterization of the I4399M variant of apo(a), resulting from the rs3798220 SNP, from a perspective of its distinct structural and functional properties, revealed roles for the polymorphism on the structure and permeability of purified fibrin and plasma clots. The enhanced prothrombotic potential of this variant may be a result of an oxidized methionine residue, as identified by mass spectrometry.
The identification of distinct functional properties associated with the oxidative modification of Lp(a)/apo(a) offers insights into its proatherosclerotic and prothrombotic potentials.
Electronic Health Record-Derived Phenotyping Models to Improve Genomic Research in Stroke
Stroke is a highly heterogeneous and complex disease that is a leading cause of death in the United States. The landscape of risk factors for stroke is vast, and its large genetic burden has yet to be fully discovered. We hypothesize that the small number of stroke variants recovered so far is due to 1) the vast phenotypic heterogeneity of stroke and 2) binary labeling of stroke genome-wide association study (GWAS) participants as cases or controls. Specifically, genome-wide association studies accumulate hundreds of thousands to millions of participants to acquire adequate signal for variant discovery. This requires time-consuming manual curation of cases and controls often involving large-scale collaborations. Genetic biobanks connected to electronic health records (EHR) can facilitate these studies by using data routinely captured during clinical care like billing diagnosis codes. These data, however, do not define adjudicated cases and controls, with many patients falling somewhere in between. There is an opportunity to use machine learning to add nuance to these definitions. We hypothesize that an expanded definition of disease by incorporating correlated diseases and risk factors from EHR data will improve GWAS power. We also hypothesize that granularly subtyping stroke using unsupervised learning methods can provide insight into stroke etiology and heterogeneity. In Chapter 1, we described the motivation for building upon current phenotyping methods for subtyping and genome-wide association studies to improve GWAS power. In Chapter 2, using patients from Columbia-New York Presbyterian (NYP) Hospital, we built and evaluated machine learning models to identify patients with acute ischemic stroke based on 75 different case-control and classifier combinations. 
In Chapter 3, we compared two data-driven and unsupervised methods, non-negative matrix factorization (NMF) and Hierarchical Poisson Factorization, to subtype stroke patients and determined whether any of the subtypes correlate to stroke severity. In Chapter 4, we estimated the heritability of acute ischemic stroke by treating the patient probabilities assigned by the machine learning phenotyping models for acute ischemic stroke in Chapter 2 as a quantitative trait and mapping the probabilities to Columbia-NYP EHR-generated pedigrees. We also applied our machine learning phenotyping algorithm method, which we call QTPhenProxy, to venous thromboembolism on Columbia eMERGE Consortium patients and ran a genome-wide association study using the model probabilities as a quantitative trait. Finally, we applied QTPhenProxy to subjects in the UK Biobank for stroke and 14 other diseases and ran genome-wide association studies for each disease. We found that our machine-learned models performed well in identifying acute ischemic stroke patients in the Columbia-NYP EHR and in the UK Biobank. We also found some NMF-derived subtypes that were significantly correlated with stroke severity. We were underpowered in the eMERGE venous thromboembolism cohort GWAS and did not recover any known or new variants. Finally, we found that QTPhenProxy improved the power of GWAS of stroke and several subtypes in the UK Biobank, recovered known variants, and discovered a new variant that replicates in a previous stroke GWAS. Our results for QTPhenProxy demonstrate the promise of incorporating large but messy sets of data, such as the electronic health record, to improve signal in genome-wide association studies.
Elucidating the genetic determinants of the archetypal complex disease hypertriglyceridemia
Cardiovascular disease (CVD) is the leading cause of morbidity and mortality in Canada. Among non-traditional risk factors, plasma triglyceride (TG) concentration is re-emerging as a significant risk factor. Patients with hypertriglyceridemia (HTG) – an archetypal complex phenotype defined by fasting plasma TG concentration >95th percentile – thus have significantly increased CVD risk, compounded by associated co-morbidities such as obesity, metabolic syndrome and type 2 diabetes. However, the molecular pathways contributing to HTG susceptibility are incompletely defined. A better understanding of the genetic determinants that underlie the phenotypic spectrum of plasma TG and HTG susceptibility is necessary to identify novel genes and pathways that could be targeted to effectively lower plasma TG and improve cardiovascular risk. Accordingly, we sought to characterize the genetic architecture of HTG susceptibility and phenotypic heterogeneity using several modern genomic technologies, including high-density microarray genotyping and high-throughput resequencing of candidate genes in HTG patients and healthy controls. We demonstrate that a broad allelic spectrum of common small effect variants and rare large effect variants is associated with HTG. Furthermore, we demonstrate that significant overlap exists between genes and variants that modulate plasma TG and increase HTG susceptibility. Taken together, we suggest that HTG susceptibility is the result of a genetic burden of TG-raising alleles in genes that modulate plasma TG concentration. These findings provide a breadth of novel targets for pharmaceutical development in hopes of reducing plasma TG concentration and improving cardiovascular risk in HTG patients.
Genetic association studies: application in the investigation of biomarkers related to cardiovascular diseases and study design
Cardiovascular disease (CVD) is the No. 1 cause of death in the United States, killing about 610,000 people every year. Biomarkers are important tools to identify vulnerable individuals at high risk of CVD. Investigation of the genetic architecture for biomarkers and other risk factors related to CVD is of critical importance in the prevention and treatment of CVD. For my first chapter, I conducted genome-wide admixture and association studies for iron-related traits in 2347 African American (AA) participants from the Jackson Heart Study (JHS). I identified, for the first time, a second independent genome-wide significant signal in the TF region associated with total iron binding capacity levels. I also identified a novel functional missense variant in the G6PD-GAB3 region significantly associated with ferritin levels. Both results were replicated in a second AA cohort with iron measures. For my second chapter, I conducted genome-wide admixture and association studies, and gene-based exome-wide association studies of rare variants, to identify variants or genes, harboring a high burden of rare functional variants, associated with lipoprotein(a) [Lp(a)] cholesterol levels in 2895 AAs participating in the JHS. I observed significant evidence for association between Lp(a) and both local ancestry and hundreds of variants spanning ~10 Mb across the LPA gene region on chromosome 6q. Of note, the region containing associated variants became much narrower, centered over the LPA gene (<1 Mb), after adjusting for local ancestry. I also observed a single significant non-synonymous SNP in APOE and a high burden of coding variants in LPA and APOE significantly associated with Lp(a) levels. For my third chapter, I investigated the genetic association of four candidate variants with blood pressure and tested the modifying effects of environmental factors in 7,319 Chinese adults from the China Health and Nutrition Survey (CHNS).
I observed that rs1458038 exhibited a significant genotype-by-BMI interaction affecting blood pressure measures, with the strongest variant effects in those with the highest BMI. Finally, for my last chapter, I described a multistage GWAS design that uses selective phenotyping to increase power for studies with existing genome-wide genotypic data and to-be-measured quantitative phenotypes that are under a sample-size constraint. The approach uses simulated annealing to identify the optimal subset of subjects to be phenotyped in Stage 2 of a two-stage GWAS. I showed that our approach has greater statistical power than the conventional approach of randomly selecting a subset of subjects for phenotyping. We demonstrate the gains in power for both directly genotyped and imputed genetic variants. Together, these studies further our understanding of the genetic architecture of risk factors for CVD, suggest some candidates for future genetic and molecular studies, and also shed some light on the study design of future large-scale genetic association studies where the cost constraints will be determined by the expense of measuring new biomarkers in studies that have existing genetic data.
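The selective-phenotyping idea in the abstract above (using simulated annealing to pick which already-genotyped subjects to phenotype) can be illustrated with a small sketch. The abstract does not state the objective function or cooling schedule, so everything here is an assumption made for illustration: the names `subset_score` and `anneal_subset` are hypothetical, the objective (variance of allele counts at a single variant, a rough stand-in for power) is a placeholder, and the geometric cooling schedule is a common default rather than the author's choice.

```python
import math
import random


def subset_score(genotypes, subset):
    """Hypothetical objective: variance of allele counts among selected subjects."""
    values = [genotypes[i] for i in subset]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def anneal_subset(genotypes, k, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search for k subjects that maximize subset_score by simulated annealing."""
    n = len(genotypes)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of subjects")
    rng = random.Random(seed)
    current = rng.sample(range(n), k)
    chosen_set = set(current)
    outside = [i for i in range(n) if i not in chosen_set]
    current_score = subset_score(genotypes, current)
    initial_score = current_score
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose swapping one selected subject with one unselected subject.
        i = rng.randrange(k)
        j = rng.randrange(len(outside))
        current[i], outside[j] = outside[j], current[i]
        new_score = subset_score(genotypes, current)
        delta = new_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current_score = new_score
            if new_score > best_score:
                best, best_score = list(current), new_score
        else:
            # Rejected: undo the swap.
            current[i], outside[j] = outside[j], current[i]
    return sorted(best), best_score, initial_score


# Toy data: allele counts (0, 1, 2) at one variant for 60 simulated subjects.
data_rng = random.Random(42)
genotypes = [data_rng.choice([0, 1, 2]) for _ in range(60)]
chosen, best_score, initial_score = anneal_subset(genotypes, k=20)
print(len(chosen), round(initial_score, 3), round(best_score, 3))
```

Because the best subset seen so far is tracked separately, the returned score can never be worse than that of the random starting subset, which corresponds to the random-selection baseline the abstract compares against. A realistic objective would aggregate over many variants and account for imputation quality.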
Summaries of plenary, symposia, and oral sessions at the XXII World Congress of Psychiatric Genetics, Copenhagen, Denmark, 12-16 October 2014
The XXII World Congress of Psychiatric Genetics, sponsored by the International Society of Psychiatric Genetics, took place in Copenhagen, Denmark, on 12-16 October 2014. A total of 883 participants gathered to discuss the latest findings in the field. The following report was written by student and postdoctoral attendees. Each was assigned one or more sessions as a rapporteur. This manuscript represents topics covered in most, but not all, of the oral presentations during the conference, and contains some of the major notable new findings reported.
Genetic studies of cardiometabolic traits
Diet and lifestyle have changed dramatically in the last few decades, leading to an increase in prevalence of obesity (defined as a body mass index >30 kg/m²), dyslipidaemias (defined as abnormal lipid profiles) and type 2 diabetes (T2D). Together, these cardiometabolic traits and diseases have contributed to the increased burden of cardiovascular disease, the leading cause of death in Western societies.
Complex traits and diseases, such as cardiometabolic traits, arise as a result of the interaction between an individual’s predisposing genetic makeup and a permissive environment. Since 2007, genome-wide association studies (GWAS) have been successfully applied to complex traits leading to the discovery of thousands of trait-associated variants. Nonetheless, much is still to be understood regarding the genetic architecture of these traits, as well as their underlying biology. This thesis aims to further explore the genetic architecture of cardiometabolic traits by using complementary approaches with greater genetic and phenotype resolution, ranging from studying clinically ascertained extreme phenotypes, deep molecular profiling, or sequence level data.
In chapter 2, I investigated the genetic architecture of healthy human thinness (N=1,471) and contrasted it to that of severe early onset childhood obesity (N=1,456). I demonstrated that healthy human thinness, like severe obesity, is a heritable trait, with a polygenic component. I identified a novel BMI-associated locus at PKHD1, and found evidence of association at several loci that had only been discovered using large cohorts with >40,000 individuals, demonstrating the power gains in studying clinical extreme phenotypes.
In chapter 3, I coupled high-resolution nuclear magnetic resonance (NMR) measurements in healthy blood donors, with next-generation sequencing to establish the role of rare coding variation in circulating metabolic biomarker biology. In gene-based analysis, I identified ACSL1, MYCN, FBXO36 and B4GALNT3 as novel gene-trait associations (P < 2.5 × 10⁻⁶). I also found a novel link between loss-of-function mutations in the “regulation of the pyruvate dehydrogenase (PDH) complex” pathway and intermediate-density lipoprotein (IDL), low-density lipoprotein (LDL) and circulating cholesterol measurements. In addition, I demonstrated that rare “protective” variation in lipoprotein metabolism genes was present in the lower tails of four measurements which are CVD risk factors in this healthy population, demonstrating a role for rare coding variation and the extremes of healthy phenotypes.
In chapter 4, I performed a genome-wide association study of fructosamine, a measurement of total serum protein glycation which is useful to monitor rapid changes in glycaemic levels after treatment, as it reflects average glycaemia over 2-3 weeks. In contrast to HbA1c, which reflects average glucose concentration over the life-span of the erythrocyte (~3 months), fructosamine levels are not predicted to be influenced by factors affecting the erythrocyte. Surprisingly, I found that in this dataset fructosamine had low heritability (2% vs 20% for HbA1c), and was poorly correlated with HbA1c and other glycaemic traits. Despite this, I found two loci previously associated with glycaemic or albumin traits, G6PC2 and FCGRT respectively (P < 5 × 10⁻⁸), associated with fructosamine, suggesting shared genetic influence.
Altogether my results demonstrate the utility of higher resolution genotype and phenotype data in further elucidating the genetic architecture of a range of cardiometabolic traits, and the power advantages of study designs that focus on individuals at the extremes of phenotype distribution. As large cohorts and national biobanks with sequencing and deep multi-dimensional phenotyping become more prevalent, we will be moving closer to understanding the multiple aetiological mechanisms leading to CVD, and subsequently improve diagnosis and treatment of these conditions.
Exploring Genetic Interactions: from Tools Development with Massive Parallelization on GPGPU to Multi-Phenotype Studies on Dyslexia
For over a decade, genome-wide association studies (GWASs) have provided insightful information into the genetic architecture of complex traits. However, the variants found by GWASs explain just a small portion of heritability. Meanwhile, as large scale GWASs and meta-analyses of multiple phenotypes are becoming increasingly common, there is a need to develop computationally efficient models/tools for multi-locus studies and multi-phenotype studies. Thus, we were motivated to focus on the development of tools for epistatic studies and to seek analysis strategies that jointly analyze multiple phenotypes. By exploiting the technical and methodological progress, we developed three R packages. SimPhe was built based on the Cockerham epistasis model to simulate (multiple correlated) phenotype(s) with epistatic effects. Another two packages, episcan and gpuEpiScan, simplified the calculation of EPIBLASTER and epiHSIC and were implemented with high performance, especially the package based on Graphics Processing Unit (GPU). The two packages can be employed for epistasis detection in both case-control studies and quantitative trait studies. Our packages might help drive down costs of computation and increase innovation in epistatic studies. Moreover, we explored the gene-gene interactions on developmental dyslexia, which is mainly characterized by reading problems in children. Multivariate meta-analysis was performed on genome-wide interaction study (GWIS) for reading-related phenotypes in the dyslexia dataset, which contains nine cohorts from different locations. We identified one genome-wide significant epistatic interaction, between rs1442415 and rs8013684, associated with word reading, as well as suggestive genetic interactions which might affect reading abilities.
Except for rs1442415, which has been reported to influence educational attainment, the genetic variants involved in the suggestive interactions have shown associations with psychiatric disorders in previous GWASs, particularly with bipolar disorder. Our findings suggest making efforts to investigate not just the genetic interactions but also multiple correlated psychiatric disorders.
Comprehensive statistical and bioinformatics analysis in the deciphering of putative mechanisms by which lipid-associated GWAS loci contribute to coronary artery disease
The study was designed to evaluate putative mechanisms by which lipid-associated loci identified by genome-wide association studies (GWAS) are involved in the molecular pathogenesis of coronary artery disease (CAD) using a comprehensive statistical and bioinformatics analysis.