
    Whole-genome sequencing to understand the genetic architecture of common gene expression and biomarker phenotypes.

    Initial results from sequencing studies suggest that there are relatively few low-frequency (<5%) variants associated with large effects on common phenotypes. We performed low-pass whole-genome sequencing in 680 individuals from the InCHIANTI study to test two primary hypotheses: (i) that sequencing would detect single low-frequency, large-effect variants that explained amounts of phenotypic variance similar to those explained by single common variants, and (ii) that some common variant associations could be explained by low-frequency variants. We tested two sets of disease-related common phenotypes for which we had statistical power to detect large numbers of common variant-common phenotype associations: 11 132 cis-gene expression traits in 450 individuals and 93 circulating biomarkers in all 680 individuals. The analysis covered a total of 11 657 229 high-quality variants, of which 6 129 221 were common and 5 528 008 were low frequency (<5%). Low-frequency, large-effect associations comprised 7% of detectable cis-gene expression traits [89 of 1314 cis-eQTLs at P < 1 × 10⁻⁶ (false discovery rate ∼5%)] and one of eight biomarker associations at P < 8 × 10⁻¹⁰. Very few (30 of 1232; 2%) common variant associations were fully explained by low-frequency variants. Our data show that whole-genome sequencing can identify low-frequency variants undetected by genotyping-based approaches when sample sizes are sufficiently large to detect substantial numbers of common variant associations. They also show that common variant associations are rarely explained by single low-frequency variants of large effect.
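The common versus low-frequency split above rests on minor allele frequency (MAF) with a 5% cutoff. The sketch below shows how MAF could be computed and a variant binned. It assumes genotypes coded as 0/1/2 copies of the alternate allele per individual. That encoding and the function names are illustrative, not the study's pipeline. As a sanity check on the reported figures, 89 of 1314 cis-eQTLs is about 6.8%, which the abstract rounds to 7%.

```python
from typing import Sequence

LOW_FREQUENCY_CUTOFF = 0.05  # the "<5%" threshold quoted in the abstract


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from diploid genotypes coded as 0, 1 or 2 copies of the alternate allele."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever allele is rarer in this sample.
    return min(alt_freq, 1.0 - alt_freq)


def classify_variant(genotypes: Sequence[int]) -> str:
    """Return 'low-frequency' if MAF < 5%, otherwise 'common'.

    Monomorphic sites (MAF = 0) would also land in the low-frequency bin,
    so a real pipeline would filter them out first.
    """
    maf = minor_allele_frequency(genotypes)
    return "low-frequency" if maf < LOW_FREQUENCY_CUTOFF else "common"


# Hypothetical variant: 3 heterozygous carriers among 100 individuals -> MAF = 3/200.
print(classify_variant([1] * 3 + [0] * 97))  # low-frequency
```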

    Integration of GWAS SNPs and tissue specific expression profiling reveal discrete eQTLs for human traits in blood and brain

    Our knowledge of the transcriptome has become much more complex since the days of the central dogma of molecular biology. We now know that splicing can create potentially thousands of isoforms from a single gene, and that RNA does not always faithfully recapitulate DNA when RNA editing occurs. Collectively, these observations show that the transcriptome is remarkably rich in intricate regulatory mechanisms for overall gene expression, splicing, and RNA editing. Genetic variability can play a role in controlling gene expression, which can be identified by examining expression quantitative trait loci (eQTLs). eQTLs are genomic regions where genetic variants, including single nucleotide polymorphisms (SNPs), show a statistical association with the expression of mRNA transcripts. In humans, many SNPs are also associated with disease and have been identified using genome-wide association studies (GWAS), but the biological effects of those SNPs are usually not known. If SNPs found in GWAS are also found in eQTLs, then one could hypothesize that expression levels may contribute to disease risk. We performed eQTL analysis with GWAS SNPs in both blood and brain, specifically the frontal cortex and the cerebellum, and found both shared and tissue-unique eQTLs. The identification of tissue-unique eQTLs supports the argument that the choice of tissue type is important in eQTL studies (Paper I). Aging is a complex process whose underlying mechanisms remain poorly defined. There is evidence that the transcriptome changes with age. We therefore used the brain dataset from our first paper as a discovery set, together with an additional replication dataset, to investigate associations between age and gene expression. We found evidence that many genes were associated with aging. We further found more statistically significant expression changes in the frontal cortex than in the cerebellum, indicating that brain regions may age at different rates.
As the brain is a heterogeneous tissue including both neurons and non-neuronal cells, we used laser capture microdissection (LCM) to capture Purkinje cells as a representative neuronal type and repeated the age analysis. Across the discovery, replication and Purkinje cell datasets, we found five genes with strong, replicated evidence of age-expression associations (Paper II). Methods for capturing and quantifying the depth of the transcriptome developed over a long period, from techniques that could measure only a single gene to genome-wide techniques such as microarrays. A recently developed technology, RNA-Seq, shows promise in its ability to capture expression, splicing, and editing, and its broad dynamic range makes quantification accurate and reliable. RNA-Seq is, however, data intensive, and a great deal of computational expertise is required to fully utilize the strengths of this method. We aimed to create a small, well-controlled experiment to test the performance of this relatively new technology in the brain. We chose to compare embryonic and adult mouse cerebral cortex. Mice are genetically homogeneous, and there are many known differences in gene expression related to brain development that we could use as benchmarks for testing the analysis. We found a large number of differences in total gene expression between embryonic and adult brain. Rigorous technical and biological validation illustrated the accuracy and dynamic range of RNA-Seq. We were also able to interrogate differences in exon usage in the same dataset. Finally, we were able to identify and quantify both well-known and novel A-to-I editing sites. Overall, this project helped us develop the tools needed to build usable pipelines for RNA-Seq data processing (Paper III). Our studies in the developing brain (Paper III) illustrated that RNA-Seq was a useful unbiased method for investigating RNA editing.
To extend this further, we used a genetically modified mouse model to study the transcriptomic role of the RNA editing enzyme ADAR2. We found that ADAR2 was important for editing of the coding region of mRNA. A large proportion of RNA editing sites in coding regions showed a statistically significant decrease in editing percentage in Adar2-/- Gria2R/R mice versus controls. However, the literature suggests that ADAR2 may also be involved in splicing and expression regulatory machinery. Despite this, we found no changes in gene expression or exon utilization in Adar2-/- Gria2R/R mice compared with their littermate controls (Paper IV). In our final study, based on the methods developed in Papers III and IV, we revisited the age-related gene expression associations from Paper II. We used a subset of human frontal cortices for RNA sequencing. Interestingly, we found more gene expression changes with aging than in the previous microarray data from Paper II. When the significant gene lists were analysed for gene ontology enrichment, we found a large number of downregulated genes involved in synaptic function, while upregulated genes were enriched for immune function. This dataset illustrates that the aging brain may be predisposed to the processes found in neurodegenerative diseases (Paper V).
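The "editing percentage" compared between Adar2-/- Gria2R/R mice and controls is a per-site proportion of edited reads. Inosine is read as guanosine during sequencing, so edited transcripts appear as A-to-G mismatches. The sketch below is minimal and assumes editing is quantified as G reads / (A + G reads) at each site. The read counts are hypothetical, and the thesis does not state which statistical test was used, so none is shown.

```python
from statistics import mean
from typing import Sequence, Tuple


def editing_percentage(a_reads: int, g_reads: int) -> float:
    """Percent editing at one A-to-I site: G reads / (A + G reads) x 100.

    Inosine is read as guanosine, so edited molecules show up as G
    where the genome has A.
    """
    total = a_reads + g_reads
    if total == 0:
        raise ValueError("no A or G reads cover this site")
    return 100.0 * g_reads / total


def mean_editing(samples: Sequence[Tuple[int, int]]) -> float:
    """Average editing percentage at one site across (A reads, G reads) samples."""
    return mean(editing_percentage(a, g) for a, g in samples)


# Hypothetical read counts at one coding site, NOT data from the thesis.
controls = [(40, 60), (50, 50)]
knockouts = [(95, 5), (90, 10)]
print(mean_editing(controls), mean_editing(knockouts))  # 55.0 7.5
```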

    Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke

    Genetic factors have been implicated in stroke risk, but few replicated associations have been reported. We conducted a genome-wide association study (GWAS) of ischemic stroke and its subtypes in 3,548 cases and 5,972 controls, all of European ancestry. Replication of potential signals was performed in 5,859 cases and 6,281 controls. We replicated reported associations of variants close to PITX2 and ZFHX3 with cardioembolic stroke, and of a 9p21 locus with large vessel stroke. We identified a novel association with large vessel stroke for a SNP within the histone deacetylase 9 (HDAC9) gene on chromosome 7p21.1. This association was supported by additional replication in a further 735 cases and 28,583 controls (rs11984041; combined P = 1.87 × 10⁻¹¹; OR = 1.42, 95% CI 1.28-1.57). All four loci exhibit evidence for heterogeneity of effect across the stroke subtypes, with some, and possibly all, affecting risk for only one subtype. This suggests differing genetic architectures for different stroke subtypes.

    Allelic heterogeneity and more detailed analyses of known loci explain additional phenotypic variation and reveal complex patterns of association

    The identification of multiple signals at individual loci could explain additional phenotypic variance (‘missing heritability’) of common traits and help identify causal genes. We examined gene expression levels as a model trait because of the large number of strong genetic effects acting in cis. Using expression profiles from 613 individuals, we performed genome-wide single nucleotide polymorphism (SNP) analyses to identify cis-expression quantitative trait loci (eQTLs), and conditional analysis to identify second signals. We examined patterns of association when accounting for multiple SNPs at a locus and when including additional SNPs from the 1000 Genomes Project. We identified 1298 cis-eQTLs at an approximate false discovery rate of 0.01, of which 118 (9%) showed evidence of a second independent signal. For this subset of 118 traits, accounting for two signals resulted in an average 31% increase in phenotypic variance explained (Wilcoxon P < 0.0001). The association of SNPs with cis gene expression could increase, stay similar or decrease in significance when accounting for linkage disequilibrium with second signals at the same locus. Pairs of SNPs increasing in significance tended to have expression-increasing alleles on opposite haplotypes. Pairs of SNPs decreasing in significance tended to have expression-increasing alleles on the same haplotype. Adding data from the 1000 Genomes Project showed that apparently independent signals could potentially be explained by a single association signal. Our results show that accounting for multiple variants at a locus will increase the variance explained in a substantial fraction of loci. They also show that allelic heterogeneity will be difficult to define without resequencing loci and functional work.
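Conditional analysis asks whether a second SNP is still associated with expression once the lead SNP is accounted for. The sketch below is minimal and pure Python. It assumes additive 0/1/2 genotype coding and ordinary least squares, and its function names are illustrative rather than taken from the study. It relies on the Frisch-Waugh-Lovell result: in the joint model expression ~ lead + second, the coefficient of the second SNP equals the slope from regressing the residuals of expression on lead against the residuals of second on lead. The toy data are noise-free, so the recovered effect is exact.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """What is left of y after removing its linear dependence on x."""
    a, b = simple_ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def conditional_effect(lead: Sequence[float], second: Sequence[float],
                       expression: Sequence[float]) -> float:
    """Effect of the second SNP on expression, adjusted for the lead SNP.

    By Frisch-Waugh-Lovell this equals the second SNP's coefficient in the
    joint regression expression ~ lead + second.
    """
    return simple_ols(residuals(lead, second), residuals(lead, expression))[1]


# Hypothetical genotypes (0/1/2) with expression = 1 + 2*lead + 0.5*second.
lead = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
second = [0, 1, 0, 1, 1, 2, 0, 0, 1, 2]
expression = [1 + 2 * g1 + 0.5 * g2 for g1, g2 in zip(lead, second)]
print(round(conditional_effect(lead, second, expression), 6))  # 0.5
```

This returns only the adjusted effect size. A conditional P-value would additionally need the residual variance of the joint model with n - 3 degrees of freedom.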

    Detecting autozygosity through runs of homozygosity: A comparison of three autozygosity detection algorithms

    Background: A central aim for studying runs of homozygosity (ROHs) in genome-wide SNP data is to detect the effects of autozygosity (stretches of the two homologous chromosomes within the same individual that are identical by descent) on phenotypes. However, it is unknown which current ROH detection program, and which set of parameters within a given program, is optimal for differentiating ROHs that are truly autozygous from ROHs that are homozygous at the marker level but vary at unmeasured variants between the markers. Method: We simulated 120 Mb of sequence data in order to know the true state of autozygosity. We then extracted common variants from this sequence to mimic the properties of SNP platforms and performed ROH analyses using three popular ROH detection programs: PLINK, GERMLINE, and BEAGLE. We varied detection thresholds for each program (e.g., prior probabilities, lengths of ROHs) to understand their effects on detecting known autozygosity. Results: Within the optimal thresholds for each program, PLINK outperformed GERMLINE and BEAGLE in detecting autozygosity from distant common ancestors. PLINK's sliding-window algorithm worked best when using SNP data pruned for linkage disequilibrium (LD). Conclusion: Our results provide both general and specific recommendations for maximizing autozygosity detection in genome-wide SNP data. They should apply equally well to research on whole-genome autozygosity burden and to research on whether specific autozygous regions are predictive using association mapping methods.
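PLINK's ROH detection slides a window of consecutive SNPs along the chromosome and flags windows that look homozygous. The sketch below is a simplified, hypothetical reimplementation of that idea, not PLINK's code:

- Genotypes are booleans (True = heterozygous).
- A window is homozygous if it holds at most `max_het` heterozygous calls.
- A SNP is called "in ROH" when at least `hit_fraction` of the windows overlapping it are homozygous.
- Consecutive ROH SNPs form a run if there are at least `min_snps` of them.

The parameter names and defaults are illustrative. PLINK exposes analogous settings such as `--homozyg-window-snp`, `--homozyg-window-het` and `--homozyg-window-threshold`, but it also applies criteria omitted here, such as minimum run length in kb, SNP density and gap limits.

```python
from typing import List, Optional, Sequence, Tuple


def find_rohs(is_het: Sequence[bool], window: int = 5, max_het: int = 0,
              hit_fraction: float = 0.05, min_snps: int = 5) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) SNP indices of candidate runs of homozygosity.

    is_het[i] is True when SNP i is heterozygous in this individual.
    """
    n = len(is_het)
    if n < window:
        return []
    # 1. A window is homozygous if it contains at most max_het heterozygous calls.
    hom_window = [sum(is_het[s:s + window]) <= max_het for s in range(n - window + 1)]
    # 2. A SNP is in an ROH if enough of the windows overlapping it are homozygous.
    in_roh = []
    for i in range(n):
        starts = range(max(0, i - window + 1), min(i, n - window) + 1)
        hits = sum(hom_window[s] for s in starts)
        in_roh.append(hits / len(starts) >= hit_fraction)
    # 3. Merge consecutive ROH SNPs and keep runs of at least min_snps.
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flag in enumerate(in_roh + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


# Hypothetical individual: 3 het SNPs, 12 homozygous SNPs, 3 het SNPs.
genotypes = [True] * 3 + [False] * 12 + [True] * 3
print(find_rohs(genotypes))  # [(3, 14)]
```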

    A novel MMP12 locus is associated with large artery atherosclerotic stroke using a genome-wide age-at-onset informed approach.

    Genome-wide association studies (GWAS) have begun to identify the common genetic component of ischaemic stroke (IS). However, IS has considerable phenotypic heterogeneity. Where clinical covariates explain a large fraction of disease risk, covariate-informed designs can increase power to detect associations. Prevalence rates in IS are markedly affected by age, and younger-onset cases may have higher genetic predisposition. We therefore investigated, in 6,778 cases of European ancestry and 12,095 ancestry-matched controls, whether an age-at-onset informed approach could detect novel associations with IS and its subtypes: cardioembolic (CE), large artery atherosclerosis (LAA) and small vessel disease (SVD). Regression analysis to identify SNP associations was performed on posterior liabilities after conditioning on age-at-onset and affection status. We sought further evidence of an association with LAA in 1,881 cases and 50,817 controls, and examined mRNA expression levels of the nearby genes in atherosclerotic carotid artery plaques. Secondly, we performed permutation analyses to evaluate the extent to which age-at-onset informed analysis improves significance for novel loci. We identified a novel association with an MMP12 locus in LAA (rs660599; p = 2.5×10⁻⁷), with independent replication in a second population (p = 0.0048; OR = 1.18, 95% CI 1.05-1.32; meta-analysis p = 2.6×10⁻⁸). The nearby gene, MMP12, was significantly overexpressed in carotid plaques compared with atherosclerosis-free control arteries (p = 1.2×10⁻¹⁵; fold change = 335.6). Permutation analyses demonstrated improved significance for associations when accounting for age-at-onset in all four stroke phenotypes (p < 0.001). Our results show that a covariate-informed design, by adjusting for age-at-onset of stroke, can detect variants not identified by conventional GWAS.
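The abstract does not spell out how posterior liabilities were computed. The sketch below therefore assumes a standard liability-threshold model: liability L ~ N(0, 1), and disease occurs when L exceeds an age-specific threshold t = Φ⁻¹(1 − K). Here K is the cumulative incidence by the relevant age, meaning age at onset for a case and age at ascertainment for a control. The posterior means are then the truncated-normal means E[L | L > t] = φ(t)/K for cases and E[L | L < t] = −φ(t)/(1 − K) for controls. Under this model, earlier onset (smaller K) yields higher liability. The function name and the incidence values are hypothetical, and the published method may model age-at-onset differently.

```python
from statistics import NormalDist

LIABILITY = NormalDist()  # liability assumed ~ N(0, 1)


def posterior_mean_liability(is_case: bool, cumulative_incidence: float) -> float:
    """Expected liability given affection status under a liability-threshold model.

    cumulative_incidence: assumed risk of disease by the relevant age (age at
    onset for a case, age at ascertainment for a control).
    """
    k = cumulative_incidence
    if not 0.0 < k < 1.0:
        raise ValueError("cumulative incidence must lie strictly between 0 and 1")
    t = LIABILITY.inv_cdf(1.0 - k)  # age-specific threshold on the liability scale
    density = LIABILITY.pdf(t)
    if is_case:
        return density / k  # E[L | L > t], since P(L > t) = k
    return -density / (1.0 - k)  # E[L | L < t], since P(L < t) = 1 - k


# Hypothetical age-specific cumulative incidences, NOT values from the study.
young_case = posterior_mean_liability(True, 0.01)
older_case = posterior_mean_liability(True, 0.10)
print(young_case > older_case)  # True
```

In a design like the one described, such per-individual liabilities would replace 0/1 case status as the outcome in a per-SNP linear regression.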

    C9ORF72 hexanucleotide repeat exerts toxicity in a stable, inducible motor neuronal cell model, which is rescued by partial depletion of Pten.

    Amyotrophic lateral sclerosis (ALS) is a devastating and incurable neurodegenerative disease, characterised by progressive failure of the neuromuscular system. A (G4C2)n repeat expansion in C9ORF72 is the most common genetic cause of ALS and frontotemporal dementia (FTD). To date, the balance of evidence indicates that the (G4C2)n repeat causes toxicity and neurodegeneration via a gain-of-toxic-function mechanism. This may act either through direct RNA toxicity or through the production of toxic, aggregating dipeptide repeat proteins. Here, we have generated a stable and isogenic motor neuronal NSC34 cell model with inducible expression of a (G4C2)102 repeat to investigate these gain-of-toxic-function mechanisms. Expression of the (G4C2)102 repeat produces RNA foci, and the repeat also undergoes repeat-associated non-AUG (RAN) translation. In addition, expression of the (G4C2)102 repeat is toxic to the cells. We compared transcriptomic data from the cellular model with laser-captured spinal motor neurons from C9ORF72-ALS cases. This comparison demonstrates that the PI3K/Akt cell survival signalling pathway is dysregulated in both systems. Furthermore, partial knockdown of Pten rescues the toxicity observed in the NSC34 (G4C2)102 cellular gain-of-toxic-function model of C9ORF72-ALS. Our data indicate that PTEN may provide a potential therapeutic target to ameliorate toxic effects of the (G4C2)n repeat.

    Comprehensive Research Synopsis and Systematic Meta-Analyses in Parkinson's Disease Genetics: The PDGene Database

    More than 800 published genetic association studies have implicated dozens of potential risk loci in Parkinson's disease (PD). To facilitate the interpretation of these findings, we have created a dedicated online resource, PDGene, that comprehensively collects and meta-analyzes all published studies in the field. A systematic literature screen of ∼27,000 articles yielded 828 eligible articles from which relevant data were extracted. In addition, individual-level data from three publicly available genome-wide association studies (GWAS) were obtained and subjected to genotype imputation and analysis. Overall, we performed meta-analyses on more than seven million polymorphisms originating from GWAS datasets, from smaller-scale PD association studies, or from both. Meta-analyses on 147 SNPs were supplemented by unpublished GWAS data from up to 16,452 PD cases and 48,810 controls. Eleven loci showed genome-wide significant (P < 5×10⁻⁸) association with disease risk: BST1, CCDC62/HIP1R, DGKQ/GAK, GBA, LRRK2, MAPT, MCCC1/LAMP3, PARK16, SNCA, STK39, and SYT11/RAB25. In addition, we identified novel evidence for genome-wide significant association with a polymorphism in ITGA8 (rs7077361, OR 0.88, P = 1.3×10⁻⁸). All meta-analysis results are freely available on a dedicated online database (www.pdgene.org), which is cross-linked with a customized track on the UCSC Genome Browser. Our study provides an exhaustive and up-to-date summary of the status of PD genetics research. This summary can be readily scaled to include the results of future large-scale genetics projects, including next-generation sequencing studies.
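The abstract does not state which meta-analysis model PDGene uses. The sketch below shows one common choice, an inverse-variance fixed-effect meta-analysis of per-study log odds ratios, purely as an illustration. Each study's log(OR) is weighted by 1/SE². The pooled estimate is back-transformed to an odds ratio with a 95% confidence interval and a two-sided P from the normal distribution. The function name and the study inputs are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def fixed_effect_meta(odds_ratios: Sequence[float],
                      log_or_std_errors: Sequence[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect meta-analysis of per-study odds ratios.

    log_or_std_errors are the standard errors of log(OR) for each study.
    Returns (pooled OR, lower 95% CI, upper 95% CI, two-sided P).
    """
    if not odds_ratios or len(odds_ratios) != len(log_or_std_errors):
        raise ValueError("need one standard error per odds ratio")
    normal = NormalDist()
    weights = [1.0 / se ** 2 for se in log_or_std_errors]  # w_i = 1 / SE_i^2
    betas = [math.log(o) for o in odds_ratios]  # work on the log-odds scale
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = 2.0 * normal.cdf(-abs(z))  # tail form keeps precision for small P
    half = normal.inv_cdf(0.975) * pooled_se
    return math.exp(pooled), math.exp(pooled - half), math.exp(pooled + half), p


# Three hypothetical studies of one SNP (illustrative numbers only).
or_, lo, hi, p = fixed_effect_meta([0.85, 0.90, 0.88], [0.06, 0.05, 0.07])
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), P = {p:.1e}")
```

A random-effects model would add a between-study variance term to each weight when heterogeneity is present.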

    Genetic overlap between diagnostic subtypes of ischemic stroke

    Background and Purpose: Despite moderate heritability, the phenotypic heterogeneity of ischemic stroke has hampered gene discovery, motivating analyses of diagnostic subtypes with reduced sample sizes. We assessed evidence for a shared genetic basis among the 3 major subtypes: large artery atherosclerosis (LAA), cardioembolism, and small vessel disease (SVD), to inform potential cross-subtype analyses. Methods: Analyses used genome-wide summary data for 12 389 ischemic stroke cases (including 2167 LAA, 2405 cardioembolism, and 1854 SVD) and 62 004 controls from the Metastroke consortium. For 4561 cases and 7094 controls, individual-level genotype data were also available. Genetic correlations between subtypes were estimated using linear mixed models and polygenic profile scores. Meta-analysis of a combined LAA-SVD phenotype (4021 cases and 51 976 controls) was performed to identify shared risk alleles. Results: High genetic correlation was identified between LAA and SVD using linear mixed models (rg=0.96, SE=0.47, P=9×10⁻⁴) and profile scores (rg=0.72; 95% confid