36 research outputs found

    The molecular basis, genetic control and pleiotropic effects of local gene co-expression.

    Nearby genes are often expressed as a group. Yet, the prevalence, molecular mechanisms and genetic control of local gene co-expression are far from being understood. Here, by leveraging gene expression measurements across 49 human tissues and hundreds of individuals, we find that local gene co-expression occurs in 13% to 53% of genes per tissue. By integrating various molecular assays (e.g. ChIP-seq and Hi-C), we estimate the ability of several mechanisms, such as enhancer-gene interactions, in distinguishing gene pairs that are co-expressed from those that are not. Notably, we identify 32,636 expression quantitative trait loci (eQTLs) which associate with co-expressed gene pairs and often overlap enhancer regions. Due to affecting several genes, these eQTLs are more often associated with multiple human traits than other eQTLs. Our study paves the way to comprehend trait pleiotropy and functional interpretation of QTL and GWAS findings. All local gene co-expression identified here is available through a public database (https://glcoex.unil.ch/)

    Expression estimation and eQTL mapping for HLA genes with a personalized pipeline.

    The HLA (Human Leukocyte Antigens) genes are well-documented targets of balancing selection, and variation at these loci is associated with many disease phenotypes. Variation in expression levels also influences disease susceptibility and resistance, but little information exists about the regulation and population-level patterns of expression. This results from the difficulty in mapping short reads originated from these highly polymorphic loci, and in accounting for the existence of several paralogues. We developed a computational pipeline to accurately estimate expression for HLA genes based on RNA-seq, improving both locus-level and allele-level estimates. First, reads are aligned to all known HLA sequences in order to infer HLA genotypes, then quantification of expression is carried out using a personalized index. We use simulations to show that expression estimates obtained in this way are not biased due to divergence from the reference genome. We applied our pipeline to the GEUVADIS dataset, and compared the quantifications to those obtained with reference transcriptome. Although the personalized pipeline recovers more reads, we found that using the reference transcriptome produces estimates similar to the personalized pipeline (r ≥ 0.87) with the exception of HLA-DQA1. We describe the impact of the HLA-personalized approach on downstream analyses for nine classical HLA loci (HLA-A, HLA-C, HLA-B, HLA-DRA, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, HLA-DPB1). Although the influence of the HLA-personalized approach is modest for eQTL mapping, the p-values and the causality of the eQTLs obtained are better than when the reference transcriptome is used. We investigate how the eQTLs we identified explain variation in expression among lineages of HLA alleles. Finally, we discuss possible causes underlying differences between expression estimates obtained using RNA-seq, antibody-based approaches and qPCR

    The effect of genetic variation on promoter usage and enhancer activity.

    The identification of genetic variants affecting gene expression, namely expression quantitative trait loci (eQTLs), has contributed to the understanding of mechanisms underlying human traits and diseases. The majority of these variants map in non-coding regulatory regions of the genome and their identification remains challenging. Here, we use natural genetic variation and CAGE transcriptomes from 154 EBV-transformed lymphoblastoid cell lines, derived from unrelated individuals, to map 5376 and 110 regulatory variants associated with promoter usage (puQTLs) and enhancer activity (eaQTLs), respectively. We characterize five categories of genes associated with puQTLs, distinguishing single from multi-promoter genes. Among multi-promoter genes, we find puQTL effects either specific to a single promoter or to multiple promoters with variable effect orientations. Regulatory variants associated with opposite effects on different mRNA isoforms suggest compensatory mechanisms occurring between alternative promoters. Our analyses identify differential promoter usage and modulation of enhancer activity as molecular mechanisms underlying eQTLs related to regulatory elements

    Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes.

    A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but apparently does not code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes

    MethCORR modelling of methylomes from formalin-fixed paraffin-embedded tissue enables characterization and prognostication of colorectal cancer

    Transcriptional characterization and classification has potential to resolve the inter-tumor heterogeneity of colorectal cancer and improve patient management. Yet, robust transcriptional profiling is difficult using formalin-fixed, paraffin-embedded (FFPE) samples, which complicates testing in clinical and archival material. We present MethCORR, an approach that allows uniform molecular characterization and classification of fresh-frozen and FFPE samples. MethCORR identifies genome-wide correlations between RNA expression and DNA methylation in fresh-frozen samples. This information is used to infer gene expression information in FFPE samples from their methylation profiles. MethCORR is here applied to methylation profiles from 877 fresh-frozen/FFPE samples and comparative analysis identifies the same two subtypes in four independent cohorts. Furthermore, subtype-specific prognostic biomarkers that better predict relapse-free survival (HR = 2.66, 95% CI [1.67-4.22], P value < 0.001 (log-rank test)) than UICC tumor, node, metastasis (TNM) staging and microsatellite instability status are identified and validated using DNA methylation-specific PCR. The MethCORR approach is general, and may be similarly successful for other cancer types

    Coordinated effects of sequence variation on DNA binding, chromatin structure, and transcription.

    DNA sequence variation has been associated with quantitative changes in molecular phenotypes such as gene expression, but its impact on chromatin states is poorly characterized. To understand the interplay between chromatin and genetic control of gene regulation, we quantified allelic variability in transcription factor binding, histone modifications, and gene expression within humans. We found abundant allelic specificity in chromatin and extensive local, short-range, and long-range allelic coordination among the studied molecular phenotypes. We observed genetic influence on most of these phenotypes, with histone modifications exhibiting strong context-dependent behavior. Our results implicate transcription factors as primary mediators of sequence-specific regulation of gene expression programs, with histone modifications frequently reflecting the primary regulatory event

    Comprehensive evaluation of coding region point mutations in microsatellite-unstable colorectal cancer

    Microsatellite instability (MSI) leads to accumulation of an excessive number of mutations in the genome, mostly small insertions and deletions. MSI colorectal cancers (CRCs), however, also contain more point mutations than microsatellite-stable (MSS) tumors, yet they have not been as comprehensively studied. To identify candidate driver genes affected by point mutations in MSI CRC, we ranked genes based on mutation significance while correcting for replication timing and gene expression utilizing an algorithm, MutSigCV. Somatic point mutation data from the exome kit-targeted area from 24 exome-sequenced sporadic MSI CRCs and respective normals, and 12 whole-genome-sequenced sporadic MSI CRCs and respective normals were utilized. The top 73 genes were validated in 93 additional MSI CRCs. The MutSigCV ranking identified several well-established MSI CRC driver genes and provided additional evidence for previously proposed CRC candidate genes as well as shortlisted genes that have to our knowledge not been linked to CRC before. Two genes, SMARCB1 and STK38L, were also functionally scrutinized, providing evidence of a tumorigenic role, for SMARCB1 mutations in particular. © 2018 The Authors. Published under the terms of the CC BY 4.0 license. Peer reviewed

    Gene co-expression analysis identifies brain regions and cell types involved in migraine pathophysiology

    Migraine is a common disabling neurovascular brain disorder typically characterised by attacks of severe headache and associated with autonomic and neurological symptoms. Migraine is caused by an interplay of genetic and environmental factors. Genome-wide association studies (GWAS) have identified over a dozen genetic loci associated with migraine. Here, we integrated migraine GWAS data with high-resolution spatial gene expression data of normal adult brains from the Allen Human Brain Atlas to identify specific brain regions and molecular pathways that are possibly involved in migraine pathophysiology. To this end, we used two complementary methods. In GWAS data from 23,285 migraine cases and 95,425 controls, we first studied modules of co-expressed genes that were calculated based on human brain expression data for enrichment of genes that showed association with migraine. Enrichment of a migraine GWAS signal was found for five modules that suggest involvement in migraine pathophysiology of: (i) neurotransmission, protein catabolism and mitochondria in the cortex; (ii) transcription regulation in the cortex and cerebellum; and (iii) oligodendrocytes and mitochondria in subcortical areas. Second, we used the high-confidence genes from the migraine GWAS as a basis to construct local migraine-related co-expression gene networks. Signatures of all brain regions and pathways that were prominent in the first method also surfaced in the second method, thus providing support that these brain regions and pathways are indeed involved in migraine pathophysiology

    The role of physical activity in metabolic homeostasis before and after the onset of type 2 diabetes: an IMI DIRECT study.

    AIMS/HYPOTHESIS: It is well established that physical activity, abdominal ectopic fat and glycaemic regulation are related but the underlying structure of these relationships is unclear. The previously proposed twin-cycle hypothesis (TC) provides a mechanistic basis for impairment in glycaemic control through the interactions of substrate availability, substrate metabolism and abdominal ectopic fat accumulation. Here, we hypothesise that the effect of physical activity in glucose regulation is mediated by the twin-cycle. We aimed to examine this notion in the Innovative Medicines Initiative Diabetes Research on Patient Stratification (IMI DIRECT) Consortium cohorts comprised of participants with normal or impaired glucose regulation (cohort 1: N ≤ 920) or with recently diagnosed type 2 diabetes (cohort 2: N ≤ 435). METHODS: We defined a structural equation model that describes the TC and fitted this within the IMI DIRECT dataset. A second model, twin-cycle plus physical activity (TC-PA), to assess the extent to which the effects of physical activity in glycaemic regulation are mediated by components in the twin-cycle, was also fitted. Beta cell function, insulin sensitivity and glycaemic control were modelled from frequently sampled 75 g OGTTs (fsOGTTs) and mixed-meal tolerance tests (MMTTs) in participants without and with diabetes, respectively. Abdominal fat distribution was assessed using MRI, and physical activity through wrist-worn triaxial accelerometry. Results are presented as standardised beta coefficients, SE and p values, respectively. RESULTS: The TC and TC-PA models showed better fit than null models (TC: χ² = 242, p = 0.004 and χ² = 63, p = 0.001 in cohort 1 and 2, respectively; TC-PA: χ² = 180, p = 0.041 and χ² = 60, p = 0.008 in cohort 1 and 2, respectively). The association of physical activity with glycaemic control was primarily mediated by variables in the liver fat cycle. CONCLUSIONS/INTERPRETATION: These analyses partially support the mechanisms proposed in the twin-cycle model and highlight mechanistic pathways through which insulin sensitivity and liver fat mediate the association between physical activity and glycaemic control. S.Bra. was funded by the UK Medical Research Council [MC_UU_12015/3]

    Genetic variant effects on gene expression in human pancreatic islets and their implications for T2D

    Most signals detected by genome-wide association studies map to non-coding sequence and their tissue-specific effects influence transcriptional regulation. However, key tissues and cell-types required for functional inference are absent from large-scale resources. Here we explore the relationship between genetic variants influencing predisposition to type 2 diabetes (T2D) and related glycemic traits, and human pancreatic islet transcription using data from 420 donors. We find: (a) 7741 cis-eQTLs in islets with a replication rate across 44 GTEx tissues between 40% and 73%; (b) marked overlap between islet cis-eQTL signals and active regulatory sequences in islets, with reduced eQTL effect size observed in the stretch enhancers most strongly implicated in GWAS signal location; (c) enrichment of islet cis-eQTL signals with T2D risk variants identified in genome-wide association studies; and (d) colocalization between 47 islet cis-eQTLs and variants influencing T2D or glycemic traits, including DGKB and TCF7L2. Our findings illustrate the advantages of performing functional and regulatory studies in disease relevant tissues