
    Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human

    Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles, or because of genetic variation in regulatory regions. Recently, Bing et al. have described the use of genotyping arrays to assay AI at a high resolution (∼750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze these data and identify genomic regions with AI in an unbiased and robust statistical manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated using previously published experimental data sets as well as with permutation testing. When applied to whole genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread (most expressed genes show evidence of AI in at least one of our 53 samples) and that most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap with recently discovered long non-coding RNAs. We also observe that genomic regions with AI not only include complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression such as alternative promoters and alternative 3′ ends. The approaches developed not only shed light on the incidence and mechanisms of allelic expression, but will also help towards mapping the genetic causes of allelic expression and identifying cases where this variation may be linked to diseases.

    Detecting differential allelic expression using high-resolution melting curve analysis: application to the breast cancer susceptibility gene CHEK2

    Background: The gene CHEK2 encodes a checkpoint kinase playing a key role in the DNA damage pathway. Though CHEK2 has been identified as an intermediate breast cancer susceptibility gene, only a small proportion of high-risk families have been explained by genetic variants located in its coding region. Alteration in gene expression regulation provides a potential mechanism for generating disease susceptibility. The detection of differential allelic expression (DAE) represents a sensitive assay to direct the search for a functional sequence variant within the transcriptional regulatory elements of a candidate gene. We aimed to assess whether CHEK2 was subject to DAE in lymphoblastoid cell lines (LCLs) from high-risk breast cancer patients for whom no mutation in BRCA1 or BRCA2 had been identified. Methods: We implemented an assay based on high-resolution melting (HRM) curve analysis and developed an analysis tool for DAE assessment. Results: We observed allelic expression imbalance in 4 of the 41 LCLs examined. All four were carriers of the truncating mutation 1100delC. We confirmed previous findings that this mutation induces nonsense-mediated mRNA decay. In our series, we ruled out the possibility of a functional sequence variant located in the promoter region or in a regulatory element of CHEK2 that would lead to DAE in the transcriptional regulatory milieu of freely proliferating LCLs. Conclusions: Our results support that HRM is a sensitive and accurate method for DAE assessment. This approach would be of great interest for high-throughput mutation screening projects aiming to identify genes carrying functional regulatory polymorphisms.

    A Bayesian Framework to Account for Complex Non-Genetic Factors in Gene Expression Levels Greatly Increases Power in eQTL Studies

    Gene expression measurements are influenced by a wide range of factors, such as the state of the cell, experimental conditions and variants in the sequence of regulatory regions. To understand the effect of a variable of interest, such as the genotype of a locus, it is important to account for variation that is due to confounding causes. Here, we present VBQTL, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors. VBQTL is implemented within an efficient and flexible inference framework, making it fast and tractable on large-scale problems. We compare the performance of VBQTL with alternative methods for dealing with confounding variability on eQTL mapping datasets from simulations, yeast, mouse, and human. Employing Bayesian complexity control and joint modelling is shown to result in more precise estimates of the contribution of different confounding factors resulting in additional associations to measured transcript levels compared to alternative approaches. We present a threefold larger collection of cis eQTLs than previously found in a whole-genome eQTL scan of an outbred human population. Altogether, 27% of the tested probes show a significant genetic association in cis, and we validate that the additional eQTLs are likely to be real by replicating them in different sets of individuals. Our method is the next step in the analysis of high-dimensional phenotype data, and its application has revealed insights into genetic regulation of gene expression by demonstrating more abundant cis-acting eQTLs in human than previously shown. Our software is freely available online at http://www.sanger.ac.uk/resources/software/peer/

    A risk haplotype of STAT4 for systemic lupus erythematosus is over-expressed, correlates with anti-dsDNA and shows additive effects with two risk alleles of IRF5

    Systemic lupus erythematosus (SLE) is the prototype autoimmune disease where genes regulated by type I interferon (IFN) are over-expressed and contribute to the disease pathogenesis. Because signal transducer and activator of transcription 4 (STAT4) plays a key role in the type I IFN receptor signaling, we performed a candidate gene study of a comprehensive set of single nucleotide polymorphisms (SNPs) in STAT4 in Swedish patients with SLE. We found that 10 out of 53 analyzed SNPs in STAT4 were associated with SLE, with the strongest signal of association (P = 7.1 × 10⁻⁸) for two perfectly linked SNPs rs10181656 and rs7582694. The risk alleles of these 10 SNPs form a common risk haplotype for SLE (P = 1.7 × 10⁻⁵). According to conditional logistic regression analysis the SNP rs10181656 or rs7582694 accounts for all of the observed association signal. By quantitative analysis of the allelic expression of STAT4 we found that the risk allele of STAT4 was over-expressed in primary human cells of mesenchymal origin, but not in B-cells, and that the risk allele of STAT4 was over-expressed (P = 8.4 × 10⁻⁵) in cells carrying the risk haplotype for SLE compared with cells with a non-risk haplotype. The risk allele of the SNP rs7582694 in STAT4 correlated to production of anti-dsDNA (double-stranded DNA) antibodies and displayed a multiplicatively increased, 1.82-fold risk of SLE with two independent risk alleles of the IRF5 (interferon regulatory factor 5) gene.

    Chromosome 9p21 SNPs associated with multiple disease phenotypes correlate with ANRIL expression

    Author Summary: Genetic variants on chromosome 9p21 have been associated with several important diseases including coronary artery disease, diabetes, and multiple cancers. Most of the risk variants in this region do not alter any protein sequence and are therefore likely to act by influencing the expression of nearby genes. We investigated whether chromosome 9p21 variants are correlated with expression of the three nearest genes (CDKN2A, CDKN2B, and ANRIL) which might mediate the association with disease. Using two different techniques to study effects on expression in blood from two separate populations of healthy volunteers, we show that variants associated with disease are all correlated with ANRIL expression, but associations with the other two genes are weaker and less consistent. Multiple genetic variants are independently associated with expression of all three genes. Although total expression levels of CDKN2A, CDKN2B, and ANRIL are positively correlated, individual genetic variants influence ANRIL and CDKN2B expression in opposite directions, suggesting a possible role of ANRIL in CDKN2B regulation. Our study suggests that modulation of ANRIL expression mediates susceptibility to several important human diseases.

    Hundreds of variants clustered in genomic loci and biological pathways affect human height

    Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.

    Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells

    Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk. This work was predominantly funded by the EU FP7 High Impact Project BLUEPRINT (HEALTH-F5-2011-282510) and the Canadian Institutes of Health Research (CIHR EP1-120608). The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 282510 (BLUEPRINT), the European Molecular Biology Laboratory, the Max Planck Society, the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013-2017’, SEV-2012-0208 and Spanish National Bioinformatics Institute (INB-ISCIII) PT13/0001/0021 co-funded by FEDER “Una Manera de hacer Europa”. D.G. is supported by a “la Caixa”-Severo Ochoa pre-doctoral fellowship, M.F. was supported by the BHF Cambridge Centre of Excellence [RE/13/6/30180], K.D. is funded as a HSST trainee by NHS Health Education England, S.E. is supported by a fellowship from La Caixa, V.P. is supported by a FEBS long-term fellowship and N.S.'s research is supported by the Wellcome Trust (Grant Codes WT098051 and WT091310), the EU FP7 (EPIGENESYS Grant Code 257082 and BLUEPRINT Grant Code HEALTH-F5-2011-282510) and the NIHR BRC. The Blood and Transplant Unit (BTRU) in Donor Health and Genomics is part of and funded by the National Institute for Health Research (NIHR) and is a partnership between the University of Cambridge and NHS Blood and Transplant (NHSBT) in collaboration with the University of Oxford and the Wellcome Trust Sanger Institute. The T-cell data was produced by the McGill Epigenomics Mapping Centre (EMC McGill). It is funded under the Canadian Epigenetics, Environment, and Health Research Consortium (CEEHRC) by the Canadian Institutes of Health Research and by Genome Quebec (CIHR EP1-120608), with additional support from Genome Canada and FRSQ. T.P. holds a Canada Research Chair.

    Association between Variants of the Leptin Receptor Gene (LEPR) and Overweight: A Systematic Review and an Analysis of the CoLaus Study

    BACKGROUND: Three non-synonymous single nucleotide polymorphisms (Q223R, K109R and K656N) of the leptin receptor gene (LEPR) have been tested for association with obesity-related outcomes in multiple studies, showing inconclusive results. We performed a systematic review and meta-analysis on the association of the three LEPR variants with BMI. In addition, we analysed 15 SNPs within the LEPR gene in the CoLaus study, assessing the interaction of the variants with sex. METHODOLOGY/PRINCIPAL FINDINGS: We searched electronic databases, including population-based studies that investigated the association between LEPR variants Q223R, K109R and K656N and obesity-related phenotypes in healthy, unrelated subjects. We furthermore performed meta-analyses of the genotype and allele frequencies in case-control studies. Results were stratified by SNP and by potential effect modifiers. CoLaus data were analysed by logistic and linear regressions and tested for interaction with sex. The meta-analysis of published data did not show an overall association between any of the tested LEPR variants and overweight. However, the choice of a BMI cut-off value to distinguish cases from controls was crucial to explain heterogeneity in Q223R. Differences in allele frequencies across ethnic groups are compatible with natural selection of derived alleles in Q223R and K109R and of the ancient allele in K656N in Asians. In CoLaus, the rs10128072, rs3790438 and rs3790437 variants showed interaction with sex for their association with overweight, waist circumference and fat mass in linear regressions. CONCLUSIONS: Our systematic review and analysis of primary data from the CoLaus study did not show an overall association between LEPR SNPs and overweight. Most studies were underpowered to detect small effect sizes. A potential effect modification by sex, population stratification, as well as the role of natural selection should be addressed in future genetic association studies.

    Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants.

    Most genome-wide methylation studies (EWAS) of multifactorial disease traits use targeted arrays or enrichment methodologies preferentially covering CpG-dense regions, to characterize sufficiently large samples. To overcome this limitation, we present here a new customizable, cost-effective approach, methylC-capture sequencing (MCC-Seq), for sequencing functional methylomes, while simultaneously providing genetic variation information. To illustrate MCC-Seq, we use whole-genome bisulfite sequencing on adipose tissue (AT) samples and public databases to design AT-specific panels. We establish its efficiency for high-density interrogation of methylome variability by systematic comparisons with other approaches and demonstrate its applicability by identifying novel methylation variation within enhancers strongly correlated to plasma triglyceride and HDL-cholesterol, including at CD36. Our more comprehensive AT panel assesses tissue methylation and genotypes in parallel at ∼4 and ∼3 M sites, respectively. Our study demonstrates that MCC-Seq provides comparable accuracy to alternative approaches but enables more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS. This work was supported by a Canadian Institute of Health Research (CIHR) team grant awarded to E.G., A.T., M.C.V. and M.L. (TEC-128093) and the CIHR funded Epigenome Mapping Centre at McGill University (EP1-120608) awarded to T.P. and M.L., and the Swedish Research Council, Knut and Alice Wallenberg Foundation and the Torsten Söderberg Foundation awarded to L.R. F.A. holds a studentship from The Research Institute of the McGill University Health Center (MUHC). F.G. is a recipient of a research fellowship award from the Heart and Stroke Foundation of Canada. A.T. is the director of a Research Chair in Bariatric and Metabolic Surgery. M.C.V. is the recipient of the Canada Research Chair in Genomics Applied to Nutrition and Health (Tier 1). E.G. and T.P. are recipients of a Canada Research Chair Tier 2 award. The MuTHER Study was funded by a programme grant from the Wellcome Trust (081917/Z/07/Z) and core funding for the Wellcome Trust Centre for Human Genetics (090532). TwinsUK was funded by the Wellcome Trust; European Community's Seventh Framework Programme (FP7/2007-2013). The study also receives support from the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London. T.D.S. is a holder of an ERC Advanced Principal Investigator award. SNP genotyping was performed by The Wellcome Trust Sanger Institute and National Eye Institute via NIH/CIDR. Finally, we thank the NIH Roadmap Epigenomics Consortium and the Mapping Centers (http://nihroadmap.nih.gov/epigenomics/) for the production of publicly available reference epigenomes. Specifically, we thank the mapping centre at MGH/BROAD for generation of human adipose reference epigenomes used in this study. This is the final version. It was first published by NPG at http://www.nature.com/ncomms/2015/150529/ncomms8211/full/ncomms8211.html#abstrac