188 research outputs found

    Using Functional Annotation for the Empirical Determination of Bayes Factors for Genome-Wide Association Study Analysis

    A genome-wide association study (GWAS) typically results in a few highly significant ‘hits’ and a much larger set of suggestive signals (‘near-hits’). The latter group is expected to be a mixture of true and false associations. One promising strategy to help separate these is to use functional annotations to prioritise variants for follow-up. A key task is to determine which annotations might prove most valuable. We address this question by examining the functional annotations of previously published GWAS hits. We explore three annotation categories: non-synonymous SNPs (nsSNPs), promoter SNPs and cis expression quantitative trait loci (eQTLs) in open chromatin regions. We demonstrate that GWAS hit SNPs are enriched for these three functional categories, and that it would be appropriate to give such SNPs a higher weighting when performing Bayesian association analyses. For GWAS, our analyses suggest the use of a Bayes Factor of about 4 for cis eQTL SNPs within regions of open chromatin, 3 for nsSNPs and 2 for promoter SNPs.
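As a rough illustration of how the annotation weights quoted above could enter a Bayesian association analysis, the sketch below multiplies prior odds by an association Bayes factor and by the functional Bayes factor for the SNP's category. It is a minimal sketch, not the authors' method: the function name, the prior probability and the association Bayes factor are hypothetical, and multiplying the two factors assumes the functional and association evidence are independent given causal status.

```python
# Functional Bayes factors as quoted in the abstract (approximate values).
FUNCTIONAL_BF = {
    "cis_eqtl_open_chromatin": 4.0,
    "nssnp": 3.0,
    "promoter": 2.0,
}


def posterior_probability(prior_prob, association_bf, annotation=None):
    """Return the posterior probability of association for one SNP.

    prior_prob     -- prior probability of association (hypothetical input)
    association_bf -- Bayes factor from the association data (hypothetical input)
    annotation     -- key of FUNCTIONAL_BF, or None for an unannotated SNP
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    prior_odds = prior_prob / (1.0 - prior_prob)
    # Unannotated SNPs get a neutral weight of 1.
    functional_bf = FUNCTIONAL_BF.get(annotation, 1.0)
    # Assumption: the two sources of evidence combine multiplicatively.
    posterior_odds = prior_odds * association_bf * functional_bf
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical near-hit: prior 1 in 10,000 and an association Bayes factor of 1000.
    for category in (None, "promoter", "nssnp", "cis_eqtl_open_chromatin"):
        print(category, round(posterior_probability(1e-4, 1000.0, category), 4))
```

Under these assumptions the annotation only rescales the odds, so it matters most for near-hits whose association evidence alone is borderline.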

    A Bayesian method to incorporate hundreds of functional characteristics with association evidence to improve variant prioritization

    The increasing quantity and quality of functional genomic information motivate the assessment and integration of these data with association data, including data originating from genome-wide association studies (GWAS). We used previously described GWAS signals ("hits") to train a regularized logistic model in order to predict SNP causality on the basis of a large multivariate functional dataset. We show how this model can be used to derive Bayes factors for integrating functional and association data into a combined Bayesian analysis. Functional characteristics were obtained from the Encyclopedia of DNA Elements (ENCODE), from published expression quantitative trait loci (eQTL), and from other sources of genome-wide characteristics. We trained the model using all GWAS signals combined, and also using phenotype-specific signals for autoimmune, brain-related, cancer, and cardiovascular disorders. The non-phenotype-specific and the autoimmune GWAS signals gave the most reliable results. We found that SNPs with higher probabilities of causality from functional characteristics showed an enrichment of more significant p-values compared to all GWAS SNPs in three large GWAS of complex traits. We investigated the ability of our Bayesian method to improve the identification of true causal signals in a psoriasis GWAS dataset and found that combining functional data with association data improves the ability to prioritise novel hits. We used the predictions from the penalized logistic regression model to calculate Bayes factors relating to functional characteristics and supply these online alongside resources to integrate these data with association data.

    Assessing models for genetic prediction of complex traits: a comparison of visualization and quantitative methods

    BACKGROUND: In silico models have recently been created in order to predict which genetic variants are more likely to contribute to the risk of a complex trait given their functional characteristics. However, there has been no comprehensive review as to which type of predictive accuracy measures and data visualization techniques are most useful for assessing these models. METHODS: We assessed the performance of the models for predicting risk using various methodologies, some of which include: receiver operating characteristic (ROC) curves, histograms of classification probability, and the novel use of the quantile-quantile plot. These measures have variable interpretability depending on factors such as whether the dataset is balanced in terms of numbers of genetic variants classified as risk variants versus those that are not. RESULTS: We conclude that the area under the curve (AUC) is a suitable starting place, and for models with similar AUCs, violin plots are particularly useful for examining the distribution of the risk scores
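For readers unfamiliar with the AUC mentioned above, the following minimal sketch computes it from its pairwise definition: the probability that a randomly chosen risk variant receives a higher score than a randomly chosen non-risk variant, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from the paper.

```python
def auc(risk_scores, non_risk_scores):
    """Area under the ROC curve from its pairwise (Mann-Whitney) definition.

    risk_scores     -- model scores for variants labelled as risk variants
    non_risk_scores -- model scores for variants labelled as non-risk
    """
    if not risk_scores or not non_risk_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for risk in risk_scores:
        for non_risk in non_risk_scores:
            if risk > non_risk:
                wins += 1.0   # correctly ordered pair
            elif risk == non_risk:
                wins += 0.5   # tie counts as half
    return wins / (len(risk_scores) * len(non_risk_scores))


if __name__ == "__main__":
    # Hypothetical scores for a small, unbalanced set of variants.
    print(auc([0.8, 0.4], [0.6, 0.2, 0.1, 0.3]))
```

The quadratic loop is kept for readability; it is adequate for small examples, while larger datasets would normally use a rank-based computation.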

    Little genetic differentiation as assessed by uniparental markers in the presence of substantial language variation in peoples of the Cross River region of Nigeria

    Background: The Cross River region in Nigeria is an extremely diverse area linguistically, with over 60 distinct languages still spoken today. It is also a region of great historical importance, being a) adjacent to the likely homeland from which Bantu-speaking people migrated across most of sub-Saharan Africa 3000-5000 years ago and b) the location of Calabar, one of the largest centres during the Atlantic slave trade. Over 1000 DNA samples from 24 clans representing speakers of the six most prominent languages in the region were collected and typed for Y-chromosome (SNPs and microsatellites) and mtDNA markers (Hypervariable Segment 1) in order to examine whether there has been substantial gene flow between groups speaking different languages in the region. In addition, the Cross River region was analysed in the context of a larger geographical scale by comparison to bordering Igbo-speaking groups as well as neighbouring Cameroon populations and more distant Ghanaian communities. Results: The Cross River region was shown to be extremely homogeneous for both Y-chromosome and mtDNA markers, with language spoken having no noticeable effect on the genetic structure of the region, consistent with estimates of inter-language gene flow of 10% per generation based on sociological data. However, the groups in the region could clearly be differentiated from others in Cameroon and Ghana (and to a lesser extent Igbo populations). Significant correlations between genetic distance and both geographic and linguistic distance were observed at this larger scale. Conclusions: Previous studies have found significant correlations between genetic variation and language in Africa over large geographic distances, often across language families. However, the broad sampling strategies of these datasets have limited their utility for understanding the relationship within language families. This is the first study to show that at very fine geographic/linguistic scales language differences can be maintained in the presence of substantial gene flow over an extended period of time, and it demonstrates the value of dense sampling strategies and of having DNA of known and detailed provenance, a practice that is generally rare when investigating sub-Saharan African demographic processes using genetic data.

    Analysis of subcellular RNA fractions demonstrates significant genetic regulation of gene expression in human brain post-transcriptionally

    Gaining insight into the genetic regulation of gene expression in human brain is key to the interpretation of genome-wide association studies for major neurological and neuropsychiatric diseases. Expression quantitative trait loci (eQTL) analyses have largely been used to achieve this, providing valuable insights into the genetic regulation of steady-state RNA in human brain, but not distinguishing between molecular processes regulating transcription and stability. RNA quantification within cellular fractions can disentangle these processes in cell types and tissues which are challenging to model in vitro. We investigated the underlying molecular processes driving the genetic regulation of gene expression specific to a cellular fraction using allele-specific expression (ASE). Applying ASE analysis to genomic and transcriptomic data from paired nuclear and cytoplasmic fractions of anterior prefrontal cortex, cerebellar cortex and putamen tissues from 4 post-mortem neuropathologically-confirmed control human brains, we demonstrate that a significant proportion of genetic regulation of gene expression occurs post-transcriptionally in the cytoplasm, with genes undergoing this form of regulation more likely to be synaptic. These findings have implications for understanding the structure of gene expression regulation in human brain, and importantly the interpretation of rapidly growing single-nucleus brain RNA-sequencing and eQTL datasets, where cytoplasm-specific regulatory events could be missed

    Genetic evidence for a pathogenic role for the vitamin D3 metabolizing enzyme CYP24A1 in multiple sclerosis

    Background: Multiple sclerosis (MS) is a common disease of the central nervous system and a major cause of disability amongst young adults. Genome-wide association studies have identified many novel susceptibility loci including rs2248359. We hypothesized that genotypes of this locus could increase the risk of MS by regulating expression of the neighboring gene CYP24A1, which encodes the enzyme responsible for initiating degradation of 1,25-dihydroxyvitamin D3. Methods: We investigated this hypothesis using paired gene expression and genotyping data from three independent datasets of neurologically healthy adults of European descent. The UK Brain Expression Consortium (UKBEC) consists of post-mortem samples across 10 brain regions originating from 134 individuals (1231 samples total). The North American Brain Expression Consortium (NABEC) consists of cerebellum and frontal cortex samples from 304 individuals (605 samples total). The brain dataset from Heinzen and colleagues consists of prefrontal cortex samples from 93 individuals. Additionally, we used gene network analysis to analyze UKBEC expression data to understand CYP24A1 function in human brain. Findings: The risk allele, rs2248359-C, is strongly associated with increased expression of CYP24A1 in frontal cortex (p-value=1.45×10^-13), but not white matter. This association was replicated using data from NABEC (p-value=7.2×10^-6) and Heinzen and colleagues (p-value=1.2×10^-4). Network analysis shows a significant enrichment of terms related to immune response in eight out of the 10 brain regions. Interpretation: The known MS risk allele rs2248359-C increases CYP24A1 expression in human brain, providing a genetic link between MS and vitamin D metabolism, and predicting that the physiologically active form of vitamin D3 is protective. Vitamin D3's involvement in MS may relate to its immunomodulatory functions in human brain. Funding: Medical Research Council UK; King Faisal Specialist Hospital and Research Centre, Saudi Arabia; Intramural Research Program of the National Institute on Aging, National Institutes of Health, USA.

    Dense sampling of ethnic groups within African countries reveals fine-scale genetic structure and extensive historical admixture

    Previous studies have highlighted how African genomes have been shaped by a complex series of historical events. Despite this, genome-wide data have only been obtained from a small proportion of present-day ethnolinguistic groups. By analyzing new autosomal genetic variation data of 1333 individuals from over 150 ethnic groups from Cameroon, Republic of the Congo, Ghana, Nigeria, and Sudan, we demonstrate a previously underappreciated fine-scale level of genetic structure within these countries, for example, correlating with historical polities in western Cameroon. By comparing genetic variation patterns among populations, we infer that many northern Cameroonian and Sudanese groups share genetic links with multiple geographically disparate populations, likely resulting from long-distance migrations. In Ghana and Nigeria, we infer signatures of intermixing dated to over 2000 years ago, corresponding to reports of environmental transformations possibly related to climate change. We also infer recent intermixing signals in multiple African populations, including Congolese, that likely relate to the expansions of Bantu language-speaking peoples