Lancaster E-Prints

A Bayesian method to incorporate hundreds of functional characteristics with association evidence to improve variant prioritization

Author: Barnes Michael R.
Gagliano Sarah A.
Knight Jo
Weale Michael E.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014
Field of study

The increasing quantity and quality of functional genomic information motivate the assessment and integration of these data with association data, including data originating from genome-wide association studies (GWAS). We used previously described GWAS signals ("hits") to train a regularized logistic model in order to predict SNP causality on the basis of a large multivariate functional dataset. We show how this model can be used to derive Bayes factors for integrating functional and association data into a combined Bayesian analysis. Functional characteristics were obtained from the Encyclopedia of DNA Elements (ENCODE), from published expression quantitative trait loci (eQTL), and from other sources of genome-wide characteristics. We trained the model using all GWAS signals combined, and also using phenotype-specific signals for autoimmune, brain-related, cancer, and cardiovascular disorders. The non-phenotype-specific and the autoimmune GWAS signals gave the most reliable results. We found that SNPs with higher probabilities of causality from functional characteristics showed an enrichment of more significant p-values compared to all GWAS SNPs in three large GWAS studies of complex traits. We investigated the ability of our Bayesian method to improve the identification of true causal signals in a psoriasis GWAS dataset and found that combining functional data with association data improves the ability to prioritise novel hits. We used the predictions from the penalized logistic regression model to calculate Bayes factors relating to functional characteristics and supply these online alongside resources to integrate these data with association data.
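The arithmetic behind combining a functional Bayes factor with an association Bayes factor can be sketched as follows. This is a generic illustration of Bayes' rule in odds form, not the authors' published procedure: the function names, the use of a baseline probability to turn a model prediction into a Bayes factor, and the assumption that the two sources of evidence are conditionally independent are all assumptions made for this example.

```python
def odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def functional_bayes_factor(p_model: float, p_baseline: float) -> float:
    """Hypothetical helper: ratio of the model-predicted odds of causality
    to the baseline odds (for example, the proportion of hits in training)."""
    return odds(p_model) / odds(p_baseline)


def combined_posterior(prior: float, bf_functional: float, bf_association: float) -> float:
    """Posterior probability of causality from prior odds multiplied by both
    Bayes factors, assuming the two kinds of evidence are independent."""
    posterior_odds = odds(prior) * bf_functional * bf_association
    return posterior_odds / (1.0 + posterior_odds)


# Illustrative numbers only (not taken from the paper).
bf_f = functional_bayes_factor(p_model=0.2, p_baseline=0.1)
posterior = combined_posterior(prior=0.5, bf_functional=2.0, bf_association=1.5)
```

With these made-up inputs, a SNP whose predicted probability is twice the baseline gets a functional Bayes factor somewhat above 2, and the two Bayes factors simply multiply on the odds scale.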

Directory of Open Access Journals

Lancaster E-Prints

FigShare

Assessing models for genetic prediction of complex traits: a comparison of visualization and quantitative methods

Author: Gagliano Sarah A.
Knight Jo
Paterson Andrew D.
Weale Michael E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/05/2015

BACKGROUND: In silico models have recently been created in order to predict which genetic variants are more likely to contribute to the risk of a complex trait given their functional characteristics. However, there has been no comprehensive review as to which types of predictive accuracy measures and data visualization techniques are most useful for assessing these models. METHODS: We assessed the performance of the models for predicting risk using various methodologies, including receiver operating characteristic (ROC) curves, histograms of classification probability, and the novel use of the quantile-quantile plot. These measures have variable interpretability depending on factors such as whether the dataset is balanced in terms of numbers of genetic variants classified as risk variants versus those that are not. RESULTS: We conclude that the area under the curve (AUC) is a suitable starting place, and for models with similar AUCs, violin plots are particularly useful for examining the distribution of the risk scores.
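For readers unfamiliar with the AUC, the following minimal sketch computes it directly from its pairwise definition: the probability that a randomly chosen risk variant scores higher than a randomly chosen non-risk variant, with ties counted as one half. The function name and the toy scores are hypothetical, and the quadratic-time loop is chosen for clarity rather than efficiency.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk scores, one per variant.
    labels: 1 for a risk variant, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one variant in each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
example_auc = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

Because the AUC depends only on how positives rank against negatives, it is unaffected by class imbalance in a way that raw accuracy is not, which is one reason it serves as a starting point before inspecting score distributions.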

Springer - Publisher Connector

Lancaster E-Prints

Quality control parameters on a large dataset of regionally dissected human control brains for whole genome expression studies

Author: Adaikalavan Ramasamy
Colin Smith
Daniah Trabzuni
John Hardy
Michael Weale
Mina Ryten
Robert Walker
Sabaena Imran
Publication venue: 'Wiley'
Publication date: 01/01/2011

We are building an open-access database of regional human brain expression designed to allow the genome-wide assessment of the effect of genetic variability on expression. Array and RNA sequencing technologies make assessment of genome-wide expression possible. Human brain tissue is a challenging source for this work because it can only be obtained after a post-mortem delay of several, variable hours and after varying agonal states. These variables alter RNA integrity in a complex manner. In this report, we assess the effect of post-mortem delay, agonal state and age on gene expression, and the utility of pH and RNA integrity number as predictors of gene expression as measured on 1266 Affymetrix Exon Arrays. We assessed the accuracy of the array data using QuantiGene as an independent non-PCR-based method. These quality control parameters will allow database users to assess data accuracy. We report that, within the parameters of this study, post-mortem delay, agonal state and age have little impact on array quality, array data are robust to variable RNA integrity, and brain pH has only a small effect on array performance. QuantiGene gave expression profiles very similar to those of the array data. This study is the first step in our initiative to make human regional brain expression data freely available.

Edinburgh Research Explorer

Delta-Centralization Fails to Control for Population Stratification in Genetic Association Studies

Author: Cathryn M. Lewis
Michael E. Weale
Tony Dadd
Publication venue: 'S. Karger AG'
Publication date

Little genetic differentiation as assessed by uniparental markers in the presence of substantial language variation in peoples of the Cross River region of Nigeria

Author: Bradman Neil
Connell Bruce A
Mendell Nancy R
Plaster Christopher A
Pour Naser Ansari
Powell Adam
Thomas Mark G
Veeramah Krishna R
Weale Michael E
Zeitlyn David
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The Cross River region in Nigeria is an extremely diverse area linguistically, with over 60 distinct languages still spoken today. It is also a region of great historical importance, being (a) adjacent to the likely homeland from which Bantu-speaking people migrated across most of sub-Saharan Africa 3000-5000 years ago and (b) the location of Calabar, one of the largest centres during the Atlantic slave trade. Over 1000 DNA samples from 24 clans representing speakers of the six most prominent languages in the region were collected and typed for Y-chromosome (SNPs and microsatellites) and mtDNA markers (Hypervariable Segment 1) in order to examine whether there has been substantial gene flow between groups speaking different languages in the region. In addition, the Cross River region was analysed in the context of a larger geographical scale by comparison to bordering Igbo-speaking groups as well as neighbouring Cameroon populations and more distant Ghanaian communities. Results: The Cross River region was shown to be extremely homogeneous for both Y-chromosome and mtDNA markers, with language spoken having no noticeable effect on the genetic structure of the region, consistent with estimates of inter-language gene flow of 10% per generation based on sociological data. However, the groups in the region could clearly be differentiated from others in Cameroon and Ghana (and to a lesser extent Igbo populations). Significant correlations between genetic distance and both geographic and linguistic distance were observed at this larger scale. Conclusions: Previous studies have found significant correlations between genetic variation and language in Africa over large geographic distances, often across language families. However, the broad sampling strategies of these datasets have limited their utility for understanding the relationship within language families. This is the first study to show that at very fine geographic/linguistic scales language differences can be maintained in the presence of substantial gene flow over an extended period of time. It also demonstrates the value of dense sampling strategies and of having DNA of known and detailed provenance, a practice that is generally rare when investigating sub-Saharan African demographic processes using genetic data.

Springer - Publisher Connector

Directory of Open Access Journals

eScholarship - University of California

Oxford University Research Archive

MPG.PuRe

Analysis of subcellular RNA fractions demonstrates significant genetic regulation of gene expression in human brain post-transcriptionally

Author: Botía Juan A
D'Sa Karishma
Guelfi Sebastian
Hardy John
Reynolds Regina H
Ryten Mina
Small Kerrin S
Taliun Sarah A Gagliano
Vandrovcova Jana
Weale Michael E
Zhang David
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/08/2023

Gaining insight into the genetic regulation of gene expression in human brain is key to the interpretation of genome-wide association studies for major neurological and neuropsychiatric diseases. Expression quantitative trait loci (eQTL) analyses have largely been used to achieve this, providing valuable insights into the genetic regulation of steady-state RNA in human brain, but not distinguishing between molecular processes regulating transcription and stability. RNA quantification within cellular fractions can disentangle these processes in cell types and tissues which are challenging to model in vitro. We investigated the underlying molecular processes driving the genetic regulation of gene expression specific to a cellular fraction using allele-specific expression (ASE). Applying ASE analysis to genomic and transcriptomic data from paired nuclear and cytoplasmic fractions of anterior prefrontal cortex, cerebellar cortex and putamen tissues from 4 post-mortem neuropathologically-confirmed control human brains, we demonstrate that a significant proportion of genetic regulation of gene expression occurs post-transcriptionally in the cytoplasm, with genes undergoing this form of regulation more likely to be synaptic. These findings have implications for understanding the structure of gene expression regulation in human brain, and importantly the interpretation of rapidly growing single-nucleus brain RNA-sequencing and eQTL datasets, where cytoplasm-specific regulatory events could be missed.

Central Archive at the University of Reading

Investigating the utility of human embryonic stem cell-derived neurons to model ageing and neurodegenerative disease using whole-genome gene expression and splicing analysis

Author: Chandran Siddharthan
Hardingham Giles E.
Hardy John
Lewis Patrick A
Patani Rickie
Puddifoot Clare A.
Ryten Mina
Smith Colin
Trabzuni Daniah
Walker Robert
Weale Michael
Wyllie David J. A.
Publication venue: 'Wiley'
Publication date: 01/01/2012

A major goal in regenerative medicine is the predictable manipulation of human embryonic stem cells (hESCs) to defined cell fates that faithfully represent their somatic counterparts. Directed differentiation of hESCs into neuronal populations has galvanized much interest into their potential application in modelling neurodegenerative disease. However, neurodegenerative diseases are age-related, and therefore establishing the maturational comparability of hESC-derived neural derivatives is critical to generating accurate in vitro model systems. We address this issue by comparing genome-wide, exon-specific expression analyses of pluripotent hESCs, multipotent neural precursor cells and a terminally differentiated enriched neuronal population to expression data from post-mortem foetal and adult human brain samples. We show that hESC-derived neuronal cultures (using a midbrain differentiation protocol as a prototypic example of lineage restriction), while successful in generating physiologically functional neurons, are closer to foetal than adult human brain in terms of molecular maturation. These findings suggest that developmental stage has a more dominant influence on the cellular transcriptome than regional identity. In addition, we demonstrate that developmentally regulated gene splicing is common, and potentially a more sensitive measure of maturational state than gene expression profiling alone. In summary, this study highlights the value of genomic indices in refining and validating optimal cell populations appropriate for modelling ageing and neurodegeneration.

Edinburgh Research Explorer

Integrated polygenic tool substantially enhances coronary artery disease prediction

Author: Ashley Euan A
Deanfield John
Donnelly Peter
Griffiths Jonathan A
Hippisley-Cox Julia
Hunter David J
Krapohl Eva
Lachapelle Alexander S
Moore Rachel
O'Sullivan Jack W
Plagnol Vincent
Riveros-Mckay Fernando
Saffari Ayden
Selzam Saskia
Sivley R Michael
Spencer Chris CA
Sørensen Peter
Tarran William A
Weale Michael E
Publication venue: American Heart Association
Publication date: 02/03/2021

Background: There is considerable interest in whether genetic data can be used to improve standard cardiovascular disease risk calculators, as the latter are routinely used in clinical practice to manage preventative treatment. Methods: Using the UK Biobank resource, we developed our own polygenic risk score for coronary artery disease (CAD). We used an additional 60 000 UK Biobank individuals to develop an integrated risk tool (IRT) that combined our polygenic risk score with established risk tools (either the American Heart Association/American College of Cardiology pooled cohort equations [PCE] or UK QRISK3), and we tested our IRT in an additional, independent set of 186 451 UK Biobank individuals. Results: The novel CAD polygenic risk score shows superior predictive power for CAD events, compared with other published polygenic risk scores, and is largely uncorrelated with PCE and QRISK3. When combined with PCE into an IRT, it has superior predictive accuracy. Overall, 10.4% of incident CAD cases were misclassified as low risk by PCE and correctly classified as high risk by the IRT, compared with 4.4% misclassified by the IRT and correctly classified by PCE. The overall net reclassification improvement for the IRT was 5.9% (95% CI, 4.7–7.0). When individuals were stratified into age-by-sex subgroups, the improvement was larger for all subgroups (range, 8.3%–15.4%), with the best performance in 40- to 54-year-old men (15.4% [95% CI, 11.6–19.3]). Comparable results were found using a different risk tool (QRISK3) and also a broader definition of cardiovascular disease. Use of the IRT is estimated to avoid up to 12 000 deaths in the United States over a 5-year period. Conclusions: An IRT that includes polygenic risk outperforms current risk stratification tools and offers greater opportunity for early interventions. Given the plummeting costs of genetic tests, future iterations of CAD risk tools would be enhanced with the addition of a person’s polygenic risk.
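As an aid to reading the reclassification figures, here is a minimal sketch of the standard two-category net reclassification improvement (NRI): the net proportion of cases moved up to high risk plus the net proportion of non-cases moved down to low risk. Whether the study used exactly this definition is an assumption, and the function name and toy data are hypothetical. Under this definition, the case component implied by the abstract's figures would be 10.4% minus 4.4%, that is 6.0 percentage points.

```python
def net_reclassification_improvement(old_high, new_high, event):
    """Two-category NRI.

    old_high, new_high: booleans, True if the old/new tool labels a person high risk.
    event: booleans, True if the person went on to have the event.
    """
    up_cases = down_cases = n_cases = 0
    up_controls = down_controls = n_controls = 0

    for old, new, is_case in zip(old_high, new_high, event):
        moved_up = new and not old
        moved_down = old and not new
        if is_case:
            n_cases += 1
            up_cases += 1 if moved_up else 0
            down_cases += 1 if moved_down else 0
        else:
            n_controls += 1
            up_controls += 1 if moved_up else 0
            down_controls += 1 if moved_down else 0

    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one non-case")

    case_part = (up_cases - down_cases) / n_cases            # upward moves are correct
    control_part = (down_controls - up_controls) / n_controls  # downward moves are correct
    return case_part + control_part


# Toy data (not from the study): four cases followed by four non-cases.
old = [False, False, True, True, True, True, False, False]
new = [True, True, True, False, False, True, False, True]
evt = [True, True, True, True, False, False, False, False]
toy_nri = net_reclassification_improvement(old, new, evt)
```

In the toy data, two cases move up and one moves down, while one non-case moves down and one moves up, so the case component is 0.25 and the non-case component is 0.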