Using Functional Annotation for the Empirical Determination of Bayes Factors for Genome-Wide Association Study Analysis
A genome-wide association study (GWAS) typically results in a few highly significant ‘hits’ and a much larger set of suggestive signals (‘near-hits’). The latter group is expected to be a mixture of true and false associations. One promising strategy to help separate these is to use functional annotations to prioritise variants for follow-up. A key task is to determine which annotations might prove most valuable. We address this question by examining the functional annotations of previously published GWAS hits. We explore three annotation categories: non-synonymous SNPs (nsSNPs), promoter SNPs and cis expression quantitative trait loci (eQTLs) in open chromatin regions. We demonstrate that GWAS hit SNPs are enriched for these three functional categories, and that it would be appropriate to give such SNPs a higher weighting when performing Bayesian association analyses. For GWAS, our analyses suggest the use of a Bayes factor of about 4 for cis eQTL SNPs within regions of open chromatin, 3 for nsSNPs and 2 for promoter SNPs.
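The abstract does not spell out how these Bayes factors enter an analysis. A minimal sketch follows. It assumes that functional and association evidence are independent, so that their Bayes factors multiply. The association Bayes factor uses Wakefield's approximate Bayes factor with an assumed prior effect variance of 0.04. The annotation labels, function names and default prior odds are hypothetical illustrations, not values from the study.

```python
import math

# Approximate functional Bayes factors suggested in the abstract.
# The dictionary keys are hypothetical labels chosen for this sketch.
FUNCTIONAL_BF = {
    "cis_eqtl_open_chromatin": 4.0,
    "nssnp": 3.0,
    "promoter": 2.0,
}


def association_bf(beta: float, se: float, prior_var: float = 0.04) -> float:
    """Wakefield-style approximate Bayes factor in favour of association.

    beta, se: estimated log odds ratio and its standard error.
    prior_var: assumed prior variance W of the true effect (0.04 ~ prior SD 0.2).
    """
    v = se ** 2
    z2 = (beta / se) ** 2
    shrink = prior_var / (v + prior_var)
    return math.sqrt(v / (v + prior_var)) * math.exp(0.5 * z2 * shrink)


def posterior_odds(beta: float, se: float, annotation=None,
                   prior_odds: float = 1e-4) -> float:
    """Prior odds x functional BF x association BF (assumes independence).

    SNPs without one of the listed annotations get a functional BF of 1.
    """
    f_bf = FUNCTIONAL_BF.get(annotation, 1.0)
    return prior_odds * f_bf * association_bf(beta, se)


# Two SNPs with identical association evidence: the nsSNP's posterior odds
# are three times those of the unannotated SNP.
print(posterior_odds(0.1, 0.02, "nssnp") / posterior_odds(0.1, 0.02))
```

With identical summary statistics, annotation simply rescales the posterior odds by the functional Bayes factor. This is why a modest factor such as 2 to 4 can reorder near-hits without overriding strong association evidence.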
A Bayesian method to incorporate hundreds of functional characteristics with association evidence to improve variant prioritization
The increasing quantity and quality of functional genomic information motivate the assessment and integration of these data with association data, including data originating from genome-wide association studies (GWAS). We used previously described GWAS signals ("hits") to train a regularized logistic model to predict SNP causality on the basis of a large multivariate functional dataset. We show how this model can be used to derive Bayes factors for integrating functional and association data into a combined Bayesian analysis. Functional characteristics were obtained from the Encyclopedia of DNA Elements (ENCODE), from published expression quantitative trait loci (eQTL), and from other sources of genome-wide characteristics. We trained the model using all GWAS signals combined, and also using phenotype-specific signals for autoimmune, brain-related, cancer, and cardiovascular disorders. The non-phenotype-specific and the autoimmune GWAS signals gave the most reliable results. We found that SNPs with higher probabilities of causality based on functional characteristics showed an enrichment of more significant p-values compared with all GWAS SNPs in three large GWAS of complex traits. We investigated the ability of our Bayesian method to improve the identification of true causal signals in a psoriasis GWAS dataset and found that combining functional data with association data improves the ability to prioritise novel hits. We used the predictions from the penalized logistic regression model to calculate Bayes factors relating to functional characteristics, and we supply these online alongside resources to integrate these data with association data.
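The abstract does not give the exact conversion from model predictions to Bayes factors. One standard approach, sketched here as an assumption, divides the odds predicted by the model by the odds implied by the training base rate. The fitting routine is a hypothetical ridge (L2-penalised) logistic regression solved by plain gradient descent in NumPy. It is not the authors' software, and the toy data are simulated.

```python
import numpy as np


def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, n_iter=5000):
    """Minimise mean log-loss + lam/(2n) * ||w||^2 by gradient descent.

    The intercept b is not penalised. Hypothetical helper for illustration.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(causal)
        w -= lr * (X.T @ (p - y) / n + lam * w / n)
        b -= lr * np.mean(p - y)
    return w, b


def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def functional_bayes_factor(p, base_rate):
    """Model posterior odds divided by prior odds from the training base rate."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return (p / (1 - p)) / (base_rate / (1 - base_rate))


# Simulated example: 500 SNPs, 5 binary annotations, only annotation 0 matters.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 5)).astype(float)
true_logit = -2.0 + 2.0 * X[:, 0]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

w, b = fit_ridge_logistic(X, y)
bf = functional_bayes_factor(predict_proba(X, w, b), base_rate=y.mean())
print(bf[X[:, 0] == 1].mean(), bf[X[:, 0] == 0].mean())
```

Dividing by the base-rate odds removes the prior built into the training set's case/control balance. The resulting factor can then be combined with a separate association Bayes factor.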
Assessing models for genetic prediction of complex traits: a comparison of visualization and quantitative methods
BACKGROUND: In silico models have recently been created to predict which genetic variants are more likely to contribute to the risk of a complex trait given their functional characteristics. However, there has been no comprehensive review of which types of predictive accuracy measures and data visualization techniques are most useful for assessing these models. METHODS: We assessed the performance of the models for predicting risk using various methodologies, including receiver operating characteristic (ROC) curves, histograms of classification probability, and the novel use of the quantile-quantile plot. These measures vary in interpretability depending on factors such as whether the dataset is balanced in terms of the numbers of genetic variants classified as risk variants versus those that are not. RESULTS: We conclude that the area under the curve (AUC) is a suitable starting place, and for models with similar AUCs, violin plots are particularly useful for examining the distribution of the risk scores.
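AUC can be computed without drawing the ROC curve. It equals the probability that a randomly chosen risk variant scores higher than a randomly chosen non-risk variant, with ties counted as one half. The sketch below uses this rank-based (Mann-Whitney) definition. The function name is hypothetical, and the quadratic loop is meant for clarity rather than speed.

```python
def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 for risk variants, 0 otherwise. Ties contribute 1/2.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 3 of 4 pairs ordered correctly
```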
Quality control parameters on a large dataset of regionally dissected human control brains for whole genome expression studies
We are building an open-access database of regional human brain expression designed to allow genome-wide assessment of the effect of genetic variability on expression. Array and RNA sequencing technologies make assessment of genome-wide expression possible. Human brain tissue is a challenging source for this work because it can only be obtained post-mortem, after a delay of several hours that varies between donors, and after varying agonal states. These variables alter RNA integrity in a complex manner. In this report, we assess the effect of post-mortem delay, agonal state and age on gene expression, and the utility of pH and RNA integrity number as predictors of gene expression as measured on 1266 Affymetrix Exon Arrays. We assessed the accuracy of the array data using QuantiGene, an independent non-PCR-based method. These quality control parameters will allow database users to assess data accuracy. We report that, within the parameters of this study, post-mortem delay, agonal state and age have little impact on array quality, array data are robust to variable RNA integrity, and brain pH has only a small effect on array performance. QuantiGene gave expression profiles very similar to those from the array data. This study is the first step in our initiative to make human regional brain expression data freely available.
Little genetic differentiation as assessed by uniparental markers in the presence of substantial language variation in peoples of the Cross River region of Nigeria
Background: The Cross River region in Nigeria is linguistically extremely diverse, with over 60 distinct languages still spoken today. It is also a region of great historical importance, being (a) adjacent to the likely homeland from which Bantu-speaking people migrated across most of sub-Saharan Africa 3000–5000 years ago and (b) the location of Calabar, one of the largest centres of the Atlantic slave trade. Over 1000 DNA samples from 24 clans representing speakers of the six most prominent languages in the region were collected and typed for Y-chromosome (SNPs and microsatellites) and mtDNA (Hypervariable Segment 1) markers to examine whether there has been substantial gene flow between groups speaking different languages in the region. In addition, the Cross River region was analysed on a larger geographical scale by comparison with bordering Igbo-speaking groups, neighbouring Cameroonian populations and more distant Ghanaian communities.
Results: The Cross River region was shown to be extremely homogeneous for both Y-chromosome and mtDNA markers, with language spoken having no noticeable effect on the genetic structure of the region, consistent with sociological estimates of inter-language gene flow of 10% per generation. However, the groups in the region could clearly be differentiated from others in Cameroon and Ghana (and, to a lesser extent, from Igbo populations). Significant correlations between genetic distance and both geographic and linguistic distance were observed at this larger scale.
Conclusions: Previous studies have found significant correlations between genetic variation and language in Africa over large geographic distances, often across language families. However, the broad sampling strategies of these datasets have limited their utility for understanding the relationship within language families. This is the first study to show that, at very fine geographic and linguistic scales, language differences can be maintained in the presence of substantial gene flow over an extended period of time. It demonstrates the value of dense sampling strategies and of DNA of known and detailed provenance, a practice that is generally rare when investigating sub-Saharan African demographic processes using genetic data.
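The abstract does not name the test used for the correlations between genetic, geographic and linguistic distances. A Mantel test is a common choice for correlating distance matrices, so a minimal sketch is given here as an assumption. The statistic is the Pearson correlation between the upper triangles of two matrices. A one-sided p-value is obtained by jointly permuting the rows and columns of one matrix. The function name and toy matrices are hypothetical.

```python
import numpy as np


def mantel_test(d1, d2, n_perm=999, seed=0):
    """One-sided Mantel test for a positive correlation between two
    symmetric distance matrices of the same populations."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)          # each population pair once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)         # relabel populations in d2
        r = np.corrcoef(x, d2[np.ix_(perm, perm)][iu])[0, 1]
        if r >= r_obs:
            count += 1
    p_value = (count + 1) / (n_perm + 1)  # include the observed labelling
    return r_obs, p_value


# Toy example: 8 populations on a line; "genetic" distance grows with geography.
coords = np.arange(8.0)
geo = np.abs(coords[:, None] - coords[None, :])
gen = 0.5 * geo + 0.1
np.fill_diagonal(gen, 0.0)
r, p = mantel_test(gen, geo)
print(r, p)
```

Permuting population labels, rather than individual matrix entries, preserves the dependence among pairwise distances that share a population. This is the reason for using a Mantel-type test instead of an ordinary correlation test.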
Analysis of subcellular RNA fractions demonstrates significant genetic regulation of gene expression in human brain post-transcriptionally
Gaining insight into the genetic regulation of gene expression in human brain is key to the interpretation of genome-wide association studies for major neurological and neuropsychiatric diseases. Expression quantitative trait loci (eQTL) analyses have largely been used to achieve this, providing valuable insights into the genetic regulation of steady-state RNA in human brain, but not distinguishing between molecular processes regulating transcription and stability. RNA quantification within cellular fractions can disentangle these processes in cell types and tissues that are challenging to model in vitro. We investigated the underlying molecular processes driving the genetic regulation of gene expression specific to a cellular fraction using allele-specific expression (ASE). Applying ASE analysis to genomic and transcriptomic data from paired nuclear and cytoplasmic fractions of anterior prefrontal cortex, cerebellar cortex and putamen tissues from four post-mortem, neuropathologically confirmed control human brains, we demonstrate that a significant proportion of genetic regulation of gene expression occurs post-transcriptionally in the cytoplasm, with genes undergoing this form of regulation more likely to be synaptic. These findings have implications for understanding the structure of gene expression regulation in human brain and, importantly, for the interpretation of rapidly growing single-nucleus brain RNA-sequencing and eQTL datasets, where cytoplasm-specific regulatory events could be missed.
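As a rough illustration of allele-specific expression in paired fractions, the sketch below applies an exact binomial test to read counts at a heterozygous SNP in each fraction. The null hypothesis is equal expression of the two alleles. The sketch flags imbalance that appears only in the cytoplasm. This criterion, the function names and the 0.05 threshold are hypothetical. The abstract does not describe the statistical model, and real ASE pipelines typically also handle overdispersion and reference-mapping bias.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k reference reads out of n against p = 0.5.

    Because the null is symmetric, the p-value is twice the smaller tail, capped at 1.
    """
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def cytoplasm_specific_ase(nuc_ref, nuc_alt, cyt_ref, cyt_alt, alpha=0.05):
    """Hypothetical criterion: allelic imbalance in the cytoplasmic fraction
    but not in the nuclear fraction at the same heterozygous SNP."""
    p_nuc = binom_two_sided_p(nuc_ref, nuc_ref + nuc_alt)
    p_cyt = binom_two_sided_p(cyt_ref, cyt_ref + cyt_alt)
    return p_cyt < alpha and p_nuc >= alpha


# Balanced nuclear reads, strongly imbalanced cytoplasmic reads.
print(cytoplasm_specific_ase(20, 20, 35, 5))
```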
Investigating the utility of human embryonic stem cell-derived neurons to model ageing and neurodegenerative disease using whole-genome gene expression and splicing analysis
A major goal in regenerative medicine is the predictable manipulation of human embryonic stem cells (hESCs) to defined cell fates that faithfully represent their somatic counterparts. Directed differentiation of hESCs into neuronal populations has generated considerable interest in their potential application in modelling neurodegenerative disease. However, neurodegenerative diseases are age-related, and therefore establishing the maturational comparability of hESC-derived neural derivatives is critical to generating accurate in vitro model systems. We address this issue by comparing genome-wide, exon-specific expression analyses of pluripotent hESCs, multipotent neural precursor cells and a terminally differentiated enriched neuronal population with expression data from post-mortem foetal and adult human brain samples. We show that hESC-derived neuronal cultures (using a midbrain differentiation protocol as a prototypic example of lineage restriction), while successful in generating physiologically functional neurons, are closer to foetal than to adult human brain in terms of molecular maturation. These findings suggest that developmental stage has a stronger influence on the cellular transcriptome than regional identity. In addition, we demonstrate that developmentally regulated gene splicing is common, and is potentially a more sensitive measure of maturational state than gene expression profiling alone. In summary, this study highlights the value of genomic indices in refining and validating optimal cell populations appropriate for modelling ageing and neurodegeneration.
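One simple way to ask whether a derived culture is "closer" to foetal or adult brain is to correlate its expression profile with each reference profile. The sketch below does this with Pearson correlation. It is an assumed illustration, not the study's genome-wide, exon-level analysis, and the profiles are made-up toy values.

```python
import numpy as np


def closest_reference(sample, references):
    """Return the reference whose expression profile has the highest Pearson
    correlation with the sample, together with all correlations.

    sample: expression values (e.g. log scale) over a shared set of exons.
    references: dict mapping a reference name to a profile over the same exons.
    """
    sample = np.asarray(sample, dtype=float)
    scores = {
        name: float(np.corrcoef(sample, np.asarray(profile, dtype=float))[0, 1])
        for name, profile in references.items()
    }
    return max(scores, key=scores.get), scores


# Toy log-expression values over six exons (hypothetical numbers).
refs = {
    "foetal": [5, 3, 8, 1, 6, 2],
    "adult": [1, 7, 2, 8, 3, 6],
}
best, scores = closest_reference([5.2, 3.1, 7.5, 1.4, 5.8, 2.3], refs)
print(best, scores)
```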
Integrated polygenic tool substantially enhances coronary artery disease prediction
Background: There is considerable interest in whether genetic data can be used to improve standard cardiovascular disease risk calculators, as the latter are routinely used in clinical practice to manage preventative treatment.
Methods: Using the UK Biobank resource, we developed our own polygenic risk score for coronary artery disease (CAD). We used an additional 60 000 UK Biobank individuals to develop an integrated risk tool (IRT) that combined our polygenic risk score with established risk tools (either the American Heart Association/American College of Cardiology pooled cohort equations [PCE] or UK QRISK3), and we tested our IRT in an additional, independent set of 186 451 UK Biobank individuals.
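A polygenic risk score is conventionally a weighted sum of effect-allele dosages. The abstract does not say how the variants or weights for this score were chosen, so the sketch below only shows the general form, with hypothetical weights and genotypes. The abstract also does not specify how the score was combined with PCE or QRISK3 into the IRT, so that step is not shown.

```python
import numpy as np


def polygenic_risk_score(dosages, weights):
    """PRS_i = sum_j w_j * g_ij.

    dosages: (individuals x variants) effect-allele dosages in [0, 2].
    weights: per-variant effect sizes (e.g. log odds ratios); hypothetical here.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights


prs = polygenic_risk_score([[0, 1, 2], [2, 2, 0]], [0.1, 0.2, -0.05])
print(prs)
```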
Results: The novel CAD polygenic risk score shows superior predictive power for CAD events, compared with other published polygenic risk scores, and is largely uncorrelated with PCE and QRISK3. When combined with PCE into an IRT, it has superior predictive accuracy. Overall, 10.4% of incident CAD cases were misclassified as low risk by PCE and correctly classified as high risk by the IRT, compared with 4.4% misclassified by the IRT and correctly classified by PCE. The overall net reclassification improvement for the IRT was 5.9% (95% CI, 4.7–7.0). When individuals were stratified into age-by-sex subgroups, the improvement was larger for all subgroups (range, 8.3%–15.4%), with the best performance in 40- to 54-year-old men (15.4% [95% CI, 11.6–19.3]). Comparable results were found using a different risk tool (QRISK3) and also a broader definition of cardiovascular disease. Use of the IRT is estimated to avoid up to 12 000 deaths in the United States over a 5-year period.
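Net reclassification improvement (NRI) sums, over people with and without the outcome, the net proportion whom the new model moves in the correct direction. A minimal two-category (low/high risk) sketch follows. The single category threshold and the function name are assumptions. If the 10.4% and 4.4% reported above are fractions of all incident cases under a single low/high split, the event component would be 10.4 - 4.4 = 6.0 percentage points. The abstract does not report the non-event component.

```python
def categorical_nri(old_high, new_high, events):
    """Two-category NRI = [P(up|event) - P(down|event)]
                        + [P(down|non-event) - P(up|non-event)].

    old_high, new_high: True if the old/new tool classifies the person as high risk.
    events: True if the person had the outcome (e.g. an incident CAD event).
    Returns (total NRI, event component, non-event component).
    """
    up_e = down_e = up_ne = down_ne = 0
    n_e = n_ne = 0
    for old, new, event in zip(old_high, new_high, events):
        up = int(bool(new) and not old)    # reclassified low -> high
        down = int(bool(old) and not new)  # reclassified high -> low
        if event:
            n_e += 1
            up_e += up
            down_e += down
        else:
            n_ne += 1
            up_ne += up
            down_ne += down
    if n_e == 0 or n_ne == 0:
        raise ValueError("need both events and non-events")
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_ne - up_ne) / n_ne
    return nri_events + nri_nonevents, nri_events, nri_nonevents


total, ev, non_ev = categorical_nri(
    old_high=[False, False, True, True, False, True],
    new_high=[True, False, True, False, False, False],
    events=[True, True, True, False, False, False],
)
print(total, ev, non_ev)
```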
Conclusions: An IRT that includes polygenic risk outperforms current risk stratification tools and offers greater opportunity for early interventions. Given the plummeting costs of genetic tests, future iterations of CAD risk tools would be enhanced by the addition of a person’s polygenic risk.