
    Accuracy of Gene Scores when Pruning Markers by Linkage Disequilibrium.

    OBJECTIVE: Gene scores are often used to model the combined effects of genetic variants. When variants are in linkage disequilibrium, it is common to prune all variants except the most strongly associated. This avoids duplicating information but discards information when variants have independent effects. However, joint modelling of correlated variants increases the sampling error in the gene score. In recent applications, joint modelling has offered only small improvements in accuracy over pruning. We aimed to quantify the relationship between pruning and joint modelling in relation to sample size. METHODS: We derived the coefficient of determination R2 for a gene score constructed from pruned markers, and for one constructed from correlated markers with jointly estimated effects. RESULTS: Pruned scores tend to have slightly lower R2 than jointly modelled scores, but the differences are small at sample sizes up to 100,000. If the proportion of correlated variants is high, joint modelling can obtain modest improvements asymptotically. CONCLUSIONS: The small gains observed to date from joint modelling can be explained by sample size. As studies become larger, joint modelling will be useful for traits affected by many correlated variants, but the improvements may remain small. Pruning remains a useful heuristic for current studies.
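
    To make the pruning/joint-modelling contrast concrete, the sketch below simulates two correlated markers (toy parameters assumed, not taken from the paper) and compares the out-of-sample R2 of a score built from the single most associated marker against one built from jointly estimated effects; the paper itself derives R2 analytically rather than by simulation.

```python
import numpy as np

# Minimal simulation (assumed toy parameters): two markers in LD with
# independent effects; compare a pruned score with a jointly modelled score.
rng = np.random.default_rng(0)
n_train, n_test = 5_000, 5_000
r = 0.6                                            # assumed LD between the markers
X = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n_train + n_test)
y = X @ np.array([0.2, 0.1]) + rng.normal(size=n_train + n_test)

X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

def marginal_effect(x, y):
    # One-marker-at-a-time (marginal) regression coefficient.
    xc, yc = x - x.mean(), y - y.mean()
    return xc @ yc / (xc @ xc)

marginal = np.array([marginal_effect(X_tr[:, j], y_tr) for j in range(2)])

# Pruned score: keep only the most strongly associated marker.
top = np.argmax(np.abs(marginal))
score_pruned = X_te[:, top] * marginal[top]

# Jointly modelled score: fit both correlated markers together.
beta_joint, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
score_joint = X_te @ beta_joint

r2 = lambda s, y: np.corrcoef(s, y)[0, 1] ** 2
print(f"R2 pruned: {r2(score_pruned, y_te):.4f}")
print(f"R2 joint:  {r2(score_joint, y_te):.4f}")
```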

    JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects.

    Recently, large-scale genome-wide association study (GWAS) meta-analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one at a time, which complicates fine-mapping efforts to identify a small set of SNPs for further functional follow-up. We describe a new and scalable algorithm, joint analysis of marginal summary statistics (JAM), for the re-analysis of published marginal summary statistics under joint multi-SNP models. The correlation is accounted for according to estimates from a reference dataset, and models and SNPs that best explain the complete joint pattern of marginal effects are highlighted via an integrated Bayesian penalized regression framework. We provide both enumerated and Reversible Jump MCMC implementations of JAM and present some comparisons of performance. In a series of realistic simulation studies, JAM demonstrated identical performance to various alternatives designed for single-region settings. In multi-region settings, where the only multivariate alternative involves stepwise selection, JAM offered greater power and specificity. We also present an application to real published results from MAGIC (the Meta-Analyses of Glucose and Insulin-related traits Consortium), a GWAS meta-analysis of more than 15,000 people. We re-analysed several genomic regions that produced multiple significant signals with glucose levels 2 hr after oral stimulation. Through joint multivariate modelling, JAM was able to formally rule out many SNPs and, for one gene, ADCY5, suggested that an additional SNP, which transpired to be more biologically plausible, should be followed up with equal priority to the reported index.
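
    The deterministic idea underneath summary-statistic re-analysis is the mapping from marginal to joint effects through a reference LD matrix. The sketch below shows only that transformation for standardized genotypes (the function name and ridge term are illustrative assumptions); JAM itself embeds this in a Bayesian penalized regression with enumerated or Reversible Jump MCMC model search, which is not reproduced here.

```python
import numpy as np

def marginal_to_joint(beta_marginal, ld_matrix, ridge=1e-6):
    """Convert marginal (one-SNP-at-a-time) effect estimates to joint
    multi-SNP estimates using an LD correlation matrix from a reference
    dataset, assuming standardized genotypes and phenotype.  The small
    ridge term guards against a near-singular LD matrix.

    This is only the deterministic transformation summary-statistic methods
    build on; the published JAM additionally places a Bayesian penalized-
    regression prior over models and explores them by enumeration or
    Reversible Jump MCMC.
    """
    beta_marginal = np.asarray(beta_marginal, dtype=float)
    R = np.asarray(ld_matrix, dtype=float) + ridge * np.eye(len(beta_marginal))
    return np.linalg.solve(R, beta_marginal)

# Toy example: two SNPs in LD (r = 0.8) where only the first is causal.
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])
beta_joint_true = np.array([0.3, 0.0])
beta_marginal = R @ beta_joint_true          # marginal effects absorb the LD
print(marginal_to_joint(beta_marginal, R))   # approximately [0.3, 0.0]
```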

    Tailored Bayes: a risk modeling framework under unequal misclassification costs.

    Risk prediction models are a crucial tool in healthcare. Risk prediction models with a binary outcome (i.e., binary classification models) are often constructed using methodology which assumes the costs of different classification errors are equal. In many healthcare applications, this assumption is not valid, and the differences between misclassification costs can be quite large. For instance, in a diagnostic setting, the cost of misdiagnosing a person with a life-threatening disease as healthy may be larger than the cost of misdiagnosing a healthy person as a patient. In this article, we present Tailored Bayes (TB), a novel Bayesian inference framework which "tailors" model fitting to optimize predictive performance with respect to unbalanced misclassification costs. We use simulation studies to showcase when TB is expected to outperform standard Bayesian methods in the context of logistic regression. We then apply TB to three real-world applications: a cardiac surgery task, a breast cancer prognostication task, and a breast cancer tumor classification task, and demonstrate the improvement in predictive performance over standard methods.
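
    As a rough illustration of cost-sensitive fitting, the sketch below weights each observation's logistic log-likelihood by the cost of misclassifying it and finds a penalized (MAP-style) fit by gradient ascent. This is a generic cost-weighted logistic regression under assumed costs, not the Tailored Bayes procedure itself, which works within a full Bayesian framework.

```python
import numpy as np

def fit_cost_weighted_logistic(X, y, cost_fn=5.0, cost_fp=1.0,
                               l2=1.0, lr=0.5, n_iter=5000):
    """Cost-sensitive logistic regression: each observation's log-likelihood
    contribution is weighted by the cost of misclassifying it (cost_fn for
    true cases, cost_fp for true non-cases), with an L2 penalty playing the
    role of a Gaussian prior.  A generic sketch under assumed costs, not the
    exact Tailored Bayes tilting from the paper.
    """
    n, p = X.shape
    w = np.where(y == 1, cost_fn, cost_fp)
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - prob)) - l2 * beta   # weighted score + penalty
        beta += lr * grad / n
    return beta

# Toy usage: an intercept plus two covariates (simulated data).
rng = np.random.default_rng(1)
X = np.c_[np.ones(500), rng.normal(size=(500, 2))]
true_beta = np.array([-1.0, 1.5, -0.5])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
print(fit_cost_weighted_logistic(X, y))
```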

    A flexible and parallelizable approach to genome-wide polygenic risk scores.

    The heritability of most complex traits is driven by variants throughout the genome. Consequently, polygenic risk scores, which combine information on multiple variants genome-wide, have demonstrated improved accuracy in genetic risk prediction. We present a new two-step approach to constructing genome-wide polygenic risk scores from meta-GWAS summary statistics. Local linkage disequilibrium (LD) is adjusted for in Step 1, followed by, uniquely, long-range LD in Step 2. Our algorithm is highly parallelizable, since the block-wise analyses in Step 1 can be distributed across a high-performance computing cluster, and flexible, since sparsity and heritability are estimated within each block. Inference is obtained through a formal Bayesian variable selection framework, meaning final risk predictions are averaged over competing models. We compared our method to two alternative approaches, LDpred and lassosum, using all seven traits in the Wellcome Trust Case Control Consortium as well as meta-GWAS summaries for type 1 diabetes (T1D), coronary artery disease, and schizophrenia. Performance was generally similar across methods, although our framework provided more accurate predictions for T1D, for which there are multiple heterogeneous signals in regions of both short- and long-range LD. With sufficient compute resources, our method also allows the fastest runtimes.
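
    The block-wise structure described for Step 1 is what makes the approach parallelizable: each LD block can be adjusted independently and the results combined before the long-range LD step. The sketch below distributes a simple per-block adjustment across worker processes; the function names are illustrative, and the deterministic per-block solve stands in for the paper's Bayesian variable selection with block-specific sparsity and heritability.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def adjust_block(args):
    """Step 1 (sketch): within one LD block, turn marginal effects into
    jointly adjusted effects using that block's LD matrix.  The published
    method instead runs Bayesian variable selection within each block and
    averages predictions over competing models; this deterministic solve
    only illustrates the parallelizable block structure."""
    beta_marginal, ld_block = args
    R = ld_block + 1e-6 * np.eye(len(beta_marginal))   # small ridge for stability
    return np.linalg.solve(R, beta_marginal)

def blockwise_adjust(beta_by_block, ld_by_block, n_workers=4):
    # Blocks are independent, so Step 1 distributes cleanly across cores
    # (or nodes of an HPC cluster); Step 2 (long-range LD) would then run
    # on the combined block-level outputs.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(adjust_block, zip(beta_by_block, ld_by_block)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas = [rng.normal(size=5) for _ in range(8)]   # toy marginal effects per block
    lds = [np.eye(5) for _ in range(8)]              # toy LD matrices (identity)
    print(len(blockwise_adjust(betas, lds)))         # 8 adjusted blocks
```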

    Why public health must contribute to reduce violence [Letter]

    Florence and colleagues found that systematic collection, analysis, and use of anonymised emergency department recorded information on violence significantly reduced violence-related hospital admissions.[1] Although violence has long been a public health matter, there is ambivalence about making it a health priority. After all, crime reduction is the responsibility of government departments other than health, and other public services—the police, courts, and probation and prison services—are there to tackle it. It has been known for some time that police records underestimate serious community violence and that emergency department injury records are more representative. We suggest that data matching studies in Denmark and Norway indicate that the extent to which serious violence is under-ascertained by police services is consistent across Europe.[2][3] Data matching (between emergency department and police violence records) from three north European countries has shown that on average the police record a third or less of the violence that results in emergency department treatment.[4] The main reasons for under-recording have been identified in the UK, and these factors may exert a remarkably similar influence across national boundaries. There are important policy implications: that police data are a poor measure of serious violence; that health services provide information about violence that is not available elsewhere; and that violence in a city, for example, can be understood and targeted only if police and health data are combined. Such an approach, together with the involvement of trauma service doctors in community safety partnerships, is proving effective, particularly by directing police activity.[5] Public health and trauma services have much to contribute to violence prevention across Europe.

    Development and External Validation of Prediction Models for 10-Year Survival of Invasive Breast Cancer: Comparison with PREDICT and CancerMath.

    Purpose: To compare PREDICT and CancerMath, two widely used prognostic models for invasive breast cancer, taking into account their clinical utility. Furthermore, it is unclear whether these models could be improved. Experimental Design: A dataset of 5,729 women was used for model development. A Bayesian variable selection algorithm was implemented to stochastically search for important interaction terms among the predictors. The derived models were then compared in three independent datasets (n = 5,534). We examined calibration and discrimination, and performed decision curve analysis. Results: CancerMath demonstrated worse calibration performance compared with PREDICT in estrogen receptor (ER)-positive and ER-negative tumors. The decline in discrimination performance was -4.27% (-6.39 to -2.03) and -3.21% (-5.9 to -0.48) for ER-positive and ER-negative tumors, respectively. Our new models matched the performance of PREDICT in terms of calibration and discrimination, but offered no improvement. Decision curve analysis showed predictions from all models were clinically useful for treatment decisions made at risk thresholds between 5% and 55% for ER-positive tumors and at thresholds of 15% to 60% for ER-negative tumors. Within these threshold ranges, CancerMath provided the lowest clinical utility among the models. Conclusions: Survival probabilities from PREDICT offer both improved accuracy and discrimination over CancerMath. Using PREDICT to make treatment decisions offers greater clinical utility than CancerMath over a range of risk thresholds. Our new models performed as well as PREDICT, but no better, suggesting that, in this setting, including further interaction terms offers no predictive benefit. Clin Cancer Res; 24(9); 2110-5. ©2018 AACR.
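
    Decision curve analysis, used here to compare the models' clinical utility, scores a model at each risk threshold by its net benefit. A minimal sketch of that calculation is below (the standard Vickers-Elkin formula; variable names and the toy data are illustrative, not from the paper).

```python
import numpy as np

def net_benefit(risk, outcome, threshold):
    """Net benefit of a risk model at one decision threshold, as used in
    decision curve analysis:
        NB = TP/n - FP/n * threshold / (1 - threshold)
    `risk` holds predicted event probabilities, `outcome` the observed 0/1
    events.  Names and data below are illustrative, not from the paper.
    """
    risk = np.asarray(risk, dtype=float)
    outcome = np.asarray(outcome)
    treat = risk >= threshold          # who would be treated at this threshold
    n = len(outcome)
    tp = np.sum(treat & (outcome == 1))
    fp = np.sum(treat & (outcome == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

# A decision curve evaluates net benefit over a range of thresholds,
# e.g. the 5-55% range reported for ER-positive tumours (toy data here).
rng = np.random.default_rng(0)
risk = rng.random(1000)
outcome = (rng.random(1000) < risk).astype(int)
for t in (0.05, 0.30, 0.55):
    print(t, round(net_benefit(risk, outcome, t), 3))
```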

    Insight into Genotype-Phenotype Associations through eQTL Mapping in Multiple Cell Types in Health and Immune-Mediated Disease

    Genome-wide association studies (GWAS) have transformed our understanding of the genetics of complex traits such as autoimmune diseases, but how risk variants contribute to pathogenesis remains largely unknown. Identifying genetic variants that affect gene expression (expression quantitative trait loci, or eQTLs) is crucial to addressing this. eQTLs vary between tissues and following in vitro cellular activation, but have not been examined in the context of human inflammatory diseases. We performed eQTL mapping in five primary immune cell types from patients with active inflammatory bowel disease (n = 91), anti-neutrophil cytoplasmic antibody-associated vasculitis (n = 46) and healthy controls (n = 43), revealing eQTLs present only in the context of active inflammatory disease. Moreover, we show that, following treatment, a proportion of these eQTLs disappear. Through joint analysis of expression data from multiple cell types, we reveal that previous estimates of eQTL immune cell-type specificity are likely to have been exaggerated. Finally, by analysing gene expression data from multiple cell types, we find eQTLs not previously identified by database mining at 34 inflammatory bowel disease-associated loci. In summary, this parallel eQTL analysis in multiple leucocyte subsets from patients with active disease provides new insights into the genetic basis of immune-mediated diseases. This research was funded by a Wellcome Trust Clinical PhD Programme Fellowship (JEP), the NIH-Oxford-Cambridge Scholars Program (ACR), Wellcome Trust Grant 083650/Z/07/Z and MRC Grant MR/L19027/1 (KGCS), and the National Institute for Health Research Cambridge Biomedical Research Centre. KGCS is a National Institute for Health Research Senior Investigator. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
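
    For readers unfamiliar with eQTL mapping, the basic per-gene, per-variant test is a linear regression of expression on genotype dosage, optionally adjusting for covariates. The sketch below shows that single test only (names and toy data are illustrative assumptions); the study's joint analysis across cell types, disease activity, and treatment is not reproduced here.

```python
import numpy as np
from scipy import stats

def eqtl_test(genotype_dosage, expression, covariates=None):
    """Minimal cis-eQTL test: linear regression of one gene's expression on
    genotype dosage (0/1/2), optionally adjusting for covariates.  Returns
    the genotype effect estimate and its two-sided p-value."""
    y = np.asarray(expression, dtype=float)
    n = len(y)
    cols = [np.ones(n), np.asarray(genotype_dosage, dtype=float)]
    if covariates is not None:
        cols.extend(np.asarray(covariates, dtype=float).T)
    X = np.column_stack(cols)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = n - X.shape[1]
    sigma2 = resid @ resid / df
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t = beta[1] / se
    return beta[1], 2 * stats.t.sf(abs(t), df)

# Toy example: a simulated eQTL with dosage effect 0.4 (not real data).
rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=200)
expr = 0.4 * g + rng.normal(size=200)
print(eqtl_test(g, expr))
```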

    Evaluation of stability of directly standardized rates for sparse data using simulation methods.

    Background: Directly standardized rates (DSRs) adjust for different age distributions in different populations and enable, for example, rates of disease to be compared directly between populations. They are routinely published, but there is concern that a DSR is not valid when it is based on a "small" number of events. The aim of this study was to determine the value at which a DSR should not be published when analyzing real data in England. Methods: Standard Monte Carlo simulation techniques were used, assuming that the numbers of events in 19 age groups (i.e., 0–4, 5–9, ... 90+ years) follow independent Poisson distributions. The total number of events, the age-specific risks, and the population sizes in each age group were varied. For each of 10,000 simulations, the DSR (using the 2013 European Standard Population weights) and 95% confidence intervals (CIs) from three different methods (normal approximation, Dobson, and Tiwari modified gamma) were calculated, and the coverage of each method was assessed. Results: The normal approximation was, as expected, not suitable for use when fewer than 100 events occurred. The Tiwari and Dobson methods of calculating confidence intervals produced similar estimates, and either was suitable when the expected or observed number of events was 10 or greater. The accuracy of the CIs was not influenced by the distribution of the events across categories (i.e., the degree of clustering, the age distributions of the sampling populations, and the number of categories with no events occurring in them). Conclusions: DSRs should not be given when the total observed number of events is less than 10. The Dobson method might be considered the preferred method, because its formulae are simpler than those of the Tiwari method and its coverage is slightly more accurate.
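
    The quantities being simulated are straightforward to compute. The sketch below calculates a DSR from age-specific counts, populations, and standard-population weights, with a Dobson-style confidence interval obtained by rescaling exact Poisson limits for the total count onto the DSR scale; the function and variable names, and the toy inputs, are illustrative assumptions rather than the paper's code.

```python
import numpy as np
from scipy.stats import chi2

def dsr_with_dobson_ci(events, populations, std_pop, alpha=0.05, per=100_000):
    """Directly standardised rate (per `per` person-years) with a
    Dobson-style confidence interval.

    events      : observed event counts per age group
    populations : person-years at risk per age group
    std_pop     : standard population in the same age groups
                  (e.g. the 2013 European Standard Population)

    A sketch under the usual assumption of independent Poisson counts per
    age group; the paper's simulations repeatedly generate `events` from
    Poisson distributions and check the coverage of intervals like this.
    """
    events = np.asarray(events, dtype=float)
    populations = np.asarray(populations, dtype=float)
    w = np.asarray(std_pop, dtype=float) / np.sum(std_pop)

    dsr = per * np.sum(w * events / populations)
    var_dsr = per ** 2 * np.sum((w / populations) ** 2 * events)

    # Exact (Garwood) Poisson limits for the total observed count ...
    o = events.sum()
    o_lo = chi2.ppf(alpha / 2, 2 * o) / 2 if o > 0 else 0.0
    o_hi = chi2.ppf(1 - alpha / 2, 2 * o + 2) / 2

    # ... shifted and rescaled onto the DSR scale (Dobson et al., 1991).
    scale = np.sqrt(var_dsr / o) if o > 0 else 0.0
    return dsr, dsr + scale * (o_lo - o), dsr + scale * (o_hi - o)

# Toy example (counts, person-years, and standard weights are illustrative).
events = [0, 1, 0, 2, 3, 5, 8, 12, 20, 15]
pops = [10_000] * 10
std = [1] * 10
print(dsr_with_dobson_ci(events, pops, std))
```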