383 research outputs found

    Commentary: Tobacco consumption and body weight: Mendelian randomization across a range of exposure.

    Tobacco consumption is consistently associated with reduced body weight, creating an incentive to initiate smoking and a disincentive to cease, although the health risks of the habit outweigh the benefits of reduced weight. Among smokers, however, increasing consumption has been associated with increased body weight. To determine whether this contradiction reflects causal processes, Winsløw et al.¹ have applied Mendelian randomization (MR) to test the association of a genetic variant, rs1051730 in CHRNA3, with measures of body weight among 80 342 members of the Copenhagen General Population Study. Among smokers, each minor (T) allele carried was associated with an increase of about one cigarette per day, but with a decrease in several measures of body weight, in contrast to the observational re
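    With a single genetic instrument, MR analyses commonly summarise the causal effect with the Wald ratio: the per-allele association with the outcome divided by the per-allele association with the exposure. The sketch below assumes per-allele summary statistics are already available. The function name, the first-order standard error (which treats the exposure association as known) and the body-weight figure are illustrative assumptions, not values or methods taken from Winsløw et al.; only the exposure effect of about one cigarette per day per allele follows the abstract.

```python
import math


def wald_ratio(beta_gx: float, beta_gy: float, se_gy: float) -> tuple[float, float]:
    """Wald ratio estimate of the causal effect of exposure X on outcome Y.

    beta_gx: per-allele effect of the variant on the exposure (e.g. cigarettes/day)
    beta_gy: per-allele effect of the variant on the outcome (e.g. body weight, kg)
    se_gy:   standard error of beta_gy
    Returns (estimate, first-order SE). The SE treats beta_gx as known, a common
    simplification that understates uncertainty when the instrument is weak.
    """
    if beta_gx == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    return estimate, se


# Hypothetical inputs: +1 cigarette/day and -0.2 kg body weight per T allele.
est, se = wald_ratio(beta_gx=1.0, beta_gy=-0.2, se_gy=0.05)
print(f"causal effect per cigarette/day: {est:.2f} kg (SE {se:.2f})")
```

    The ratio is only interpretable as causal under the usual instrumental-variable assumptions, which the sketch does not check.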

    Empirical Bayes factors for common hypothesis tests

    Bayes factors for composite hypotheses have difficulty in encoding vague prior knowledge, leading to conflicts between objectivity and sensitivity including the Jeffreys-Lindley paradox. To address these issues we revisit the posterior Bayes factor, in which the posterior distribution from the data at hand is re-used in the Bayes factor for the same data. We argue that this is biased when calibrated against proper Bayes factors, but propose bias adjustments to allow interpretation on the same scale. In the important case of a regular normal model, the bias in log scale is half the number of parameters. The resulting empirical Bayes factor is closely related to the widely applicable information criterion. We develop test-based empirical Bayes factors for several standard tests and propose an extension to multiple testing closely related to the optimal discovery procedure. When only a P-value is available, such as in non-parametric tests, we obtain a Bayes factor calibration of 10p. We propose interpreting the strength of Bayes factors on a logarithmic scale with base 3.73, reflecting the sharpest distinction between weaker and stronger belief. Empirical Bayes factors are a frequentist-Bayesian compromise expressing an evidential view of hypothesis testing
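    The normal-model statement can be made concrete for a one-parameter z-test. Under a flat-prior normal approximation (our assumption), averaging the likelihood over the posterior gives the maximised likelihood times 2^(-k/2) for k free parameters, so the log posterior Bayes factor of a z-test is z²/2 - ½ log 2. The sketch then subtracts half the number of extra parameters, which is one reading of the stated bias of "half the number of parameters"; it is not the paper's general procedure, and all function names are hypothetical. The last function expresses a Bayes factor on the proposed base-3.73 logarithmic scale.

```python
import math

LOG_BASE = math.log(3.73)  # base of the proposed evidence scale


def log_posterior_bf_z(z: float, extra_params: int = 1) -> float:
    """Log posterior Bayes factor (alternative vs null) for a z-statistic.

    Assumes a regular normal likelihood with a flat prior, so the posterior mean
    of the likelihood equals the maximised likelihood times 2**(-k/2), where k is
    the number of parameters free under the alternative but fixed under the null.
    """
    log_lr = z * z / 2.0  # log maximised likelihood ratio for a z-test
    return log_lr - extra_params / 2.0 * math.log(2.0)


def log_empirical_bf_z(z: float, extra_params: int = 1) -> float:
    """Bias-adjusted log Bayes factor: subtract half the extra parameters."""
    return log_posterior_bf_z(z, extra_params) - extra_params / 2.0


def strength(log_bf: float) -> float:
    """Express a natural-log Bayes factor in units of log base 3.73."""
    return log_bf / LOG_BASE


for z in (1.0, 1.96, 3.0):
    lebf = log_empirical_bf_z(z)
    print(f"z={z:.2f}  log EBF={lebf:.3f}  strength={strength(lebf):.2f}")
```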

    Estimation of significance thresholds for genomewide association scans

    The question of what significance threshold is appropriate for genomewide association studies is somewhat unresolved. Previous theoretical suggestions have yet to be validated in practice, whereas permutation testing does not resolve a discrepancy between the genomewide multiplicity of the experiment and the subset of markers actually tested. We used genotypes from the Wellcome Trust Case-Control Consortium to estimate a genomewide significance threshold for the UK Caucasian population. We subsampled the genotypes at increasing densities, using permutation to estimate the nominal P-value for 5% family-wise error. By extrapolating to infinite density, we estimated the genomewide significance threshold to be about 7.2 × 10⁻⁸. To reduce the computation time, we considered Patterson's eigenvalue estimator of the effective number of tests, but found it to be an order of magnitude too low for multiplicity correction. However, by fitting a Beta distribution to the minimum P-value from permutation replicates, we showed that the effective number is a useful heuristic and suggest that its estimation in this context is an open problem. We conclude that permutation is still needed to obtain genomewide significance thresholds, but with subsampling, extrapolation and estimation of an effective number of tests, the threshold can be standardized for all studies of the same population
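    The per-density step can be sketched as follows. For each permutation replicate one records the smallest P-value across markers; the 5% quantile of those minima is the nominal threshold for 5% family-wise error, and a Beta distribution fitted to the minima gives an effective number of tests. In this minimal sketch, independent uniform P-values stand in for real permutation replicates (so the true effective number equals the marker count), and the Beta fit fixes the first shape parameter at 1, a simplification of ours. Subsampling and extrapolation to infinite density are not shown.

```python
import numpy as np


def fwer_threshold(min_p: np.ndarray, alpha: float = 0.05) -> float:
    """Nominal P-value giving family-wise error alpha: the alpha-quantile
    of the per-replicate minimum P-values."""
    return float(np.quantile(min_p, alpha))


def effective_tests(min_p: np.ndarray) -> float:
    """Maximum-likelihood b for a Beta(1, b) fit to minimum P-values.

    For b independent uniform P-values the minimum is exactly Beta(1, b), so
    b-hat acts as an effective number of tests. The log-likelihood
    n*log(b) + (b - 1)*sum(log(1 - x)) is maximised at b = -n / sum(log(1 - x)).
    """
    return float(-len(min_p) / np.sum(np.log1p(-min_p)))


rng = np.random.default_rng(1)
n_replicates, n_markers = 2000, 1000
# Hypothetical stand-in for permutation replicates: independent null P-values.
null_p = rng.uniform(size=(n_replicates, n_markers))
min_p = null_p.min(axis=1)

threshold = fwer_threshold(min_p)
n_eff = effective_tests(min_p)
print(f"5% FWER threshold ~ {threshold:.2e}; effective number of tests ~ {n_eff:.0f}")
```

    With 1000 independent markers the threshold should land near the Šidák value 1 - 0.95^(1/1000) ≈ 5.1 × 10⁻⁵; with correlated markers from real permutations, b-hat would fall below the marker count.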

    Candidate gene-environment interactions in breast cancer.

    Gene-environment interactions have the potential to shed light on biological processes leading to disease, identify individuals for whom risk factors are most relevant, and improve the accuracy of epidemiological risk models. We review the progress that has been made in investigating gene-environment interactions in the field of breast cancer. Although several large-scale analyses have been carried out, only a few significant interactions have been reported. One of these, an interaction between CASP8-rs1045485 and alcohol consumption, has been replicated, but others have not, including LSP1-rs3817198 and parity, and 1p11.2-rs11249433 and ever being parous. False positive interactions may arise if the gene and environment are correlated and the causal variant is less frequent than the tag SNP. We conclude that while much progress has been made in this area, it is still too soon to tell whether gene-environment interactions will fulfil their promise. Before we can make this assessment we will need to replicate (or refute) the reported interactions, identify the causal variants that underlie tag-SNP associations and validate the next generation of epidemiological risk models
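    A multiplicative gene-environment interaction, as typically tested in case-control data, can be illustrated with a 2 × 2 × 2 table: for a binary genotype and a binary exposure, the interaction odds ratio equals the genotype odds ratio among the exposed divided by that among the unexposed. This minimal sketch assumes a dichotomised genotype (e.g. carrier vs non-carrier) and uses Woolf's log-scale standard error with a Wald test; the counts are hypothetical and it is not the analysis used in the studies reviewed.

```python
import math


def interaction_or(cases: dict, controls: dict) -> tuple[float, float, float]:
    """Multiplicative G x E interaction odds ratio from a 2 x 2 x 2 table.

    cases[(g, e)] and controls[(g, e)] hold counts for genotype g in {0, 1}
    and exposure e in {0, 1}. Returns the interaction OR, i.e.
    OR(G | E=1) / OR(G | E=0), its Woolf log-scale standard error and a
    two-sided Wald P-value.
    """
    def or_g(e):
        # Genotype odds ratio within exposure stratum e.
        return (cases[(1, e)] / controls[(1, e)]) / (cases[(0, e)] / controls[(0, e)])

    ratio = or_g(1) / or_g(0)
    cells = list(cases.values()) + list(controls.values())
    se_log = math.sqrt(sum(1.0 / n for n in cells))
    z = math.log(ratio) / se_log
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return ratio, se_log, p_value


# Hypothetical counts, for illustration only.
cases = {(0, 0): 200, (1, 0): 100, (0, 1): 150, (1, 1): 150}
controls = {(0, 0): 400, (1, 0): 200, (0, 1): 300, (1, 1): 150}
ior, se, p = interaction_or(cases, controls)
print(f"interaction OR = {ior:.2f}, SE(log OR) = {se:.3f}, P = {p:.2g}")
```

    Here the genotype has no effect among the unexposed (OR 1) and doubles the odds among the exposed, giving an interaction OR of 2.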

    Accuracy of Gene Scores when Pruning Markers by Linkage Disequilibrium.

    OBJECTIVE: Gene scores are often used to model the combined effects of genetic variants. When variants are in linkage disequilibrium, it is common to prune all variants except the most strongly associated. This avoids duplicating information but discards information when variants have independent effects. However, joint modelling of correlated variants increases the sampling error in the gene score. In recent applications, joint modelling has offered only small improvements in accuracy over pruning. We aimed to quantify how the relative accuracy of pruning and joint modelling depends on sample size. METHODS: We derived the coefficient of determination R² for a gene score constructed from pruned markers, and for one constructed from correlated markers with jointly estimated effects. RESULTS: Pruned scores tend to have slightly lower R² than jointly modelled scores, but the differences are small at sample sizes up to 100,000. If the proportion of correlated variants is high, joint modelling can obtain modest improvements asymptotically. CONCLUSIONS: The small gains observed to date from joint modelling can be explained by sample size. As studies become larger, joint modelling will be useful for traits affected by many correlated variants, but the improvements may remain small. Pruning remains a useful heuristic for current studies
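    The comparison can be illustrated by simulation rather than by the analytic derivation described. This minimal sketch assumes two causal markers in LD (correlation 0.5) with equal independent effects, uses standardised Gaussian stand-ins for genotypes, and measures R² as the squared correlation between score and trait in an independent test sample; all settings are hypothetical.

```python
import numpy as np


def simulate(n: int, rng: np.random.Generator, r: float = 0.5, beta=(0.2, 0.2)):
    """Two standardised 'genotypes' with correlation r (Gaussian stand-ins)
    and a quantitative trait with unit-variance noise."""
    cov = np.array([[1.0, r], [r, 1.0]])
    g = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y = g @ np.asarray(beta) + rng.standard_normal(n)
    return g, y


def r_squared(score: np.ndarray, y: np.ndarray) -> float:
    """Squared correlation between a gene score and the trait."""
    return float(np.corrcoef(score, y)[0, 1] ** 2)


rng = np.random.default_rng(7)
g_train, y_train = simulate(20_000, rng)
g_test, y_test = simulate(20_000, rng)

# Pruning: keep only the marker with the strongest marginal association,
# weighted by its single-marker regression slope.
marginal = np.array([np.polyfit(g_train[:, j], y_train, 1)[0] for j in range(2)])
keep = int(np.argmax(np.abs(marginal)))
pruned_score = g_test[:, keep] * marginal[keep]

# Joint modelling: estimate both effects together by least squares.
X = np.column_stack([np.ones(len(y_train)), g_train])
coef, *_ = np.linalg.lstsq(X, y_train, rcond=None)
joint_score = g_test @ coef[1:]

r2_pruned = r_squared(pruned_score, y_test)
r2_joint = r_squared(joint_score, y_test)
print(f"R2 pruned = {r2_pruned:.3f}, R2 joint = {r2_joint:.3f}")
```

    Shrinking the training sample inflates the sampling error of the joint estimates, which is the trade-off the abstract describes.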

    Comparison of multimarker logistic regression models, with application to a genomewide scan of schizophrenia.

    BACKGROUND: Genome-wide association studies (GWAS) are a widely used study design for detecting genetic causes of complex diseases. Current studies provide good coverage of common causal SNPs, but not rare ones. A popular method to detect rare causal variants is haplotype testing. A disadvantage of this approach is that many parameters are estimated simultaneously, which can mean a loss of power and slower fitting to large datasets. Haplotype testing effectively tests both the allele frequencies and the linkage disequilibrium (LD) structure of the data. LD has previously been shown to be mostly attributable to LD between adjacent SNPs. We propose a generalised linear model (GLM) which models the effects of each SNP in a region as well as the statistical interactions between adjacent pairs. This is compared to two other commonly used multimarker GLMs: one with a main-effect parameter for each SNP; one with a parameter for each haplotype. RESULTS: We show the haplotype model has higher power for rare untyped causal SNPs, the main-effects model has higher power for common untyped causal SNPs, and the proposed model generally has power in between the two others. We show that the relative power of the three methods is dependent on the number of marker haplotypes the causal allele is present on, which depends on the age of the mutation. Except in the case of a common causal variant in high LD with markers, all three multimarker models are superior in power to single-SNP tests. Including the adjacent statistical interactions results in lower inflation in test statistics when a realistic level of population stratification is present in a dataset. Using the multimarker models, we analyse data from the Molecular Genetics of Schizophrenia study. The multimarker models find potential associations that are not found by single-SNP tests. However, multimarker models also require stricter control of data quality since biases can have a larger inflationary effect on multimarker test statistics than on single-SNP test statistics. CONCLUSIONS: Analysing a GWAS with multimarker models can yield candidate regions which may contain rare untyped causal variants. This is useful for increasing prior odds of association in future whole-genome sequence analyses.
    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are
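    The proposed model's structure (a main effect for each SNP plus an interaction for each adjacent pair) can be sketched as a design matrix fitted by logistic regression. This minimal sketch assumes additive 0/1/2 allele-count coding with interactions as products of adjacent counts, and fits by plain Newton-Raphson; the paper's exact coding and software may differ, and the simulated genotypes and effect sizes are hypothetical.

```python
import numpy as np


def adjacent_interaction_design(genotypes: np.ndarray) -> np.ndarray:
    """Design matrix: intercept, one main effect per SNP, and one product
    term per adjacent SNP pair (j, j+1). genotypes is n x m allele counts."""
    n, _ = genotypes.shape
    main = genotypes.astype(float)
    pairs = main[:, :-1] * main[:, 1:]  # n x (m - 1) adjacent products
    return np.column_stack([np.ones(n), main, pairs])


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 50, tol: float = 1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    Returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log1p(-p)))
    return beta, loglik


def null_loglik(y: np.ndarray) -> float:
    """Log-likelihood of the intercept-only model."""
    ybar = y.mean()
    return float(len(y) * (ybar * np.log(ybar) + (1.0 - ybar) * np.log(1.0 - ybar)))


rng = np.random.default_rng(3)
n, m = 2000, 4
G = rng.binomial(2, 0.3, size=(n, m))  # hypothetical allele counts
logit = -0.5 + 0.4 * G[:, 1]           # hypothetical effect at the second SNP
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

X = adjacent_interaction_design(G)
beta, ll = fit_logistic(X, y)
lr_stat = 2.0 * (ll - null_loglik(y))
print(f"{X.shape[1] - 1} df, likelihood-ratio statistic = {lr_stat:.1f}")
```

    For m SNPs the model has 2m - 1 non-intercept parameters, between the m of the main-effects model and the up to 2^m - 1 of a full haplotype model.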

    A survey of current software for linkage analysis.

    There is now a wide choice of software available for linkage analysis. The best-known packages are briefly reviewed here. The package with the most extensive range of analyses is GENEHUNTER, but for many of its functions there are other programs with better performance. These include FASTLINK and VITESSE for parametric analysis, ALLEGRO and MERLIN for non-parametric analysis, and SOLAR for variance components analysis. The computational limits of current approaches can be improved with SIMWALK2 and the promising new SUPERLINK program. Directions for future work include improved user interfaces and consensus formats for data input and exchange

    Criteria for evaluating risk prediction of multiple outcomes.

    Risk prediction models have been developed in many contexts to classify individuals according to a single outcome, such as risk of a disease. Emerging "-omic" biomarkers provide panels of features that can simultaneously predict multiple outcomes from a single biological sample, creating issues of multiplicity reminiscent of exploratory hypothesis testing. Here I propose definitions of some basic criteria for evaluating prediction models of multiple outcomes. I define calibration in the multivariate setting and then distinguish between outcome-wise and individual-wise prediction, and within the latter between joint and panel-wise prediction. I give examples such as screening and early detection in which different senses of prediction may be more appropriate. In each case I propose definitions of sensitivity, specificity, concordance, positive and negative predictive value and relative utility. I link the definitions through a multivariate probit model, showing that the accuracy of a multivariate prediction model can be summarised by its covariance with a liability vector. I illustrate the concepts on a biomarker panel for early detection of eight cancers, and on polygenic risk scores for six common diseases
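    The contrast between outcome-wise and individual-wise prediction can be illustrated with binary predictions for several outcomes per individual. This minimal sketch uses the ordinary per-outcome sensitivity and specificity for the outcome-wise view and, as an illustrative assumption, counts an individual as correctly predicted jointly only if every outcome is classified correctly; the paper's own definitions may differ, and the small arrays are hypothetical.

```python
import numpy as np


def outcome_wise(y_true: np.ndarray, y_pred: np.ndarray):
    """Per-outcome sensitivity and specificity for n x k binary arrays
    (rows are individuals, columns are outcomes)."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    tp = np.sum(y_true & y_pred, axis=0)
    tn = np.sum(~y_true & ~y_pred, axis=0)
    sens = tp / np.sum(y_true, axis=0)
    spec = tn / np.sum(~y_true, axis=0)
    return sens, spec


def joint_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of individuals whose whole outcome vector is predicted exactly
    (an illustrative individual-wise criterion, not the paper's definition)."""
    match = y_true.astype(bool) == y_pred.astype(bool)
    return float(np.mean(np.all(match, axis=1)))


y_true = np.array([[1, 0], [0, 0], [1, 1], [0, 1]])
y_pred = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])
sens, spec = outcome_wise(y_true, y_pred)
print(sens, spec, joint_accuracy(y_true, y_pred))
```

    Each outcome here has one error, yet only half of the individuals have every outcome right, which is why the two senses of prediction can rank models differently.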

    The Use of Edge-Betweenness Clustering to Investigate Biological Function in Protein Interaction Networks

    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
    BACKGROUND: This paper describes an automated method for finding clusters of interconnected proteins in protein interaction networks and retrieving protein annotations associated with these clusters. RESULTS: Protein interaction graphs were separated into subgraphs of interconnected proteins, using the JUNG implementation of Girvan and Newman's Edge-Betweenness algorithm. Functions were sought for these subgraphs by detecting significant correlations with the distribution of Gene Ontology terms which had been used to annotate the proteins within each cluster. The method was implemented using freely available software (JUNG and the R statistical package). Protein clusters with significant correlations to functional annotations could be identified and included groups of proteins known to cooperate in cell metabolism. The method appears to be resilient against the presence of false positive interactions. CONCLUSION: This method provides a useful tool for rapid screening of small to medium size protein interaction datasets.
    Published version
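    The clustering step can be sketched in pure Python; the authors used JUNG and R, so this is an illustrative re-implementation, not their code. Edge betweenness is computed with Brandes' algorithm and the highest-scoring edge is removed repeatedly, as in Girvan and Newman's method. Two choices here are assumptions of ours: stopping at a fixed number of clusters, and scoring a Gene Ontology term in a cluster with a one-sided hypergeometric over-representation test in place of the unspecified correlation test. The toy graph is hypothetical.

```python
from collections import defaultdict, deque
from math import comb


def edge_betweenness(adj):
    """Brandes' algorithm for edge betweenness in an unweighted, undirected graph.
    adj maps each node to a set of neighbours. Returns {frozenset({u, v}): score}."""
    scores = defaultdict(float)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    return {e: b / 2.0 for e, b in scores.items()}  # each pair counted from both ends


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def girvan_newman(adj, n_clusters):
    """Remove the highest-betweenness edge until n_clusters components remain."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while len(components(adj)) < n_clusters and any(adj.values()):
        eb = edge_betweenness(adj)
        u, v = tuple(max(eb, key=eb.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


def enrichment_p(n_total, n_annotated, cluster_size, hits):
    """Upper-tail hypergeometric P(X >= hits) for one term within one cluster."""
    upper = min(cluster_size, n_annotated)
    return sum(comb(n_annotated, i) * comb(n_total - n_annotated, cluster_size - i)
               for i in range(hits, upper + 1)) / comb(n_total, cluster_size)


# Hypothetical network: two triangles joined by the bridge c-d.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
         "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
clusters = girvan_newman(graph, 2)
print(clusters, enrichment_p(6, 3, 3, 3))
```

    The bridge lies on all nine shortest paths between the two triangles, so it is removed first. With one term annotating exactly the three proteins of one cluster, the hypergeometric P-value is 1/C(6,3) = 0.05.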