    Bayesian model selection in logistic regression for the detection of adverse drug reactions

    Motivation: Spontaneous adverse event reports have a high potential for detecting adverse drug reactions. However, due to their size, exploring such databases requires statistical methods. In this context, disproportionality measures are used, but because they project the data onto contingency tables, they are sensitive to co-prescriptions and masking effects. Recently, logistic regressions with a Lasso-type penalty have been used to detect associations between drugs and adverse events. However, the choice of the penalty value is debatable, and it strongly influences the results. Results: In this paper, we propose a logistic regression whose sparsity is treated as a model selection problem. Since the model space is huge, a Metropolis-Hastings algorithm carries out the model selection by maximizing the BIC. We thus avoid calibrating a penalty or threshold. In our application to the French pharmacovigilance database, the proposed method is compared with well-established approaches on a reference data set and obtains better rates of positive and negative controls. However, many signals are not detected by the proposed method, so we conclude that it should be used in parallel with existing measures in pharmacovigilance. Comment: 7 pages, 3 figures, submitted to Biometrical Journal
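    The model search described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy data, the function names (`fit_loglik`, `bic`, `mh_select`, `simulate`), the single-drug toggle proposal and the target pi(m) ∝ exp(BIC(m)) are all illustrative choices. BIC is written in the "larger is better" form log L - (k/2) log n so that it is maximized, matching the abstract's wording; each logistic model is fitted by plain Newton-Raphson, and the best model visited by the chain is returned.

```python
import math
import random

def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def log1pexp(eta):
    """Stable log(1 + exp(eta))."""
    return eta + math.log1p(math.exp(-eta)) if eta > 0 else math.log1p(math.exp(eta))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_loglik(X, y, cols, iters=25):
    """Maximized log-likelihood of a logistic model with an intercept and the drug columns `cols`."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):  # Newton-Raphson iterations
        grad = [0.0] * p
        # Tiny diagonal jitter keeps the solve well-posed; the fixed point (grad = 0) is unchanged.
        info = [[1e-9 if i == j else 0.0 for j in range(p)] for i in range(p)]
        for z, yi in zip(rows, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, z)))
            for i in range(p):
                grad[i] += (yi - mu) * z[i]
                for j in range(p):
                    info[i][j] += mu * (1.0 - mu) * z[i] * z[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    etas = [sum(b * v for b, v in zip(beta, z)) for z in rows]
    return sum(yi * eta - log1pexp(eta) for yi, eta in zip(y, etas))

def bic(X, y, cols):
    """Schwarz criterion in its 'larger is better' form: log L - (k / 2) log n."""
    return fit_loglik(X, y, cols) - 0.5 * (len(cols) + 1) * math.log(len(y))

def mh_select(X, y, n_iter=300, seed=0):
    """Metropolis-Hastings over drug subsets targeting pi(m) ∝ exp(BIC(m)); returns the best model visited."""
    rng = random.Random(seed)
    cache = {}

    def score(model):
        if model not in cache:  # each distinct model is fitted only once
            cache[model] = bic(X, y, sorted(model))
        return cache[model]

    current = best = frozenset()  # start from the empty (intercept-only) model
    for _ in range(n_iter):
        proposal = current ^ {rng.randrange(len(X[0]))}  # symmetric proposal: toggle one drug
        delta = score(proposal) - score(current)
        if delta >= 0 or rng.random() < math.exp(delta):
            current = proposal
        if score(current) > score(best):
            best = current
    return sorted(best), score(best)

def simulate(n=400, n_drugs=5, seed=1):
    """Hypothetical toy data: binary drug exposures; only drug 0 raises the event probability."""
    rng = random.Random(seed)
    X = [[1.0 if rng.random() < 0.3 else 0.0 for _ in range(n_drugs)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(-2.0 + 2.0 * x[0]) else 0 for x in X]
    return X, y
```

    On the hypothetical simulated data, `mh_select(*simulate())` should return a model that includes drug 0. Caching the BIC of each visited model avoids refitting when the chain revisits it, which matters when every evaluation is a full logistic fit.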

    Asymptotic inference for semiparametric association models

    We consider association models for a pair of random elements $X$ and $Y$ (e.g., vectors) that specify the odds ratio function up to an unknown parameter $\boldsymbol{\theta}$. These models are shown to be semiparametric in the sense that they do not restrict the marginal distributions of $X$ and $Y$. Inference for the odds ratio parameter $\boldsymbol{\theta}$ may be obtained by sampling either $Y$ conditionally on $X$ or vice versa. Generalizing results from Prentice and Pyke, Weinberg and Wacholder, and Scott and Wild, we show that asymptotic inference for $\boldsymbol{\theta}$ under sampling conditional on $Y$ is the same as if sampling had been conditional on $X$. Common regression models, for example generalized linear models with canonical link, or multivariate linear and multivariate logistic models, are association models in which the regression parameter $\boldsymbol{\beta}$ is closely related to the odds ratio parameter $\boldsymbol{\theta}$. Hence inference for $\boldsymbol{\beta}$ may be drawn from samples conditional on $Y$ using an association model. Comment: Published at http://dx.doi.org/10.1214/07-AOS572 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
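    In the simplest special case, binary X and Y summarized in a 2x2 table, the symmetry between the two sampling schemes can be checked directly. The sketch below is an added illustration with hypothetical counts and function names: the odds ratio computed from P(Y | X) equals the one computed from P(X | Y), and oversampling units with Y = 1 (as in a case-control design) leaves it unchanged. For binary X, the log of this odds ratio is the slope of a saturated logistic regression of Y on X, which is how beta and theta coincide in this toy setting.

```python
from fractions import Fraction

def odds(p):
    return p / (1 - p)

def or_given_x(t):
    """Odds ratio from P(Y = 1 | X); t[x][y] holds counts for binary X (rows) and Y (columns)."""
    p1 = Fraction(t[1][1], t[1][0] + t[1][1])
    p0 = Fraction(t[0][1], t[0][0] + t[0][1])
    return odds(p1) / odds(p0)

def or_given_y(t):
    """Odds ratio from P(X = 1 | Y), i.e. the 'retrospective' direction."""
    q1 = Fraction(t[1][1], t[0][1] + t[1][1])
    q0 = Fraction(t[1][0], t[0][0] + t[1][0])
    return odds(q1) / odds(q0)

def oversample_cases(t, factor):
    """Hypothetical case-control design: Y = 1 units are sampled `factor` times more often."""
    return [[row[0], row[1] * factor] for row in t]

table = [[80, 20], [40, 60]]  # hypothetical counts
print(or_given_x(table), or_given_y(table), or_given_x(oversample_cases(table, 3)))  # prints "6 6 6"
```

    Exact `Fraction` arithmetic is used so the equalities hold exactly rather than up to rounding.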

    On the Bayes-optimality of F-measure maximizers

    The F-measure, which was originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is statistically and computationally challenging, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar analysis for F-measure maximizing algorithms, showing that such algorithms are approximate and rely on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm that is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a number of parameters of the joint distribution that is quadratic in the number of binary responses. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.
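    The quadratic parameterization can be illustrated with a sketch of the general F-measure maximizer (GFM) known from the literature, assuming the m^2 + 1 parameters are P(y_i = 1, s_y = s) for i, s = 1..m, with s_y the number of positive labels, plus P(y = 0); the article's own presentation may differ. For a prediction h with k positive labels, the expected F-measure is the sum, over the predicted positives, of Delta_ik = sum_s 2 P(y_i = 1, s_y = s) / (s + k). The best h for each k therefore takes the k largest Delta_ik, and the best k is found by comparing these sums with P(y = 0), the value of predicting no labels. F is taken as 1 when both y and h are empty (an assumed convention). Function names are illustrative, and the brute-force search is included only to check the result on small, hypothetical joint distributions.

```python
import random
from itertools import product

def f_measure(y, h):
    """F1 between a label vector y and a prediction h (defined as 1 when both are all-zero)."""
    tp = sum(a * b for a, b in zip(y, h))
    denom = sum(y) + sum(h)
    return 1.0 if denom == 0 else 2.0 * tp / denom

def expected_f(dist, h):
    return sum(p * f_measure(y, h) for y, p in dist.items())

def brute_force(dist, m):
    """Exhaustive search over all 2**m predictions (reference only)."""
    return max(expected_f(dist, h) for h in product((0, 1), repeat=m))

def gfm(dist, m):
    """F-measure maximizer using only P(y_i = 1, s_y = s) for i, s in 1..m plus P(y = 0)."""
    P = [[0.0] * (m + 1) for _ in range(m)]  # column s = 0 is never used
    p_zero = 0.0
    for y, p in dist.items():
        s = sum(y)
        if s == 0:
            p_zero += p
        for i in range(m):
            if y[i]:
                P[i][s] += p
    best_h, best_val = (0,) * m, p_zero  # empty prediction scores 1 only when y = 0
    for k in range(1, m + 1):
        # E[F] for a prediction with k positives is the sum of delta[i] over those positives.
        delta = [sum(2.0 * P[i][s] / (s + k) for s in range(1, m + 1)) for i in range(m)]
        top = sorted(range(m), key=lambda i: delta[i], reverse=True)[:k]
        val = sum(delta[i] for i in top)
        if val > best_val:
            best_val = val
            best_h = tuple(1 if i in top else 0 for i in range(m))
    return best_h, best_val

def random_joint(m, seed):
    """Hypothetical arbitrary joint distribution over {0,1}^m (no independence assumed)."""
    rng = random.Random(seed)
    w = {y: rng.random() for y in product((0, 1), repeat=m)}
    total = sum(w.values())
    return {y: v / total for y, v in w.items()}
```

    The search over k costs m sorts of m values, so its work grows polynomially in m, while `brute_force` enumerates 2^m predictions and is only usable for very small m.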