
    Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of health care

    In the field of quality of health care measurement, one approach to assessing patient sickness at admission involves a logistic regression of mortality within 30 days of admission on a fairly large number of sickness indicators (on the order of 100) to construct a sickness scale, employing classical variable selection methods to find an "optimal" subset of 10-20 indicators. Such "benefit-only" methods ignore the considerable differences among the sickness indicators in cost of data collection, an issue that is crucial when admission sickness is used to drive programs (now implemented or under consideration in several countries, including the U.S. and U.K.) that attempt to identify substandard hospitals by comparing observed and expected mortality rates (given admission sickness). When both data-collection cost and accuracy of prediction of 30-day mortality are considered, a large variable-selection problem arises in which costly variables that do not predict well enough should be omitted from the final scale. In this paper (a) we develop a method for solving this problem based on posterior model odds, arising from a prior distribution that (1) accounts for the cost of each variable and (2) results in a set of posterior model probabilities that corresponds to a generalized cost-adjusted version of the Bayesian information criterion (BIC), and (b) we compare this method with a decision-theoretic cost-benefit approach based on maximizing expected utility. We use reversible-jump Markov chain Monte Carlo (RJMCMC) methods to search the model space, and we check the stability of our findings with two variants of the MCMC model composition (MC^3) algorithm. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-AOAS207.
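
    A minimal sketch of the kind of criterion the abstract describes: the usual BIC for a logistic regression of 30-day mortality on a candidate subset of indicators, plus a penalty proportional to the subset's total data-collection cost. The additive cost term, the weight lam, and the exhaustive search are illustrative assumptions; the paper derives the penalty from a cost-aware prior and explores the model space with RJMCMC and MC^3 rather than enumeration.

        # Cost-adjusted BIC sketch (assumed additive cost penalty, not the paper's exact derivation).
        import itertools
        import numpy as np
        import statsmodels.api as sm

        def cost_adjusted_bic(X, y, subset, costs, lam=1.0):
            """BIC of a logistic regression on the columns in `subset`, plus lam * total cost."""
            n = len(y)
            Xs = sm.add_constant(X[:, list(subset)])
            fit = sm.Logit(y, Xs).fit(disp=0)
            bic = -2 * fit.llf + Xs.shape[1] * np.log(n)
            return bic + lam * sum(costs[j] for j in subset)

        def best_subset(X, y, costs, max_size=3, lam=1.0):
            """Exhaustive search over small subsets; RJMCMC/MC^3 replaces this at realistic scale."""
            subsets = itertools.chain.from_iterable(
                itertools.combinations(range(X.shape[1]), r) for r in range(1, max_size + 1))
            return min(subsets, key=lambda s: cost_adjusted_bic(X, y, s, costs, lam))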

    Concentration of personal and household crimes in England and Wales

    Crime is disproportionately concentrated in a few areas. Though this pattern is long established, there remains uncertainty about the reasons for variation in the concentration of similar crimes (repeats) or different crimes (multiples). Wholly neglected have been composite crimes, in which more than one crime type coincides as part of a single event. The research reported here disentangles area crime concentration into repeats, multiples and composite crimes. The results are based on estimated bivariate zero-inflated Poisson regression models with a covariance structure that explicitly accounts for crime rarity and crime concentration. The implications of the results for criminological theorizing and as a possible basis for more equitable police funding are discussed.
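
    For context, a minimal sketch of the zero-inflated Poisson building block named above: a log-likelihood that mixes a structural-zero process with a Poisson count process, both driven by covariates. This is the univariate form only; the paper's bivariate version with a covariance structure linking the two crime types is not reproduced here.

        # Univariate zero-inflated Poisson log-likelihood (the bivariate covariance structure is omitted).
        import numpy as np
        from scipy.special import expit, gammaln

        def zip_loglik(params, X, y):
            """params stacks beta (Poisson rate) and gamma (zero-inflation) coefficients."""
            p = X.shape[1]
            beta, gamma = params[:p], params[p:]
            lam = np.exp(X @ beta)       # Poisson rate
            pi = expit(X @ gamma)        # probability of a structural zero
            logpmf = -lam + y * np.log(lam) - gammaln(y + 1)
            ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))   # zeros: structural or sampling
            ll_pos = np.log(1 - pi) + logpmf                 # positive counts
            return np.sum(np.where(y == 0, ll_zero, ll_pos))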

    Objective Bayesian Edge Screening and Structure Selection for Ising Networks

    The Ising model is one of the most widely analyzed graphical models in network psychometrics. However, popular approaches to parameter estimation and structure selection for the Ising model cannot naturally express uncertainty about the estimated parameters or selected structures. To address this issue, this paper offers an objective Bayesian approach to parameter estimation and structure selection for the Ising model. Our methods build on a continuous spike-and-slab approach. We show that our methods consistently select the correct structure and provide a new objective method to set the spike-and-slab hyperparameters. To circumvent the exploration of the complete structure space, which is too large in practical situations, we propose a novel approach that first screens for promising edges and then only explores the space instantiated by these edges. We apply our proposed methods to estimate the network of depression and alcohol use disorder symptoms from symptom scores of over 26,000 subjects. Supplementary information: the online version contains supplementary material available at doi:10.1007/s11336-022-09848-8.
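
    As a point of reference, a generic continuous spike-and-slab prior of this kind places each pairwise interaction under a two-component normal mixture; the form below is the standard George-McCulloch construction, not the paper's objective hyperparameter specification:

        p(\theta_{jk} \mid \gamma_{jk}) = (1 - \gamma_{jk})\,\mathcal{N}(\theta_{jk} \mid 0, \tau_0^2)
                                          + \gamma_{jk}\,\mathcal{N}(\theta_{jk} \mid 0, \tau_1^2),
        \qquad \tau_0^2 \ll \tau_1^2,

    where \gamma_{jk} \in \{0, 1\} indicates whether edge (j, k) is present; screening retains an edge when its posterior inclusion probability P(\gamma_{jk} = 1 \mid \text{data}) exceeds a threshold.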

    Bayesian epidemic models for spatially aggregated count data

    Epidemic data often possess certain characteristics, such as the presence of many zeros, the spatial nature of the disease spread mechanism, environmental noise, serial correlation and dependence on time-varying factors. This paper addresses these issues via suitable Bayesian modelling. In doing so we utilise a general class of stochastic regression models appropriate for spatio-temporal count data with an excess number of zeros. The developed regression framework incorporates serial correlation and time-varying covariates through an Ornstein-Uhlenbeck process formulation. In addition, we explore the effect of different priors, including default options and variations of mixtures of g-priors. The effect of different distance kernels for the epidemic model component is investigated. We proceed by developing branching process-based methods for testing scenarios for disease control, thus linking traditional epidemiological models with stochastic epidemic processes, useful in policy-focused decision making. The approach is illustrated with an application to a sheep pox dataset from the Evros region, Greece.
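
    The serial correlation enters through an Ornstein-Uhlenbeck process; below is a minimal sketch of its exact discretization at equally spaced observation times. The parameter names and unit time step are illustrative, not the paper's parameterization.

        # Exact discretization of an Ornstein-Uhlenbeck process dx = theta*(mu - x) dt + sigma dW.
        import numpy as np

        def simulate_ou(n, theta=0.5, mu=0.0, sigma=1.0, dt=1.0, x0=0.0, rng=None):
            rng = np.random.default_rng() if rng is None else rng
            x = np.empty(n)
            x[0] = x0
            a = np.exp(-theta * dt)                           # autoregressive weight
            sd = sigma * np.sqrt((1 - a ** 2) / (2 * theta))  # innovation standard deviation
            for t in range(1, n):
                x[t] = mu + a * (x[t - 1] - mu) + sd * rng.standard_normal()
            return x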

    Methods and tools for Bayesian variable selection and model averaging in normal linear regression

    In this paper, we briefly review the main methodological aspects concerned with the application of the Bayesian approach to model choice and model averaging in the context of variable selection in regression models. This includes prior elicitation, summaries of the posterior distribution and computational strategies. We then examine and compare various publicly available R-packages, summarizing and explaining the differences between packages and giving recommendations for applied users. We find that all packages reviewed (can) lead to very similar results, but there are potentially important differences in the flexibility and efficiency of the packages.
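
    To illustrate the kind of posterior summary such packages report, here is a minimal sketch that enumerates all models in a small normal linear regression under a Zellner g-prior with a uniform model prior and returns posterior inclusion probabilities. Exhaustive enumeration is only feasible for a handful of covariates; the reviewed packages use far more elaborate priors and sampling strategies.

        # Posterior inclusion probabilities under a g-prior (sketch; uniform model prior, g = n).
        import itertools
        import numpy as np

        def g_prior_log_bf(X, y, subset, g):
            """Log Bayes factor of the model with columns `subset` against the intercept-only model."""
            if not subset:
                return 0.0
            n = len(y)
            yc = y - y.mean()
            Xc = X[:, list(subset)] - X[:, list(subset)].mean(axis=0)
            beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
            r2 = 1 - np.sum((yc - Xc @ beta) ** 2) / np.sum(yc ** 2)
            k = len(subset)
            return 0.5 * (n - k - 1) * np.log(1 + g) - 0.5 * (n - 1) * np.log(1 + g * (1 - r2))

        def inclusion_probabilities(X, y, g=None):
            p = X.shape[1]
            g = len(y) if g is None else g   # unit-information choice
            models = [m for r in range(p + 1) for m in itertools.combinations(range(p), r)]
            log_bf = np.array([g_prior_log_bf(X, y, m, g) for m in models])
            post = np.exp(log_bf - log_bf.max())
            post /= post.sum()
            return np.array([sum(post[i] for i, m in enumerate(models) if j in m) for j in range(p)])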

    WoMMBAT: A user interface for hierarchical Bayesian estimation of working memory capacity

    The change detection paradigm has become an important tool for researchers studying working memory. Change detection is especially useful for studying visual working memory, because recall paradigms are difficult to employ in the visual modality. Pashler (Perception & Psychophysics, 44, 369–378, 1988) and Cowan (Behavioral and Brain Sciences, 24, 87–114, 2001) suggested formulas for estimating working memory capacity from change detection data. Although these formulas have become widely used, Morey (Journal of Mathematical Psychology, 55, 8–24, 2011) showed that the formulas suffer from a number of issues, including inefficient use of information, bias, volatility, uninterpretable parameter estimates, and violation of ANOVA assumptions. Morey presented a hierarchical Bayesian extension of Pashler’s and Cowan’s basic models that mitigates these issues. Here, we present WoMMBAT (Working Memory Modeling using Bayesian Analysis Techniques) software for fitting Morey’s model to data. WoMMBAT has a graphical user interface, is freely available, and is cross-platform, running on Windows, Linux, and Mac operating systems
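
    For reference, the Pashler and Cowan point estimates referred to above are simple functions of the hit rate, false-alarm rate and set size; a minimal sketch is below. WoMMBAT fits Morey's hierarchical Bayesian extension rather than these formulas.

        # Standard Pashler (1988) and Cowan (2001) capacity estimates from change-detection data.
        def cowan_k(h, f, n):
            """Cowan's k, usually applied to single-probe change detection."""
            return n * (h - f)

        def pashler_k(h, f, n):
            """Pashler's k, usually applied to whole-display change detection."""
            return n * (h - f) / (1 - f)

        print(cowan_k(h=0.85, f=0.15, n=6))    # 4.2
        print(pashler_k(h=0.85, f=0.15, n=6))  # about 4.94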

    Dynamic Mixture-of-Experts Models for Longitudinal and Discrete-Time Survival Data

    We propose a general class of flexible models for longitudinal data with special emphasis on discrete-time survival data. The model is a finite mixture model where the subjects are allowed to move between components through time. The time-varying probability of component memberships is modeled as a function of subject-specific time-varying covariates. This allows for interesting within-subject dynamics and manageable computations even with a large number of subjects. Each parameter in the component densities and in the mixing function is connected to its own set of covariates through a link function. The models are estimated using a Bayesian approach via a highly efficient Markov chain Monte Carlo (MCMC) algorithm with tailored proposals and variable selection in all sets of covariates. The focus of the paper is on models for discrete-time survival data with an application to bankruptcy prediction for Swedish firms, using both exponential and Weibull mixture components. The dynamic mixture-of-experts models are shown to have an interesting interpretation and to dramatically improve the out-of-sample predictive density forecasts compared to models with time-invariant mixture probabilities.
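
    The time-varying component memberships can be pictured as a multinomial-logit (softmax) mixing function of the covariates; the sketch below shows that generic form for a single observation. The normal component densities and the link are illustrative assumptions, and the paper's tailored MCMC estimation is not reproduced.

        # Time-varying mixture weights via a softmax link (generic sketch, not the paper's exact model).
        import numpy as np
        from scipy.stats import norm

        def mixing_weights(z_t, Gamma):
            """z_t: covariate vector at time t; Gamma: (n_components, n_covariates) coefficients."""
            eta = Gamma @ z_t
            w = np.exp(eta - eta.max())
            return w / w.sum()

        def mixture_density(y_t, z_t, Gamma, means, sds):
            """Density of one observation under normal components with time-varying weights."""
            w = mixing_weights(z_t, Gamma)
            return np.sum(w * norm.pdf(y_t, loc=means, scale=sds))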

    Agricultural, socioeconomic and environmental variables as risks for human verotoxigenic Escherichia coli (VTEC) infection in Finland

    Background: Verotoxigenic E. coli (VTEC) is the cause of severe gastrointestinal infection, especially among infants. Between 10 and 20 cases are reported annually to the National Infectious Disease Register (NIDR) in Finland. The aim of this study was to identify explanatory variables for VTEC infections reported to the NIDR in Finland between 1997 and 2006. We applied a hurdle model, applicable for a dataset with an excess of zeros. Methods: We enrolled 131 domestically acquired primary cases of VTEC between 1997 and 2006 from routine surveillance data. The isolated strains were characterized by virulence type, serogroup, phage type and pulsed-field gel electrophoresis. By applying a two-part Bayesian hurdle model to infectious disease surveillance data, we were able to create a model in which the covariates were associated with the probability of occurrence of the cases in the logistic regression part and the magnitude of covariate changes in the Poisson regression part if cases do occur. The model also included spatial correlations between neighbouring municipalities. Results: The average annual incidence rate was 4.8 cases per million inhabitants based on the cases as reported to the NIDR. Of the 131 cases, 74 VTEC O157 and 58 non-O157 strains were isolated (one person had dual infections). The number of bulls per human population and the proportion of the population with a higher education were associated with an increased occurrence and incidence of human VTEC infections in 70 (17%) of 416 Finnish municipalities. In addition, the proportion of fresh water per area, the proportion of cultivated land per area and the proportion of low-income households with children were associated with increased incidence of VTEC infections. Conclusions: With hurdle models we were able to distinguish between risk factors for the occurrence of the disease and the incidence of the disease for data characterised by an excess of zeros. The density of bulls and the proportion of the population with higher education were significant both for occurrence and incidence, while the proportion of fresh water, cultivated land, and the proportion of low-income households with children were significant for the incidence of the disease.
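
    The two-part structure described above can be sketched as a standard (non-spatial, non-Bayesian) hurdle log-likelihood: a logistic part for whether any cases occur in a municipality and a zero-truncated Poisson part for how many occur when they do. The spatial correlation between neighbouring municipalities and the Bayesian fitting used in the paper are omitted here.

        # Hurdle log-likelihood sketch: logistic occurrence part + zero-truncated Poisson count part.
        import numpy as np
        from scipy.special import expit, gammaln

        def hurdle_loglik(params, X, y):
            p = X.shape[1]
            gamma, beta = params[:p], params[p:]   # occurrence part, count part
            pi = expit(X @ gamma)                  # P(y > 0)
            lam = np.exp(X @ beta)                 # Poisson rate given y > 0
            ll_zero = np.log(1 - pi)
            trunc_logpmf = (y * np.log(lam) - lam - gammaln(y + 1)
                            - np.log(1 - np.exp(-lam)))   # zero-truncated Poisson
            ll_pos = np.log(pi) + trunc_logpmf
            return np.sum(np.where(y == 0, ll_zero, ll_pos))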

    The fallacy of placing confidence in confidence intervals

    Interval estimates – estimates of parameters that include an allowance for sampling uncertainty – have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true parameter value in some known proportion of repeated samples, on average. The width of confidence intervals is thought to index the precision of an estimate; CIs are thought to be a guide to which parameter values are plausible or reasonable; and the confidence coefficient of the interval (e.g., 95 %) is thought to index the plausibility that the true parameter is included in the interval. We show in a number of examples that CIs do not necessarily have any of these properties, and can lead to unjustified or arbitrary inferences. For this reason, we caution against relying upon confidence interval theory to justify interval estimates, and suggest that other theories of interval estimation should be used instead
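
    The long-run coverage property stated above is easy to check by simulation; the sketch below estimates it for a textbook 95% t-interval. As the paper argues, this long-run property on its own does not license conclusions about precision or about the plausibility of particular parameter values.

        # Repeated-sample coverage of a 95% t-interval for a normal mean (illustrative simulation).
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        mu, n, reps = 10.0, 20, 10_000
        covered = 0
        for _ in range(reps):
            x = rng.normal(mu, 2.0, size=n)
            lo, hi = stats.t.interval(0.95, df=n - 1, loc=x.mean(), scale=stats.sem(x))
            covered += (lo <= mu <= hi)
        print(covered / reps)   # close to 0.95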