
    Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of health care

    In the field of quality of health care measurement, one approach to assessing patient sickness at admission involves a logistic regression of mortality within 30 days of admission on a fairly large number of sickness indicators (on the order of 100) to construct a sickness scale, employing classical variable selection methods to find an "optimal" subset of 10-20 indicators. Such "benefit-only" methods ignore the considerable differences among the sickness indicators in cost of data collection, an issue that is crucial when admission sickness is used to drive programs (now implemented or under consideration in several countries, including the U.S. and U.K.) that attempt to identify substandard hospitals by comparing observed and expected mortality rates (given admission sickness). When both data-collection cost and accuracy of prediction of 30-day mortality are considered, a large variable-selection problem arises in which costly variables that do not predict well enough should be omitted from the final scale. In this paper (a) we develop a method for solving this problem based on posterior model odds, arising from a prior distribution that (1) accounts for the cost of each variable and (2) results in a set of posterior model probabilities that corresponds to a generalized cost-adjusted version of the Bayesian information criterion (BIC), and (b) we compare this method with a decision-theoretic cost-benefit approach based on maximizing expected utility. We use reversible-jump Markov chain Monte Carlo (RJMCMC) methods to search the model space, and we check the stability of our findings with two variants of the MCMC model composition (MC^3) algorithm.
    Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/), http://dx.doi.org/10.1214/08-AOAS207, by the Institute of Mathematical Statistics (http://www.imstat.org).
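    The abstract describes adding a data-collection cost penalty to the usual BIC complexity penalty. As a minimal sketch of that idea (not the paper's exact criterion: the function name, the linear cost term, and the trade-off parameter `eta` are illustrative assumptions), a cost-adjusted BIC score for a candidate variable subset might look like:

```python
import math

def cost_adjusted_bic(log_lik, n, included, costs, eta=1.0):
    """Sketch of a cost-adjusted BIC score (lower is better).

    Standard BIC is -2*logL + k*log(n), where k is the number of
    included predictors.  Here each included variable additionally
    contributes a penalty proportional to its data-collection cost;
    `eta` (a hypothetical tuning parameter) trades cost against fit.
    """
    k = len(included)
    fit_penalty = k * math.log(n)                       # classical BIC penalty
    cost_penalty = eta * sum(costs[j] for j in included)  # cost adjustment
    return -2.0 * log_lik + fit_penalty + cost_penalty
```

    A model search (e.g., the RJMCMC sweep mentioned above) would then compare candidate subsets by this score rather than by fit alone, so a costly indicator must predict well enough to earn its place.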

    Producing power-law distributions and damping word frequencies with two-stage language models

    Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
    48 pages.
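    The two-parameter (Pitman-Yor) Chinese restaurant process named above can be simulated directly: customers arrive one at a time and either join an existing table or open a new one, and the resulting table sizes exhibit the power-law behavior the adaptor exploits. A minimal sketch (parameter names and defaults are illustrative, not the paper's):

```python
import random

def pitman_yor_crp(n, discount=0.5, concentration=1.0, seed=0):
    """Simulate table occupancies under the two-parameter
    (Pitman-Yor) Chinese restaurant process.

    Customer i+1 opens a new table with probability proportional to
    concentration + discount * (number of tables); otherwise they
    join table t with probability proportional to count_t - discount.
    Returns table sizes sorted largest first.
    """
    rng = random.Random(seed)
    tables = []  # customers seated at each table
    for i in range(n):
        p_new = (concentration + discount * len(tables)) / (concentration + i)
        if i == 0 or rng.random() < p_new:
            tables.append(1)  # open a new table
        else:
            weights = [c - discount for c in tables]
            r = rng.random() * sum(weights)
            for t, w in enumerate(weights):  # sample an existing table
                r -= w
                if r <= 0:
                    tables[t] += 1
                    break
    return sorted(tables, reverse=True)
```

    With discount 0 this reduces to the ordinary Chinese restaurant process (the Dirichlet-process adaptor); a positive discount thickens the tail of the table-size distribution, which is what produces the closer match to empirical word-frequency power laws.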

    Experiences in Bayesian Inference in Baltic Salmon Management

    We review a success story regarding Bayesian inference in fisheries management in the Baltic Sea. The management of salmon fisheries is currently based on the results of a complex Bayesian population dynamic model, and managers and stakeholders use the probabilities in their discussions. We also discuss the technical and human challenges in using Bayesian modeling to give practical advice to the public and to government officials, and we suggest future areas in which it can be applied. In particular, large databases in fisheries science offer flexible ways to use hierarchical models to learn the population dynamics parameters for those by-catch species that do not have large stock-specific data sets like those that exist for many target species. This information is required if we are to understand the future ecosystem risks of fisheries.
    Published in Statistical Science (http://www.imstat.org/sts/), http://dx.doi.org/10.1214/13-STS431, by the Institute of Mathematical Statistics (http://www.imstat.org).

    A Stochastic Hybrid Framework for Driver Behavior Modeling Based on Hierarchical Dirichlet Process

    Scalability is one of the major issues for real-world Vehicle-to-Vehicle network realization. To tackle this challenge, a stochastic hybrid modeling framework based on a non-parametric Bayesian inference method, the hierarchical Dirichlet process (HDP), is investigated in this paper. This framework is able to jointly model driver/vehicle behavior by forecasting the vehicle dynamical time series. The framework could be merged with the notion of model-based information networking, which was recently proposed in the vehicular literature, to overcome the scalability challenges in dense vehicular networks by broadcasting the behavioral models instead of disseminating raw information. The modeling approach has been applied to several scenarios from the realistic Safety Pilot Model Deployment (SPMD) driving data set, and the results show a higher performance of this model in comparison with the zero-hold method as the baseline.
    This is the accepted version of the paper at the 2018 IEEE 88th Vehicular Technology Conference (VTC2018-Fall); references added, title and abstract modified.