Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of health care
In the field of quality of health care measurement, one approach to assessing
patient sickness at admission involves a logistic regression of mortality
within 30 days of admission on a fairly large number of sickness indicators (on
the order of 100) to construct a sickness scale, employing classical variable
selection methods to find an ``optimal'' subset of 10--20 indicators. Such
``benefit-only'' methods ignore the considerable differences among the sickness
indicators in cost of data collection, an issue that is crucial when admission
sickness is used to drive programs (now implemented or under consideration in
several countries, including the U.S. and U.K.) that attempt to identify
substandard hospitals by comparing observed and expected mortality rates (given
admission sickness). When both data-collection cost and accuracy of prediction
of 30-day mortality are considered, a large variable-selection problem arises
in which costly variables that do not predict well enough should be omitted
from the final scale. In this paper (a) we develop a method for solving this
problem based on posterior model odds, arising from a prior distribution that
(1) accounts for the cost of each variable and (2) results in a set of
posterior model probabilities that corresponds to a generalized cost-adjusted
version of the Bayesian information criterion (BIC), and (b) we compare this
method with a decision-theoretic cost-benefit approach based on maximizing
expected utility. We use reversible-jump Markov chain Monte Carlo (RJMCMC)
methods to search the model space, and we check the stability of our findings
with two variants of the MCMC model composition (MC^3) algorithm.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics,
http://dx.doi.org/10.1214/08-AOAS207
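The cost-adjusted selection criterion in (a) can be sketched numerically. The snippet below is a minimal illustration, not the paper's exact prior: the penalty form (cost term added to standard BIC) and the trade-off weight `lam` are assumptions introduced for the example, and the log-likelihoods and costs are made up.

```python
import math

def cost_adjusted_bic(loglik, k, n, costs, lam=1.0):
    """Score one candidate model: standard BIC plus a hypothetical
    linear charge for the data-collection cost of included variables.
    Lower is better."""
    return -2.0 * loglik + k * math.log(n) + lam * sum(costs)

# Toy comparison: a cheap 3-variable sickness scale vs. a costlier
# 5-variable scale with a slightly better (fabricated) fit.
n = 2000
cheap = cost_adjusted_bic(loglik=-850.0, k=3, n=n, costs=[1, 1, 2])
rich = cost_adjusted_bic(loglik=-845.0, k=5, n=n, costs=[1, 1, 2, 8, 9])
best = "cheap" if cheap < rich else "rich"
```

Here the richer model's small gain in fit does not pay for its extra size and collection cost, so the cheap scale wins, which is the qualitative behavior a benefit-only criterion would miss.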
Producing power-law distributions and damping word frequencies with two-stage language models
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
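The two-parameter (Pitman-Yor) Chinese restaurant process mentioned above can be simulated directly: each customer joins an occupied table with probability proportional to its occupancy minus a discount, or starts a new table. The sketch below is a standalone illustration of that seating scheme (the parameter values are arbitrary); the table sizes play the role of word-token frequencies, and the discount `d > 0` is what produces the power-law tail.

```python
import random

def pitman_yor_crp(n, d=0.8, alpha=1.0, seed=0):
    """Seat n customers by the two-parameter Chinese restaurant
    process with discount d in [0, 1) and concentration alpha > -d.
    Returns the list of table sizes (the 'word frequencies')."""
    rng = random.Random(seed)
    tables = []  # tables[k] = customers currently at table k
    for i in range(n):  # i customers already seated
        # P(existing table k) = (n_k - d) / (i + alpha)
        # P(new table)        = (alpha + d * K) / (i + alpha)
        r = rng.random() * (i + alpha)
        seated = False
        for k, n_k in enumerate(tables):
            r -= n_k - d
            if r < 0:
                tables[k] += 1
                seated = True
                break
        if not seated:
            tables.append(1)  # open a new table
    return tables

sizes = pitman_yor_crp(10_000)
```

A log-log histogram of `sizes` shows the heavy, roughly power-law tail that the abstract attributes to this adaptor: a few tables (word types) accumulate most customers (tokens) while most tables stay small.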
Experiences in Bayesian Inference in Baltic Salmon Management
We review a success story regarding Bayesian inference in fisheries
management in the Baltic Sea. The management of salmon fisheries is currently
based on the results of a complex Bayesian population dynamic model, and
managers and stakeholders use the probabilities in their discussions. We also
discuss the technical and human challenges in using Bayesian modeling to give
practical advice to the public and to government officials and suggest future
areas in which it can be applied. In particular, the large databases in
fisheries science offer flexible ways to use hierarchical models to learn the
population-dynamics parameters for by-catch species that lack the large
stock-specific data sets available for many target species. This information
is required if we are to understand the future ecosystem risks of fisheries.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the
Institute of Mathematical Statistics, http://dx.doi.org/10.1214/13-STS431
A Stochastic Hybrid Framework for Driver Behavior Modeling Based on Hierarchical Dirichlet Process
Scalability is one of the major issues for real-world Vehicle-to-Vehicle
network realization. To tackle this challenge, a stochastic hybrid modeling
framework based on a non-parametric Bayesian inference method, i.e.,
hierarchical Dirichlet process (HDP), is investigated in this paper. This
framework jointly models driver and vehicle behavior by forecasting vehicle
dynamical time series. It can be merged with the notion of model-based
information networking, recently proposed in the vehicular literature, to
overcome scalability challenges in dense vehicular networks by broadcasting
behavioral models instead of disseminating raw information. This approach has
been applied to several scenarios from the realistic Safety Pilot Model
Deployment (SPMD) driving data set, and the results show that the model
outperforms the zero-hold baseline.
Comment: This is the accepted version of the paper in the 2018 IEEE 88th
Vehicular Technology Conference (VTC2018-Fall) (references added, title and
abstract modified)
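The zero-hold baseline named in the abstract simply repeats the last observed value over the forecast horizon. The sketch below contrasts it with a naive slope extrapolation on a synthetic, smoothly decelerating speed trace; the trace and the extrapolation stand in for SPMD data and the paper's HDP-based model, neither of which is reproduced here.

```python
import math

def zero_hold_forecast(history, horizon):
    """Zero-order hold: repeat the last observed value."""
    return [history[-1]] * horizon

def linear_forecast(history, horizon):
    """Naive alternative: extrapolate the last observed slope.
    (A stand-in for a learned behavioral model, NOT the paper's HDP.)"""
    slope = history[-1] - history[-2]
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

def rmse(pred, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

# Synthetic speed trace (m/s): constant deceleration of 0.5 m/s per step.
speed = [20.0 - 0.5 * t for t in range(20)]
history, future = speed[:15], speed[15:]

e_zoh = rmse(zero_hold_forecast(history, 5), future)
e_lin = rmse(linear_forecast(history, 5), future)
```

On any trace with sustained dynamics, the zero-hold forecast lags the true trajectory, which is why even a simple predictive model can beat it and why broadcasting behavioral models can carry more information than repeating the last raw sample.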