
    Internal Medicine

    Get PDF
    Our objective was to develop a model to predict patient length of stay using data from MCV. We conducted our analysis on a dataset of over 130,000 patients described by 66 features, covering clinical characteristics (e.g. diagnosis), facility characteristics (e.g. bed type), and socioeconomic characteristics (e.g. insurance type). Our study focused on patients admitted to the hospital. To cope with data imperfections, such as missing data, we applied data cleaning methods. Using learned domain knowledge, we identified nine features for our predictive models: admit source, primary insurance, discharge disposition, admit unit, iso result, icu order, stepdown order, general care order, and age. We then applied regression algorithms to predict length of stay under two views: one using the complete dataset, and one fitting separate models for each of the ten most common diagnoses. This division was motivated by the high variance within the data. The resulting machine learning models were embedded in a web application built with Angular. The app lets the user pick which disease to model, the specific model(s) to use, and the values of the variables; it then computes the prediction and displays a visualization of the model weights.
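    As a rough illustration of the per-diagnosis modeling described above, the sketch below fits one regression per common diagnosis using scikit-learn. The DataFrame schema, column names, and the choice of plain linear regression are assumptions made for illustration, not the capstone's actual pipeline.

```python
# Illustrative sketch: per-diagnosis length-of-stay regression.
# Assumes a pandas DataFrame `df` with the nine selected features plus
# hypothetical `diagnosis` and `length_of_stay` columns.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

FEATURES = ["admit_source", "primary_insurance", "discharge_disposition",
            "admit_unit", "iso_result", "icu_order", "stepdown_order",
            "general_care_order", "age"]
CATEGORICAL = FEATURES[:-1]  # age is the only numeric feature here

def fit_per_diagnosis(df: pd.DataFrame, top_k: int = 10) -> dict:
    """Fit one model per diagnosis to tame the within-group variance."""
    models = {}
    for dx in df["diagnosis"].value_counts().head(top_k).index:
        sub = df[df["diagnosis"] == dx]
        pipe = Pipeline([
            ("encode", ColumnTransformer(
                [("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
                remainder="passthrough")),  # age passes through unchanged
            ("reg", LinearRegression()),
        ])
        pipe.fit(sub[FEATURES], sub["length_of_stay"])
        models[dx] = pipe
    return models
```

    A web front end like the Angular app described above would then query the model matching the user's chosen diagnosis and display its fitted coefficients.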

    Variational Bayes with Intractable Likelihood

    Full text link
    Variational Bayes (VB) is rapidly becoming a popular tool for Bayesian inference in statistical modeling. However, existing VB algorithms are restricted to cases where the likelihood is tractable, which precludes the use of VB in many interesting situations such as state space models and approximate Bayesian computation (ABC), where VB methods were previously inapplicable. This paper extends the scope of VB to cases where the likelihood is intractable but can be estimated unbiasedly. The proposed VB method therefore makes it possible to carry out Bayesian inference in many statistical applications, including state space models and ABC. The method is generic in the sense that it can be applied to almost any statistical model without requiring much model-specific derivation, a drawback of many existing VB algorithms. We also show how the proposed method can be used to obtain highly accurate VB approximations of marginal posterior distributions.

    Comment: 40 pages, 6 figures
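    To make the idea concrete, here is a minimal sketch of stochastic-gradient VB in which the exact log-likelihood is replaced by the log of an unbiased estimator (for instance, a particle-filter estimate in a state space model). The function names and the crude finite-difference gradient are placeholders for model-specific code; this is not the authors' exact algorithm.

```python
# Minimal sketch: Gaussian mean-field VB with an estimated log-likelihood.
# `log_like_hat(theta)` returns the log of an unbiased likelihood estimate;
# `log_prior(theta)` is the log prior. Both are assumed, model-specific inputs.
import numpy as np

def vb_intractable(log_like_hat, log_prior, dim, n_iter=2000, lr=1e-2, seed=0):
    """Fit q(theta) = N(mu, diag(exp(2*log_s))) by stochastic gradient ascent."""
    rng = np.random.default_rng(seed)
    mu, log_s = np.zeros(dim), np.zeros(dim)
    log_joint = lambda th: log_like_hat(th) + log_prior(th)
    for _ in range(n_iter):
        eps = rng.standard_normal(dim)
        theta = mu + np.exp(log_s) * eps  # reparameterization trick
        # Finite-difference gradient of the (estimated) log joint; a real
        # implementation would use analytic or automatic differentiation.
        g, h = np.empty(dim), 1e-4
        for j in range(dim):
            e = np.zeros(dim)
            e[j] = h
            g[j] = (log_joint(theta + e) - log_joint(theta - e)) / (2 * h)
        mu += lr * g                                   # ELBO gradient wrt mu
        log_s += lr * (g * eps * np.exp(log_s) + 1.0)  # entropy contributes +1
    return mu, np.exp(log_s)
```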

    Bayesian Deep Net GLM and GLMM

    Full text link
    Deep feedforward neural networks (DFNNs) are a powerful tool for functional approximation. We describe flexible versions of generalized linear and generalized linear mixed models that incorporate basis functions formed by a DFNN. Neural networks with random effects have received little attention in the literature, perhaps because of the computational challenges of incorporating subject-specific parameters into already complex models. Efficient computational methods for high-dimensional Bayesian inference are developed using Gaussian variational approximation, with a parsimonious but flexible factor parametrization of the covariance matrix. We implement natural-gradient methods for the optimization, exploiting the factor structure of the variational covariance matrix in the computation of the natural gradient. Our flexible DFNN models and Bayesian inference approach yield a regression and classification method with high prediction accuracy that can quantify prediction uncertainty in a principled and convenient way. We also describe how to perform variable selection in our deep learning method. The proposed methods are illustrated in a wide range of simulated and real-data examples, and the results compare favourably to a state-of-the-art flexible regression and classification method in the statistical literature, the Bayesian additive regression trees (BART) method. User-friendly software packages in Matlab, R and Python implementing the proposed methods are available at https://github.com/VBayesLab

    Comment: 35 pages, 7 figures, 10 tables
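    The factor parametrization of the variational covariance mentioned above can be sketched compactly: q(theta) = N(mu, B B^T + D^2) with a tall, skinny loading matrix B, so the parameter count grows linearly in the dimension. The snippet below shows only sampling from this family; the paper's natural-gradient updates are not reproduced here.

```python
# Sketch of a factor-parametrized Gaussian variational family.
import numpy as np

def sample_q(mu, B, d, rng):
    """Draw theta ~ N(mu, B @ B.T + np.diag(d)**2) without forming the matrix."""
    p, k = B.shape
    z = rng.standard_normal(k)    # low-dimensional factor noise
    eps = rng.standard_normal(p)  # idiosyncratic noise
    return mu + B @ z + d * eps

# With k factors, storage is p*(k + 2) numbers rather than the p*(p + 1)/2
# entries of a full covariance Cholesky factor.
rng = np.random.default_rng(0)
p, k = 10_000, 4
theta = sample_q(np.zeros(p), 0.01 * np.ones((p, k)), 0.1 * np.ones(p), rng)
```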

    Speeding Up MCMC by Delayed Acceptance and Data Subsampling

    Full text link
    The complexity of the Metropolis-Hastings (MH) algorithm arises from the requirement of a likelihood evaluation over the full data set in each iteration. Payne and Mallick (2015) propose to speed up the algorithm by a delayed acceptance approach in which the acceptance decision proceeds in two stages. In the first stage, an estimate of the likelihood based on a random subsample determines whether the draw is likely to be accepted; if so, the second stage uses the full-data likelihood to decide upon final acceptance. Evaluating the full-data likelihood is thus avoided for draws that are unlikely to be accepted. We propose a more precise likelihood estimator that incorporates auxiliary information about the full-data likelihood while operating only on a sparse subset of the data. We prove that the resulting delayed acceptance MH is more efficient than that of Payne and Mallick (2015). The caveat of this approach is that the full data set still needs to be evaluated in the second stage. We therefore propose to substitute this evaluation with an estimate, and construct a state-dependent approximation of it to use in the first stage. This results in an algorithm that (i) can use a smaller subsample m by leveraging recent advances in Pseudo-Marginal MH (PMMH) and (ii) is provably within O(m^{-2}) of the true posterior.

    Comment: Accepted for publication in the Journal of Computational and Graphical Statistics
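    The two-stage accept step is easy to state in code. Below is a hedged sketch of delayed-acceptance MH with a subsample-based first stage; `log_like_cheap`, `log_like_full`, `log_prior`, and `propose` are placeholders rather than the paper's specific estimators, and a symmetric proposal is assumed.

```python
# Sketch: delayed-acceptance Metropolis-Hastings with a cheap screening stage.
import numpy as np

def da_mh(theta0, log_like_cheap, log_like_full, log_prior, propose,
          n_iter=10_000, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0
    llc, llf = log_like_cheap(theta), log_like_full(theta)
    draws = []
    for _ in range(n_iter):
        prop = propose(theta, rng)
        llc_prop = log_like_cheap(prop)
        # Stage 1: screen with the cheap subsample-based estimate.
        a1 = llc_prop + log_prior(prop) - llc - log_prior(theta)
        if np.log(rng.uniform()) < a1:
            # Stage 2: full-data likelihood ratio divided by the stage-1
            # ratio, so the chain still targets the intended posterior.
            llf_prop = log_like_full(prop)
            a2 = (llf_prop - llf) - (llc_prop - llc)
            if np.log(rng.uniform()) < a2:
                theta, llc, llf = prop, llc_prop, llf_prop
        draws.append(theta)
    return np.array(draws)
```

    Draws rejected at stage 1 never touch the full data, which is where the speed-up comes from.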

    Speeding Up MCMC by Efficient Data Subsampling

    Full text link
    We propose Subsampling MCMC, a Markov chain Monte Carlo (MCMC) framework in which the likelihood function for n observations is estimated from a random subset of m observations. We introduce a highly efficient unbiased estimator of the log-likelihood based on control variates, such that the computing cost is much smaller than that of the full log-likelihood in standard MCMC. The likelihood estimate is bias-corrected and used in two dependent pseudo-marginal algorithms to sample from a perturbed posterior, for which we derive the asymptotic error with respect to n and m, respectively. We propose a practical estimator of the error and show that the error is negligible even for very small m in our applications. We demonstrate that Subsampling MCMC is substantially more efficient than standard MCMC in terms of sampling efficiency for a given computational budget, and that it outperforms other subsampling methods for MCMC proposed in the literature.

    Comment: Main changes: the theory has been significantly revised
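    The control-variate idea behind the log-likelihood estimator can be sketched as a difference estimator: approximate each observation's log-density l_i(theta) by a cheap surrogate q_i(theta) (for example a Taylor expansion around a reference parameter), sum the surrogates over all n observations, and estimate only the residuals from the m sampled observations. The function names below are illustrative, not the paper's exact construction.

```python
# Sketch: control-variate (difference) estimator of the full-data log-likelihood.
import numpy as np

def log_like_hat(theta, log_dens, surrogate, n, m, rng):
    """Unbiased estimate of sum_i l_i(theta) from m draws with replacement.

    log_dens(i, theta)  -> exact log-density of observation i (costly).
    surrogate(i, theta) -> cheap approximation q_i(theta); in practice its
                           sum over i is available in closed form.
    """
    q_total = sum(surrogate(i, theta) for i in range(n))  # control-variate sum
    idx = rng.integers(0, n, size=m)                      # uniform, with replacement
    resid = np.array([log_dens(i, theta) - surrogate(i, theta) for i in idx])
    return q_total + n * resid.mean()  # variance shrinks as q_i approaches l_i
```

    The closer the surrogates track the true log-densities, the smaller the residual variance, which is what allows very small m in practice.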