7,191 research outputs found
Internal Medicine
Our objective was to develop a model to predict patient length of stay using data from MCV. We conducted our analysis on a dataset of over 130,000 patients described by 66 features, covering clinical characteristics (e.g. diagnosis), facility characteristics (e.g. bed type), and socioeconomic characteristics (e.g. insurance type). Our study focused on patients who stayed in the hospital. To cope with data imperfections, such as missing values, we applied data cleaning methods. Drawing on domain knowledge, we identified 9 features for our predictive models: admit source, primary insurance, discharge disposition, admit unit, iso result, icu order, stepdown order, general care order, and age. Regression algorithms were then applied to predict length of stay under two views: one model fit to the complete dataset, and a second set of models fit independently to each of the ten most common diagnoses. This decomposition was motivated by the high variance within the data across diagnoses. The resulting machine learning models were embedded in a web application built with Angular. The app lets the user pick the disease being modeled, the specific model(s) to use, and the values of the input variables; it then computes the prediction and displays a visualization of the model weights.
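A minimal sketch of the two-view modeling pipeline described above, in Python with scikit-learn. The column names (diagnosis, length_of_stay, and the nine selected features) are hypothetical placeholders, and plain linear regression stands in for whatever regression algorithms the study actually used.

```python
# Sketch of the two-view length-of-stay modeling; column names are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

FEATURES = ["admit_source", "primary_insurance", "discharge_disposition",
            "admit_unit", "iso_result", "icu_order", "stepdown_order",
            "general_care_order", "age"]
CATEGORICAL = [f for f in FEATURES if f != "age"]

def new_model():
    # One-hot encode the categorical features, pass age through, then regress.
    return make_pipeline(
        ColumnTransformer(
            [("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
            remainder="passthrough"),
        LinearRegression())

def fit_two_views(df: pd.DataFrame):
    # View 1: a single model on the complete dataset.
    global_model = new_model().fit(df[FEATURES], df["length_of_stay"])
    # View 2: independent models for the ten most common diagnoses,
    # motivated by the high variance across diagnosis groups.
    top10 = df["diagnosis"].value_counts().nlargest(10).index
    per_diagnosis = {
        dx: new_model().fit(grp[FEATURES], grp["length_of_stay"])
        for dx, grp in df[df["diagnosis"].isin(top10)].groupby("diagnosis")
    }
    return global_model, per_diagnosis
```

Fitting a separate model per diagnosis lets each group carry its own coefficients, which is the stated motivation for the split.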
Variational Bayes with Intractable Likelihood
Variational Bayes (VB) is rapidly becoming a popular tool for Bayesian
inference in statistical modeling. However, the existing VB algorithms are
restricted to cases where the likelihood is tractable, which precludes the use
of VB in many interesting situations such as in state space models and in
approximate Bayesian computation (ABC), where application of VB methods was
previously impossible. This paper extends the scope of application of VB to
cases where the likelihood is intractable, but can be estimated unbiasedly. The
proposed VB method therefore makes it possible to carry out Bayesian inference
in many statistical applications, including state space models and ABC. The
method is generic in the sense that it can be applied to almost all statistical
models without requiring too much model-based derivation, which is a drawback
of many existing VB algorithms. We also show how the proposed method can be
used to obtain highly accurate VB approximations of marginal posterior
distributions.
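As a rough illustration of the idea, the toy example below runs black-box VB in which the exact likelihood is replaced by an unbiased Monte Carlo estimate from a simple latent-variable model. The model, the score-function gradient, and all names are illustrative assumptions; the paper's actual estimators (e.g. a particle filter in state space models) and variance-reduction details are not reproduced here.

```python
# Toy black-box VB with an unbiasedly estimated likelihood. The latent-variable
# model y_i = theta + z_i + eps_i (z_i, eps_i ~ N(0,1)) is an illustrative
# stand-in for a model whose likelihood is intractable but estimable.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, np.sqrt(2.0), size=50)    # data consistent with the toy model

def loglik_hat(theta, S=200):
    # Unbiased Monte Carlo estimate of each p(y_i | theta): average the
    # conditional density over S simulated latents, then take logs and sum.
    z = rng.normal(0.0, 1.0, size=(S, 1))
    dens = np.exp(-0.5 * (y - theta - z) ** 2) / np.sqrt(2.0 * np.pi)
    return np.log(dens.mean(axis=0)).sum()

mu, log_sd = 0.0, 0.0                          # q(theta) = N(mu, sd^2)
for t in range(5000):
    sd = np.exp(log_sd)
    theta = rng.normal(mu, sd)                 # draw theta ~ q
    log_q = -np.log(sd) - 0.5 * ((theta - mu) / sd) ** 2
    log_prior = -0.5 * theta ** 2 / 100.0      # theta ~ N(0, 10^2), up to a constant
    f = loglik_hat(theta) + log_prior - log_q  # noisy ELBO integrand
    # Score-function (REINFORCE) gradients; no variance reduction here,
    # so a small step size is used in this sketch.
    g_mu = f * (theta - mu) / sd ** 2
    g_ls = f * (((theta - mu) / sd) ** 2 - 1.0)
    mu, log_sd = mu + 1e-4 * g_mu, log_sd + 1e-4 * g_ls

print(f"posterior approx: N({mu:.3f}, {np.exp(log_sd) ** 2:.4f})")
```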
Bayesian Deep Net GLM and GLMM
Deep feedforward neural networks (DFNNs) are a powerful tool for functional
approximation. We describe flexible versions of generalized linear and
generalized linear mixed models incorporating basis functions formed by a DFNN.
Neural networks with random effects have received little attention in
the literature, perhaps because of the computational challenges of
incorporating subject-specific parameters into already complex models.
Efficient computational methods for high-dimensional Bayesian inference are
developed using Gaussian variational approximation, with a parsimonious but
flexible factor parametrization of the covariance matrix. We implement natural
gradient methods for the optimization, exploiting the factor structure of the
variational covariance matrix in computation of the natural gradient. Our
flexible DFNN models and Bayesian inference approach lead to a regression and
classification method that has a high prediction accuracy, and is able to
quantify the prediction uncertainty in a principled and convenient way. We also
describe how to perform variable selection in our deep learning method. The
proposed methods are illustrated in a wide range of simulated and real-data
examples, and the results compare favourably to a state-of-the-art flexible
regression and classification method in the statistical literature, the
Bayesian additive regression trees (BART) method. User-friendly software
packages in Matlab, R and Python implementing the proposed methods are
available at https://github.com/VBayesLab.
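The core computational device is the factor parametrization of the variational covariance, Sigma = B B^T + D^2, which keeps the number of variational parameters linear in the model dimension. A minimal sketch, with illustrative sizes and names:

```python
# Factor-parametrized Gaussian variational family, Sigma = B B^T + diag(d^2).
# Dimensions and initial values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
p, k = 1000, 4                            # parameter dimension, number of factors
mu = np.zeros(p)                          # variational mean
B = 0.01 * rng.standard_normal((p, k))    # p x k factor loadings
d = 0.1 * np.ones(p)                      # diagonal standard deviations

def sample_q():
    # Reparameterized draw theta = mu + B eps + d * zeta, with eps ~ N(0, I_k)
    # and zeta ~ N(0, I_p), so Cov(theta) = B B^T + diag(d^2). Storing (mu, B, d)
    # costs O(p k) parameters instead of O(p^2) for a full covariance matrix.
    eps = rng.standard_normal(k)
    zeta = rng.standard_normal(p)
    return mu + B @ eps + d * zeta

theta = sample_q()   # gradients w.r.t. (mu, B, d) flow through this draw
```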
Speeding Up MCMC by Delayed Acceptance and Data Subsampling
The complexity of the Metropolis-Hastings (MH) algorithm arises from the
requirement of a likelihood evaluation for the full data set in each iteration.
Payne and Mallick (2015) propose to speed up the algorithm by a delayed
acceptance approach where the acceptance decision proceeds in two stages. In
the first stage, an estimate of the likelihood based on a random subsample
determines if it is likely that the draw will be accepted and, if so, the
second stage uses the full data likelihood to decide upon final acceptance.
Evaluating the full data likelihood is thus avoided for draws that are unlikely
to be accepted. We propose a more precise likelihood estimator which
incorporates auxiliary information about the full data likelihood while only
operating on a sparse set of the data. We prove that the resulting delayed
acceptance MH is more efficient than that of Payne and Mallick (2015).
The caveat of this approach is that the full data set needs to be evaluated in
the second stage. We therefore propose to substitute this evaluation by an
estimate and construct a state-dependent approximation thereof to use in the
first stage. This results in an algorithm that (i) can use a smaller subsample
m by leveraging recent advances in Pseudo-Marginal MH (PMMH) and (ii) is
provably within O(m^{-2}) of the true posterior.
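A schematic of a two-stage delayed-acceptance MH step of this general form is sketched below, assuming a symmetric random-walk proposal; loglik_subsample and loglik_full are placeholders for the subsample-based estimator and the full-data (or estimated) log-likelihood, not the paper's exact construction.

```python
# Schematic delayed-acceptance MH step (symmetric random-walk proposal).
import numpy as np

rng = np.random.default_rng(2)

def da_mh_step(theta, loglik_subsample, loglik_full, log_prior, step=0.1):
    theta_prop = theta + step * rng.standard_normal()   # symmetric proposal
    # Stage 1: cheap screening with the subsample-based estimate.
    log_r1 = (loglik_subsample(theta_prop) + log_prior(theta_prop)
              - loglik_subsample(theta) - log_prior(theta))
    if np.log(rng.uniform()) >= min(0.0, log_r1):
        return theta        # rejected cheaply; the full data is never touched
    # Stage 2: correct with the full-data likelihood so the chain still
    # targets the intended posterior (the stage-1 ratio is divided out).
    log_r2 = (loglik_full(theta_prop) + log_prior(theta_prop)
              - loglik_full(theta) - log_prior(theta)) - log_r1
    if np.log(rng.uniform()) < min(0.0, log_r2):
        return theta_prop
    return theta
```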
Speeding Up MCMC by Efficient Data Subsampling
We propose Subsampling MCMC, a Markov Chain Monte Carlo (MCMC) framework
where the likelihood function for n observations is estimated from a random
subset of m observations. We introduce a highly efficient unbiased estimator
of the log-likelihood based on control variates, such that the computing cost
is much smaller than that of the full log-likelihood in standard MCMC. The
likelihood estimate is bias-corrected and used in two dependent pseudo-marginal
algorithms to sample from a perturbed posterior, for which we derive the
asymptotic error with respect to n and m, respectively. We propose a
practical estimator of the error and show that the error is negligible even for
a very small m in our applications. We demonstrate that Subsampling MCMC is
substantially more efficient than standard MCMC in terms of sampling efficiency
for a given computational budget, and that it outperforms other subsampling
methods for MCMC proposed in the literature.
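To illustrate the control-variate construction, the sketch below builds an unbiased estimator of the full-data log-likelihood for a toy Gaussian model: each term is approximated by a second-order Taylor expansion around a reference point, and only the residuals are estimated from a subsample of size m. All names are illustrative; in this toy model the quadratic expansion is exact, so the estimator has zero variance, whereas in general the residuals are merely small near the reference point.

```python
# Toy control-variate log-likelihood estimator: y_i ~ N(theta, 1), with a
# second-order Taylor expansion of each term around theta_star as control variate.
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(1.0, 1.0, size=100_000)
theta_star = y.mean()                    # reference point, e.g. a preliminary MLE

# Per-term quantities at theta_star, computed once up front.
ell_star = -0.5 * (y - theta_star) ** 2 - 0.5 * np.log(2.0 * np.pi)
grad_star = y - theta_star               # d ell_i / d theta at theta_star
hess = -1.0                              # second derivative (constant here)

def loglik_hat(theta, m=500):
    # Unbiased estimate of sum_i ell_i(theta) that touches only m data points.
    dt = theta - theta_star
    # Sum of the control variates q_i over all n terms, in closed form.
    q_sum = ell_star.sum() + grad_star.sum() * dt + 0.5 * hess * len(y) * dt ** 2
    idx = rng.choice(len(y), size=m, replace=True)      # random subsample
    ell_i = -0.5 * (y[idx] - theta) ** 2 - 0.5 * np.log(2.0 * np.pi)
    q_i = ell_star[idx] + grad_star[idx] * dt + 0.5 * hess * dt ** 2
    # E[n * mean(ell_i - q_i)] = sum_i (ell_i - q_i), so adding q_sum gives an
    # unbiased estimate of the full log-likelihood.
    return q_sum + len(y) * (ell_i - q_i).mean()
```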
- …