7 research outputs found
Generalized Dynamic Factor Models for Mixed-Measurement Time Series
In this article, we propose generalized Bayesian dynamic factor models for jointly modeling mixed-measurement time series. The framework allows mixed-scale measurements associated with each time series, with different measurements having different distributions in the exponential family conditionally on time-varying latent factors. Efficient Bayesian computational algorithms are developed for posterior inference on both the latent factors and the model parameters, based on a Metropolis–Hastings algorithm with adaptive proposals. The algorithm relies on a Greedy Density Kernel Approximation and on parameter expansion with latent factor normalization. We tested the framework and algorithms in simulation studies and applied them to an analysis of intertwined credit and recovery risk for Moody’s-rated firms from 1982 to 2008, illustrating the importance of jointly modeling mixed-measurement time series. The article has supplementary materials available online.
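The core idea above can be seen in a toy simulation. This is a minimal sketch, not the authors' model: three observed series share one latent AR(1) factor, and each is observed through a different exponential-family link. The persistence and scale parameters are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 200
phi = 0.9                        # latent factor persistence (assumed value)
eta = np.zeros(T)                # shared latent factor path
for t in range(1, T):
    eta[t] = phi * eta[t - 1] + rng.normal(scale=0.3)

# Mixed-scale measurements, each conditionally independent given eta:
gaussian_y = eta + rng.normal(scale=0.5, size=T)          # Gaussian, identity link
poisson_y = rng.poisson(np.exp(0.5 + eta))                # Poisson counts, log link
binary_y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))    # Bernoulli, logit link

print(gaussian_y.shape, poisson_y.min() >= 0)
```

Jointly modeling the three series amounts to inferring the common path `eta` from all three likelihoods at once, which is what the posterior computation in the article targets.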
Robust Bayesian Inference via Coarsening
The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure. We introduce a novel approach to Bayesian inference that improves robustness to small departures from the model: rather than conditioning on the event that the observed data are generated by the model, one conditions on the event that the model generates data close to the observed data, in a distributional sense. When closeness is defined in terms of relative entropy, the resulting “coarsened” posterior can be approximated by simply tempering the likelihood—that is, by raising the likelihood to a fractional power—thus, inference can usually be implemented via standard algorithms, and one can even obtain analytical solutions when using conjugate priors. Some theoretical properties are derived, and we illustrate the approach with real and simulated data using mixture models and autoregressive models of unknown order. Supplementary materials for this article are available online.
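The tempering approximation is especially transparent with a conjugate prior. The sketch below uses a Beta–Bernoulli model; the choice of power ζ = α/(α + n), with a robustness parameter α, is one common form (α → ∞ recovers the standard posterior), and the specific values here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 1000
x = rng.binomial(1, 0.3, size=n)   # observed binary data
a, b = 1.0, 1.0                    # Beta(a, b) prior hyperparameters
alpha = 50.0                       # coarsening / robustness parameter (assumed)
zeta = alpha / (alpha + n)         # fractional power for the likelihood

# Standard posterior: Beta(a + sum x, b + n - sum x)
post = (a + x.sum(), b + n - x.sum())
# Coarsened posterior: tempering the likelihood just scales the sufficient statistics
c_post = (a + zeta * x.sum(), b + zeta * (n - x.sum()))

print(post, c_post)
```

Both posteriors center near the true proportion, but the coarsened one behaves as if only about α effective observations were seen, so it remains diffuse and is less sensitive to small misspecifications.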
Bayesian Conditional Tensor Factorizations for High-Dimensional Classification
In many application areas, data are collected on a categorical response and high-dimensional categorical predictors, with the goal of building a parsimonious model for classification while performing inference on the important predictors. In settings such as genomics, there can be complex interactions among the predictors. By using a carefully structured Tucker factorization, we define a model that can characterize any conditional probability, while facilitating variable selection and modeling of higher-order interactions. Following a Bayesian approach, we propose a Markov chain Monte Carlo algorithm for posterior computation accommodating uncertainty in the predictors to be included. Under near low-rank assumptions, the posterior distribution for the conditional probability is shown to achieve close to the parametric rate of contraction even in ultra high-dimensional settings. The methods are illustrated using simulation examples and biomedical applications. Supplementary materials for this article are available online.
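A low-rank conditional probability tensor of this flavor can be written down in a few lines. This is an illustrative sketch in the spirit of a Tucker factorization, not the paper's exact construction: P(y | x1, x2) = Σ core[y, h1, h2] · π1[h1, x1] · π2[h2, x2], where the πs softly cluster each predictor's levels. All dimensions and ranks below are assumed values.

```python
import numpy as np

rng = np.random.default_rng(2)

d_y, d1, d2 = 3, 4, 5        # levels of the response and of two predictors
k1, k2 = 2, 2                # latent ranks (far smaller than d1, d2)

def simplex(shape, axis=0):
    """Random nonnegative array normalized to sum to 1 along `axis`."""
    a = rng.random(shape)
    return a / a.sum(axis=axis, keepdims=True)

core = simplex((d_y, k1, k2), axis=0)   # core[:, h1, h2] is a pmf over y
pi1 = simplex((k1, d1), axis=0)         # soft allocation of x1's levels
pi2 = simplex((k2, d2), axis=0)         # soft allocation of x2's levels

# Full conditional probability tensor P[y, x1, x2]
P = np.einsum("yab,ai,bj->yij", core, pi1, pi2)
print(P.shape, np.allclose(P.sum(axis=0), 1.0))
```

The payoff is parsimony: the full tensor has d_y · d1 · d2 free cells, while the factorized form needs only the core plus the two small allocation matrices, which is what makes variable selection and higher-order interactions tractable.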
Nonparametric Bayes Modeling of Populations of Networks
Replicated network data are increasingly available in many research fields. For example, in connectomic applications, interconnections among brain regions are collected for each patient under study, motivating statistical models which can flexibly characterize the probabilistic generative mechanism underlying these network-valued data. Available models for a single network are not designed specifically for inference on the entire probability mass function of a network-valued random variable and therefore lack flexibility in characterizing the distribution of relevant topological structures. We propose a flexible Bayesian nonparametric approach for modeling the population distribution of network-valued data. The joint distribution of the edges is defined via a mixture model that reduces dimensionality and efficiently incorporates network information within each mixture component by leveraging latent space representations. The formulation leads to an efficient Gibbs sampler and provides simple and coherent strategies for inference and goodness-of-fit assessments. We provide theoretical results on the flexibility of our model and illustrate improved performance—compared to state-of-the-art models—in simulations and application to human brain networks. Supplementary materials for this article are available online.
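The generative mechanism described above can be sketched generically. This is an illustrative toy, not the paper's exact construction: a two-component mixture of latent-space network models, where each component places nodes in the plane and sets edge probabilities from inner products of latent coordinates. Sizes, weights, and the logit shift are assumed values.

```python
import numpy as np

rng = np.random.default_rng(3)

V, R = 10, 2                                  # nodes, latent dimension

def edge_probs(X):
    logits = X @ X.T - 1.0                    # shift keeps probabilities moderate
    p = 1.0 / (1.0 + np.exp(-logits))
    np.fill_diagonal(p, 0.0)                  # no self-loops
    return p

# Each mixture component has its own latent positions, hence its own edge structure
components = [edge_probs(rng.normal(size=(V, R))) for _ in range(2)]
weights = np.array([0.6, 0.4])

def sample_network():
    p = components[rng.choice(2, p=weights)]  # pick a component, then sample edges
    upper = rng.random((V, V)) < p
    A = np.triu(upper, 1)                     # keep strict upper triangle
    return (A + A.T).astype(int)              # symmetrize: undirected network

nets = [sample_network() for _ in range(5)]   # a "population" of replicated networks
print(len(nets))
```

Inference in the paper runs this direction in reverse: given the replicated adjacency matrices, a Gibbs sampler recovers the mixture structure and latent positions, giving a full probability mass function over networks rather than a fit to a single graph.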
Online Variational Bayes Inference for High-Dimensional Correlated Data
High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. The analysis of such data poses many computational challenges, especially when the observations are correlated over time and/or across space. In this paper, we propose flexible hierarchical regression models for analyzing such data that accommodate serial and/or spatial correlation. We address the computational challenges involved in fitting these models by adopting an approximate inference framework. We develop an online variational Bayes algorithm that works by incrementally reading the data into memory one portion at a time. The performance of the method is assessed through simulation studies. The methodology is applied to analyze signal intensity in MRI images of subjects with knee osteoarthritis, using data from the Osteoarthritis Initiative.
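The "one portion at a time" idea is easiest to see in a conjugate special case. The sketch below is a minimal illustration, not the paper's algorithm: for a Gaussian mean model with known variance, the posterior depends only on running sufficient statistics, so chunks of data never need to be held in memory jointly. The chunk count and prior are assumed values.

```python
import numpy as np

rng = np.random.default_rng(4)

data = rng.normal(loc=2.0, scale=1.0, size=100_000)
prior_prec, prior_mean = 1.0, 0.0             # N(0, 1) prior on the mean
prec = prior_prec                             # running posterior precision
mean_times_prec = prior_prec * prior_mean     # running precision-weighted mean

for chunk in np.array_split(data, 100):       # stream the data in 100 portions
    prec += chunk.size                        # unit observation variance assumed
    mean_times_prec += chunk.sum()

posterior_mean = mean_times_prec / prec
print(posterior_mean)
```

Each pass through a chunk performs a cheap closed-form update, after which the chunk can be discarded; the full online variational Bayes algorithm applies the same incremental principle to far richer hierarchical models.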
MCMC for Imbalanced Categorical Data
Many modern applications collect highly imbalanced categorical data, with some categories relatively rare. Bayesian hierarchical models combat data sparsity by borrowing information, while also quantifying uncertainty. However, posterior computation presents a fundamental barrier to routine use; a single class of algorithms does not work well in all settings, and practitioners waste time trying different types of Markov chain Monte Carlo (MCMC) approaches. This article was motivated by an application to quantitative advertising in which we encountered extremely poor computational performance for data augmentation MCMC algorithms but obtained excellent performance for adaptive Metropolis. To obtain a deeper understanding of this behavior, we derive theoretical results on the computational complexity of commonly used data augmentation algorithms and the Random Walk Metropolis algorithm for highly imbalanced binary data. In this regime, our results show that the computational complexity of Metropolis is logarithmic in the sample size, whereas that of data augmentation is polynomial. The root cause of this poor performance of data augmentation is a discrepancy between the rates at which the target density and MCMC step sizes concentrate. Our methods also show that MCMC algorithms that exhibit a similar discrepancy will fail in large samples—a result with substantial practical impact. Supplementary materials for this article are available online.
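The imbalanced regime the article studies can be reproduced with a toy random-walk Metropolis sampler for an intercept-only logistic model with rare successes. The step size, prior, and data counts below are assumed for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

n, n_success = 10_000, 20                   # highly imbalanced binary data

def log_post(beta):
    # Bernoulli log-likelihood for an intercept-only logistic model,
    # plus a diffuse N(0, 10^2) prior on the intercept
    return (n_success * beta - n * np.log1p(np.exp(beta))) - beta**2 / 200.0

beta, chain = -5.0, []
for _ in range(5000):
    prop = beta + rng.normal(scale=0.5)     # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(beta):
        beta = prop                         # accept
    chain.append(beta)

post_mean = np.mean(chain[1000:])           # discard burn-in
print(post_mean)
```

The chain concentrates near the logit of the empirical success rate, logit(20/10000) ≈ −6.2. Note that the sampler only ever touches the sufficient statistics (n, n_success), whereas data augmentation schemes introduce one latent variable per observation, which is one intuition for the complexity gap the article formalizes.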
Personalised estimation of a woman’s most fertile days
Objectives: We propose a new, personalised approach to estimating a woman’s most fertile days that requires recording only the first day of menses and can use a smartphone to convey this information to the user so that she can plan or prevent pregnancy.

Methods: We performed a retrospective analysis of two cohort studies (a North Carolina-based study and the Early Pregnancy Study [EPS]) and a prospective multicentre trial (World Health Organization [WHO] study). The North Carolina study consisted of 68 sexually active women with either an intrauterine device or tubal ligation. The EPS comprised 221 women who planned to become pregnant and had no known fertility problems. The WHO study consisted of 706 women from five geographically and culturally diverse settings. Bayesian statistical methods were used to design our proposed method, Dynamic Optimal Timing (DOT). Simulation studies were used to estimate the cumulative pregnancy risk.

Results: For the proposed method, simulation analyses indicated a 4.4% cumulative probability of pregnancy over 13 cycles with correct use. After a calibration window, the method flagged between 11 and 13 days per cycle on which unprotected intercourse should be avoided. Eligible women should have cycle lengths between 20 and 40 days, with a variability range of no more than 9 days.

Conclusions: DOT can easily be implemented in computer or smartphone applications, allowing women to make more informed decisions about their fertility. The approach is already incorporated into a patent-pending system and is available as a free download for iPhone and Android.
