508 research outputs found

    Estimating Dependency Structure as a Hidden Variable

    This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors. We also show that the single-tree classifier acts as an implicit feature selector, which makes its classification performance insensitive to irrelevant attributes. Experimental results demonstrate the excellent performance of the new model in both density estimation and classification.
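    The abstract names EM and the Minimum Spanning Tree algorithm without showing how a single tree is fitted. As a minimal sketch, assuming discrete data stored as tuples, that step can be illustrated with the classic Chow-Liu procedure: weight every pair of variables by their empirical mutual information and keep a maximum-weight spanning tree (equivalently, a minimum spanning tree on negated weights). The function names and the optional `weights` argument are hypothetical; in a mixture-of-trees EM loop those weights would plausibly be one component's posterior responsibilities, but the E-step, the priors, and the fitting of the tree's conditional tables are all omitted here.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j, weights=None):
    """Weighted empirical mutual information (in nats) between discrete columns i and j."""
    if weights is None:
        weights = [1.0] * len(data)
    total = sum(weights)
    p_ij, p_i, p_j = Counter(), Counter(), Counter()
    for row, w in zip(data, weights):
        p_ij[(row[i], row[j])] += w / total
        p_i[row[i]] += w / total
        p_j[row[j]] += w / total
    return sum(
        p * math.log(p / (p_i[a] * p_j[b]))
        for (a, b), p in p_ij.items()
        if p > 0
    )


def chow_liu_tree(data, weights=None):
    """Return the edges of a maximum-weight spanning tree over mutual information (Kruskal)."""
    n_vars = len(data[0])
    # Candidate edges sorted from highest to lowest mutual information.
    edges = sorted(
        ((mutual_information(data, i, j, weights), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find forest used to detect cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree
```

With three binary variables where the first two are always equal and the third is independent of both, the strongly dependent pair (0, 1) is selected first, and the tree then spans all three variables with two edges.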

    Bayesian Conditional Tensor Factorizations for High-Dimensional Classification

    In many application areas, data are collected on a categorical response and high-dimensional categorical predictors, with the goals of building a parsimonious model for classification and drawing inferences on the important predictors. In settings such as genomics, there can be complex interactions among the predictors. By using a carefully structured Tucker factorization, we define a model that can characterize any conditional probability while facilitating variable selection and the modeling of higher-order interactions. Following a Bayesian approach, we propose a Markov chain Monte Carlo algorithm for posterior computation that accommodates uncertainty in which predictors are included. Under near-sparsity assumptions, the posterior distribution for the conditional probability is shown to achieve close to the parametric rate of contraction even in ultra-high-dimensional settings. The methods are illustrated using simulation examples and biomedical applications.
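    To make the factorization idea concrete, here is a minimal sketch that assumes one common Tucker-style form for a categorical response: P(y = c | x) is a sum over latent class assignments (h_1, ..., h_p) of a core probability vector lambda_{h_1...h_p}(c), multiplied by per-predictor loadings pi_j(h_j | x_j). This parameterization and the names `conditional_prob`, `core`, and `loadings` are illustrative assumptions rather than the paper's exact specification, and no MCMC is shown. A predictor whose loadings have only one latent class cannot influence the result, which is one way such a factorization can encode variable selection.

```python
from itertools import product


def conditional_prob(x, core, loadings):
    """
    P(y = c | x) under an ASSUMED Tucker-style factorization:
        sum over (h_1, ..., h_p) of core[(h_1, ..., h_p)][c] * prod_j loadings[j][x_j][h_j]

    core:     dict mapping a tuple of latent classes to a probability vector over y
    loadings: loadings[j][v] is a probability vector over the latent classes of
              predictor j when that predictor takes value v
    """
    n_classes = len(next(iter(core.values())))
    # Number of latent classes per predictor (a rank of 1 means the predictor is inert).
    ranks = [len(loadings[j][0]) for j in range(len(x))]
    probs = [0.0] * n_classes
    for hs in product(*(range(k) for k in ranks)):
        weight = 1.0
        for j, h in enumerate(hs):
            weight *= loadings[j][x[j]][h]
        for c in range(n_classes):
            probs[c] += weight * core[hs][c]
    return probs
```

For example, with two binary predictors where predictor 0 maps each value to its own latent class and predictor 1 has a single latent class, the response distribution depends only on predictor 0: `conditional_prob((0, 1), core, loadings)` and `conditional_prob((0, 0), core, loadings)` coincide.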

    Spatio-Temporal Low Count Processes with Application to Violent Crime Events

    There is significant interest in being able to predict where crimes will happen, for example to aid in the efficient tasking of police and other protective measures. We aim to model both the temporal and spatial dependencies often exhibited by violent crimes in order to make such predictions. The temporal variation of crimes typically follows patterns familiar in time series analysis, but the spatial patterns are irregular and do not vary smoothly across the area. Instead, we find that spatially disjoint regions exhibit correlated crime patterns. It is this indeterminate inter-region correlation structure, along with the low-count, discrete nature of counts of serious crimes, that motivates our proposed forecasting tool. In particular, we propose to model the crime counts in each region using an integer-valued first-order autoregressive process. We take a Bayesian nonparametric approach to flexibly discover a clustering of these region-specific time series. We then describe how to account for covariates within this framework. Both approaches adjust for seasonality. We demonstrate our approach through an analysis of weekly reported violent crimes in Washington, D.C. between 2001 and 2008. Our forecasts outperform standard methods while additionally providing useful tools such as prediction intervals.
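    An integer-valued first-order autoregressive (INAR(1)) process is commonly defined through binomial thinning, X_t = alpha ∘ X_{t-1} + e_t, where each of the X_{t-1} counts survives independently with probability alpha and e_t adds new counts. The sketch below assumes Poisson(lam) innovations and a single region, and leaves out the clustering, covariates, and seasonality described in the abstract; all function names are hypothetical. Under these assumptions the h-step conditional mean has the closed form alpha^h * X_t + lam * (1 - alpha^h) / (1 - alpha), and the stationary mean is lam / (1 - alpha).

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def thin(x, alpha, rng):
    """Binomial thinning alpha ∘ x: each of the x counts survives with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar1(alpha, lam, n_steps, x0=0, seed=0):
    """Simulate X_t = alpha ∘ X_{t-1} + Poisson(lam) innovations, starting from x0."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps - 1):
        xs.append(thin(xs[-1], alpha, rng) + poisson(lam, rng))
    return xs


def forecast_mean(x_last, alpha, lam, horizon=1):
    """E[X_{t+h} | X_t = x_last] = alpha^h * x_last + lam * (1 - alpha^h) / (1 - alpha)."""
    a_h = alpha ** horizon
    return a_h * x_last + lam * (1 - a_h) / (1 - alpha)
```

For instance, with alpha = 0.5 and lam = 1.0, an observed count of 4 gives a one-step forecast mean of 3.0 and a two-step forecast mean of 2.5, decaying toward the stationary mean of 2.0. Thinning, rather than multiplying by alpha, keeps every simulated value a non-negative integer, which is the point of using an integer-valued process for low counts.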

    Desiderata for a Predictive Theory of Statistics

    In many contexts, the predictive validation of models or their associated prediction strategies is of greater importance than model identification, which may be practically impossible. This is particularly so in fields involving complex or high-dimensional data, where model selection, or more generally predictor selection, is the main focus of effort. This paper suggests a unified treatment for predictive analyses based on six 'desiderata'. These desiderata are an effort to clarify what criteria a good predictive theory of statistics should satisfy.