Estimating Dependency Structure as a Hidden Variable
This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors. We also show that the single-tree classifier acts as an implicit feature selector, making classification performance insensitive to irrelevant attributes. Experimental results demonstrate the excellent performance of the new model in both density estimation and classification.
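As a rough illustration of the spanning-tree step, here is a sketch of the classic Chow-Liu procedure for fitting a single tree: weight every pair of discrete variables by its empirical mutual information, then keep a maximum-weight spanning tree (Kruskal's algorithm over edges sorted by descending weight). The `weights` argument is an assumption about how an EM M-step could reuse the same routine, by passing each mixture component's posterior responsibilities as per-row weights. The function names are hypothetical, the Dirichlet and MDL priors are not modeled, and this is not the paper's implementation.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j, weights):
    """Weighted empirical mutual information I(X_i; X_j), in nats."""
    total = sum(weights)
    pij, pi, pj = Counter(), Counter(), Counter()
    for row, w in zip(data, weights):
        pij[(row[i], row[j])] += w
        pi[row[i]] += w
        pj[row[j]] += w
    mi = 0.0
    for (a, b), c in pij.items():
        if c > 0:
            # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw weighted counts
            mi += (c / total) * math.log(c * total / (pi[a] * pj[b]))
    return mi


def chow_liu_tree(data, weights=None):
    """Return the edges of a maximum-mutual-information spanning tree."""
    n_vars = len(data[0])
    if weights is None:
        weights = [1.0] * len(data)  # unweighted ML fit of a single tree
    edges = sorted(
        ((mutual_information(data, i, j, weights), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,  # heaviest edges first, so Kruskal builds a *maximum* tree
    )
    parent = list(range(n_vars))  # union-find forest over variables

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # skip edges that would close a cycle
            parent[ri] = rj
            tree.append((i, j))
        if len(tree) == n_vars - 1:
            break
    return tree
```

For four binary variables where X1 copies X0 and X3 copies X2, the tree necessarily contains the edges (0, 1) and (2, 3), linked by one zero-information edge.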
Bayesian Conditional Tensor Factorizations for High-Dimensional Classification
In many application areas, data are collected on a categorical response and high-dimensional categorical predictors, with the goal of building a parsimonious classification model while drawing inferences about the important predictors. In settings such as genomics, there can be complex interactions among the predictors. By using a carefully structured Tucker factorization, we define a model that can characterize any conditional probability, while facilitating variable selection and modeling of higher-order interactions. Following a Bayesian approach, we propose a Markov chain Monte Carlo algorithm for posterior computation that accommodates uncertainty about which predictors to include. Under near-sparsity assumptions, the posterior distribution for the conditional probability is shown to achieve close to the parametric rate of contraction, even in ultra-high-dimensional settings. The methods are illustrated using simulation examples and biomedical applications.
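The abstract does not write out the factorization, so the following sketch assumes one common Tucker-style form for a conditional probability: P(y | x_1, ..., x_p) = sum over latent indices (h_1, ..., h_p) of lambda_{h_1...h_p}(y) * prod_j pi^(j)_{h_j}(x_j), where each predictor level x_j is softly assigned to one of k_j latent classes and the core `core[h]` is a probability vector over response classes. Under this assumed form, setting k_j = 1 makes predictor j drop out entirely, which is one way such a structure can encode variable selection. The names `tucker_conditional`, `core`, and `mode_probs` are hypothetical, and the paper's priors and MCMC sampler are not shown.

```python
from itertools import product


def tucker_conditional(y, x, core, mode_probs):
    """Evaluate P(y | x) under an assumed Tucker-factorized conditional model.

    core[h]            -- probability vector over response classes for latent cell h
    mode_probs[j][v]   -- probability vector over the k_j latent classes of
                          predictor j when it takes level v
    """
    p = len(x)
    ks = [len(mode_probs[j][x[j]]) for j in range(p)]
    total = 0.0
    # Sum over every cell of the latent core tensor.
    for h in product(*(range(k) for k in ks)):
        weight = 1.0
        for j in range(p):
            weight *= mode_probs[j][x[j]][h[j]]
        total += weight * core[h][y]
    return total


# Toy example: predictor 0 has k_0 = 2 latent classes; predictor 1 has k_1 = 1,
# so it cannot influence the response.
core = {(0, 0): [0.9, 0.1], (1, 0): [0.2, 0.8]}
mode_probs = [
    [[1.0, 0.0], [0.25, 0.75]],  # predictor 0, levels 0 and 1
    [[1.0], [1.0]],              # predictor 1, levels 0 and 1
]
```

With these toy values, P(y = 0 | x = (1, 0)) = 0.25 * 0.9 + 0.75 * 0.2 = 0.375, and changing x_1 leaves the result unchanged.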
Spatio-Temporal Low Count Processes with Application to Violent Crime Events
There is significant interest in being able to predict where crimes will happen, for example, to support the efficient deployment of police and other protective measures. We aim to model both the temporal and spatial dependencies often exhibited by violent crimes in order to make such predictions. The temporal variation of crimes typically follows patterns familiar in time series analysis, but the spatial patterns are irregular and do not vary smoothly across the area. Instead, we find that spatially disjoint regions exhibit correlated crime patterns. It is this indeterminate inter-region correlation structure, along with the low-count, discrete nature of serious-crime counts, that motivates our proposed forecasting tool. In particular, we propose to model the crime counts in each region using an integer-valued first-order autoregressive process. We take a Bayesian nonparametric approach to flexibly discover a clustering of these region-specific time series. We then describe how to account for covariates within this framework. Both approaches adjust for seasonality. We demonstrate our approach through an analysis of weekly reported violent crimes in Washington, D.C., between 2001 and 2008. Our forecasts outperform standard methods while also providing useful tools such as prediction intervals.
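To make the integer-valued first-order autoregressive process concrete, here is a sketch of the textbook INAR(1) recursion X_t = alpha ∘ X_{t-1} + eps_t, where "∘" is binomial thinning (each of the X_{t-1} counts survives independently with probability alpha). The Poisson(lam) innovation is an assumption, since the abstract does not name the innovation distribution, and the paper's clustering, covariates, and seasonal terms are omitted. All function names are illustrative. Under this assumption the stationary mean is lam / (1 - alpha).

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def binomial_thinning(x, alpha, rng):
    """alpha ∘ x: keep each of the x counts independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar1(n, alpha, lam, x0=0, seed=None):
    """Simulate n steps of X_t = alpha ∘ X_{t-1} + eps_t, eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    xs, x = [], x0
    for _ in range(n):
        x = binomial_thinning(x, alpha, rng) + poisson_sample(lam, rng)
        xs.append(x)
    return xs


series = simulate_inar1(20000, alpha=0.5, lam=2.0, seed=1)
```

Every simulated value is a non-negative integer, which is what makes this family a natural fit for low, discrete weekly counts, unlike a Gaussian AR(1).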
Desiderata for a Predictive Theory of Statistics
In many contexts, the predictive validation of models or their associated prediction strategies is of greater importance than model identification, which may be practically impossible. This is particularly so in fields involving complex or high-dimensional data, where model selection, or more generally predictor selection, is the main focus of effort. This paper suggests a unified treatment for predictive analyses based on six 'desiderata'. These desiderata are an effort to clarify what criteria a good predictive theory of statistics should satisfy.