Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks
We present a procedure for effective estimation of entropy and mutual
information from small-sample data, and apply it to the problem of inferring
high-dimensional gene association networks. Specifically, we develop a
James-Stein-type shrinkage estimator, resulting in a procedure that is highly
efficient statistically as well as computationally. Despite its simplicity, we
show that it outperforms eight other entropy estimation procedures across a
diverse range of sampling scenarios and data-generating models, even in cases
of severe undersampling. We illustrate the approach by computing an
entropy-based gene-association network from E. coli gene expression data.
A computer program is available that implements the
proposed shrinkage estimator.
Comment: 18 pages, 3 figures, 1 table
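The abstract above does not spell out the estimator, so the following is only a minimal sketch of a James-Stein-type shrinkage entropy estimate. It assumes a uniform shrinkage target and a closed-form shrinkage intensity clipped to [0, 1]; the function name `shrinkage_entropy` and these choices are illustrative assumptions, not the authors' released program.

```python
import math

def shrinkage_entropy(counts):
    """Shrinkage estimate of Shannon entropy (in nats) from cell counts.

    Assumptions (not stated in the abstract): the shrinkage target is the
    uniform distribution, and the intensity is the closed-form value
    lambda = (1 - sum(f^2)) / ((n - 1) * sum((t - f)^2)), clipped to [0, 1].
    Returns (entropy, lambda).
    """
    n = sum(counts)
    p = len(counts)
    if n <= 1 or p == 0:
        raise ValueError("need at least two observations and one cell")

    freqs = [c / n for c in counts]   # maximum-likelihood cell frequencies
    target = 1.0 / p                  # uniform shrinkage target

    denom = (n - 1) * sum((target - f) ** 2 for f in freqs)
    if denom == 0.0:
        lam = 1.0                     # frequencies already equal the target
    else:
        lam = (1.0 - sum(f * f for f in freqs)) / denom
        lam = min(1.0, max(0.0, lam))

    # Convex combination of the target and the observed frequencies.
    shrunk = [lam * target + (1.0 - lam) * f for f in freqs]
    entropy = -sum(q * math.log(q) for q in shrunk if q > 0.0)
    return entropy, lam
```

Calling `shrinkage_entropy([2, 1, 0, 0])` pulls the empty cells away from zero, so the estimate lies between the plug-in entropy of the raw frequencies and the maximum `log(4)`. This is the behaviour one would want under the severe undersampling the abstract mentions.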
A Nonparametric Bayesian Approach to Uncovering Rat Hippocampal Population Codes During Spatial Navigation
Rodent hippocampal population codes represent important spatial information
about the environment during navigation. Several computational methods have
been developed to uncover the neural representation of spatial topology
embedded in rodent hippocampal ensemble spike activity. Here we extend our
previous work and propose a nonparametric Bayesian approach to infer rat
hippocampal population codes during spatial navigation. To tackle the model
selection problem, we leverage a nonparametric Bayesian model. Specifically, to
analyze rat hippocampal ensemble spiking activity, we apply a hierarchical
Dirichlet process-hidden Markov model (HDP-HMM) using two Bayesian inference
methods, one based on Markov chain Monte Carlo (MCMC) and the other based on
variational Bayes (VB). We demonstrate the effectiveness of our Bayesian
approaches on recordings from a freely-behaving rat navigating in an open field
environment. We find that MCMC-based inference with Hamiltonian Monte Carlo
(HMC) hyperparameter sampling is flexible and efficient, and outperforms VB and
MCMC approaches with hyperparameters set by empirical Bayes.
Revealing Relationships among Relevant Climate Variables with Information Theory
A primary objective of the NASA Earth-Sun Exploration Technology Office is to
understand the observed Earth climate variability, thus enabling the
determination and prediction of the climate's response to both natural and
human-induced forcing. We are currently developing a suite of computational
tools that will allow researchers to calculate, from data, a variety of
information-theoretic quantities such as mutual information, which can be used
to identify relationships among climate variables, and transfer entropy, which
indicates the possibility of causal interactions. Our tools estimate these
quantities along with their associated error bars, which are critical for
describing the degree of uncertainty in the estimates. This work
is based upon optimal binning techniques that we have developed for
piecewise-constant, histogram-style models of the underlying density functions.
Two useful side benefits have already been discovered. The first allows a
researcher to determine whether there exist sufficient data to estimate the
underlying probability density. The second permits one to determine an
acceptable degree of round-off when compressing data for efficient transfer and
storage. We also demonstrate how mutual information and transfer entropy can be
applied so as to allow researchers not only to identify relations among climate
variables, but also to characterize and quantify their possible causal
interactions.
Comment: 14 pages, 5 figures, Proceedings of the Earth-Sun System Technology
Conference (ESTC 2005), Adelphi, MD
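As a rough illustration of estimating mutual information from a piecewise-constant (histogram) density model, the sketch below computes a plug-in estimate from two paired series. It substitutes fixed equal-width bins for the optimal binning described in the abstract and does not compute error bars; the names `bin_index` and `mutual_information` and the default `bins=4` are assumptions made for this example.

```python
import math
from collections import Counter

def bin_index(x, lo, hi, bins):
    """Map x in [lo, hi] to an equal-width bin index in 0..bins-1."""
    if hi == lo:
        return 0
    i = int((x - lo) / (hi - lo) * bins)
    return min(i, bins - 1)   # the maximum value falls into the last bin

def mutual_information(xs, ys, bins=4):
    """Plug-in mutual information (in nats) from a 2-D histogram.

    Simplification: equal-width bins stand in for the optimal binning
    the abstract refers to, and no uncertainty estimate is produced.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    lox, hix = min(xs), max(xs)
    loy, hiy = min(ys), max(ys)

    bx = [bin_index(x, lox, hix, bins) for x in xs]
    by = [bin_index(y, loy, hiy, bins) for y in ys]

    joint = Counter(zip(bx, by))   # joint cell counts
    px = Counter(bx)               # marginal counts for x
    py = Counter(by)               # marginal counts for y

    # Sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j))).
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )
```

With `xs = ys = list(range(8))` and four bins, each variable fully determines the other's bin, so the estimate equals the marginal entropy `log(4)`. For two series whose bins are independent, it is zero.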
Predictive Uncertainty through Quantization
High-risk domains require reliable confidence estimates from predictive
models. Deep latent variable models provide these, but suffer from the rigid
variational distributions used for tractable inference, which err on the side
of overconfidence. We propose Stochastic Quantized Activation Distributions
(SQUAD), which imposes a flexible yet tractable distribution over discretized
latent variables. The proposed method is scalable, self-normalizing and sample
efficient. We demonstrate that the model fully utilizes the flexible
distribution, learns interesting non-linearities, and provides predictive
uncertainty of competitive quality.
Exact Non-Parametric Bayesian Inference on Infinite Trees
Given i.i.d. data from an unknown distribution, we consider the problem of
predicting future items. An adaptive way to estimate the probability density is
to recursively subdivide the domain to an appropriate data-dependent
granularity. A Bayesian would assign a data-independent prior probability to
"subdivide", which leads to a prior over infinite(ly many) trees. We derive an
exact, fast, and simple inference algorithm for such a prior, for the data
evidence, the predictive distribution, the effective model dimension, moments,
and other quantities. We prove asymptotic convergence and consistency results,
and illustrate the behavior of our model on some prototypical functions.
Comment: 32 LaTeX pages, 9 figures, 5 theorems, 1 algorithm