Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks
We present a procedure for effective estimation of entropy and mutual
information from small-sample data, and apply it to the problem of inferring
high-dimensional gene association networks. Specifically, we develop a
James-Stein-type shrinkage estimator, resulting in a procedure that is highly
efficient statistically as well as computationally. Despite its simplicity, we
show that it outperforms eight other entropy estimation procedures across a
diverse range of sampling scenarios and data-generating models, even in cases
of severe undersampling. We illustrate the approach by computing an
entropy-based gene-association network from E. coli gene expression data. A
computer program implementing the proposed shrinkage estimator is available.
Comment: 18 pages, 3 figures, 1 table
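As a concrete illustration of the approach, here is a minimal NumPy sketch of a James-Stein-type shrinkage entropy estimator of this kind: the observed cell frequencies are shrunk toward the uniform target with a closed-form intensity, and the plug-in entropy is computed from the shrunk frequencies. Function and variable names are illustrative; this is a sketch of the technique, not the authors' released program.

```python
import numpy as np

def js_shrinkage_entropy(counts):
    """Entropy (in nats) from bin counts via James-Stein-type shrinkage
    of the cell frequencies toward the uniform target."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts.size
    if n <= 1:
        return np.log(p)  # too little data; fall back to maximum entropy
    theta_ml = counts / n                 # maximum-likelihood frequencies
    target = np.full(p, 1.0 / p)          # uniform shrinkage target
    # Closed-form estimate of the optimal shrinkage intensity, clipped to [0, 1]
    num = 1.0 - np.sum(theta_ml ** 2)
    den = (n - 1.0) * np.sum((target - theta_ml) ** 2)
    lam = 1.0 if den == 0.0 else min(1.0, max(0.0, num / den))
    theta_shrink = lam * target + (1.0 - lam) * theta_ml
    nz = theta_shrink > 0
    return -np.sum(theta_shrink[nz] * np.log(theta_shrink[nz]))

# Severely undersampled example: 10 draws spread over 20 cells
rng = np.random.default_rng(0)
counts = np.bincount(rng.integers(0, 20, size=10), minlength=20)
print(js_shrinkage_entropy(counts))
```

The closed-form intensity is what makes such a procedure computationally cheap: no cross-validation or iterative fitting is required.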
Shrinkage Estimators in Online Experiments
We develop and analyze empirical Bayes Stein-type estimators for use in the
estimation of causal effects in large-scale online experiments. While online
experiments are generally thought to be distinguished by their large sample
size, we focus on the multiplicity of treatment groups. The typical analysis
practice is to use simple differences-in-means (perhaps with covariate
adjustment) as if all treatment arms were independent. In this work we develop
consistent, small-bias shrinkage estimators for this setting. In addition to
achieving lower mean squared error, these estimators retain important
frequentist properties such as coverage under most reasonable scenarios. Modern
sequential methods of experimentation and optimization such as multi-armed
bandit optimization (where treatment allocations adapt over time to prior
responses) benefit from the use of our shrinkage estimators. Exploration under
empirical Bayes focuses more efficiently on near-optimal arms, improving the
resulting decisions made under uncertainty. We demonstrate these properties by
examining seventeen large-scale experiments conducted on Facebook from April to
June 2017.
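The abstract does not spell out the estimator's form, so the following is only a rough sketch of the general idea: a method-of-moments, normal-normal empirical Bayes procedure (with hypothetical names) that shrinks per-arm difference-in-means estimates toward their grand mean, shrinking noisier arms more.

```python
import numpy as np

def eb_shrink_arms(effects, std_errs):
    """Empirical-Bayes (normal-normal) shrinkage of per-arm treatment
    effect estimates toward their grand mean. A minimal sketch in the
    spirit of Stein-type estimators for many arms, not the estimator
    derived in the paper."""
    x = np.asarray(effects, dtype=float)
    se2 = np.asarray(std_errs, dtype=float) ** 2
    mu = x.mean()                            # shrinkage target: grand mean
    # Method-of-moments estimate of the between-arm variance, floored at zero
    tau2 = max(0.0, x.var(ddof=1) - se2.mean())
    w = tau2 / (tau2 + se2)                  # per-arm shrinkage weights
    return w * x + (1.0 - w) * mu

# Example: 10 treatment arms with noisy raw differences-in-means
rng = np.random.default_rng(1)
true = rng.normal(0.0, 0.5, size=10)
se = np.full(10, 1.0)
raw = true + rng.normal(0.0, se)
print(eb_shrink_arms(raw, se))
```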
Shrinkage Confidence Procedures
The possibility of improving on the usual multivariate normal confidence set was
first discussed in Stein (1962). Using the ideas of shrinkage, through Bayesian
and empirical Bayesian arguments, domination results, both analytic and
numerical, have been obtained. Here we trace some of the developments in
confidence set estimation.
Comment: Published at http://dx.doi.org/10.1214/10-STS319 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
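One concrete instance of the domination results referred to here is the recentered confidence set: keep the usual chi-square radius for X ~ N(theta, I_p), but center the sphere at the positive-part James-Stein point. The sketch below only computes such a recentered set; it is an illustration of the idea under those assumptions, not a procedure taken from the paper.

```python
import numpy as np
from scipy.stats import chi2

def recentered_confidence_set(x, level=0.95):
    """Usual chi-square radius, recentered at the positive-part
    James-Stein point, for X ~ N(theta, I_p). Returns (center, radius);
    coverage-domination arguments are not reproduced here."""
    x = np.asarray(x, dtype=float)
    p = x.size
    c = chi2.ppf(level, df=p)                  # usual cutoff
    xx = float(np.dot(x, x))
    # Positive-part James-Stein shrinkage factor
    shrink = max(0.0, 1.0 - (p - 2) / xx) if xx > 0 else 0.0
    return shrink * x, np.sqrt(c)

x = np.array([0.5, -0.3, 0.2, 0.1, -0.4])
center, radius = recentered_confidence_set(x)
print(center, radius)
```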
In-season prediction of batting averages: A field test of empirical Bayes and Bayes methodologies
Batting average is one of the principal performance measures for an
individual baseball player. It is natural to statistically model this as a
binomial-variable proportion, with a given (observed) number of qualifying
attempts (called "at-bats"), an observed number of successes ("hits")
distributed according to the binomial distribution, and a true (but
unknown) success probability p that represents the player's latent ability.
This is a common data structure in many statistical applications, so the
methodological study here has implications for a range of applications. We
look at batting records for each Major League player over the course of a
single season (2005). The primary focus is on using only the batting records
from an earlier part of the season (e.g., the first 3 months) in order to
estimate the batter's latent ability, p, and consequently also to predict
their batting-average performance for the remainder of the season. Since we are
using a season that has already concluded, we can then validate our estimation
performance by comparing the estimated values to the actual values for the
remainder of the season. The prediction methods to be investigated are
motivated from empirical Bayes and hierarchical Bayes interpretations. A newly
proposed nonparametric empirical Bayes procedure performs particularly well in
the basic analysis of the full data set, though less well with analyses
involving more homogeneous subsets of the data. In those more homogeneous
situations better performance is obtained from appropriate versions of more
familiar methods. In all situations the poorest performing choice is the
na\"{{\i}}ve predictor which directly uses the current average to predict the
future average.Comment: Published in at http://dx.doi.org/10.1214/07-AOAS138 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
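For a flavor of the methodology, here is a minimal sketch assuming the variance-stabilizing arcsine transform of binomial records, X = arcsin(sqrt((H + 1/4)/(N + 1/2))), which is approximately normal with variance 1/(4N), followed by a simple parametric normal-normal shrinkage toward the grand mean. The paper's favored nonparametric empirical Bayes procedure is more elaborate; names here are illustrative.

```python
import numpy as np

def predict_rest_of_season(hits, at_bats, min_ab=11):
    """Sketch of an empirical-Bayes prediction of latent batting ability:
    arcsine variance-stabilizing transform, then parametric normal-normal
    shrinkage, then back-transform to the probability scale."""
    H = np.asarray(hits, dtype=float)
    N = np.asarray(at_bats, dtype=float)
    keep = N >= min_ab                      # drop players with few at-bats
    X = np.arcsin(np.sqrt((H[keep] + 0.25) / (N[keep] + 0.5)))
    se2 = 1.0 / (4.0 * N[keep])             # approximate sampling variances
    tau2 = max(0.0, X.var(ddof=1) - se2.mean())
    w = tau2 / (tau2 + se2)                 # per-player shrinkage weights
    X_shrunk = w * X + (1.0 - w) * X.mean()
    return np.sin(X_shrunk) ** 2            # back to batting-average scale

print(predict_rest_of_season([50, 30, 10], [150, 120, 40]))
```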
Small Area Shrinkage Estimation
The need for small area estimates is increasingly felt in both the public and
private sectors as a basis for strategic planning. It is now widely
recognized that direct small area survey estimates are highly unreliable owing
to large standard errors and coefficients of variation. The reason behind this
is that a survey is usually designed to achieve a specified level of accuracy
at a higher level of geography than that of small areas. Lack of additional
resources makes it almost imperative to use the same data to produce small area
estimates. For example, if a survey is designed to estimate per capita income
for a state, the same survey data need to be used to produce similar estimates
for counties, subcounties and census divisions within that state. Thus, by
necessity, small area estimation needs explicit, or at least implicit, use of
models to link these areas. Improved small area estimates are found by
"borrowing strength" from similar neighboring areas.Comment: Published in at http://dx.doi.org/10.1214/11-STS374 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
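The standard vehicle for this kind of borrowing of strength is an area-level model in the Fay-Herriot tradition: direct survey estimates with known sampling variances are shrunk toward a common synthetic level. The sketch below is a stripped-down, intercept-only illustration with hypothetical names, not a procedure from the paper; a real application would add area-level covariates and a proper variance fit.

```python
import numpy as np

def fay_herriot_eblup(direct, sampling_var):
    """Intercept-only Fay-Herriot-style composite estimator (sketch).
    Direct estimates y_i with known sampling variances D_i are shrunk
    toward a precision-weighted common level."""
    y = np.asarray(direct, dtype=float)
    D = np.asarray(sampling_var, dtype=float)
    m = y.size
    # Method-of-moments estimate of the between-area variance, floored at zero
    s2 = max(0.0, np.sum((y - y.mean()) ** 2) / (m - 1) - D.mean())
    gamma = s2 / (s2 + D)                   # per-area shrinkage weights
    w = 1.0 / (s2 + D)                      # precision weights for the target
    mu = np.sum(w * y) / np.sum(w)          # synthetic common level
    return gamma * y + (1.0 - gamma) * mu

# Example: county estimates with very unequal sampling variances
print(fay_herriot_eblup([12.0, 9.5, 15.2, 11.1], [4.0, 0.5, 9.0, 1.0]))
```

Areas with large sampling variance get small gamma and are pulled strongly toward the common level; precisely measured areas keep most of their direct estimate.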
Blind Minimax Estimation
We consider the linear regression problem of estimating an unknown,
deterministic parameter vector based on measurements corrupted by colored
Gaussian noise. We present and analyze blind minimax estimators (BMEs), which
consist of a minimax estimator over a bounded parameter set, where the
parameter set is itself estimated from the measurements. Thus, one does not
require any prior
assumption or knowledge, and the proposed estimator can be applied to any
linear regression problem. We demonstrate analytically that the BMEs strictly
dominate the least-squares estimator, i.e., they achieve lower mean-squared
error for any value of the parameter vector. Both Stein's estimator and its
positive-part correction can be derived within the blind minimax framework.
Furthermore, our approach can be readily extended to a wider class of
estimation problems than Stein's estimator, which is defined only for white
noise and non-transformed measurements. We show through simulations that the
BMEs generally outperform previous extensions of Stein's technique.
Comment: 12 pages, 7 figures
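As a sketch of the idea, assuming the spherical variant (the parameter set is a ball whose squared radius is estimated from the data by the squared norm of the least-squares estimate), the LS estimate is shrunk by the factor ||x_ls||^2 / (||x_ls||^2 + e0), where e0 is the mean-squared error of the LS estimator. The code below is an illustration under those assumptions, not the paper's full development.

```python
import numpy as np

def spherical_blind_minimax(y, H, C):
    """Spherical blind-minimax-style estimator (sketch) for y = H x + w,
    w ~ N(0, C): compute the weighted LS estimate, then shrink it by
    ||x_ls||^2 / (||x_ls||^2 + e0), with e0 the LS mean-squared error."""
    Cinv = np.linalg.inv(C)
    Q = H.T @ Cinv @ H                      # LS precision matrix
    Qinv = np.linalg.inv(Q)
    x_ls = Qinv @ H.T @ Cinv @ y            # weighted least-squares estimate
    e0 = np.trace(Qinv)                     # MSE of the LS estimator
    norm2 = float(x_ls @ x_ls)              # data-driven squared radius
    return (norm2 / (norm2 + e0)) * x_ls    # always shorter than x_ls

# Example: 4-dimensional parameter, 8 measurements in colored noise
rng = np.random.default_rng(2)
H = rng.normal(size=(8, 4))
C = np.diag(rng.uniform(0.5, 2.0, size=8))  # heteroskedastic noise covariance
x_true = rng.normal(size=4)
y = H @ x_true + rng.multivariate_normal(np.zeros(8), C)
print(spherical_blind_minimax(y, H, C))
```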