Sequentiality and Adaptivity Gains in Active Hypothesis Testing
Consider a decision maker who is responsible for collecting observations so as to
enhance his information in a speedy manner about an underlying phenomenon of
interest. The policies under which the decision maker selects sensing actions
can be categorized based on the following two factors: i) sequential vs.
non-sequential; ii) adaptive vs. non-adaptive. Non-sequential policies collect
a fixed number of observation samples and make the final decision afterwards;
while under sequential policies, the sample size is not known initially and is
determined by the observation outcomes. Under adaptive policies, the decision
maker relies on the previous collected samples to select the next sensing
action; while under non-adaptive policies, the actions are selected independent
of the past observation outcomes.
In this paper, performance bounds are provided for the policies in each
category. Using these bounds, sequentiality gain and adaptivity gain, i.e., the
gains of sequential and adaptive selection of actions, are characterized.
Comment: 12 double-column pages, 1 figure
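The sequential versus non-sequential distinction described above can be illustrated with a minimal sketch. It uses Wald's sequential probability ratio test (SPRT) for Bernoulli observations as a stand-in; this is a standard textbook procedure swapped in for illustration, not the policies or bounds of the paper. The function names `sprt` and `fixed_sample_test`, the error targets `alpha` and `beta`, and the Bernoulli model are all assumptions. Adaptive choice of sensing actions is not modeled here.

```python
import math

def sprt(samples, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for Bernoulli data: H0 says P(x=1)=p0, H1 says P(x=1)=p1."""
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this value
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this value
    llr = 0.0                              # running log-likelihood ratio
    n = 0
    for n, x in enumerate(samples, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        # Sequential policy: the stopping time depends on the outcomes seen so far
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n

def fixed_sample_test(samples, p0, p1):
    """Non-sequential counterpart: use every sample, then compare likelihoods."""
    llr = sum(
        math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        for x in samples
    )
    return ("H1" if llr >= 0 else "H0"), len(samples)
```

With `p0=0.2` and `p1=0.8`, a run of ten ones makes `sprt` stop after three samples, while `fixed_sample_test` consumes all ten before deciding.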
Applying MDL to Learning Best Model Granularity
The Minimum Description Length (MDL) principle is solidly based on a provably
ideal method of inference using Kolmogorov complexity. We test how the theory
behaves in practice on a general problem in model selection: that of learning
the best model granularity. The performance of a model depends critically on
the granularity, for example the choice of precision of the parameters. Too
high precision generally involves modeling of accidental noise and too low
precision may lead to confusion of models that should be distinguished. This
precision is often determined ad hoc. In MDL the best model is the one that
most compresses a two-part code of the data set: this embodies ``Occam's
Razor.'' In two quite different experimental settings the theoretical value
determined using MDL coincides with the best value found experimentally. In the
first experiment the task is to recognize isolated handwritten characters in
one subject's handwriting, irrespective of size and orientation. Based on a new
modification of elastic matching, using multiple prototypes per character, the
optimal prediction rate is predicted for the learned parameter (length of
sampling interval) considered most likely by MDL, which is shown to coincide
with the best value found experimentally. In the second experiment the task is
to model a robot arm with two degrees of freedom using a three layer
feed-forward neural network where we need to determine the number of nodes in
the hidden layer giving best modeling performance. The optimal model (the one
that extrapolates best on unseen examples) is predicted for the number of nodes
in the hidden layer considered most likely by MDL, which again is found to
coincide with the best value found experimentally.
Comment: LaTeX, 32 pages, 5 figures. Artificial Intelligence journal, To appear
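The trade-off between parameter precision and fit that the abstract describes can be sketched with a toy two-part code. The example below is an assumption-laden illustration, not the paper's experiments: it models coin flips with a Bernoulli parameter stated to `bits` bits of precision, and the helper names `two_part_code_length` and `best_precision`, the midpoint grid, and the flat cost of `bits` for the parameter are all hypothetical choices.

```python
import math

def two_part_code_length(heads, tails, bits):
    """Total code length in bits: `bits` for the quantized parameter
    plus the cost of the data under the best parameter on the grid."""
    levels = 2 ** bits
    best_data_bits = math.inf
    for j in range(levels):
        # Cell midpoints keep p strictly inside (0, 1), so the logs are finite
        p = (j + 0.5) / levels
        data_bits = -(heads * math.log2(p) + tails * math.log2(1 - p))
        best_data_bits = min(best_data_bits, data_bits)
    return bits + best_data_bits

def best_precision(heads, tails, max_bits=12):
    """Precision (in bits) that minimizes the two-part code length."""
    return min(
        range(max_bits + 1),
        key=lambda b: two_part_code_length(heads, tails, b),
    )
```

For a balanced sample such as 50 heads and 50 tails, zero bits of precision already gives the shortest code, since `p = 0.5` fits exactly. For a skewed sample such as 90 heads and 10 tails, spending some bits on the parameter pays for itself through a shorter data part.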
Coherent frequentism
By representing the range of fair betting odds according to a pair of
confidence set estimators, dual probability measures on parameter space called
frequentist posteriors secure the coherence of subjective inference without any
prior distribution. The closure of the set of expected losses corresponding to
the dual frequentist posteriors constrains decisions without arbitrarily
forcing optimization under all circumstances. This decision theory reduces to
those that maximize expected utility when the pair of frequentist posteriors is
induced by an exact or approximate confidence set estimator or when an
automatic reduction rule is applied to the pair. In such cases, the resulting
frequentist posterior is coherent in the sense that, as a probability
distribution of the parameter of interest, it satisfies the axioms of the
decision-theoretic and logic-theoretic systems typically cited in support of
the Bayesian posterior. Unlike the p-value, the confidence level of an interval
hypothesis derived from such a measure is suitable as an estimator of the
indicator of hypothesis truth since it converges in sample-space probability to
1 if the hypothesis is true or to 0 otherwise under general conditions.
Comment: The confidence-measure theory of inference and decision is explicitly
extended to vector parameters of interest. The derivation of upper and lower
confidence levels from valid and nonconservative set estimators is formalized.
Bayesian interpolation
Although Bayesian analysis has been in use since Laplace, the Bayesian method of model-comparison has only recently been developed in depth. In this paper, the Bayesian approach to regularization and model-comparison is demonstrated by studying the inference problem of interpolating noisy data. The concepts and methods described are quite general and can be applied to many other data modeling problems. Regularizing constants are set by examining their posterior probability distribution. Alternative regularizers (priors) and alternative basis sets are objectively compared by evaluating the evidence for them. “Occam's razor” is automatically embodied by this process. The way in which Bayes infers the values of regularizing constants and noise levels has an elegant interpretation in terms of the effective number of parameters determined by the data set. This framework is due to Gull and Skilling.
Predictive hypothesis identification
While statistics focusses on hypothesis testing and on estimating (properties
of) the true sampling distribution, in machine learning the performance of
learning algorithms on future data is the primary issue. In this paper we bridge
the gap with a general principle (PHI) that identifies hypotheses with best
predictive performance. This includes predictive point and interval estimation,
simple and composite hypothesis testing, (mixture) model selection, and
others as special cases. For concrete instantiations we will recover well-known
methods, variations thereof, and new ones. PHI nicely justifies, reconciles,
and blends (a reparametrization invariant variation of) MAP, ML, MDL, and
moment estimation. One particular feature of PHI is that it can genuinely
deal with nested hypotheses.
Active sequential hypothesis testing
Consider a decision maker who is responsible for dynamically collecting
observations so as to enhance his information about an underlying phenomenon of
interest in a speedy manner while accounting for the penalty of wrong
declaration. Due to the sequential nature of the problem, the decision maker
relies on his current information state to adaptively select the most
``informative'' sensing action among the available ones. In this paper, using
results in dynamic programming, lower bounds for the optimal total cost are
established. The lower bounds characterize the fundamental limits on the
maximum achievable information acquisition rate and the optimal reliability.
Moreover, upper bounds are obtained via an analysis of two heuristic policies
for dynamic selection of actions. It is shown that the first proposed heuristic
achieves asymptotic optimality, where the notion of asymptotic optimality, due
to Chernoff, implies that the relative difference between the total cost
achieved by the proposed policy and the optimal total cost approaches zero as
the penalty of wrong declaration (hence the number of collected samples)
increases. The second heuristic is shown to achieve asymptotic optimality only
in a limited setting such as the problem of a noisy dynamic search. However, by
considering the dependency on the number of hypotheses, under a technical
condition, this second heuristic is shown to achieve a nonzero information
acquisition rate, establishing a lower bound for the maximum achievable rate
and error exponent. In the case of a noisy dynamic search with size-independent
noise, the obtained nonzero rate and error exponent are shown to be maximum.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/)
at http://dx.doi.org/10.1214/13-AOS1144 by the Institute of Mathematical
Statistics (http://www.imstat.org
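The idea of selecting sensing actions from the current information state can be sketched for a toy noisy search. The policy below is a simple greedy rule (probe the currently most likely cell, stop once the posterior is confident) chosen for illustration; it is an assumption and is not claimed to be either of the two heuristics analyzed in the paper. The names `update_belief` and `noisy_search`, the symmetric `flip_prob` noise model, and the `confidence` stopping threshold are hypothetical.

```python
def update_belief(belief, action, obs, flip_prob):
    """Bayes update after probing cell `action` and seeing binary `obs`.
    A noiseless probe returns 1 iff the target is in the probed cell;
    the assumed noise flips that answer with probability `flip_prob`."""
    posterior = []
    for i, b in enumerate(belief):
        clean_answer = int(i == action)
        likelihood = (1 - flip_prob) if obs == clean_answer else flip_prob
        posterior.append(b * likelihood)
    total = sum(posterior)
    return [p / total for p in posterior]

def noisy_search(observe, n_cells, flip_prob, confidence=0.99, max_steps=1000):
    """Greedy adaptive search: probe the most likely cell until the
    posterior of some cell reaches `confidence`. Returns (cell, steps)."""
    belief = [1.0 / n_cells] * n_cells
    for step in range(1, max_steps + 1):
        # Adaptive: the next action depends on the current belief
        action = max(range(n_cells), key=belief.__getitem__)
        obs = observe(action)
        belief = update_belief(belief, action, obs, flip_prob)
        best = max(range(n_cells), key=belief.__getitem__)
        # Sequential: the stopping time depends on the observations
        if belief[best] >= confidence:
            return best, step
    return max(range(n_cells), key=belief.__getitem__), max_steps
```

A call such as `noisy_search(lambda a: int(a == 2), 4, 0.1)` simulates error-free answers for a target in cell 2 while the belief update still assumes a 10% flip probability; the search then locates cell 2 after a handful of probes.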