Search CORE

14,532 research outputs found

Sequentiality and Adaptivity Gains in Active Hypothesis Testing

Author: Javidi Tara
Naghshvar Mohammad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/11/2012

Consider a decision maker who is responsible for collecting observations so as to enhance, in a speedy manner, his information about an underlying phenomenon of interest. The policies under which the decision maker selects sensing actions can be categorized based on two factors: i) sequential vs. non-sequential; ii) adaptive vs. non-adaptive. Non-sequential policies collect a fixed number of observation samples and make the final decision afterwards, while under sequential policies the sample size is not known initially and is determined by the observation outcomes. Under adaptive policies, the decision maker relies on the previously collected samples to select the next sensing action, while under non-adaptive policies the actions are selected independently of the past observation outcomes. In this paper, performance bounds are provided for the policies in each category. Using these bounds, the sequentiality gain and the adaptivity gain, i.e., the gains of sequential and adaptive selection of actions, are characterized.
Comment: 12 double-column pages, 1 figure
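The sequential vs. non-sequential distinction can be illustrated with a minimal Python sketch. It uses Wald's sequential probability ratio test (SPRT), a classical sequential test that is not one of the policies analyzed in the paper, on a hypothetical pair of Bernoulli hypotheses (P0 = 0.3, P1 = 0.7, chosen only for illustration). The sequential test stops as soon as the evidence is decisive; the non-sequential test fixes its sample size in advance.

```python
import math
import random

P0, P1 = 0.3, 0.7        # hypothetical Bernoulli success probabilities under H0 / H1
ALPHA = BETA = 0.01      # target type-I / type-II error rates

# Wald's thresholds on the running log-likelihood ratio log(p1(x) / p0(x)).
UPPER = math.log((1 - BETA) / ALPHA)
LOWER = math.log(BETA / (1 - ALPHA))


def llr(x):
    """Log-likelihood ratio of one Bernoulli observation, H1 vs H0."""
    return math.log(P1 / P0) if x == 1 else math.log((1 - P1) / (1 - P0))


def sprt(sample):
    """Sequential: keep sampling while the LLR stays inside (LOWER, UPPER).
    `sample` is a zero-argument callable returning 0 or 1.
    Returns (decision, number of samples used)."""
    total, n = 0.0, 0
    while LOWER < total < UPPER:
        total += llr(sample())
        n += 1
    return (1 if total >= UPPER else 0), n


def fixed_sample_test(sample, n):
    """Non-sequential: draw exactly n samples, then decide by majority.
    Because P0 = 1 - P1 here, majority vote equals the sign of the LLR."""
    ones = sum(sample() for _ in range(n))
    return 1 if 2 * ones > n else 0


if __name__ == "__main__":
    rng = random.Random(0)
    draw = lambda: 1 if rng.random() < P1 else 0  # data generated under H1
    runs = [sprt(draw) for _ in range(2000)]
    print("SPRT mean sample size:", sum(n for _, n in runs) / len(runs))
    print("fixed-n (29) accuracy:",
          sum(fixed_sample_test(draw, 29) for _ in range(2000)) / 2000)
```

In this toy setting a gambler's-ruin calculation puts the SPRT's expected sample size near 15, while a fixed-sample majority test needs roughly 29 samples for an error rate near 1%; the difference is one concrete instance of a sequentiality gain.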

arXiv.org e-Print Archive

CiteSeerX

Applying MDL to Learning Best Model Granularity

Author: Gao Qiong
Li Ming
Vitanyi Paul
Publication date: 01/01/2000

The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high a precision generally involves modeling accidental noise, and too low a precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies "Occam's razor". In two quite different experimental settings, the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment, the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Based on a new modification of elastic matching, using multiple prototypes per character, the optimal prediction rate is predicted for the learned parameter (length of sampling interval) considered most likely by MDL, which is shown to coincide with the best value found experimentally. In the second experiment, the task is to model a robot arm with two degrees of freedom using a three-layer feed-forward neural network, where we need to determine the number of nodes in the hidden layer giving the best modeling performance. The optimal model (the one that extrapolates best on unseen examples) is predicted for the number of nodes in the hidden layer considered most likely by MDL, which again is found to coincide with the best value found experimentally.
Comment: LaTeX, 32 pages, 5 figures. Artificial Intelligence journal, to appear
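The granularity trade-off can be made concrete with a toy two-part code, sketched below in Python under simplifying assumptions that are not the paper's experimental setup: the model is a single Bernoulli parameter stated to d bits of precision (costing d bits to name one of 2^d grid cells), and the data are coded with the midpoint of the chosen cell. The helper names are hypothetical.

```python
import math


def code_length_bits(k, n, d):
    """Two-part code length (in bits) for n Bernoulli outcomes with k ones,
    when the parameter theta is stated to d bits of precision.

    Part 1 (model): d bits to name one of 2**d grid cells.
    Part 2 (data):  -log2 likelihood of the data under that cell's midpoint.
    """
    cells = 2 ** d
    # Cell containing the maximum-likelihood estimate k/n; its midpoint
    # (j + 0.5) / cells is never 0 or 1, so the log terms stay finite.
    j = min(int(k / n * cells), cells - 1)
    theta = (j + 0.5) / cells
    data_bits = -(k * math.log2(theta) + (n - k) * math.log2(1 - theta))
    return d + data_bits


def best_precision(k, n, max_d=20):
    """MDL choice: the precision d that minimises the total two-part code."""
    return min(range(1, max_d + 1), key=lambda d: code_length_bits(k, n, d))
```

For 30 ones in 100 trials the minimum falls at d = 1, while for 3000 ones in 10000 trials it falls at d = 5: more data justify stating the parameter more precisely, and precision beyond that point only adds model bits without shortening the data part enough to pay for them.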

arXiv.org e-Print Archive

Elsevier - Publisher Connector

CWI's Institutional Repository

CERN Document Server

International Migration, Integration and Social Cohesion online publications

Coherent frequentism

Author: Datta G. S.
David R. Bickel
Fisher R. A.
Gleser L. J.
Hacking I.
Hannig J.
Kaplan M.
Kyburg H. E. J.
Molchanov I.
Paris J. B.
Pawitan Y.
Smith C. A. B.
Walley P.
Welch B. L.
Wilkinson G. N.
Williamson J.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2009

By representing the range of fair betting odds according to a pair of confidence set estimators, dual probability measures on parameter space called frequentist posteriors secure the coherence of subjective inference without any prior distribution. The closure of the set of expected losses corresponding to the dual frequentist posteriors constrains decisions without arbitrarily forcing optimization under all circumstances. This decision theory reduces to those that maximize expected utility when the pair of frequentist posteriors is induced by an exact or approximate confidence set estimator, or when an automatic reduction rule is applied to the pair. In such cases, the resulting frequentist posterior is coherent in the sense that, as a probability distribution of the parameter of interest, it satisfies the axioms of the decision-theoretic and logic-theoretic systems typically cited in support of the Bayesian posterior. Unlike the p-value, the confidence level of an interval hypothesis derived from such a measure is suitable as an estimator of the indicator of hypothesis truth, since under general conditions it converges in sample-space probability to 1 if the hypothesis is true and to 0 otherwise.
Comment: The confidence-measure theory of inference and decision is explicitly extended to vector parameters of interest. The derivation of upper and lower confidence levels from valid and nonconservative set estimators is formalized.
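In the simplest special case, a normal mean with known standard deviation, the confidence distribution N(x̄, σ²/n) plays the role of a frequentist posterior, and the confidence level of an interval hypothesis a ≤ θ ≤ b is its probability mass on [a, b]. The Python sketch below assumes that case only (it is not the paper's general construction) and shows the level approaching 1 or 0 as n grows.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def confidence_level(a, b, xbar, sigma, n):
    """Confidence level of the interval hypothesis a <= theta <= b, read off
    the confidence distribution N(xbar, sigma**2 / n) of a normal mean with
    known sigma (an assumed special case, used here for illustration)."""
    se = sigma / math.sqrt(n)
    return normal_cdf((b - xbar) / se) - normal_cdf((a - xbar) / se)
```

With x̄ = 0.5 inside [0, 1] and n = 10,000 the level is essentially 1; with x̄ = 2 it is essentially 0; with n = 1 and x̄ = 0.5 it is only about 0.383, reflecting how little a single observation says.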

arXiv.org e-Print Archive

CiteSeerX

Crossref

Bayesian interpolation

Author: MacKay David J. C.
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/1992

Although Bayesian analysis has been in use since Laplace, the Bayesian method of model comparison has only recently been developed in depth. In this paper, the Bayesian approach to regularization and model comparison is demonstrated by studying the inference problem of interpolating noisy data. The concepts and methods described are quite general and can be applied to many other data modeling problems. Regularizing constants are set by examining their posterior probability distribution. Alternative regularizers (priors) and alternative basis sets are objectively compared by evaluating the evidence for them. “Occam's razor” is automatically embodied by this process. The way in which Bayes infers the values of regularizing constants and noise levels has an elegant interpretation in terms of the effective number of parameters determined by the data set. This framework is due to Gull and Skilling.
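The evidence-based choice of a regularizing constant can be sketched on a deliberately tiny model that is not the interpolation problem of the paper: y_n = w + noise with known noise precision beta and a Gaussian prior w ~ N(0, 1/alpha). The Python below computes the log evidence and applies the fixed-point update alpha ← gamma / m², where gamma plays the role of the effective number of well-determined parameters. The model, data, and function names are illustrative assumptions.

```python
import math


def log_evidence(y, alpha, beta):
    """ln p(y | alpha, beta) for the toy model y_n = w + noise,
    noise precision beta, prior w ~ N(0, 1/alpha)."""
    n = len(y)
    A = alpha + beta * n              # posterior precision of w
    m = beta * sum(y) / A             # posterior mean of w
    # Regularized misfit at the posterior mean.
    E = 0.5 * beta * sum((yn - m) ** 2 for yn in y) + 0.5 * alpha * m ** 2
    return (0.5 * math.log(alpha) + 0.5 * n * math.log(beta) - E
            - 0.5 * math.log(A) - 0.5 * n * math.log(2 * math.pi))


def optimise_alpha(y, beta, alpha=1.0, iters=100):
    """Fixed-point update alpha <- gamma / m**2, where
    gamma = beta*n / (alpha + beta*n) is the effective number of
    well-determined parameters (between 0 and 1 for this one-parameter model).
    Returns the final alpha and the gamma from the last iteration."""
    n = len(y)
    for _ in range(iters):
        A = alpha + beta * n
        m = beta * sum(y) / A
        gamma = beta * n / A
        alpha = gamma / m ** 2
    return alpha, gamma
```

For y = [1.8, 2.2, 2.1, 1.9, 2.0] and beta = 4, the iteration converges to alpha = 20/79 ≈ 0.253 with gamma = 79/80, which is also where the log evidence peaks as a function of alpha; the single parameter is almost fully determined by the data.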

CiteSeerX

Caltech Authors

Predictive hypothesis identification

Author: Hutter Marcus
Publication date: 01/09/2008

While statistics focuses on hypothesis testing and on estimating (properties of) the true sampling distribution, in machine learning the performance of learning algorithms on future data is the primary issue. In this paper we bridge the gap with a general principle (PHI) that identifies hypotheses with the best predictive performance. This includes predictive point and interval estimation, simple and composite hypothesis testing, (mixture) model selection, and others as special cases. For concrete instantiations we recover well-known methods, variations thereof, and new ones. PHI nicely justifies, reconciles, and blends (a reparametrization-invariant variation of) MAP, ML, MDL, and moment estimation. One particular feature of PHI is that it can genuinely deal with nested hypotheses.

The Australian National University

Active sequential hypothesis testing

Author: Javidi Tara
Naghshvar Mohammad
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 18/12/2013

Consider a decision maker who is responsible for dynamically collecting observations so as to enhance, in a speedy manner, his information about an underlying phenomenon of interest, while accounting for the penalty of a wrong declaration. Due to the sequential nature of the problem, the decision maker relies on his current information state to adaptively select the most "informative" sensing action among the available ones. In this paper, using results in dynamic programming, lower bounds for the optimal total cost are established. The lower bounds characterize the fundamental limits on the maximum achievable information acquisition rate and the optimal reliability. Moreover, upper bounds are obtained via an analysis of two heuristic policies for dynamic selection of actions. It is shown that the first proposed heuristic achieves asymptotic optimality, where the notion of asymptotic optimality, due to Chernoff, implies that the relative difference between the total cost achieved by the proposed policy and the optimal total cost approaches zero as the penalty of wrong declaration (and hence the number of collected samples) increases. The second heuristic is shown to achieve asymptotic optimality only in a limited setting, such as the problem of a noisy dynamic search. However, when the dependency on the number of hypotheses is considered, under a technical condition this second heuristic is shown to achieve a nonzero information acquisition rate, establishing a lower bound for the maximum achievable rate and error exponent. In the case of a noisy dynamic search with size-independent noise, the obtained nonzero rate and error exponent are shown to be maximum.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOS1144
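Adaptively selecting the most informative action can be sketched in Python in the spirit of Chernoff, whose notion of asymptotic optimality the abstract uses. The sketch takes the current most likely hypothesis, picks the action that maximizes the worst-case KL divergence against the alternatives, updates the posterior by Bayes' rule, and stops once one hypothesis has posterior at least 1 - delta. It is a deterministic simplification (Chernoff's procedure randomizes over actions), not either of the paper's two heuristics, and the noisy-search parameters used in the note (3 locations, crossover probability 0.1) are hypothetical.

```python
import math
import random


def kl_bernoulli(p, q):
    """KL divergence D(Bern(p) || Bern(q)) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def choose_action(posterior, q):
    """Deterministic Chernoff-style choice: for the most likely hypothesis
    i_hat, pick the action whose smallest divergence from any alternative
    j != i_hat is largest. q[i][a] = P(Y = 1 | hypothesis i, action a)."""
    i_hat = max(range(len(posterior)), key=posterior.__getitem__)

    def worst_case(a):
        return min(kl_bernoulli(q[i_hat][a], q[j][a])
                   for j in range(len(q)) if j != i_hat)

    return max(range(len(q[0])), key=worst_case)


def active_test(q, true_h, delta, rng, max_steps=10_000):
    """Sequential, adaptive test: keep sampling until some hypothesis has
    posterior >= 1 - delta. Returns (declared hypothesis, steps used)."""
    m = len(q)
    posterior = [1.0 / m] * m                      # uniform prior
    for step in range(1, max_steps + 1):
        a = choose_action(posterior, q)
        y = 1 if rng.random() < q[true_h][a] else 0
        # Bayes update with the Bernoulli likelihood of the observed outcome.
        posterior = [p * (q[i][a] if y else 1 - q[i][a])
                     for i, p in enumerate(posterior)]
        total = sum(posterior)
        posterior = [p / total for p in posterior]
        if max(posterior) >= 1 - delta:
            break
    return max(range(m), key=posterior.__getitem__), step
```

In a noisy search with q[i][a] = 0.9 if a == i else 0.1, this rule always queries the currently most likely location, because querying any other location leaves at least one alternative indistinguishable from the leading hypothesis.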

arXiv.org e-Print Archive

Crossref