Statistical comparisons of non-deterministic IR systems using two dimensional variance
Retrieval systems with non-deterministic output are widely used in information retrieval. Common examples include sampling, approximation algorithms, and interactive user input. The effectiveness of such systems differs not just across topics but also across instances of the same system. This inherent variance presents a dilemma: what is the best way to measure the effectiveness of a non-deterministic IR system? Existing approaches to IR evaluation do not consider this problem or its potential impact on statistical significance. In this paper, we explore how such variance can affect system comparisons, and propose an evaluation framework and methodologies capable of making such comparisons. Using distributed information retrieval as a case study, we show that the approaches provide a consistent and reliable methodology for comparing the effectiveness of a non-deterministic system with that of a deterministic or another non-deterministic system. In addition, we present a statistical best practice that can be used to safely show that a non-deterministic IR system has effectiveness equivalent to another IR system, and to avoid the common pitfall of treating a lack of significance as proof that two systems have equivalent effectiveness.
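One way to make the "two dimensional" variance concrete is to lay out per-topic effectiveness scores (for example, average precision) as a topics × instances matrix, one column per run of the non-deterministic system, and split its total sum of squares into topic, instance and residual parts, as in a two-way ANOVA without replication. The sketch below assumes a complete matrix with one score per topic and instance, and uses a hypothetical function name, two_way_variance; it illustrates the general idea and is not presented as the paper's framework.

```python
from typing import Dict, List


def two_way_variance(scores: List[List[float]]) -> Dict[str, float]:
    """Split the total sum of squares of a topics x instances score matrix
    into topic, instance and residual parts (two-way ANOVA, no replication).
    Hypothetical helper for illustration; assumes a complete matrix."""
    n_topics, n_inst = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_topics * n_inst)
    # Row means: average effectiveness of each topic over all system instances.
    topic_means = [sum(row) / n_inst for row in scores]
    # Column means: average effectiveness of each instance over all topics.
    inst_means = [sum(row[j] for row in scores) / n_topics for j in range(n_inst)]
    ss_topic = n_inst * sum((m - grand) ** 2 for m in topic_means)
    ss_inst = n_topics * sum((m - grand) ** 2 for m in inst_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    # Whatever the two main effects do not explain: topic-instance interaction plus noise.
    ss_resid = ss_total - ss_topic - ss_inst
    return {"topic": ss_topic, "instance": ss_inst,
            "residual": ss_resid, "total": ss_total}


# Hypothetical scores: 3 topics x 2 instances of a non-deterministic system.
print(two_way_variance([[0.1, 0.12], [0.3, 0.32], [0.5, 0.52]]))
```

Comparing the instance component with the topic component gives a rough sense of how much of the observed spread comes from the system's non-determinism rather than from differences in topic difficulty.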
Fractal image compression and the self-affinity assumption: a stochastic signal modelling perspective
Bibliography: p. 208-225.
Fractal image compression is a comparatively new technique that has gained considerable attention in the popular technical press and, more recently, in the research literature. The most significant advantages claimed are high reconstruction quality at low coding rates, rapid decoding, and "resolution independence", in the sense that an encoded image may be decoded at a higher resolution than the original. While many of the claims published in the popular technical press are clearly extravagant, the rapidly growing body of published research suggests that fractal image compression is capable of performance comparable with that of other techniques that enjoy a considerably more robust theoretical foundation. So called because of the similarity between its form of image representation and a mechanism widely used to generate deterministic fractal images, fractal compression represents an image by the parameters of a set of affine transforms on image blocks under which the image is approximately invariant. Although the conditions imposed on these transforms can be shown to be sufficient to guarantee that an approximation of the original image can be reconstructed, there is no obvious theoretical reason to expect the result to be an efficient representation for image coding purposes. The usual analogy with vector quantisation, in which each image is considered to be represented in terms of code vectors extracted from the image itself, is instructive, but it transforms the fundamental problem into one of understanding why this construction results in an efficient codebook. The signal property required for such a codebook to be effective, termed "self-affinity", is poorly understood. An examination of this property based on stochastic signal models is the primary contribution of this dissertation.
The most significant findings (subject to some important restrictions) are that "self-affinity" is not a natural consequence of common statistical assumptions but requires particular conditions that are inadequately characterised by second-order statistics, and that "natural" images are only marginally "self-affine": enough for fractal image compression to be effective, but not enough to make it more effective than comparable standard vector quantisation techniques.
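The block-based construction described above (each range block approximated by a contractive affine map of a larger domain block taken from the same signal) can be sketched in one dimension. Assumptions, labelled here and in the code: a 1D signal stands in for an image, domain blocks are twice the range-block size and are shrunk by averaging sample pairs, the scale s is clamped to |s| ≤ 0.9 so decoding is contractive, and the function names are hypothetical. This is a generic fractal block coder for illustration, not the coder studied in the dissertation.

```python
from typing import List, Tuple

# (domain block offset, scale s, offset o) for one range block.
Code = Tuple[int, float, float]


def shrink(block: List[float]) -> List[float]:
    """Average adjacent pairs so a 2R-sample domain block matches an R-sample range block."""
    return [(block[2 * i] + block[2 * i + 1]) / 2 for i in range(len(block) // 2)]


def fit_affine(d: List[float], r: List[float], s_max: float) -> Tuple[float, float, float]:
    """Least-squares fit r ~ s*d + o with |s| <= s_max; returns (s, o, squared error)."""
    n = len(r)
    md, mr = sum(d) / n, sum(r) / n
    var_d = sum((x - md) ** 2 for x in d)
    cov = sum((x - md) * (y - mr) for x, y in zip(d, r))
    s = cov / var_d if var_d > 0 else 0.0
    s = max(-s_max, min(s_max, s))  # assumption: clamp keeps the decoding map contractive
    o = mr - s * md  # best offset for the (possibly clamped) scale
    err = sum((s * x + o - y) ** 2 for x, y in zip(d, r))
    return s, o, err


def encode(signal: List[float], r: int, s_max: float = 0.9) -> List[Code]:
    """For every range block, search all domain blocks of the same signal for the best affine fit."""
    n = len(signal)
    assert n % r == 0 and n >= 2 * r
    domain_offsets = range(0, n - 2 * r + 1, r)
    codes: List[Code] = []
    for start in range(0, n, r):
        target = signal[start:start + r]
        best = None
        for off in domain_offsets:
            s, o, err = fit_affine(shrink(signal[off:off + 2 * r]), target, s_max)
            if best is None or err < best[0]:
                best = (err, off, s, o)
        codes.append((best[1], best[2], best[3]))
    return codes


def decode(codes: List[Code], n: int, r: int, iterations: int = 40) -> List[float]:
    """Iterate the stored maps from an arbitrary (here all-zero) start towards their fixed point."""
    x = [0.0] * n
    for _ in range(iterations):
        nxt: List[float] = []
        for off, s, o in codes:
            nxt.extend(s * v + o for v in shrink(x[off:off + 2 * r]))
        x = nxt
    return x


ramp = [float(i) for i in range(64)]
rec = decode(encode(ramp, 4), 64, 4)
print(max(abs(a - b) for a, b in zip(ramp, rec)))
```

A linear ramp is exactly "self-affine" under this scheme (every shrunk domain block is a scaled, shifted copy of every range block), so decoding from an all-zero start converges to it; for signals that are only marginally self-affine, the per-block fitting errors are what limit reconstruction quality.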
Making Wald Tests Work for Cointegrated Systems.
Wald tests of restrictions on the coefficients of vector autoregressive (VAR) processes are known to have nonstandard asymptotic properties for I(1) and cointegrated systems of variables. A simple device is proposed which guarantees that Wald tests have asymptotic χ²-distributions under general conditions: if the true data-generation process is a VAR(p), fit a VAR(p+1) to the data and perform the Wald test on the coefficients of the first p lags only. The power properties of the modified tests are studied both analytically and numerically by means of simple illustrative examples.
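The lag-augmentation device can be sketched for a single equation of a bivariate VAR. The sketch assumes that the restriction being tested is Granger non-causality from y2 to y1 (a common use of such Wald tests, chosen here only for illustration), estimates the equation by OLS in plain Python with the usual OLS covariance estimate, and uses hypothetical names such as lag_augmented_wald. The extra (p+1)-th lag is estimated but deliberately left out of the restriction.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for the small systems here)."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def lag_augmented_wald(y1: Sequence[float], y2: Sequence[float], p: int) -> float:
    """Hypothetical helper: Wald statistic for 'lags 1..p of y2 do not enter the y1
    equation', estimated in a VAR(p+1); lag p+1 is fitted but left unrestricted."""
    q = p + 1  # fitted lag order: one more than the assumed true order p
    rows, target = [], []
    for t in range(q, len(y1)):
        # Regressors: constant, y2 lags 1..q, y1 lags 1..q.
        rows.append([1.0] + [y2[t - lag] for lag in range(1, q + 1)]
                    + [y1[t - lag] for lag in range(1, q + 1)])
        target.append(y1[t])
    k = len(rows[0])
    xtx = [[sum(x[i] * x[j] for x in rows) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(rows, target)) for i in range(k)]
    xtx_inv = inverse(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(b * v for b, v in zip(beta, x)) for x, y in zip(rows, target)]
    sigma2 = sum(e * e for e in resid) / (len(rows) - k)
    idx = list(range(1, p + 1))  # y2 lags 1..p only; lag p+1 is not tested
    b_r = [beta[i] for i in idx]
    v_inv = inverse([[sigma2 * xtx_inv[i][j] for j in idx] for i in idx])
    return sum(b_r[i] * v_inv[i][j] * b_r[j] for i in range(p) for j in range(p))


# Hypothetical data: y2 is a random walk and y1 tracks its first lag, so H0 is false.
rng = random.Random(1)
y2 = [0.0]
for _ in range(399):
    y2.append(y2[-1] + rng.gauss(0.0, 1.0))
y1 = [0.0] + [y2[t - 1] + rng.gauss(0.0, 0.5) for t in range(1, 400)]
print(lag_augmented_wald(y1, y2, p=2))
```

Under H0 the statistic would be compared with a χ² critical value with p degrees of freedom (about 5.99 at the 5% level for p = 2).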
A computational framework for infinite-dimensional Bayesian inverse problems: Part II. Stochastic Newton MCMC with application to ice sheet flow inverse problems
We address the numerical solution of infinite-dimensional inverse problems in the framework of Bayesian inference. In the Part I companion to this paper (arXiv.org:1308.1313), we considered the linearized infinite-dimensional inverse problem. Here in Part II, we relax the linearization assumption and consider the fully nonlinear infinite-dimensional inverse problem using a Markov chain Monte Carlo (MCMC) sampling method. To address the challenges of sampling high-dimensional pdfs arising from Bayesian inverse problems governed by PDEs, we build on the stochastic Newton MCMC method. This method exploits problem structure by taking as a proposal density a local Gaussian approximation of the posterior pdf, whose construction is made tractable by invoking a low-rank approximation of the data misfit component of its Hessian. Here we introduce an approximation of the stochastic Newton proposal in which we compute the low-rank-based Hessian at just the MAP point, and then reuse this Hessian at each MCMC step. We compare the performance of the proposed method to the original stochastic Newton MCMC method and to an independence sampler. The three methods are compared on a synthetic ice sheet inverse problem. For this problem, the stochastic Newton MCMC method with a MAP-based Hessian converges at least as rapidly as the original stochastic Newton MCMC method, but is far cheaper since it avoids recomputing the Hessian at each step. On the other hand, it is more expensive per sample than the independence sampler; however, its convergence is significantly more rapid, and thus overall it is much cheaper. Finally, we present extensive analysis and interpretation of the posterior distribution, and classify directions in parameter space based on the extent to which they are informed by the prior or the observations.
Comment: 31 pages
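The MAP-based variant described above can be sketched as a Metropolis-Hastings sampler whose Gaussian proposal is centred at a Newton step from the current state, with covariance equal to the inverse Hessian computed once at the MAP point. The sketch assumes a scalar parameter with an analytically known gradient and Hessian, so it omits the PDE-governed, infinite-dimensional setting and the low-rank Hessian approximation that make the method tractable in the paper; map_newton_mcmc is a hypothetical name. Replacing the Newton-step mean by the fixed MAP point would turn it into an independence sampler with a Gaussian (Laplace-type) proposal.

```python
import math
import random
from typing import Callable, List, Tuple


def map_newton_mcmc(phi: Callable[[float], float], grad: Callable[[float], float],
                    h_map: float, x0: float, n_steps: int,
                    seed: int = 0) -> Tuple[List[float], float]:
    """Metropolis-Hastings with a Newton-type Gaussian proposal whose Hessian is
    frozen at the MAP point. phi is the negative log posterior, grad its derivative,
    h_map > 0 the (here scalar, assumed known) Hessian of phi at the MAP point."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(h_map)  # proposal covariance = inverse MAP Hessian

    def newton_mean(x: float) -> float:
        return x - grad(x) / h_map  # one Newton step using the fixed MAP Hessian

    def log_q(to: float, frm: float) -> float:
        return -0.5 * h_map * (to - newton_mean(frm)) ** 2  # normalising constant cancels

    x, chain, accepted = x0, [], 0
    for _ in range(n_steps):
        y = newton_mean(x) + sd * rng.gauss(0.0, 1.0)
        # The proposal is not symmetric, so the Hastings correction (log_q terms) is needed.
        log_alpha = phi(x) - phi(y) + log_q(x, y) - log_q(y, x)
        if math.log(1.0 - rng.random()) < log_alpha:  # 1 - random() lies in (0, 1]
            x, accepted = y, accepted + 1
        chain.append(x)
    return chain, accepted / n_steps


# Hypothetical non-Gaussian 1D target: phi(x) = x^2/2 + x^4/4, MAP at 0, Hessian there = 1.
chain, rate = map_newton_mcmc(lambda x: 0.5 * x * x + 0.25 * x ** 4,
                              lambda x: x + x ** 3, 1.0, 0.0, 5000)
print(rate, sum(chain) / len(chain))
```

For a Gaussian posterior with the exact Hessian, the proposal coincides with the posterior and every proposal is accepted; the non-Gaussian example shows the Hastings correction doing real work.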