371,170 research outputs found
Early Accurate Results for Advanced Analytics on MapReduce
Approximate results based on samples often provide the only way in which
advanced analytical applications on very massive data sets can satisfy their
time and resource constraints. Unfortunately, methods and tools for the
computation of accurate early results are currently not supported in
MapReduce-oriented systems, although these are intended for 'big data'.
Therefore, we proposed and implemented a non-parametric extension of Hadoop
which allows the incremental computation of early results for arbitrary
work-flows, along with reliable on-line estimates of the degree of accuracy
achieved so far in the computation. These estimates are based on a technique
called bootstrapping that has been widely employed in statistics and can be
applied to arbitrary functions and data distributions. In this paper, we
describe our Early Accurate Result Library (EARL) for Hadoop that was designed
to minimize the changes required to the MapReduce framework. Various tests of
EARL of Hadoop are presented to characterize the frequent situations where EARL
can provide major speed-ups over the current version of Hadoop.
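A minimal sketch of the bootstrap idea behind such accuracy estimates, applied to a plain in-memory sample rather than a Hadoop workflow; the mean aggregate, sample fractions, and resample count are illustrative assumptions, not EARL's actual implementation.

```python
import numpy as np

def bootstrap_error(sample, statistic=np.mean, n_resamples=200, seed=0):
    """Estimate the standard error of `statistic` by bootstrap resampling."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        resample = rng.choice(sample, size=n, replace=True)  # draw with replacement
        estimates[i] = statistic(resample)
    return statistic(sample), estimates.std(ddof=1)

# Illustration: the estimated error shrinks as a larger fraction of the data is used.
data = np.random.default_rng(1).exponential(scale=3.0, size=10_000)
for frac in (0.01, 0.05, 0.25):
    sample = data[: int(frac * len(data))]
    est, err = bootstrap_error(sample)
    print(f"{frac:>5.0%} of data: mean ~ {est:.3f} +/- {err:.3f}")
```

The same resampling loop works for any aggregate that can be recomputed on a resample, which is the sense in which the technique is non-parametric.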
Approximation of empowerment in the continuous domain
The empowerment formalism offers a goal-independent utility function fully derived from an agent's embodiment. It produces intrinsic motivations which can be used to generate self-organizing behaviours in agents. One obstacle to the application of empowerment in more demanding (esp. continuous) domains is that previous ways of calculating empowerment have been very time consuming and only provided a proof-of-concept. In this paper we present a new approach to efficiently approximate empowerment as a parallel, linear, Gaussian channel capacity problem. We use pendulum balancing to demonstrate this new method, and compare it to earlier approximation methods.
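For reference, the capacity of a set of parallel Gaussian channels under a total power constraint is obtained by water-filling; the sketch below solves that sub-problem only. The noise variances and power budget are illustrative, and the paper's mapping from an agent's empowerment to such a channel is not reproduced here.

```python
import numpy as np

def water_filling_capacity(noise_vars, total_power, tol=1e-9):
    """Capacity (in bits) and power allocation for parallel Gaussian channels."""
    noise = np.asarray(noise_vars, dtype=float)
    lo, hi = noise.min(), noise.max() + total_power
    while hi - lo > tol:                              # bisect on the water level
        level = 0.5 * (lo + hi)
        if np.maximum(level - noise, 0.0).sum() > total_power:
            hi = level
        else:
            lo = level
    power = np.maximum(lo - noise, 0.0)               # channels below the level get power
    capacity = 0.5 * np.log2(1.0 + power / noise).sum()
    return capacity, power

cap, alloc = water_filling_capacity([0.5, 1.0, 2.0], total_power=3.0)
print(f"capacity ~ {cap:.3f} bits, allocation = {np.round(alloc, 3)}")
```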
Audiovisual preservation strategies, data models and value-chains
This is a report on preservation strategies, models and value-chains for digital file-based audiovisual content. The report includes: (a) current and emerging value-chains and business-models for audiovisual preservation; (b) a comparison of preservation strategies for audiovisual content including their strengths and weaknesses, and (c) a review of current preservation metadata models, and requirements for extension to support audiovisual files.
Importance Sampling: Intrinsic Dimension and Computational Cost
The basic idea of importance sampling is to use independent samples from a
proposal measure in order to approximate expectations with respect to a target
measure. It is key to understand how many samples are required in order to
guarantee accurate approximations. Intuitively, some notion of distance between
the target and the proposal should determine the computational cost of the
method. A major challenge is to quantify this distance in terms of parameters
or statistics that are pertinent for the practitioner. The subject has
attracted substantial interest from within a variety of communities. The
objective of this paper is to overview and unify the resulting literature by
creating an overarching framework. A general theory is presented, with a focus
on the use of importance sampling in Bayesian inverse problems and filtering.
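A minimal sketch of self-normalized importance sampling, using a Gaussian proposal to approximate an expectation under a shifted Gaussian target; the densities, test function, and the effective sample size as a rough proxy for proposal-target mismatch are illustrative choices, not results from the paper.

```python
import numpy as np
from scipy.stats import norm

def snis_estimate(f, target, proposal, n_samples, seed=0):
    """Self-normalized importance sampling estimate of E_target[f(X)]."""
    rng = np.random.default_rng(seed)
    x = proposal.rvs(size=n_samples, random_state=rng)   # draw from the proposal
    log_w = target.logpdf(x) - proposal.logpdf(x)        # log importance weights
    w = np.exp(log_w - log_w.max())                      # stabilize, then normalize
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)                           # effective sample size
    return np.sum(w * f(x)), ess

target = norm(loc=2.0, scale=1.0)     # target measure
proposal = norm(loc=0.0, scale=2.0)   # broader proposal measure
est, ess = snis_estimate(np.square, target, proposal, n_samples=5_000)
print(f"E[X^2] ~ {est:.3f} (exact 5.0), effective sample size ~ {ess:.0f}")
```

As the proposal drifts away from the target, the weights concentrate on few samples and the effective sample size collapses, which is one intuitive way the "distance" between the two measures shows up as computational cost.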
…