    Early Accurate Results for Advanced Analytics on MapReduce

    Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems, although these are intended for "big data". Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary workflows, along with reliable online estimates of the degree of accuracy achieved so far in the computation. These estimates are based on a technique called bootstrapping that has been widely employed in statistics and can be applied to arbitrary functions and data distributions. In this paper, we describe our Early Accurate Result Library (EARL) for Hadoop, which was designed to minimize the changes required to the MapReduce framework. Various tests of EARL on Hadoop are presented to characterize the frequent situations where EARL can provide major speed-ups over the current version of Hadoop. Comment: VLDB201
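
    The bootstrap-based accuracy estimate described in this abstract can be illustrated with a minimal single-machine sketch. It is not EARL's implementation and does not use Hadoop: the function names `bootstrap_error` and `early_result`, the sample-doubling schedule, and the relative-error stopping rule are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_error(sample, statistic, n_resamples=200, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


def early_result(data, statistic, rel_tol=0.05, start=100, seed=0):
    """Grow a random sample until the bootstrap relative error is below rel_tol.

    Hypothetical stopping rule: the sample size doubles each round, and the
    loop also stops once the whole data set has been used.
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    size = min(start, len(shuffled))
    while True:
        sample = shuffled[:size]
        estimate = statistic(sample)
        error = bootstrap_error(sample, statistic, seed=seed)
        if size == len(shuffled) or error <= rel_tol * abs(estimate):
            return estimate, error, size
        size = min(2 * size, len(shuffled))


if __name__ == "__main__":
    gen = random.Random(42)
    data = [gen.gauss(10.0, 2.0) for _ in range(20000)]
    estimate, error, size = early_result(data, statistics.fmean, rel_tol=0.01)
    print(f"estimate={estimate:.3f} +/- {error:.3f} using {size} of {len(data)} rows")
```

    Because the bootstrap only needs to re-evaluate the statistic on resamples, the same loop works for a median, a quantile, or any other function of the sample, which is the property the abstract relies on.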

    Approximation of empowerment in the continuous domain

    The empowerment formalism offers a goal-independent utility function fully derived from an agent's embodiment. It produces intrinsic motivations which can be used to generate self-organizing behaviours in agents. One obstacle to the application of empowerment in more demanding (especially continuous) domains is that previous ways of calculating empowerment have been very time-consuming and only provided a proof of concept. In this paper we present a new approach to efficiently approximate empowerment as a parallel, linear, Gaussian channel capacity problem. We use pendulum balancing to demonstrate this new method, and compare it to earlier approximation methods. Peer reviewed
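
    For readers unfamiliar with the channel-capacity problem named in this abstract, the capacity of parallel Gaussian channels under a total power constraint is given by the textbook water-filling solution. The sketch below shows only that standard building block, not the paper's empowerment approximation. The function name `water_filling` and the example noise values are assumptions.

```python
import math


def water_filling(noise, total_power, iters=100):
    """Allocate power across parallel Gaussian channels by water-filling.

    noise: positive noise variances, one per channel.
    Returns (powers, capacity_in_bits_per_channel_use).
    """
    # The water level lies between the quietest channel's noise floor
    # (no power used) and max(noise) + total_power (at least total_power used).
    lo, hi = min(noise), max(noise) + total_power
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        used = sum(max(0.0, level - n) for n in noise)
        if used > total_power:
            hi = level
        else:
            lo = level
    level = 0.5 * (lo + hi)
    powers = [max(0.0, level - n) for n in noise]
    capacity = sum(0.5 * math.log2(1.0 + p / n) for p, n in zip(powers, noise))
    return powers, capacity


if __name__ == "__main__":
    powers, capacity = water_filling([1.0, 2.0, 5.0], total_power=3.0)
    print(powers, capacity)
```

    With noise variances 1, 2 and 5 and a power budget of 3, the water level settles at 3, so the noisiest channel receives no power at all.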

    Audiovisual preservation strategies, data models and value-chains

    This is a report on preservation strategies, models and value-chains for digital file-based audiovisual content. The report includes: (a) current and emerging value-chains and business models for audiovisual preservation; (b) a comparison of preservation strategies for audiovisual content, including their strengths and weaknesses; and (c) a review of current preservation metadata models, and requirements for extension to support audiovisual files.

    Importance Sampling: Intrinsic Dimension and Computational Cost

    The basic idea of importance sampling is to use independent samples from a proposal measure in order to approximate expectations with respect to a target measure. It is key to understand how many samples are required in order to guarantee accurate approximations. Intuitively, some notion of distance between the target and the proposal should determine the computational cost of the method. A major challenge is to quantify this distance in terms of parameters or statistics that are pertinent for the practitioner. The subject has attracted substantial interest from within a variety of communities. The objective of this paper is to review and unify the resulting literature by creating an overarching framework. A general theory is presented, with a focus on the use of importance sampling in Bayesian inverse problems and filtering. Comment: Statistical Science
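
    The basic idea stated at the start of this abstract can be sketched in a few lines. The example uses the self-normalized estimator and reports the effective sample size as a common practical diagnostic of the mismatch between target and proposal. The function names, the Gaussian target and proposal, and the choice of diagnostic are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling(f, target_pdf, proposal_pdf, proposal_draw, n, seed=0):
    """Self-normalized importance sampling estimate of E_target[f(X)].

    Returns (estimate, effective_sample_size).
    """
    rng = random.Random(seed)
    xs = [proposal_draw(rng) for _ in range(n)]
    ws = [target_pdf(x) / proposal_pdf(x) for x in xs]  # unnormalized weights
    total = sum(ws)
    estimate = sum(w * f(x) for w, x in zip(ws, xs)) / total
    ess = total * total / sum(w * w for w in ws)
    return estimate, ess


if __name__ == "__main__":
    # Illustrative choice: target N(1, 1), proposal N(0, 2^2).
    estimate, ess = importance_sampling(
        f=lambda x: x,
        target_pdf=lambda x: normal_pdf(x, 1.0, 1.0),
        proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
        proposal_draw=lambda rng: rng.gauss(0.0, 2.0),
        n=20000,
    )
    print(f"estimate={estimate:.3f}, effective sample size={ess:.0f}")
```

    An effective sample size far below the number of draws signals that the proposal is far from the target, so many more samples are needed for the same accuracy.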

    Yambo: an ab initio tool for excited state calculations

    yambo is an ab initio code for calculating quasiparticle energies and optical properties of electronic systems within the framework of many-body perturbation theory and time-dependent density functional theory. Quasiparticle energies are calculated within the GW approximation for the self-energy. Optical properties are evaluated either by solving the Bethe-Salpeter equation or by using the adiabatic local density approximation. yambo is a plane-wave code that, although particularly suited for calculations of periodic bulk systems, has been applied to a large variety of physical systems. yambo relies on efficient numerical techniques devised to treat systems with reduced dimensionality, or with a large number of degrees of freedom. The code has a user-friendly command-line interface and flexible I/O procedures, and is interfaced to several publicly available density functional ground-state codes. Comment: This paper describes the features of the Yambo code, whose source is available under the GPL license at www.yambo-code.or