36,222 research outputs found

    Fuzzy memoization for floating-point multimedia applications

    Get PDF
    Instruction memoization is a promising technique to reduce the power consumption and increase the performance of future low-end/mobile multimedia systems. Power and performance efficiency can be improved by reusing instances of an already executed operation. Unfortunately, this technique may not always be worth the effort due to the power consumption and area impact of the tables required to leverage an adequate level of reuse. In this paper, we introduce and evaluate a novel way of understanding multimedia floating-point operations based on the fuzzy computation paradigm: performance and power consumption can be improved at the cost of small precision losses in computation. By exploiting this implicit characteristic of multimedia applications, we propose a new technique called tolerant memoization. This technique expands the capabilities of classic memoization by associating entries with similar inputs to the same output. We evaluate this new technique by measuring the effect of tolerant memoization for floating-point operations in a low-power multimedia processor and discuss the trade-offs between performance and quality of the media outputs. We report energy improvements of 12 percent for a set of key multimedia applications with small LUT of 6 Kbytes, compared to 3 percent obtained using previously proposed techniques.Peer ReviewedPostprint (published version

    GeNN: a code generation framework for accelerated brain simulations

    Get PDF
    Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational models of large-scale neuronal networks to address this challenge. GeNN is an open source library that generates code to accelerate the execution of network simulations on NVIDIA GPUs, through a flexible and extensible interface, which does not require in-depth technical knowledge from the users. We present performance benchmarks showing that 200-fold speedup compared to a single core of a CPU can be achieved for a network of one million conductance based Hodgkin-Huxley neurons but that for other models the speedup can differ. GeNN is available for Linux, Mac OS X and Windows platforms. The source code, user manual, tutorials, Wiki, in-depth example projects and all other related information can be found on the project website http://genn-team.github.io/genn/

    Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations

    Get PDF
    Although double-precision floating-point arithmetic currently dominates high-performance computing, there is increasing interest in smaller and simpler arithmetic types. The main reasons are potential improvements in energy efficiency and memory footprint and bandwidth. However, simply switching to lower-precision types typically results in increased numerical errors. We investigate approaches to improving the accuracy of reduced-precision fixed-point arithmetic types, using examples in an important domain for numerical computation in neuroscience: the solution of Ordinary Differential Equations (ODEs). The Izhikevich neuron model is used to demonstrate that rounding has an important role in producing accurate spike timings from explicit ODE solution algorithms. In particular, fixed-point arithmetic with stochastic rounding consistently results in smaller errors compared to single precision floating-point and fixed-point arithmetic with round-to-nearest across a range of neuron behaviours and ODE solvers. A computationally much cheaper alternative is also investigated, inspired by the concept of dither that is a widely understood mechanism for providing resolution below the least significant bit (LSB) in digital signal processing. These results will have implications for the solution of ODEs in other subject areas, and should also be directly relevant to the huge range of practical problems that are represented by Partial Differential Equations (PDEs).Comment: Submitted to Philosophical Transactions of the Royal Society

    Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program

    Get PDF
    We describe the parallel implementation of our generalized stellar atmosphere and NLTE radiative transfer computer program PHOENIX. We discuss the parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. Our implementation uses a MIMD design based on a relatively small number of MPI library calls. We report the results of test calculations on a number of different parallel computers and discuss the results of scalability tests.Comment: To appear in ApJ, 1997, vol 483. LaTeX, 34 pages, 3 Figures, uses AASTeX macros and styles natbib.sty, and psfig.st

    Classical and all-floating FETI methods for the simulation of arterial tissues

    Full text link
    High-resolution and anatomically realistic computer models of biological soft tissues play a significant role in the understanding of the function of cardiovascular components in health and disease. However, the computational effort to handle fine grids to resolve the geometries as well as sophisticated tissue models is very challenging. One possibility to derive a strongly scalable parallel solution algorithm is to consider finite element tearing and interconnecting (FETI) methods. In this study we propose and investigate the application of FETI methods to simulate the elastic behavior of biological soft tissues. As one particular example we choose the artery which is - as most other biological tissues - characterized by anisotropic and nonlinear material properties. We compare two specific approaches of FETI methods, classical and all-floating, and investigate the numerical behavior of different preconditioning techniques. In comparison to classical FETI, the all-floating approach has not only advantages concerning the implementation but in many cases also concerning the convergence of the global iterative solution method. This behavior is illustrated with numerical examples. We present results of linear elastic simulations to show convergence rates, as expected from the theory, and results from the more sophisticated nonlinear case where we apply a well-known anisotropic model to the realistic geometry of an artery. Although the FETI methods have a great applicability on artery simulations we will also discuss some limitations concerning the dependence on material parameters.Comment: 29 page

    Analog VLSI-Based Modeling of the Primate Oculomotor System

    Get PDF
    One way to understand a neurobiological system is by building a simulacrum that replicates its behavior in real time using similar constraints. Analog very large-scale integrated (VLSI) electronic circuit technology provides such an enabling technology. We here describe a neuromorphic system that is part of a long-term effort to understand the primate oculomotor system. It requires both fast sensory processing and fast motor control to interact with the world. A one-dimensional hardware model of the primate eye has been built that simulates the physical dynamics of the biological system. It is driven by two different analog VLSI chips, one mimicking cortical visual processing for target selection and tracking and another modeling brain stem circuits that drive the eye muscles. Our oculomotor plant demonstrates both smooth pursuit movements, driven by a retinal velocity error signal, and saccadic eye movements, controlled by retinal position error, and can reproduce several behavioral, stimulation, lesion, and adaptation experiments performed on primates

    Correlated Resource Models of Internet End Hosts

    Get PDF
    Understanding and modelling resources of Internet end hosts is essential for the design of desktop software and Internet-distributed applications. In this paper we develop a correlated resource model of Internet end hosts based on real trace data taken from the SETI@home project. This data covers a 5-year period with statistics for 2.7 million hosts. The resource model is based on statistical analysis of host computational power, memory, and storage as well as how these resources change over time and the correlations between them. We find that resources with few discrete values (core count, memory) are well modeled by exponential laws governing the change of relative resource quantities over time. Resources with a continuous range of values are well modeled with either correlated normal distributions (processor speed for integer operations and floating point operations) or log-normal distributions (available disk space). We validate and show the utility of the models by applying them to a resource allocation problem for Internet-distributed applications, and demonstrate their value over other models. We also make our trace data and tool for automatically generating realistic Internet end hosts publicly available
    corecore