17,229 research outputs found

    A Computation Core for Communication Refinement of Digital Signal Processing Algorithms

    Get PDF
    International audienceThe most popular Moore's law formulation, which states the number of transistors on integrated circuits doubles every 18 months, is said to hold for at least another two decades. According to this prediction, if we want to take advantage of technological evolutions, designer's productivity has to increase in the same proportions. To take up this challenge, system level design solutions have been set up, but many efforts have still to be done on system modelling and synthesis. In this paper we propose a computation core synthesis methodology that can be integrated on the communication refinement steps of electronic system level design tools. In the proposed approach, computation cores used for digital signal processing application specifications relying on coarse grain communications and synchronizations (e.g. matrix) can be refined into computation cores which can handle fine grain communications and synchronizations (e.g. scalar). Its originality is its ability to synthesize computation cores which can handle fine grain data consumptions and productions which respect the intrinsic partial orders of the algorithms while preserving their original functionalities. Such cores can be used to model fine grain input output overlapping or iteration pipelining. Our flow is based on the analysis of a fine grain signal flow graph used to extract fine grain synchronizations and algorithmic expressions

    A Computation Core for Communication Refinement of Digital Signal Processing Algorithms

    Full text link

    Acceleration of stereo-matching on multi-core CPU and GPU

    Get PDF
    This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa 1 research project. This research project focuses on the conception of a new clothes folding robot with real-time and high resolution requirements for the vision system. The performance analysis shows that the parallelised stereo-matching algorithm has been significantly accelerated, maintaining 12x and 176x speed-up respectively for multi-core CPU and GPU, compared with non-SIMD singlethread CPU. To analyse the origin of the speed-up and gain deeper understanding about the choice of the optimal hardware, the algorithm was broken into key sub-tasks and the performance was tested for four different hardware architectures

    A Digital Neuromorphic Architecture Efficiently Facilitating Complex Synaptic Response Functions Applied to Liquid State Machines

    Full text link
    Information in neural networks is represented as weighted connections, or synapses, between neurons. This poses a problem as the primary computational bottleneck for neural networks is the vector-matrix multiply when inputs are multiplied by the neural network weights. Conventional processing architectures are not well suited for simulating neural networks, often requiring large amounts of energy and time. Additionally, synapses in biological neural networks are not binary connections, but exhibit a nonlinear response function as neurotransmitters are emitted and diffuse between neurons. Inspired by neuroscience principles, we present a digital neuromorphic architecture, the Spiking Temporal Processing Unit (STPU), capable of modeling arbitrary complex synaptic response functions without requiring additional hardware components. We consider the paradigm of spiking neurons with temporally coded information as opposed to non-spiking rate coded neurons used in most neural networks. In this paradigm we examine liquid state machines applied to speech recognition and show how a liquid state machine with temporal dynamics maps onto the STPU-demonstrating the flexibility and efficiency of the STPU for instantiating neural algorithms.Comment: 8 pages, 4 Figures, Preprint of 2017 IJCN

    Demonstration of Run-time Spatial Mapping of Streaming Applications to a Heterogeneous Multi-Processor System-on-Chip (MPSoC)

    Get PDF
    In this paper, the problem of spatial mapping is defined. Reasons are presented to show why performing spatial mappings at run-time is both necessary and desirable and criteria for the qualitative comparison of spatial mappings are introduced. An algorithm is described that implements a preliminary spatial mapper. The methods used in the algorithm are demonstrated with an illustrative example

    Run-time Spatial Mapping of Streaming Applications to Heterogeneous Multi-Processor Systems

    Get PDF
    In this paper, we define the problem of spatial mapping. We present reasons why performing spatial mappings at run-time is both necessary and desirable. We propose what is—to our knowledge—the first attempt at a formal description of spatial mappings for the embedded real-time streaming application domain. Thereby, we introduce criteria for a qualitative comparison of these spatial mappings. As an illustration of how our formalization relates to practice, we relate our own spatial mapping algorithm to the formal model

    Design and Implementation of an Extensible Variable Resolution Bathymetric Estimator

    Get PDF
    For grid-based bathymetric estimation techniques, determining the right resolution at which to work is essential. Appropriate grid resolution can be related, roughly, to data density and thence to sonar characteristics, survey methodology, and depth. It is therefore variable in almost all survey scenarios, and methods of addressing this problem can have enormous impact on the correctness and efficiency of computational schemes of this kind. This paper describes the design and implementation of a bathymetric depth estimation algorithm that attempts to address this problem by combining the computational efficiency of locally regular grids with piecewise-variable estimation resolution to provide a single logical data structure and associated algorithms that can adjust to local data conditions, change resolution where required to best support the data, and operate over essentially arbitrarily large areas as a single unit. The algorithm, which is in part a development of CUBE, is modular and extensible, and is structured as a client-server application to support different implementation modalities. The algorithm is called “CUBE with Hierarchical Resolution Techniques”, or CHRT
    corecore