848 research outputs found

    Exploring Processor and Memory Architectures for Multimedia

    Multimedia has become one of the cornerstones of our 21st-century society and, combined with mobility, has enabled a tremendous societal evolution. However, joining these two concepts introduces many technical challenges, ranging from providing sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. Looking ahead, these issues become ever more challenging with increased mobility as well as advances in multimedia content, such as the introduction of stereoscopic 3D and augmented reality. The increased performance needed for handling multimedia comes not only from the ongoing step-up in resolution, from QVGA (320x240) to Full HD (1920x1080), a 27x increase in pixel count in less than half a decade, but also from codec evolution (MPEG-2 to H.264/AVC), which adds to the computational load. To meet these performance challenges, processing and memory architecture advances (SIMD, out-of-order superscalar execution, multicore processing, and heterogeneous multilevel memories) have reached the mobile domain, in conjunction with ever-increasing operating frequencies (200 MHz to 2 GHz) and on-chip memory sizes (128 KB to 2-3 MB). At the same time, requirements on mobility keep rising, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000 mAh). This leaves a negative net result in terms of battery capacity versus performance advances. To make optimal use of these architectural advances and to meet the power limitations of mobile systems, an overall approach to how these systems are best utilized is needed. The right trade-off between performance and power is crucial, and the flexibility of the system must also be addressed, which makes reaching the right architectural balance very important. The first goal of this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements of a mobile system. The second goal is to propose an automated methodology for optimally mapping multimedia data and instructions onto a heterogeneous multilevel memory subsystem; the proposed methodology uses constraint programming to solve a multidimensional optimization problem. Results from this work indicate that today's most advanced mobile processor technology, together with a multilevel heterogeneous on-chip memory subsystem, can meet the performance requirements for handling multimedia. By using the automated optimal memory-mapping method presented in this thesis, lower total power consumption is achieved while performance for multimedia applications is improved, thanks to enhanced memory management: external memory accesses are reduced and memory objects are reused more effectively. The automatic method shows high accuracy, up to 90%, in predicting multimedia memory accesses for a given architecture.
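    A minimal sketch of the kind of constraint-programming model the abstract describes: place profiled memory objects into a heterogeneous multilevel memory so that capacities are respected and total access cost is minimized. This is not the thesis' tool; it assumes Google OR-Tools CP-SAT as the solver, and all object sizes, access counts, and per-access energies are made-up illustration values.

    from ortools.sat.python import cp_model

    # Hypothetical profiling data: (name, size in bytes, estimated accesses).
    objects = [("luma_ref",   64 * 1024,   900_000),
               ("chroma_ref", 32 * 1024,   400_000),
               ("mb_buffer",  16 * 1024, 1_200_000),
               ("coeff_tmp",   8 * 1024,   300_000)]

    # Hypothetical memory hierarchy: (name, capacity in bytes, energy per access in pJ).
    memories = [("l1_scratchpad",  64 * 1024,   5),
                ("l2_sram",       256 * 1024,  20),
                ("external_dram",   1 << 30,  200)]

    model = cp_model.CpModel()
    # place[i][j] == 1 iff object i is mapped to memory j.
    place = [[model.NewBoolVar(f"place_{obj[0]}_{mem[0]}") for mem in memories]
             for obj in objects]

    # Each object is mapped to exactly one memory level.
    for row in place:
        model.Add(sum(row) == 1)

    # The objects mapped to a memory must fit within its capacity.
    for j, (_, capacity, _) in enumerate(memories):
        model.Add(sum(obj[1] * place[i][j] for i, obj in enumerate(objects)) <= capacity)

    # Minimize total access energy: accesses times the energy of the chosen memory.
    model.Minimize(sum(obj[2] * mem[2] * place[i][j]
                       for i, obj in enumerate(objects)
                       for j, mem in enumerate(memories)))

    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        for i, (name, _, _) in enumerate(objects):
            for j, (mem_name, _, _) in enumerate(memories):
                if solver.Value(place[i][j]):
                    print(f"{name} -> {mem_name}")

    The thesis' objective is multidimensional (power together with performance), so a model of this kind would carry additional cost terms and constraints; the sketch only shows the placement-under-capacity core.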

    On the efficient representation and execution of deep acoustic models

    In this paper we present a simple and computationally efficient quantization scheme that enables us to reduce the resolution of the parameters of a neural network from 32-bit floating point values to 8-bit integer values. The proposed quantization scheme leads to significant memory savings and enables the use of optimized hardware instructions for integer arithmetic, thus significantly reducing the cost of inference. Finally, we propose a "quantization aware" training process that applies the proposed scheme during network training and find that it allows us to recover most of the loss in accuracy introduced by quantization. We validate the proposed techniques by applying them to a long short-term memory-based acoustic model on an open-ended large vocabulary speech recognition task.
    Comment: Accepted conference paper at the Annual Conference of the International Speech Communication Association (Interspeech), 2016.
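    The abstract does not spell out the mapping itself, but a common shape for such an 8-bit scheme is uniform affine quantization with a per-tensor scale and zero point. The NumPy sketch below is illustrative only and is not claimed to be the paper's exact formulation.

    import numpy as np

    def quantize_uint8(w):
        """Map float32 weights to uint8 using a per-tensor scale and zero point."""
        w_min, w_max = float(w.min()), float(w.max())
        scale = (w_max - w_min) / 255.0
        if scale == 0.0:               # constant tensor: avoid division by zero
            scale = 1.0
        zero_point = int(round(-w_min / scale))
        q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
        return q, scale, zero_point

    def dequantize(q, scale, zero_point):
        """Approximate the original float32 weights from the 8-bit representation."""
        return (q.astype(np.float32) - zero_point) * scale

    w = np.random.randn(256, 256).astype(np.float32)
    q, scale, zp = quantize_uint8(w)
    print("max abs quantization error:", np.abs(w - dequantize(q, scale, zp)).max())

    In a quantization-aware training loop the forward pass would use dequantize(*quantize_uint8(w)) in place of w so the network learns to tolerate the rounding; how the paper handles gradients is not stated in the abstract.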

    An efficient multi-core SIMD implementation for H.264/AVC encoder

    The optimization process of an H.264/AVC encoder on three different architectures is presented. The architectures are multi- and single-core, and their SIMD instruction sets have different vector register sizes. The need for code optimization is fundamental when addressing HD resolutions under real-time constraints. The encoder is subdivided into functional modules in order to better understand where optimization is a key factor and to evaluate the performance improvement in detail. Common issues in both partitioning a video encoder across parallel architectures and SIMD optimization are described, and the authors' solutions are presented for all the architectures. Besides showing efficient video encoder implementations, one of the main purposes of this paper is to discuss how the characteristics of different architectures and different sets of SIMD instructions impact the performance of the target application. Results on the achieved speedup are provided in order to compare the different implementations and to evaluate the most suitable solutions for present and next-generation video-coding algorithms.
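    As a concrete illustration of the kind of data-parallel kernel such SIMD optimization targets: motion estimation in an H.264/AVC encoder spends much of its time computing sums of absolute differences (SAD) between candidate blocks. The NumPy sketch below only shows the arithmetic; in the optimized encoders this inner loop is what gets mapped onto packed SIMD instructions and distributed across cores by partitioning the frame. The frame data and search range are illustrative.

    import numpy as np

    def sad_16x16(current, reference, x, y, dx, dy):
        """SAD between the 16x16 macroblock at (x, y) in the current frame and
        the candidate block at (x + dx, y + dy) in the reference frame."""
        cur = current[y:y + 16, x:x + 16].astype(np.int32)
        ref = reference[y + dy:y + dy + 16, x + dx:x + dx + 16].astype(np.int32)
        return int(np.abs(cur - ref).sum())

    # Illustrative luma planes and a tiny +/-2 pixel full-search window.
    rng = np.random.default_rng(0)
    current = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    reference = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

    best_sad, best_mv = min((sad_16x16(current, reference, 16, 16, dx, dy), (dx, dy))
                            for dx in range(-2, 3) for dy in range(-2, 3))
    print("best SAD:", best_sad, "motion vector:", best_mv)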

    Fulcrum: Flexible Network Coding for Heterogeneous Devices

    We introduce Fulcrum, a network coding framework that achieves three seemingly conflicting objectives: 1) to reduce the coding coefficient overhead down to nearly n bits per packet in a generation of n packets; 2) to conduct the network coding using only Galois field GF(2) operations at intermediate nodes if necessary, dramatically reducing computing complexity in the network; and 3) to deliver an end-to-end performance that is close to that of a high-field network coding system for high-end receivers, while simultaneously catering to low-end receivers that decode in GF(2). As a consequence of 1) and 3), Fulcrum has a unique trait missing so far in the network coding literature: providing the network with the flexibility to distribute computational complexity over different devices depending on their current load, network conditions, or energy constraints. At the core of our framework lies the idea of precoding at the sources using an expansion field GF(2^h), h > 1, to increase the number of dimensions seen by the network. Fulcrum can use any high-field linear code for precoding, e.g., Reed-Solomon or Random Linear Network Coding (RLNC). Our analysis shows that the number of additional dimensions created during precoding controls the trade-off between delay, overhead, and computing complexity. Our implementation and measurements show that Fulcrum achieves similar decoding probabilities as high-field RLNC but with encoders and decoders that are an order of magnitude faster.
    Funding: Green Mobile Cloud project (grant DFF-0602-01372B); Colorcast project (grant DFF-0602-02661B); TuneSCode project (grant DFF-1335-00125); Danish Council for Independent Research (grant DFF-4002-00367); Ministerio de Economía, Industria y Competitividad - Fondo Europeo de Desarrollo Regional (grants MTM2012-36917-C03-03, MTM2015-65764-C3-2-P, MTM2015-69138-REDT); Agencia Estatal de Investigación - Fondo Social Europeo (grant RYC-2016-20208); Aarhus Universitets Forskningsfond Starting Grant (AUFF-2017-FLS-7-1).
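    Objective 2) above, recoding at intermediate nodes using only GF(2) arithmetic, comes down to XOR-ing whole packets together with their n-bit coefficient vectors, which is also what keeps the coefficient overhead near n bits per packet. A minimal sketch of such a recoder follows; it is not the Fulcrum codebase and it omits the GF(2^h) precoding performed at the source.

    import secrets

    def gf2_recode(packets, coeff_vectors):
        """Combine coded packets using only GF(2) (XOR) operations.
        Each packet carries an n-bit coefficient vector, n = generation size."""
        out_payload = bytearray(len(packets[0]))
        out_coeffs = 0
        for payload, coeffs in zip(packets, coeff_vectors):
            if secrets.randbits(1):        # randomly include this packet in the combination
                out_payload = bytearray(a ^ b for a, b in zip(out_payload, payload))
                out_coeffs ^= coeffs       # track the combination in the n-bit header
        # A real recoder would retry rather than emit the all-zero combination.
        return bytes(out_payload), out_coeffs

    # Illustrative generation of n = 4 packets of 8 bytes each, systematic coefficients.
    generation = [secrets.token_bytes(8) for _ in range(4)]
    coeff_vectors = [1 << i for i in range(4)]
    payload, coeffs = gf2_recode(generation, coeff_vectors)
    print(payload.hex(), format(coeffs, "04b"))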