224 research outputs found

    Exploring Processor and Memory Architectures for Multimedia

    Get PDF
    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

    SIMD based multicore processor for image and video processing

    Get PDF
    制度:新 ; 報告番号:甲3602号 ; 学位の種類:博士(工学) ; 授与年月日:2012/3/15 ; 早大学位記番号:新595

    Digital Surveillance Based on Video CODEC System-on-a-Chip (SoC) Platforms

    Get PDF
    Today, most conventional surveillance networks are based on analog system, which has a lot of constraints like manpower and high-bandwidth requirements. It becomes the barrier for today’s surveillance network development. This dissertation describes a digital surveillance network architecture based on the H.264 coding/decoding (CODEC) System-on-a-Chip (SoC) platform. The proposed digital surveillance network architecture includes three major layers: software layer, hardware layer, and the network layer. The following outlines the contributions to the proposed digital surveillance network architecture. (1) We implement an object recognition system and an object categorization system on the software layer by applying several Digital Image Processing (DIP) algorithms. (2) For better compression ratio and higher video quality transfer, we implement two new modules on the hardware layer of the H.264 CODEC core, i.e., the background elimination module and the Directional Discrete Cosine Transform (DDCT) module. (3) Furthermore, we introduce a Digital Signal Processor (DSP) sub-system on the main bus of H.264 SoC platforms as the major hardware support system for our software architecture. Thus we combine the software and hardware platforms to be an intelligent surveillance node. Lab results show that the proposed surveillance node can dramatically save the network resources like bandwidth and storage capacity

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    Get PDF
    The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements

    Pyramic array: An FPGA based platform for many-channel audio acquisition

    Get PDF
    Array processing of audio data has many interesting applications: acoustic beamforming, source separation, indoor localization, room geometry estimation, etc. Recent advances in MEMS has produced tiny microphones, analog or even with digital converter integrated. This opens the door to create arrays with a massive number of microphones. We dub such an array many-channel by analogy to many-core processors.Microphone arrays techniques present compelling applications for robotic implementations. Those techniques can allow robots to listen to their environment and infer clues from it. Such features might enable capabilities such as natural interaction with humans, interpreting spoken commands or the localization of victims during search and rescue tasks. However, under noisy conditions robotic implementations of microphone arrays might degrade their precision when localizing sound sources. For practical applications, human hearing still leaves behind microphone arrays. Daniel Kisch is an example of how humans are able to efficiently perform echo-localization to recognize their environment, even in noisy and reverberant environments. For ubiquitous computing, another limitation of acoustic localization algorithms is within their capabilities of performing real-time Digital Signal Processing (DSP) operations. To tackle those problems, tradeoffs between size, weight, cost and power consumption compromise the design of acoustic sensors for practical applications. This work presents the design and operation of a large microphone array for DSP applications in realistic environments. To address those problems this project introduces the Pyramic sound capture system designed at LAP in EPFL. Pyramic is a custom hardware which possesses 48 microphones dis- tributed in the edges of a tetrahedron. The microphone arrays interact with a Terasic DE1-SoC board from Altera Cyclone V family devices, which combines a Hard Processor System (HPS) and a Field Programmable Gate Array (FPGA) in the same die. The HPS part integrates a dual- core ARM-based Cortex-A9 processor, which combined with the power of FPGA design suitable for processing multichannel microphone signals. This thesis explains the implementation of the Pyramic array. Moreover, FPGA-based hardware accelerators have been designed to imple- ment a Master SPI communication with the array and a parallel 48 channels FIR filters cascade of the audio data for delay-and-sum beamforming applications. Additionally, the configura- tion of the HPS part allows the Pyramic array to be controlled through a Linux based OS. The main purpose of the project is to develop a flexible platform in which real-time echo-location algorithms can be implemented. The effectiveness of the Pyramic array design is illustrated by testing the recorded data with offline direction of arrival algorithms developed at LCAV in EPFL

    A configurable vector processor for accelerating speech coding algorithms

    Get PDF
    The growing demand for voice-over-packer (VoIP) services and multimedia-rich applications has made increasingly important the efficient, real-time implementation of low-bit rates speech coders on embedded VLSI platforms. Such speech coders are designed to substantially reduce the bandwidth requirements thus enabling dense multichannel gateways in small form factor. This however comes at a high computational cost which mandates the use of very high performance embedded processors. This thesis investigates the potential acceleration of two major ITU-T speech coding algorithms, namely G.729A and G.723.1, through their efficient implementation on a configurable extensible vector embedded CPU architecture. New scalar and vector ISAs were introduced which resulted in up to 80% reduction in the dynamic instruction count of both workloads. These instructions were subsequently encapsulated into a parametric, hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research and implementation of the vector datapath of this vector coprocessor which is tightly-coupled to a Sparc-V8 compliant CPU, the optimization and simulation methodologies employed and the use of Electronic System Level (ESL) techniques to rapidly design SIMD datapaths

    Low Power Architectures for MPEG-4 AVC/H.264 Video Compression

    Get PDF

    VLSI implementation of a massively parallel wavelet based zerotree coder for the intelligent pixel array

    Get PDF
    In the span of a few years, mobile multimedia communication has rapidly become a significant area of research and development constantly challenging boundaries on a variety of technologic fronts. Mobile video communications in particular encompasses a number of technical hurdles that generally steer technological advancements towards devices that are low in complexity, low in power usage yet perform the given task efficiently. Devices of this nature have been made available through the use of massively parallel processing arrays such as the Intelligent Pixel Processing Array. The Intelligent Pixel Processing array is a novel concept that integrates a parallel image capture mechanism, a parallel processing component and a parallel display component into a single chip solution geared toward mobile communications environments, be it a PDA based system or the video communicator wristwatch portrayed in Dick Tracy episodes. This thesis details work performed to provide an efficient, low power, low complexity solution surrounding the massively parallel implementation of a zerotree entropy codec for the Intelligent Pixel Array
    corecore