43 research outputs found

    Energy-efficient acceleration of MPEG-4 compression tools

    Get PDF
    We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard-motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    Towards flexible hardware/software encoding using H.264

    Get PDF
    As the electronics world continues to expand, bringing smaller and more portable devices to consumers, demands for media access continue to rise. Consumers are seeking the ability to view the wealth of information available on the Internet from devices such as smart phones, tablets, and music players. In addition to Internet browsing, smart phones and tablets in particular look to reinvent phone communication by adding video chat through services such as Skype and FaceTime. Bringing video to mobile platforms requires trade-offs between size, channel capacity, hardware cost, quality, loading times and power consumption. H.264, the current standard for video encoding specifies multiple profiles to support different modes of operation and environments. Creating an H.264 video encoder for a mobile platform requires a proper balance between the aforementioned trade-offs while maintaining flexibility in a real time environment such as video chatting. The goal of this thesis was to investigate the trade-offs of implementing the H.264 Baseline encoding process specifically at low bit rates in hardware and software using Field Programmable Gate Array (FPGA) reconfigurable resources with an embedded processor core on the same chip. To further preserve encoding flexibility, existing encoding parameters were left intact. The Joint Model (JM) Reference encoder modified to include only the Baseline Profile was used as an initial reference point to evaluate the efficacy of the finished encoder. To improve upon the initial software implementation, major software bottlenecks were identified and hardware accelerators were designed aimed at producing a speedup capable of encoding 176x144 or Quarter Common Intermediate Format (QCIF) videos in real-time at 24 Frames Per Second (FPS) or greater. Finally, the hardware/software implementation was analyzed in comparison with the original JM Reference software encoder. This analysis included FPS, bit rate, encoding time, luminance Peak Signal-to-Noise Ratio (Y-PSNR) and associated hardware costs

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Get PDF
    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object based paradigm has many undeniable benefits, numerous technical challenges remain before the applications becomes pervasive, particularly on computational constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery powered mobile computing devices, the additional algorithmic complexity of semantic object based processing compared to conventional video processing is highly undesirable both from a real-time operation and battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to be offloaded from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourable with the relevant prior art

    DRM analysis using a simulator of multiprocessor embedded system

    Get PDF
    Mestrado em Engenharia Electrónica e TelecomunicaçõesOs sistemas multiprocessador são uma tecnologia emergente. O projecto Hijdra, que está a ser desenvolvido na “NXP semiconductors Research” é um sistema multiprocessador de tempo real que corre aplicações com constrangimentos do tipo “hard” e “soft”. Nestes sistemas, os processadores comunicam através de uma rede de silício. As aplicações que correm no sistema multiprocessador consistem em múltiplas tarefas que correm em processadores embutidos. Achar soluções para o mapeamento das tarefas é o maior problema destes sistemas. Uma aplicação para este sistema que tem vindo a ser estudada é o “Car Radio”. Esta dissertação diz respeito a uma aplicação de rádio digital (DRM) na arquitectura Hijdra. Neste contexto, uma aplicação de um receptor de DRM foi estudada. Um modelo de análise de “Data Flow” foi extraído a partir da aplicação, foi estudada a latência introduzida na rede de silício pela introdução de um novo processador (acelerador de Viterbi) e foi estudada a possibilidade do mapeamento das várias tarefas da aplicação em diferentes processadores a correr em paralelo. Muitas estratégias ainda ficaram por definir a fim de optimizar o desempenho da aplicação do receptor de DRM de modo a esta poder trabalhar de uma forma mais eficaz. ABSTRACT: Multiprocessor systems are an emerging technology. The Hijdra project being developed at NXP semiconductors Research is a multiprocessor system running with both hard and soft real time streaming media jobs. These jobs consist of multiple tasks running on embedded multiprocessors. Finding good solutions for job mapping is the main problem of these systems. One application which has being studied for Hijdra is the “Car Radio”. This thesis concerns the study of a digital radio receptor application (DRM) in Hijdra architecture. In this context, a data flow model of analysis was extracted from the application, the latency introduced by the addition of a new tile (Viterbi accelerator) and eventual speed gains were studied and the possibility of mapping the different tasks of the application in different processors was foreseen. Many strategies were yet to be defined in order to optimize the application performance so it can work more effectively in the multiprocessor system

    Hardware Acceleration of the Robust Header Compression (RoHC) Algorithm

    Get PDF
    With the proliferation of Long Term Evolution (LTE) networks, many cellular carriers are embracing the emerging eld of mobile Voice over Internet Protocol (VoIP). The robust header compression (RoHC) framework was introduced as a part of the LTE Layer 2 stack to compress the large headers of the VoIP packets before transmitted over LTE IP-based architectures. The headers, which are encapsulated Real-time Transport Protocol (RTP)/User Datagram Protocol (UDP)/Internet Protocol (IP) stack, are large compared to the small payload. This header-compression scheme is especially useful for ecient utilization of the radio bandwidth and network resources. In an LTE base-station implementation, RoHC is a processing-intensive algorithm that may be the bottleneck of the system, and thus, may be the limiting factor when it comes to number of users served. In this thesis, a hardware-software and a full-hardware solution are proposed, targeting LTE base-stations to accelerate this computationally intensive algorithm and enhance the throughput and the capacity of the system. The results of both solutions are discussed and compared with respect to design metrics like throughput, capacity, power consumption, chip area and exibility. This comparison is instrumental in taking architectural level trade-o decisions in-order to meet the present day requirements and also be ready to support future evolution. In terms of throughput, a gain of 20% (6250 packets/sec can be processed at a frequency of 150 MHz) is achieved in the HW-SW solution compared to the SW-Only solution by implementing the Cyclic Redundancy Check (CRC) and the Least Signicant Bit(LSB) encoding blocks as hardware accelerators . Whereas, a Full-HW implementation leads to a throughput of 45 times (244000 packets/sec can be processed at a frequency of 100 MHz) the throughput of the SW-Only solution. However, the full-HW solution consumes more Lookup Tables (LUTs) when it is synthesized on an Field-Programmable Gate Array (FPGA) platform compared to the HW-SW solution. In Arria II GX, the HW-SW and the full-HW solutions use 2578 and 7477 LUTs and consume 1.5 and 0.9 Watts, respectively. Finally, both solutions are synthesized and veried on Altera's Arria II GX FPGA

    Exploring Processor and Memory Architectures for Multimedia

    Get PDF
    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

    Challenges and solutions in H.265/HEVC for integrating consumer electronics in professional video systems

    Get PDF

    Platforms for handling and development of audiovisual data

    Get PDF
    Estágio realizado na MOG Solutions e orientado por Vítor TeixeiraTese de mestrado integrado. Engenharia Informátca e Computação. Faculdade de Engenharia. Universidade do Porto. 200
    corecore