19 research outputs found

    Low complexity hardware oriented H.264/AVC motion estimation algorithm and related low power and low cost architecture design

    Get PDF
    制度:新 ; 報告番号:甲2999号 ; 学位の種類:博士(工学) ; 授与年月日:2010/3/15 ; 早大学位記番号:新525

    Analog parallel processor solutions for video encoding

    Get PDF
    This thesis deals with Cellular Nonlinear Network (CNN) analog parallel processor networks and their implementations in current video coding standards. The target applications are low-power video encoders within 3rd generation mobile terminals. The video codecs of such mobile terminals are defined by either the MPEG-4/H.263 or H.264 video standard. All of these standards are based on the block-based hybrid approach. As block-based motion estimation (ME) is responsible for most of the power consumption of such hybrid video encoders, this thesis deals mostly with low-power ME implementations. Low-power solutions are introduced at both the algorithmic and hardware levels. On the algorithmic level, the introduced implementations are derived from a segmentation algorithm, which has previously been partly realized. The first introduced algorithm reduces the computational complexity of ME within an object-based MPEG-4 encoder. The use of this algorithm enables a 60% drop in the power consumption of Full Search ME. The second algorithm calculates a near-optimal block-size partition for H.264 motion estimation. With this algorithm, the use of computationally complex Lagrange optimization in H.264 ME is not required. The third algorithm reduces the shape bit-rate of an object-based MPEG-4 encoder. On the hardware level a CNN-type ME architecture is introduced. The architecture includes connections and circuitry to fully realize block-based ME. The analog ME implemented with this architecture is capable of lower power than comparable digital realizations. A 9×9 test chip has also been realized. Additionally implemented is a digital predictive ME realization that takes advantage of the introduced partition algorithm. Although the IC layout of the ME algorithm was drawn, the design was verified as an FPGA.reviewe

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    Optimisation énergétique de processus de traitement du signal et ses applications au décodage vidéo

    Get PDF
    Consumer electronics offer today more and more features (video, audio, GPS, Internet) and connectivity means (multi-radio systems with WiFi, Bluetooth, UMTS, HSPA, LTE-advanced ... ). The power demand of these devices is growing for the digital part especially for the processing chip. To support this ever increasing computing demand, processor architectures have evolved with multicore processors, graphics processors (GPU) and ether dedicated hardware accelerators. However, the evolution of battery technology is itself slower. Therefore, the autonomy of embedded systems is now under a great pressure. Among the new functionalities supported by mobile devices, video services take a prominent place. lndeed, recent analyzes show that they will represent 70% of mobile Internet traffic by 2016. Accompanying this growth, new technologies are emerging for new services and applications. Among them HEVC (High Efficiency Video Coding) can double the data compression while maintaining a subjective quality equivalent to its predecessor, the H.264 standard. ln a digital circuit, the total power consumption is made of static power and dynamic power. Most of modern hardware architectures implement means to control the power consumption of the system. Dynamic Voltage and Frequency Scaling (DVFS) mainly reduces the dynamic power of the circuit. This technique aims to adapt the power of the processor (and therefore its consumption) to the actual load needed by the application. To control the static power, Dynamic Power Management (DPM or sleep modes) aims to stop the voltage supplies associated with specific areas of the chip. ln this thesis, we first present a model of the energy consumed by the circuit integrating DPM and DVFS modes. This model is generalized to multi-core integrated circuits and to a rapid prototyping tool. Thus, the optimal operating point of a circuit, i.e. the operating frequency and the number of active cores, is identified. Secondly, the HEVC application is integrated to a multicore architecture coupled with a sophisticated DVFS mechanism. We show that this application can be implemented efficiently on general purpose processors (GPP) while minimizing the power consumption. Finally, and to get further energy gain, we propose a modified HEVC decoder that is capable to tune its energy gains together with a decoding quality trade-off.Aujourd'hui, les appareils électroniques offrent de plus en plus de fonctionnalités (vidéo, audio, GPS, internet) et des connectivités variées (multi-systèmes de radio avec WiFi, Bluetooth, UMTS, HSPA, LTE-advanced ... ). La demande en puissance de ces appareils est donc grandissante pour la partie numérique et notamment le processeur de calcul. Pour répondre à ce besoin sans cesse croissant de nouvelles fonctionnalités et donc de puissance de calcul, les architectures des processeurs ont beaucoup évolué : processeurs multi-coeurs, processeurs graphiques (GPU) et autres accélérateurs matériels dédiés. Cependant, alors que de nouvelles architectures matérielles peinent à répondre aux exigences de performance, l'évolution de la technologie des batteries est quant à elle encore plus lente. En conséquence, l'autonomie des systèmes embarqués est aujourd'hui sous pression. Parmi les nouveaux services supportés par les terminaux mobiles, la vidéo prend une place prépondérante. En effet, des analyses récentes de tendance montrent qu'elle représentera 70 % du trafic internet mobile dès 2016. Accompagnant cette croissance, de nouvelles technologies émergent permettant de nouveaux services et applications. Parmi elles, HEVC (High Efficiency Video Coding) permet de doubler la compression de données tout en garantissant une qualité subjective équivalente à son prédécesseur, la norme H.264. Dans un circuit numérique, la consommation provient de deux éléments: la puissance statique et la puissance dynamique. La plupart des architectures matérielles récentes mettent en oeuvre des procédés permettant de contrôler la puissance du système. Le changement dynamique du couple tension/fréquence appelé Dynamic Voltage and Frequency Scaling (DVFS) agit principalement sur la puissance dynamique du circuit. Cette technique permet d'adapter la puissance du processeur (et donc sa consommation) à la charge réelle nécessaire pour une application. Pour contrôler la puissance statique, le Dynamic Power Management (DPM, ou modes de veille) consistant à arrêter les alimentations associées à des zones spécifiques de la puce. Dans cette thèse, nous présentons d'abord une modélisation de l'énergie consommée par le circuit intégrant les modes DVFS et DPM. Cette modélisation est généralisée au circuit multi-coeurs et intégrée à un outil de prototypage rapide. Ainsi le point de fonctionnement optimal d'un circuit, la fréquence de fonctionnement et le nombre de coeurs actifs, est identifié. Dans un second temps, l'application HEVC est intégrée à une architecture multi-coeurs avec une adaptation dynamique de la fréquence de développement. Nous montrons que cette application peut être implémentée efficacement sur des processeurs généralistes (GPP) tout en minimisant la puissance consommée. Enfin, et pour aller plus loin dans les gains en énergie, nous proposons une modification du décodeur HEVC qui permet à un décodeur de baisser encore plus sa consommation en fonction du budget énergétique disponible localement

    Survey of FPGA applications in the period 2000 – 2015 (Technical Report)

    Get PDF
    Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report).; 2017.Since their introduction, FPGAs can be seen in more and more different fields of applications. The key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field introduces special requirements to the used computational architecture. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research and why they have been chosen over other processing units like e.g. CPUs

    Matrix Transform Imager Architecture for On-Chip Low-Power Image Processing

    Get PDF
    Camera-on-a-chip systems have tried to include carefully chosen signal processing units for better functionality, performance and also to broaden the applications they can be used for. Image processing sensors have been possible due advances in CMOS active pixel sensors (APS) and neuromorphic focal plane imagers. Some of the advantages of these systems are compact size, high speed and parallelism, low power dissipation, and dense system integration. One can envision using these chips for portable and inexpensive video cameras on hand-held devices like personal digital assistants (PDA) or cell-phones In neuromorphic modeling of the retina it would be very nice to have processing capabilities at the focal plane while retaining the density of typical APS imager designs. Unfortunately, these two goals have been mostly incompatible. We introduce our MAtrix Transform Imager Architecture (MATIA) that uses analog floating--gate devices to make it possible to have computational imagers with high pixel densities. The core imager performs computations at the pixel plane, but still has a fill-factor of 46 percent - comparable to the high fill-factors of APS imagers. The processing is performed continuously on the image via programmable matrix operations that can operate on the entire image or blocks within the image. The resulting data-flow architecture can directly perform all kinds of block matrix image transforms. Since the imager operates in the subthreshold region and thus has low power consumption, this architecture can be used as a low-power front end for any system that utilizes these computations. Various compression algorithms (e.g. JPEG), that use block matrix transforms, can be implemented using this architecture. Since MATIA can be used for gradient computations, cheap image tracking devices can be implemented using this architecture. Other applications of this architecture can range from stand-alone universal transform imager systems to systems that can compute stereoscopic depth.Ph.D.Committee Chair: Hasler, Paul; Committee Member: David Anderson; Committee Member: DeWeerth, Steve; Committee Member: Jackson, Joel; Committee Member: Smith, Mar

    Architecture exploration and VLSI design of multi-symbol arithmetic encoders for the AV1 coding format

    Get PDF
    To reduce the impact of videos in the global Internet capacity, companies rely upon video coding standards and formats, also known as codecs, to reduce the overall sizes of videos before transmitting or storing them. AV1, which arises as a promising state-of-the-art and royalties-free video coding format first released in 2018, aims to reduce the sizes of videos by applying novel techniques to boost AV1’s compression results. Amongst its core components, AV1 comprises an entropy coding block, which is re sponsible for losslessly encoding symbols generated by other core modules (e.g., intra prediction, motion compensation, etc.). The arithmetic encoder, which is part of the en tropy encoder, is a bottleneck due to its difficulty to work with parallelizations, and relies upon two primary operations: CDF Operation and Boolean Operation, where CDF stands for Cumulative Distribution Function. This thesis proposes a baseline VLSI design, which was named AE-AV1, as the first ever AV1 arithmetic encoder found in the literature, and capable of reaching ultra-high performance (i.e., processing of 8K@120fps videos in real-time). Moreover, additional versions of this architecture were proposed as AE-AV1-LP and AE-AV1-MB, which are, respectively, a low-power version and a novel design applying a Multi-Boolean technique also introduced in this thesis. All the herein proposed designs were synthesized using the Cadence™ RC tool and the ST 65nm PDK. As the AV1 is well-known for being an open-source alternative in the video coding industry, the AE-AV1 architecture was also synthesized from Verilog to GDSII layout using a fully open-source ASIC flow (i.e., OpenROAD tool, OpenLane flow, and ASAP7 and SkyWater 130nm PDKs). The architectures were capable of reaching frequencies of 581 MHz, 563 MHz and 590 MHz for the versions AE-AV1, AE-AV1-LP and AE-AV1-MB 2-bool, respectively. With regard to throughput rates, all herein introduced architectures are capable of reaching 8K@120fps real-time video processing with rates of 1.032 Gbits/sec, 0.999 Gbits/sec and 1.117 Gbits/sec respectively.Para reduzir o impacto dos vídeos na capacidade global de Internet, as empresas contam com padrões e formatos de codificação de vídeo, também conhecidos como codecs, para reduzir os tamanhos dos vídeos antes de transmiti-los ou armazená-los. O AV1, que surge como um promissor formato de codificação de vídeo de última geração e livre de royal ties lançado pela primeira vez em 2018, visa reduzir os tamanhos dos vídeos aplicando técnicas inovadoras e aprimoradas para aumentar os resultados de compactação do AV1. Entre seus componentes principais, o AV1 compreende um bloco de codificação de en tropia, que é responsável pela codificação sem perdas de símbolos gerados por outros módulos (por exemplo, predição intra-quadro, compensação de movimento, etc.). O co dificador aritmético, que faz parte do codificador de entropia, é um gargalo devido à sua dificuldade em trabalhar com paralelizações e conta com duas operações principais: CDF Operation e Boolean Operation, onde CDF representa Cumulative Distribution Function. Esta dissertação propõe um projeto VLSI digital, nomeado AE-AV1, como o primeiro codificador aritmético AV1 encontrado na literatura e capaz de atingir desempenho ultra high (ou seja, processamento de vídeos 8K@120fps em tempo real). Além disso, ver sões adicionais desta arquitetura foram propostas como AE-AV1-LP e AE-AV1-MB, que são, respectivamente, uma versão de baixo consumo (low-power) e um design inovador aplicando uma técnica Multi-Boolean também introduzida nesta dissertação. Todos os projetos aqui propostos foram sintetizados usando a ferramenta Cadence™ RC e o PDK ST 65nm. Como o AV1 é conhecido por ser uma alternativa de código aberto na indús tria de codificação de vídeo, a arquitetura AE-AV1 também foi sintetizada de Verilog a layout GDSII usando um fluxo ASIC totalmente de código aberto (ou seja, ferramenta OpenROAD, fluxo OpenLane e PDKs ASAP7 e SkyWater 130nm). As arquiteturas foram capazes de atingir frequências de 581 MHz, 563 MHz e 590 MHz nas versões AE-AV1, AE-AV1-LP e AE-AV1-MB 2-bool, respectivamente. Com relação às vazões, todas as arquiteturas são capazes de processar vídeos 8K@120fps em tempo real com taxas de 1.032 Gbits/seg, 0.999 Gbits/seg e 1.117 Gbits/seg respectivamente

    Development of Low Power Image Compression Techniques

    Get PDF
    Digital camera is the main medium for digital photography. The basic operation performed by a simple digital camera is, to convert the light energy to electrical energy, then the energy is converted to digital format and a compression algorithm is used to reduce memory requirement for storing the image. This compression algorithm is frequently called for capturing and storing the images. This leads us to develop an efficient compression algorithm which will give the same result as that of the existing algorithms with low power consumption. As a result the new algorithm implemented camera can be used for capturing more images then the previous one. 1) Discrete Cosine Transform (DCT) based JPEG is an accepted standard for lossy compression of still image. Quantisation is mainly responsible for the amount loss in the image quality in the process of lossy compression. A new Energy Quantisation (EQ) method proposed for speeding up the coding and decoding procedure while preserving image qu..
    corecore