
    Decoder Hardware Architecture for HEVC

    This chapter provides an overview of the design challenges faced in the implementation of hardware HEVC decoders. These challenges can be attributed to the larger and more diverse coding block and transform sizes, the longer interpolation filter for motion compensation, the increased number of steps in intra prediction, and the introduction of a new in-loop filter. Several solutions to address these implementation challenges are discussed. As a reference, results for an HEVC decoder test chip are also presented.
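
    To make the motion-compensation challenge concrete: HEVC interpolates fractional-sample luma positions with 7- and 8-tap filters, compared with the 6-tap filter of H.264, so each output sample needs a wider window of reference samples and larger on-chip line buffers. The sketch below applies the commonly cited 8-tap half-sample luma filter in one dimension; treat the coefficients as illustrative rather than a restatement of the standard.

        # Minimal sketch: 8-tap half-pel luma interpolation of the kind used in HEVC
        # motion compensation. Each output sample needs eight neighbouring integer
        # samples, which drives larger line buffers than H.264's 6-tap filter.
        HEVC_HALF_PEL = [-1, 4, -11, 40, 40, -11, 4, -1]   # coefficients sum to 64

        def interp_half_pel(samples, i):
            """Half-sample value between samples[i] and samples[i + 1] (1-D, no border handling)."""
            taps = samples[i - 3:i + 5]                     # 8-sample window
            acc = sum(c * s for c, s in zip(HEVC_HALF_PEL, taps))
            return min(max((acc + 32) >> 6, 0), 255)        # round, normalise by 64, clip to 8 bit

        row = [10, 12, 20, 40, 80, 120, 140, 150, 152, 151]
        print(interp_half_pel(row, 4))                      # value halfway between row[4] and row[5]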

    Image and Video Coding/Transcoding: A Rate Distortion Approach

    Due to the lossy nature of image/video compression and the expensive bandwidth and computation resources in a multimedia system, one of the key design issues for image and video coding/transcoding is to optimize the trade-off among distortion, rate, and/or complexity. This thesis studies the application of rate-distortion (RD) optimization approaches to image and video coding/transcoding, with two goals: exploring the best RD performance of a video codec compatible with the newest video coding standard, H.264, and designing computationally efficient down-sampling algorithms with high visual fidelity in the discrete cosine transform (DCT) domain.
    RD optimization for video coding in this thesis pursues two objectives: to achieve the best encoding efficiency in terms of minimizing the actual RD cost, and to maintain decoding compatibility with H.264. By the actual RD cost, we mean a cost based on the final reconstruction error and the entire coding rate. Specifically, an operational RD method is proposed based on a soft decision quantization (SDQ) mechanism, which has its root in a fundamental RD-theoretic study of fixed-slope lossy data compression. Using SDQ instead of hard decision quantization, we establish a general framework in which motion prediction, quantization, and entropy coding in a hybrid video coding scheme such as H.264 are jointly designed to minimize the actual RD cost on a frame basis. The proposed framework can optimize any hybrid video coding scheme, provided that specific algorithms are designed for the coding syntax of the given standard codec so as to maintain compatibility with that standard. Corresponding to the baseline profile and main profile syntaxes of H.264, respectively, we propose three RD algorithms, embedded one inside the next in the indicated order: a graph-based algorithm for SDQ given motion prediction and quantization step sizes; an algorithm for residual coding optimization given motion prediction; and an iterative overall algorithm for jointly optimizing motion prediction, quantization, and entropy coding. Among the three, the SDQ design is the core, and it is developed for a given entropy coding method. Specifically, two SDQ algorithms have been developed, one based on the context-adaptive variable length coding (CAVLC) of the H.264 baseline profile and one based on the context-adaptive binary arithmetic coding (CABAC) of the H.264 main profile.
    Experimental results for the H.264 baseline codec show that, for a set of typical test sequences, the proposed RD method achieves a better trade-off between rate and distortion, i.e., a 12% rate reduction on average at the same distortion (ranging from 30 dB to 38 dB PSNR) compared with the RD optimization method implemented in the H.264 baseline reference codec. Experimental results for optimizing H.264 main profile coding with CABAC show a 10% rate reduction over a main profile reference codec using CABAC, which also corresponds to a 20% rate reduction over the RD optimization method implemented in the H.264 baseline reference codec, supporting our claim of having developed the best codec in terms of RD performance while maintaining compatibility with H.264.
    By investigating the trade-off between distortion and complexity, we also propose a design framework for image/video transcoding with spatial resolution reduction, i.e., down-sampling compressed images/video by an arbitrary ratio in the DCT domain. First, we derive a set of DCT-domain down-sampling methods that can be represented by a linear transform with double-sided matrix multiplication (LTDS) in the DCT domain. Then, for a pre-selected pixel-domain down-sampling method, we formulate an optimization problem to find an LTDS that approximates the given pixel-domain method with the best trade-off between visual quality and computational complexity. The problem is solved by modeling the LTDS with a multi-layer perceptron network and training the network with a structural learning with forgetting algorithm. Finally, by selecting a pixel-domain reference method built on the popular Butterworth lowpass filter and cubic B-spline interpolation, the proposed framework discovers an LTDS with better visual quality and lower computational complexity than state-of-the-art methods in the literature.
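
    The LTDS idea admits a compact illustration. If a pixel-domain down-sampling is linear, y = L x R for a pixel block x, then in the DCT domain Y = A X B with A = C_m L C_N^T and B = C_N R C_m^T, where C_k is the orthonormal k-point DCT-II matrix. The sketch below verifies this identity with a simple 2:1 averaging down-sampler; the averaging operator is only a placeholder, not the learned LTDS from the thesis.

        # Minimal sketch: DCT-domain down-sampling as a linear transform with
        # double-sided matrix multiplication (LTDS), Y = A @ X @ B.
        # The 2:1 averaging down-sampler is an illustrative pixel-domain reference,
        # not the optimized operator from the thesis.
        import numpy as np

        def dct_matrix(n):
            """Orthonormal n-point DCT-II matrix."""
            k = np.arange(n)[:, None]
            i = np.arange(n)[None, :]
            c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
            c[0, :] *= np.sqrt(1.0 / 2.0)
            return c * np.sqrt(2.0 / n)

        N, m = 8, 4                            # 8x8 block down-sampled to 4x4
        L = np.kron(np.eye(m), [[0.5, 0.5]])   # m x N row averaging
        R = L.T                                # N x m column averaging
        C_N, C_m = dct_matrix(N), dct_matrix(m)

        A = C_m @ L @ C_N.T                    # left DCT-domain factor (m x N)
        B = C_N @ R @ C_m.T                    # right DCT-domain factor (N x m)

        x = np.random.rand(N, N)               # pixel block
        X = C_N @ x @ C_N.T                    # its DCT
        Y = A @ X @ B                          # DCT of the down-sampled block, no inverse DCT needed
        y_check = C_m.T @ Y @ C_m              # equals L @ x @ R
        assert np.allclose(y_check, L @ x @ R)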

    Energy-aware adaptive solutions for multimedia delivery to wireless devices

    The functionality of smart mobile devices is improving rapidly, but these devices are limited in practical use by battery life. This situation cannot be remedied simply by installing higher-capacity batteries: strict physical-space limitations in smartphone design rule out that quick fix. The solution instead lies in an intelligent, dynamic mechanism for utilizing the hardware components of a device in an energy-efficient manner, while maintaining the Quality of Service (QoS) requirements of the applications running on the device. This thesis proposes the following Energy-aware Adaptive Solutions (EASE):
    1. BaSe-AMy: the Battery and Stream-aware Adaptive Multimedia Delivery algorithm assesses battery life, network characteristics, video-stream properties, and device hardware information in order to dynamically reduce the power consumption of the device while streaming video. The algorithm dynamically computes the most efficient strategy for altering the characteristics of the stream, the playback of the video, and the hardware utilization of the device, while meeting the application's QoS requirements.
    2. PowerHop: an algorithm which assesses network conditions, device power consumption, neighboring node devices, and QoS requirements to decide whether to adapt the transmission power or the number of hops that a device uses for communication. PowerHop's ability to dynamically reduce the transmission power of the device's Wireless Network Interface Card (WNIC) provides scope for reducing the power consumption of the device; shorter transmission distances with multiple hops can then be used to maintain network range.
    3. A comprehensive survey of adaptive energy optimizations in multimedia-centric wireless devices.
    Additional contributions include a custom video comparison tool, developed to facilitate objective assessment of streamed videos, and a new solution for high-accuracy mobile power logging, designed and implemented as part of this work.
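
    As a rough, hypothetical illustration of the kind of periodic decision loop such energy-aware adaptive solutions rely on (the state fields, thresholds, and actions below are invented placeholders, not the actual BaSe-AMy or PowerHop logic):

        # Hypothetical sketch of an energy-aware adaptation loop: inspect battery,
        # network and stream state, then pick the cheapest adaptation that still
        # meets the application's QoS target. Names and thresholds are invented.
        from dataclasses import dataclass

        @dataclass
        class DeviceState:
            battery_pct: float           # remaining battery, percent
            bandwidth_kbps: float        # measured downlink bandwidth
            stream_kbps: float           # current video bitrate
            min_acceptable_kbps: float   # QoS floor set by the application

        def choose_adaptation(s: DeviceState) -> str:
            """Return one adaptation action for the next control period."""
            if s.battery_pct > 50 and s.bandwidth_kbps >= s.stream_kbps:
                return "no_change"                            # plenty of energy and bandwidth
            lower = max(s.min_acceptable_kbps, 0.75 * s.stream_kbps)
            if lower < s.stream_kbps:
                return f"reduce_bitrate_to_{int(lower)}kbps"  # trade quality for energy
            return "dim_display_and_buffer_ahead"             # QoS floor reached; adapt hardware use

        print(choose_adaptation(DeviceState(35, 1800, 2000, 1200)))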

    Efficient streaming for high fidelity imaging

    Researchers and practitioners of graphics, visualisation and imaging have an ever-expanding list of technologies to account for, including (but not limited to) HDR, VR, 4K, 360°, light field and wide colour gamut. As these technologies move from theory to practice, the methods of encoding and transmitting this information need to become more advanced and capable year on year, placing greater demands on latency, bandwidth and encoding performance. High dynamic range (HDR) video is still in its infancy; the tools for capture, transmission and display of true HDR content are still restricted to professional technicians. Meanwhile, computer graphics are nowadays near-ubiquitous, but to achieve the highest fidelity in real or even reasonable time a user must be located at or near a supercomputer or other specialist workstation. These physical requirements mean that it is not always possible to demonstrate these graphics in any given place at any time, and when the graphics in question are intended to provide a virtual reality experience, the constraints on performance and latency are even tighter.
    This thesis presents an overall framework for adapting upcoming imaging technologies for efficient streaming, constituting novel work across three areas of imaging technology. Over the course of the thesis, high dynamic range capture, transmission and display are considered, before the focus shifts to the transmission and display of high-fidelity rendered graphics, including HDR graphics. Finally, the thesis considers the technical challenges posed by upcoming head-mounted displays (HMDs). In addition, a full literature review is presented across all three of these areas, detailing state-of-the-art methods for approaching all three problem sets.
    In the area of high dynamic range capture, transmission and display, a framework is presented and evaluated for efficient processing, streaming and encoding of high dynamic range video using general-purpose graphics processing unit (GPGPU) technologies. For remote rendering, state-of-the-art methods of augmenting a streamed graphical render are adapted to incorporate HDR video and high-fidelity graphics rendering, specifically with regard to path tracing. Finally, a novel method is proposed for streaming graphics to an HMD for virtual reality (VR). This method utilises 360° projections to transmit and reproject stereo imagery to an HMD with minimal latency, with an adaptation for the rapid local production of depth maps.
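
    The low-latency HMD streaming idea rests on the client being able to resample a received 360° projection for whatever direction the head is currently facing. The sketch below shows the textbook equirectangular lookup behind such a reprojection; the axis conventions and nearest-neighbour sampling are simplifying assumptions, not the thesis's exact pipeline.

        # Generic sketch of sampling a 360-degree equirectangular image along a view
        # direction, the basic operation behind reprojecting a streamed panorama on
        # the HMD side. This is a textbook mapping, not the thesis's exact pipeline.
        import numpy as np

        def direction_to_uv(d):
            """Map a unit view direction (x, y, z) to equirectangular texture coords in [0, 1)."""
            x, y, z = d / np.linalg.norm(d)
            lon = np.arctan2(x, -z)                 # longitude in (-pi, pi]
            lat = np.arcsin(y)                      # latitude in [-pi/2, pi/2]
            u = (lon / (2 * np.pi) + 0.5) % 1.0
            v = 0.5 - lat / np.pi
            return u, v

        def sample_panorama(pano, d):
            """Nearest-neighbour sample of an H x W x 3 panorama along direction d."""
            h, w, _ = pano.shape
            u, v = direction_to_uv(np.asarray(d, dtype=float))
            return pano[min(int(v * h), h - 1), min(int(u * w), w - 1)]

        pano = np.zeros((1024, 2048, 3), dtype=np.uint8)
        print(sample_panorama(pano, (0.0, 0.0, -1.0)))  # pixel straight ahead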

    High-level synthesis of dataflow programs for heterogeneous platforms: design flow tools and design space exploration

    The growing complexity of digital signal processing applications implemented in programmable logic and embedded processors makes a compelling case for the use of high-level methodologies in their design and implementation. Past research has shown that, for complex systems, raising the level of abstraction does not necessarily come at a cost in terms of performance or resource requirements; in fact, high-level synthesis tools supporting such a high level of abstraction often rival, and on occasion improve on, low-level designs. In spite of these successes, high-level synthesis still relies on programs being written with the target, and often the synthesis process, in mind. In other words, imperative languages such as C or C++, the languages most used for high-level synthesis, are either modified or restricted to a constrained subset in order to make parallelism explicit. In addition, a behavioral description that permits a unified approach to hardware and software design remains an elusive goal for heterogeneous platforms. A promising behavioral description capable of expressing both sequential and parallel applications is RVC-CAL, a dataflow programming language that permits design abstraction, modularity, and portability.
    The objective of this thesis is to provide a high-level synthesis solution for RVC-CAL dataflow programs and an RVC-CAL design flow for heterogeneous platforms. The main contributions of this thesis are: a high-level synthesis infrastructure that supports the full specification of RVC-CAL; an action selection strategy supporting parallel reads and writes of lists of tokens in hardware synthesis; dynamic fine-grained profiling of synthesized dataflow programs; an iterative design space exploration framework that permits the performance estimation, analysis, and optimization of heterogeneous platforms; and, finally, a clock gating strategy that reduces dynamic power consumption. Experimental results on all stages of the provided design flow demonstrate the capabilities of the tools for high-level synthesis, software/hardware co-design, design space exploration, and power optimization for reconfigurable hardware. Consequently, this work proves the viability of designing and implementing complex systems using dataflow programming, not only for system-level simulation but also for real heterogeneous implementations.
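
    For readers unfamiliar with the dataflow model that RVC-CAL builds on, the sketch below mimics its essentials in plain Python: actors exchange tokens over FIFO channels and fire only when their firing rule is satisfied. It is an illustration of the model of computation, not RVC-CAL syntax or the thesis's toolchain.

        # Language-neutral sketch of the dataflow model behind RVC-CAL: actors exchange
        # tokens over FIFO channels and fire only when their firing rule is satisfied.
        from collections import deque

        class Actor:
            def __init__(self, inputs, outputs):
                self.inputs, self.outputs = inputs, outputs

        class Adder(Actor):
            """Consumes one token from each input, produces their sum."""
            def can_fire(self):
                return all(len(q) >= 1 for q in self.inputs)
            def fire(self):
                a, b = (q.popleft() for q in self.inputs)
                self.outputs[0].append(a + b)

        a, b, out = deque([1, 2, 3]), deque([10, 20, 30]), deque()
        adder = Adder([a, b], [out])
        while adder.can_fire():          # a static schedule would resolve this at compile time
            adder.fire()
        print(list(out))                 # [11, 22, 33]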

    Robust Visual Heart Rate Estimation

    A novel heart rate estimator, HR-CNN, a two-step convolutional neural network, is presented. The network is trained end-to-end by alternating optimization to be robust to illumination changes and to relative movement of the subject and the camera. The network works well with face images roughly aligned by an off-the-shelf commercial frontal face detector. An extensive review of the literature on visual heart rate estimation identifies the key factors limiting the performance and reproducibility of existing methods: (i) a lack of publicly available datasets and incomplete descriptions of published experiments, (ii) the use of unreliable pulse oximeters for the ground-truth reference, and (iii) missing standard experimental protocols. A new, challenging, publicly available ECG-Fitness dataset with 205 sixty-second videos of subjects performing physical exercises is introduced. The dataset includes 17 subjects performing 4 activities (talking, rowing, exercising on a stepper and on a stationary bike), each captured by two RGB cameras, one attached to the fitness machine in use, which vibrates significantly, and the other to a separately standing tripod. For each subject, the "rowing" and "talking" activities are repeated a second time under halogen lamp lighting, and for 4 subjects the whole recording session is also lit by an LED light. HR-CNN outperforms the published methods on the dataset, reducing the error by more than half. Each ECG-Fitness activity contains a different combination of realistic challenges; HR-CNN performs best on the "rowing" activity, with a mean absolute error of 3.94, and worst on the "talking" activity, with a mean absolute error of 15.57.
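
    "Trained end-to-end by alternating optimization" refers to updating the two parts of the network in turns. As a generic illustration of that principle on a toy problem (block-coordinate minimization of a rank-1 fitting error), and emphatically not the HR-CNN training code:

        # Toy illustration of alternating optimization: minimize ||M - u v^T||^2 by
        # updating u with v fixed, then v with u fixed. HR-CNN alternates between its
        # two network parts in an analogous way; this shows only the general principle.
        import numpy as np

        rng = np.random.default_rng(0)
        M = rng.standard_normal((6, 4))
        u, v = rng.standard_normal(6), rng.standard_normal(4)

        for _ in range(50):
            u = M @ v / (v @ v)          # closed-form least-squares update for u, v fixed
            v = M.T @ u / (u @ u)        # closed-form least-squares update for v, u fixed

        print(np.linalg.norm(M - np.outer(u, v)))   # approaches the best rank-1 approximation error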

    Depth-Map Image Compression Based on Region and Contour Modeling

    In this thesis, the problem of depth-map image compression is treated. The compilation of articles included in the thesis provides methodological contributions in the fields of lossless and lossy compression of depth-map images.
    The first group of methods addresses the lossless compression problem. The introduced methods represent the depth-map image in terms of regions and contours. In the depth-map image, a segmentation defines the regions by grouping pixels having similar properties and separates them using (region) contours. The depth-map image is encoded by the contours and the auxiliary information needed to reconstruct the depth values in each region. One way of encoding the contours is to describe them using two matrices of horizontal and vertical contour edges. The matrices are encoded using template context coding where each context tree is optimally pruned. In certain contexts, the contour edges are found deterministically using only the currently available information. Another way of encoding the contours is to describe them as a sequence of contour segments. Each such segment is defined by an anchor (starting) point and a string of contour edges, equivalent to a string of chain-code symbols. Here we propose efficient ways to select and encode the anchor points and to generate contour segments by using a contour crossing-point analysis and by imposing rules that help minimize the number of anchor points.
    The regions are reconstructed at the decoder using predictive coding or the piecewise constant model representation. In the first approach, the large constant regions are found and one depth value is encoded for each such region. For the rest of the image, suitable regions are generated by constraining the local variation of the depth level from one pixel to another. The nonlinear predictors selected specifically for each region combine the results of several linear predictors, each fitting optimally a subset of pixels belonging to the local neighborhood. In the second approach, the depth value of a given region is encoded using the depth values of the neighboring regions already encoded. The natural smoothness of the depth variation and the mutual exclusiveness of the values in neighboring regions are exploited to efficiently predict and encode the current region's depth value.
    The second group of methods studies the lossy compression problem. In a first contribution, different segmentations are generated by varying the threshold for the local depth variability. A lossy depth-map image is obtained for each segmentation and is encoded based on predictive coding, quantization, and context tree coding. In another contribution, the lossy versions of one image are created either by successively merging the constant regions of the original image, or by iteratively splitting the regions of a template image using horizontal or vertical line segments. Merging and splitting decisions are taken greedily, according to the best slope towards the next point on the rate-distortion curve. An entropy coding algorithm is used to encode each image.
    We also propose a progressive coding method for coding the sequence of lossy versions of a depth-map image. The bitstream is encoded so that any lossy version of the original image can be generated, starting from a very low resolution up to lossless reconstruction. The partitions of the lossy versions into regions are assumed to be nested, so that a higher-resolution image is obtained by splitting some regions of a lower-resolution image. A current image in the sequence is encoded using a priori information from a previously encoded image: the anchor points are encoded relative to the already encoded contour points, and the depth information of the newly resulting regions is recovered using the depth value of the parent region.
    As a final contribution, the dissertation includes a study of the parameterization of planar models. The quantized heights at three pixel locations are used to compute the optimal plane for each region. The three pixel locations are selected so that the distortion due to the approximation of the plane over the region is minimized. The planar model and the piecewise constant model compete in the merging process, where the two regions to be merged are those ensuring the optimal slope on the rate-distortion curve.
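
    The planar model of the final contribution can be sketched directly: given the quantized depths at three chosen pixel locations of a region, the plane z = a x + b y + c through them follows from a 3 x 3 linear solve, and the region is approximated by evaluating that plane at every pixel. How the three locations are chosen to minimize distortion is the optimized part and is not reproduced here.

        # Minimal sketch of the planar model: fit z = a*x + b*y + c through the quantized
        # depths at three chosen pixel locations, then evaluate the plane over the region.
        # The distortion-minimizing choice of the three locations is not shown here.
        import numpy as np

        def plane_from_three(points):
            """points: three (x, y, z) tuples; returns plane coefficients (a, b, c)."""
            A = np.array([[x, y, 1.0] for x, y, _ in points])
            z = np.array([z for _, _, z in points])
            return np.linalg.solve(A, z)

        a, b, c = plane_from_three([(0, 0, 120), (7, 0, 124), (0, 7, 134)])

        # Approximate an 8x8 region with the plane and measure the squared error.
        xs, ys = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
        approx = a * xs + b * ys + c
        region = 120 + 0.5 * xs + 2.0 * ys          # synthetic depth data for the example
        print(float(((region - approx) ** 2).sum()))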

    Practical Real-Time with Look-Ahead Scheduling

    In my dissertation, I present ATLAS, the Auto-Training Look-Ahead Scheduler. ATLAS improves service to applications with regard to two non-functional properties: timeliness and overload detection. Timeliness is an important requirement for ensuring user-interface responsiveness and the smoothness of multimedia operations. Overload can occur when applications ask for more computation time than the machine can offer; interactive systems have to handle such overload situations dynamically at runtime. ATLAS provides timely service to applications, accessible through an easy-to-use interface: deadlines specify timing requirements, and workload metrics describe jobs. ATLAS employs machine learning to predict job execution times, so deadline misses are detected before they occur and applications can react early.
    Contents: 1 Introduction; 2 Anatomy of a Desktop Application; 3 Real Simple Real-Time; 4 Execution Time Prediction; 5 System Scheduler; 6 Timely Service; 7 The Road Ahead; Bibliography; Index
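
    The early-detection idea can be sketched in a few lines: predict a job's execution time from its workload metrics and compare it against the time remaining until the deadline. The names, metrics, and linear predictor below are invented for illustration; they are not the ATLAS interface or its learned model.

        # Hypothetical sketch of early deadline-miss detection: if the predicted execution
        # time does not fit before the deadline, notify the application now rather than
        # after the deadline has passed. Names and numbers are invented, not the ATLAS API.
        import time

        def predict_execution_time(workload_metrics, weights):
            """Linear model over workload metrics, standing in for the learned predictor."""
            return sum(w * m for w, m in zip(weights, workload_metrics))

        def will_miss_deadline(now, deadline, workload_metrics, weights, slack=0.0):
            return now + predict_execution_time(workload_metrics, weights) + slack > deadline

        now = time.monotonic()
        metrics = [2_000_000, 48]            # e.g. bytes to decode, macroblock rows (invented)
        weights = [3e-9, 1e-4]               # per-metric cost in seconds (invented)
        if will_miss_deadline(now, now + 0.010, metrics, weights):
            print("notify application: job will not meet its 10 ms deadline")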
