245 research outputs found

    DyPS: Dynamic Processor Switching for Energy-Aware Video Decoding on Multi-core SoCs

    Full text link
    In addition to General Purpose Processors (GPP), Multicore SoCs equipping modern mobile devices contain specialized Digital Signal Processor designed with the aim to provide better performance and low energy consumption properties. However, the experimental measurements we have achieved revealed that system overhead, in case of DSP video decoding, causes drastic performances drop and energy efficiency as compared to the GPP decoding. This paper describes DyPS, a new approach for energy-aware processor switching (GPP or DSP) according to the video quality . We show the pertinence of our solution in the context of adaptive video decoding and describe an implementation on an embedded Linux operating system with the help of the GStreamer framework. A simple case study showed that DyPS achieves 30% energy saving while sustaining the decoding performanc

    Distributed Coding/Decoding Complexity in Video Sensor Networks

    Get PDF
    Video Sensor Networks (VSNs) are recent communication infrastructures used to capture and transmit dense visual information from an application context. In such large scale environments which include video coding, transmission and display/storage, there are several open problems to overcome in practical implementations. This paper addresses the most relevant challenges posed by VSNs, namely stringent bandwidth usage and processing time/power constraints. In particular, the paper proposes a novel VSN architecture where large sets of visual sensors with embedded processors are used for compression and transmission of coded streams to gateways, which in turn transrate the incoming streams and adapt them to the variable complexity requirements of both the sensor encoders and end-user decoder terminals. Such gateways provide real-time transcoding functionalities for bandwidth adaptation and coding/decoding complexity distribution by transferring the most complex video encoding/decoding tasks to the transcoding gateway at the expense of a limited increase in bit rate. Then, a method to reduce the decoding complexity, suitable for system-on-chip implementation, is proposed to operate at the transcoding gateway whenever decoders with constrained resources are targeted. The results show that the proposed method achieves good performance and its inclusion into the VSN infrastructure provides an additional level of complexity control functionality

    An efficient hardware architecture for H.264 adaptive deblocking filter algorithm

    Get PDF
    This paper presents an efficient hardware architecture for real-time implementation of adaptive deblocking filter algorithm used in H.264 video coding standard. This hardware is designed to be used as part of a complete H.264 video coding system for portable applications. We use a novel edge filter ordering in a Macroblock to prevent the deblocking filter hardware from unnecessarily waiting for the pixels that will be filtered become available. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 72 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can code 30 CIF frames (352x288) per second

    Motion correlation based low complexity and low power schemes for video codec

    Get PDF
    制度:新 ; 報告番号:甲3750号 ; 学位の種類:博士(工学) ; 授与年月日:2012/11/19 ; 早大学位記番号:新6121Waseda Universit

    Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

    Get PDF
    This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35mW along with 0.167mm2 area at 333MHz

    Low Power Architectures for MPEG-4 AVC/H.264 Video Compression

    Get PDF

    Statistical framework for video decoding complexity modeling and prediction

    Get PDF
    Video decoding complexity modeling and prediction is an increasingly important issue for efficient resource utilization in a variety of applications, including task scheduling, receiver-driven complexity shaping, and adaptive dynamic voltage scaling. In this paper we present a novel view of this problem based on a statistical framework perspective. We explore the statistical structure (clustering) of the execution time required by each video decoder module (entropy decoding, motion compensation, etc.) in conjunction with complexity features that are easily extractable at encoding time (representing the properties of each module's input source data). For this purpose, we employ Gaussian mixture models (GMMs) and an expectation-maximization algorithm to estimate the joint execution-time - feature probability density function (PDF). A training set of typical video sequences is used for this purpose in an offline estimation process. The obtained GMM representation is used in conjunction with the complexity features of new video sequences to predict the execution time required for the decoding of these sequences. Several prediction approaches are discussed and compared. The potential mismatch between the training set and new video content is addressed by adaptive online joint-PDF re-estimation. An experimental comparison is performed to evaluate the different approaches and compare the proposed prediction scheme with related resource prediction schemes from the literature. The usefulness of the proposed complexity-prediction approaches is demonstrated in an application of rate-distortion-complexity optimized decoding

    Parallel HEVC Decoding on Multi- and Many-core Architectures : A Power and Performance Analysis

    Get PDF
    The Joint Collaborative Team on Video Decoding is developing a new standard named High Efficiency Video Coding (HEVC) that aims at reducing the bitrate of H.264/AVC by another 50 %. In order to fulfill the computational demands of the new standard, in particular for high resolutions and at low power budgets, exploiting parallelism is no longer an option but a requirement. Therefore, HEVC includes several coding tools that allows to divide each picture into several partitions that can be processed in parallel, without degrading the quality nor the bitrate. In this paper we adapt one of these approaches, the Wavefront Parallel Processing (WPP) coding, and show how it can be implemented on multi- and many-core processors. Our approach, named Overlapped Wavefront (OWF), processes several partitions as well as several pictures in parallel. This has the advantage that the amount of (thread-level) parallelism stays constant during execution. In addition, performance and power results are provided for three platforms: a server Intel CPU with 8 cores, a laptop Intel CPU with 4 cores, and a TILE-Gx36 with 36 cores from Tilera. The results show that our parallel HEVC decoder is capable of achieving an average frame rate of 116 fps for 4k resolution on a standard multicore CPU. The results also demonstrate that exploiting more parallelism by increasing the number of cores can improve the energy efficiency measured in terms of Joules per frame substantially

    Flexible Macroblock Ordering for Context-Aware Ultrasound Video Transmission over Mobile WiMAX

    Get PDF
    The most recent network technologies are enabling a variety of new applications, thanks to the provision of increased bandwidth and better management of Quality of Service. Nevertheless, telemedical services involving multimedia data are still lagging behind, due to the concern of the end users, that is, clinicians and also patients, about the low quality provided. Indeed, emerging network technologies should be appropriately exploited by designing the transmission strategy focusing on quality provision for end users. Stemming from this principle, we propose here a context-aware transmission strategy for medical video transmission over WiMAX systems. Context, in terms of regions of interest (ROI) in a specific session, is taken into account for the identification of multiple regions of interest, and compression/transmission strategies are tailored to such context information. We present a methodology based on H.264 medical video compression and Flexible Macroblock Ordering (FMO) for ROI identification. Two different unequal error protection methodologies, providing higher protection to the most diagnostically relevant data, are presented

    A Deeply Pipelined CABAC Decoder for HEVC Supporting Level 6.2 High-tier Applications

    Get PDF
    High Efficiency Video Coding (HEVC) is the latest video coding standard that specifies video resolutions up to 8K Ultra-HD (UHD) at 120 fps to support the next decade of video applications. This results in high-throughput requirements for the context adaptive binary arithmetic coding (CABAC) entropy decoder, which was already a well-known bottleneck in H.264/AVC. To address the throughput challenges, several modifications were made to CABAC during the standardization of HEVC. This work leverages these improvements in the design of a high-throughput HEVC CABAC decoder. It also supports the high-level parallel processing tools introduced by HEVC, including tile and wavefront parallel processing. The proposed design uses a deeply pipelined architecture to achieve a high clock rate. Additional techniques such as the state prefetch logic, latched-based context memory, and separate finite state machines are applied to minimize stall cycles, while multibypass- bin decoding is used to further increase the throughput. The design is implemented in an IBM 45nm SOI process. After place-and-route, its operating frequency reaches 1.6 GHz. The corresponding throughputs achieve up to 1696 and 2314 Mbin/s under common and theoretical worst-case test conditions, respectively. The results show that the design is sufficient to decode in real-time high-tier video bitstreams at level 6.2 (8K UHD at 120 fps), or main-tier bitstreams at level 5.1 (4K UHD at 60 fps) for applications requiring sub-frame latency, such as video conferencing
    corecore