34 research outputs found

    A 249-Mpixel/s HEVC Video-Decoder Chip for 4K Ultra-HD Applications

    High Efficiency Video Coding, the latest video standard, uses larger and variable-sized coding units and longer interpolation filters than H.264/AVC to better exploit redundancy in video signals. These algorithmic techniques enable a 50% decrease in bitrate at the cost of computational complexity, external memory bandwidth, and, for ASIC implementations, on-chip SRAM of the video codec. This paper describes architectural optimizations for an HEVC video decoder chip. The chip uses a two-stage subpipelining scheme to reduce on-chip SRAM by 56 kbytes, a 32% reduction. A high-throughput read-only cache combined with DRAM-latency-aware memory mapping reduces DRAM bandwidth by 67%. The chip is built for HEVC Working Draft 4 Low Complexity configuration and occupies 1.77 mm² in 40-nm CMOS. It performs 4K Ultra HD 30-fps video decoding at 200 MHz while consuming 1.19 nJ/pixel of normalized system power. Texas Instruments Incorporated
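
    The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is not taken from the paper: it assumes that "4K Ultra HD" means 3840 x 2160 pixels per frame, and the derived system power and pre-optimization SRAM size are back-of-the-envelope values implied by the quoted numbers, not reported results.

```python
# Sanity check of the figures quoted in the abstract (a sketch, not from the paper).
# Assumption: "4K Ultra HD" means 3840 x 2160 pixels per frame.
WIDTH, HEIGHT, FPS = 3840, 2160, 30
CLOCK_HZ = 200e6               # 200 MHz, as quoted
ENERGY_PER_PIXEL_J = 1.19e-9   # 1.19 nJ/pixel, as quoted
SRAM_SAVED_KB = 56             # 56 kbytes saved, as quoted
SRAM_SAVED_FRACTION = 0.32     # stated to be a 32% reduction


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels decoded per second for a given resolution and frame rate."""
    return width * height * fps


rate = pixel_rate(WIDTH, HEIGHT, FPS)

# Average number of pixels the decoder must produce per clock cycle.
pixels_per_cycle = rate / CLOCK_HZ

# System power implied by the normalized energy figure (derived, not reported).
system_power_w = ENERGY_PER_PIXEL_J * rate

# SRAM size before the optimization implied by "56 kbytes is 32%" (derived).
sram_before_kb = SRAM_SAVED_KB / SRAM_SAVED_FRACTION

print(f"throughput: {rate / 1e6:.1f} Mpixel/s")
print(f"pixels per cycle at 200 MHz: {pixels_per_cycle:.2f}")
print(f"implied system power: {system_power_w:.3f} W")
print(f"implied SRAM before optimization: {sram_before_kb:.0f} kbytes")
```

    Under that resolution assumption the throughput comes to about 248.8 Mpixel/s, which matches the 249 Mpixel/s in the title, and the decoder needs to sustain roughly 1.24 pixels per clock cycle at 200 MHz.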

    Decoder Hardware Architecture for HEVC

    This chapter provides an overview of the design challenges faced in the implementation of hardware HEVC decoders. These challenges can be attributed to the larger and more diverse coding block sizes and transform sizes, the larger interpolation filter for motion compensation, the increased number of steps in intra prediction, and the introduction of a new in-loop filter. Several solutions to address these implementation challenges are discussed. As a reference, results for an HEVC decoder test chip are also presented. Texas Instruments Incorporated

    Scalability of parallel video decoding on heterogeneous manycore architectures

    This paper presents an analysis of the scalability of parallel video decoding on heterogeneous manycore architectures. As a benchmark, we use a highly parallel H.264/AVC video decoder that generates a large number of independent tasks. In order to translate task-level parallelism into performance gains, both the video decoder and the architecture have been optimized. First, the video decoder was modified to exploit coarse-grain frame-level parallelism in the entropy decoding kernel, which has been considered the main bottleneck. Second, a heterogeneous combination of cores is evaluated for executing different types of tasks. Finally, an evaluation of the memory requirements of the whole system has been carried out. Experiments conducted using a trace-driven simulation methodology show that the evaluated system exhibits good parallel scalability up to 68 cores. At this point the parallel video decoder is able to decode more than 200 HD frames per second using simple low-power processors. Postprint (published version)

    SIMD based multicore processor for image and video processing

    System: new; Report number: Kō No. 3602; Degree type: Doctor of Engineering; Date conferred: 2012/3/15; Waseda University diploma number: Shin 595

    A fixed-point simd array processor and its applications to video compression coding

    A review of image compression algorithms and their processor architectures -- MPEG standard -- Motion estimation algorithm -- Processor architecture review -- SIMD architecture of the pulse chip -- Implementing a convolution on pulse -- The convolution algorithm versus pulse architectural features -- Structure of the convolution software -- Motion estimation algorithms and implementations -- Motion estimation algorithm -- Gradual full search method and full search algorithm with the pulse chip -- DCT & IDCT algorithms and implementations -- Image processing with pulse chips and a C40 processor

    Exploration of communication strategies for computation intensive Systems-On-Chip

    Energy-Efficient Computing for Mobile Signal Processing

    Mobile devices have rapidly proliferated, and deployment of handheld devices continues to increase at a spectacular rate. As today's devices not only support advanced signal processing of wireless communication data but also provide rich sets of applications, contemporary mobile computing requires both demanding computation and efficiency. Most mobile processors combine general-purpose processors, digital signal processors, and hardwired application-specific integrated circuits to satisfy their high-performance and low-power requirements. However, such a heterogeneous platform is inefficient in area, power and programmability. Improving the efficiency of programmable mobile systems is a critical challenge and an active area of computer systems research. SIMD (single instruction multiple data) architectures are very effective for data-level-parallelism intense algorithms in mobile signal processing. However, new characteristics of advanced wireless/multimedia algorithms require architectural re-evaluation to achieve better energy efficiency. Therefore, fourth generation wireless protocol and high definition mobile video algorithms are analyzed to enhance a wide-SIMD architecture. The key enhancements include 1) programmable crossbar to support complex data alignment, 2) SIMD partitioning to support fine-grain SIMD computation, and 3) fused operation to support accelerating frequently used instruction pairs. Near-threshold computation has been attractive in low-power architecture research because it balances performance and power. To further improve energy efficiency in mobile computing, near-threshold computation is applied to a wide SIMD architecture. 
    The proposed near-threshold wide SIMD architecture, Diet SODA, presents interesting architectural design decisions such as 1) a very wide SIMD datapath to compensate for the degraded performance induced by near-threshold computation and 2) a scatter-gather data prefetcher to exploit the large latency gap between memory and the SIMD datapath. Although near-threshold computation provides excellent energy efficiency, it suffers from increased delay variations. A systematic study of delay variations in near-threshold computing is performed, and simple techniques (structural duplication and voltage/frequency margining) are explored to tolerate and mitigate the delay variations in near-threshold wide SIMD architectures. This dissertation analyzes representative wireless/multimedia mobile signal processing algorithms, proposes an energy-efficient programmable platform, and evaluates performance and power. A main theme of this dissertation is that the performance and efficiency of programmable embedded systems can be significantly improved with a combination of parallel SIMD and near-threshold computations. Ph.D., Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/86356/1/swseo_1.pd

    Low Power Architectures for MPEG-4 AVC/H.264 Video Compression

    Design and Analysis of Medium Access Control Protocols for Broadband Wireless Networks

    Next-generation wireless networks are expected to integrate diverse network architectures and wireless access technologies, such as wireless local area networks (WLANs) and Ultra-Wideband (UWB) and millimeter-wave (mmWave) based wireless personal area networks (WPANs), to provide a robust solution for ubiquitous broadband wireless access. To enhance spectral efficiency and link reliability, smart antenna systems have been proposed as a promising candidate for future broadband access networks. To effectively exploit the increased capabilities of the emerging wireless networks, the different network characteristics and the underlying physical layer features need to be considered in the medium access control (MAC) design, which plays a critical role in providing efficient and fair resource sharing among multiple users. In this thesis, we comprehensively investigate the MAC design in both single- and multi-hop broadband wireless networks, with and without infrastructure support. We first develop mathematical models to identify the performance bottlenecks and constraints in the design and operation of existing MAC protocols. We then use a cross-layer approach to mitigate the identified bottleneck problems. Finally, by evaluating the performance of the proposed protocols with analytical models and extensive simulations, we determine the optimal protocol parameters to maximize the network performance. Specifically, a generic analytical framework is developed for the capacity study of an IEEE 802.11 WLAN in support of non-persistent asymmetric traffic flows. The analysis can be applied for effective admission control to guarantee the quality of service (QoS) performance of multimedia applications. As the access point (AP) becomes the bottleneck in an infrastructure-based WLAN, we explore the multiple-input multiple-output (MIMO) capability in future IEEE 802.11n WLANs and propose a MIMO-aware multi-user (MU) MAC. 
    By exploiting the multi-user degree of freedom in a MIMO system to allow the AP to communicate with multiple users in the downlink simultaneously, the proposed MU MAC can minimize the AP-bottleneck effect and significantly improve the network capacity. Other enhanced MAC mechanisms, e.g., frame aggregation and bidirectional transmissions, are also studied. Furthermore, unlike a narrowband system, where simultaneous transmissions by nearby neighbors collide with each other, a wideband system can support multiple concurrent transmissions if the multi-user interference is properly managed. Taking advantage of the salient features of UWB and mmWave communications, we propose an exclusive region (ER) based MAC protocol to exploit the spatial multiplexing gain of centralized UWB and mmWave based wireless networks. Moreover, instead of studying the asymptotic capacity bounds of arbitrary networks, which may be too loose to be useful in realistic networks, we derive the expected capacity or transport capacity of UWB and mmWave based networks with random topology. The analysis reveals the main factors affecting the network (transport) capacity and shows how to determine the best protocol parameters to maximize the network capacity. In addition, due to the limited transmission range, multi-hop relaying is necessary to extend the communication coverage of UWB networks. A simple, scalable, and distributed UWB MAC protocol is crucial for efficiently utilizing the large bandwidth of UWB channels and enabling numerous new applications cost-effectively. To address this issue, we further design a distributed asynchronous ER based MAC for multi-hop UWB networks and derive the optimal ER size for maximum network throughput. The proposed MAC can significantly improve both network throughput and fairness, whereas throughput and fairness are usually treated as a tradeoff in other MAC protocols.

    Future benefits and applications of intelligent on-board processing to VSAT services

    The trends and roles of VSAT services in the year 2010 time frame are examined based on an overall network and service model for that period. An estimate of the VSAT traffic is then made, and the service and general network requirements are identified. To accommodate these traffic needs, four satellite VSAT architectures are suggested, based on the use of fixed or scanning multibeam antennas in conjunction with IF switching or onboard regeneration and baseband processing. The performance of each of these architectures is assessed and the key enabling technologies are identified.