132 research outputs found

    Workload model for video decoding and its applications

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    Signal processing for improved MPEG-based communication systems

    Get PDF

    Quality-aware performance analysis for multimedia MPSoC platforms

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Reconfigurable Computing For Video Coding

    Get PDF
    Video coding is widely used in our daily life. Due to its high computational complexity, hardware implementation is usually preferred. In this research, we investigate both ASIC hardware design approach and reconfigurable hardware design approach for video coding applications. First, we present a unified architecture that can perform Discrete Cosine Transform (DCT), Inverse Discrete Cosine Transform (IDCT), DCT domain motion estimation and compensation (DCT-ME/MC). Our proposed architecture is a Wavefront Array-based Processor with a highly modular structure consisting of 8*8 Processing Elements (PEs). By utilizing statistical properties and arithmetic operations, it can be used as a high performance hardware accelerator for video transcoding applications. We show how different core algorithms can be mapped onto the same hardware fabric and can be executed through the pre-defined PEs. In addition to the simplified design process of the proposed architecture and savings of the hardware resources, we also demonstrate that high throughput rate can be achieved for IDCT and DCT-MC by fully utilizing the sparseness property of DCT coefficient matrix. Compared to fixed hardware architecture using ASIC design approach, reconfigurable hardware design approach has higher flexibility, lower cost, and faster time-to-market. We propose a self-reconfigurable platform which can reconfigure the architecture of DCT computations during run-time using dynamic partial reconfiguration. The scalable architecture for DCT computations can compute different number of DCT coefficients in the zig-zag scan order to adapt to different requirements, such as power consumption, hardware resource, and performance. We propose a configuration manager which is implemented in the embedded processor in order to adaptively control the reconfiguration of scalable DCT architecture during run-time. In addition, we use LZSS algorithm for compression of the partial bitstreams and on-chip BlockRAM as a cache to reduce latency overhead for loading the partial bitstreams from the off-chip memory for run-time reconfiguration. A hardware module is designed for parallel reconfiguration of the partial bitstreams. The experimental results show that our approach can reduce the external memory accesses by 69% and can achieve 400 MBytes/s reconfiguration rate. Detailed trade-offs of power, throughput, and quality are investigated, and used as a criterion for self-reconfiguration. Prediction algorithm of zero quantized DCT (ZQDCT) to control the run-time reconfiguration of the proposed scalable architecture has been used, and 12 different modes of DCT computations including zonal coding, multi-block processing, and parallel-sequential stage modes are supported to reduce power consumptions, required hardware resources, and computation time with a small quality degradation. Detailed trade-offs of power, throughput, and quality are investigated, and used as a criterion for self-reconfiguration to meet the requirements set by the users

    Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing

    Get PDF
    Since the end of transistor scaling in 2-D appeared on the horizon, innovative circuit design paradigms have been on the rise to go beyond the well-established and ultraconservative exact computing. Many compute-intensive applications – such as video processing – exhibit an intrinsic error resilience and do not necessarily require perfect accuracy in their numerical operations. Approximate computing (AxC) is emerging as a design alternative to improve the performance and energy-efficiency requirements for many applications by trading its intrinsic error tolerance with algorithm and circuit efficiency. Exact computing also imposes a worst-case timing to the conventional design of hardware accelerators to ensure reliability, leading to an efficiency loss. Conversely, the timing-speculative (TS) hardware design paradigm allows increasing the frequency or decreasing the voltage beyond the limits determined by static timing analysis (STA), thereby narrowing pessimistic safety margins that conventional design methods implement to prevent hardware timing errors. Timing errors should be evaluated by an accurate gate-level simulation, but a significant gap remains: How these timing errors propagate from the underlying hardware all the way up to the entire algorithm behavior, where they just may degrade the performance and quality of service of the application at stake? This thesis tackles this issue by developing and demonstrating a cross-layer framework capable of performing investigations of both AxC (i.e., from approximate arithmetic operators, approximate synthesis, gate-level pruning) and TS hardware design (i.e., from voltage over-scaling, frequency over-clocking, temperature rising, and device aging). The cross-layer framework can simulate both timing errors and logic errors at the gate-level by crossing them dynamically, linking the hardware result with the algorithm-level, and vice versa during the evolution of the application’s runtime. Existing frameworks perform investigations of AxC and TS techniques at circuit-level (i.e., at the output of the accelerator) agnostic to the ultimate impact at the application level (i.e., where the impact is truly manifested), leading to less optimization. Unlike state of the art, the framework proposed offers a holistic approach to assessing the tradeoff of AxC and TS techniques at the application-level. This framework maximizes energy efficiency and performance by identifying the maximum approximation levels at the application level to fulfill the required good enough quality. This thesis evaluates the framework with an 8-way SAD (Sum of Absolute Differences) hardware accelerator operating into an HEVC encoder as a case study. Application-level results showed that the SAD based on the approximate adders achieve savings of up to 45% of energy/operation with an increase of only 1.9% in BD-BR. On the other hand, VOS (Voltage Over-Scaling) applied to the SAD generates savings of up to 16.5% in energy/operation with around 6% of increase in BD-BR. The framework also reveals that the boost of about 6.96% (at 50°) to 17.41% (at 75° with 10- Y aging) in the maximum clock frequency achieved with TS hardware design is totally lost by the processing overhead from 8.06% to 46.96% when choosing an unreliable algorithm to the blocking match algorithm (BMA). We also show that the overhead can be avoided by adopting a reliable BMA. This thesis also shows approximate DTT (Discrete Tchebichef Transform) hardware proposals by exploring a transform matrix approximation, truncation and pruning. The results show that the approximate DTT hardware proposal increases the maximum frequency up to 64%, minimizes the circuit area in up to 43.6%, and saves up to 65.4% in power dissipation. The DTT proposal mapped for FPGA shows an increase of up to 58.9% on the maximum frequency and savings of about 28.7% and 32.2% on slices and dynamic power, respectively compared with stat

    QoS framework for video streaming in home networks

    Get PDF
    In this thesis we present a new SNR scalable video coding scheme. An important advantage of the proposed scheme is that it requires just a standard video decoder for processing each layer. The quality of the delivered video depends on the allocation of bit rates to the base and enhancement layers. For a given total bit rate, the combination with a bigger base layer delivers higher quality. The absence of dependencies between frames in enhancement layers makes the system resilient to losses of arbitrary frames from an enhancement layer. Furthermore, that property can be used in a more controlled fashion. An important characteristic of any video streaming scheme is the ability to handle network bandwidth fluctuations. We made a streaming technique that observes the network conditions and based on the observations reconfigures the layer configuration in order to achieve the best possible quality. A change of the network conditions forces a change in the number of layers or the bit rate of these layers. Knowledge of the network conditions allows delivery of a video of higher quality by choosing an optimal layer configuration. When the network degrades, the amount of data transmitted per second is decreased by skipping frames from an enhancement layer on the sender side. The presented video coding scheme allows skipping any frame from an enhancement layer, thus enabling an efficient real-time control over transmission at the network level and fine-grained control over the decoding of video data. The methodology proposed is not MPEG-2 specific and can be applied to other coding standards. We made a terminal resource manager that enables trade-offs between quality and resource consumption due to the use of scalable video coding in combination with scalable video algorithms. The controller developed for the decoding process optimizes the perceived quality with respect to the CPU power available and the amount of input data. The controller does not depend on the type of scalability technique and can therefore be used with any scalable video. The controller uses the strategy that is created offline by means of a Markov Decision Process. During the evaluation it was found that the correctness of the controller behavior depends on the correctness of parameter settings for MDP, so user tests should be employed to find the optimal settings
    corecore