108 research outputs found

    Computational Complexity Optimization on H.264 Scalable/Multiview Video Coding

    The H.264/MPEG-4 Advanced Video Coding (AVC) standard is more efficient and flexible than previous video coding standards. The high efficiency is achieved by utilizing a comprehensive full search motion estimation method. Although the H.264 standard improves the visual quality at low bitrates, it enormously increases the computational complexity. The research described in this thesis focuses on reducing the computational complexity of H.264 scalable and multiview video coding. Nowadays, video application areas range from multimedia messaging and mobile to high definition television, and they use different types of transmission systems. The Scalable Video Coding (SVC) extension of the H.264/AVC standard is able to scale the video stream in order to adapt to a variety of devices with different capabilities. Furthermore, a rate control scheme is utilized to improve the visual quality under the constraints of device capability and channel bandwidth. However, this increases the computational complexity. A simplified rate control scheme is proposed to reduce the computational complexity. In the proposed scheme, the quantisation parameter can be computed directly instead of using the exhaustive Rate-Quantisation model. The linear Mean Absolute Distortion (MAD) prediction model is used to predict scene changes, and the quantisation parameter is increased directly by a threshold when the scene changes abruptly; otherwise, the comprehensive Rate-Quantisation model is used. Results show that the optimized rate control scheme saves encoding time. Multiview Video Coding (MVC) is efficient at reducing the huge amount of data in multiple-view video coding. The inter-view reference frames from the adjacent views are exploited for prediction in addition to the temporal prediction. However, due to the increase in the number of reference frames, the computational complexity is also increased. In order to manage the reference frames efficiently, a phase correlation algorithm is utilized to remove inefficient inter-view reference frames from the reference list. The dependency between the inter-view reference frame and the current frame is decided based on the phase correlation coefficients. If the inter-view reference frame is highly related to the current frame, it remains enabled in the reference list; otherwise, it is disabled. The experimental results show that the proposed scheme saves encoding time without loss in visual quality or increase in bitrate. The proposed optimization algorithms are efficient in reducing the computational complexity of the H.264/AVC extensions. The low computational complexity algorithms are useful in the design of future video coding standards, especially for low power handheld devices.
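
The reference-pruning idea in this abstract can be illustrated with a minimal one-dimensional sketch. It is not the thesis's implementation: real frames would need a 2D transform, and the function names and the `threshold` value are hypothetical. The sketch normalises the cross-power spectrum of two signals, inverts it, and treats the height of the resulting peak as the correlation score used to keep or drop a reference.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def phase_correlation(current, reference, eps=1e-12):
    """Return (peak, shift) of the phase correlation of two equal-length signals."""
    f_cur = dft(current)
    f_ref = dft(reference)
    # Cross-power spectrum, normalised so that only phase information remains.
    cross = [r.conjugate() * c for r, c in zip(f_ref, f_cur)]
    normalised = [v / (abs(v) + eps) for v in cross]
    surface = [v.real for v in dft(normalised, inverse=True)]
    peak = max(surface)
    return peak, surface.index(peak)


def keep_reference(current, reference, threshold=0.5):
    """Hypothetical decision rule: keep the reference if the peak is high enough."""
    peak, _ = phase_correlation(current, reference)
    return peak >= threshold


reference = [3, 1, 4, 1, 5, 9, 2, 6]
current = reference[-2:] + reference[:-2]  # reference circularly shifted by 2
peak, shift = phase_correlation(current, reference)
print(round(peak, 6), shift)
```

A pure circular shift gives a peak close to 1 at the shift position, while unrelated content spreads the energy and lowers the peak, which is what makes the peak usable as a dependency measure.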

    Depth-based Multi-View 3D Video Coding


    Edge-based 3-D camera motion estimation with application to video coding


    A network transparent, retained mode multimedia processing framework for the Linux operating system environment

    The thesis presents a multimedia framework for Linux that, unlike earlier work, is based on the ideas of "retained-mode processing" and "lazy evaluation": instead of executing transformations immediately, an abstract representation of all media elements is built. "Renderer" drivers act as translators that convert this representation into concrete operations at run time, and the data model permits numerous optimizations that reduce the number of steps or minimize communication. This allows a greatly simplified programming model together with improved efficiency. "Renderer" drivers can use the local processor to execute transformations, or they can delegate the operations. The thesis presents an extension of the X Window System with mechanisms for media processing, as well as a "renderer" driver that uses these mechanisms to delegate the processing.
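
As a rough illustration of retained-mode processing with lazy evaluation, the sketch below records operations on a media element instead of executing them, and a local "renderer" merges adjacent operations before running the plan. All class and function names are hypothetical and are not taken from the framework described in the thesis.

```python
class Media:
    """Abstract description of a media element; nothing is computed here."""

    def __init__(self, samples, ops=()):
        self.samples = list(samples)
        self.ops = tuple(ops)

    def gain(self, factor):
        # Record the operation; evaluation is deferred to a renderer.
        return Media(self.samples, self.ops + (("gain", factor),))

    def offset(self, amount):
        return Media(self.samples, self.ops + (("offset", amount),))


def optimise(ops):
    """Merge adjacent operations of the same kind into a single step."""
    merged = []
    for kind, value in ops:
        if merged and merged[-1][0] == kind:
            previous = merged.pop()[1]
            value = previous * value if kind == "gain" else previous + value
        merged.append((kind, value))
    return merged


def render_local(media):
    """Hypothetical renderer: execute the optimised plan on the local CPU."""
    plan = optimise(media.ops)
    out = media.samples
    for kind, value in plan:
        if kind == "gain":
            out = [s * value for s in out]
        else:
            out = [s + value for s in out]
    return out, len(plan)


clip = Media([1.0, 2.0]).gain(2.0).gain(3.0).offset(1.0)
print(render_local(clip))
```

Because the three recorded operations are only a description, the renderer is free to fold the two gains into one step; a different renderer could instead ship the same description to a remote server.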

    Scalable video compression with optimized visual performance and random accessibility

    This thesis is concerned with maximizing the coding efficiency, random accessibility and visual performance of scalable compressed video. The unifying theme behind this work is the use of finely embedded localized coding structures, which govern the extent to which these goals may be jointly achieved. The first part focuses on scalable volumetric image compression. We investigate 3D transform and coding techniques which exploit inter-slice statistical redundancies without compromising slice accessibility. Our study shows that the motion-compensated temporal discrete wavelet transform (MC-TDWT) practically achieves an upper bound to the compression efficiency of slice transforms. From a video coding perspective, we find that most of the coding gain is attributed to offsetting the learning penalty in adaptive arithmetic coding through 3D code-block extension, rather than inter-frame context modelling. The second aspect of this thesis examines random accessibility. Accessibility refers to the ease with which a region of interest is accessed (subband samples needed for reconstruction are retrieved) from a compressed video bitstream, subject to spatiotemporal code-block constraints. We investigate the fundamental implications of motion compensation for random access efficiency and the compression performance of scalable interactive video. We demonstrate that inclusion of motion compensation operators within the lifting steps of a temporal subband transform incurs a random access penalty which depends on the characteristics of the motion field. The final aspect of this thesis aims to minimize the perceptual impact of visible distortion in scalable reconstructed video. We present a visual optimization strategy based on distortion scaling which raises the distortion-length slope of perceptually significant samples. 
This alters the codestream embedding order during post-compression rate-distortion optimization, thus allowing visually sensitive sites to be encoded with higher fidelity at a given bit-rate. For visual sensitivity analysis, we propose a contrast perception model that incorporates an adaptive masking slope. This versatile feature provides a context which models perceptual significance. It enables scene structures that otherwise suffer significant degradation to be preserved at lower bit-rates. The novelty in our approach derives from a set of "perceptual mappings" which account for quantization noise shaping effects induced by motion-compensated temporal synthesis. The proposed technique reduces wavelet compression artefacts and improves the perceptual quality of video
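
The effect of distortion scaling on embedding order can be sketched as follows. This is a simplified illustration with one contribution per code-block; the block names, numbers, and visual weights are invented for the example, and the weighting rule is an assumption, not the thesis's perceptual model.

```python
def embedding_order(contributions, visual_weights=None):
    """Order code-block contributions by (scaled) distortion-length slope.

    contributions: list of (block_id, distortion_reduction, length_in_bytes).
    visual_weights: optional dict of block_id -> distortion scaling factor.
    """
    weights = visual_weights or {}
    scored = []
    for block_id, delta_d, delta_l in contributions:
        slope = weights.get(block_id, 1.0) * delta_d / delta_l
        scored.append((slope, block_id))
    # Steeper slopes are embedded earlier in the codestream.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [block_id for _, block_id in scored]


blocks = [("flat", 400.0, 100), ("edge", 300.0, 100)]
print(embedding_order(blocks))
print(embedding_order(blocks, {"edge": 2.0}))
```

Scaling the distortion of the perceptually significant block raises its slope, so it moves ahead in the embedding order and is reconstructed with higher fidelity at any truncated bit-rate.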

    Improved quality block-based low bit rate video coding.

    The aim of this research is to develop algorithms for enhancing the subjective quality and coding efficiency of standard block-based video coders. In the past few years, numerous video coding standards based on the motion-compensated block-transform structure have been established, where block-based motion estimation is used for reducing the correlation between consecutive images and a block transform is used for coding the resulting motion-compensated residual images. Due to the use of predictive differential coding and variable length coding techniques, the output data rate exhibits extreme fluctuations. A rate control algorithm is devised for achieving a stable output data rate. This rate control algorithm, which is essentially a bit-rate estimation algorithm, is then employed in a bit-allocation algorithm for improving the visual quality of the coded images, based on some prior knowledge of the images. Block-based hybrid coders achieve high compression ratios mainly due to the employment of a motion estimation and compensation stage in the coding process. The conventional bit-allocation strategy for these coders simply assigns the bits required by the motion vectors and gives the rest to the residual image. However, at very low bit-rates, this bit-allocation strategy is inadequate, as the motion vector bits take up a considerable portion of the total bit-rate. A rate-constrained selection algorithm is presented where an analysis-by-synthesis approach is used for choosing the best motion vectors in terms of resulting bit rate and image quality. This selection algorithm is then implemented for mode selection. A simple algorithm based on the above-mentioned bit-rate estimation algorithm is developed for the latter to reduce the computational complexity. For very low bit-rate applications, it is well known that block-based coders suffer from blocking artifacts. A coding mode is presented for reducing these annoying artifacts by coding a down-sampled version of the residual image with a smaller quantisation step size. Its applications for adaptive source/channel coding and for coding fast changing sequences are examined.
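
Rate-constrained motion vector selection is commonly written as minimising a Lagrangian cost D + λR. The sketch below uses that common formulation as an assumption; the abstract does not state the exact cost function, and the candidate values are invented.

```python
def select_motion_vector(candidates, lam):
    """Pick the candidate with the lowest cost: distortion + lam * rate.

    candidates: iterable of (motion_vector, distortion, rate_bits) tuples,
    where distortion and rate would come from actually coding the block
    (the analysis-by-synthesis step).
    """
    best = min(candidates, key=lambda c: c[1] + lam * c[2])
    return best[0]


candidates = [
    ((0, 0), 120.0, 2),    # cheap to signal, poorer prediction
    ((3, -1), 90.0, 14),
    ((5, 2), 85.0, 22),    # best prediction, most expensive vector
]
print(select_motion_vector(candidates, lam=0.0))
print(select_motion_vector(candidates, lam=1.0))
print(select_motion_vector(candidates, lam=5.0))
```

With λ = 0 the choice is driven by distortion alone; as λ grows, which corresponds to tighter bit budgets, cheaper vectors win even though they predict less accurately.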

    Irregular Variable Length Coding

    In this thesis, we introduce Irregular Variable Length Coding (IrVLC) and investigate its applications, characteristics and performance in the context of digital multimedia broadcast telecommunications. During IrVLC encoding, the multimedia signal is represented using a sequence of concatenated binary codewords. These are selected from a codebook, comprising a number of codewords, which, in turn, comprise various numbers of bits. However, during IrVLC encoding, the multimedia signal is decomposed into particular fractions, each of which is represented using a different codebook. This is in contrast to regular Variable Length Coding (VLC), in which the entire multimedia signal is encoded using the same codebook. The application of IrVLCs to joint source and channel coding is investigated in the context of a video transmission scheme. Our novel video codec represents the video signal using tessellations of Variable-Dimension Vector Quantisation (VDVQ) tiles. These are selected from a codebook, comprising a number of tiles having various dimensions. The selected tessellation of VDVQ tiles is signalled using a corresponding sequence of concatenated codewords from a Variable Length Error Correction (VLEC) codebook. This VLEC codebook represents a specific joint source and channel coding case of VLCs, which facilitates both compression and error correction. However, during video encoding, only particular combinations of the VDVQ tiles will perfectly tessellate, owing to their various dimensions. As a result, only particular sub-sets of the VDVQ codebook and, hence, of the VLEC codebook may be employed to convey particular fractions of the video signal. Therefore, our novel video codec can be said to employ IrVLCs. The employment of IrVLCs to facilitate Unequal Error Protection (UEP) is also demonstrated. 
This may be applied when various fractions of the source signal have different error sensitivities, as is typical in audio, speech, image and video signals, for example. Here, different VLEC codebooks having appropriately selected error correction capabilities may be employed to encode the particular fractions of the source signal. This approach may be expected to yield a higher reconstruction quality than equal protection in cases where the various fractions of the source signal have different error sensitivities. Finally, this thesis investigates the application of IrVLCs to near-capacity operation using EXtrinsic Information Transfer (EXIT) chart analysis. Here, a number of component VLEC codebooks having different inverted EXIT functions are employed to encode particular fractions of the source symbol frame. We show that the composite inverted IrVLC EXIT function may be obtained as a weighted average of the inverted component VLC EXIT functions. Additionally, EXIT chart matching is employed to shape the inverted IrVLC EXIT function to match the EXIT function of a serially concatenated inner channel code, creating a narrow but still open EXIT chart tunnel. In this way, iterative decoding convergence to an infinitesimally low probability of error is facilitated at near-capacity channel SNRs
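
The weighted-average property stated in this abstract can be illustrated numerically. In the sketch below, each component curve is an inverted EXIT function sampled on a common grid, and the weights are assumed to be the fractions contributed by each component codebook; the sample values are invented and do not come from the thesis.

```python
def composite_inverted_exit(component_curves, weights):
    """Weighted average of component inverted EXIT curves on a common grid."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_points = len(component_curves[0])
    if any(len(curve) != n_points for curve in component_curves):
        raise ValueError("all curves must share the same grid")
    return [
        sum(w * curve[i] for w, curve in zip(weights, component_curves))
        for i in range(n_points)
    ]


curves = [
    [0.0, 0.5, 1.0],  # hypothetical component codebook 1
    [0.2, 0.6, 1.0],  # hypothetical component codebook 2
]
print(composite_inverted_exit(curves, [0.25, 0.75]))
```

EXIT chart matching then amounts to choosing the weights so that this composite curve stays just clear of the inner code's EXIT function, leaving a narrow open tunnel.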

    Multimedia Applications of the Wavelet Transform

    This dissertation investigates novel applications of the wavelet transform in the analysis and compression of audio, still images, and video. Most recently, some surveys have been published on the restoration of noisy audio signals. Based on these, we have developed a wavelet-based denoising program for audio signals that allows flexible parameter settings. The multiscale property of the wavelet transform can successfully be exploited for the detection of semantic structures in images: a comparison of the coefficients allows the extraction of a predominant structure. This idea forms the basis of our semiautomatic edge detection algorithm. Empirical evaluations and the resulting recommendations follow. In the context of the teleteaching project Virtual University of the Upper Rhine Valley (VIROR), many lectures were transmitted between remote locations. We thus encountered the problem of scaling a video stream to different access bandwidths in the Internet. A substantial contribution of this dissertation is the introduction of the wavelet transform into hierarchical video coding and the recommendation of parameter settings based on empirical surveys. Furthermore, a prototype implementation demonstrates the feasibility in principle of a wavelet-based, nearly arbitrarily scalable application. Mathematical transformations constitute a commonly underestimated problem for students in their first semesters of study. Motivated by the VIROR project, we spent a considerable amount of time and effort on the exploration of approaches to enhance mathematical topics with multimedia; both the technical design and the didactic integration into the curriculum are discussed. In a large field trial on "traditional teaching versus multimedia-enhanced teaching", the objective knowledge gained by the students was measured. This allows us to rate the efficiency of our teaching modules positively on an objective basis.