14 research outputs found

    Real-time interactive video streaming over lossy networks: high performance low delay error resilient algorithms

    Get PDF
    According to Cisco's latest forecast, two-thirds of the world's mobile data traffic and 62 percent of the consumer Internet traffic will be video data by the end of 2016. However, the wireless networks and Internet are unreliable, where the video traffic may undergo packet loss and delay. Thus robust video streaming over unreliable networks, i.e., Internet, wireless networks, is of great importance in facing this challenge. Specifically, for the real-time interactive video streaming applications, such as video conference and video telephony, the allowed end-to-end delay is limited, which makes the robust video streaming an even more difficult task. In this thesis, we are going to investigate robust video streaming for real-time interactive applications, where the tolerated end-to-end delay is limited. Intra macroblock refreshment is an effective tool to stop error propagations in the prediction loop of video decoder, whereas redundant coding is a commonly used method to prevent error from happening for video transmission over lossy networks. In this thesis two schemes that jointly use intra macroblock refreshment and redundant coding are proposed. In these schemes, in addition to intra coding, we proposed to add two redundant coding methods to enhance the transmission robustness of the coded bitstreams. The selection of error resilient coding tools, i.e., intra coding and/or redundant coding, and the parameters for redundant coding are determined using the end-to-end rate-distortion optimization. Another category of methods to provide error resilient capacity is using forward error correction (FEC) codes. FEC is widely studied to protect streamed video over unreliable networks, with Reed-Solomon (RS) erasure codes as its commonly used implementation method. As a block-based error correcting code, on the one hand, enlarging the block size can enhance the performance of the RS codes; on the other hand, large block size leads to long delay which is not tolerable for real-time video applications. In this thesis two sub-GOP (Group of Pictures, formed by I-frame and all the following P/B-frames) based FEC schemes are proposed to improve the performance of Reed-Solomon codes for real-time interactive video applications. The first one, named DSGF (Dynamic sub-GOP FEC Coding), is designed for the ideal case, where no transmission network delay is taken into consideration. The second one, named RVS-LE (Real-time Video Streaming scheme exploiting the Late- and Early-arrival packets), is more practical, where the video transmission network delay is considered, and the late- and early-arrival packets are fully exploited. Of the two approaches, the sub-GOP, which contains more than one video frame, is dynamically tuned and used as the RS coding block to get the optimal performance. For the proposed DSGF approach, although the overall error resilient performance is higher than the conventional FEC schemes, that protect the streamed video frame by frame, its video quality fluctuates within the Sub-GOP. To mitigate this problem, in this thesis, another real-time video streaming scheme using randomized expanding Reed-Solomon code is proposed. In this scheme, the Reed-Solomon coding block includes not only the video packets of the current frame, but also all the video packets of previous frames in the current group of pictures (GOP). At the decoding side, the parity-check equations of the current frameare jointly solved with all the parity-check equations of the previous frames. Since video packets of the following frames are not encompassed in the RS coding block, no delay will be caused for waiting for the video or parity packets of the following frames both at encoding and decoding sides. The main contribution of this thesis is investigating the trade-off between the video transmission delay caused by FEC encoding/decoding dependency, the FEC error-resilient performance, and the computational complexity. By leveraging the methods proposed in this thesis, proper error-resilient tools and system parameters could be selected based on the video sequence characteristics, the application requirements, and the available channel bandwidth and computational resources. For example, for the applications that can tolerate relatively long delay, sub-GOP based approach is a suitable solution. For the applications where the end-to-end delay is stringent and the computational resource is sufficient (e.g. CPU is fast), it could be a wise choice to use the randomized expanding Reed-Solomon code

    Intra Coding Strategy for Video Error Resiliency: Behavioral Analysis

    Get PDF
    One challenge in video transmission is to deal with packet loss. Since the compressed video streams are sensitive to data loss, the error resiliency of the encoded video becomes important. When video data is lost and retransmission is not possible, the missed data should be concealed. But loss concealment causes distortion in the lossy frame which also propagates into the next frames even if their data are received correctly. One promising solution to mitigate this error propagation is intra coding. There are three approaches for intra coding: intra coding of a number of blocks selected randomly or regularly, intra coding of some specific blocks selected by an appropriate cost function, or intra coding of a whole frame. But Intra coding reduces the compression ratio; therefore, there exists a trade-off between bitrate and error resiliency achieved by intra coding. In this paper, we study and show the best strategy for getting the best rate-distortion performance. Considering the error propagation, an objective function is formulated, and with some approximations, this objective function is simplified and solved. The solution demonstrates that periodical I-frame coding is preferred over coding only a number of blocks as intra mode in P-frames. Through examination of various test sequences, it is shown that the best intra frame period depends on the coding bitrate as well as the packet loss rate. We then propose a scheme to estimate this period from curve fitting of the experimental results, and show that our proposed scheme outperforms other methods of intra coding especially for higher loss rates and coding bitrates

    Video QoS/QoE over IEEE802.11n/ac: A Contemporary Survey

    Get PDF
    The demand for video applications over wireless networks has tremendously increased, and IEEE 802.11 standards have provided higher support for video transmission. However, providing Quality of Service (QoS) and Quality of Experience (QoE) for video over WLAN is still a challenge due to the error sensitivity of compressed video and dynamic channels. This thesis presents a contemporary survey study on video QoS/QoE over WLAN issues and solutions. The objective of the study is to provide an overview of the issues by conducting a background study on the video codecs and their features and characteristics, followed by studying QoS and QoE support in IEEE 802.11 standards. Since IEEE 802.11n is the current standard that is mostly deployed worldwide and IEEE 802.11ac is the upcoming standard, this survey study aims to investigate the most recent video QoS/QoE solutions based on these two standards. The solutions are divided into two broad categories, academic solutions, and vendor solutions. Academic solutions are mostly based on three main layers, namely Application, Media Access Control (MAC) and Physical (PHY) which are further divided into two major categories, single-layer solutions, and cross-layer solutions. Single-layer solutions are those which focus on a single layer to enhance the video transmission performance over WLAN. Cross-layer solutions involve two or more layers to provide a single QoS solution for video over WLAN. This thesis has also presented and technically analyzed QoS solutions by three popular vendors. This thesis concludes that single-layer solutions are not directly related to video QoS/QoE, and cross-layer solutions are performing better than single-layer solutions, but they are much more complicated and not easy to be implemented. Most vendors rely on their network infrastructure to provide QoS for multimedia applications. They have their techniques and mechanisms, but the concept of providing QoS/QoE for video is almost the same because they are using the same standards and rely on Wi-Fi Multimedia (WMM) to provide QoS

    Error-resilient multi-view video plus depth based 3-D video coding

    Get PDF
    Three Dimensional (3-D) video, by definition, is a collection of signals that can provide depth perception of a 3-D scene. With the development of 3-D display technologies and interactive multimedia systems, 3-D video has attracted significant interest from both industries and academia with a variety of applications. In order to provide desired services in various 3-D video applications, the multiview video plus depth (MVD) representation, which can facilitate the generation of virtual views, has been determined to be the best format for 3-D video data. Similar to 2-D video, compressed 3-D video is highly sensitive to transmission errors due to errors propagated from the current frame to the future predicted frames. Moreover, since the virtual views required for auto-stereoscopic displays are rendered from the compressed texture videos and depth maps, transmission errors of the distorted texture videos and depth maps can be further propagated to the virtual views. Besides, the distortions in texture and depth show different effects on the rendering views. Therefore, compared to the reliability of the transmission of the 2-D video, error-resilient texture video and depth map coding are facing major new challenges. This research concentrates on improving the error resilience performance of MVD-based 3-D video in packet loss scenarios. Based on the analysis of the propagating behaviour of transmission errors, a Wyner-Ziv (WZ)-based error-resilient algorithm is first designed for coding of the multi-view video data or depth data. In this scheme, an auxiliary redundant stream encoded according to WZ principle is employed to protect a primary stream encoded with standard multi-view video coding codec. Then, considering the fact that different combinations of texture and depth coding mode will exhibit varying robustness to transmission errors, a rate-distortion optimized mode switching scheme is proposed to strike the optimal trade-off between robustness and compression effciency. In this approach, the texture and depth modes are jointly optimized by minimizing the overall distortion of both the coded and synthesized views subject to a given bit rate. Finally, this study extends the research on the reliable transmission of view synthesis prediction (VSP)-based 3-D video. In order to mitigate the prediction position error caused by packet losses in the depth map, a novel disparity vector correction algorithm is developed, where the corrected disparity vector is calculated from the depth error. To facilitate decoder error concealment, the depth error is recursively estimated at the decoder. The contributions of this dissertation are multifold. First, the proposed WZbased error-resilient algorithm can accurately characterize the effect of transmission error on multi-view distortion at the transform domain in consideration of both temporal and inter-view error propagation, and based on the estimated distortion, this algorithm can perform optimal WZ bit allocation at the encoder through explicitly developing a sophisticated rate allocation strategy. This proposed algorithm is able to provide a finer granularity in performing rate adaptivity and unequal error protection for multi-view data, not only at the frame level, but also at the bit-plane level. Secondly, in the proposed mode switching scheme, a new analytic model is formulated to optimally estimate the view synthesis distortion due to packet losses, in which the compound impact of the transmission distortions of both the texture video and the depth map on the quality of the synthesized view is mathematically analysed. The accuracy of this view synthesis distortion model is demonstrated via simulation results and, further, the estimated distortion is integrated into a rate-distortion framework for optimal mode switching to achieve substantial performance gains over state-of-the-art algorithms. Last, but not least, this dissertation provides a preliminary investigation of VSP-based 3-D video over unreliable channel. In the proposed disparity vector correction algorithm, the pixel-level depth map error can be precisely estimated at the decoder without the deterministic knowledge of the error-free reconstructed depth. The approximation of the innovation term involved in depth error estimation is proved theoretically. This algorithm is very useful to conceal the position-erroneous pixels whose disparity vectors are correctly received

    SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications

    Get PDF
    Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings - the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data. The use of I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging and fruitful but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures that show superior performance as compared to the widely used mean squared error (MSE) and are computationally simple compared with other state-of-the-art perceptual quality measures. In addition, SSIM has a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance. Firstly, the original SSIM is a Full-Reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general purpose Reduced-Reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image. Furthermore, we introduce and demonstrate the novel idea of partially repairing an image using RR features. Secondly, image processing algorithms such as image de-noising and image super-resolution are required at various stages of visual communication systems, starting from image acquisition to image display at the receiver. We incorporate SSIM into the framework of sparse signal representation and non-local means methods and demonstrate improved performance in image de-noising and super-resolution. Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive optimization method that transforms the DCT domain frame residuals to a perceptually uniform space. Both approaches demonstrate the potential to largely improve the rate-distortion performance of state-of-the-art video codecs. Finally, in real-world visual communications, it is a common experience that end-users receive video with significantly time-varying quality due to the variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well-understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality. The model improves upon the direct SSIM approach in predicting subjective perceptual experience of time-varying video quality

    Novel source coding methods for optimising real time video codecs.

    Get PDF
    The quality of the decoded video is affected by errors occurring in the various layers of the protocol stack. In this thesis, disjoint errors occurring in different layers of the protocol stack are investigated with the primary objective of demonstrating the flexibility of the source coding layer. In the first part of the thesis, the errors occurring in the editing layer, due to the coexistence of different video standards in the broadcast market, are addressed. The problems investigated are ‘Field Reversal’ and ‘Mixed Pulldown’. Field Reversal is caused when the interlaced video fields are not shown in the same order as they were captured. This results in a shaky video display, as the fields are not displayed in chronological order. Additionally, Mixed Pulldown occurs when the video frame-rate is up-sampled and down-sampled, when digitised film material is being standardised to suit standard televisions. Novel image processing algorithms are proposed to solve these problems from the source coding layer. In the second part of the thesis, the errors occurring in the transmission layer due to data corruption are addressed. The usage of block level source error-resilient methods over bit level channel coding methods are investigated and improvements are suggested. The secondary objective of the thesis is to optimise the proposed algorithm’s architecture for real-time implementation, since the problems are of a commercial nature. The Field Reversal and Mixed Pulldown algorithms were tested in real time at MTV (Music Television) and are made available commercially through ‘Cerify’, a Linux-based media testing box manufactured by Tektronix Plc. The channel error-resilient algorithms were tested in a laboratory environment using Matlab and performance improvements are obtained
    corecore