8 research outputs found

    Sparse/DCT (S/DCT) Two-Layered Representation of Prediction Residuals for Video Coding

    Rate control and bit allocations for JPEG transcoding

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (leaves 50-51). An image transcoder that produces a baseline JPEG file from a baseline JPEG input is developed. The goal is to produce a high quality image while accurately meeting a filesize target and keeping computational complexity low, especially the memory usage and the number of passes over the input image. Building upon the work of He and Mitra, the JPEG transcoder exploits a linear relationship between the number of zero-valued quantized DCT coefficients and the bitrate. Using this relationship and a histogram of coefficients, it is possible to determine an effective way to scale the quantization tables of an image to approach a target filesize. As the image is being transcoded, an intra-image process makes minor corrections, saving more bits as needed throughout the transcoding of the image. This intra-image process decrements specific coefficients, minimizing the change in value (and hence in image quality) while maximizing the savings in bitrate. The result is a fast JPEG transcoder that reliably achieves a target filesize and preserves as much image quality as possible. The proposed transcoder and several variations were tested on a set of twenty-nine images that gave a fair representation of typical JPEG photos. The evaluation metric consisted of three parts: first, the accuracy and precision of the output filesize with respect to the target filesize; second, the PSNR of the output image with respect to the original image; and third, the subjective visual image quality. By Ricky D. Nguyen. M.Eng.
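    The quantization-scaling idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the function names, the uniform quantizer step, the candidate step list and the slope `theta` are all assumptions made for illustration. It only shows how a linear "bits versus nonzero coefficients" model lets a transcoder pick a coarser quantizer to fit a size target.

```python
def zero_fraction(coeffs, step):
    """Fraction of coefficients treated as zero after quantization.

    Assumption: a uniform quantizer where |c| < step / 2 maps to zero.
    """
    zeros = sum(1 for c in coeffs if abs(c) < step / 2.0)
    return zeros / len(coeffs)


def pick_step(coeffs, target_bits, theta, steps):
    """Pick the finest candidate step whose predicted size fits the target.

    Assumption: bits ~= theta * (number of nonzero quantized coefficients),
    where theta is a hypothetical slope estimated elsewhere.
    """
    n = len(coeffs)
    ordered = sorted(steps)
    for step in ordered:
        predicted = theta * n * (1.0 - zero_fraction(coeffs, step))
        if predicted <= target_bits:
            return step, predicted
    # No candidate fits: fall back to the coarsest step.
    coarsest = ordered[-1]
    return coarsest, theta * n * (1.0 - zero_fraction(coeffs, coarsest))
```

    In a real transcoder the zero fraction would be read from a coefficient histogram gathered in one pass rather than recomputed per candidate step.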

    Bit Rate Control for Real-time Multipoint Video Conferencing

    With the rapid development of video compression and network technology, real-time video communication has become a popular part of our daily life. Rate control is needed to satisfy the expectation of high quality and to make transmission over limited bandwidth possible. The objective of this thesis is to design a rate control scheme for a real-time Transcoding-Compositing Multipoint Video Conferencing System, which operates exclusively in the DCT domain. In this Transcoding-Compositing system, the mode of the composited frame must be decided before the composited image is encoded. A mode decision method relying on Karhunen-Loeve scene change detection is proposed. A new linear source Rate-Distortion model is developed in the ρ-domain (ρ is the percentage of zeros), on which the rate control scheme is based. The rate control scheme is divided into three levels: Frame Level, Sub-frame Level, and Macroblock Level. Frame Level rate control decides the bit budget for each frame based on the buffer fullness. Sub-frame Level rate control optimizes the distribution of the bit budget among the decimated sub-images. Based on the linear source model, Macroblock Level rate control carries out an adaptive procedure to precisely control the number of encoding bits for each sub-image.
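    As a rough illustration of frame-level rate control driven by buffer fullness, the sketch below lowers the per-frame budget as the buffer rises above a half-full set point. The formula, the set point and the `gain` parameter are assumptions for illustration only; the abstract does not state the thesis's actual rule.

```python
def frame_budget(bit_rate, frame_rate, buffer_bits, buffer_size, gain=0.5):
    """Hypothetical per-frame bit budget based on buffer fullness.

    bit_rate: channel rate in bits per second
    frame_rate: frames per second
    buffer_bits: current buffer occupancy in bits
    buffer_size: buffer capacity in bits
    gain: assumed correction strength (not from the source)
    """
    nominal = bit_rate / frame_rate
    # Positive when the buffer is above its assumed half-full set point.
    excess = buffer_bits - buffer_size / 2.0
    # Spend less when the buffer is too full, more when it is draining.
    return max(nominal - gain * excess, 0.0)
```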

    Rate-adaptive H.264 for TCP/IP networks

    While there has always been a tremendous demand for streaming video over TCP/IP networks, the nature of the application still presents some challenging issues. Applications that transmit multimedia data over best-effort networks like the Internet must cope with changing network behavior; specifically, the source encoder rate should be controlled based on feedback from a channel estimator that probes the network periodically. First, one such Multimedia Streaming TCP-Friendly Protocol (MSTFP) is considered, which iteratively integrates forward estimation of network status with feedback control to closely track the varying network characteristics. Second, a network-adaptive embedded bit stream is generated using a ρ-domain rate controller. The conceptual elegance of this ρ-domain framework stems from the fact that the coding bit rate (R) is approximately linear in the percentage of zeros among the quantized spatial transform coefficients (ρ), as opposed to the more traditional, complex and highly nonlinear R(Q) characterization. Although the ρ-model has been successfully implemented on a few other video codecs, here its application to the emerging video coding standard H.264 is considered. The extensive experimental results show robust rate control, similar or improved Peak Signal to Noise Ratio (PSNR), and a faster implementation.
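    The linear rate model mentioned in this abstract is commonly written as below, with the percentage of zeros denoted by the Greek letter rho. The slope symbol \theta and the target notation R_T, \rho_T are assumptions introduced here, not taken from the abstract.

```latex
% Linear rate model in the percentage of zeros (rho), with assumed slope theta:
R(\rho) \approx \theta \, (1 - \rho)

% Inverting it gives the zero fraction needed to hit a target rate R_T:
\rho_T \approx 1 - \frac{R_T}{\theta}
```

    A rate controller would then choose the quantization setting whose resulting zero fraction is closest to \rho_T, rather than searching the nonlinear R(Q) curve directly.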

    DCT Video Compositing with Embedded Zerotree Coding for Multi-Point Video Conferencing

    In this thesis, DCT domain video compositing with embedded zerotree coding for multi-point video conferencing is considered. In a typical video compositing system, video sequences coming from different sources are composited into one video stream and sent over a single channel to the receiver points. There are three main stages of video compositing: decoding of the incoming video streams, decimation of the video frames, and encoding of the composited video. Conventional spatial domain video compositing requires transformations between the DCT and spatial domains, which increases the computational complexity. The advantage of DCT domain video compositing is that decoding, decimation and encoding remain fully in the DCT domain, resulting in faster processing and better quality of the composited videos. The composited videos are encoded via a DCT based embedded zerotree coder, a technique originally developed for wavelet coding. An adaptive arithmetic coder is used to encode the symbols obtained from the DCT based zerotree coding, resulting in an embedded bit stream. By using the embedded zerotree coder, the quality of the composited videos is improved compared to a conventional encoder. An advanced version of the zerotree coder is also used to increase the performance of the compositing system. Another improvement comes from using the local cosine transform to decrease the blocking effect at low bit rates. We also apply the proposed DCT decimation/interpolation to single stream video coding, achieving better quality than the regular encoding process at low bit rates. The bit rate control problem is easily solved by taking advantage of the embedded property of zerotree coding, since the coding control parameter is the bit rate itself. We also achieve the optimum bit rate allocation among the composited frames in a GOP without using subframe layer bit rate allocation, since zerotree coding uses successive approximation quantization, allowing DCT coefficients to be encoded in descending order of significance.

    Rate distortion control in digital video coding

    Lossy compression is widely applied for coding visual information in applications such as entertainment in order to achieve a high compression ratio. In this case, the video quality worsens as the compression ratio increases. Rate control tries to use the bit budget properly so that the visual distortion is minimized. Rate control for H.264, the state-of-the-art hybrid video coder, is investigated. Based on Rate-Distortion (R-D) slope analysis, an operational rate distortion optimization scheme for H.264 using the Lagrangian multiplier method is proposed. The scheme tries to find the best path of quantization parameter (QP) options at each macroblock. The proposed scheme provides a smoother rate control that is able to cover a wider range of bit rates, and for many sequences it outperforms the H.264 (JM92 version) rate control scheme in terms of PSNR. The Bath University Matching Pursuit (BUMP) project develops a new matching pursuit (MP) technique as an alternative to transform video coders. By combining MP with precision limited quantization (PLQ) and the multi-pass embedded residual group encoder (MERGE), a very efficient coder is built that is able to produce an embedded bit stream, which is highly desirable for rate control. The problem of optimal bit allocation with a BUMP based video coder is investigated. An ad hoc scheme of simply limiting the maximum atom number shows an obvious performance improvement, which indicates a potential for efficiency improvement. An in-depth study of the bit Rate-Atom characteristic has been carried out and a rate estimation model has been proposed. The model gives a theoretical description of how the bit number changes. An adaptive rate estimation algorithm has been proposed. Experiments show that the algorithm provides extremely high estimation accuracy. The proposed R-D source model is then applied to bit allocation in the BUMP based video coder. An R-D slope unifying scheme was applied to optimize the performance of the coder. It adopts the R-D model and fits well within the BUMP coder. The optimization can be performed in a straightforward way. Experiments show that the proposed method greatly improves the performance of the BUMP video coder, and outperforms H.264 at low and medium bit rates by up to 2 dB. Source: EThOS - Electronic Theses Online Service, United Kingdom.
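    The Lagrangian multiplier method named in this abstract weighs distortion against rate through a cost J = D + lambda * R. The sketch below is a simplified illustration that picks one option independently for a single macroblock; it does not reproduce the thesis's path search over QP options. The tuple layout and the sample numbers are hypothetical.

```python
def best_option(options, lam):
    """Return the option minimizing the Lagrangian cost J = D + lam * R.

    options: list of (rate_bits, distortion, label) tuples (assumed layout)
    lam: Lagrange multiplier trading rate against distortion
    """
    return min(options, key=lambda o: o[1] + lam * o[0])


# Hypothetical candidates for one macroblock: (rate, distortion, label).
candidates = [
    (100, 4.0, "QP=24"),
    (60, 9.0, "QP=30"),
    (30, 20.0, "QP=36"),
]
```

    A small multiplier favors the low-distortion, high-rate option, while a larger one pushes the choice toward coarser quantization.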

    Real-time adaptation for a better quality of experience in augmented reality (Adaptation en temps réel pour une meilleure qualité d'expérience en réalité augmentée)

    In the framework of mobile augmented reality, a video stream is sent to the user over a wireless communication link. To guarantee an efficient transmission, the video stream rate is controlled by adapting the encoding parameters so as to follow a given bandwidth. The rate can be reduced by lowering the frame rate and/or by choosing a higher compression factor for the video stream. These parameter modifications affect both the level of detail and the fluidity perceived by the user, and thus his/her subjective appreciation. The experience perceived by the user also depends on the context. During a rapid head motion, fluidity is more important than for a fixed head position. We propose an end-to-end adaptation scheme which adapts the encoding parameters so as to provide the best experience for the user given the dynamic context. For example, when the user moves his/her head quickly, the frame is compressed more to increase the frame rate and hence achieve a better perception of the motion. The lack of a direct measurement of the subjective user experience is addressed with the design of objective metrics and a generic model to predict the user's quality of experience in real time. A rate control strategy based on a systems approach is deployed to manage the multiple encoding parameters which control the stream rate. The encoder is modeled in an abstract manner as a single-variable linear system, where the content variation is taken as a perturbation. A stable and efficient controller is designed for the abstract model of the encoder. To implement the designed controller, the parameter combinations of the real encoder that correspond to the single input of the abstract model must be determined. A new one-pass algorithm determines this correspondence in real time based on a mapping method. Then, the proposed contextual adaptation yields the encoding parameter combination that maximizes the quality of experience using an appropriate model. Finally, the global adaptation scheme combines the rate control, the mapping method and the contextual adaptation for real-time implementation. Simulations and experiments illustrate the approach, and the global adaptation scheme is validated through different scenarios.

    Low-complexity high prediction accuracy visual quality metrics and their applications in H.264/AVC encoding mode decision process

    In this thesis, we develop a new general framework for computing full reference image quality scores in the discrete wavelet domain using the Haar wavelet. The proposed framework presents an excellent tradeoff between accuracy and complexity. In our framework, quality metrics are categorized as either map-based, which generate a quality (distortion) map to be pooled for the final score, e.g., structural similarity (SSIM), or non map-based, which only give a final score, e.g., peak signal-to-noise ratio (PSNR). For map-based metrics, the proposed framework defines a contrast map in the wavelet domain for pooling the quality maps. We also derive a formula that enables the framework to automatically calculate the appropriate level of wavelet decomposition for error-based metrics at a desired viewing distance. To consider the effect of very fine image details in quality assessment, the proposed method defines a multi-level edge map for each image, which comprises only the most informative image subbands. To clarify the application of the framework in computing quality scores, we give some examples showing how the framework can be applied to improve well-known metrics such as SSIM, visual information fidelity (VIF), PSNR, and absolute difference. We compare the complexity of various algorithms obtained by the framework to the Intel IPP-based H.264 baseline profile encoding using C/C++ implementations. We evaluate the overall performance of the proposed metrics, including their prediction accuracy, on two well-known image quality databases and one video quality database. All the simulation results confirm the efficiency of the proposed framework and quality assessment metrics in improving the prediction accuracy and reducing the computational complexity. For example, by using the framework, we can compute the VIF at about 5% of the complexity of its original version, but with higher accuracy. In the next step, we study how the H.264 coding mode decision can benefit from our developed metrics. We integrate the proposed SSEA metric as the distortion measure inside the H.264 mode decision process. The H.264/AVC JM reference software is used as the implementation and verification platform. We propose a search algorithm to determine the Lagrange multiplier value for each quantization parameter (QP). The search is applied to three different types of video sequences with various motion activity features, and the resulting Lagrange multiplier values are tabulated for each of them. Based on our proposed framework, we propose a new quality metric, PSNRA, and use it in the mode decision part. The simulated rate-distortion (RD) curves show that at the same PSNRA, with the SSEA-based mode decision, the bitrate is reduced by about 5% on average compared to the conventional SSE-based approach for sequences with low and medium motion activity. Notably, the computational complexity is not increased at all by using the proposed SSEA-based approach instead of the conventional SSE-based method. Therefore, the proposed mode decision algorithm can be used in real-time video coding.