23 research outputs found

    Livrable D3.3 of the PERSEE project : 2D coding tools

    Get PDF
    49Livrable D3.3 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D3.3 du projet. Son titre : 2D coding tool

    State of the art in 2D content representation and compression

    Get PDF
    Livrable D1.3 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D3.1 du projet

    Intra-Key-Frame Coding and Side Information Generation Schemes in Distributed Video Coding

    Get PDF
    In this thesis investigation has been made to propose improved schemes for intra-key-frame coding and side information (SI) generation in a distributed video coding (DVC) framework. From the DVC developments in last few years it has been observed that schemes put more thrust on intra-frame coding and better quality side information (SI) generation. In fact both are interrelated as SI generation is dependent on decoded key frame quality. Hence superior quality key frames generated through intra-key frame coding will in turn are utilized to generate good quality SI frames. As a result, DVC needs less number of parity bits to reconstruct the WZ frames at the decoder. Keeping this in mind, we have proposed two schemes for intra-key frame coding namely, (a) Borrows Wheeler Transform based H.264/AVC (Intra) intra-frame coding (BWT-H.264/AVC(Intra)) (b) Dictionary based H.264/AVC (Intra) intra-frame coding using orthogonal matching pursuit (DBOMP-H.264/AVC (Intra)) BWT-H.264/AVC (Intra) scheme is a modified version of H.264/AVC (Intra) scheme where a regularized bit stream is generated prior to compression. This scheme results in higher compression efficiency as well as high quality decoded key frames. DBOMP-H.264/AVC (Intra) scheme is based on an adaptive dictionary and H.264/AVC (Intra) intra-frame coding. The traditional transform is replaced with a dictionary trained with K-singular value decomposition (K-SVD) algorithm. The dictionary elements are coded using orthogonal matching pursuit (OMP). Further, two side information generation schemes have been suggested namely, (a) Multilayer Perceptron based side information generation (MLP - SI) (b) Multivariable support vector regression based side information generation (MSVR-SI) MLP-SI scheme utilizes a multilayer perceptron (MLP) to estimate SI frames from the decoded key frames block-by-block. The network is trained offline using training patterns from different frames collected from standard video sequences. MSVR-SI scheme uses an optimized multi variable support vector regression (M-SVR) to generate SI frames from decoded key frames block-by-block. Like MLP, the training for M-SVR is made offline with known training patterns apriori. Both intra-key-frame coding and SI generation schemes are embedded in the Stanford based DVC architecture and studied individually to compare performances with their competitive schemes. Visual as well as quantitative evaluations have been made to show the efficacy of the schemes. To exploit the usefulness of intra-frame coding schemes in SI generation, four hybrid schemes have been formulated by combining the aforesaid suggested schemes as follows: (a) BWT-MLP scheme that uses BWT-H.264/AVC (Intra) intra-frame coding scheme and MLP-SI side information generation scheme. (b) BWT-MSVR scheme, where we utilize BWT-H.264/AVC (Intra) for intra-frame coding followed by MSVR-SI based side information generation. (c) DBOMP-MLP scheme is an outcome of putting DBOMP-H.264/AVC (Intra) intra-frame coding and MLP-SI side information generation schemes. (d) DBOMP-MSVR scheme deals with DBOMP-H.264/AVC (Intra) intra-frame coding and MSVR-SI side information generation together. The hybrid schemes are also incorporated into the Stanford based DVC architecture and simulation has been carried out on standard video sequences. The performance analysis with respect to overall rate distortion, number requests per SI frame, temporal evaluation, and decoding time requirement has been made to derive an overall conclusion

    SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications

    Get PDF
    Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings - the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data. The use of I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging and fruitful but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures that show superior performance as compared to the widely used mean squared error (MSE) and are computationally simple compared with other state-of-the-art perceptual quality measures. In addition, SSIM has a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance. Firstly, the original SSIM is a Full-Reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general purpose Reduced-Reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image. Furthermore, we introduce and demonstrate the novel idea of partially repairing an image using RR features. Secondly, image processing algorithms such as image de-noising and image super-resolution are required at various stages of visual communication systems, starting from image acquisition to image display at the receiver. We incorporate SSIM into the framework of sparse signal representation and non-local means methods and demonstrate improved performance in image de-noising and super-resolution. Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive optimization method that transforms the DCT domain frame residuals to a perceptually uniform space. Both approaches demonstrate the potential to largely improve the rate-distortion performance of state-of-the-art video codecs. Finally, in real-world visual communications, it is a common experience that end-users receive video with significantly time-varying quality due to the variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well-understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality. The model improves upon the direct SSIM approach in predicting subjective perceptual experience of time-varying video quality

    Content-prioritised video coding for British Sign Language communication.

    Get PDF
    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and better understanding of the communication needs of deaf people

    Optimized Adaptive Encoding Based on Visual Attention

    Get PDF

    Video Analysis and Indexing

    Get PDF
    corecore