
    Object-based video representations: shape compression and object segmentation

    Object-based video representations are considered useful for easing the process of multimedia content production and for enhancing user interactivity in multimedia productions. Object-based video presents several new technical challenges, however. Firstly, as with conventional video representations, compression of the video data is a requirement. For object-based representations, it is necessary to compress the shape of each video object as it moves in time, which amounts to the compression of moving binary images. This is achieved with a technique called context-based arithmetic encoding. The technique is applied to rectangular pixel blocks and is therefore consistent with the standard tools of video compression. The block-based application also facilitates the exploitation of temporal redundancy in the sequence of binary shapes. For the first time, context-based arithmetic encoding is used in conjunction with motion compensation to provide inter-frame compression. The method described in this thesis has been thoroughly tested throughout the MPEG-4 core experiment process and, owing to favourable results, has been adopted as part of the MPEG-4 video standard. The second challenge lies in the acquisition of the video objects. Under normal conditions, a video sequence is captured as a sequence of frames with no inherent information about which objects are in the sequence, let alone the shape of each object. Some means of segmenting semantic objects from general video sequences is therefore required. Several image analysis tools may help here; in particular, video object tracking algorithms are believed to be important. A new tracking algorithm is developed based on piecewise polynomial motion representations and statistical estimation tools, e.g. the expectation-maximisation method and the minimum description length principle.
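    The context-based arithmetic encoding idea can be illustrated with a toy sketch that estimates the coding cost of a binary shape mask using a small causal-neighbour context with adaptively updated probabilities. The three-pixel template and the Laplace-smoothed counts here are simplifications chosen for illustration; MPEG-4 shape coding uses larger intra/inter templates and a true arithmetic coder.

```python
import math

def estimate_cae_bits(mask):
    """Estimate the cost (in bits) of coding a binary mask with a
    causal-context adaptive model, where per-context probabilities
    are updated as each pixel is coded."""
    h, w = len(mask), len(mask[0])
    counts = {}  # context index -> [count of 0s, count of 1s]
    bits = 0.0
    for y in range(h):
        for x in range(w):
            # 3-pixel causal template: left, above, above-left (0 off-grid)
            left = mask[y][x - 1] if x > 0 else 0
            up = mask[y - 1][x] if y > 0 else 0
            upleft = mask[y - 1][x - 1] if x > 0 and y > 0 else 0
            ctx = (left << 2) | (up << 1) | upleft
            c0, c1 = counts.get(ctx, [1, 1])  # Laplace-smoothed prior
            p1 = c1 / (c0 + c1)
            s = mask[y][x]
            bits += -math.log2(p1 if s else 1.0 - p1)
            if s:
                c1 += 1
            else:
                c0 += 1
            counts[ctx] = [c0, c1]
    return bits
```

    A smooth mask (e.g. all background) costs only a handful of bits because the model quickly learns that each context almost always predicts the same symbol, while a noisy mask approaches one bit per pixel.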

    Offline and Online Optical Flow Enhancement for Deep Video Compression

    Video compression relies heavily on exploiting the temporal redundancy between video frames, which is usually achieved by estimating and using motion information. In most existing deep video compression networks, the motion information is represented as optical flows; indeed, these networks often adopt pre-trained optical flow estimation networks for motion estimation. The optical flows, however, may be less suitable for video compression due to two factors. First, the optical flow estimation networks were trained to perform inter-frame prediction as accurately as possible, but the optical flows themselves may cost too many bits to encode. Second, the optical flow estimation networks were trained on synthetic data and may not generalize well to real-world videos. We address this twofold limitation by enhancing the optical flows in two stages: offline and online. In the offline stage, we fine-tune a trained optical flow estimation network with the motion information provided by a traditional (non-deep) video compression scheme, e.g. H.266/VVC, as we believe the motion information of H.266/VVC achieves a better rate-distortion trade-off. In the online stage, we further optimize the latent features of the optical flows with a gradient descent-based algorithm for the video to be compressed, so as to enhance the adaptivity of the optical flows. We conduct experiments on a state-of-the-art deep video compression scheme, DCVC. Experimental results demonstrate that the proposed offline and online enhancements together achieve an average 12.8% bit-rate saving on the tested videos, without increasing the model or computational complexity of the decoder side.
    Comment: 9 pages, 6 figures
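    The online stage can be sketched as gradient descent on a latent to minimise a rate-distortion objective. The linear decoder, the L2 rate proxy, and the use of numerical gradients below are placeholder assumptions to keep the sketch dependency-free; a real deep codec would backpropagate through its decoder and entropy model.

```python
import numpy as np

def refine_latent(y0, decode, target, lam=0.01, lr=0.05, steps=100):
    """Gradient-descent refinement of a latent y for one frame:
    minimize distortion(decode(y), target) + lam * rate_proxy(y).
    Numerical gradients stand in for autodiff in this toy sketch."""
    y = y0.astype(float).copy()
    eps = 1e-4

    def loss(v):
        d = decode(v) - target
        return float(np.mean(d * d) + lam * np.mean(v * v))  # L2 rate proxy

    for _ in range(steps):
        g = np.zeros_like(y)
        for i in range(y.size):
            yp = y.copy(); yp.flat[i] += eps
            ym = y.copy(); ym.flat[i] -= eps
            g.flat[i] = (loss(yp) - loss(ym)) / (2 * eps)
        y -= lr * g
    return y, loss(y)
```

    Per-video (online) optimization like this trades encoder-side computation for bit-rate, which is why it leaves decoder complexity untouched.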

    New Inter-Frame Prediction Methods for Image and Video Compression

    Due to the wide availability of video cameras and new social media practices, as well as the emergence of cloud services, images and videos today constitute a significant share of the total data transmitted over the internet. Video streaming applications account for more than 70% of world internet bandwidth, while billions of images are already stored in the cloud and millions more are uploaded every day. The ever-growing streaming and storage requirements of these media call for constant improvement of image and video coding tools. This thesis explores novel approaches to improving current inter-prediction methods. Such methods leverage redundancies between similar frames and were originally developed in the context of video compression. In a first approach, novel global and local inter-prediction tools are combined to improve the efficiency of image-set compression schemes based on video codecs. By coupling global geometric and photometric compensation with locally linear prediction, significant improvements are obtained. A second approach then introduces a region-based inter-prediction scheme. The proposed method improves coding performance over existing solutions by estimating and compensating geometric and photometric distortions at a semi-local level. This approach is then adapted and validated in the context of video compression. Bit-rate improvements are obtained, especially for sequences displaying complex real-world motions such as zooms and rotations. The last part of the thesis focuses on deep learning approaches to inter-prediction. Deep neural networks have shown striking results on a large number of computer vision tasks in recent years. Deep learning-based methods originally proposed for frame interpolation are studied here in the context of video compression. 
Coding performance improvements over traditional motion estimation and compensation methods highlight the potential of these deep architectures.
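    A minimal sketch of the global geometric and photometric compensation idea, assuming matched point pairs for the geometric part and a simple gain/offset brightness model for the photometric part (both are illustrative assumptions, not the exact estimators of the thesis):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine (global geometric) model mapping src -> dst.
    src, dst: (N, 2) arrays of matched point coordinates."""
    n = src.shape[0]
    A = np.hstack([src, np.ones((n, 1))])          # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params                                   # (3, 2): linear part + translation

def fit_photometric(ref, cur):
    """Least-squares global gain/offset model: cur ~ gain * ref + offset."""
    A = np.stack([ref.ravel(), np.ones(ref.size)], axis=1)
    (gain, offset), *_ = np.linalg.lstsq(A, cur.ravel(), rcond=None)
    return gain, offset
```

    With the geometric warp and brightness change compensated globally, the local predictor only has to model the small residual motion, which is where the bit-rate savings come from.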

    Investigation of Different Video Compression Schemes Using Neural Networks

    Image and video compression is of great significance in the communication of motion pictures and still images. The need for compression has driven the development of various techniques, including transform coding, vector quantization, and neural networks. In this thesis, neural network-based methods are investigated to achieve good compression ratios while maintaining image quality. This investigation includes motion detection and weight retraining. An adaptive technique is employed to improve video frame quality at a given compression ratio by updating the weights obtained from training: weight retraining is performed only when the error exceeds a given threshold value. Image quality is measured objectively using the peak signal-to-noise ratio (PSNR). Results show improved performance of the proposed architecture compared with existing approaches. The proposed method is implemented in MATLAB, and results such as compression ratio versus signal-to-noise ratio are presented.
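    The retrain-on-threshold idea can be sketched as follows. The single learned gain predicting each frame from the previous one is a hypothetical stand-in for the network weights; the point of the sketch is only the control logic: code with the current weights, and retrain only when the error crosses the threshold.

```python
import numpy as np

def adaptive_codec(frames, threshold=0.01):
    """Toy adaptive scheme: predict each frame from the previous one
    with one learned gain (stand-in for network weights); retrain the
    gain only when the prediction error exceeds the threshold."""
    gain, prev = 1.0, frames[0]
    retrains, errors = 0, []
    for f in frames[1:]:
        pred = gain * prev
        err = float(np.mean((pred - f) ** 2))
        if err > threshold:                       # retrain condition
            denom = float(np.dot(prev.ravel(), prev.ravel())) or 1.0
            gain = float(np.dot(prev.ravel(), f.ravel())) / denom
            retrains += 1
            err = float(np.mean((gain * prev - f) ** 2))
        errors.append(err)
        prev = f
    return retrains, errors
```

    A tight threshold forces frequent retraining (better quality, more encoder work); a loose threshold keeps the stale weights as long as the error stays acceptable.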

    Study on Fast Affine Motion Parameter Estimation for Efficient Video Coding

    Tohoku University性ç”ș真侀郎èȘČ

    Motion hints based video coding

    The persistent growth of video-based applications depends heavily on advances in video coding systems. Modern video codecs use the motion model itself to describe the geometric boundaries of moving objects in video sequences, and thereby spend a significant portion of their bit rate refining the motion description in regions where motion discontinuities exist. This explicit communication of motion introduces redundancy, since some aspects of the motion can at least partially be inferred from the reference frames. In this thesis, a novel bi-directional motion-hints-based prediction paradigm is proposed that moves away from the traditional redundant approach of careful partitioning around object boundaries, exploiting the spatial structure of the reference frames to infer appropriate boundaries for the intermediate ones. Motion hints provide a global description of motion over a specific domain. Fundamentally, this is related to the segmentation of foreground from background regions, where the foreground and background motions are the motion hints. The appealing property of motion hints is that they are continuous and invertible, even though the observed motion field for a frame is discontinuous and non-invertible. Experimental results show that in low bit-rate applications, the motion-hints-based coder achieved a rate-distortion (RD) gain of 0.81 dB, equivalent to 13.38% bit-rate savings over the H.264/AVC reference. In a hybrid setting, this gain increased to 0.94 dB, with a 20.41% bit-rate saving. When both low and high bit-rate scenarios are considered, the hybrid coder showed an RD gain of 0.80 dB, equivalent to 16.57% bit-rate savings. The use of motion hints with higher fractional-pixel accuracy, predictive coding of motion hints, and memory-based initialization of motion hint estimation improved the RD gain to 0.85 dB with 17.55% bit-rate savings.
The prediction framework is highly flexible in that the motion model order for the hints can be content-adaptive, i.e. it can accommodate different motion models such as affine and elastic. Detecting motion-discontinuity macroblocks (MBs) is a challenging task, and the prediction paradigm managed to detect a significant number of such MBs. When motion-hints-based prediction is used as a prediction mode for MBs, at low bit rates almost 50% of the motion-discontinuity MBs chose the affine hint mode, and this number increased to 60% when the elastic hint was used.
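    The claim that a motion hint is continuous and invertible, unlike a block-wise motion field, can be illustrated with a 6-parameter affine hint. The parameterisation below is an assumption chosen for illustration:

```python
import numpy as np

def affine_hint(params):
    """Build forward and inverse coordinate mappings for a 6-parameter
    affine motion hint [a11, a12, a21, a22, tx, ty]. The hint is a
    smooth, invertible map over its whole domain, whereas a per-block
    motion field is discontinuous and generally non-invertible."""
    a11, a12, a21, a22, tx, ty = params
    A = np.array([[a11, a12], [a21, a22]])
    t = np.array([tx, ty])
    Ainv = np.linalg.inv(A)          # requires a non-singular linear part
    fwd = lambda p: p @ A.T + t      # points as (N, 2) rows
    inv = lambda q: (q - t) @ Ainv.T
    return fwd, inv
```

    Invertibility is what lets the decoder map inferred boundaries back and forth between reference and intermediate frames without extra signalled motion.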
    • 

    corecore