
    OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression

    State-of-the-art 2D image compression schemes rely on the power of convolutional neural networks (CNNs). Although CNNs offer promising perspectives for 2D image compression, extending such models to omnidirectional images is not straightforward. First, omnidirectional images have specific spatial and statistical properties that cannot be fully captured by current CNN models. Second, basic mathematical operations composing a CNN architecture, e.g., translation and sampling, are not well-defined on the sphere. In this paper, we study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. In particular, we: i) define a new convolution operation on the sphere that keeps the high expressiveness and low complexity of a classical 2D convolution; ii) adapt standard CNN techniques such as stride, iterative aggregation, and pixel shuffling to the spherical domain; and iii) apply our new framework to the task of omnidirectional image compression. Our experiments show that our proposed on-the-sphere solution yields a better compression gain, saving 13.7% of the bit rate compared to similar learned models applied to equirectangular images. Moreover, compared to learning models based on graph convolutional networks, our solution supports more expressive filters that preserve high frequencies and provide better perceptual quality of the compressed images. These results demonstrate the efficiency of the proposed framework and open new research avenues for other omnidirectional vision tasks to be implemented effectively on the sphere manifold.
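
    The abstract does not detail the operator, but the core idea, convolving each HEALPix pixel with a fixed-size neighbourhood using shared weights, can be sketched with healpy's neighbour lookup. Everything below (the function name, the zero padding of missing neighbours at grid seams) is an illustrative assumption, not the paper's actual design.

```python
import numpy as np
import healpy as hp

def healpix_conv(signal, weights, nside, nest=True):
    """Sketch of a spherical convolution on the HEALPix grid.

    signal  : (npix,) map sampled on a HEALPix grid
    weights : (9,) shared filter taps (center + 8 neighbours)
    """
    npix = hp.nside2npix(nside)
    assert signal.shape == (npix,)
    pix = np.arange(npix)
    # get_all_neighbours returns the 8 HEALPix neighbours of each pixel;
    # entries are -1 where a neighbour does not exist near grid seams.
    nbrs = hp.get_all_neighbours(nside, pix, nest=nest)   # shape (8, npix)
    # Assumption: treat missing neighbours as zero-valued (zero padding).
    padded = np.append(signal, 0.0)                       # index -1 -> 0.0
    stacked = np.vstack([signal[None, :], padded[nbrs]])  # (9, npix)
    return weights @ stacked                              # (npix,)

# Toy usage: smooth a random map at nside=16 with an averaging filter.
nside = 16
x = np.random.rand(hp.nside2npix(nside))
y = healpix_conv(x, np.full(9, 1.0 / 9.0), nside)
```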

    Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model

    Omnidirectional video enables spherical stimuli with a 360° × 180° viewing range. Meanwhile, only the viewport region of omnidirectional video can be seen by the observer through head movement (HM), and an even smaller region within the viewport can be clearly perceived through eye movement (EM). Thus, the subjective quality of omnidirectional video may be correlated with the HM and EM of human behavior. To bridge the gap between subjective quality and human behavior, this paper proposes a large-scale visual quality assessment (VQA) dataset of omnidirectional video, called VQA-OV, which collects 60 reference sequences and 540 impaired sequences. Our VQA-OV dataset provides not only the subjective quality scores of the sequences but also the HM and EM data of the subjects. By mining our dataset, we find that the subjective quality of omnidirectional video is indeed related to HM and EM. Hence, we develop a deep learning model, which embeds HM and EM, for objective VQA on omnidirectional video. Experimental results show that our model significantly improves the state-of-the-art performance of VQA on omnidirectional video.
    Comment: Accepted by ACM MM 201
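
    The abstract only states that HM and EM are embedded in the model. One common way to use such data is as attention weights when pooling local quality scores over the sphere; the sketch below illustrates that idea under stated assumptions (the function and map names are hypothetical), and is not the authors' architecture.

```python
import numpy as np

def attention_pooled_quality(patch_scores, hm_map, em_map, eps=1e-8):
    """Pool per-patch quality scores with head/eye-movement attention.

    patch_scores : (H, W) objective quality per spatial patch
    hm_map       : (H, W) viewport (head-movement) probability
    em_map       : (H, W) gaze (eye-movement) probability inside viewports
    """
    # Joint attention: a patch matters if it lies in a likely viewport
    # and is likely to be fixated within it.
    attention = hm_map * em_map
    attention = attention / (attention.sum() + eps)
    return float((patch_scores * attention).sum())

# Toy usage with random maps on an 8x16 patch grid.
rng = np.random.default_rng(0)
score = attention_pooled_quality(rng.random((8, 16)),
                                 rng.random((8, 16)),
                                 rng.random((8, 16)))
```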

    Motion and disparity estimation with self adapted evolutionary strategy in 3D video coding

    Real-world information obtained by humans is three-dimensional (3-D). In experimental user trials, subjective assessments have clearly demonstrated the increased impact of 3-D pictures compared to conventional flat-picture techniques. It is reasonable, therefore, that we humans want an imaging system that produces pictures that are as natural and real as the things we see and experience every day. Three-dimensional imaging, and hence 3-D television (3DTV), is a very promising approach expected to satisfy these desires. Integral imaging, which can capture true 3D color images with only one camera, has been seen as the right technology to offer stress-free viewing to audiences of more than one person. In this paper, we propose a novel approach that uses an evolutionary strategy (ES) for joint motion and disparity estimation to compress 3D integral video sequences. We propose to decompose the integral video sequence into viewpoint video sequences and jointly exploit motion and disparity redundancies to maximize the compression using a self-adapted ES. A half-pixel refinement algorithm is then applied by interpolating macroblocks in the previous frame to further improve the video quality. Experimental results demonstrate that the proposed adaptable ES with half-pixel joint motion and disparity estimation can achieve up to 1.5 dB of objective quality gain without any additional computational cost over our previous algorithm. Furthermore, the proposed technique achieves objective quality similar to that of the full-search algorithm while reducing the computational cost by up to 90%.
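
    The abstract leaves the ES details open. A generic way to realize such a search is a (1+1)-ES over candidate motion vectors with success-rule step-size adaptation, sketched below on integer-pel positions; the adaptation constants and SAD cost are illustrative choices, not the paper's self-adaptation scheme, and the half-pixel refinement stage is omitted.

```python
import numpy as np

def sad(block, frame, x, y):
    """Sum of absolute differences between a block and the candidate
    position (x, y) in a reference frame."""
    h, w = block.shape
    cand = frame[y:y + h, x:x + w]
    return np.abs(block.astype(np.int32) - cand.astype(np.int32)).sum()

def es_motion_search(block, ref, x0, y0, sigma0=4.0, iters=30, seed=0):
    """(1+1)-ES motion search with a simple step-size adaptation rule.

    Returns the best integer-pel position found. Generic ES sketch,
    not the paper's exact self-adaptation scheme.
    """
    rng = np.random.default_rng(seed)
    h, w = block.shape
    best = (x0, y0)
    best_cost = sad(block, ref, x0, y0)
    sigma = sigma0
    for _ in range(iters):
        dx, dy = rng.normal(0.0, sigma, size=2)
        x = int(np.clip(best[0] + dx, 0, ref.shape[1] - w))
        y = int(np.clip(best[1] + dy, 0, ref.shape[0] - h))
        cost = sad(block, ref, x, y)
        if cost < best_cost:                 # offspring replaces parent
            best, best_cost = (x, y), cost
            sigma *= 1.22                    # success: widen the search
        else:
            sigma = max(0.5, sigma * 0.82)   # failure: shrink the step
    return best, best_cost
```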

    A joint motion & disparity motion estimation technique for 3D integral video compression using evolutionary strategy

    3D imaging techniques have the potential to establish a future mass market in the fields of entertainment and communications. Integral imaging, which can capture true 3D color images with only one camera, has been seen as the right technology to offer stress-free viewing to audiences of more than one person. Just like any digital video, 3D video sequences must also be compressed in order to make them suitable for consumer-domain applications. However, ordinary compression techniques found in state-of-the-art video coding standards such as H.264, MPEG-4 and MPEG-2 are not capable of producing enough compression while preserving the 3D cues. Fortunately, a large amount of redundancy can be found in an integral video sequence in terms of motion and disparity. This paper discusses a novel approach that uses both motion and disparity information to compress 3D integral video sequences. We propose to decompose the integral video sequence into viewpoint video sequences and jointly exploit motion and disparity redundancies to maximize the compression. We further propose an optimization technique based on evolutionary strategies to minimize the computational complexity of the joint motion and disparity estimation. Experimental results demonstrate that joint motion and disparity estimation can achieve over 1 dB of objective quality gain over normal motion estimation. Once combined with the evolutionary strategy, it can achieve up to 94% in computational cost savings.
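
    To make the joint estimation concrete, the sketch below scores one macroblock against both a motion candidate (previous frame, same viewpoint) and a disparity candidate (same instant, neighbouring viewpoint) and keeps the cheaper one. Exhaustive search is used here only for clarity: the paper's contribution is precisely to replace it with an evolutionary strategy. All names are hypothetical.

```python
import numpy as np

def block_sad(block, frame, x, y):
    """Sum of absolute differences at candidate position (x, y)."""
    h, w = block.shape
    return np.abs(block.astype(np.int32)
                  - frame[y:y + h, x:x + w].astype(np.int32)).sum()

def joint_motion_disparity(block, bx, by, prev_frame, neighbor_view, search=8):
    """Pick the better of a motion vector (previous frame, same viewpoint)
    and a disparity vector (same instant, neighbouring viewpoint) for one
    macroblock located at (bx, by)."""
    h, w = block.shape
    best = None
    for mode, ref in (("motion", prev_frame), ("disparity", neighbor_view)):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                x, y = bx + dx, by + dy
                if 0 <= x <= ref.shape[1] - w and 0 <= y <= ref.shape[0] - h:
                    cost = block_sad(block, ref, x, y)
                    if best is None or cost < best[0]:
                        best = (cost, mode, (dx, dy))
    return best  # (cost, chosen mode, vector)
```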

    Quality Evaluation and Nonuniform Compression of Geometrically Distorted Images Using the Quadtree Distortion Map

    The paper presents an analysis of the effects of lossy compression algorithms applied to images affected by geometrical distortion. It is shown that the encoding-decoding process results in a nonhomogeneous degradation of the geometrically corrected image, due to the different amount of information associated with each pixel. A distortion measure, named the quadtree distortion map (QDM), is proposed to quantify this effect. Furthermore, the QDM is exploited to achieve adaptive compression of geometrically distorted pictures, in order to ensure uniform quality over the final image. Tests are performed using the JPEG and JPEG2000 coding standards in order to quantitatively and qualitatively assess the performance of the proposed method.
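
    The QDM construction itself is not given in the abstract; the sketch below illustrates the general recipe it suggests: recursively split blocks wherever a per-pixel information-density map (e.g., the local area scaling of the geometric warp) is non-uniform, and record a mean density per leaf that can drive a per-block encoder quality setting. The thresholds and names are assumptions, not the paper's definitions.

```python
import numpy as np

def quadtree_map(density, x, y, size, min_size=8, tol=0.15, leaves=None):
    """Recursively split a block while the information density inside it
    is non-uniform; each leaf records its mean density."""
    if leaves is None:
        leaves = []
    block = density[y:y + size, x:x + size]
    if size > min_size and block.std() > tol * (block.mean() + 1e-8):
        half = size // 2
        for oy in (0, half):
            for ox in (0, half):
                quadtree_map(density, x + ox, y + oy, half,
                             min_size, tol, leaves)
    else:
        leaves.append((x, y, size, float(block.mean())))
    return leaves

# Toy usage: a radially varying density on a 64x64 tile.
yy, xx = np.mgrid[0:64, 0:64]
d = 1.0 / (1.0 + 0.01 * ((xx - 32) ** 2 + (yy - 32) ** 2))
leaves = quadtree_map(d, 0, 0, 64)
```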

    TransformĂ©es basĂ©es graphes pour la compression de nouvelles modalitĂ©s d’image

    Due to the wide availability of new camera types capturing extra geometrical information, as well as the emergence of new image modalities such as light fields and omnidirectional images, a huge amount of high-dimensional data has to be stored and delivered. The ever-growing streaming and storage requirements of these new image modalities call for novel image coding tools that exploit the complex structure of the data. This thesis explores novel graph-based approaches for adapting traditional image transform coding techniques to emerging data types where the sampled information lies on irregular structures. In a first contribution, novel local graph-based transforms are designed for compact light field representations. By leveraging a careful design of the local transform supports and a local optimization procedure for the basis functions, significant improvements in energy compaction can be obtained. Nevertheless, the locality of the supports does not permit exploiting the long-term dependencies of the signal. This led to a second contribution in which different sampling strategies are investigated. Coupled with novel prediction methods, they yield very strong results for quasi-lossless compression of light fields. The third part of the thesis focuses on the definition of rate-distortion-optimized sub-graphs for the coding of omnidirectional content. If we go further and give more degrees of freedom to the graphs we wish to use, we can learn or define a model (a set of weights on the edges) that might not be entirely reliable for transform design. The last part of the thesis is dedicated to a theoretical analysis of the effect of this uncertainty on the efficiency of graph transforms.
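
    As background for the transforms discussed above, the sketch below computes a plain graph Fourier transform: the signal is projected onto the eigenvectors of the combinatorial Laplacian L = D - W built from the edge weights. This is the textbook operator that local and rate-distortion-optimized graph transforms build on, not the thesis's own designs.

```python
import numpy as np

def graph_fourier_transform(weights, signal):
    """Transform a graph signal with the eigenbasis of the combinatorial
    Laplacian L = D - W. Smooth signals compact their energy into the
    low-frequency (small-eigenvalue) coefficients, which is what makes
    the transform useful for coding."""
    degrees = np.diag(weights.sum(axis=1))
    laplacian = degrees - weights
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # symmetric -> eigh
    coeffs = eigvecs.T @ signal                    # analysis step
    return eigvals, eigvecs, coeffs

# Toy usage: a 4-node path graph with unit edge weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 1.1, 0.9, 1.0])                # smooth signal
vals, U, c = graph_fourier_transform(W, x)
x_rec = U @ c                                      # perfect reconstruction
```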
