    Intra- and Inter-reasoning Graph Convolutional Network for Saliency Prediction on 360° Images

    Cubic projection can divide a 360° image into multiple rectilinear images with little distortion. However, existing saliency prediction models fail to integrate the semantic information shared across these images. In this paper, we address this by proposing an intra- and inter-reasoning graph convolutional network for saliency prediction on 360° images (SalReGCN360). The framework contains six sub-networks, each with two branches. In the training phase, after applying Multiple Cubic Projection (MCP), the six resulting rectilinear images are fed simultaneously into their corresponding sub-networks. In one branch, the intra-graph inference module extracts the global features of a single rectilinear image to finely predict the local saliency of the 360° image. In the other branch, the inter-graph inference module extracts contextual features to integrate the semantic information of all six rectilinear images. Finally, the feature maps of the two branches are fused, and six corresponding rectilinear saliency maps are predicted. Extensive experiments on two popular saliency datasets demonstrate the superiority of the proposed model, especially the improvement in the KLD (Kullback-Leibler divergence) metric.
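    The cubic projection mentioned above maps every viewing direction on the sphere onto one of six cube faces, each a 90° rectilinear (gnomonic) view. The sketch below shows that standard mapping for a single direction. It is not the MCP implementation used by SalReGCN360; the face names and the per-face (u, v) orientation are illustrative assumptions, and real cube-map formats may orient faces differently.

```python
import math


def lonlat_to_vec(lon, lat):
    """Unit direction for longitude lon in [-pi, pi] and latitude lat in [-pi/2, pi/2]."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def vec_to_cube_face(x, y, z):
    """Return (face, u, v) with u, v in [-1, 1] on the cube face hit by the ray (x, y, z).

    Faces are named after the dominant axis and its sign: '+x', '-x', '+y', '-y', '+z', '-z'.
    ASSUMPTION: this face naming and (u, v) orientation are illustrative conventions only.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, u, v, m = ('+x' if x > 0 else '-x'), y, z, ax
    elif ay >= az:
        face, u, v, m = ('+y' if y > 0 else '-y'), x, z, ay
    else:
        face, u, v, m = ('+z' if z > 0 else '-z'), x, y, az
    # Gnomonic (central) projection onto the face plane at unit distance from the centre.
    return face, u / m, v / m
```

    For example, the direction at longitude 0 and latitude 0 lands at the centre of the '+x' face. Rendering a full face image would invert this mapping: loop over the face pixels, build the corresponding 3D ray, and sample the source panorama along it.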

    Object Detection in Omnidirectional Images

    Computer vision (CV) is now widely used to solve real-world problems that pose increasingly difficult challenges. In this context, the use of omnidirectional video in a growing number of applications, together with the fast development of Deep Learning (DL) algorithms for object detection, calls for further research to improve methods originally developed for conventional 2D planar images. However, the geometric distortion produced by common sphere-to-plane projections, most visible in objects near the poles, together with the lack of open-source labeled omnidirectional image datasets, has made accurate object detection on spherical images a hard goal to achieve. This work contributes datasets and machine learning models particularly suited to omnidirectional images represented in planar format through the well-known Equirectangular Projection (ERP). To this end, DL methods are explored to improve the detection of visual objects in omnidirectional images by taking the inherent distortions of ERP into account. First, an experimental study was carried out to determine whether the error rate and the type of detection errors were related to the characteristics of ERP images. This study revealed that the error rate of object detection using existing DL models on ERP images does depend on the spherical location of the object in the image. Based on these findings, a new object detection framework is then proposed to obtain a uniform error rate across all regions of the spherical image. The results show that the pre- and post-processing stages of the implemented framework effectively reduce the dependency of performance on image region, as measured by the error rate.
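    The location-dependent distortion described above can be quantified directly from the ERP geometry: every row of an ERP image has the same width in pixels, but a row at latitude φ covers only a fraction cos φ of the equator's circumference, so content near the poles is stretched horizontally by roughly 1/cos φ. The sketch below is not the detection framework proposed in this work; it only computes that stretch and the per-pixel solid angle, under an assumed layout (longitude increasing left to right from -π, latitude +π/2 on the top row).

```python
import math


def erp_pixel_to_lonlat(col, row, width, height):
    """Centre of ERP pixel (col, row) -> (lon, lat) in radians.

    ASSUMED layout: lon spans [-pi, pi) left to right; lat goes from +pi/2 (top) to -pi/2 (bottom).
    """
    lon = ((col + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (row + 0.5) / height) * math.pi
    return lon, lat


def horizontal_stretch(row, height):
    """Horizontal stretch of this ERP row relative to the equator (about 1 / cos(lat))."""
    _, lat = erp_pixel_to_lonlat(0, row, 1, height)
    return 1.0 / math.cos(lat)


def pixel_solid_angle(row, width, height):
    """Approximate solid angle (steradians) covered by one pixel on this row."""
    _, lat = erp_pixel_to_lonlat(0, row, width, height)
    return (2.0 * math.pi / width) * (math.pi / height) * math.cos(lat)
```

    For a 360 × 180 ERP image, a pixel on the top row is stretched more than 100 times relative to the equator, while the per-pixel solid angles summed over the whole image come out close to 4π steradians, the area of the unit sphere. A detector trained on planar images therefore sees polar objects with a very different shape than equatorial ones.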

    Graph-Based Transforms for the Compression of New Image Modalities (Transformées basées graphes pour la compression de nouvelles modalités d’image)

    Due to the wide availability of new camera types that capture extra geometric information, as well as the emergence of new image modalities such as light fields and omnidirectional images, a huge amount of high-dimensional data has to be stored and delivered. The ever-growing streaming and storage requirements of these new image modalities call for novel image coding tools that exploit the complex structure of such data. This thesis explores novel graph-based approaches for adapting traditional image transform coding techniques to these emerging data types, in which the sampled information lies on irregular structures. In a first contribution, novel local graph-based transforms are designed for compact light field representations. Through a careful design of the local transform supports and a local optimization procedure for the basis functions, significant improvements in energy compaction are obtained. Nevertheless, the locality of the supports did not allow long-term dependencies of the signal to be exploited. This led to a second contribution, in which different sampling strategies are investigated. Coupled with novel prediction methods, they led to very significant results for the quasi-lossless compression of static light fields. The third part of the thesis focuses on the definition of rate-distortion optimized sub-graphs for the coding of omnidirectional content. Going further and giving more degrees of freedom to the graphs we wish to use, we can learn or define a model (a set of weights on the edges) that might not be entirely reliable for transform design. The last part of the thesis is dedicated to a theoretical analysis of the effect of this uncertainty on the efficiency of graph transforms.
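    The graph-based transforms discussed in this thesis build on the graph Fourier transform (GFT): the basis is the set of eigenvectors of the graph Laplacian L = D - W, where W holds the edge weights and D the node degrees, and a signal on the nodes is transformed by projecting it onto those eigenvectors. The minimal sketch below uses NumPy and the plain combinatorial Laplacian only; the thesis's optimized local supports, basis optimization, and rate-distortion sub-graphs are not modelled, and the helper names are hypothetical.

```python
import numpy as np


def path_graph_weights(n):
    """Hypothetical helper: unit-weight path graph 0-1-...-(n-1), e.g. one row of pixels."""
    w = np.zeros((n, n))
    for i in range(n - 1):
        w[i, i + 1] = w[i + 1, i] = 1.0
    return w


def laplacian(weights):
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix W."""
    w = np.asarray(weights, dtype=float)
    return np.diag(w.sum(axis=1)) - w


def gft_basis(weights):
    """Graph frequencies (ascending eigenvalues) and orthonormal eigenvectors (columns) of L."""
    return np.linalg.eigh(laplacian(weights))


def gft(signal, basis):
    """Forward GFT: coefficients of the signal in the Laplacian eigenbasis."""
    return basis.T @ signal


def igft(coeffs, basis):
    """Inverse GFT: the basis is orthonormal, so synthesis is a plain matrix product."""
    return basis @ coeffs
```

    On a path graph with 8 nodes, a smooth ramp signal places more than 99% of its energy in the first two GFT coefficients. This energy compaction is what makes such transforms attractive for coding: most high-frequency coefficients can be quantized coarsely or dropped. On a path graph, the GFT basis coincides (up to sign) with the DCT-II; the benefit of graph transforms comes from choosing W to follow the irregular structure of the data.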

    Texture and Colour in Image Analysis

    Research in colour and texture has undergone major changes in the last few years. This book presents recent advances in the field, specifically in the theory and applications of colour texture analysis. The volume also features benchmarks, comparative evaluations and reviews.

    Saliency-based navigation in omnidirectional image

    Omnidirectional images describe the color information at a given position from all directions. Affordable 360° cameras have recently been developed, leading to an explosion of 360° data shared on social networks. However, an omnidirectional image does not contain interesting content everywhere: some parts of the image are more likely to be looked at by users than others. Knowing these regions of interest might be useful for 360° image compression, streaming, retargeting or even editing. In this paper, a new approach based on 2D image saliency is proposed both to model user navigation within a 360° image and to detect which parts of omnidirectional content might draw users’ attention.