20 research outputs found

    Blind Omnidirectional Image Quality Assessment with Viewport Oriented Graph Convolutional Networks

    Quality assessment of omnidirectional images has become increasingly urgent due to the rapid growth of virtual reality applications. Unlike traditional 2D images and videos, omnidirectional content offers consumers freely changeable viewports and a larger field of view covering the 360°×180° spherical surface, which makes objective quality assessment of omnidirectional images more challenging. In this paper, motivated by the characteristics of the human visual system (HVS) and the viewing process of omnidirectional content, we propose a novel Viewport-oriented Graph Convolution Network (VGCN) for blind omnidirectional image quality assessment (IQA). Observers generally give their subjective rating of a 360-degree image after passing through different viewports and aggregating their information while browsing the spherical scenery. Therefore, to model the mutual dependency of viewports in the omnidirectional image, we build a spatial viewport graph. Specifically, the graph nodes are first defined as selected viewports with higher probabilities of being seen, a choice inspired by the HVS, in which human beings are more sensitive to structural information. Then, these nodes are connected by spatial relations to capture interactions among them. Finally, reasoning on the proposed graph is performed via graph convolutional networks. Moreover, we simultaneously obtain a global quality estimate from the entire omnidirectional image, without viewport sampling, to boost performance in line with the viewing experience. Experimental results demonstrate that our proposed model outperforms state-of-the-art full-reference and no-reference IQA metrics on two public omnidirectional IQA databases.
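
    The viewport-graph idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the VGCN implementation: the abstract does not give the connection rule or the layer definition, so the angular-distance threshold, the function names, and the standard symmetric-normalized graph convolution used here are all assumptions.

```python
import math

def great_circle_deg(p, q):
    """Angular distance in degrees between two (longitude, latitude) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    cos_d = (math.sin(lat1) * math.sin(lat2)
             + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

def build_adjacency(centers, threshold_deg):
    """Connect two viewports when their centers are within threshold_deg (assumed rule)."""
    n = len(centers)
    return [[1.0 if i != j and great_circle_deg(centers[i], centers[j]) <= threshold_deg else 0.0
             for j in range(n)] for i in range(n)]

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    in_dim, out_dim = len(weight), len(weight[0])
    # Aggregate neighbor features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(in_dim)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]

# Three hypothetical viewport centers; the third is too far away to be connected.
centers = [(0.0, 0.0), (30.0, 0.0), (180.0, 0.0)]
adj = build_adjacency(centers, threshold_deg=45.0)
out = gcn_layer(adj, feats=[[1.0], [3.0], [5.0]], weight=[[1.0]])
```

    In this toy case the two neighboring viewports end up with the average of their features, while the isolated viewport keeps its own value.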

    No-Reference Quality Assessment for 360-degree Images by Analysis of Multi-frequency Information and Local-global Naturalness

    360-degree/omnidirectional images (OIs) have attracted remarkable attention due to the increasing applications of virtual reality (VR). Compared to conventional 2D images, OIs can provide a more immersive experience to consumers, benefiting from their higher resolution and plentiful fields of view (FoVs). Moreover, OIs are usually observed in a head-mounted display (HMD) without references. Therefore, an efficient blind quality assessment method specifically designed for 360-degree images is urgently needed. In this paper, motivated by the characteristics of the human visual system (HVS) and the viewing process of VR visual content, we propose a novel and effective no-reference omnidirectional image quality assessment (NR OIQA) algorithm based on Multi-Frequency Information and Local-Global Naturalness (MFILGN). Specifically, inspired by the frequency-dependent property of the visual cortex, we first decompose the equirectangular projection (ERP) maps into wavelet subbands. Then, the entropy intensities of the low- and high-frequency subbands are used to measure the multi-frequency information of OIs. In addition to the global naturalness of the ERP maps, we account for the browsed FoVs by extracting natural scene statistics features from each viewport image as a measure of local naturalness. With the proposed multi-frequency information and local-global naturalness measurements, we use support vector regression as the final image quality regressor to train a quality evaluation model that maps visual quality-related features to human ratings. To our knowledge, the proposed model is the first no-reference quality assessment method for 360-degree images that combines multi-frequency information and image naturalness. Experimental results on two publicly available OIQA databases demonstrate that our proposed MFILGN outperforms state-of-the-art approaches.
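
    As a rough illustration of the multi-frequency step described above, the sketch below decomposes a small grayscale map into wavelet subbands and computes an entropy per subband. The abstract does not name the wavelet, the number of levels, or the exact entropy definition, so the one-level Haar transform, the rounding-based histogram, and all function names here are assumptions.

```python
import math
from collections import Counter

def haar_subbands(img):
    """One-level 2D Haar decomposition of an even-sized grayscale image (list of rows)."""
    bands = {"low": [], "col_diff": [], "row_diff": [], "diag": []}
    for r in range(0, len(img), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows["low"].append((a + b + d + e) / 2.0)       # local average (low frequency)
            rows["col_diff"].append((a - b + d - e) / 2.0)  # change across columns
            rows["row_diff"].append((a + b - d - e) / 2.0)  # change across rows
            rows["diag"].append((a - b - d + e) / 2.0)      # diagonal detail
        for name in bands:
            bands[name].append(rows[name])
    return bands

def subband_entropy(band):
    """Shannon entropy (bits) of the subband coefficients after rounding to integers."""
    values = [round(v) for row in band for v in row]
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

# Toy 4x4 map: left half dark, right half bright.
image = [[0, 0, 4, 4] for _ in range(4)]
features = {name: subband_entropy(band) for name, band in haar_subbands(image).items()}
```

    In a pipeline like the one the abstract describes, such per-subband values would be concatenated with naturalness features and passed to a regressor; that stage is not shown here.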

    Machine Learning for Multimedia Communications

    Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, impressive efficiency and accuracy improvements have been made all along the transmission pipeline. For example, the high model capacity of learning-based architectures makes it possible to model image and video behavior accurately, so that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategy and even user perception modeling have widely benefited from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even when only a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed across the transmission chain, and we discuss their potential impact and the research challenges that they raise.

    Deep Multi-Scale Features Learning for Distorted Image Quality Assessment

    Image quality assessment (IQA) aims to estimate the visual quality of an image as perceived by humans. Although existing deep neural networks (DNNs) have proven effective at tackling the IQA problem, DNN-based quality assessment models can still be improved by exploiting efficient multi-scale features. In this paper, motivated by the way the human visual system (HVS) combines multi-scale features for perception, we propose to use pyramid feature learning to build a DNN with hierarchical multi-scale features for distorted image quality prediction. Our model is based on both residual maps and distorted images in the luminance domain, and the proposed network contains spatial pyramid pooling and a feature pyramid built from the network structure. The network is optimized in a deep end-to-end supervision manner. To validate the effectiveness of the proposed method, extensive experiments are conducted on four widely used image quality assessment databases, demonstrating the superiority of our algorithm.
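
    The spatial pyramid pooling mentioned in the abstract above can be illustrated with a minimal sketch. It assumes max pooling over 1x1 and 2x2 grids on a single feature map; the abstract does not state the pyramid levels or the pooling operator, so these choices and the function name are assumptions rather than the authors' configuration.

```python
def spatial_pyramid_pool(fmap, levels=(1, 2)):
    """Max-pool a 2D feature map over bins x bins grids and concatenate the results.

    The output length depends only on `levels`, not on the feature map size,
    provided the map has at least max(levels) rows and columns.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for bins in levels:
        for by in range(bins):
            # Floor for the start and ceiling for the end, so bins cover the whole map.
            r0, r1 = (by * h) // bins, ((by + 1) * h + bins - 1) // bins
            for bx in range(bins):
                c0, c1 = (bx * w) // bins, ((bx + 1) * w + bins - 1) // bins
                pooled.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return pooled

small = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
odd_sized = [[0.5] * 5 for _ in range(3)]
vec_small = spatial_pyramid_pool(small)
vec_odd = spatial_pyramid_pool(odd_sized)
```

    Both inputs yield a vector of the same length even though their sizes differ, which is the property that lets a network accept images of varying resolution.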

    Graph-Based Transforms for the Compression of New Imaging Modalities (Transformées basées graphes pour la compression de nouvelles modalités d’image)

    Due to the wide availability of new camera types capturing extra geometrical information, as well as the emergence of new image modalities such as light fields and omnidirectional images, a huge amount of high-dimensional data has to be stored and delivered. The ever-growing streaming and storage requirements of these new image modalities call for novel image coding tools that exploit the complex structure of the data. This thesis explores novel graph-based approaches for adapting traditional image transform coding techniques to emerging data types in which the sampled information lies on irregular structures. In a first contribution, novel local graph-based transforms are designed for compact light field representations. By leveraging a careful design of local transform supports and a local basis function optimization procedure, significant improvements in energy compaction can be obtained. Nevertheless, the locality of the supports did not allow long-term dependencies of the signal to be exploited. This led to a second contribution, in which different sampling strategies are investigated. Coupled with novel prediction methods, they led to very prominent results for quasi-lossless compression of light fields. The third part of the thesis focuses on the definition of rate-distortion optimized sub-graphs for the coding of omnidirectional content. If we move further and give more degrees of freedom to the graphs we wish to use, we can learn or define a model (a set of weights on the edges) that might not be entirely reliable for transform design. The last part of the thesis is dedicated to a theoretical analysis of the effect of this uncertainty on the efficiency of the graph transforms.
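
    To make the link between graph transforms and traditional transform coding concrete, the sketch below uses a textbook special case: for an unweighted path graph, the Laplacian eigenvectors are the DCT-II basis vectors. This is not one of the thesis's transforms, which target irregular supports; the function names are illustrative assumptions.

```python
import math

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph with n nodes."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        lap[i][i] += 1.0
        lap[i + 1][i + 1] += 1.0
        lap[i][i + 1] -= 1.0
        lap[i + 1][i] -= 1.0
    return lap

def dct_basis_vector(n, k):
    """k-th (unnormalized) DCT-II basis vector, an eigenvector of the path Laplacian."""
    return [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]

def path_graph_transform(signal):
    """Project a signal onto the normalized eigenvectors of the path Laplacian."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        u = dct_basis_vector(n, k)
        norm = math.sqrt(sum(v * v for v in u))
        coeffs.append(sum(s * v for s, v in zip(signal, u)) / norm)
    return coeffs

def eigen_residual(n, k):
    """Largest entry of |L u - lambda u| with lambda = 2 - 2 cos(pi k / n)."""
    lap, u = path_laplacian(n), dct_basis_vector(n, k)
    lam = 2.0 - 2.0 * math.cos(math.pi * k / n)
    return max(abs(sum(lap[i][j] * u[j] for j in range(n)) - lam * u[i]) for i in range(n))

# A constant signal is fully captured by the first coefficient (energy compaction).
coeffs = path_graph_transform([3.0, 3.0, 3.0, 3.0])
```

    On an irregular graph the eigenvectors no longer have a closed form and must be computed numerically, which is where the design of supports and edge weights discussed in the abstract comes in.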

    Texture and Colour in Image Analysis

    Research in colour and texture has experienced major changes in the last few years. This book presents some recent advances in the field, specifically in the theory and applications of colour texture analysis. This volume also features benchmarks, comparative evaluations and reviews

    Irish Machine Vision and Image Processing Conference Proceedings 2017


    Multiple View Texture Mapping: A Rendering Approach Designed for Driving Simulation

    Simulation provides a safe and controlled environment ideal for human testing [49, 142, 120]. Simulation of real environments has reached new heights in terms of photo-realism. Often, a team of professional graphic artists would have to be hired to compete with modern commercial simulators. Meanwhile, machine vision methods are currently being developed that attempt to automatically provide geometrically consistent and photo-realistic 3D models of real scenes [189, 139, 115, 19, 140, 111, 132]. Often the only requirement is a set of images of the scene. A road engineer wishing to simulate the environment of a real road for driving experiments could potentially use these tools. This thesis develops a driving simulator that uses machine vision methods to reconstruct a real road automatically. A computer graphics method called projective texture mapping is applied to enhance the photo-realism of the 3D models [144, 43]. This essentially creates a virtual projector in the 3D environment to automatically assign image coordinates to a 3D model. These principles are demonstrated using custom shaders developed for an OpenGL rendering pipeline. Projective texture mapping presents a list of challenges to overcome, including reverse projection and projection onto surfaces not immediately in front of the projector [53]. A significant challenge was the removal of dynamic foreground objects. 3D reconstruction systems create 3D models based on static objects captured in images, and dynamic objects are rarely reconstructed. Projective texture mapping of images that include these dynamic objects can result in visual artefacts. A workflow is developed to resolve this, resulting in videos and 3D reconstructions of streets with no moving vehicles in the scene. The final simulator, using 3D reconstruction and projective texture mapping, is then developed. A motion model was introduced for the rendering camera to enable human interaction. The final system is presented and experimentally tested, and potential future work is discussed.
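
    The virtual-projector idea described above amounts to a perspective projection that turns a 3D point into image coordinates. The sketch below shows it in Python rather than in the OpenGL shaders the thesis mentions, and makes several assumptions: points are already expressed in the projector's frame, the projector looks down its +z axis, and the field of view, aspect ratio and function name are hypothetical. It rejects reverse projection but does not handle occluded surfaces, which would need a depth test.

```python
import math

def projective_tex_coords(point, fov_deg=60.0, aspect=1.0):
    """Map a 3D point in projector space to (u, v) texture coordinates in [0, 1].

    Returns None when the point receives no texture: either it lies behind the
    projector (reverse projection) or it falls outside the projector's frustum.
    """
    x, y, z = point
    if z <= 0.0:
        return None  # behind the projector: reject instead of mirroring the image
    half_extent = math.tan(math.radians(fov_deg) / 2.0)
    # Perspective divide to normalized device coordinates in [-1, 1].
    ndc_x = x / (z * half_extent * aspect)
    ndc_y = y / (z * half_extent)
    # Scale and bias from [-1, 1] to texture space [0, 1].
    u, v = 0.5 * ndc_x + 0.5, 0.5 * ndc_y + 0.5
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None  # outside the projected image
    return (u, v)

center = projective_tex_coords((0.0, 0.0, 5.0))
behind = projective_tex_coords((0.0, 0.0, -5.0))
outside = projective_tex_coords((10.0, 0.0, 1.0))
offset = projective_tex_coords((0.5, -0.5, 1.0), fov_deg=90.0)
```

    The early return for points behind the projector is the guard against the reverse projection artefact that the abstract lists among the challenges.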

    Human Body Scattering Effects at Millimeter Waves Frequencies for Future 5G Systems and Beyond

    Future mobile communications are expected to undergo a technical revolution that goes beyond Gbps data rates and reduces latencies to levels very close to a millisecond. New enabling technologies have been researched to achieve these demanding specifications. The use of mmWave bands, where a lot of spectrum is available, is one of them. Due to the numerous technical difficulties associated with this frequency band, complicated channel models are necessary to anticipate the radio channel characteristics and to accurately evaluate the performance of cellular systems at mmWave. In particular, the most accurate propagation models are those based on deterministic ray tracing techniques. However, these techniques have the stigma of being computationally intensive, which makes it difficult to use them to characterize the radio channel in complex and dynamic indoor scenarios. The complexity of characterizing these scenarios depends largely on the interaction of the human body with the radio environment, which at mmWave frequencies is often destructive and highly unpredictable. On the other hand, in recent years the video game industry has developed powerful tools for hyper-realistic environments, where most of the progress in emulating reality has to do with the handling of light. As a result, the graphics engines of these platforms have become increasingly efficient at handling large volumes of information, making them ideal for emulating radio wave propagation as well as for reconstructing a complex indoor scenario. This Thesis therefore takes advantage of the computational capacity of this type of tool to evaluate the mmWave radio channel as efficiently as possible. It offers guidelines for optimizing mmWave signal propagation in a dynamic and complex indoor environment, for which three main objectives are proposed. The first objective has been to evaluate the scattering effects of the human body when it interacts with the propagation channel. Once these were evaluated, a simplified mathematical and geometrical model was proposed to calculate the effect reliably and quickly. Another objective has been the design of a modular passive mmWave reflector that optimizes coverage in indoor environments, avoiding human interference in the propagation and its harmful scattering effects. Finally, a real-time predictive beam steering system has been designed for the mmWave radiation system, in order to avoid propagation losses caused by the human body in dynamic and complex indoor environments.
    Romero Peña, JS. (2022). Human Body Scattering Effects at Millimeter Waves Frequencies for Future 5G Systems and Beyond [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/19132