20 research outputs found
Blind Omnidirectional Image Quality Assessment with Viewport Oriented Graph Convolutional Networks
Quality assessment of omnidirectional images has become increasingly urgent
due to the rapid growth of virtual reality applications. Different from
traditional 2D images and videos, omnidirectional contents can provide
consumers with freely changeable viewports and a larger field of view covering
the spherical surface, which makes the objective
quality assessment of omnidirectional images more challenging. In this paper,
motivated by the characteristics of the human vision system (HVS) and the
viewing process of omnidirectional contents, we propose a novel Viewport
oriented Graph Convolution Network (VGCN) for blind omnidirectional image
quality assessment (IQA). Generally, observers tend to give the subjective
rating of a 360-degree image after viewing and aggregating information from
different viewports when browsing the spherical scenery. Therefore, in order to model
the mutual dependency of viewports in the omnidirectional image, we build a
spatial viewport graph. Specifically, the graph nodes are first defined as
selected viewports with higher probabilities of being seen, a choice inspired by
the HVS, in which human beings are more sensitive to structural information. Then,
these nodes are connected by spatial relations to capture interactions among
them. Finally, reasoning on the proposed graph is performed via graph
convolutional networks. Moreover, we simultaneously obtain global quality using
the entire omnidirectional image without viewport sampling to boost the
performance according to the viewing experience. Experimental results
demonstrate that our proposed model outperforms state-of-the-art full-reference
and no-reference IQA metrics on two public omnidirectional IQA databases.
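The viewport graph and the graph-convolution reasoning described in this abstract can be illustrated with a minimal sketch. It is not the authors' VGCN implementation. It assumes that viewports are given as (longitude, latitude) centres in degrees, that two viewports are connected when their great-circle angle is below a hypothetical `threshold_deg`, and that one layer uses the common symmetric normalisation with self-loops. All function names are illustrative.

```python
import math

def angular_distance(a, b):
    """Great-circle angle (radians) between two (longitude, latitude) points in degrees."""
    lon1, lat1 = math.radians(a[0]), math.radians(a[1])
    lon2, lat2 = math.radians(b[0]), math.radians(b[1])
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def build_adjacency(centers, threshold_deg=60.0):
    """Connect viewports whose centres lie within threshold_deg (assumed rule); self-loops included."""
    limit = math.radians(threshold_deg)
    n = len(centers)
    return [[1.0 if i == j or angular_distance(centers[i], centers[j]) <= limit else 0.0
             for j in range(n)] for i in range(n)]

def normalize(adj):
    """Symmetric normalisation D^(-1/2) A D^(-1/2), where A already contains self-loops."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj_norm, features, weights):
    """One graph-convolution step: ReLU(adj_norm @ features @ weights)."""
    out = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]

# Three hypothetical viewports: two neighbours on the equator and one on the opposite side.
centers = [(0, 0), (45, 0), (180, 0)]
adj = build_adjacency(centers)
features = [[1.0], [3.0], [5.0]]   # one made-up quality feature per viewport
weights = [[1.0]]                  # identity weight, for illustration only
scores = gcn_layer(normalize(adj), features, weights)
```

In this toy example the two neighbouring viewports average their features, while the isolated viewport keeps its own value. A trained model would learn `weights` and stack several such layers.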
No-Reference Quality Assessment for 360-degree Images by Analysis of Multi-frequency Information and Local-global Naturalness
360-degree/omnidirectional images (OIs) have attracted remarkable attention
due to the increasing applications of virtual reality (VR). Compared to
conventional 2D images, OIs can provide a more immersive experience to consumers,
benefiting from their higher resolution and plentiful fields of view (FoVs).
Moreover, OIs are usually observed in a head-mounted display (HMD) without
references. Therefore, an efficient blind quality assessment method, which is
specifically designed for 360-degree images, is urgently desired. In this
paper, motivated by the characteristics of the human visual system (HVS) and
the viewing process of VR visual contents, we propose a novel and effective
no-reference omnidirectional image quality assessment (NR OIQA) algorithm by
Multi-Frequency Information and Local-Global Naturalness (MFILGN).
Specifically, inspired by the frequency-dependent property of visual cortex, we
first decompose the equirectangular projection (ERP) maps into
wavelet subbands. Then, the entropy intensities of low and high frequency
subbands are exploited to measure the multi-frequency information of OIs.
In addition to considering the global naturalness of ERP maps, and owing to
the browsed FoVs, we extract natural scene statistics features from each
viewport image as the measure of local naturalness. With the proposed
multi-frequency information measurement and local-global naturalness
measurement, we utilize support vector regression as the final image quality
regressor to train the quality evaluation model from visual quality-related
features to human ratings. To our knowledge, the proposed model is the first
no-reference quality assessment method for 360-degree images that combines
multi-frequency information and image naturalness. Experimental results on two
publicly available OIQA databases demonstrate that our proposed MFILGN
outperforms state-of-the-art approaches.
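The multi-frequency step in this abstract (wavelet decomposition followed by per-subband entropy) can be sketched as follows. This is an illustration under assumptions, not the MFILGN feature extractor. It assumes a one-level Haar wavelet, an even-sized grayscale image stored as a list of rows, and Shannon entropy over coefficients quantised with a hypothetical `step`. The abstract does not state the wavelet family or the exact entropy definition.

```python
import math
from collections import Counter

def haar_level(image):
    """One-level 2D Haar decomposition of an even-sized grayscale image (list of rows)."""
    bands = {"LL": [], "col_diff": [], "row_diff": [], "diag": []}
    for r in range(0, len(image), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows["LL"].append((a + b + d + e) / 2)        # low-frequency approximation
            rows["col_diff"].append((a - b + d - e) / 2)  # difference across columns
            rows["row_diff"].append((a + b - d - e) / 2)  # difference across rows
            rows["diag"].append((a - b - d + e) / 2)      # diagonal detail
        for name in bands:
            bands[name].append(rows[name])
    return bands

def shannon_entropy(band, step=1.0):
    """Shannon entropy (bits) of the coefficients after quantisation with the given step."""
    counts = Counter(round(v / step) for row in band for v in row)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy 2x4 image: a flat dark block next to a flat bright block.
image = [[0, 0, 4, 4],
         [0, 0, 4, 4]]
bands = haar_level(image)
features = {name: shannon_entropy(band) for name, band in bands.items()}
```

The resulting `features` dictionary holds one entropy value per subband. In the pipeline the abstract describes, such values would be concatenated with the naturalness features and passed to a regressor such as support vector regression.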
Machine Learning for Multimedia Communications
Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, some impressive efficiency/accuracy improvements have been made all over the transmission pipeline. For example, the high model capacity of the learning-based architectures enables us to accurately model the image and video behavior such that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategy or even user perception modeling have widely benefited from the recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
Deep Multi-Scale Features Learning for Distorted Image Quality Assessment
Image quality assessment (IQA) aims to estimate image visual quality as
perceived by humans. Although existing deep neural networks (DNNs) have shown
significant effectiveness in tackling the IQA problem, DNN-based quality
assessment models still need to be improved by exploiting efficient
multi-scale features. In this paper, motivated by the human visual system (HVS)
combining multi-scale features for perception, we propose to use pyramid
features learning to build a DNN with hierarchical multi-scale features for
distorted image quality prediction. Our model is based on both residual maps
and distorted images in the luminance domain, and the proposed network contains
spatial pyramid pooling and a feature pyramid from the network structure. Our
proposed network is optimized end-to-end with deep supervision. To
validate the effectiveness of the proposed method, extensive experiments are
conducted on four widely-used image quality assessment databases, demonstrating
the superiority of our algorithm.
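Spatial pyramid pooling, one of the components named in this abstract, can be illustrated with a small sketch. It assumes max pooling over a single-channel feature map and hypothetical pyramid levels of 1x1 and 2x2. The network's actual pooling configuration is not given in the abstract.

```python
import math

def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2D feature map over n x n grids and concatenate the results.

    The output length is sum(n * n for n in levels), whatever the input size,
    provided each side of the map has at least max(levels) elements.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled

# Two feature maps of different sizes yield vectors of the same length.
square = spatial_pyramid_pool([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
wide = spatial_pyramid_pool([[1, 2, 3, 4], [5, 6, 7, 8]])
```

The point of the technique is the fixed-length output: both `square` and `wide` contain five values, so layers that follow can accept images of varying resolution.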
Transformées basées graphes pour la compression de nouvelles modalités d’image
Due to the wide availability of new camera types capturing extra geometrical information, as well as the emergence of new image modalities such as light fields and omni-directional images, a huge amount of high-dimensional data has to be stored and delivered. The ever-growing streaming and storage requirements of these new image modalities require novel image coding tools that exploit the complex structure of those data. This thesis aims at exploring novel graph-based approaches for adapting traditional image transform coding techniques to the emerging data types where the sampled information lies on irregular structures. In a first contribution, novel local graph-based transforms are designed for compact light field representations. By leveraging a careful design of local transform supports and a local basis function optimization procedure, significant improvements in terms of energy compaction can be obtained. Nevertheless, the locality of the supports did not allow long-term dependencies of the signal to be exploited. This led to a second contribution where different sampling strategies are investigated. Coupled with novel prediction methods, they led to very significant results for quasi-lossless compression of static light fields. The third part of the thesis focuses on the definition of rate-distortion optimized sub-graphs for the coding of omni-directional content. If we move further and give more degrees of freedom to the graphs we wish to use, we can learn or define a model (a set of weights on the edges) that might not be entirely reliable for transform design.
The last part of the thesis is dedicated to theoretically analyzing the effect of the uncertainty on the efficiency of the graph transforms.
Texture and Colour in Image Analysis
Research in colour and texture has experienced major changes in the last few years. This book presents some recent advances in the field, specifically in the theory and applications of colour texture analysis. This volume also features benchmarks, comparative evaluations and reviews.
Multiple View Texture Mapping: A Rendering Approach Designed for Driving Simulation
Simulation provides a safe and controlled environment ideal for human
testing [49, 142, 120]. Simulation of real environments has reached
new heights in terms of photo-realism. Often, a team of professional
graphical artists would have to be hired to compete with modern commercial
simulators. Meanwhile, machine vision methods are currently
being developed that attempt to automatically provide geometrically
consistent and photo-realistic 3D models of real scenes [189, 139, 115,
19, 140, 111, 132]. Often the only requirement is a set of images of
that scene. A road engineer wishing to simulate the environment of a
real road for driving experiments could potentially use these tools.
This thesis develops a driving simulator that uses machine vision
methods to reconstruct a real road automatically. A computer graphics
method called projective texture mapping is applied to enhance
the photo-realism of the 3D models [144, 43]. This essentially creates
a virtual projector in the 3D environment to automatically assign image
coordinates to a 3D model. These principles are demonstrated
using custom shaders developed for an OpenGL rendering pipeline.
Projective texture mapping presents a list of challenges to overcome;
these include reverse projection and projection onto surfaces not immediately
in front of the projector [53]. A significant challenge was
the removal of dynamic foreground objects. 3D reconstruction systems
create 3D models based on static objects captured in images.
Dynamic objects are rarely reconstructed. Projective texture mapping
of images, including these dynamic objects, can result in visual
artefacts. A workflow is developed to resolve this, resulting in videos
and 3D reconstructions of streets with no moving vehicles on the scene.
The final simulator using 3D reconstruction and projective texture
mapping is then developed. A motion model is introduced for the rendering
camera to enable human interaction. The final system is presented and
experimentally tested, and potential future work is discussed.
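The "virtual projector" idea and the reverse-projection problem mentioned in this abstract can be sketched in a few lines. The thesis implements this in custom OpenGL shaders, so the Python below is only an illustration of the underlying geometry. It assumes an unrotated pinhole projector looking along +z with a square field of view. The function name and parameters are hypothetical.

```python
import math

def project_to_texture(point, projector_pos=(0.0, 0.0, 0.0), fov_deg=90.0):
    """Return (u, v) texture coordinates in [0, 1] for a 3D point, or None.

    Assumes a projector at projector_pos looking along +z with no rotation.
    """
    x = point[0] - projector_pos[0]
    y = point[1] - projector_pos[1]
    z = point[2] - projector_pos[2]
    if z <= 0.0:
        # The point is behind the projector: rejecting it avoids reverse projection.
        return None
    half_extent = z * math.tan(math.radians(fov_deg) / 2.0)
    u = 0.5 + 0.5 * x / half_extent
    v = 0.5 + 0.5 * y / half_extent
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        # The point falls outside the projected image.
        return None
    return (u, v)

centre = project_to_texture((0.0, 0.0, 5.0))    # straight ahead of the projector
behind = project_to_texture((0.0, 0.0, -5.0))   # behind the projector
outside = project_to_texture((10.0, 0.0, 1.0))  # far off to the side
```

This sketch handles only the reverse-projection case. Projection onto surfaces hidden behind nearer geometry would additionally need a depth comparison, which is not shown here.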
Human Body Scattering Effects at Millimeter Waves Frequencies for Future 5G Systems and Beyond
Future mobile communications are expected to experience a technical revolution that goes beyond Gbps data rates and reduces latencies to levels very close to a millisecond. New enabling technologies have been researched to achieve these demanding specifications. The utilization of mmWave bands, where a lot of spectrum is available, is one of them.
Due to the numerous technical difficulties associated with using this frequency band, complicated channel models are necessary to anticipate the radio channel characteristics and to accurately evaluate the performance of cellular systems in mmWave. In particular, the most accurate propagation models are those based on deterministic ray tracing techniques. But these techniques have the stigma of being computationally intensive, and this makes it difficult to use them to characterize the radio channel in complex and dynamic indoor scenarios. The complexity of characterizing these scenarios depends largely on the interaction of the human body with the radio environment, which at mmWaves is often destructive and highly unpredictable.
On the other hand, in recent years, the video game industry has developed powerful tools for hyper-realistic environments, where most of the progress in this emulation of reality has to do with the handling of light. As a result, the graphics engines of these platforms have become increasingly efficient at handling large volumes of information, making them ideal for emulating radio wave propagation behavior as well as for reconstructing a complex indoor scenario. This Thesis therefore takes advantage of the computational capacity of this type of tool to evaluate the mmWave radio channel as efficiently as possible. It offers some guidelines to optimize signal propagation at mmWaves in a dynamic and complex indoor environment, for which three main objectives are proposed.
The first objective has been to evaluate the scattering effects of the human body when it interacts with the propagation channel. Once evaluated, a simplified mathematical and geometrical model has been proposed to calculate this effect in a reliable and fast way. Another objective has been the design of a modular passive reflector in mmWaves, which optimizes the coverage in indoor environments, avoiding human interference in the propagation, in order to avoid its harmful scattering effects. And finally, a real-time predictive beam steering system has been designed for the mmWaves radiation system, in order to avoid propagation losses caused by the human body in dynamic and complex indoor environments.
Romero Peña, JS. (2022). Human Body Scattering Effects at Millimeter Waves Frequencies for Future 5G Systems and Beyond [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/19132