HEVC based Mixed-Resolution Stereo Video Codec
This paper presents a High Efficiency Video Coding (HEVC) based spatial mixed-resolution stereo video codec. The proposed codec applies a frame interleaving algorithm to reorder the stereo video frames into a monoscopic video. The challenge in mixed-resolution video coding is to enable the codec to encode frames with different resolutions. This issue is addressed by superimposing a low-resolution replica of the decoded I-frame on its respective decoded picture, with the remaining space of the frame set to zero. This significantly reduces the computation cost of finding the best match. The proposed codec's reference frame structure is designed to efficiently exploit both temporal and inter-view correlations. The performance of the proposed codec is assessed using five standard multiview video datasets and benchmarked against the anchor and state-of-the-art techniques. Results show that the proposed codec yields significantly higher coding performance than both.
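The interleaving and zero-padding steps described above can be sketched as follows; placing the low-resolution replica in the top-left corner is an illustrative assumption, as the abstract does not fix a layout:

```python
import numpy as np

def interleave_stereo(left_frames, right_frames):
    """Reorder a stereo sequence into one monoscopic stream
    (L0, R0, L1, R1, ...), as in the frame-interleaving step."""
    out = []
    for l, r in zip(left_frames, right_frames):
        out.append(l)
        out.append(r)
    return out

def pad_low_res_iframe(low_res, full_shape):
    """Superimpose a low-resolution replica of a decoded I-frame onto a
    full-size picture, zero-filling the remaining area (hypothetical
    layout: top-left corner)."""
    frame = np.zeros(full_shape, dtype=low_res.dtype)
    h, w = low_res.shape[:2]
    frame[:h, :w] = low_res
    return frame
```

Keeping the replica inside a full-size zero frame lets the motion search treat all reference pictures as having one resolution, which is the computational saving the paper describes.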
Asymmetric 3D video coding based on regions of perceptual relevance
This dissertation presents a study and experimental research on asymmetric coding of
stereoscopic video. A review of 3D technologies, video formats and coding is first presented,
and particular emphasis is then given to asymmetric coding of 3D content and to
subjective performance evaluation methods for asymmetric coding.
The research objective was defined as an extension of the current concept of asymmetric
coding for stereo video. To achieve this objective, the first step consists in defining
regions in the spatial dimension of the auxiliary view with different perceptual relevance
within the stereo pair, identified by a binary mask. These regions are then
encoded with better quality (lower quantisation) for the most relevant ones and worse
quality (higher quantisation) for those with lower perceptual relevance. The actual
estimation of the relevance of a given region is based on a disparity measure computed
from the absolute difference between views. To allow encoding of a stereo sequence using
this method, a reference H.264/MVC encoder (JM) has been modified to accept additional
configuration parameters and inputs. The final encoder is still standard compliant.
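The mask and quantisation assignment described above can be sketched as follows; the threshold and QP values are illustrative assumptions, not parameters from the dissertation:

```python
import numpy as np

def relevance_mask(base, aux, thresh=20):
    """Binary mask of perceptually relevant regions in the auxiliary view,
    estimated from the absolute inter-view difference (a proxy for
    disparity). The threshold is a hypothetical parameter."""
    diff = np.abs(base.astype(np.int32) - aux.astype(np.int32))
    return (diff > thresh).astype(np.uint8)

def qp_map(mask, qp_relevant=28, qp_other=38):
    """Assign a lower QP (better quality) to relevant regions and a
    higher QP elsewhere; the QP values here are illustrative."""
    return np.where(mask == 1, qp_relevant, qp_other)
```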
In order to show the viability of the method, subjective assessment tests were performed
over a wide range of objective qualities of the auxiliary view. The results of these tests
support three main conclusions. First, the proposed method can be more efficient than
traditional asymmetric coding when encoding stereo video at higher qualities/rates.
Second, the method can be used to extend the threshold at which uniform asymmetric
coding methods start to affect the subjective quality perceived by observers. Finally,
the issue of eye dominance is addressed: results from stereo still images displayed over
a short period of time showed that it has little or no impact on the proposed method.
Methods for Light Field Display Profiling and Scalable Super-Multiview Video Coding
Light field 3D displays reproduce the light field of real or synthetic scenes, as observed by multiple viewers, without the need to wear 3D glasses. Reproducing light fields is a technically challenging task in terms of optical setup, content creation and distributed rendering, among others; however, the impressive visual quality of hologram-like scenes, in full color, at real-time frame rates and over a very wide field of view justifies the complexity involved. Seeing objects pop far out of the screen plane without glasses impresses even viewers who have experienced other 3D displays before.

Content for these displays can be either synthetic or real. The creation of synthetic (rendered) content is relatively well understood and used in practice. Depending on the technique used, rendering has its own complexities, quite similar to those of rendering techniques for 2D displays. While rendering suits many use cases, the holy grail of all 3D display technologies is to become the future 3DTV, ending up in every living room and showing realistic 3D content without glasses. Capturing, transmitting and rendering live scenes as light fields is extremely challenging, yet it is necessary if we are to experience light field 3D television showing real people and natural scenes, or realistic 3D video conferencing with real eye contact.

To provide the required realism, light field displays aim for a wide field of view (up to 180°) while reproducing up to ~80 MPixels today. Building gigapixel light field displays is realistic within the next few years. Likewise, capturing live light fields involves many synchronized cameras that cover the same wide field of view as the display and provide the same high pixel count. Therefore, light field capture and content creation have to be well optimized with respect to the targeted display technologies.
Two major challenges in this process are addressed in this dissertation. The first challenge is how to characterize the display in terms of its capability to create light fields, that is, how to profile the display in question. In practical terms this boils down to finding the equivalent spatial resolution, which is similar to the screen resolution of 2D displays, and the angular resolution, which describes the smallest angle over which the display can control color individually. The light field is formalized as a 4D approximation of the plenoptic function in terms of geometrical optics, through spatially localized and angularly directed light rays in the so-called ray space. Plenoptic Sampling Theory provides the conditions required to sample and reconstruct light fields. Subsequently, light field displays can be characterized in the Fourier domain by the effective display bandwidth they support. In the thesis, a methodology for display-specific light field analysis is proposed. It regards the display as a signal processing channel and analyses it as such in the spectral domain. As a result, one is able to derive the display throughput (i.e. the display bandwidth) and, subsequently, the optimal camera configuration to efficiently capture and filter light fields before displaying them.

While the geometrical topology of the optical light sources in projection-based light field displays can be used to theoretically derive the display bandwidth and its spatial and angular resolution, in many cases this topology is not available to the user. Furthermore, many implementation details cause the display to deviate from its theoretical model. In such cases, profiling light field displays in terms of spatial and angular resolution has to be done by measurements. Measurement methods in which the display shows specific test patterns, which are then captured by a single static or moving camera, are proposed in the thesis.
Determining the effective spatial and angular resolution of a light field display is then based on an automated frequency-domain analysis of the captured images as reproduced by the display. The analysis reveals the empirical limits of the display in terms of its pass-band in both the spatial and the angular dimension. Furthermore, the spatial resolution measurements are validated by subjective tests, confirming that the results are in line with the smallest features human observers can perceive on the same display. The resolution values obtained can be used to design the optimal capture setup for the display in question.

The second challenge is related to the massive number of captured views and pixels that have to be transmitted to the display. This clearly requires effective and efficient compression techniques to fit the available bandwidth, as an uncompressed representation of such a super-multiview video could easily consume ~20 gigabits per second with today's displays. Due to the high number of light rays to be captured, transmitted and rendered, distributed systems are necessary for both capturing and rendering the light field. During the first attempts to implement real-time light field capture, transmission and rendering using a brute-force approach, limitations became apparent. Still, because dense multi-camera light field capture and light ray interpolation achieve the best possible image quality, this approach was chosen as the basis for further work, despite the massive bandwidth needed. Decompressing all camera images in all rendering nodes, however, is prohibitively time consuming and does not scale. After analyzing the light field interpolation process and the data-access patterns typical of a distributed light field rendering system, an approach to reduce the amount of data required in the rendering nodes has been proposed.
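The quoted raw bandwidth can be sanity-checked with simple arithmetic; the frame rate and chroma format below are illustrative assumptions, not values from the thesis:

```python
# Back-of-the-envelope raw bit rate for an uncompressed super-multiview
# stream, assuming ~80 MPixels per frame, YUV 4:2:0 sampling
# (12 bits/pixel) and 25 fps.
pixels_per_frame = 80e6
bits_per_pixel = 12
frames_per_second = 25
raw_gbps = pixels_per_frame * bits_per_pixel * frames_per_second / 1e9
print(raw_gbps)  # 24.0 -- same order of magnitude as the ~20 Gbit/s figure
```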
This approach, however, requires only rectangular parts (typically vertical bars in the case of a Horizontal Parallax Only light field display) of the captured images to be available in the rendering nodes, which can be exploited to reduce the time spent decompressing video streams. However, partial decoding is not readily supported by common image/video codecs. In the thesis, approaches for achieving partial decoding are proposed for H.264, HEVC, JPEG and JPEG2000, and the results are compared.

The results of the thesis on display profiling facilitate the design of optimal camera setups for capturing scenes to be reproduced on 3D light field displays. The developed super-multiview content encoding also facilitates light field rendering in real time. This makes live light field transmission and real-time teleconferencing possible in a scalable way, using any number of cameras, and at the spatial and angular resolution the display actually needs to achieve a compelling visual experience.
Error resilience and concealment techniques for high-efficiency video coding
This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study on error robustness revealed that HEVC has weak protection against network losses, with a significant impact on video quality. Based on this evidence, the first contribution of this work is a new method to reduce the temporal dependencies between motion vectors, improving the decoded video quality without compromising the compression efficiency. The second contribution is a two-stage approach for reducing the mismatch of temporal predictions when video streams are received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimization to reduce the number of predictions from a single reference. At the streaming stage, a prioritization algorithm based on spatial dependencies selects a reduced set of motion vectors to be transmitted as side information, to reduce mismatched motion predictions at the decoder. The problem of error-concealment-aware video coding is also investigated to enhance overall error robustness. A new approach based on scalable coding and optimal error concealment selection is proposed, in which the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder to be used in case of data loss. Finally, an adaptive error resilience scheme is proposed to dynamically predict the video stream that achieves the highest decoded quality for a particular loss case. A neural network selects among the various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream and the transmission network. Overall, the new robust video coding methods investigated in this thesis yield consistent quality gains over existing methods, including those implemented in the HEVC reference software. Furthermore, the proposed methods also achieve a better trade-off between coding efficiency and error robustness.
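The rate-distortion trade-off underlying the reference-picture distribution can be sketched with the generic Lagrangian criterion J = D + λ·R; the candidate values below are purely illustrative:

```python
def select_coding_option(candidates, lam):
    """Pick the coding option minimising the Lagrangian cost
    J = D + lambda * R, the generic criterion behind constrained
    rate-distortion optimization (candidate values are illustrative)."""
    return min(candidates, key=lambda c: c["D"] + lam * c["R"])

options = [
    {"ref": 0, "D": 10.0, "R": 100.0},  # low distortion, expensive
    {"ref": 1, "D": 20.0, "R": 40.0},   # higher distortion, cheaper
]
best = select_coding_option(options, lam=0.5)
print(best["ref"])  # 1: J = 40.0 beats J = 60.0
```

A larger λ shifts the choice toward cheaper references, which is how the encoder can be steered away from concentrating predictions on a single picture.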
RECONSTRUCTION OF BURNER FLAMES THROUGH DEEP LEARNING
This MSc thesis reports the design, implementation and experimental evaluation of a deep
learning-based system for the three-dimensional (3-D) reconstruction and visualisation of
fossil-fired burner flames. A literature review examines existing techniques for
3-D visualisation and characterisation of flames. Methodologies and techniques for the 3-D
reconstruction of burner flames using optical tomographic and deep learning (DL) techniques
are presented, together with a discussion of their advantages and limitations.
Technical requirements and open problems of the reviewed techniques are discussed.
A technical strategy incorporating numerical simulations, DL, digital image processing and
optical tomographic techniques is proposed for the reconstruction and visualisation of a
flame. Based on this strategy, a DL-based 3-D flame reconstruction and visualisation
system is developed. The system consists of a trained convolutional neural network (CNN)
model and a third-party software tool for visualisation. The system takes flame images
acquired concurrently from eight different directions around a burner and performs a 3-D
reconstruction of the flame. A numerical simulation is performed initially to examine the
suitability of the proposed DL algorithm: ground-truth data are generated using a
mathematical model designed to mimic a flame structure, and 2-D projection data are
generated from each ground truth. A modified CNN model with a 1-D output dense layer is
established and trained for the reconstruction of the 3-D Gaussian distribution. To determine
the optimal network architecture for this solution, various experiments were conducted
with different model parameters. A detailed description of the CNN implemented for the
numerical solutions is presented.
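The simulation step described above can be sketched in a dependency-free way: a Gaussian slice stands in for the flame model and parallel-beam sums from eight directions stand in for the camera projections; the grid size, sigma, and nearest-neighbour rotation are illustrative choices, not the thesis's implementation:

```python
import numpy as np

def gaussian_phantom(n=32, sigma=6.0):
    """2-D Gaussian slice mimicking a flame cross-section (the thesis uses
    a 3-D Gaussian model; one slice keeps the sketch short)."""
    y, x = np.mgrid[0:n, 0:n]
    c = (n - 1) / 2.0
    return np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))

def rotate_nn(img, angle_deg):
    """Nearest-neighbour rotation about the image centre."""
    n = img.shape[0]
    c = (n - 1) / 2.0
    a = np.deg2rad(angle_deg)
    y, x = np.mgrid[0:n, 0:n]
    xs = np.cos(a) * (x - c) - np.sin(a) * (y - c) + c
    ys = np.sin(a) * (x - c) + np.cos(a) * (y - c) + c
    xi = np.clip(np.round(xs).astype(int), 0, n - 1)
    yi = np.clip(np.round(ys).astype(int), 0, n - 1)
    return img[yi, xi]

def projections(img, n_angles=8):
    """Parallel-beam line integrals from n_angles directions, standing in
    for the eight concurrent camera views of the burner."""
    angles = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    return np.stack([rotate_nn(img, a).sum(axis=0) for a in angles])
```

Pairs of (projections, phantom) generated this way are exactly the kind of (input, ground-truth) samples a reconstruction CNN can be trained on.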
A series of experiments was conducted using flame data obtained from a laboratory-scale
combustion test rig to evaluate the performance of the established CNN model. These
included implementing image processing routines to prepare the dataset collected from
the rig. Additional datasets were also generated using OpenCV morphological
transformation operations to augment the original dataset. The results show that the
implemented and trained CNN model can reconstruct the cross-sectional slices of a burner
flame from images obtained under various combustion conditions. It was also possible to
obtain a 3-D flame structure from the reconstructed cross-sectional flame data using a
3-D visualisation tool. Results from the experiments and the performance of the
implemented DL-based 3-D flame reconstruction and visualisation system are presented
and discussed.
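The morphological augmentation mentioned above can be sketched without OpenCV itself; the functions below are dependency-free stand-ins for `cv2.dilate` and `cv2.erode` with a square structuring element (the thesis used OpenCV directly):

```python
import numpy as np

def dilate(img, k=3):
    """Grey-scale dilation with a k x k square structuring element:
    each output pixel is the maximum over its k x k neighbourhood."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = p[:h, :w].copy()
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, p[dy:dy + h, dx:dx + w])
    return out

def erode(img, k=3):
    """Grey-scale erosion, implemented as the dual of dilation."""
    return -dilate(-img, k)
```

Applying such operators to flame images yields slightly thickened or thinned variants, which is the usual way morphological transforms augment a small dataset.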
Perceptually Optimized Visualization on Autostereoscopic 3D Displays
The family of displays that aim to visualize a 3D scene with realistic depth is known as "3D displays". Due to technical limitations and design decisions, such displays create visible distortions, which are interpreted by human vision as artefacts. In the absence of a visual reference (e.g. when the original scene is not available for comparison), one can improve the perceived quality of the representations by making the distortions less visible. This thesis proposes a number of signal processing techniques for decreasing the visibility of artefacts on 3D displays.
The visual perception of depth is discussed, and the properties (depth cues) of a scene which the brain uses for assessing an image in 3D are identified. Following the physiology of vision, a taxonomy of 3D artefacts is proposed. The taxonomy classifies the artefacts based on their origin and on the way they are interpreted by the human visual system.
The principles of operation of the most popular types of 3D displays are explained. Based on these operating principles, 3D displays are modelled as a signal processing channel. The model is used to explain the process by which distortions are introduced, and it also allows one to identify which optical properties of a display are most relevant to the creation of artefacts. A set of optical properties for dual-view and multiview 3D displays is identified, and a methodology for measuring them is introduced. The measurement methodology allows the angular visibility and crosstalk of each display element to be derived without the need for precision measurement equipment. Based on the measurements, a methodology for creating a quality profile of 3D displays is proposed. The quality profile can be either simulated using the angular brightness function or measured directly from a series of photographs. A comparative study introducing the measurement results on the visual quality and sweet-spot positions of eleven 3D displays of different types is presented. Knowing the sweet-spot position and the quality profile allows easy comparison between 3D displays. The shape and size of the passband allow the depth and textures of 3D content to be optimized for a given 3D display.
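As an illustration of the measurement-based profiling, per-view crosstalk can be estimated from measured angular brightness profiles; the definition below (unintended over intended luminance at each view's nominal angle) is one common convention, and the profiles used in testing are synthetic:

```python
import numpy as np

def crosstalk(profiles, view_angles, angles):
    """Estimate per-view crosstalk from angular brightness profiles:
    at each view's nominal angle, the ratio of luminance leaking in
    from the other views to the intended view's luminance.

    profiles    : (n_views, n_angles) measured brightness per view
    view_angles : nominal observation angle of each view
    angles      : angle axis of the measurements
    """
    result = []
    for v, a0 in enumerate(view_angles):
        i = int(np.argmin(np.abs(angles - a0)))
        intended = profiles[v, i]
        leaked = profiles[:, i].sum() - intended
        result.append(leaked / intended)
    return np.array(result)
```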
Based on knowledge of 3D artefact visibility and an understanding of the distortions introduced by 3D displays, a number of signal processing techniques for artefact mitigation are created. A methodology for creating anti-aliasing filters for 3D displays is proposed. For multiview displays, the methodology is extended towards so-called passband optimization, which addresses the Moiré, fixed-pattern-noise and ghosting artefacts characteristic of such displays. Additionally, the design of tuneable anti-aliasing filters is presented, along with a framework that allows the user to select the so-called 3D sharpness parameter according to his or her preferences. Finally, a set of real-time algorithms for viewpoint-based optimization is presented. These algorithms require active user tracking, implemented as a combination of face and eye tracking. Once the observer position is known, the image on a stereoscopic display is optimised for the derived observation angle and distance. For multiview displays, the combination of precise light redirection and less precise face tracking is used to extend the head parallax. For some user-tracking algorithms, implementation details are given regarding execution on a mobile device or on a desktop computer with a graphics accelerator.
Compatible 3D Video Coding with the HEVC Algorithm
This dissertation presents work on 3D video coding compatible with 2D video. It is based
on the development of a method to improve, at the decoder, the reconstruction of a
subsampled view resulting from simulcast transmission using the H.265 video coding
standard (informally known as High Efficiency Video Coding (HEVC)).
Although it maintains compatibility with 2D video, simulcast transmission normally
requires a high bit rate. In the absence of suitable 3D coding tools, the bit rate can be
reduced using asymmetric video compression, where the base view is encoded at the
original spatial resolution while the auxiliary view is encoded at a lower spatial
resolution and upsampled at the decoder.
The developed method aims to improve the upsampled auxiliary view at the decoder
using detail information from the base view, i.e. its high-frequency components. This
process relies on affine transforms to perform a geometric mapping between the
high-frequency information of the full-resolution base view and the lower-resolution
auxiliary view. Additionally, to preserve the continuity of image content across regions
and avoid blocking artefacts, the mapping applies a triangulation mesh of the auxiliary
view to the detail image obtained from the base view.
The proposed technique is compared with a block-matching disparity estimation
method; the results show that, for some sequences, the developed technique improves
not only the objective quality (PSNR) by up to 2.2 dB but also the subjective quality,
at the same overall compression rate.
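A much-simplified sketch of the detail-transfer idea follows; it omits the affine/triangulation mapping entirely and uses a box filter and nearest-neighbour 2x upsampling purely for illustration:

```python
import numpy as np

def box_blur(img, k=5):
    """Simple k x k box low-pass (stands in for a proper low-pass filter)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)

def enhance_aux(base, aux_lowres):
    """Upsample the auxiliary view and add the base view's high-frequency
    detail (base minus its low-pass); a simplified stand-in for the
    affine/triangulation mapping described above."""
    up = np.kron(aux_lowres, np.ones((2, 2)))  # nearest-neighbour 2x upsampling
    detail = base - box_blur(base)
    return up + detail
```

In the actual method the detail image is warped onto the auxiliary view through per-triangle affine transforms before being added, which is what avoids blocking artefacts at region boundaries.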