Steered mixture-of-experts for light field images and video : representation and coding
Research in light field (LF) processing has increased considerably over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
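To make the kernel idea concrete, the following is a minimal sketch of how a single 2-D pixel could be reconstructed from a small set of kernels, assuming Gaussian gating and linear experts. The dictionary keys (`mu`, `precision`, `pi`, `offset`, `slope`) and the function names are illustrative assumptions, not the authors' notation or implementation, and the Gaussian normalisation constant is assumed to be folded into the mixing weight.

```python
import math


def gaussian_weight(x, y, kernel):
    """Unnormalised Gaussian response of one kernel at pixel (x, y).

    kernel["precision"] holds (a, b, c) of the symmetric 2x2 inverse
    covariance [[a, b], [b, c]]; a non-zero b tilts ("steers") the kernel.
    """
    dx = x - kernel["mu"][0]
    dy = y - kernel["mu"][1]
    a, b, c = kernel["precision"]
    mahalanobis = a * dx * dx + 2.0 * b * dx * dy + c * dy * dy
    return kernel["pi"] * math.exp(-0.5 * mahalanobis)


def reconstruct_pixel(x, y, kernels):
    """Blend the linear experts with soft-max style gating weights."""
    weights = [gaussian_weight(x, y, k) for k in kernels]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no kernel responds here (numerical underflow)
    value = 0.0
    for weight, k in zip(weights, kernels):
        # Hypothetical linear expert: a plane through the kernel centre.
        expert = (k["offset"]
                  + k["slope"][0] * (x - k["mu"][0])
                  + k["slope"][1] * (y - k["mu"][1]))
        value += (weight / total) * expert
    return value


example_kernels = [
    {"mu": (2.0, 2.0), "precision": (0.5, 0.0, 0.5), "pi": 1.0,
     "offset": 10.0, "slope": (0.0, 0.0)},
    {"mu": (12.0, 12.0), "precision": (0.5, 0.0, 0.5), "pi": 1.0,
     "offset": 200.0, "slope": (0.0, 0.0)},
]
```

Because each pixel depends only on the kernel parameters and its own coordinates, every pixel can be evaluated independently, which is consistent with the pixel-parallel reconstruction and random access properties claimed in the abstract.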
Scalable light field representation and coding
This Thesis aims to advance the state-of-the-art in light field representation and coding. In this context, proposals to improve functionalities like light field random access and scalability are also presented. As the light field representation constrains the coding approach to be used, several light field coding techniques to exploit the inherent characteristics of the most popular types of light field representations are proposed and studied, which are normally based on micro-images or sub-aperture-images.
To encode micro-images, two solutions are proposed, aiming to exploit the redundancy between neighboring micro-images using a high order prediction model, where the model parameters are either explicitly transmitted or inferred at the decoder, respectively. In both cases, the proposed solutions are able to outperform low order prediction solutions.
To encode sub-aperture-images, an HEVC-based solution that exploits their inherent intra and inter redundancies is proposed. In this case, the light field image is encoded as a pseudo video sequence, where the scanning order is signaled, allowing the encoder and decoder to optimize the reference picture lists to improve coding efficiency.
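As an illustration of arranging sub-aperture images into a pseudo video sequence, the sketch below generates a serpentine scanning order over a grid of views. The abstract does not state which scanning orders the thesis uses, so the serpentine pattern and the function name `serpentine_order` are assumptions chosen only because consecutive frames are then always neighbouring views.

```python
def serpentine_order(rows, cols):
    """Return (row, col) view indices in left-right, right-left alternation."""
    order = []
    for r in range(rows):
        # Even rows run left to right, odd rows right to left.
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order


# Example: a 3 x 4 grid of sub-aperture images becomes a 12-frame sequence.
sequence = serpentine_order(3, 4)
```

In such a scheme, the encoder would feed the views to the video codec in `sequence` order, and the signalled order lets the decoder map each decoded frame back to its position in the view grid.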
A novel hybrid light field representation coding approach is also proposed, by exploiting the combined use of both micro-image and sub-aperture-image representation types, instead of using each representation individually.
In order to aid the fast deployment of the light field technology, this Thesis also proposes scalable coding and representation approaches that enable adequate compatibility with legacy displays (e.g., 2D, stereoscopic or multiview) and with future light field displays, while maintaining high coding efficiency. Additionally, viewpoint random access, which improves light field navigation and reduces the decoding delay, is also enabled with a flexible trade-off between coding efficiency and viewpoint random access.
Estimation of LRD present in H.264 video traces using wavelet analysis and proving the paramount of H.264 using OPF technique in wi-fi environment.
While there has always been tremendous demand for streaming video over wireless networks, the nature of the application still presents some challenging issues. Applications that transmit coded video sequences over best-effort networks such as the Internet must cope with changing network behaviour; in particular, the source encoder rate should be controlled based on feedback from a channel estimator that probes the network intermittently. The arrival of powerful video compression techniques such as H.264, together with advances in networking and telecommunications, has opened up a whole new frontier for multimedia communications. The aim of this research is to transmit H.264 coded video frames over a wireless network with maximum reliability and efficiency. H.264 encoded video sequences face major difficulties in reaching their destination when transmitted through a wireless network. The characteristics of H.264 coded video sequences are studied in full and their suitability for transmission over wireless networks is examined; a new approach called Optimal Packet Fragmentation (OPF) is formulated, and the H.264 coded sequences are tested in a simulated wireless environment. The research comprises three major studies. The first part studies Long Range Dependence (LRD) and the ways in which self-similarity can be estimated. Several LRD estimation studies are carried out, and a wavelet-based estimator is selected because wavelets capture both time and frequency features in the data and generally provide a richer picture than classical Fourier analysis. The wavelet estimator measures self-similarity through the Hurst parameter, which tells the researcher how the data can be expected to behave inside the network. The Hurst parameter should be calculated to achieve more reliable transmission in the wireless network. The second part of the research deals with the MPEG-4 and H.264 encoders. The study is carried out to determine which encoder is superior, that is, which can provide better Quality of Service (QoS) and reliability. With the help of the Hurst parameter, this study shows that H.264 is superior to MPEG-4. The third part is the central part of the research; it deals with H.264 coded video frames that are segmented into packets of optimal size at the MAC layer for more efficient and reliable transfer over the wireless network. Finally, the H.264 encoded video frames combined with Optimal Packet Fragmentation are tested in an NS-2 simulated wireless network. The research demonstrates the superiority of the H.264 video encoder and the strong performance of OPF.
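The wavelet-based estimation of the Hurst parameter mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's estimator: it assumes a Haar wavelet, an unweighted least-squares fit, and a stationary long-range dependent series for which the mean squared detail coefficient at scale j grows like 2^(j(2H-1)). The function names and the default of 8 scales are hypothetical choices.

```python
import math
import random


def haar_detail_energies(series, levels):
    """Mean squared Haar detail coefficient for scales 1..levels."""
    approx = list(series)
    energies = []
    for _ in range(levels):
        pairs = len(approx) // 2
        if pairs < 2:
            break  # too few samples left for a meaningful average
        details = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2)
                   for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2)
                  for k in range(pairs)]
        energies.append(sum(d * d for d in details) / pairs)
    return energies


def estimate_hurst(series, levels=8):
    """Fit log2(energy) against scale; assumed slope is 2H - 1."""
    energies = haar_detail_energies(series, levels)
    scales = list(range(1, len(energies) + 1))
    logs = [math.log2(e) for e in energies]  # fails if a scale has zero energy
    mean_s = sum(scales) / len(scales)
    mean_l = sum(logs) / len(logs)
    slope = (sum((s - mean_s) * (l - mean_l) for s, l in zip(scales, logs))
             / sum((s - mean_s) ** 2 for s in scales))
    return (slope + 1.0) / 2.0


# Sanity check on uncorrelated noise, for which H should be close to 0.5.
random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(2 ** 14)]
hurst_white = estimate_hurst(white_noise)
```

For a video trace, `series` would be the sequence of encoded frame sizes; an estimate well above 0.5 would indicate long-range dependence, meaning bursts persist across many time scales.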
VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard
The growing popularity of high-resolution video and the continuously increasing demand for high-quality video on mobile devices are creating a stronger need for more efficient video encoders. In response, HEVC, a new video coding standard, has been developed by a joint team formed by ISO/IEC MPEG and ITU-T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 at equal or even higher perceptual video quality. Motion Estimation (ME), one of the most critical modules in video coding, accounts for roughly 50%-70% of the computational complexity of the video encoder. This high consumption of computational resources limits the performance of encoders, especially for full HD or ultra HD video, in terms of coding speed, bit rate and video quality. The major part of this work therefore concentrates on reducing the computational complexity and improving the timing performance of motion estimation algorithms for the HEVC standard.
First, a new strategy to calculate the SAD (Sum of Absolute Differences) for motion estimation is designed based on statistics of the pixel data of video sequences. These statistics demonstrate that the size relationship between the sums of two sets of pixels has a definite connection with the distribution of the size relationships between individual pixels from the two sets. Taking advantage of this observation, only a small proportion of pixels needs to be involved in the SAD calculation. Simulations show that the amount of computation required by the full search algorithm is reduced by about 58% on average and by up to 70% in the best case.
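For reference, the baseline that this strategy speeds up, full-search block matching with a complete SAD, can be sketched as below. This is the conventional exhaustive method only; the reduced-pixel SAD proposed in the thesis is not described in enough detail in the abstract to reproduce, and the function names and frame layout (lists of rows of integer samples) are assumptions.

```python
def sad(current, reference, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(current[cy + dy][cx + dx]
                         - reference[ry + dy][rx + dx])
    return total


def full_search(current, reference, x, y, size, search_range):
    """Return (best_sad, mv_x, mv_y) for the block at (x, y) in `current`."""
    height, width = len(reference), len(reference[0])
    best = None
    for mv_y in range(-search_range, search_range + 1):
        for mv_x in range(-search_range, search_range + 1):
            rx, ry = x + mv_x, y + mv_y
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, reference, x, y, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, mv_x, mv_y)
    return best


# Synthetic example: the current frame is the reference shifted by (2, 1).
reference_frame = [[(31 * r + 17 * c) % 251 for c in range(16)]
                   for r in range(16)]
current_frame = [[reference_frame[(r + 1) % 16][(c + 2) % 16]
                  for c in range(16)] for r in range(16)]
best_match = full_search(current_frame, reference_frame, 4, 4, 4, 3)
```

Every candidate position costs size × size absolute differences, which is why evaluating only a subset of pixels per candidate, as the abstract describes, can cut the workload substantially.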
Secondly, from the perspective of parallelization, an enhanced TZ search for HEVC is proposed using novel schemes of multiple MVPs (motion vector predictors) and a shared MVP. Specifically, by resorting to multiple MVPs, the initial search process is performed in parallel at multiple search centers, and the ME processing engines for the PUs within one CU are parallelized based on the MVP sharing scheme at the CU (coding unit) level. Moreover, the SAD module of the ME engine is also implemented in parallel for a PU size of 32×32. Experiments indicate that it achieves an appreciable improvement in the throughput and coding efficiency of the HEVC video encoder.
In addition, the other part of this thesis is devoted to VLSI architecture design for finding the first W maximum/minimum values, targeting high speed and low hardware cost. The architecture based on the novel bit-wise AND scheme has only half the area of the best reference solution, and its critical path delay is comparable with that of other implementations. The FPCG (full parallel comparison grid) architecture, which utilizes an optimized comparator-based structure, is 3.6 times faster on average and up to 5.2 times faster at best compared with the reference architectures. Finally, the architecture using the partial sorting strategy reaches a good balance between timing performance and area, with a speed slightly lower than or comparable to that of the FPCG architecture and an acceptable hardware cost.
Efficient compression of synthetic video
Streaming of on-line gaming video is a challenging problem because of the enormous amounts of video data that need to be sent during game play, especially within the limitations of uplink capabilities. The encoding complexity is also a challenge because of the time delay it introduces while on-line gamers are communicating.
The main goal of this research study is to propose an enhanced on-line game video streaming system. First, the most common video coding techniques have been evaluated. The evaluation study considers objective and subjective metrics. Three widespread video coding techniques are selected and evaluated in the study: H.264, MPEG-4 Visual and VP-8. Diverse types of video sequences were used with different frame rates and resolutions. The effects of changing frame rate and resolution on compression efficiency and viewers' satisfaction are also presented. Results showed that the compression process and perceptual satisfaction are severely affected by the nature of the compressed sequence. Overall, H.264 showed higher compression efficiency for synthetic sequences and outperformed the other codecs in the subjective evaluation tests.
Second, a fast inter prediction technique to speed up the H.264 encoding process has been devised. The on-line game streaming service is a real-time application; thus, compression complexity significantly affects the whole process of on-line streaming. H.264 is recommended for synthetic video coding by the results of our comparative codec studies. However, it still suffers from high encoding complexity; thus a low-complexity coding algorithm is presented as a fast inter coding model with a reference management technique. The proposed algorithm was compared to a state-of-the-art method, with the results showing greater reductions in time and bit rate with negligible loss of fidelity.
Third, recommendations on the tradeoff between frame rate and resolution within given uplink capabilities are provided for H.264 video coding. The recommended tradeoffs result from extensive experiments using the Double Stimulus Impairment Scale (DSIS) subjective evaluation metric. Experiments showed that viewers' satisfaction is profoundly affected by varying frame rates and resolutions. In addition, increasing the frame rate or frame resolution does not always guarantee improved perceptual quality. As a result, tradeoffs are recommended to balance frame rate and resolution within a given bit rate to guarantee the highest user satisfaction.
For system completeness and to facilitate the implementation of the proposed techniques, an efficient game video streaming management system is proposed. Compared to existing on-line live video service systems for games, the proposed system provides improved coding efficiency, complexity reduction and better user satisfaction.
Foveated Encoding for Large High-Resolution Displays
Collaborative exploration of scientific data sets across large high-resolution displays requires both high visual detail and low-latency transfer of image data (oftentimes inducing the need to trade one for the other). In this work, we present a system that dynamically adapts the encoding quality in such systems in a way that reduces the required bandwidth without impacting the details perceived by one or more observers. Humans perceive sharp, colourful details in the small foveal region around the centre of the field of view, while information in the periphery is perceived as blurred and colourless. We account for this by tracking the gaze of observers and adapting the quality parameter of each macroblock used by the H.264 encoder accordingly, considering the so-called visual acuity fall-off. This allows a substantial reduction of the required bandwidth with barely noticeable changes in visual quality, which is crucial for collaborative analysis across display walls at different locations. We demonstrate the reduced overall required bandwidth and the high quality inside the foveated regions using particle rendering and parallel coordinates.
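The per-macroblock adaptation described above can be sketched as a map from gaze distance to an H.264 quantization parameter (QP, valid range 0 to 51 for 8-bit video, where higher means coarser). The linear fall-off, the pixel radii, the default QP values and the function names are all illustrative assumptions; the abstract does not give the actual acuity model or parameters.

```python
import math

MACROBLOCK = 16          # H.264 macroblock edge length in pixels
QP_MIN, QP_MAX = 0, 51   # valid H.264 QP range for 8-bit video


def macroblock_qp(mb_x, mb_y, gazes, base_qp=22, max_qp=44,
                  fovea_px=128.0, falloff_px=640.0):
    """QP for one macroblock given a list of (x, y) gaze points in pixels."""
    if not gazes:
        return base_qp  # nobody tracked: keep full quality everywhere
    cx = (mb_x + 0.5) * MACROBLOCK
    cy = (mb_y + 0.5) * MACROBLOCK
    # With several observers, the closest gaze point decides the quality.
    distance = min(math.hypot(cx - gx, cy - gy) for gx, gy in gazes)
    # Hypothetical linear fall-off outside the foveal radius.
    t = min(1.0, max(0.0, (distance - fovea_px) / falloff_px))
    qp = round(base_qp + t * (max_qp - base_qp))
    return min(QP_MAX, max(QP_MIN, qp))


def qp_map(width, height, gazes, **options):
    """Rows of per-macroblock QP values covering a width x height frame."""
    cols = (width + MACROBLOCK - 1) // MACROBLOCK
    rows = (height + MACROBLOCK - 1) // MACROBLOCK
    return [[macroblock_qp(x, y, gazes, **options) for x in range(cols)]
            for y in range(rows)]


# One observer looking at the centre of a 1920 x 1080 tile.
example_map = qp_map(1920, 1080, [(960, 540)])
```

A map like `example_map` would then be handed to an encoder that accepts per-macroblock QP values; how that is done depends on the encoder and is outside this sketch.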
Cyclostationary error analysis and filter properties in a 3D wavelet coding framework
The reconstruction error due to quantization of wavelet subbands can be modeled as a cyclostationary process because of the linear periodically shift variant property of the inverse wavelet transform. For N-dimensional data, N-dimensional reconstruction error power cyclostationary patterns replicate on the data sample lattice. For audio and image coding applications this fact is of little practical interest: the decoded data is perceived as a whole, the error power oscillations on single data elements cannot be seen or heard, and a global PSNR error measure is often used to represent the reconstruction quality. The situation is different for the coding of 3D data (static volumes or video sequences), where decoded data are usually visualized by plane sections and the reconstruction error power is commonly measured by a PSNR[n] sequence, with n representing either a spatial slicing plane (for volumetric data) or the temporal reference frame (for video). In this case, the cyclostationary oscillations on single data elements lead to a global PSNR[n] oscillation and this effect may become a relevant concern. In this paper we study and describe the above phenomena and evaluate their relevance in concrete coding applications. Our analysis is entirely carried out in the original signal domain and can easily be extended to more than three dimensions. We associate the oscillation pattern with the wavelet filter properties in a polyphase framework and we show that a substantial reduction of the oscillation amplitudes can be achieved under a proper selection of the basis functions. Our quantitative model is initially made under high-resolution conditions and then qualitatively extended to all coding rates for the wide family of bit-plane quantization-based coding techniques. Finally, we experimentally validate the proposed models and we perform a subjective evaluation of the visual relevance of the PSNR[n] fluctuations in the cases of medical volumes and video coding.
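The PSNR[n] sequence referred to above can be computed per slice or frame as 10·log10(peak² / MSE[n]). The sketch below assumes 8-bit data (peak 255) stored as a list of frames, each a list of rows; the function name and the tiny synthetic example are illustrative only.

```python
import math


def psnr_per_slice(original, decoded, peak=255.0):
    """Return one PSNR value (in dB) per frame or slicing plane."""
    result = []
    for frame_o, frame_d in zip(original, decoded):
        squared_error = 0.0
        count = 0
        for row_o, row_d in zip(frame_o, frame_d):
            for a, b in zip(row_o, row_d):
                squared_error += (a - b) ** 2
                count += 1
        mse = squared_error / count
        # A perfectly reconstructed frame has unbounded PSNR.
        result.append(math.inf if mse == 0
                      else 10.0 * math.log10(peak * peak / mse))
    return result


# Synthetic example: the error power alternates between two frames.
original_frames = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
decoded_frames = [[[1, 1], [1, 1]], [[2, 2], [2, 2]]]
psnr_sequence = psnr_per_slice(original_frames, decoded_frames)
```

In this toy case the error power quadruples from the first frame to the second, so PSNR[n] drops by 10·log10(4), about 6 dB; a periodic version of that pattern is the kind of frame-to-frame oscillation the paper analyses.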