MiNL: Micro-images based Neural Representation for Light Fields
Traditional representations for light fields can be separated into two types:
explicit representation and implicit representation. Unlike explicit
representation that represents light fields as Sub-Aperture Images (SAIs) based
arrays or Micro-Images (MIs) based lenslet images, implicit representation
treats light fields as neural networks, which is inherently a continuous
representation in contrast to discrete explicit representation. However, at
present almost all the implicit representations for light fields utilize SAIs
to train an MLP to learn a pixel-wise mapping from 4D spatial-angular
coordinate to pixel colors, which is neither compact nor of low complexity.
Instead, in this paper we propose MiNL, a novel MI-wise implicit neural
representation for light fields that trains an MLP + CNN to learn a mapping from
2D MI coordinates to MI colors. Given a micro-image's coordinate, MiNL
outputs that micro-image's RGB values. Encoding a light field with
MiNL amounts to training a neural network to regress the micro-images, and
decoding is a simple feedforward operation. Compared with common
pixel-wise implicit representations, MiNL is more compact and efficient, with
faster decoding (80 to 180x speed-up) as well as better
visual quality (1 to 4 dB PSNR improvement on average).
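The two explicit representations named in the abstract hold the same samples in different layouts. The sketch below is a toy illustration, not code from the paper: it interleaves a grid of SAIs into a lenslet image and extracts one micro-image by its 2D coordinate. The function names, the nested-list layout `sais[u][v][s][t]`, and the idealized rectangular microlens grid are all assumptions made for illustration.

```python
def sais_to_lenslet(sais):
    """Interleave a U x V grid of S x T sub-aperture images into one lenslet image.

    sais[u][v][s][t] is the pixel at spatial position (s, t) in angular view (u, v).
    The micro-image at (s, t) is the U x V block holding that pixel from every view.
    """
    U, V = len(sais), len(sais[0])
    S, T = len(sais[0][0]), len(sais[0][0][0])
    lenslet = [[None] * (T * V) for _ in range(S * U)]
    for u in range(U):
        for v in range(V):
            for s in range(S):
                for t in range(T):
                    lenslet[s * U + u][t * V + v] = sais[u][v][s][t]
    return lenslet


def micro_image(lenslet, s, t, U, V):
    """Return the U x V micro-image at 2D micro-image coordinate (s, t)."""
    return [row[t * V:(t + 1) * V] for row in lenslet[s * U:(s + 1) * U]]


# Toy light field: 2 x 2 views of 3 x 4 pixels; each "pixel" stores its own 4D coordinate.
U, V, S, T = 2, 2, 3, 4
sais = [[[[(u, v, s, t) for t in range(T)] for s in range(S)]
         for v in range(V)] for u in range(U)]
lenslet = sais_to_lenslet(sais)
mi = micro_image(lenslet, 1, 2, U, V)

# Assuming one forward pass per query, a pixel-wise model needs one query per
# 4D sample, while an MI-wise model needs one query per micro-image.
pixel_queries = U * V * S * T
mi_queries = S * T
```

In this toy case the MI-wise query count is smaller by a factor of U x V, which gives some intuition for why an MI-wise mapping can decode faster; the speed-up reported in the abstract depends on the actual networks and is not derived from this count.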
MCPNS: A Macropixel Collocated Position and Its Neighbors Search for Plenoptic 2.0 Video Coding
Recently, it was demonstrated that a new focused plenoptic 2.0 camera can
capture much higher spatial resolution than a traditional unfocused plenoptic
1.0 camera, owing to its effective light field sampling. However,
because the optical structures of plenoptic 1.0 and 2.0 cameras differ
fundamentally, the existing fast motion estimation (ME) method for plenoptic
1.0 videos is expected to be sub-optimal for encoding plenoptic 2.0 videos. In
this paper, we point out the main motion characteristic differences between
plenoptic 1.0 and 2.0 videos and then propose a new fast ME, called macropixel
collocated position and its neighbors search (MCPNS) for plenoptic 2.0 videos.
In detail, we propose to reduce the number of macropixel collocated position
(MCP) search candidates based on the new observation of center-biased motion
vector distribution at macropixel resolution. Then, because motion deviates
considerably around each MCP location in plenoptic 2.0 videos, we propose
to select a certain number of key MCP locations with the lowest matching cost
and to search their neighbors to improve the motion search accuracy.
Unlike existing methods, our method achieves better performance
without requiring prior knowledge of microlens array orientations. Our
simulation results confirm the effectiveness of the proposed algorithm in
terms of both bitrate savings and computational costs compared to existing
methods.
Comment: Under review
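The two-stage search described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the abstract does not give the candidate set, the number of key MCPs, or the neighbor pattern, so `mcp_range`, `num_key`, the one-pixel neighborhood, and the SAD matching cost are all hypothetical choices.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def mcpns_sketch(cur, ref, bx, by, size, pitch, mcp_range=1, num_key=2):
    """Toy two-stage search: coarse MCP candidates, then neighbors of the best few.

    pitch is the macropixel size in pixels; a displacement (dx, dy) means the
    block at (bx, by) in cur is matched against (bx + dx, by + dy) in ref.
    """
    height, width = len(ref), len(ref[0])

    def valid(dx, dy):
        return (0 <= bx + dx and bx + dx + size <= width
                and 0 <= by + dy and by + dy + size <= height)

    # Stage 1: displacements that are integer multiples of the macropixel pitch,
    # limited to a small window around zero (assumed center-biased candidate set).
    mcps = [(i * pitch, j * pitch)
            for j in range(-mcp_range, mcp_range + 1)
            for i in range(-mcp_range, mcp_range + 1)
            if valid(i * pitch, j * pitch)]
    scored = sorted((sad(cur, ref, bx, by, dx, dy, size), dx, dy) for dx, dy in mcps)
    best = scored[0]

    # Stage 2: search the one-pixel neighborhood of the lowest-cost key MCPs.
    for _, cx, cy in scored[:num_key]:
        for ny in (-1, 0, 1):
            for nx in (-1, 0, 1):
                dx, dy = cx + nx, cy + ny
                if valid(dx, dy):
                    cand = (sad(cur, ref, bx, by, dx, dy, size), dx, dy)
                    if cand < best:
                        best = cand
    cost, dx, dy = best
    return (dx, dy), cost


# Synthetic frames: cur is ref shifted by one macropixel pitch plus one pixel.
W = H = 24
ref = [[x * x + 3 * y * y for x in range(W)] for y in range(H)]
cur = [[ref[y][min(x + 5, W - 1)] for x in range(W)] for y in range(H)]
mv, cost = mcpns_sketch(cur, ref, bx=8, by=8, size=4, pitch=4)
```

The synthetic shift is deliberately one pixel away from an MCP position, so the coarse stage alone cannot match it exactly and the neighbor stage is what recovers the displacement.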
Scalable light field representation and coding
This Thesis aims to advance the state-of-the-art in light field representation and coding. In this context, proposals to improve functionalities like light field random access and scalability are also presented. As the light field representation constrains the coding approach to be used, several light field coding techniques are proposed and studied to exploit the inherent characteristics of the most popular types of light field representation, which are normally based on micro-images or sub-aperture images.
To encode micro-images, two solutions are proposed that exploit the redundancy between neighboring micro-images using a high order prediction model. In the first, the model parameters are explicitly transmitted; in the second, they are inferred at the decoder. In both cases, the proposed solutions outperform low order prediction solutions.
To encode sub-aperture-images, an HEVC-based solution that exploits their inherent intra and inter redundancies is proposed. In this case, the light field image is encoded as a pseudo video sequence, where the scanning order is signaled, allowing the encoder and decoder to optimize the reference picture lists to improve coding efficiency.
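As a rough illustration of the pseudo video sequence idea, the sketch below orders a grid of sub-aperture images into a frame list. The abstract states only that the scanning order is signaled; the serpentine order used here is one assumed choice, and the function names are hypothetical.

```python
def serpentine_scan(rows, cols):
    """Visit a rows x cols grid of views row by row, reversing direction on odd rows."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order


def to_pseudo_video(sais, order):
    """Arrange the views sais[r][c] as a frame sequence following the scan order."""
    return [sais[r][c] for r, c in order]


# Toy grid of 2 x 3 views, each labeled by its name instead of pixel data.
views = [[f"view_{r}_{c}" for c in range(3)] for r in range(2)]
order = serpentine_scan(2, 3)
frames = to_pseudo_video(views, order)
```

With a serpentine order, consecutive frames always come from adjacent views, which tends to keep inter-frame differences small; other orders such as a spiral from the central view would serve the same purpose.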
A novel coding approach based on a hybrid light field representation is also proposed, which exploits the combined use of the micro-image and sub-aperture-image representation types instead of using each representation individually.
In order to aid the fast deployment of the light field technology, this Thesis also proposes scalable coding and representation approaches that enable adequate compatibility with legacy displays (e.g., 2D, stereoscopic or multiview) and with future light field displays, while maintaining high coding efficiency. Additionally, viewpoint random access, which improves light field navigation and reduces the decoding delay, is also enabled, with a flexible trade-off between coding efficiency and viewpoint random access.