Steered mixture-of-experts for light field images and video : representation and coding
Research in light field (LF) processing has increased considerably over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4 in bitrate at the same quality. At least equally important is the fact that our method inherently provides functionality desired for LF rendering that is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
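To make the kernel idea concrete, here is a minimal sketch of how a kernel-based model could be evaluated at one continuous coordinate. It is not the authors' implementation. It assumes isotropic Gaussian gating and a linear expert per kernel, and the function name `smoe_reconstruct` and the dictionary keys `center`, `bandwidth`, `offset`, and `slope` are hypothetical names chosen for illustration.

```python
import math


def smoe_reconstruct(x, kernels):
    """Evaluate a simplified mixture-of-experts model at coordinate x.

    x       : tuple of floats (2-D pixel, 4-D light ray, 5-D ray plus time, ...)
    kernels : list of dicts with hypothetical keys
              'center'    tuple, kernel position in the same space as x
              'bandwidth' float, isotropic Gaussian width (assumption)
              'offset'    float, expert value at the kernel center
              'slope'     tuple, linear gradient of the expert
    """
    log_weights = []
    expert_values = []
    for k in kernels:
        diff = [xi - ci for xi, ci in zip(x, k["center"])]
        dist2 = sum(d * d for d in diff)
        # Log of the unnormalized Gaussian gating response.
        log_weights.append(-0.5 * dist2 / k["bandwidth"] ** 2)
        # Linear expert: a plane through the kernel center.
        expert_values.append(
            k["offset"] + sum(s * d for s, d in zip(k["slope"], diff))
        )

    # Softmax normalization; subtracting the maximum avoids underflow.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, expert_values)) / total
```

Each output sample depends only on its own coordinate and the kernel list, so samples can be computed independently and in any order. Because `x` is continuous, the same call can be evaluated between the originally sampled positions, which is the sense in which such a model allows interpolation.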
Recurrent Convolutional Neural Networks for Scene Parsing
Scene parsing is a technique that consists of assigning a label to every pixel in
an image according to the class they belong to. To ensure a good visual
coherence and a high class accuracy, it is essential for a scene parser to
capture long-range dependencies in the image. In a feed-forward architecture, this can
be achieved simply by considering a sufficiently large input context patch
around each pixel to be labeled. We propose an approach consisting of a
recurrent convolutional neural network which allows us to consider a large
input context, while limiting the capacity of the model. Contrary to most
standard approaches, our method does not rely on any segmentation methods, nor
any task-specific features. The system is trained in an end-to-end manner over
raw pixels, and models complex spatial dependencies with low inference cost. As
the context size increases with the built-in recurrence, the system identifies
and corrects its own errors. Our approach yields state-of-the-art performance
on both the Stanford Background Dataset and the SIFT Flow Dataset, while
remaining very fast at test time.
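The trade-off between context size and model capacity can be illustrated with a toy 1-D example. It is only a sketch under simplifying assumptions (stride 1, no pooling, no nonlinearity, no feedback of predicted labels) and not the architecture from the paper. The helper names `conv1d_same`, `recurrent_apply`, and `receptive_field` are hypothetical.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1-D correlation with an odd-length kernel.

    The output has the same length as the input.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def recurrent_apply(signal, kernel, steps):
    """Apply the SAME kernel `steps` times (shared weights)."""
    state = list(signal)
    for _ in range(steps):
        state = conv1d_same(state, kernel)
    return state


def receptive_field(kernel_size, steps):
    """Input span seen by one output after `steps` stride-1 passes."""
    return steps * (kernel_size - 1) + 1
```

With a kernel of size 3, three recurrent passes give each output a context of 7 input samples, while the number of parameters stays at 3 because the weights are shared across passes.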