
    Steered mixture-of-experts for light field images and video : representation and coding

    Research in light field (LF) processing has increased heavily over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
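
The idea of a set of kernels defining a continuous approximation can be illustrated with a minimal sketch for the 2-D image case. It assumes a common mixture-of-experts form that the abstract does not spell out: each kernel is a steered (full-covariance) Gaussian with a mixing prior, and each expert is a linear function of position. The dictionary keys (`prior`, `mu`, `cov`, `offset`, `slope`) and both function names are hypothetical and chosen only for this example.

```python
import math


def gaussian_2d(x, mu, cov):
    """Evaluate a 2-D Gaussian density with a full (steered) covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def reconstruct(x, kernels):
    """Reconstruct one pixel as a gate-weighted sum of linear experts.

    Assumption: gating weights are normalized Gaussian responses, so the
    pixel depends only on the kernel parameters, not on other pixels.
    """
    responses = [k["prior"] * gaussian_2d(x, k["mu"], k["cov"]) for k in kernels]
    total = sum(responses)
    if total == 0.0:
        return 0.0  # no kernel has numerical support at this position
    value = 0.0
    for r, k in zip(responses, kernels):
        dx, dy = x[0] - k["mu"][0], x[1] - k["mu"][1]
        expert = k["offset"] + k["slope"][0] * dx + k["slope"][1] * dy
        value += (r / total) * expert
    return value


# Two illustrative kernels: a dark region on the left, a bright one on the right.
kernels = [
    {"prior": 0.5, "mu": (0.0, 0.0), "cov": ((1.0, 0.0), (0.0, 1.0)),
     "offset": 0.2, "slope": (0.0, 0.0)},
    {"prior": 0.5, "mu": (10.0, 0.0), "cov": ((1.0, 0.0), (0.0, 1.0)),
     "offset": 0.8, "slope": (0.0, 0.0)},
]
midpoint_value = reconstruct((5.0, 0.0), kernels)
```

Because `reconstruct` accepts any real-valued position, the same call can be evaluated between sample positions, which is the sense in which such a model supports interpolation; the 4-D and 5-D cases would use higher-dimensional kernels in place of `gaussian_2d`.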

    Progressive modeling of steered mixture-of-experts for light field video approximation

    Steered Mixture-of-Experts (SMoE) is a novel framework for the approximation, coding, and description of image modalities. The future goal is to arrive at a representation for Six Degrees-of-Freedom (6DoF) image data. The goal of this paper is to introduce SMoE for 4D light field videos by including the temporal dimension. However, these videos contain vast amounts of samples due to the large number of views per frame. Previous work on static light field images mitigated the problem through a hard subdivision of the modeling problem. However, such a hard subdivision introduces visually disturbing block artifacts on moving objects in dynamic image data. We propose a novel modeling method that does not result in block artifacts while minimizing computational complexity, and that allows for a varying spread of kernels in the spatio-temporal domain. Experiments validate that we can progressively model light field videos with increasing objective quality up to 0.97 SSIM.

    Hard real-time, pixel-parallel rendering of light field videos using steered mixture-of-experts

    Steered Mixture-of-Experts (SMoE) is a novel framework for the approximation, coding, and description of image modalities such as light field images and video. The future goal is to arrive at a representation for Six Degrees-of-Freedom (6DoF) image data. Previous research has shown the feasibility of real-time pixel-parallel rendering of static light field images. Each pixel is independently reconstructed by kernels that lie in its vicinity. The number of kernels involved is the bottleneck for the achievable framerate. The goal of this paper is twofold. Firstly, we introduce pixel-level rendering of light field video, as previous work only rendered static content. Secondly, we investigate rendering using a predefined number of most significant kernels. As such, we can meet hard real-time constraints by trading off reconstruction quality.
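
Rendering with a predefined number of most significant kernels can be sketched as follows. The abstract does not say how significance is measured; this example assumes it is the per-pixel gating weight and that the retained weights are renormalized. The function name `render_top_k` and its arguments are hypothetical.

```python
import heapq


def render_top_k(gate_weights, expert_values, k):
    """Blend only the k kernels with the largest gating weight for one pixel.

    Assumption: "most significant" means largest gating weight, and the
    kept weights are renormalized so they still sum to one.
    """
    top = heapq.nlargest(k, range(len(gate_weights)),
                         key=gate_weights.__getitem__)
    total = sum(gate_weights[i] for i in top)
    if total == 0.0:
        return 0.0  # none of the selected kernels contributes
    return sum(gate_weights[i] * expert_values[i] for i in top) / total


# Illustrative per-pixel data: three kernels with decreasing influence.
weights = [0.6, 0.3, 0.1]
values = [1.0, 0.0, 5.0]
full = render_top_k(weights, values, 3)     # uses every kernel
reduced = render_top_k(weights, values, 2)  # drops the weakest kernel
```

With a fixed `k`, the per-pixel work is bounded regardless of how many kernels the model holds, which is what makes a hard frame-time budget plausible; the difference between `full` and `reduced` is the quality that is traded away.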

    Hierarchical learning of sparse image representations using steered mixture-of-experts

    Previous research showed highly efficient compression results for low bit-rates using Steered Mixture-of-Experts (SMoE). However, higher rates still pose a challenge due to the non-convex optimization problem, which becomes more difficult as the number of components increases. Therefore, a novel estimation method based on Hidden Markov Random Fields is introduced, which takes spatial dependencies of neighboring pixels into account and is combined with a tree-structured splitting strategy. Experimental evaluations for images show that our approach outperforms state-of-the-art techniques using only one robust parameter set. For video and light field modeling, even more gain can be expected.

    Random access prediction structures for light field video coding with MV-HEVC

    Computational imaging and light field technology promise to deliver the required six-degrees-of-freedom for natural scenes in virtual reality. Already existing extensions of standardized video coding formats, such as multi-view coding and multi-view plus depth, are the most conventional light field video coding solutions at the moment. The latest multi-view coding format, which is a direct extension of the high efficiency video coding (HEVC) standard, is called multi-view HEVC (or MV-HEVC). MV-HEVC treats each light field view as a separate video sequence, and uses syntax elements similar to standard HEVC for exploiting redundancies between neighboring views. To achieve this, inter-view and temporal prediction schemes are deployed with the aim of finding the best trade-off between coding performance and reconstruction quality. The number of possible prediction structures is unlimited and many have been proposed in the literature. Although some of them are efficient in terms of compression ratio, they complicate random access due to the dependencies on previously decoded pixels or frames. Random access is an important feature in video delivery, and a crucial requirement in multi-view video coding. In this work, we propose and compare different prediction structures for coding light field video using MV-HEVC with a focus on both compression efficiency and random accessibility. Experiments on three different short-baseline light field video sequences show the trade-off between bit-rate and distortion, as well as the average number of decoded views/frames necessary for displaying any random frame at any time instance. The findings of this work indicate the most appropriate prediction structure depending on the available bandwidth and the required degree of random access.
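
The random-access measure mentioned above, the average number of views/frames that must be decoded to display a given frame, can be illustrated with a small dependency-graph sketch. The two prediction structures below are toy examples invented for illustration, not structures from the paper, and the names `frames_to_decode` and `average_access_cost` are hypothetical. Frames are identified by `(view, time)` tuples.

```python
def frames_to_decode(target, refs):
    """Return every frame that must be decoded to display `target`.

    `refs` maps a frame to the frames it is predicted from; the result is
    the target plus all of its direct and indirect references.
    """
    needed = set()
    stack = [target]
    while stack:
        frame = stack.pop()
        if frame in needed:
            continue
        needed.add(frame)
        stack.extend(refs.get(frame, ()))
    return needed


def average_access_cost(refs):
    """Average number of decoded frames over all possible target frames."""
    frames = set(refs)
    for sources in refs.values():
        frames.update(sources)
    return sum(len(frames_to_decode(f, refs)) for f in frames) / len(frames)


# Toy structure A: each view is predicted from its left neighbor (a chain).
chain = {(0, 0): [], (1, 0): [(0, 0)], (2, 0): [(1, 0)]}
# Toy structure B: both outer views are predicted from the central view.
central = {(1, 0): [], (0, 0): [(1, 0)], (2, 0): [(1, 0)]}

chain_cost = average_access_cost(chain)
central_cost = average_access_cost(central)
```

In this toy comparison the central-reference structure needs fewer decoded frames on average than the chain, at whatever cost in compression efficiency its shorter prediction paths imply; that is the kind of trade-off the abstract describes.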

    Adapting Computer Vision Models To Limitations On Input Dimensionality And Model Complexity

    When considering instances of distributed systems where visual sensors communicate with remote predictive models, data traffic is limited to the capacity of communication channels, and hardware limits the processing of collected data prior to transmission. We study novel methods of adapting visual inference to limitations on complexity and data availability at test time, wherever the aforementioned limitations exist. Our contributions detailed in this thesis consider both task-specific and task-generic approaches to reducing the data requirement for inference, and evaluate our proposed methods on a wide range of computer vision tasks. This thesis makes four distinct contributions: (i) We investigate multi-class action classification via two-stream convolutional neural networks that directly ingest information extracted from compressed video bitstreams. We show that selective access to macroblock motion vector information provides a good low-dimensional approximation of the underlying optical flow in visual sequences. (ii) We devise a bitstream cropping method by which AVC/H.264 and H.265 bitstreams are reduced to the minimum amount of necessary elements for optical flow extraction, while maintaining compliance with codec standards. We additionally study the effect of codec rate-quality control on the sparsity and noise incurred on optical flow derived from resulting bitstreams, and do so for multiple coding standards. (iii) We demonstrate degrees of variability in the amount of data required for action classification, and leverage this to reduce the dimensionality of input volumes by inferring the required temporal extent for accurate classification prior to processing via learnable machines. (iv) We extend the Mixtures-of-Experts (MoE) paradigm to adapt the data cost of inference for any set of constituent experts. 
    We postulate that the minimum acceptable data cost of inference varies for different input space partitions, and consider mixtures where each expert is designed to meet a different set of constraints on input dimensionality. To take advantage of the flexibility of such mixtures in processing different input representations and modalities, we train biased gating functions such that experts requiring less information to make their inferences are favoured over others. Finally, we note that our proposed data utility optimization solutions include a learnable component which considers specified priorities on the amount of information to be used prior to inference, and can be realized for any combination of tasks, modalities, and constraints on available data.
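
One simple way to picture a gating function biased towards experts that need less data is a softmax whose scores are penalized by each expert's data cost. This is only an illustrative sketch and not the training procedure of the thesis, which learns the bias rather than fixing it by hand; the function name `biased_gate` and the `penalty` parameter are assumptions made for this example.

```python
import math


def biased_gate(scores, data_costs, penalty):
    """Softmax gate over experts with a penalty on each expert's data cost.

    Assumption: the bias is a fixed linear penalty; penalty = 0 gives an
    ordinary softmax, larger values shift weight to cheaper experts.
    """
    adjusted = [s - penalty * c for s, c in zip(scores, data_costs)]
    peak = max(adjusted)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]


# Two equally confident experts; the second needs four times as much input data.
unbiased = biased_gate([1.0, 1.0], [1.0, 4.0], penalty=0.0)
biased = biased_gate([1.0, 1.0], [1.0, 4.0], penalty=0.5)
```

Here `unbiased` splits the weight evenly, while `biased` gives the cheaper first expert the larger share; `penalty` plays the role of a specified priority on how much information may be used.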