    Overview of MV-HEVC prediction structures for light field video

    Light field video is a promising technology for delivering the six degrees of freedom required for natural content in virtual reality. Existing multi-view coding (MVC) and multi-view plus depth (MVD) formats, such as MV-HEVC and 3D-HEVC, are the most conventional light field video coding solutions, since they can compress video sequences captured simultaneously from multiple camera angles. 3D-HEVC treats a single view as a video sequence and the other sub-aperture views as gray-scale disparity (depth) maps. MV-HEVC, on the other hand, treats each view as a separate video sequence, which allows the use of motion-compensated algorithms similar to those of HEVC. While MV-HEVC and 3D-HEVC provide similar results, MV-HEVC does not require disparity maps to be readily available, and its implementation is more straightforward, since it relies only on syntax elements rather than additional prediction tools for inter-view prediction. However, there are many degrees of freedom in choosing a prediction structure, and it is still unknown which one is optimal for a given set of application requirements. In this work, various prediction structures for MV-HEVC are implemented and tested. The findings reveal the trade-off between compression gains, distortion and random access capabilities in MV-HEVC light field video coding. The results give an overview of the best-performing solutions developed in the context of this work and of prediction structure algorithms proposed in the state-of-the-art literature. This overview provides a useful benchmark for future development of light field video coding solutions.
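    The random-access side of this trade-off can be illustrated with a small dependency model. The sketch is hypothetical and is not one of the structures evaluated in this work: it assumes a 1x5 row of views, an intra-only random access point at frame 0, low-delay temporal prediction from the previous frame of the same view, and two made-up inter-view patterns (STAR and CHAIN). It counts how many pictures a decoder must reconstruct to display one view at one frame; compression efficiency and distortion are not modeled.

```python
"""Hypothetical random-access cost model for MV-HEVC-style prediction structures."""

from typing import Dict, List, Set, Tuple

Picture = Tuple[int, int]  # (view index, frame index)


def pictures_to_decode(
    target: Picture,
    inter_view_refs: Dict[int, List[int]],
) -> Set[Picture]:
    """Return every picture that must be decoded before `target` can be shown.

    Assumed (hypothetical) structure:
      * frame 0 is a random access point: only inter-view prediction is used;
      * frame f > 0 of a view predicts from frame f - 1 of the same view
        (temporal) and from its inter-view references in frame f.
    """
    needed: Set[Picture] = set()
    stack: List[Picture] = [target]
    while stack:
        view, frame = stack.pop()
        if (view, frame) in needed:
            continue
        needed.add((view, frame))
        # Inter-view references live in the same access unit (same frame).
        for ref_view in inter_view_refs.get(view, []):
            stack.append((ref_view, frame))
        # Temporal reference: previous frame of the same view.
        if frame > 0:
            stack.append((view, frame - 1))
    return needed


# Two hypothetical structures for a 1x5 row of sub-aperture views.
STAR = {v: [0] for v in range(1, 5)}       # every view references view 0
CHAIN = {v: [v - 1] for v in range(1, 5)}  # each view references its neighbour

if __name__ == "__main__":
    for name, refs in (("star", STAR), ("chain", CHAIN)):
        cost = len(pictures_to_decode((3, 2), refs))
        print(f"{name}: {cost} pictures to decode view 3 at frame 2")
```

    Under these assumptions, accessing view 3 at frame 2 needs 6 pictures with the star pattern and 12 with the chain, because the chain pulls in every intermediate view at every earlier frame.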

    Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification

    Recent work on scene classification still makes use of generic CNN features in a rudimentary manner. In this ICCV 2015 paper, we present a novel pipeline built upon deep CNN features to harvest discriminative visual objects and parts for scene classification. We first use a region proposal technique to generate a set of high-quality patches potentially containing objects, and apply a pre-trained CNN to extract generic deep features from these patches. Then we perform both unsupervised and weakly supervised learning to screen these patches and discover discriminative ones representing category-specific objects and parts. We further apply discriminative clustering enhanced with local CNN fine-tuning to aggregate similar objects and parts into groups, called meta objects. A scene image representation is constructed by pooling the feature response maps of all the learned meta objects at multiple spatial scales. We have confirmed that the scene image representation obtained using this new pipeline delivers state-of-the-art performance on two popular scene benchmark datasets, MIT Indoor 67 and Sun397. Comment: To appear in ICCV 2015.
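    The final pooling step can be sketched under stated assumptions: each learned meta object yields an H x W response map, the pooling operator is max pooling, and the spatial scales are 1x1 and 2x2 grids. These choices, and the function name pyramid_max_pool, are illustrative and not necessarily what the paper uses.

```python
from typing import List, Sequence

ResponseMap = List[List[float]]  # H x W responses of one (hypothetical) meta-object detector


def pyramid_max_pool(
    maps: Sequence[ResponseMap], levels: Sequence[int] = (1, 2)
) -> List[float]:
    """Max-pool every response map over an n x n grid for each n in `levels`
    and concatenate the pooled values into one image-level feature vector.

    Output order: level by level; within a level, map by map; within a map,
    grid cells in row-major order.
    """
    features: List[float] = []
    for n in levels:
        for m in maps:
            h, w = len(m), len(m[0])
            if h < n or w < n:
                raise ValueError("response map is smaller than the pooling grid")
            for gi in range(n):
                for gj in range(n):
                    # Integer cell boundaries; every cell is non-empty when h, w >= n.
                    r0, r1 = gi * h // n, (gi + 1) * h // n
                    c0, c1 = gj * w // n, (gj + 1) * w // n
                    features.append(
                        max(m[r][c] for r in range(r0, r1) for c in range(c0, c1))
                    )
    return features


if __name__ == "__main__":
    ramp = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    spot = [[0.0] * 4 for _ in range(4)]
    spot[0][0] = 1.0
    print(pyramid_max_pool([ramp, spot]))
    # [15.0, 1.0, 5.0, 7.0, 13.0, 15.0, 1.0, 0.0, 0.0, 0.0]
```

    With K meta objects and levels (1, 2), the representation has 5K entries; adding finer levels trades vector length for more spatial detail.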

    Static 3D Triangle Mesh Compression Overview

    3D triangle meshes are very widely used to model discrete surfaces and are almost always represented with two tables: one for geometry and another for connectivity. While the raw size of a triangle mesh is around 200 bits per vertex, coding these two distinct kinds of information cleverly (and separately) makes it possible to achieve compression ratios of 15:1 or more. Different techniques must be used depending on whether single-rate or progressive bitstreams are sought and, in the latter case, on whether or not hierarchically nested meshes are desirable during reconstruction.
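    A short worked calculation helps with the quoted figures. Assuming a closed, manifold triangle mesh of genus g (an assumption the overview does not state), Euler's formula and the fact that every edge borders two triangles fix the size of the connectivity table relative to the vertex count; the second line simply applies the 15:1 ratio to the raw size quoted above.

```latex
% V vertices, E edges, F triangles; each triangle has 3 edges, each edge is shared by 2 triangles.
V - E + F = 2 - 2g, \qquad 2E = 3F
\;\Rightarrow\; F = 2V - 4 + 4g \approx 2V
% so the connectivity table holds about 3F \approx 6V vertex indices (six per vertex).
\frac{200\ \text{bits/vertex}}{15} \approx 13.3\ \text{bits/vertex}
```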

    Neural coding strategies and mechanisms of competition

    A long-running debate has concerned whether neural representations are encoded using a distributed or a local coding scheme. In both schemes, individual neurons respond to certain specific patterns of pre-synaptic activity. Hence, rather than being dichotomous, both coding schemes are based on the same representational mechanism. We argue that a population of neurons needs to be capable of learning both local and distributed representations, as appropriate to the task, and should be capable of generating both local and distributed codes in response to different stimuli. Many neural network algorithms, which are often employed as models of cognitive processes, fail to meet all these requirements. In contrast, we present a neural network architecture which enables a single algorithm to efficiently learn, and respond using, both types of coding scheme.
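    The point that local and distributed codes share one mechanism can be illustrated with hypothetical binary threshold units, each responding to one preferred pattern of pre-synaptic activity. This is not the architecture proposed in the paper and models no competition between neurons; the weights, threshold and function name are made up for illustration.

```python
from typing import List, Sequence

# Hypothetical weights: each row is one neuron's preferred input pattern.
WEIGHTS = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # neuron 0 prefers inputs 0 and 1
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],  # neuron 1 prefers inputs 2 and 3
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],  # neuron 2 prefers inputs 4 and 5
]
THRESHOLD = 2.0  # a neuron fires only when its whole preferred pattern is present


def responses(
    weights: Sequence[Sequence[float]],
    stimulus: Sequence[float],
    threshold: float = THRESHOLD,
) -> List[int]:
    """Return 1 for each neuron whose weighted pre-synaptic input reaches threshold."""
    return [
        int(sum(w * x for w, x in zip(row, stimulus)) >= threshold)
        for row in weights
    ]


if __name__ == "__main__":
    print(responses(WEIGHTS, [1, 1, 0, 0, 0, 0]))  # [1, 0, 0]: one active unit, a local code
    print(responses(WEIGHTS, [1, 1, 1, 1, 0, 0]))  # [1, 1, 0]: two active units, a distributed code
```

    A stimulus matching a single preferred pattern activates one unit, while a stimulus combining two patterns activates two, with no change to the units themselves.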