A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition
Face recognition has attracted increasing attention due to its wide range of
applications, but it remains challenging under large variations in the
biometric data characteristics. Lenslet light field cameras have recently come
into prominence to capture rich spatio-angular information, thus offering new
possibilities for advanced biometric recognition systems. This paper proposes a
double-deep spatio-angular learning framework for light field based face
recognition, which is able to learn both texture and angular dynamics in
sequence using convolutional representations; to the best of our knowledge,
this recognition framework has not previously been proposed for face
recognition or any other visual recognition task. The proposed double-deep learning framework
includes a long short-term memory (LSTM) recurrent network whose inputs are
VGG-Face descriptions that are computed using a VGG-Very-Deep-16 convolutional
neural network (CNN). The VGG-16 network uses different face viewpoints
rendered from a full light field image, which are organised as a pseudo-video
sequence. A comprehensive set of experiments has been conducted with the
IST-EURECOM light field face database, for varied and challenging recognition
tasks. Results show that the proposed framework achieves superior face
recognition performance when compared to the state-of-the-art.
Comment: Submitted to IEEE Transactions on Circuits and Systems for Video Technology
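The pipeline described above (viewpoints rendered from a light field, described one by one by a CNN, then fed in sequence to an LSTM) can be illustrated with a minimal NumPy sketch. The toy descriptor, the raster scanning order, and all dimensions below are illustrative assumptions, not the authors' VGG-Face/LSTM configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: a 3x3 grid of sub-aperture viewpoints rendered from one
# light field capture (the paper renders views from a lenslet image).
views = rng.standard_normal((3, 3, 8, 8))  # (u, v, height, width)

# Organise the viewpoints as a pseudo-video sequence (raster order here;
# the authors' actual scanning order is an assumption of this sketch).
sequence = views.reshape(9, 8, 8)

def toy_descriptor(view, dim=16):
    # Stand-in for the 4096-D VGG-Face description of one viewpoint.
    proj = rng.standard_normal((view.size, dim))
    return np.tanh(view.ravel() @ proj)

descs = np.stack([toy_descriptor(v) for v in sequence])  # (T, 16)

def lstm_step(x, h, c, W, U, b):
    # One LSTM step: input/forget/output gates i, f, o and candidate g.
    hidden = h.size
    z = W @ x + U @ h + b
    i, f, o, g = (z[:hidden], z[hidden:2 * hidden],
                  z[2 * hidden:3 * hidden], z[3 * hidden:])
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)
    return sig(o) * np.tanh(c), c

hidden = 8
W = rng.standard_normal((4 * hidden, 16)) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
for x in descs:  # the LSTM captures the angular dynamics across viewpoints
    h, c = lstm_step(x, h, c, W, U, b)

print(h.shape)  # final state summarising the pseudo-video sequence
```

In the paper the final sequence representation would feed a face classifier; here it is left as the raw LSTM state.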
Light Field Saliency Detection with Deep Convolutional Networks
Light field imaging presents an attractive alternative to RGB imaging because
it additionally records the direction of the incoming light. The detection of
salient regions in a light field image benefits from the additional modeling of
angular patterns. For RGB imaging, methods using CNNs have achieved excellent
results on a range of tasks, including saliency detection. However, it is not
trivial to use CNN-based methods for saliency detection on light field images
because these methods are not specifically designed for processing light field
inputs. In addition, current light field datasets are not sufficiently large to
train CNNs. To overcome these issues, we present a new Lytro Illum dataset,
which contains 640 light fields and their corresponding ground-truth saliency
maps. Compared to current light field saliency datasets [1], [2], our new
dataset is larger, of higher quality, contains more variation and more types of
light field inputs. This makes our dataset suitable for training deeper
networks and benchmarking. Furthermore, we propose a novel end-to-end CNN-based
framework for light field saliency detection. Specifically, we propose three
novel MAC (Model Angular Changes) blocks to process light field micro-lens
images. We systematically study the impact of different architecture variants
and compare light field saliency with regular 2D saliency. Our extensive
comparisons indicate that our novel network significantly outperforms
state-of-the-art methods on the proposed dataset and generalizes well to other
existing datasets.
Comment: 14 pages, 14 figures
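The MAC blocks above operate on light field micro-lens images rather than ordinary 2D frames. As background, the following minimal NumPy sketch shows how a lenslet capture decomposes into micro-lens images and, equivalently, sub-aperture views; the grid sizes are toy assumptions:

```python
import numpy as np

# A lenslet (plenoptic) capture: an M x N grid of micro-lenses, each
# covering a U x V block of pixels (the angular samples). Toy sizes.
M, N, U, V = 4, 5, 3, 3
lenslet = np.arange(M * U * N * V).reshape(M * U, N * V)

# Micro-lens images: the U x V pixel block behind each micro-lens.
micro = lenslet.reshape(M, U, N, V).transpose(0, 2, 1, 3)  # (M, N, U, V)

# Sub-aperture views: the same (u, v) pixel taken from every micro-lens.
views = micro.transpose(2, 3, 0, 1)                        # (U, V, M, N)

# Sanity check: view (0, 0) is every U-th row / V-th column of the capture.
assert np.array_equal(views[0, 0], lenslet[::U, ::V])
```

A CNN block that models angular changes sees the `(U, V)` axes of `micro`, whereas a regular 2D saliency network would only see one slice of `views`.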
Weighted bi-prediction for light field image coding
Light field imaging based on a single-tier camera equipped with a microlens array – also known as integral, holoscopic, and plenoptic imaging – has recently emerged as a practical and promising approach for future visual applications and services. However, successfully deploying actual light field imaging applications and services will require adequate coding solutions to efficiently handle the massive amount of data involved in these systems. In this context, self-similarity compensated prediction is a non-local spatial prediction scheme based on block matching that has been shown to achieve high efficiency for light field image coding based on the High Efficiency Video Coding (HEVC) standard. As previously shown by the authors, this is possible by simply averaging two predictor blocks that are jointly estimated from a causal search window in the current frame itself, referred to as self-similarity bi-prediction. However, theoretical analyses of motion compensated bi-prediction have suggested that further rate-distortion performance improvements can be achieved by adaptively estimating the weighting coefficients of the two predictor blocks. Therefore, this paper presents a comprehensive study of the rate-distortion performance of HEVC-based light field image coding when using different sets of weighting coefficients for self-similarity bi-prediction. Experimental results demonstrate that the previous theoretical conclusions extend to light field image coding and show that the proposed adaptive weighting coefficient selection leads to up to 5% bit savings compared to the previous self-similarity bi-prediction scheme.
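The core idea of adaptive weighting can be sketched in a few lines: instead of plain averaging, choose the weight pair for the two predictor blocks that minimizes the prediction error. The candidate weight set and the distortion-only criterion below are illustrative assumptions; the paper selects coefficients under a rate-distortion cost inside HEVC:

```python
import numpy as np

rng = np.random.default_rng(1)

target = rng.standard_normal((8, 8))             # block being coded
p0 = target + 0.3 * rng.standard_normal((8, 8))  # first SS predictor block
p1 = target + 0.3 * rng.standard_normal((8, 8))  # second SS predictor block

# Candidate weight pairs (illustrative set; plain bi-prediction averaging
# corresponds to (0.5, 0.5), which stays available as one candidate).
candidates = [(0.5, 0.5), (0.25, 0.75), (0.75, 0.25), (1.0, 0.0), (0.0, 1.0)]

def sse(pred):
    # Sum of squared errors against the block being coded.
    return float(np.sum((target - pred) ** 2))

best = min(candidates, key=lambda w: sse(w[0] * p0 + w[1] * p1))
print("selected weights:", best)
```

Because plain averaging remains one of the candidates, adaptive selection can never predict worse than the earlier averaging-only scheme on any block.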
HEVC-based light field image coding with bi-predicted self-similarity compensation
This paper proposes an efficient light field image coding (LFC) solution based on High Efficiency Video Coding (HEVC). The proposed light field codec makes use of the self-similarity (SS) compensated prediction concept to efficiently exploit the inherent correlation of this type of content. To further improve the coding performance, a bi-predicted SS estimation and compensation scheme is proposed, where the candidate predictor can also be derived as a linear combination of two blocks within the same search window. In addition, an improved vector prediction scheme is used to take advantage of the particular characteristics of the SS prediction vectors. Experimental results show that the proposed LFC scheme outperforms the benchmark solutions with significant gains.
Improved inter-layer prediction for light field content coding with display scalability
Light field imaging based on microlens arrays - also known as plenoptic, holoscopic and integral imaging - has recently emerged as a feasible and promising technology due to its ability to support functionalities not straightforwardly available in conventional imaging systems, such as post-production refocusing and depth-of-field changes. However, to gradually reach the consumer market and to provide interoperability with current 2D and 3D representations, a display scalable coding solution is essential. In this context, this paper proposes an improved display scalable light field codec comprising a three-layer hierarchical coding architecture (previously proposed by the authors) that provides interoperability with 2D (Base Layer) and 3D stereo and multiview (First Layer) representations, while the Second Layer supports the complete light field content. To further improve the compression performance, novel exemplar-based inter-layer coding tools are proposed here for the Second Layer, namely: (i) an inter-layer reference picture construction relying on an exemplar-based optimization algorithm for texture synthesis, and (ii) a direct prediction mode based on exemplar texture samples from lower layers. Experimental results show that the proposed solution performs better than the tested benchmark solutions, including the authors' previous scalable codec.
Scalable light field coding with support for region of interest enhancement
Light field imaging based on microlens arrays - a.k.a. holoscopic, plenoptic, and integral imaging - has recently emerged as a feasible and promising technology for future image and video applications. However, deploying actual light field applications will require identifying more powerful representation and coding solutions that support emerging manipulation and interaction functionalities. In this context, this paper proposes a novel scalable coding approach that supports a new type of scalability, referred to as Field of View (FOV) scalability, in which enhancement layers can correspond to regions of interest (ROI). The proposed scalable coding approach comprises a base layer compliant with the High Efficiency Video Coding (HEVC) standard, complemented by one or more enhancement layers that progressively enable richer versions of the same light field content in terms of content manipulation and interaction possibilities, for the whole scene or just for a given ROI. Experimental results show the advantages of the proposed scalable coding approach with ROI support to cater for users with different preferences and requirements in terms of interaction functionalities.
Light field image coding with jointly estimated self-similarity bi-prediction
This paper proposes an efficient light field image coding (LFC) solution based on High Efficiency Video Coding (HEVC) and a novel Bi-prediction Self-Similarity (Bi-SS) estimation and compensation approach to efficiently exploit the inherent non-local spatial correlation of this type of content, where two predictor blocks are jointly estimated from the same search window by using a locally optimal rate-constrained algorithm. Moreover, a theoretical analysis of the proposed Bi-SS prediction is presented, which shows that other non-local spatial prediction schemes proposed in the literature are suboptimal in terms of Rate-Distortion (RD) performance and, for this reason, can be considered restricted cases of the jointly estimated Bi-SS solution proposed here. These theoretical insights are consistent with the presented experimental results, which demonstrate that the proposed LFC scheme outperforms the benchmark solutions with significant gains with respect to HEVC (up to 61.1% bit savings) and other state-of-the-art LFC solutions in the literature (up to 16.9% bit savings).
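The joint-versus-restricted estimation argument can be illustrated with a toy exhaustive search: predicting with the average of a jointly chosen *pair* of blocks can never do worse than the best single-block (restricted) choice, because every restricted choice is also a candidate pair. The causal-window definition, distortion-only cost, and exhaustive search below are sketch assumptions; the paper uses a locally optimal rate-constrained algorithm inside HEVC:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)

frame = rng.standard_normal((16, 16))
B = 4
target = frame[12:16, 12:16]  # current block, bottom-right of the frame

# Toy causal search window: candidate blocks lying entirely above or
# entirely to the left of the target block (already-decoded samples).
positions = [(y, x) for y, x in product(range(13), range(13))
             if y + B <= 12 or x + B <= 12]
blocks = {p: frame[p[0]:p[0] + B, p[1]:p[1] + B] for p in positions}

def sse(pred):
    return float(np.sum((target - pred) ** 2))

# Restricted case: pick the single best block (uni-prediction).
best_single = min(positions, key=lambda p: sse(blocks[p]))
indep = sse(blocks[best_single])

# Joint estimation: search over block *pairs*, predicting with the average.
joint = min(sse(0.5 * (blocks[p] + blocks[q]))
            for p in positions for q in positions)

# The jointly estimated pair can never do worse, mirroring the paper's
# argument that independent schemes are restricted cases of joint Bi-SS.
assert joint <= indep
```

With rate included the comparison becomes an RD trade-off, but the same containment argument (restricted schemes are a subset of the joint search space) underlies the paper's theoretical analysis.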