60 research outputs found

    Light Field Denoising via Anisotropic Parallax Analysis in a CNN Framework

    Full text link
    Light field (LF) cameras provide perspective information of scenes by taking directional measurements of the focusing light rays. The raw outputs are usually dark with additive camera noise, which impedes subsequent processing and applications. We propose a novel LF denoising framework based on anisotropic parallax analysis (APA). Two convolutional neural networks are jointly designed for the task: first, the structural parallax synthesis network predicts the parallax details for the entire LF based on a set of anisotropic parallax features. These novel features can efficiently capture the high frequency perspective components of a LF from noisy observations. Second, the view-dependent detail compensation network restores non-Lambertian variation to each LF view by involving view-specific spatial energies. Extensive experiments show that the proposed APA LF denoiser provides a much better denoising performance than state-of-the-art methods in terms of visual quality and in preservation of parallax details

    Accurate Light Field Depth Estimation with Superpixel Regularization over Partially Occluded Regions

    Full text link
    Depth estimation is a fundamental problem for light field photography applications. Numerous methods have been proposed in recent years, which either focus on crafting cost terms for more robust matching, or on analyzing the geometry of scene structures embedded in the epipolar-plane images. Significant improvements have been made in terms of overall depth estimation error; however, current state-of-the-art methods still show limitations in handling intricate occluding structures and complex scenes with multiple occlusions. To address these challenging issues, we propose a very effective depth estimation framework which focuses on regularizing the initial label confidence map and edge strength weights. Specifically, we first detect partially occluded boundary regions (POBR) via superpixel based regularization. Series of shrinkage/reinforcement operations are then applied on the label confidence map and edge strength weights over the POBR. We show that after weight manipulations, even a low-complexity weighted least squares model can produce much better depth estimation than state-of-the-art methods in terms of average disparity error rate, occlusion boundary precision-recall rate, and the preservation of intricate visual features

    Low-latency compression of mocap data using learned spatial decorrelation transform

    Full text link
    Due to the growing needs of human motion capture (mocap) in movie, video games, sports, etc., it is highly desired to compress mocap data for efficient storage and transmission. This paper presents two efficient frameworks for compressing human mocap data with low latency. The first framework processes the data in a frame-by-frame manner so that it is ideal for mocap data streaming and time critical applications. The second one is clip-based and provides a flexible tradeoff between latency and compression performance. Since mocap data exhibits some unique spatial characteristics, we propose a very effective transform, namely learned orthogonal transform (LOT), for reducing the spatial redundancy. The LOT problem is formulated as minimizing square error regularized by orthogonality and sparsity and solved via alternating iteration. We also adopt a predictive coding and temporal DCT for temporal decorrelation in the frame- and clip-based frameworks, respectively. Experimental results show that the proposed frameworks can produce higher compression performance at lower computational cost and latency than the state-of-the-art methods.Comment: 15 pages, 9 figure

    Human Motion Capture Data Tailored Transform Coding

    Full text link
    Human motion capture (mocap) is a widely used technique for digitalizing human movements. With growing usage, compressing mocap data has received increasing attention, since compact data size enables efficient storage and transmission. Our analysis shows that mocap data have some unique characteristics that distinguish themselves from images and videos. Therefore, directly borrowing image or video compression techniques, such as discrete cosine transform, does not work well. In this paper, we propose a novel mocap-tailored transform coding algorithm that takes advantage of these features. Our algorithm segments the input mocap sequences into clips, which are represented in 2D matrices. Then it computes a set of data-dependent orthogonal bases to transform the matrices to frequency domain, in which the transform coefficients have significantly less dependency. Finally, the compression is obtained by entropy coding of the quantized coefficients and the bases. Our method has low computational cost and can be easily extended to compress mocap databases. It also requires neither training nor complicated parameter setting. Experimental results demonstrate that the proposed scheme significantly outperforms state-of-the-art algorithms in terms of compression performance and speed

    Weakly-supervised Part-Attention and Mentored Networks for Vehicle Re-Identification

    Full text link
    Vehicle re-identification (Re-ID) aims to retrieve images with the same vehicle ID across different cameras. Current part-level feature learning methods typically detect vehicle parts via uniform division, outside tools, or attention modeling. However, such part features often require expensive additional annotations and cause sub-optimal performance in case of unreliable part mask predictions. In this paper, we propose a weakly-supervised Part-Attention Network (PANet) and Part-Mentored Network (PMNet) for Vehicle Re-ID. Firstly, PANet localizes vehicle parts via part-relevant channel recalibration and cluster-based mask generation without vehicle part supervisory information. Secondly, PMNet leverages teacher-student guided learning to distill vehicle part-specific features from PANet and performs multi-scale global-part feature extraction. During inference, PMNet can adaptively extract discriminative part features without part localization by PANet, preventing unstable part mask predictions. We address this Re-ID issue as a multi-task problem and adopt Homoscedastic Uncertainty to learn the optimal weighing of ID losses. Experiments are conducted on two public benchmarks, showing that our approach outperforms recent methods, which require no extra annotations by an average increase of 3.0% in CMC@5 on VehicleID and over 1.4% in mAP on VeRi776. Moreover, our method can extend to the occluded vehicle Re-ID task and exhibits good generalization ability.Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibl