406 research outputs found

    Light Field Denoising via Anisotropic Parallax Analysis in a CNN Framework

    Full text link
    Light field (LF) cameras provide perspective information of scenes by taking directional measurements of the focusing light rays. The raw outputs are usually dark with additive camera noise, which impedes subsequent processing and applications. We propose a novel LF denoising framework based on anisotropic parallax analysis (APA). Two convolutional neural networks are jointly designed for the task: first, the structural parallax synthesis network predicts the parallax details for the entire LF based on a set of anisotropic parallax features. These novel features can efficiently capture the high frequency perspective components of a LF from noisy observations. Second, the view-dependent detail compensation network restores non-Lambertian variation to each LF view by involving view-specific spatial energies. Extensive experiments show that the proposed APA LF denoiser provides a much better denoising performance than state-of-the-art methods in terms of visual quality and in preservation of parallax details

    Probabilistic-based Feature Embedding of 4-D Light Fields for Compressive Imaging and Denoising

    Full text link
    The high-dimensional nature of the 4-D light field (LF) poses great challenges in achieving efficient and effective feature embedding, that severely impacts the performance of downstream tasks. To tackle this crucial issue, in contrast to existing methods with empirically-designed architectures, we propose a probabilistic-based feature embedding (PFE), which learns a feature embedding architecture by assembling various low-dimensional convolution patterns in a probability space for fully capturing spatial-angular information. Building upon the proposed PFE, we then leverage the intrinsic linear imaging model of the coded aperture camera to construct a cycle-consistent 4-D LF reconstruction network from coded measurements. Moreover, we incorporate PFE into an iterative optimization framework for 4-D LF denoising. Our extensive experiments demonstrate the significant superiority of our methods on both real-world and synthetic 4-D LF images, both quantitatively and qualitatively, when compared with state-of-the-art methods. The source code will be publicly available at https://github.com/lyuxianqiang/LFCA-CR-NET

    Unleash the Potential of 3D Point Cloud Modeling with A Calibrated Local Geometry-driven Distance Metric

    Full text link
    Quantifying the dissimilarity between two unstructured 3D point clouds is a challenging task, with existing metrics often relying on measuring the distance between corresponding points that can be either inefficient or ineffective. In this paper, we propose a novel distance metric called Calibrated Local Geometry Distance (CLGD), which computes the difference between the underlying 3D surfaces calibrated and induced by a set of reference points. By associating each reference point with two given point clouds through computing its directional distances to them, the difference in directional distances of an identical reference point characterizes the geometric difference between a typical local region of the two point clouds. Finally, CLGD is obtained by averaging the directional distance differences of all reference points. We evaluate CLGD on various optimization and unsupervised learning-based tasks, including shape reconstruction, rigid registration, scene flow estimation, and feature representation. Extensive experiments show that CLGD achieves significantly higher accuracy under all tasks in a memory and computationally efficient manner, compared with existing metrics. As a generic metric, CLGD has the potential to advance 3D point cloud modeling. The source code is publicly available at https://github.com/rsy6318/CLGD

    Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation

    Full text link
    The past few years have witnessed the great success and prevalence of self-supervised representation learning within the language and 2D vision communities. However, such advancements have not been fully migrated to the field of 3D point cloud learning. Different from existing pre-training paradigms designed for deep point cloud feature extractors that fall into the scope of generative modeling or contrastive learning, this paper proposes a translative pre-training framework, namely PointVST, driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images. More specifically, we begin with deducing view-conditioned point-wise embeddings through the insertion of the viewpoint indicator, and then adaptively aggregate a view-specific global codeword, which can be further fed into subsequent 2D convolutional translation heads for image generation. Extensive experimental evaluations on various downstream task scenarios demonstrate that our PointVST shows consistent and prominent performance superiority over current state-of-the-art approaches as well as satisfactory domain transfer capability. Our code will be publicly available at https://github.com/keeganhk/PointVST

    Decoupling Dynamic Monocular Videos for Dynamic View Synthesis

    Full text link
    The challenge of dynamic view synthesis from dynamic monocular videos, i.e., synthesizing novel views for free viewpoints given a monocular video of a dynamic scene captured by a moving camera, mainly lies in accurately modeling the dynamic objects of a scene using limited 2D frames, each with a varying timestamp and viewpoint. Existing methods usually require pre-processed 2D optical flow and depth maps by off-the-shelf methods to supervise the network, making them suffer from the inaccuracy of the pre-processed supervision and the ambiguity when lifting the 2D information to 3D. In this paper, we tackle this challenge in an unsupervised fashion. Specifically, we decouple the motion of the dynamic objects into object motion and camera motion, respectively regularized by proposed unsupervised surface consistency and patch-based multi-view constraints. The former enforces the 3D geometric surfaces of moving objects to be consistent over time, while the latter regularizes their appearances to be consistent across different viewpoints. Such a fine-grained motion formulation can alleviate the learning difficulty for the network, thus enabling it to produce not only novel views with higher quality but also more accurate scene flows and depth than existing methods requiring extra supervision

    Accurate Light Field Depth Estimation with Superpixel Regularization over Partially Occluded Regions

    Full text link
    Depth estimation is a fundamental problem for light field photography applications. Numerous methods have been proposed in recent years, which either focus on crafting cost terms for more robust matching, or on analyzing the geometry of scene structures embedded in the epipolar-plane images. Significant improvements have been made in terms of overall depth estimation error; however, current state-of-the-art methods still show limitations in handling intricate occluding structures and complex scenes with multiple occlusions. To address these challenging issues, we propose a very effective depth estimation framework which focuses on regularizing the initial label confidence map and edge strength weights. Specifically, we first detect partially occluded boundary regions (POBR) via superpixel based regularization. Series of shrinkage/reinforcement operations are then applied on the label confidence map and edge strength weights over the POBR. We show that after weight manipulations, even a low-complexity weighted least squares model can produce much better depth estimation than state-of-the-art methods in terms of average disparity error rate, occlusion boundary precision-recall rate, and the preservation of intricate visual features

    Low-latency compression of mocap data using learned spatial decorrelation transform

    Full text link
    Due to the growing needs of human motion capture (mocap) in movie, video games, sports, etc., it is highly desired to compress mocap data for efficient storage and transmission. This paper presents two efficient frameworks for compressing human mocap data with low latency. The first framework processes the data in a frame-by-frame manner so that it is ideal for mocap data streaming and time critical applications. The second one is clip-based and provides a flexible tradeoff between latency and compression performance. Since mocap data exhibits some unique spatial characteristics, we propose a very effective transform, namely learned orthogonal transform (LOT), for reducing the spatial redundancy. The LOT problem is formulated as minimizing square error regularized by orthogonality and sparsity and solved via alternating iteration. We also adopt a predictive coding and temporal DCT for temporal decorrelation in the frame- and clip-based frameworks, respectively. Experimental results show that the proposed frameworks can produce higher compression performance at lower computational cost and latency than the state-of-the-art methods.Comment: 15 pages, 9 figure

    Human Motion Capture Data Tailored Transform Coding

    Full text link
    Human motion capture (mocap) is a widely used technique for digitalizing human movements. With growing usage, compressing mocap data has received increasing attention, since compact data size enables efficient storage and transmission. Our analysis shows that mocap data have some unique characteristics that distinguish themselves from images and videos. Therefore, directly borrowing image or video compression techniques, such as discrete cosine transform, does not work well. In this paper, we propose a novel mocap-tailored transform coding algorithm that takes advantage of these features. Our algorithm segments the input mocap sequences into clips, which are represented in 2D matrices. Then it computes a set of data-dependent orthogonal bases to transform the matrices to frequency domain, in which the transform coefficients have significantly less dependency. Finally, the compression is obtained by entropy coding of the quantized coefficients and the bases. Our method has low computational cost and can be easily extended to compress mocap databases. It also requires neither training nor complicated parameter setting. Experimental results demonstrate that the proposed scheme significantly outperforms state-of-the-art algorithms in terms of compression performance and speed
    corecore