38,497 research outputs found

    Learning Depth with Convolutional Spatial Propagation Network

    Depth prediction is one of the fundamental problems in computer vision. In this paper, we propose a simple yet effective convolutional spatial propagation network (CSPN) to learn the affinity matrix for various depth estimation tasks. Specifically, it is an efficient linear propagation model in which the propagation is performed as a recurrent convolutional operation and the affinity among neighboring pixels is learned through a deep convolutional neural network (CNN). The module can be appended to the output of any state-of-the-art (SOTA) depth estimation network to improve its performance. In practice, we further extend CSPN in two aspects: 1) taking a sparse depth map as additional input, which is useful for the task of depth completion; and 2) in analogy to the 3D convolution operation commonly used in CNNs, proposing a 3D CSPN that handles features with one additional dimension, which is effective for stereo matching using a 3D cost volume. For the task of sparse-to-dense prediction, a.k.a. depth completion, we evaluated the proposed CSPN-based algorithms on the popular NYU v2 and KITTI datasets, showing that they not only produce higher-quality results (e.g., a further 30% reduction in depth error) but also run faster (e.g., 2 to 5x) than previous SOTA spatial propagation networks. We also evaluated our stereo matching algorithm on the Scene Flow and KITTI Stereo datasets, ranking 1st on both the KITTI Stereo 2012 and 2015 benchmarks, which demonstrates the effectiveness of the proposed module. The code for CSPN will be released at https://github.com/XinJCheng/CSPN.

    Comment: v1.2: added some experiments; v1.1: fixed some mistakes; v1: 17 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:1808.0015
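
    The propagation at the heart of CSPN is linear and compact enough to sketch directly. Below is a minimal numpy sketch of a single propagation step over a 3x3 neighborhood, assuming the per-pixel affinities have already been produced by a CNN; the array layout, normalization details, and function name are our own illustration rather than the released implementation.

    ```python
    import numpy as np

    def cspn_step(depth, affinity):
        """One linear propagation step of CSPN over a 3x3 neighborhood.

        depth:    (H, W) current depth estimate.
        affinity: (8, H, W) CNN-predicted affinities, one channel per
                  neighbor (this layout is an assumption for the sketch).
        """
        H, W = depth.shape
        # Normalize so the absolute affinities sum to 1, keeping the
        # propagation stable; the center pixel keeps the residual weight.
        a = affinity / np.maximum(np.abs(affinity).sum(axis=0, keepdims=True), 1e-8)
        center = 1.0 - a.sum(axis=0)

        offsets = [(-1, -1), (-1, 0), (-1, 1),
                   ( 0, -1),          ( 0, 1),
                   ( 1, -1), ( 1, 0), ( 1, 1)]
        padded = np.pad(depth, 1, mode='edge')
        out = center * depth
        for k, (dy, dx) in enumerate(offsets):
            # Shifted view of the padded map = the k-th neighbor of every pixel.
            out += a[k] * padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        return out
    ```

    For the depth completion variant, the known sparse depth values would be re-substituted into the map after each iteration so the propagation spreads them into the unobserved regions.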

    Obstacle Detection Quality as a Problem-Oriented Approach to Stereo Vision Algorithms Estimation in Road Situation Analysis

    In this work we present a method for performance evaluation of stereo-vision-based obstacle detection techniques that takes into account the specifics of road situation analysis to minimize the effort required to prepare a test dataset. The approach is designed for systems such as self-driving cars or driver assistance, and can also serve as a problem-oriented quality criterion for evaluating stereo vision algorithms.

    Plan3D: Viewpoint and Trajectory Optimization for Aerial Multi-View Stereo Reconstruction

    We introduce a new method that efficiently computes a set of viewpoints and trajectories for high-quality 3D reconstructions in outdoor environments. Our goal is to automatically explore an unknown area and obtain a complete 3D scan of a region of interest (e.g., a large building). Images from a commodity RGB camera, mounted on an autonomously navigated quadcopter, are fed into a multi-view stereo reconstruction pipeline that produces high-quality results but is computationally expensive. In this setting, the scanning result is constrained by the quadcopter's restricted flight time. To this end, we introduce a novel optimization strategy that respects these constraints by maximizing the information gain from sparsely sampled viewpoints while limiting the total travel distance of the quadcopter. At the core of our method lies a hierarchical volumetric representation that allows the algorithm to distinguish between unknown, free, and occupied space. Furthermore, our information-gain-based formulation leverages this representation to handle occlusions efficiently. In addition to the surface geometry, we utilize the free-space information to avoid obstacles and determine collision-free flight paths. Our tool can be used to specify the region of interest and to plan trajectories. We demonstrate our method by obtaining a number of compelling 3D reconstructions, and provide a thorough quantitative evaluation showing improvement over previous state-of-the-art methods and regular patterns.

    Comment: 31 pages, 12 figures, 9 tables
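
    The viewpoint selection can be pictured as a budgeted greedy loop. The sketch below, under our own simplifying assumptions (callables gain(v) and dist(a, b), no occlusion reasoning, no gain re-evaluation after each pick), illustrates the gain-versus-travel-distance tradeoff the paper optimizes; the actual method works on a hierarchical volumetric map.

    ```python
    def plan_views(candidates, gain, dist, budget):
        """Greedy viewpoint selection under a travel budget (simplified).

        gain(v): expected information gain of visiting view v (assumed callable).
        dist(a, b): travel distance between two views (assumed callable).
        A real planner would update its map and re-evaluate gains after
        each visit; this loop only shows the gain-per-distance tradeoff.
        """
        path, pos, spent = [], None, 0.0
        remaining = set(candidates)
        while remaining:
            def ratio(v):
                d = dist(pos, v) if pos is not None else 0.0
                return gain(v) / (d + 1e-6)
            best = max(remaining, key=ratio)
            step = dist(pos, best) if pos is not None else 0.0
            if spent + step > budget:
                break  # the flight-time budget is exhausted
            path.append(best)
            spent += step
            pos = best
            remaining.remove(best)
        return path
    ```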

    The Fast Bilateral Solver

    We present the bilateral solver, a novel algorithm for edge-aware smoothing that combines the flexibility and speed of simple filtering approaches with the accuracy of domain-specific optimization algorithms. Our technique is capable of matching or improving upon state-of-the-art results on several different computer vision tasks (stereo, depth superresolution, colorization, and semantic segmentation) while being 10-1000 times faster than competing approaches. The bilateral solver is fast, robust, straightforward to generalize to new domains, and simple to integrate into deep learning pipelines.
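
    The solver's objective balances edge-aware smoothness against confidence-weighted fidelity to a target. The sketch below solves a simplified pixel-space version of that objective on a 4-connected grid; the paper's speed comes from solving in a coarse bilateral grid instead, which is not shown here, and all parameter values are illustrative.

    ```python
    import numpy as np
    from scipy.sparse import coo_matrix, diags
    from scipy.sparse.linalg import cg

    def edge_aware_solve(target, confidence, reference, lam=8.0, sigma_r=0.1):
        """Minimize  lam * sum_ij w_ij (x_i - x_j)^2 + sum_i c_i (x_i - t_i)^2
        on a 4-connected pixel grid, a simplified stand-in for the bilateral
        solver's objective. target, confidence, reference: (H, W) floats.
        """
        H, W = target.shape
        n = H * W
        idx = np.arange(n).reshape(H, W)
        rows, cols, vals = [], [], []
        for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
            a = idx[:H - dy, :W - dx].ravel()
            b = idx[dy:, dx:].ravel()
            d = (reference[:H - dy, :W - dx] - reference[dy:, dx:]).ravel()
            w = np.exp(-d * d / (2.0 * sigma_r ** 2))  # edge-aware weight
            rows.extend((a, b)); cols.extend((b, a)); vals.extend((w, w))
        Wm = coo_matrix((np.concatenate(vals),
                         (np.concatenate(rows), np.concatenate(cols))),
                        shape=(n, n))
        L = diags(np.asarray(Wm.sum(axis=1)).ravel()) - Wm  # graph Laplacian
        c = confidence.ravel()
        # Normal equations of the objective: (lam*L + diag(c)) x = c * t
        x, _ = cg(lam * L + diags(c), c * target.ravel(), atol=1e-6)
        return x.reshape(H, W)
    ```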

    Intel RealSense Stereoscopic Depth Cameras

    We present a comprehensive overview of the stereoscopic Intel RealSense RGBD imaging systems. We discuss these systems' mode of operation and functional behavior, and include models of their expected performance, shortcomings, and limitations. We provide information about the systems' optical characteristics, their correlation algorithms, and how these properties can affect different applications, including 3D reconstruction and gesture recognition. Our discussion covers the Intel RealSense R200 and the Intel RealSense D400 (formerly RS400).

    Comment: Accepted to CCD 2017, a CVPR 2017 Workshop

    Exploring Computation-Communication Tradeoffs in Camera Systems

    Cameras are the de facto sensor. The growing demand for real-time and low-power computer vision, coupled with trends towards high-efficiency heterogeneous systems, has given rise to a wide range of image processing acceleration techniques at the camera node and in the cloud. In this paper, we characterize two novel camera systems that use acceleration techniques to push the extremes of energy and performance scaling, and explore the computation-communication tradeoffs in their design. The first case study targets a camera system designed to detect and authenticate individual faces, running solely on energy harvested from RFID readers. We design a multi-accelerator SoC operating in the sub-mW range, and evaluate it with real-world workloads to show performance and energy efficiency improvements over a general-purpose microprocessor. The second camera system supports a 16-camera rig processing over 32 Gb/s of data to produce real-time 3D-360 degree virtual reality video. We design a multi-FPGA processing pipeline that outperforms CPU and GPU configurations by up to 10x in computation time, producing panoramic stereo video directly from the camera rig at 30 frames per second. We find that an early data reduction step, either before complex processing or offloading, is the most critical optimization for in-camera systems; a rough energy comparison is sketched below.
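
    A back-of-the-envelope calculation makes the early-reduction point concrete. All constants below (radio energy per bit, energy per arithmetic operation, reduction ratio) are assumed for illustration and are not measurements from the paper.

    ```python
    # Compare "send raw frame" vs "reduce on-camera, then send" energy.
    E_TX_PER_BIT = 50e-9   # J/bit to transmit (assumed radio cost)
    E_OP = 1e-12           # J per arithmetic op (assumed accelerator cost)

    frame_bits = 640 * 480 * 8   # one grayscale VGA frame
    ops_per_pixel = 100          # assumed cost of on-camera reduction
    keep_ratio = 0.05            # keep 5% of the data after reduction

    send_raw = frame_bits * E_TX_PER_BIT
    reduce_then_send = (640 * 480 * ops_per_pixel * E_OP
                        + frame_bits * keep_ratio * E_TX_PER_BIT)
    # Under these assumptions: raw ~123 mJ vs reduced ~6 mJ per frame,
    # i.e. the cheap local computation pays for itself many times over.
    print(f"raw: {send_raw * 1e3:.2f} mJ, "
          f"reduced: {reduce_then_send * 1e3:.2f} mJ")
    ```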

    Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures

    Separating an audio scene into isolated sources is a fundamental problem in computer audition, analogous to image segmentation in visual scene analysis. Source separation systems based on deep learning are currently the most successful approaches for solving the underdetermined separation problem, where there are more sources than channels. Traditionally, such systems are trained on sound mixtures where the ground truth decomposition is already known. Since most real-world recordings do not have such a decomposition available, this limits the range of mixtures one can train on, and the range of mixtures the learned models may successfully separate. In this work, we use a simple blind spatial source separation algorithm to generate estimated decompositions of stereo mixtures. These estimates, together with a weighting scheme in the time-frequency domain based on confidence in the separation quality, are used to train a deep learning model that can be used for single-channel separation, where no source direction information is available. This demonstrates how a simple cue such as the direction of origin of a source can be used to bootstrap a source separation model that can then be used in situations where that cue is not available.

    Comment: 5 pages, 2 figures
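
    The bootstrapping idea can be sketched with a simple spatial cue. The code below clusters an inter-channel level difference feature per time-frequency bin and derives a confidence weight from the clustering margin; the feature, the clustering method, and the weighting scheme are our own stand-ins for the paper's blind separation algorithm.

    ```python
    import numpy as np
    from scipy.signal import stft
    from sklearn.cluster import KMeans

    def bootstrap_masks(left, right, fs, n_src=2, nperseg=1024):
        """Per-source time-frequency masks and confidence weights from a
        stereo mixture, via clustering of a simple spatial cue (a sketch,
        not the paper's exact algorithm or weighting scheme).
        """
        _, _, L = stft(left, fs, nperseg=nperseg)
        _, _, R = stft(right, fs, nperseg=nperseg)
        eps = 1e-10
        # Panning angle per TF bin: 0 = hard left, pi/2 = hard right.
        theta = np.arctan2(np.abs(R) + eps, np.abs(L) + eps)
        km = KMeans(n_clusters=n_src, n_init=10).fit(theta.reshape(-1, 1))
        labels = km.labels_.reshape(theta.shape)
        # Confidence: bins far from the cluster decision boundary are
        # trusted more when weighting the training loss.
        dists = np.abs(theta[..., None] - km.cluster_centers_.ravel())
        margin = np.sort(dists, axis=-1)
        confidence = margin[..., 1] - margin[..., 0]
        confidence /= confidence.max() + eps
        masks = [(labels == k).astype(float) for k in range(n_src)]
        return masks, confidence
    ```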

    Perceptually-Motivated Nonlinear Channel Decorrelation For Stereo Acoustic Echo Cancellation

    Acoustic echo cancellation with stereo signals is generally an under-determined problem because of the high coherence between the left and right channels. In this paper, we present a novel method for significantly reducing inter-channel coherence without affecting audio quality. Our work takes into account psychoacoustic masking and binaural auditory cues. The proposed nonlinear processing combines a shaped comb-allpass (SCAL) filter with the injection of psychoacoustically masked noise. We show that the proposed method performs significantly better than other known methods for reducing inter-channel coherence.

    Comment: 4 pages
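
    A comb-allpass filter is the building block behind this approach: it has a flat magnitude response, so it perturbs phase (and hence inter-channel coherence) without coloring the signal. The sketch below shows a generic, unshaped comb-allpass filter; the paper's frequency shaping and masked-noise injection are not reproduced, and the parameter values are illustrative.

    ```python
    import numpy as np
    from scipy.signal import lfilter

    def comb_allpass(x, delay=20, alpha=0.5):
        """Comb-allpass filter H(z) = (alpha + z^-N) / (1 + alpha z^-N).

        Flat magnitude response: only the phase is altered, which is what
        makes it usable for decorrelation without audible coloration.
        """
        b = np.zeros(delay + 1); b[0], b[-1] = alpha, 1.0
        a = np.zeros(delay + 1); a[0], a[-1] = 1.0, alpha
        return lfilter(b, a, x)

    # Illustrative use: decorrelate a stereo pair by filtering one channel.
    # right_decorrelated = comb_allpass(right, delay=23, alpha=0.4)
    ```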

    Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines

    We present a practical and robust deep learning solution for capturing and rendering novel views of complex real-world scenes for virtual exploration. Previous approaches either require intractably dense view sampling or provide little to no guidance for how users should sample views of a scene to reliably render high-quality novel views. Instead, we propose an algorithm for view synthesis from an irregular grid of sampled views that first expands each sampled view into a local light field via a multiplane image (MPI) scene representation, then renders novel views by blending adjacent local light fields. We extend traditional plenoptic sampling theory to derive a bound that specifies precisely how densely users should sample views of a given scene when using our algorithm. In practice, we apply this bound to capture and render views of real-world scenes that achieve the perceptual quality of Nyquist rate view sampling while using up to 4000x fewer views. We demonstrate our approach's practicality with an augmented reality smartphone app that guides users to capture input images of a scene, and with viewers that enable real-time virtual exploration on desktop and mobile platforms.

    Comment: SIGGRAPH 2019. Project page with video and code: http://people.eecs.berkeley.edu/~bmild/llff
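
    A simplified reading of the sampling bound: if the nearest scene point sits at depth z_min and the MPI has D depth planes, the baseline between adjacent views should keep the maximum pixel disparity within D. The helper below, with illustrative numbers, is our own condensation of that idea; the paper's derivation also accounts for the far depth plane and sensor resolution.

    ```python
    def max_camera_spacing(focal_px, z_min, num_mpi_planes):
        """Upper bound on adjacent-view baseline so that the disparity of
        the nearest scene point stays within the MPI plane count.

        Uses the rectified-stereo relation d = f * b / z (disparity in
        pixels), solved for the baseline b with d = num_mpi_planes.
        focal_px: focal length in pixels; z_min: nearest depth in meters.
        """
        return num_mpi_planes * z_min / focal_px

    # Example with assumed numbers: 1000 px focal length, nearest object
    # at 2 m, 32 MPI planes -> views may be up to ~6.4 cm apart.
    print(max_camera_spacing(1000.0, 2.0, 32))  # 0.064
    ```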

    Stereo on a budget

    We propose an algorithm for recovering depth using less than two images. Instead of having both cameras send their entire image to the host computer, the left camera sends its image to the host while the right camera sends only a fraction ε of its image. The key aspect is that the cameras send this information without communicating with each other at all; hence, the required communication bandwidth is significantly reduced. While standard image compression techniques can also reduce the communication bandwidth, they require additional computational resources on the part of the encoder (camera). We aim at designing a lightweight encoder that only touches a fraction of the pixels. The burden of decoding is placed on the decoder (host). We show that it is enough for the encoder to transmit a sparse set of pixels. Using only 1+ε images, with ε as little as 2% of the image, the decoder can compute a depth map whose accuracy is comparable to that of traditional stereo matching algorithms requiring both images as input. Using the depth map and the left image, the right image can be synthesized. No computations are required at the encoder, and the decoder's runtime is linear in the images' size.

    Comment: update flowchart in Fig.
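
    A toy version of the pipeline fits in a few lines. In the sketch below, the "encoder" transmits a random set of small right-image patches covering roughly an ε-fraction of the pixels (no computation on the camera), and the "decoder" block-matches each patch against the left image along its row to recover a sparse disparity map. The densification and right-view synthesis steps are omitted, and all parameters are our own illustrative choices.

    ```python
    import numpy as np

    def sparse_stereo(left, right, eps=0.02, patch=5, max_disp=64, seed=0):
        """Sparse disparities from a full left image plus an eps-fraction
        of right-image patches. Assumes rectified, grayscale float images.
        Returns an (H, W) map with NaN where nothing was transmitted.
        """
        rng = np.random.default_rng(seed)
        H, W = right.shape
        r = patch // 2
        n = int(eps * H * W / patch ** 2)  # patch count for ~eps of pixels
        ys = rng.integers(r, H - r, size=n)
        xs = rng.integers(r, W - r - max_disp, size=n)
        disp = np.full((H, W), np.nan)
        for y, x in zip(ys, xs):
            ref = right[y - r:y + r + 1, x - r:x + r + 1]
            # SAD block matching along the epipolar line (same row).
            costs = [np.abs(left[y - r:y + r + 1,
                                 x + d - r:x + d + r + 1] - ref).sum()
                     for d in range(max_disp)]
            disp[y, x] = np.argmin(costs)
        return disp
    ```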