Learning Depth with Convolutional Spatial Propagation Network
Depth prediction is one of the fundamental problems in computer vision. In
this paper, we propose a simple yet effective convolutional spatial propagation
network (CSPN) to learn the affinity matrix for various depth estimation tasks.
Specifically, it is an efficient linear propagation model, in which the
propagation is performed with a manner of recurrent convolutional operation,
and the affinity among neighboring pixels is learned through a deep
convolutional neural network (CNN). This module can be appended to the output
of any state-of-the-art (SOTA) depth estimation network to improve its
performance. In practice, we further extend CSPN in two aspects: 1) taking a
sparse depth map as additional input, which is useful for the task of depth
completion; 2) similar to the 3D convolution operation commonly used in CNNs,
proposing 3D CSPN to handle features with one additional dimension, which is
effective for stereo matching with a 3D cost volume. For the sparse-to-dense
task, a.k.a. depth completion, we evaluated the proposed CSPN-based algorithms
on the popular NYU v2 and KITTI datasets, showing that they not only produce
higher-quality results (e.g., a 30% further reduction in depth error) but also
run 2 to 5x faster than the previous SOTA spatial propagation network. We also
evaluated our stereo matching algorithm on the Scene Flow and KITTI Stereo
datasets, and ranked 1st on both the KITTI Stereo 2012 and 2015 benchmarks,
which demonstrates the
effectiveness of the proposed module. The code of CSPN proposed in this work
will be released at https://github.com/XinJCheng/CSPN.
Comment: v1.2: added some experiments; v1.1: fixed some mistakes; v1: 17
pages, 12 figures. arXiv admin note: substantial text overlap with
arXiv:1808.0015
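The recurrent linear propagation at the heart of CSPN can be sketched roughly as follows. This is an illustrative reading of the abstract (8-neighbour propagation with CNN-predicted affinities); `cspn_step` and its normalisation scheme are assumptions, not the authors' released code:

```python
import numpy as np

def cspn_step(depth, affinity):
    """One recurrent propagation step of a CSPN-style linear
    propagation model (simplified sketch).

    depth:    (H, W) current depth estimate
    affinity: (8, H, W) learned affinities for the 8 neighbours,
              assumed already predicted by a CNN
    """
    H, W = depth.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    # Normalise so the neighbour weights sum to at most 1 in magnitude;
    # the centre pixel keeps the remaining weight, which keeps the
    # linear update stable.
    norm = np.abs(affinity).sum(axis=0) + 1e-8
    w = affinity / norm
    out = (1.0 - w.sum(axis=0)) * depth
    pad = np.pad(depth, 1, mode="edge")
    for k, (dy, dx) in enumerate(offsets):
        out += w[k] * pad[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    return out
```

Running this step a fixed number of times diffuses depth values along the learned affinities, which is what makes the module cheap enough to append to an existing estimation network.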
Obstacle Detection Quality as a Problem-Oriented Approach to Stereo Vision Algorithms Estimation in Road Situation Analysis
In this work we present a method for performance evaluation of stereo vision
based obstacle detection techniques that takes into account the specifics of
road situation analysis to minimize the effort required to prepare a test
dataset. This approach has been designed for use in systems such as
self-driving cars and driver assistance, and can also serve as a
problem-oriented quality criterion for evaluating stereo vision algorithms.
Plan3D: Viewpoint and Trajectory Optimization for Aerial Multi-View Stereo Reconstruction
We introduce a new method that efficiently computes a set of viewpoints and
trajectories for high-quality 3D reconstructions in outdoor environments. Our
goal is to automatically explore an unknown area, and obtain a complete 3D scan
of a region of interest (e.g., a large building). Images from a commodity RGB
camera, mounted on an autonomously navigated quadcopter, are fed into a
multi-view stereo reconstruction pipeline that produces high-quality results
but is computationally expensive. In this setting, the scanning result is
constrained by the restricted flight time of quadcopters. To this end, we
introduce a novel optimization strategy that respects these constraints by
maximizing the information gain from sparsely-sampled view points while
limiting the total travel distance of the quadcopter. At the core of our method
lies a hierarchical volumetric representation that allows the algorithm to
distinguish between unknown, free, and occupied space. Furthermore, our
information gain based formulation leverages this representation to handle
occlusions in an efficient manner. In addition to the surface geometry, we
utilize the free-space information to avoid obstacles and determine
collision-free flight paths. Our tool can be used to specify the region of
interest and to plan trajectories. We demonstrate our method by obtaining a
number of compelling 3D reconstructions, and provide a thorough quantitative
evaluation showing improvement over the previous state of the art and over
regular patterns.
Comment: 31 pages, 12 figures, 9 tables
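The information-gain view selection described above can be illustrated with a minimal sketch. The three voxel states come from the abstract; the greedy score, the `beta` trade-off, and all function names are assumptions (the paper's full trajectory optimisation is more involved):

```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = 0, 1, 2  # voxel states in the volumetric map

def information_gain(grid, visible):
    """Count the unknown voxels a candidate viewpoint would observe.
    'visible' is a boolean visibility mask of the same shape as 'grid'
    (assumed precomputed by ray casting, which is omitted here)."""
    return int(np.count_nonzero((grid == UNKNOWN) & visible))

def pick_next_view(grid, candidates, current_pos, beta=0.5):
    """Greedy one-step planner sketch: trade off information gain
    against travel distance. Each candidate is (position, visibility_mask)."""
    def score(cand):
        pos, vis = cand
        travel = np.linalg.norm(np.asarray(pos) - np.asarray(current_pos))
        return information_gain(grid, vis) - beta * travel
    return max(candidates, key=score)
```

Because `FREE` voxels contribute no gain, the same grid doubles as the obstacle map used for collision-free path planning.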
The Fast Bilateral Solver
We present the bilateral solver, a novel algorithm for edge-aware smoothing
that combines the flexibility and speed of simple filtering approaches with the
accuracy of domain-specific optimization algorithms. Our technique is capable
of matching or improving upon state-of-the-art results on several different
computer vision tasks (stereo, depth superresolution, colorization, and
semantic segmentation) while being 10-1000 times faster than competing
approaches. The bilateral solver is fast, robust, straightforward to generalize
to new domains, and simple to integrate into deep learning pipelines.
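A toy version of the kind of edge-aware optimisation the solver targets can help fix ideas. This is a simplified 1-D stand-in, not the paper's bilateral-space formulation: it minimises a quadratic energy with reference-guided smoothness weights by plain Jacobi iteration, and every name and parameter below is illustrative:

```python
import numpy as np

def edge_aware_smooth(target, reference, confidence,
                      lam=4.0, sigma=0.2, iters=50):
    """Minimise  sum_ij W_ij (x_i - x_j)^2 + lam * c_i (x_i - t_i)^2
    over a 1-D signal, where W_ij are bilateral-style weights between
    horizontal neighbours of the reference signal. Jacobi update:
    x_i <- (sum_j W_ij x_j + lam c_i t_i) / (sum_j W_ij + lam c_i)."""
    diff = np.diff(reference)
    w = np.exp(-(diff ** 2) / (2 * sigma ** 2))  # small across ref edges
    x = target.astype(float).copy()
    for _ in range(iters):
        num = lam * confidence * target
        den = lam * confidence + 0.0
        num[:-1] += w * x[1:]; den[:-1] += w
        num[1:] += w * x[:-1]; den[1:] += w
        x = num / den
    return x
```

With a large `lam` the data term dominates and the output tracks the target; with a small `lam` the output is smoothed except across edges of the reference.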
Intel RealSense Stereoscopic Depth Cameras
We present a comprehensive overview of the stereoscopic Intel RealSense RGBD
imaging systems. We discuss these systems' mode-of-operation, functional
behavior and include models of their expected performance, shortcomings, and
limitations. We provide information about the systems' optical characteristics,
their correlation algorithms, and how these properties can affect different
applications, including 3D reconstruction and gesture recognition. Our
discussion covers the Intel RealSense R200 and the Intel RealSense D400
(formerly RS400).
Comment: Accepted to CCD 2017, a CVPR 2017 Workshop
Exploring Computation-Communication Tradeoffs in Camera Systems
Cameras are the de facto sensor. The growing demand for real-time and
low-power computer vision, coupled with trends towards high-efficiency
heterogeneous systems, has given rise to a wide range of image processing
acceleration techniques at the camera node and in the cloud. In this paper, we
characterize two novel camera systems that use acceleration techniques to push
the extremes of energy and performance scaling, and explore the
computation-communication tradeoffs in their design. The first case study
targets a camera system designed to detect and authenticate individual faces,
running solely on energy harvested from RFID readers. We design a
multi-accelerator SoC design operating in the sub-mW range, and evaluate it
with real-world workloads to show performance and energy efficiency
improvements over a general purpose microprocessor. The second camera system
supports a 16-camera rig processing over 32 Gb/s of data to produce real-time
3D-360 degree virtual reality video. We design a multi-FPGA processing pipeline
that outperforms CPU and GPU configurations by up to 10x in computation time,
producing panoramic stereo video directly from the camera rig at 30 frames per
second. We find that an early data reduction step, either before complex
processing or offloading, is the most critical optimization for in-camera
systems.
Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures
Separating an audio scene into isolated sources is a fundamental problem in
computer audition, analogous to image segmentation in visual scene analysis.
Source separation systems based on deep learning are currently the most
successful approaches for solving the underdetermined separation problem, where
there are more sources than channels. Traditionally, such systems are trained
on sound mixtures where the ground truth decomposition is already known. Since
most real-world recordings do not have such a decomposition available, this
limits the range of mixtures one can train on, and the range of mixtures the
learned models may successfully separate. In this work, we use a simple blind
spatial source separation algorithm to generate estimated decompositions of
stereo mixtures. These estimates, together with a weighting scheme in the
time-frequency domain, based on confidence in the separation quality, are used
to train a deep learning model that can be used for single-channel separation,
where no source direction information is available. This demonstrates how a
simple cue such as the direction of origin of source can be used to bootstrap a
model for source separation that can be used in situations where that cue is
not available.
Comment: 5 pages, 2 figures
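A minimal sketch of the bootstrapping idea, assuming an inter-channel level difference (ILD) clustering and a tanh confidence weight. Both are illustrative stand-ins; the paper's actual blind separation algorithm and weighting scheme may differ:

```python
import numpy as np

def spatial_masks_from_stereo(spec_l, spec_r, eps=1e-8):
    """Blind spatial separation sketch: assign each time-frequency bin
    of a stereo spectrogram pair to the left- or right-dominant source
    by its ILD, and derive a per-bin confidence from how far the ILD
    lies from the 0 dB decision boundary.

    spec_l, spec_r: complex or magnitude STFTs of shape (T, F)
    returns: (binary left-source mask, confidence weights in [0, 1))
    """
    ild = 20 * np.log10((np.abs(spec_l) + eps) / (np.abs(spec_r) + eps))
    mask_left = (ild > 0).astype(float)       # bin dominated by left source
    confidence = np.tanh(np.abs(ild) / 6.0)   # ~0 near boundary, ->1 when clear
    return mask_left, confidence
```

The masks become the (noisy) training targets for a single-channel model, and the confidence weights down-weight bins where the spatial cue was ambiguous.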
Perceptually-Motivated Nonlinear Channel Decorrelation For Stereo Acoustic Echo Cancellation
Acoustic echo cancellation with stereo signals is generally an
under-determined problem because of the high coherence between the left and
right channels. In this paper, we present a novel method of significantly
reducing inter-channel coherence without affecting the audio quality. Our work
takes into account psychoacoustic masking and binaural auditory cues. The
proposed non-linear processing combines a shaped comb-allpass (SCAL) filter
with the injection of psychoacoustically masked noise. We show that the
proposed method performs significantly better than other known methods for
reducing inter-channel coherence.
Comment: 4 pages
Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines
We present a practical and robust deep learning solution for capturing and
rendering novel views of complex real world scenes for virtual exploration.
Previous approaches either require intractably dense view sampling or provide
little to no guidance for how users should sample views of a scene to reliably
render high-quality novel views. Instead, we propose an algorithm for view
synthesis from an irregular grid of sampled views that first expands each
sampled view into a local light field via a multiplane image (MPI) scene
representation, then renders novel views by blending adjacent local light
fields. We extend traditional plenoptic sampling theory to derive a bound that
specifies precisely how densely users should sample views of a given scene when
using our algorithm. In practice, we apply this bound to capture and render
views of real world scenes that achieve the perceptual quality of Nyquist rate
view sampling while using up to 4000x fewer views. We demonstrate our
approach's practicality with an augmented reality smartphone app that guides
users to capture input images of a scene and viewers that enable realtime
virtual exploration on desktop and mobile platforms.
Comment: SIGGRAPH 2019. Project page with video and code:
http://people.eecs.berkeley.edu/~bmild/llff
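The prescriptive sampling guideline can be illustrated under a standard disparity model. The exact bound is derived in the paper; the rule below (keep the maximum disparity between adjacent views, f*B/z_min in pixels, at or below the number of MPI depth planes) is an assumed simplification for illustration:

```python
def max_camera_spacing(focal_px, z_min, num_mpi_planes):
    """Illustrative view-spacing rule: maximum baseline B between
    adjacent input views such that the nearest scene point (depth
    z_min) moves by at most one pixel per MPI plane, i.e.
    f * B / z_min <= num_mpi_planes.

    focal_px:        focal length in pixels (assumed parameter)
    z_min:           nearest scene depth, same units as the result
    num_mpi_planes:  depth planes in the MPI representation
    """
    return num_mpi_planes * z_min / focal_px
```

Under this reading, more MPI planes directly buy sparser capture, which is the source of the large reduction versus Nyquist-rate view sampling.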
Stereo on a budget
We propose an algorithm for recovering depth using less than two images.
Instead of having both cameras send their entire image to the host computer,
the left camera sends its image to the host while the right camera sends only a
fraction of its image. The key aspect is that the cameras send this
information without communicating with each other at all. Hence, the required
communication bandwidth is significantly reduced.
While standard image compression techniques can reduce the communication
bandwidth, this requires additional computational resources on the part of the
encoder (camera). We aim to design a lightweight encoder that touches only
a fraction of the pixels. The burden of decoding is placed on the decoder
(host).
We show that it is enough for the encoder to transmit a sparse set of pixels.
Using only these pixels, amounting to as little as 2% of the image,
the decoder can compute a depth map. The depth map's accuracy is comparable to
traditional stereo matching algorithms that require both images as input. Using
the depth map and the left image, the right image can be synthesized. No
computations are required at the encoder, and the decoder's runtime is linear
in the images' size.
Comment: update flowchart in Fig.
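The lightweight-encoder idea can be sketched as follows. The pseudo-random pixel subset and shared seed are illustrative assumptions, not the paper's actual selection scheme; the point is that selection requires no computation over pixel content:

```python
import numpy as np

def encode_sparse(image, fraction=0.02, seed=0):
    """Lightweight encoder sketch: transmit a fixed pseudo-random
    subset (~2% by default) of pixel values. Because the decoder can
    regenerate the same indices from the shared seed, only the values
    themselves need to be sent, and the encoder never processes the
    remaining pixels."""
    rng = np.random.default_rng(seed)
    n = image.size
    idx = rng.choice(n, size=max(1, int(fraction * n)), replace=False)
    return idx, image.ravel()[idx]
```

The decoder then combines these sparse right-camera samples with the full left image to estimate the depth map, from which the right image can be synthesized.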