Low Power Depth Estimation of Rigid Objects for Time-of-Flight Imaging
Depth sensing is useful in a variety of applications that range from
augmented reality to robotics. Time-of-flight (TOF) cameras are appealing
because they obtain dense depth measurements with minimal latency. However, for
many battery-powered devices, the illumination source of a TOF camera is power
hungry and can limit the battery life of the device. To address this issue, we
present an algorithm that lowers the power for depth sensing by reducing the
usage of the TOF camera and estimating depth maps using concurrently collected
images. Our technique also adaptively controls the TOF camera and enables it
when an accurate depth map cannot be estimated. To ensure that the overall
system power for depth sensing is reduced, we design our algorithm to run on a
low power embedded platform, where it outputs 640x480 depth maps at 30 frames
per second. We evaluate our approach on several RGB-D datasets, where it
produces depth maps with an overall mean relative error of 0.96% and reduces
the usage of the TOF camera by 85%. When used with commercial TOF cameras, we
estimate that our algorithm can lower the total power for depth sensing by up
to 73%.
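The abstract outlines an adaptive duty-cycling scheme: reuse image-based depth estimates while the scene permits, and re-enable the power-hungry TOF illumination only when an accurate estimate cannot be produced. The sketch below is a rough, hypothetical illustration of such a control loop; the camera interfaces, the photometric-change test standing in for the paper's image-based estimator, and the threshold value are all assumptions, not the paper's implementation.

```python
import numpy as np

ERROR_THRESHOLD = 0.05  # assumed trigger level; the paper's exact criterion is not given here


def depth_sensing_step(tof_camera, rgb_camera, state):
    """One iteration of the duty-cycling loop.

    `state` holds the last trusted depth map and the RGB frame captured with it,
    e.g. state = {"depth": tof_camera.capture(),
                  "rgb": rgb_camera.capture().astype(np.float32)}.
    The photometric-change test below is a crude stand-in for the paper's
    image-based depth estimator and its accuracy check.
    """
    curr_rgb = rgb_camera.capture().astype(np.float32)
    change = np.mean(np.abs(curr_rgb - state["rgb"])) / 255.0
    if change > ERROR_THRESHOLD:
        # Too much scene or camera change: re-enable the TOF illumination.
        state["depth"] = tof_camera.capture()
    # Otherwise reuse the existing depth estimate and keep the TOF camera off.
    state["rgb"] = curr_rgb
    return state["depth"]
```

In this scheme the TOF camera fires only on the iterations where the check fails, which is what lets the overall sensing power drop while the estimator carries the remaining frames.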
Real-Time Panoramic Tracking for Event Cameras
Event cameras are a paradigm shift in camera technology. Instead of full
frames, the sensor captures a sparse set of events caused by intensity changes.
Since only the changes are transferred, those cameras are able to capture quick
movements of objects in the scene or of the camera itself. In this work we
propose a novel method to perform camera tracking of event cameras in a
panoramic setting with three degrees of freedom. We propose a direct camera
tracking formulation, similar to state-of-the-art methods in visual odometry. We show
that the minimal information needed for simultaneous tracking and mapping is
the spatial position of events, without using the appearance of the imaged
scene point. We verify the robustness to fast camera movements and dynamic
objects in the scene on a recently proposed dataset and self-recorded
sequences.
Comment: Accepted to International Conference on Computational Photography 2017
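The abstract describes rotation-only (3-DoF) tracking against a panoramic map built from event positions alone. The map parameterization is not specified there, so the helper below assumes an equirectangular panorama and a known, calibrated camera; it only illustrates how an event's pixel position, combined with the current rotation estimate, can be mapped into panorama coordinates for direct alignment against previously mapped events.

```python
import numpy as np

def event_to_panorama(x, y, K_inv, R, pano_width, pano_height):
    """Project an event at pixel (x, y) into equirectangular panorama coordinates.

    K_inv is the inverse camera intrinsic matrix and R is the current 3x3
    rotation estimate; both are assumed known. Returns (u, v) in the panorama.
    """
    ray = K_inv @ np.array([x, y, 1.0])          # back-project the event pixel
    ray = R @ (ray / np.linalg.norm(ray))        # rotate into the panorama frame
    lon = np.arctan2(ray[0], ray[2])             # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(ray[1], -1.0, 1.0))  # latitude in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * pano_width
    v = (lat / np.pi + 0.5) * pano_height
    return u, v
```

Tracking in this setting amounts to choosing the R that makes incoming events land on map locations where events have been observed before, which is why the spatial position of events alone can suffice.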
Accelerating Real-Time, High-Resolution Depth Upsampling on FPGAs
While the popularity of high-resolution computer-vision applications (e.g., mixed reality, autonomous vehicles) is increasing, there have been complementary advances in time-of-flight (ToF) depth-sensor resolution and quality. These advances in ToF sensors provide a platform that can enable real-time depth-upsampling algorithms targeted at high-resolution video systems with low-latency requirements. This thesis demonstrates that filter-based upsampling algorithms are feasible for real-time, low-power scenarios, such as those on head-mounted displays (HMDs). Specifically, we profiled, parallelized, and accelerated a filter-based depth-upsampling algorithm on an FPGA using high-level synthesis tools from Xilinx. We show that our accelerated algorithm can accurately upsample the resolution and reduce the noise of ToF sensors. We also demonstrate that this algorithm exceeds the real-time requirements of 90 frames per second (FPS) and 11 ms latency of mixed-reality hardware, achieving a lower-bound speedup of 40 times over the fastest CPU-only version and a 4.7 times speedup over the original GPU implementation.
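The abstract identifies the algorithm only as "filter-based". Joint bilateral upsampling is one common filter of that kind, so the sketch below uses it purely as an illustration; the thesis's actual filter, parameters, and FPGA pipeline are not given here. It upsamples a low-resolution ToF depth map with a high-resolution grayscale guide image, weighting each low-resolution sample by spatial distance and by similarity in the guide.

```python
import numpy as np

def joint_bilateral_upsample(low_depth, guide, scale, radius=2,
                             sigma_spatial=1.0, sigma_range=0.1):
    """Naive joint bilateral upsampling (illustrative, unoptimized).

    low_depth: low-resolution depth map; guide: high-resolution grayscale
    image in [0, 1]; scale: integer upsampling factor.
    """
    h, w = guide.shape[:2]
    out = np.zeros((h, w), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            ly, lx = y / scale, x / scale        # position in the low-res grid
            acc, wsum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sy = int(round(ly)) + dy
                    sx = int(round(lx)) + dx
                    if 0 <= sy < low_depth.shape[0] and 0 <= sx < low_depth.shape[1]:
                        # spatial weight, measured in low-resolution coordinates
                        ws = np.exp(-((sy - ly) ** 2 + (sx - lx) ** 2)
                                    / (2 * sigma_spatial ** 2))
                        # range weight from the high-resolution guide image
                        gy = min(int(sy * scale), h - 1)
                        gx = min(int(sx * scale), w - 1)
                        wr = np.exp(-((guide[y, x] - guide[gy, gx]) ** 2)
                                    / (2 * sigma_range ** 2))
                        acc += ws * wr * low_depth[sy, sx]
                        wsum += ws * wr
            out[y, x] = acc / wsum if wsum > 0 else 0.0
    return out
```

The per-pixel window structure of filters like this one is what makes them attractive for FPGA acceleration: each output pixel depends only on a small, fixed neighborhood, so the loops can be pipelined and parallelized in hardware.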
BatVision: Learning to See 3D Spatial Layout with Two Ears
Many species have evolved advanced non-visual perception while artificial
systems fall behind. Radar and ultrasound complement camera-based vision, but
they are often too costly and complex to set up for the limited information
they provide. In nature, sound is used effectively by bats, dolphins, whales, and
humans for navigation and communication. However, it is unclear how to best
harness sound for machine perception. Inspired by bats' echolocation mechanism,
we design a low-cost BatVision system that is capable of seeing the 3D spatial
layout of space ahead by just listening with two ears. Our system emits short
chirps from a speaker and records the returning echoes through two microphones
set in a pair of artificial human pinnae. During training, we additionally use a stereo
camera to capture color images for calculating scene depths. We train a model
to predict depth maps and even grayscale images from the sound alone. During
testing, our trained BatVision provides surprisingly good predictions of 2D
visual scenes from two 1D audio signals. Such a sound-to-vision system would
benefit robot navigation and machine vision, especially in low-light or
no-light conditions. Our code and data are publicly available.
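As a rough illustration of the kind of model the abstract describes (the actual BatVision architecture, input representation, and training details are not given here), a minimal encoder-decoder that maps a two-channel echo spectrogram to a single-channel depth map might look like the following sketch.

```python
import torch
import torch.nn as nn

class EchoToDepth(nn.Module):
    """Toy audio-to-depth model: encode left/right echo spectrograms,
    then decode a coarse depth map. All architecture choices here are
    assumptions for illustration, not the paper's network."""

    def __init__(self):
        super().__init__()
        # Input: 2-channel (left/right ear) spectrogram, e.g. 2 x 128 x 128.
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),  # 1-channel depth
        )

    def forward(self, spectrograms):
        return self.decoder(self.encoder(spectrograms))


# Example: a batch of 4 binaural spectrograms -> 4 single-channel depth maps.
model = EchoToDepth()
depth = model(torch.randn(4, 2, 128, 128))  # shape: (4, 1, 128, 128)
```

During training, the depth maps derived from the stereo camera would serve as the supervision target for outputs like `depth` above; at test time only the two audio channels are needed.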