6,698 research outputs found
Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals
Reconstruction of the tridimensional geometry of a visual scene using the
binocular disparity information is an important issue in computer vision and
mobile robotics, which can be formulated as a Bayesian inference problem.
However, computation of the full disparity distribution with an advanced
Bayesian model is usually an intractable problem, and proves computationally
challenging even with a simple model. In this paper, we show how probabilistic
hardware using distributed memory and alternate representation of data as
stochastic bitstreams can solve that problem with high performance and energy
efficiency. We put forward a way to express discrete probability distributions
using stochastic data representations and perform Bayesian fusion using those
representations, and show how that approach can be applied to diparity
computation. We evaluate the system using a simulated stochastic implementation
and discuss possible hardware implementations of such architectures and their
potential for sensorimotor processing and robotics.Comment: Preprint of article submitted for publication in International
Journal of Approximate Reasoning and accepted pending minor revision
Computing motion in the primate's visual system
Computing motion on the basis of the time-varying image intensity is a difficult problem for both artificial and biological vision systems. We will show how one well-known gradient-based computer algorithm for estimating visual motion can be implemented within the primate's visual system. This relaxation algorithm computes the optical flow field by minimizing a variational functional of a form commonly encountered in early vision, and is performed in two steps. In the first stage, local motion is computed, while in the second stage spatial integration occurs. Neurons in the second stage represent the optical flow field via a population-coding scheme, such that the vector sum of all neurons at each location codes for the direction and magnitude of the velocity at that location. The resulting network maps onto the magnocellular pathway of the primate visual system, in particular onto cells in the primary visual cortex (V1) as well as onto cells in the middle temporal area (MT). Our algorithm mimics a number of psychophysical phenomena and illusions (perception of coherent plaids, motion capture, motion coherence) as well as electrophysiological recordings. Thus, a single unifying principle ‘the final optical flow should be as smooth as possible’ (except at isolated motion discontinuities) explains a large number of phenomena and links single-cell behavior with perception and computational theory
Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs
Human visual system relies on both binocular stereo cues and monocular
focusness cues to gain effective 3D perception. In computer vision, the two
problems are traditionally solved in separate tracks. In this paper, we present
a unified learning-based technique that simultaneously uses both types of cues
for depth inference. Specifically, we use a pair of focal stacks as input to
emulate human perception. We first construct a comprehensive focal stack
training dataset synthesized by depth-guided light field rendering. We then
construct three individual networks: a Focus-Net to extract depth from a single
focal stack, a EDoF-Net to obtain the extended depth of field (EDoF) image from
the focal stack, and a Stereo-Net to conduct stereo matching. We show how to
integrate them into a unified BDfF-Net to obtain high-quality depth maps.
Comprehensive experiments show that our approach outperforms the
state-of-the-art in both accuracy and speed and effectively emulates human
vision systems
Cortical Dynamics of Navigation and Steering in Natural Scenes: Motion-Based Object Segmentation, Heading, and Obstacle Avoidance
Visually guided navigation through a cluttered natural scene is a challenging problem that animals and humans accomplish with ease. The ViSTARS neural model proposes how primates use motion information to segment objects and determine heading for purposes of goal approach and obstacle avoidance in response to video inputs from real and virtual environments. The model produces trajectories similar to those of human navigators. It does so by predicting how computationally complementary processes in cortical areas MT-/MSTv and MT+/MSTd compute object motion for tracking and self-motion for navigation, respectively. The model retina responds to transients in the input stream. Model V1 generates a local speed and direction estimate. This local motion estimate is ambiguous due to the neural aperture problem. Model MT+ interacts with MSTd via an attentive feedback loop to compute accurate heading estimates in MSTd that quantitatively simulate properties of human heading estimation data. Model MT interacts with MSTv via an attentive feedback loop to compute accurate estimates of speed, direction and position of moving objects. This object information is combined with heading information to produce steering decisions wherein goals behave like attractors and obstacles behave like repellers. These steering decisions lead to navigational trajectories that closely match human performance.National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial Intelligence Agency (NMA201-01-1-2016
Anytime Stereo Image Depth Estimation on Mobile Devices
Many applications of stereo depth estimation in robotics require the
generation of accurate disparity maps in real time under significant
computational constraints. Current state-of-the-art algorithms force a choice
between either generating accurate mappings at a slow pace, or quickly
generating inaccurate ones, and additionally these methods typically require
far too many parameters to be usable on power- or memory-constrained devices.
Motivated by these shortcomings, we propose a novel approach for disparity
prediction in the anytime setting. In contrast to prior work, our end-to-end
learned approach can trade off computation and accuracy at inference time.
Depth estimation is performed in stages, during which the model can be queried
at any time to output its current best estimate. Our final model can process
1242375 resolution images within a range of 10-35 FPS on an NVIDIA
Jetson TX2 module with only marginal increases in error -- using two orders of
magnitude fewer parameters than the most competitive baseline. The source code
is available at https://github.com/mileyan/AnyNet .Comment: Accepted by ICRA201
Attention-Aware Disparity Control in interactive environments
Cataloged from PDF version of article.Our paper introduces a novel approach for controlling
stereo camera parameters in interactive 3D environments
in a way that specifically addresses the interplay
of binocular depth perception and saliency of scene contents.
Our proposed Dynamic Attention-Aware Disparity
Control (DADC) method produces depth-rich stereo rendering
that improves viewer comfort through joint optimization
of stereo parameters. While constructing the optimization
model, we consider the importance of scene elements, as
well as their distance to the camera and the locus of attention
on the display. Our method also optimizes the depth
effect of a given scene by considering the individual user’s
stereoscopic disparity range and comfortable viewing experience
by controlling accommodation/convergence conflict.
We validate our method in a formal user study that also reveals
the advantages, such as superior quality and practical
relevance, of considering our method.© Springer-Verlag Berlin Heidelberg 2013
High-Performance Testbed for Vision-Aided Autonomous Navigation for Quadrotor UAVs in Cluttered Environments
This thesis presents the development of an aerial robotic testbed based on Robot Operating System (ROS). The purpose of this high-performance testbed is to develop a system capable of performing robust navigation tasks using vision tools such as a stereo camera. While ensuring the computation of robot odometery, the system is also capable of sensing the environment using the same stereo camera. Hence, all the navigation tasks are performed using a stereo camera and an inertial measurement unit (IMU) as the main sensor suite. ROS is used as a framework for software integration due to its capabilities to provide efficient communication and sensor interfaces. Moreover, it also allows us to use C++ which is efficient in performance especially on embedded platforms. Combining together ROS and C++ provides the necessary computation efficiency and tools to handle fast, real-time image processing and planning which are the vital parts of navigation and obstacle avoidance on such scale. The main application of this work revolves around proposing a real-time and efficient way to demonstrate vision-based navigation in UAVs. The proposed approach is developed for a quadrotor UAV which is capable of performing defensive maneuvers in case any obstacles are in its way, while constantly moving towards a user-defined final destination. Stereo depth computation adds a third axis to a two dimensional image coordinate frame. This can be referred to as the depth image space or depth image coordinate frame. The idea of planning in this frame of reference is utilized along with certain precomputed action primitives. The formulation of these action primitives leads to a hybrid control law for feasible trajectory generation. Further, a proof of stability of this system is also presented. The proposed approach keeps in view the fact that while performing fast maneuvers and obstacle avoidance simultaneously, many of the standard optimization approaches might not work in real-time on-board due to time and resource limitations. This leads to a need for the development of real-time techniques for vision-based autonomous navigation
- …