    Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals

    Reconstructing the three-dimensional geometry of a visual scene from binocular disparity information is an important problem in computer vision and mobile robotics, and it can be formulated as a Bayesian inference problem. However, computing the full disparity distribution with an advanced Bayesian model is usually intractable, and proves computationally challenging even with a simple model. In this paper, we show how probabilistic hardware using distributed memory and an alternate representation of data as stochastic bitstreams can solve that problem with high performance and energy efficiency. We put forward a way to express discrete probability distributions using stochastic data representations and perform Bayesian fusion using those representations, and show how that approach can be applied to disparity computation. We evaluate the system using a simulated stochastic implementation and discuss possible hardware implementations of such architectures and their potential for sensorimotor processing and robotics. Comment: Preprint of article submitted for publication in International Journal of Approximate Reasoning and accepted pending minor revision.
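
    The fusion idea in this abstract can be illustrated with a minimal software sketch. It assumes the common stochastic-computing convention in which a probability p is encoded as a stream of independent Bernoulli(p) bits, so that a bitwise AND of independent streams estimates a product of probabilities. The function names (`to_bitstream`, `fuse`, `exact_fusion`) and the example distributions are hypothetical and are not taken from the paper; this is not the authors' hardware design.

```python
import random


def to_bitstream(p, n_bits, rng):
    """Encode probability p as n_bits independent Bernoulli(p) bits (assumed encoding)."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]


def fuse(distributions, n_bits=20000, seed=0):
    """Estimate the normalized product of discrete distributions using bitstreams.

    Each distribution is a list of probabilities over the same candidate values
    (for example, candidate disparities).
    """
    rng = random.Random(seed)
    counts = []
    for probs in zip(*distributions):  # one tuple of probabilities per candidate
        streams = [to_bitstream(p, n_bits, rng) for p in probs]
        # AND across independent streams: P(all bits are 1) is the product of the p's.
        fused_bits = [all(bits) for bits in zip(*streams)]
        counts.append(sum(fused_bits))
    total = sum(counts)
    return [c / total for c in counts]  # normalize the counts into a distribution


def exact_fusion(distributions):
    """Reference result: exact normalized elementwise product."""
    products = []
    for probs in zip(*distributions):
        prod = 1.0
        for p in probs:
            prod *= p
        products.append(prod)
    total = sum(products)
    return [x / total for x in products]


# Two hypothetical evidence sources over three candidate disparities.
source_a = [0.1, 0.6, 0.3]
source_b = [0.2, 0.5, 0.3]
estimate = fuse([source_a, source_b])
exact = exact_fusion([source_a, source_b])
print(estimate, exact)
```

    Longer bitstreams reduce the sampling noise of the estimate at the cost of more cycles, which is the usual accuracy versus latency trade-off in stochastic computing.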

    Computing motion in the primate's visual system

    Computing motion on the basis of the time-varying image intensity is a difficult problem for both artificial and biological vision systems. We will show how one well-known gradient-based computer algorithm for estimating visual motion can be implemented within the primate's visual system. This relaxation algorithm computes the optical flow field by minimizing a variational functional of a form commonly encountered in early vision, and is performed in two steps. In the first stage, local motion is computed, while in the second stage spatial integration occurs. Neurons in the second stage represent the optical flow field via a population-coding scheme, such that the vector sum of all neurons at each location codes for the direction and magnitude of the velocity at that location. The resulting network maps onto the magnocellular pathway of the primate visual system, in particular onto cells in the primary visual cortex (V1) as well as onto cells in the middle temporal area (MT). Our algorithm mimics a number of psychophysical phenomena and illusions (perception of coherent plaids, motion capture, motion coherence) as well as electrophysiological recordings. Thus, a single unifying principle ‘the final optical flow should be as smooth as possible’ (except at isolated motion discontinuities) explains a large number of phenomena and links single-cell behavior with perception and computational theory.

    Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs

    The human visual system relies on both binocular stereo cues and monocular focusness cues to gain effective 3D perception. In computer vision, the two problems are traditionally solved in separate tracks. In this paper, we present a unified learning-based technique that simultaneously uses both types of cues for depth inference. Specifically, we use a pair of focal stacks as input to emulate human perception. We first construct a comprehensive focal stack training dataset synthesized by depth-guided light field rendering. We then construct three individual networks: a Focus-Net to extract depth from a single focal stack, an EDoF-Net to obtain the extended depth of field (EDoF) image from the focal stack, and a Stereo-Net to conduct stereo matching. We show how to integrate them into a unified BDfF-Net to obtain high-quality depth maps. Comprehensive experiments show that our approach outperforms the state-of-the-art in both accuracy and speed and effectively emulates human vision systems.

    Cortical Dynamics of Navigation and Steering in Natural Scenes: Motion-Based Object Segmentation, Heading, and Obstacle Avoidance

    Visually guided navigation through a cluttered natural scene is a challenging problem that animals and humans accomplish with ease. The ViSTARS neural model proposes how primates use motion information to segment objects and determine heading for purposes of goal approach and obstacle avoidance in response to video inputs from real and virtual environments. The model produces trajectories similar to those of human navigators. It does so by predicting how computationally complementary processes in cortical areas MT-/MSTv and MT+/MSTd compute object motion for tracking and self-motion for navigation, respectively. The model retina responds to transients in the input stream. Model V1 generates a local speed and direction estimate. This local motion estimate is ambiguous due to the neural aperture problem. Model MT+ interacts with MSTd via an attentive feedback loop to compute accurate heading estimates in MSTd that quantitatively simulate properties of human heading estimation data. Model MT- interacts with MSTv via an attentive feedback loop to compute accurate estimates of speed, direction and position of moving objects. This object information is combined with heading information to produce steering decisions wherein goals behave like attractors and obstacles behave like repellers. These steering decisions lead to navigational trajectories that closely match human performance. National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial Intelligence Agency (NMA201-01-1-2016)

    Anytime Stereo Image Depth Estimation on Mobile Devices

    Many applications of stereo depth estimation in robotics require the generation of accurate disparity maps in real time under significant computational constraints. Current state-of-the-art algorithms force a choice between either generating accurate mappings at a slow pace, or quickly generating inaccurate ones, and additionally these methods typically require far too many parameters to be usable on power- or memory-constrained devices. Motivated by these shortcomings, we propose a novel approach for disparity prediction in the anytime setting. In contrast to prior work, our end-to-end learned approach can trade off computation and accuracy at inference time. Depth estimation is performed in stages, during which the model can be queried at any time to output its current best estimate. Our final model can process 1242 × 375 resolution images within a range of 10-35 FPS on an NVIDIA Jetson TX2 module with only marginal increases in error -- using two orders of magnitude fewer parameters than the most competitive baseline. The source code is available at https://github.com/mileyan/AnyNet. Comment: Accepted by ICRA201

    Attention-Aware Disparity Control in interactive environments

    Our paper introduces a novel approach for controlling stereo camera parameters in interactive 3D environments in a way that specifically addresses the interplay of binocular depth perception and saliency of scene contents. Our proposed Dynamic Attention-Aware Disparity Control (DADC) method produces depth-rich stereo rendering that improves viewer comfort through joint optimization of stereo parameters. While constructing the optimization model, we consider the importance of scene elements, as well as their distance to the camera and the locus of attention on the display. Our method also optimizes the depth effect of a given scene by considering the individual user’s stereoscopic disparity range and comfortable viewing experience by controlling accommodation/convergence conflict. We validate our method in a formal user study that also reveals the advantages of our method, such as superior quality and practical relevance. © Springer-Verlag Berlin Heidelberg 2013

    High-Performance Testbed for Vision-Aided Autonomous Navigation for Quadrotor UAVs in Cluttered Environments

    This thesis presents the development of an aerial robotic testbed based on Robot Operating System (ROS). The purpose of this high-performance testbed is to develop a system capable of performing robust navigation tasks using vision tools such as a stereo camera. While ensuring the computation of robot odometry, the system is also capable of sensing the environment using the same stereo camera. Hence, all the navigation tasks are performed using a stereo camera and an inertial measurement unit (IMU) as the main sensor suite. ROS is used as a framework for software integration due to its capabilities to provide efficient communication and sensor interfaces. Moreover, it also allows us to use C++, which is efficient in performance, especially on embedded platforms. Combining ROS and C++ provides the necessary computational efficiency and tools to handle fast, real-time image processing and planning, which are the vital parts of navigation and obstacle avoidance on such a scale. The main application of this work revolves around proposing a real-time and efficient way to demonstrate vision-based navigation in UAVs. The proposed approach is developed for a quadrotor UAV which is capable of performing defensive maneuvers in case any obstacles are in its way, while constantly moving towards a user-defined final destination. Stereo depth computation adds a third axis to a two-dimensional image coordinate frame. This can be referred to as the depth image space or depth image coordinate frame. The idea of planning in this frame of reference is utilized along with certain precomputed action primitives. The formulation of these action primitives leads to a hybrid control law for feasible trajectory generation. Further, a proof of stability of this system is also presented.
    The proposed approach keeps in view the fact that while performing fast maneuvers and obstacle avoidance simultaneously, many of the standard optimization approaches might not work in real time on board due to time and resource limitations. This leads to a need for the development of real-time techniques for vision-based autonomous navigation.
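
    The statement that stereo depth adds a third axis to the image coordinate frame can be made concrete with a small sketch. It assumes a rectified pinhole stereo pair, for which depth follows the standard relation Z = f * B / d (focal length f in pixels, baseline B in metres, disparity d in pixels). The function names and the numeric camera parameters are hypothetical illustrations and are not taken from the thesis.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def pixel_to_point(u, v, disparity_px, focal_px, baseline_m, cx, cy):
    """Lift an image pixel (u, v) with its disparity to a 3D point (X, Y, Z).

    (cx, cy) is the principal point in pixels; the same focal length is assumed
    for both image axes.
    """
    z = disparity_to_depth(disparity_px, focal_px, baseline_m)
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


# Hypothetical camera: 500 px focal length, 10 cm baseline, 640 x 480 image.
point = pixel_to_point(u=420, v=240, disparity_px=10,
                       focal_px=500, baseline_m=0.1, cx=320, cy=240)
print(point)
```

    Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy for distant obstacles than for near ones.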