5,631 research outputs found

    Encoderless Gimbal Calibration of Dynamic Multi-Camera Clusters

    Full text link
    Dynamic Camera Clusters (DCCs) are multi-camera systems where one or more cameras are mounted on actuated mechanisms such as a gimbal. Existing methods for DCC calibration rely on joint angle measurements to resolve the time-varying transformation between the dynamic and static camera. This information is usually provided by motor encoders, however, joint angle measurements are not always readily available on off-the-shelf mechanisms. In this paper, we present an encoderless approach for DCC calibration which simultaneously estimates the kinematic parameters of the transformation chain as well as the unknown joint angles. We also demonstrate the integration of an encoderless gimbal mechanism with a state-of-the art VIO algorithm, and show the extensions required in order to perform simultaneous online estimation of the joint angles and vehicle localization state. The proposed calibration approach is validated both in simulation and on a physical DCC composed of a 2-DOF gimbal mounted on a UAV. Finally, we show the experimental results of the calibrated mechanism integrated into the OKVIS VIO package, and demonstrate successful online joint angle estimation while maintaining localization accuracy that is comparable to a standard static multi-camera configuration.Comment: ICRA 201

    FastDepth: Fast Monocular Depth Estimation on Embedded Systems

    Full text link
    Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There has been a significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance, mounted on a micro aerial vehicle. In this paper, we address the problem of fast depth estimation on embedded systems. We propose an efficient and lightweight encoder-decoder network architecture and apply network pruning to further reduce computational complexity and latency. In particular, we focus on the design of a low-latency decoder. Our methodology demonstrates that it is possible to achieve similar accuracy as prior work on depth estimation, but at inference speeds that are an order of magnitude faster. Our proposed network, FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of the authors' knowledge, this paper demonstrates real-time monocular depth estimation using a deep neural network with the lowest latency and highest throughput on an embedded platform that can be carried by a micro aerial vehicle.Comment: Accepted for presentation at ICRA 2019. 8 pages, 6 figures, 7 table

    Unsupervised Odometry and Depth Learning for Endoscopic Capsule Robots

    Full text link
    In the last decade, many medical companies and research groups have tried to convert passive capsule endoscopes as an emerging and minimally invasive diagnostic technology into actively steerable endoscopic capsule robots which will provide more intuitive disease detection, targeted drug delivery and biopsy-like operations in the gastrointestinal(GI) tract. In this study, we introduce a fully unsupervised, real-time odometry and depth learner for monocular endoscopic capsule robots. We establish the supervision by warping view sequences and assigning the re-projection minimization to the loss function, which we adopt in multi-view pose estimation and single-view depth estimation network. Detailed quantitative and qualitative analyses of the proposed framework performed on non-rigidly deformable ex-vivo porcine stomach datasets proves the effectiveness of the method in terms of motion estimation and depth recovery.Comment: submitted to IROS 201

    Neural Representations for Sensory-Motor Control, II: Learning a Head-Centered Visuomotor Representation of 3-D Target Position

    Full text link
    A neural network model is described for how an invariant head-centered representation of 3-D target position can be autonomously learned by the brain in real time. Once learned, such a target representation may be used to control both eye and limb movements. The target representation is derived from the positions of both eyes in the head, and the locations which the target activates on the retinas of both eyes. A Vector Associative Map, or YAM, learns the many-to-one transformation from multiple combinations of eye-and-retinal position to invariant 3-D target position. Eye position is derived from outflow movement signals to the eye muscles. Two successive stages of opponent processing convert these corollary discharges into a. head-centered representation that closely approximates the azimuth, elevation, and vergence of the eyes' gaze position with respect to a cyclopean origin located between the eyes. YAM learning combines this cyclopean representation of present gaze position with binocular retinal information about target position into an invariant representation of 3-D target position with respect to the head. YAM learning can use a teaching vector that is externally derived from the positions of the eyes when they foveate the target. A YAM can also autonomously discover and learn the invariant representation, without an explicit teacher, by generating internal error signals from environmental fluctuations in which these invariant properties are implicit. YAM error signals are computed by Difference Vectors, or DVs, that are zeroed by the YAM learning process. YAMs may be organized into YAM Cascades for learning and performing both sensory-to-spatial maps and spatial-to-motor maps. These multiple uses clarify why DV-type properties are computed by cells in the parietal, frontal, and motor cortices of many mammals. YAMs are modulated by gating signals that express different aspects of the will-to-act. These signals transform a single invariant representation into movements of different speed (GO signal) and size (GRO signal), and thereby enable YAM controllers to match a planned action sequence to variable environmental conditions.National Science Foundation (IRI-87-16960, IRI-90-24877); Office of Naval Research (N00014-92-J-1309

    Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery

    Get PDF
    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-opera- tive morphology and motion of soft-tissues. This information is prerequisite to the registration of multi-modal patient-specific data for enhancing the surgeon’s navigation capabilites by observ- ing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted in- struments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D opti- cal imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions

    Eye position representation in human anterior parietal cortex

    Get PDF
    Eye position helps locate visual targets relative to one's own body and modulates the distribution of attention in visual space. Whereas in the monkey, proprioceptive eye position signals have been recorded in the somatosensory cortex, in humans, no brain site has yet been associated with eye position. We aimed to disrupt the proprioceptive representation of the right eye in the left somatosensory cortex, presumably located near the representation of the right hand, using repetitive transcranial magnetic stimulation (rTMS). Head-fixed subjects reported their perceived visual straight-ahead position using both left and right eye monocular vision, before and after 15 min of 1 Hz rTMS. rTMS over left somatosensory but not over left motor cortex shifted the perceived visual straight ahead to the left, whereas nonvisual detection of body midline was unchanged for either brain area. These results can be explained by the underestimation of the angle of gaze of the right eye when fixating the target. To link this effect more tightly to an altered ocular proprioception, we applied a passive deviation to the right eye before the visual straight-ahead task. Passive eye displacement modulated the shift in the perceived straight ahead induced by somatosensory rTMS, without affecting the perceived straight ahead at baseline or after motor cortex rTMS. We conclude that the anterior parietal cortex in humans encodes eye position and that this signal has a proprioceptive component

    A Neural Network Architecture for Figure-ground Separation of Connected Scenic Figures

    Full text link
    A neural network model, called an FBF network, is proposed for automatic parallel separation of multiple image figures from each other and their backgrounds in noisy grayscale or multi-colored images. The figures can then be processed in parallel by an array of self-organizing Adaptive Resonance Theory (ART) neural networks for automatic target recognition. An FBF network can automatically separate the disconnected but interleaved spirals that Minsky and Papert introduced in their book Perceptrons. The network's design also clarifies why humans cannot rapidly separate interleaved spirals, yet can rapidly detect conjunctions of disparity and color, or of disparity and motion, that distinguish target figures from surrounding distractors. Figure-ground separation is accomplished by iterating operations of a Feature Contour System (FCS) and a Boundary Contour System (BCS) in the order FCS-BCS-FCS, hence the term FBF, that have been derived from an analysis of biological vision. The FCS operations include the use of nonlinear shunting networks to compensate for variable illumination and nonlinear diffusion networks to control filling-in. A key new feature of an FBF network is the use of filling-in for figure-ground separation. The BCS operations include oriented filters joined to competitive and cooperative interactions designed to detect, regularize, and complete boundaries in up to 50 percent noise, while suppressing the noise. A modified CORT-X filter is described which uses both on-cells and off-cells to generate a boundary segmentation from a noisy image.Air Force Office of Scientific Research (90-0175); Army Research Office (DAAL-03-88-K0088); Defense Advanced Research Projects Agency (90-0083); Hughes Research Laboratories (S1-804481-D, S1-903136); American Society for Engineering Educatio

    Keyframe-based monocular SLAM: design, survey, and future directions

    Get PDF
    Extensive research in the field of monocular SLAM for the past fifteen years has yielded workable systems that found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were common at some time, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, the paper serves as a guideline for people seeking to design their own monocular SLAM according to specific environmental constraints. Second, it presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation, and critically assessing the specific strategies made in each proposed solution. Third, the paper provides insight into the direction of future research in this field, to address the major limitations still facing monocular SLAM; namely, in the issues of illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery
    corecore