
    How to Train a CAT: Learning Canonical Appearance Transformations for Direct Visual Localization Under Illumination Change

    Direct visual localization has recently enjoyed a resurgence in popularity with the increasing availability of cheap mobile computing power. The competitive accuracy and robustness of these algorithms compared to state-of-the-art feature-based methods, as well as their natural ability to yield dense maps, make them an appealing choice for a variety of mobile robotics applications. However, direct methods remain brittle in the face of appearance change due to their underlying assumption of photometric consistency, which is commonly violated in practice. In this paper, we propose to mitigate this problem by training deep convolutional encoder-decoder models to transform images of a scene such that they correspond to a previously-seen canonical appearance. We validate our method in multiple environments and illumination conditions using high-fidelity synthetic RGB-D datasets, and integrate the trained models into a direct visual localization pipeline, yielding improvements in visual odometry (VO) accuracy through time-varying illumination conditions, as well as improved metric relocalization performance under illumination change, where conventional methods normally fail. We further provide a preliminary investigation of transfer learning from synthetic to real environments in a localization context. An open-source implementation of our method using PyTorch is available at https://github.com/utiasSTARS/cat-net. Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane, Australia, May 21-25, 201
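
The photometric consistency assumption mentioned in this abstract can be illustrated with a minimal sketch. It assumes a global gain-and-bias illumination model and tiny grayscale images stored as nested lists; the function names `photometric_residual` and `apply_illumination` are hypothetical and are not taken from the paper or from the cat-net repository.

```python
def photometric_residual(reference, current):
    """Mean absolute intensity difference between two equally sized grayscale images."""
    total = 0.0
    count = 0
    for ref_row, cur_row in zip(reference, current):
        for ref_pixel, cur_pixel in zip(ref_row, cur_row):
            total += abs(ref_pixel - cur_pixel)
            count += 1
    return total / count


def apply_illumination(image, gain, bias):
    """Simulate an illumination change with a global gain and bias (an assumed model)."""
    return [[gain * pixel + bias for pixel in row] for row in image]


# The same scene seen from the same pose, so the images are perfectly aligned.
scene = [[10, 20, 30], [40, 50, 60]]
brighter = apply_illumination(scene, gain=1.0, bias=10)

same_light = photometric_residual(scene, scene)        # no illumination change
changed_light = photometric_residual(scene, brighter)  # illumination change only
```

Even with perfect alignment, the residual is nonzero once the lighting changes, so a direct method that minimizes this error would be pushed away from the correct pose. Mapping both images to a common canonical appearance is one way to restore the assumption.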

    sCAM: An Untethered Insertable Laparoscopic Surgical Camera Robot

    Fully insertable robotic imaging devices represent a promising future of minimally invasive laparoscopic vision. Emerging research efforts in this field have resulted in several proof-of-concept prototypes. One common drawback of these designs derives from their clumsy tethering wires which not only cause operational interference but also reduce camera mobility. Meanwhile, these insertable laparoscopic cameras are manipulated without any pose information or haptic feedback, which results in open loop motion control and raises concerns about surgical safety caused by inappropriate use of force. This dissertation proposes, implements, and validates an untethered insertable laparoscopic surgical camera (sCAM) robot. Contributions presented in this work include: (1) feasibility of an untethered fully insertable laparoscopic surgical camera, (2) camera-tissue interaction characterization and force sensing, (3) pose estimation, visualization, and feedback with sCAM, and (4) robotic-assisted closed-loop laparoscopic camera control. Borrowing the principle of spherical motors, camera anchoring and actuation are achieved through transabdominal magnetic coupling in a stator-rotor manner. To avoid the tethering wires, laparoscopic vision and control communication are realized with dedicated wireless links based on onboard power. A non-invasive indirect approach is proposed to provide real-time camera-tissue interaction force measurement, which, assisted by camera-tissue interaction modeling, predicts stress distribution over the tissue surface. Meanwhile, the camera pose is remotely estimated and visualized using complementary filtering based on onboard motion sensing. 
Facilitated by the force measurement and pose estimation, robotic-assisted closed-loop control has been realized in a double-loop control scheme with shared autonomy between surgeons and the robotic controller. The sCAM has brought robotic laparoscopic imaging one step further toward less invasiveness and more dexterity. Initial ex vivo test results have verified functions of the implemented sCAM design and the proposed force measurement and pose estimation approaches, demonstrating the technical feasibility of a tetherless insertable laparoscopic camera. Robotic-assisted control has shown its potential to free surgeons from low-level intricate camera manipulation workload and improve precision and intuitiveness in laparoscopic imaging.
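
The abstract names complementary filtering for pose estimation but gives no details. The following is a generic single-axis sketch of the technique, not the dissertation's filter. It assumes a gyroscope rate and an accelerometer-derived tilt angle as inputs; the function name, the blend factor `alpha`, and the sample values are all hypothetical.

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98, angle0=0.0):
    """Fuse gyro rates (rad/s) and accelerometer-derived angles (rad) for one axis."""
    angle = angle0
    estimates = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        # Trust the integrated gyro over short time scales (weight alpha) and
        # the drift-free but noisy accelerometer angle over long ones.
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * accel_angle
        estimates.append(angle)
    return estimates


# A stationary sensor whose gyro reports a constant 0.1 rad/s bias.
steps, dt, bias = 1000, 0.01, 0.1
fused = complementary_filter([bias] * steps, [0.0] * steps, dt)
integrated_only = bias * dt * steps  # pure gyro integration over the same 10 s
```

In this example, pure integration drifts to about 1 rad after 10 s, while the fused estimate settles near `alpha * bias * dt / (1 - alpha)`, roughly 0.049 rad, because the accelerometer term keeps pulling it back.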

    Deep Retinal Optical Flow: From Synthetic Dataset Generation to Framework Creation and Evaluation

    Sustained delivery of regenerative retinal therapies by robotic systems requires intra-operative tracking of the retinal fundus. This thesis presents a supervised convolutional neural network to densely predict optical flow of the retinal fundus, using semantic segmentation as an auxiliary task. Retinal flow information missing due to occlusion by surgical tools or other effects is implicitly inpainted, allowing for the robust tracking of surgical targets. As manual annotation of optical flow is infeasible, a flexible algorithm for the generation of large synthetic training datasets on the basis of given intra-operative retinal images and tool templates is developed. The compositing of synthetic images is approached as a layer-wise operation implementing a number of transforms at every level which can be extended as required, mimicking the various phenomena visible in real data. Optical flow ground truth is calculated from motion transforms with the help of oflib, an open-source optical flow library available from the Python Package Index. It enables the user to manipulate, evaluate, and combine flow fields. The PyTorch version of oflib is fully differentiable and therefore suitable for use in deep learning methods requiring back-propagation. The optical flow estimation from the network trained on synthetic data is evaluated using three performance metrics obtained from tracking a grid and sparsely annotated ground truth points. The evaluation benchmark consists of a series of challenging real intra-operative clips obtained from an extensive internally acquired dataset encompassing representative surgical cases. The deep learning approach clearly outperforms variational baseline methods and is shown to generalise well to real data showing scenarios routinely observed during vitreoretinal procedures. 
This indicates complex synthetic training datasets can be used to specifically guide optical flow estimation, laying the foundation for a robust system which can assist with intra-operative tracking of moving surgical targets even when occluded
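
To illustrate how optical flow ground truth can follow directly from a known motion transform, here is a minimal pure-Python sketch. It does not use or reproduce the oflib API. It assumes a 2x3 affine matrix acting on pixel coordinates with the origin at the top-left pixel, and it defines the flow as the displacement of each source pixel; the function name `flow_from_affine` is hypothetical.

```python
def flow_from_affine(matrix, height, width):
    """Return flow[y][x] = (dx, dy) induced by a 2x3 affine matrix on pixel (x, y)."""
    (a, b, tx), (c, d, ty) = matrix
    flow = []
    for y in range(height):
        row = []
        for x in range(width):
            new_x = a * x + b * y + tx
            new_y = c * x + d * y + ty
            # Flow is where the pixel lands minus where it started.
            row.append((new_x - x, new_y - y))
        flow.append(row)
    return flow


# A pure translation gives the same flow vector at every pixel.
shift = flow_from_affine([[1, 0, 2], [0, 1, -1]], height=2, width=3)

# Scaling by 2 about the origin gives flow that grows with distance from it.
zoom = flow_from_affine([[2, 0, 0], [0, 2, 0]], height=2, width=4)
```

Because the transform used to composite a synthetic frame is known exactly, the resulting flow field is exact as well, which is what makes synthetic data usable where manual annotation is infeasible.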

    Dense light field coding: a survey

    Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems. Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.

    Single-image Tomography: 3D Volumes from 2D Cranial X-Rays

    As many different 3D volumes could produce the same 2D x-ray image, inverting this process is challenging. We show that recent deep learning-based convolutional neural networks can solve this task. As the main challenge in learning is the sheer amount of data created when extending the 2D image into a 3D volume, we suggest first learning a coarse, fixed-resolution volume, which is then fused in a second step with the input x-ray into a high-resolution volume. To train and validate our approach we introduce a new dataset that comprises close to half a million computer-simulated 2D x-ray images of 3D volumes scanned from 175 mammalian species. Applications of our approach include stereoscopic rendering of legacy x-ray images and re-rendering of x-rays with changes of illumination, view pose or geometry. Our evaluation includes comparison to previous tomography work, previous learning methods using our data, a user study and application to a set of real x-rays.

    Mixed marker-based/marker-less visual odometry system for mobile robots

    When moving in generic indoor environments, robotic platforms generally rely solely on information provided by onboard sensors to determine their position and orientation. However, the lack of absolute references often introduces severe drift in the computed estimates, making autonomous operation hard to accomplish. This paper proposes a solution to alleviate the impact of the above issues by combining two vision‐based pose estimation techniques working on relative and absolute coordinate systems, respectively. In particular, the unknown ground features in the images that are captured by the vertical camera of a mobile platform are processed by a vision‐based odometry algorithm, which is capable of estimating the relative frame‐to‐frame movements. Then, errors accumulated in the above step are corrected using artificial markers placed at known positions in the environment. The markers are framed from time to time, which allows the robot to keep the drift bounded and additionally provides it with the navigation commands needed for autonomous flight. Accuracy and robustness of the designed technique are demonstrated using an off‐the‐shelf quadrotor via extensive experimental test
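
The combination of relative odometry with occasional absolute marker fixes can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it treats the pose as a 2D position only and, as the simplest possible correction, overwrites the drifted estimate whenever a marker is seen. The function name and the data layout are hypothetical.

```python
def fuse_odometry_with_markers(start, steps, marker_fixes):
    """Integrate relative (dx, dy) steps; reset to an absolute fix when one exists.

    marker_fixes maps a step index to the absolute (x, y) observed at that step.
    """
    x, y = start
    track = []
    for index, (dx, dy) in enumerate(steps):
        # Dead reckoning: any error in each step accumulates over time.
        x, y = x + dx, y + dy
        if index in marker_fixes:
            # A marker at a known position gives an absolute pose, cancelling the drift.
            x, y = marker_fixes[index]
        track.append((x, y))
    return track


# The robot truly moves 1.0 per step along x, but odometry reports 1.1.
biased_steps = [(1.1, 0.0)] * 10
drifting = fuse_odometry_with_markers((0.0, 0.0), biased_steps, {})
corrected = fuse_odometry_with_markers((0.0, 0.0), biased_steps, {4: (5.0, 0.0)})
```

With a single marker fix after the fifth step, the final error in this example is halved (about 0.5 instead of 1.0), and more frequent marker sightings would bound it further. A practical system would more likely blend the two sources, for instance with a filter, rather than hard-reset the estimate.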

    Biomechanics

    Biomechanics is a vast discipline within the field of Biomedical Engineering. It explores the underlying mechanics of how biological and physiological systems move. It encompasses important clinical applications to address questions related to medicine using engineering mechanics principles. Biomechanics includes interdisciplinary concepts from engineers, physicians, therapists, biologists, physicists, and mathematicians. Through their collaborative efforts, biomechanics research is ever changing and expanding, explaining new mechanisms and principles for dynamic human systems. Biomechanics is used to describe how the human body moves, walks, and breathes, in addition to how it responds to injury and rehabilitation. Advanced biomechanical modeling methods, such as inverse dynamics, finite element analysis, and musculoskeletal modeling are used to simulate and investigate human situations in regard to movement and injury. Biomechanical technologies are progressing to answer contemporary medical questions. The future of biomechanics is dependent on interdisciplinary research efforts and the education of tomorrow’s scientists

    Learned optical flow for intra-operative tracking of the retinal fundus

    Purpose: Sustained delivery of regenerative retinal therapies by robotic systems requires intra-operative tracking of the retinal fundus. We propose a supervised deep convolutional neural network to densely predict semantic segmentation and optical flow of the retina as mutually supportive tasks, implicitly inpainting retinal flow information missing due to occlusion by surgical tools. / Methods: As manual annotation of optical flow is infeasible, we propose a flexible algorithm for generation of large synthetic training datasets on the basis of given intra-operative retinal images. We evaluate optical flow estimation by tracking a grid and sparsely annotated ground truth points on a benchmark of challenging real intra-operative clips obtained from an extensive internally acquired dataset encompassing representative vitreoretinal surgical cases. / Results: The U-Net-based network trained on the synthetic dataset is shown to generalise well to the benchmark of real surgical videos. When used to track retinal points of interest, our flow estimation outperforms variational baseline methods on clips containing tool motions which occlude the points of interest, as is routinely observed in intra-operatively recorded surgery videos. / Conclusions: The results indicate that complex synthetic training datasets can be used to specifically guide optical flow estimation. Our proposed algorithm therefore lays the foundation for a robust system which can assist with intra-operative tracking of moving surgical targets even when occluded