1,478 research outputs found

    Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery

    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite to the registration of multi-modal patient-specific data for enhancing the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D optical imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions.

    Keyframe-based monocular SLAM: design, survey, and future directions

    Extensive research in the field of monocular SLAM for the past fifteen years has yielded workable systems that found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were common at one time, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, the paper serves as a guideline for people seeking to design their own monocular SLAM according to specific environmental constraints. Second, it presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation, and critically assessing the specific strategies adopted in each proposed solution. Third, the paper provides insight into the direction of future research in this field, to address the major limitations still facing monocular SLAM, namely illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery.

    3D Motion Analysis via Energy Minimization

    This work deals with 3D motion analysis from stereo image sequences for driver assistance systems. It consists of two parts: the estimation of motion from the image data and the segmentation of moving objects in the input images. The content can be summarized with the technical term machine visual kinesthesia, the sensation or perception and cognition of motion. In the first three chapters, the importance of motion information is discussed for driver assistance systems, for machine vision in general, and for the estimation of ego motion. The next two chapters elaborate on motion perception, analyzing the apparent movement of pixels in image sequences for both a monocular and a binocular camera setup. Then, the obtained motion information is used to segment moving objects in the input video. Thus, one can clearly identify the thread from analyzing the input images to describing the input images by means of stationary and moving objects. Finally, I present possibilities for future applications based on the contents of this thesis. Previous work in each case is presented in the respective chapters. Although the overarching issue of motion estimation from image sequences is related to practice, there is nothing as practical as a good theory (Kurt Lewin). Several problems in computer vision are formulated as intricate energy minimization problems. In this thesis, motion analysis in image sequences is thoroughly investigated, showing that splitting an original complex problem into simplified sub-problems yields improved accuracy, increased robustness, and a clear and accessible approach to state-of-the-art motion estimation techniques. In Chapter 4, optical flow is considered. Optical flow is commonly estimated by minimizing a combined energy consisting of a data term and a smoothness term. These two parts are decoupled, yielding a novel and iterative approach to optical flow.
The derived Refinement Optical Flow framework is a clear and straightforward approach to computing the apparent image motion vector field. Furthermore, this currently results in the most accurate motion estimation techniques in the literature. Much as this is an engineering approach of fine-tuning precision to the last detail, it helps to get a better insight into the problem of motion estimation. This profoundly contributes to state-of-the-art research in motion analysis, in particular facilitating the use of motion estimation in a wide range of applications. In Chapter 5, scene flow is rethought. Scene flow stands for the three-dimensional motion vector field for every image pixel, computed from a stereo image sequence. Again, decoupling the commonly coupled estimation of three-dimensional position and three-dimensional motion yields an approach to scene flow estimation with more accurate results and a considerably lower computational load. It results in a dense scene flow field and enables additional applications based on the dense three-dimensional motion vector field, which are to be investigated in the future. One such application is the segmentation of moving objects in an image sequence. Detecting moving objects within the scene is one of the most important features to extract in image sequences from a dynamic environment. This is presented in Chapter 6. Scene flow and the segmentation of independently moving objects are only first steps towards machine visual kinesthesia. Throughout this work, I present possible future work to improve the estimation of optical flow and scene flow. Chapter 7 additionally presents an outlook on future research for driver assistance applications. But there is much more to the full understanding of the three-dimensional dynamic scene. This work is meant to inspire the reader to think outside the box and contribute to the vision of building perceiving machines.
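
The abstract above describes optical flow as the minimizer of a combined energy with a data term and a smoothness term. The following is a minimal sketch of such an energy, assuming a linearized brightness-constancy data term and a quadratic smoothness term in the style of classical variational methods. It is not the thesis's Refinement Optical Flow framework, which decouples the two terms; the names `diff`, `flow_energy`, and the weight `lam` are hypothetical and chosen only for illustration.

```python
def diff(f, y, x, dy, dx):
    """One-sided finite difference of grid f at (y, x) along step (dy, dx)."""
    h, w = len(f), len(f[0])
    if 0 <= y + dy < h and 0 <= x + dx < w:
        return f[y + dy][x + dx] - f[y][x]   # forward difference
    return f[y][x] - f[y - dy][x - dx]       # backward difference at the border


def flow_energy(I0, I1, u, v, lam=0.1):
    """Sum of the data term and lam times the smoothness term (assumed form)."""
    h, w = len(I0), len(I0[0])
    data = smooth = 0.0
    for y in range(h):
        for x in range(w):
            Ix = diff(I0, y, x, 0, 1)        # spatial image gradient in x
            Iy = diff(I0, y, x, 1, 0)        # spatial image gradient in y
            It = I1[y][x] - I0[y][x]         # temporal difference
            # Data term: linearized brightness constancy residual, squared.
            data += (Ix * u[y][x] + Iy * v[y][x] + It) ** 2
            # Smoothness term: squared gradients of both flow components.
            smooth += sum(diff(f, y, x, dy, dx) ** 2
                          for f in (u, v) for dy, dx in ((0, 1), (1, 0)))
    return data + lam * smooth


# Toy example: a horizontal intensity ramp that moves one pixel to the right.
N = 4
I0 = [[float(x) for x in range(N)] for _ in range(N)]
I1 = [[float(x) - 1.0 for x in range(N)] for _ in range(N)]
ones = [[1.0] * N for _ in range(N)]
zeros = [[0.0] * N for _ in range(N)]

energy_true_flow = flow_energy(I0, I1, ones, zeros)   # flow (1, 0) explains the shift
energy_zero_flow = flow_energy(I0, I1, zeros, zeros)  # zero flow leaves a residual
```

On this toy ramp the constant flow (1, 0) drives both terms to zero, while the zero flow pays a data penalty at every pixel. A minimizer would search over `u` and `v` for the lowest energy; the sketch only evaluates the energy.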

    SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos

    We present SLoMo: a first-of-its-kind framework for transferring skilled motions from casually captured "in the wild" video footage of humans and animals to legged robots. SLoMo works in three stages: 1) synthesize a physically plausible reconstructed key-point trajectory from monocular videos; 2) optimize a dynamically feasible reference trajectory for the robot offline that includes body and foot motion, as well as contact sequences that closely tracks the key points; 3) track the reference trajectory online using a general-purpose model-predictive controller on robot hardware. Traditional motion imitation for legged motor skills often requires expert animators, collaborative demonstrations, and/or expensive motion capture equipment, all of which limit scalability. Instead, SLoMo only relies on easy-to-obtain monocular video footage, readily available in online repositories such as YouTube. It converts videos into motion primitives that can be executed reliably by real-world robots. We demonstrate our approach by transferring the motions of cats, dogs, and humans to example robots including a quadruped (on hardware) and a humanoid (in simulation). To the best knowledge of the authors, this is the first attempt at a general-purpose motion transfer framework that imitates animal and human motions on legged robots directly from casual videos without artificial markers or labels. Comment: accepted at RA-L 2023, with ICRA 2024 optio

    Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes

    Monocular depth reconstruction of complex and dynamic scenes is a highly challenging problem. While for rigid scenes learning-based methods have been offering promising results even in unsupervised cases, there exists little to no literature addressing the same for dynamic and deformable scenes. In this work, we present an unsupervised monocular framework for dense depth estimation of dynamic scenes, which jointly reconstructs rigid and non-rigid parts without explicitly modelling the camera motion. Using dense correspondences, we derive a training objective that aims to opportunistically preserve pairwise distances between reconstructed 3D points. In this process, the dense depth map is learned implicitly using the as-rigid-as-possible hypothesis. Our method provides promising results, demonstrating its capability of reconstructing 3D from challenging videos of non-rigid scenes. Furthermore, the proposed method also provides unsupervised motion segmentation results as an auxiliary output.
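
The abstract above mentions a training objective that preserves pairwise distances between reconstructed 3D points across corresponding pixels. The sketch below illustrates that idea only in its simplest form, assuming a pinhole camera and a mean absolute difference over all point pairs. The helpers `backproject` and `pairwise_distance_loss`, the intrinsics `f`, `cx`, `cy`, and the exact loss form are assumptions for illustration, not the paper's actual objective.

```python
import math
from itertools import combinations


def backproject(pixels, depths, f=1.0, cx=0.0, cy=0.0):
    """Lift pixels (px, py) with depth z to 3D points under an assumed pinhole model."""
    return [((px - cx) * z / f, (py - cy) * z / f, z)
            for (px, py), z in zip(pixels, depths)]


def pairwise_distance_loss(points_a, points_b):
    """Mean absolute change in pairwise distance between corresponding point sets.

    points_a[i] and points_b[i] are assumed to be the same scene point
    reconstructed in two frames (for example via dense correspondences).
    """
    pairs = list(combinations(range(len(points_a)), 2))
    total = sum(abs(math.dist(points_a[i], points_a[j])
                    - math.dist(points_b[i], points_b[j]))
                for i, j in pairs)
    return total / len(pairs)


# Three points seen in frame A, then rigidly translated and, separately, stretched.
frame_a = [(0, 0, 1), (1, 0, 1), (0, 1, 1)]
translated = [(x + 2, y + 3, z + 4) for x, y, z in frame_a]
stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in frame_a]

loss_rigid = pairwise_distance_loss(frame_a, translated)    # distances unchanged
loss_nonrigid = pairwise_distance_loss(frame_a, stretched)  # distances doubled
```

A rigid motion leaves every pairwise distance unchanged, so the loss vanishes without the camera or object motion ever being estimated. That is the property an as-rigid-as-possible objective relies on; deformation shows up as a positive loss.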