20,851 research outputs found

    LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes

    Full text link
    Deep neural network (DNN) architectures have been shown to outperform traditional pipelines for object segmentation and pose estimation using RGBD data, but the performance of these DNN pipelines is directly tied to how representative the training data is of the true data. Hence a key requirement for employing these methods in practice is to have a large set of labeled data for your specific robotic manipulation task, a requirement that is not generally satisfied by existing datasets. In this paper we develop a pipeline to rapidly generate high quality RGBD data with pixelwise labels and object poses. We use an RGBD camera to collect video of a scene from multiple viewpoints and leverage existing reconstruction techniques to produce a 3D dense reconstruction. We label the 3D reconstruction using a human assisted ICP-fitting of object meshes. By reprojecting the results of labeling the 3D scene we can produce labels for each RGBD image of the scene. This pipeline enabled us to collect over 1,000,000 labeled object instances in just a few days. We use this dataset to answer questions related to how much training data is required, and of what quality the data must be, to achieve high performance from a DNN architecture

    Lifting GIS Maps into Strong Geometric Context for Scene Understanding

    Full text link
    Contextual information can have a substantial impact on the performance of visual tasks such as semantic segmentation, object detection, and geometric estimation. Data stored in Geographic Information Systems (GIS) offers a rich source of contextual information that has been largely untapped by computer vision. We propose to leverage such information for scene understanding by combining GIS resources with large sets of unorganized photographs using Structure from Motion (SfM) techniques. We present a pipeline to quickly generate strong 3D geometric priors from 2D GIS data using SfM models aligned with minimal user input. Given an image resectioned against this model, we generate robust predictions of depth, surface normals, and semantic labels. We show that the precision of the predicted geometry is substantially more accurate other single-image depth estimation methods. We then demonstrate the utility of these contextual constraints for re-scoring pedestrian detections, and use these GIS contextual features alongside object detection score maps to improve a CRF-based semantic segmentation framework, boosting accuracy over baseline models

    Multi-stage Suture Detection for Robot Assisted Anastomosis based on Deep Learning

    Full text link
    In robotic surgery, task automation and learning from demonstration combined with human supervision is an emerging trend for many new surgical robot platforms. One such task is automated anastomosis, which requires bimanual needle handling and suture detection. Due to the complexity of the surgical environment and varying patient anatomies, reliable suture detection is difficult, which is further complicated by occlusion and thread topologies. In this paper, we propose a multi-stage framework for suture thread detection based on deep learning. Fully convolutional neural networks are used to obtain the initial detection and the overlapping status of suture thread, which are later fused with the original image to learn a gradient road map of the thread. Based on the gradient road map, multiple segments of the thread are extracted and linked to form the whole thread using a curvilinear structure detector. Experiments on two different types of sutures demonstrate the accuracy of the proposed framework.Comment: Submitted to ICRA 201

    Towards multiple 3D bone surface identification and reconstruction using few 2D X-ray images for intraoperative applications

    Get PDF
    This article discusses a possible method to use a small number, e.g. 5, of conventional 2D X-ray images to reconstruct multiple 3D bone surfaces intraoperatively. Each bone’s edge contours in X-ray images are automatically identified. Sparse 3D landmark points of each bone are automatically reconstructed by pairing the 2D X-ray images. The reconstructed landmark point distribution on a surface is approximately optimal covering main characteristics of the surface. A statistical shape model, dense point distribution model (DPDM), is then used to fit the reconstructed optimal landmarks vertices to reconstruct a full surface of each bone separately. The reconstructed surfaces can then be visualised and manipulated by surgeons or used by surgical robotic systems

    RGBD Datasets: Past, Present and Future

    Full text link
    Since the launch of the Microsoft Kinect, scores of RGBD datasets have been released. These have propelled advances in areas from reconstruction to gesture recognition. In this paper we explore the field, reviewing datasets across eight categories: semantics, object pose estimation, camera tracking, scene reconstruction, object tracking, human actions, faces and identification. By extracting relevant information in each category we help researchers to find appropriate data for their needs, and we consider which datasets have succeeded in driving computer vision forward and why. Finally, we examine the future of RGBD datasets. We identify key areas which are currently underexplored, and suggest that future directions may include synthetic data and dense reconstructions of static and dynamic scenes.Comment: 8 pages excluding references (CVPR style

    NiftyNet: a deep-learning platform for medical imaging

    Get PDF
    Medical image analysis and computer-assisted intervention problems are increasingly being addressed with deep-learning-based solutions. Established deep-learning platforms are flexible but do not provide specific functionality for medical image analysis and adapting them for this application requires substantial implementation effort. Thus, there has been substantial duplication of effort and incompatible infrastructure developed across many research groups. This work presents the open-source NiftyNet platform for deep learning in medical imaging. The ambition of NiftyNet is to accelerate and simplify the development of these solutions, and to provide a common mechanism for disseminating research outputs for the community to use, adapt and build upon. NiftyNet provides a modular deep-learning pipeline for a range of medical imaging applications including segmentation, regression, image generation and representation learning applications. Components of the NiftyNet pipeline including data loading, data augmentation, network architectures, loss functions and evaluation metrics are tailored to, and take advantage of, the idiosyncracies of medical image analysis and computer-assisted intervention. NiftyNet is built on TensorFlow and supports TensorBoard visualization of 2D and 3D images and computational graphs by default. We present 3 illustrative medical image analysis applications built using NiftyNet: (1) segmentation of multiple abdominal organs from computed tomography; (2) image regression to predict computed tomography attenuation maps from brain magnetic resonance images; and (3) generation of simulated ultrasound images for specified anatomical poses. NiftyNet enables researchers to rapidly develop and distribute deep learning solutions for segmentation, regression, image generation and representation learning applications, or extend the platform to new applications.Comment: Wenqi Li and Eli Gibson contributed equally to this work. M. Jorge Cardoso and Tom Vercauteren contributed equally to this work. 26 pages, 6 figures; Update includes additional applications, updated author list and formatting for journal submissio
    corecore