517 research outputs found

    ToolNet: Holistically-Nested Real-Time Segmentation of Robotic Surgical Tools

    Real-time tool segmentation from endoscopic videos is an essential part of many computer-assisted robotic surgical systems and of critical importance in robotic surgical data science. We propose two novel deep learning architectures for automatic segmentation of non-rigid surgical instruments. Both methods take advantage of automated deep-learning-based multi-scale feature extraction while trying to maintain an accurate segmentation quality at all resolutions. The two proposed methods encode the multi-scale constraint inside the network architecture. The first proposed architecture enforces it by cascaded aggregation of predictions, and the second does so by means of a holistically-nested architecture in which the loss at each scale is taken into account in the optimization process. As the proposed methods are for real-time semantic labeling, both have a reduced number of parameters. We propose the use of parametric rectified linear units for semantic labeling in these small architectures to increase the regularization ability of the design and maintain the segmentation accuracy without overfitting the training sets. We compare the proposed architectures against state-of-the-art fully convolutional networks. We validate our methods using existing benchmark datasets, including ex vivo cases with phantom tissue and different robotic surgical instruments present in the scene. Our results show a statistically significant improved Dice Similarity Coefficient over previous instrument segmentation methods. We analyze our design choices and discuss the key drivers for improving accuracy. Comment: Paper accepted at IROS 201
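
    For reference, the Dice Similarity Coefficient named in the abstract compares a predicted mask A with a ground-truth mask B as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch, assuming binary masks stored as flat lists of 0/1 values. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Count pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Sum of the foreground pixel counts of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks (hypothetical data): 2 overlapping pixels, 3 foreground pixels each.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred, target))
```

    A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixel.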

    Spartan Daily, October 27, 2016

    Volume 147, Issue 25 https://scholarworks.sjsu.edu/spartan_daily_2016/1065/thumbnail.jp

    Explaining Multimodal Data Fusion: Occlusion Analysis for Wilderness Mapping

    Jointly harnessing the complementary features of multi-modal input data in a common latent space has long been found to be beneficial. However, the influence of each modality on the model's decision remains a puzzle. This study proposes a deep learning framework for the modality-level interpretation of multimodal earth observation data in an end-to-end fashion. Leveraging an explainable machine learning method, namely Occlusion Sensitivity, the proposed framework investigates the influence of modalities under an early-fusion scenario in which the modalities are fused before the learning process. We show that the task of wilderness mapping largely benefits from auxiliary data such as land cover and night time light data. Comment: 5 pages, 2 figures
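
    The modality-level Occlusion Sensitivity idea can be sketched roughly as follows: under early fusion the modalities are concatenated into a single input, and the influence of one modality is estimated by replacing its values with a baseline and measuring how much the model output changes. This is a minimal sketch under stated assumptions, not the authors' implementation. The fixed linear `toy_model`, its weights, the modality names, and the zero baseline are all hypothetical.

```python
def occlusion_sensitivity(model, modalities, baseline=0.0):
    """Return, per modality, the drop in model output when it is occluded."""
    def fuse(mods):
        # Early fusion: concatenate all modality features into one vector,
        # in a fixed (sorted-by-name) order.
        fused = []
        for name in sorted(mods):
            fused.extend(mods[name])
        return fused

    reference = model(fuse(modalities))
    scores = {}
    for name in modalities:
        # Occlude one modality by overwriting it with the baseline value.
        occluded = dict(modalities)
        occluded[name] = [baseline] * len(modalities[name])
        scores[name] = reference - model(fuse(occluded))
    return scores


# Hypothetical stand-in for a trained network: a fixed linear scorer.
WEIGHTS = [0.5, 0.5, 2.0, 0.1]


def toy_model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))


# Hypothetical inputs; the name prefixes only fix the concatenation order.
inputs = {
    "a_optical": [1.0, 1.0],
    "b_land_cover": [1.0],
    "c_night_light": [1.0],
}
scores = occlusion_sensitivity(toy_model, inputs)
print(scores)
```

    A larger score means the output depends more strongly on that modality. With a real network, the baseline value and the output being tracked (for example, a class probability) are design choices that affect the result.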

    MouldingNet: Deep-Learning for 3D Object Reconstruction

    With the rise of deep neural networks, a number of approaches for learning over 3D data have gained popularity. In this paper, we take advantage of one of these approaches, bilateral convolutional layers, to propose a novel end-to-end deep auto-encoder architecture that efficiently encodes and reconstructs 3D point clouds. Bilateral convolutional layers project the input point cloud onto an even tessellation of a hyperplane in the (d+1)-dimensional space, known as the permutohedral lattice, and perform convolutions over this representation. In contrast to existing point-cloud-based learning approaches, this allows us to learn over the underlying geometry of the object to create a robust global descriptor. We demonstrate its accuracy by evaluating across the ShapeNet and ModelNet datasets to illustrate two main scenarios: known and unknown object reconstruction. These experiments show that our network generalises well from seen classes to unseen classes.
