ToolNet: Holistically-Nested Real-Time Segmentation of Robotic Surgical Tools
Real-time tool segmentation from endoscopic videos is an essential part of
many computer-assisted robotic surgical systems and of critical importance in
robotic surgical data science. We propose two novel deep learning architectures
for automatic segmentation of non-rigid surgical instruments. Both methods take
advantage of automated deep-learning-based multi-scale feature extraction while
trying to maintain an accurate segmentation quality at all resolutions. The two
proposed methods encode the multi-scale constraint inside the network
architecture. The first proposed architecture enforces it by cascaded
aggregation of predictions and the second proposed network does it by means of
a holistically-nested architecture where the loss at each scale is taken into
account for the optimization process. As the proposed methods are for real-time
semantic labeling, both present a reduced number of parameters. We propose the
use of parametric rectified linear units for semantic labeling in these small
architectures to increase the regularization ability of the design and maintain
the segmentation accuracy without overfitting the training sets. We compare the
proposed architectures against state-of-the-art fully convolutional networks.
We validate our methods using existing benchmark datasets, including ex vivo
cases with phantom tissue and different robotic surgical instruments present in
the scene. Our results show a statistically significant improved Dice
Similarity Coefficient over previous instrument segmentation methods. We
analyze our design choices and discuss the key drivers for improving accuracy.
Comment: Paper accepted at IROS 201
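The holistically-nested design above attaches a loss at every scale of the network. A minimal NumPy sketch of such a deeply supervised multi-scale loss, together with the parametric ReLU the abstract advocates for small architectures, is shown below; the per-scale weights and the nearest-neighbour downsampling of the target are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def prelu(x, a=0.25):
    # Parametric ReLU: identity for positive inputs, learned slope a otherwise.
    return np.where(x > 0, x, a * x)

def bce(pred, target, eps=1e-7):
    # Binary cross-entropy averaged over all pixels.
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def multiscale_loss(preds, target, weights=None):
    """Deep-supervision loss: sum of per-scale segmentation losses.

    preds  : list of predicted masks, full resolution first, each a power-of-2
             downsampling of the target's resolution
    target : full-resolution binary ground-truth mask
    """
    weights = weights or [1.0] * len(preds)
    total = 0.0
    for w, p in zip(weights, preds):
        f = target.shape[0] // p.shape[0]
        t = target[::f, ::f]  # naive nearest-neighbour downsample of the target
        total += w * bce(p, t)
    return total
```

In a training loop each scale's side output would feed this loss jointly, so coarse resolutions are optimized alongside the full-resolution prediction.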
Spartan Daily, October 27, 2016
Volume 147, Issue 25
Explaining Multimodal Data Fusion: Occlusion Analysis for Wilderness Mapping
Jointly harnessing complementary features of multi-modal input data in a
common latent space has long been known to be beneficial. However, the
influence of each modality on the model's decision remains a puzzle. This study
proposes a deep learning framework for modality-level interpretation of
multimodal earth observation data in an end-to-end fashion. Leveraging an
explainable machine learning method, namely Occlusion Sensitivity, the proposed
framework investigates the influence of modalities under an early-fusion
scenario in which the modalities are fused before the learning process. We show
that the task of wilderness mapping benefits greatly from auxiliary data such
as land cover and nighttime light data.
Comment: 5 pages, 2 figures
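Modality-level Occlusion Sensitivity can be sketched as follows: each modality's channel block in the early-fused input is blanked out in turn, and the drop in the model's output is recorded as that modality's importance. A minimal sketch, assuming a channels-first fused array and a hypothetical scalar-output model; the channel slicing and zero baseline are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def modality_occlusion(model, fused, modality_slices, baseline=0.0):
    """Modality-level occlusion sensitivity for an early-fusion model.

    model           : callable mapping the fused array to a scalar score
    fused           : channels-first array (C, H, W) stacking all modalities
    modality_slices : {modality name: slice over the channel axis}
    Returns, per modality, the drop in model output when it is occluded.
    """
    reference = model(fused)
    scores = {}
    for name, ch in modality_slices.items():
        occluded = fused.copy()
        occluded[ch] = baseline  # blank out this modality's channels
        scores[name] = reference - model(occluded)
    return scores
```

A large drop indicates the model leans heavily on that modality, which is how auxiliary inputs such as nighttime lights can be shown to matter for wilderness mapping.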
MouldingNet: Deep-Learning for 3D Object Reconstruction
With the rise of deep neural networks, a number of approaches for learning over 3D data have gained popularity. In this paper, we take advantage of one of these approaches, bilateral convolutional layers, to propose a novel end-to-end deep auto-encoder architecture to efficiently encode and reconstruct 3D point clouds. Bilateral convolutional layers project the input point cloud onto an even tessellation of a hyperplane in the (d+1)-dimensional space known as the permutohedral lattice and perform convolutions over this representation. In contrast to existing point-cloud-based learning approaches, this allows us to learn over the underlying geometry of the object to create a robust global descriptor. We demonstrate its accuracy by evaluating across the ShapeNet and ModelNet datasets, in order to illustrate two main scenarios: known and unknown object reconstruction. These experiments show that our network generalises well from seen classes to unseen classes.
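The permutohedral lattice lives in the zero-sum hyperplane of (d+1)-dimensional space. A minimal NumPy sketch of lifting d-dimensional features into that hyperplane is shown below; this is a simplified embedding for illustration, not the canonical permutohedral basis used by bilateral convolutional layers.

```python
import numpy as np

def lift_to_hyperplane(points):
    """Embed d-dimensional points into the zero-sum hyperplane
    H_d = {y in R^(d+1) : sum(y) = 0}, the ambient space of the
    permutohedral lattice.

    Simplified embedding: append a zero coordinate, then subtract each
    row's mean so every lifted point sums to zero.
    """
    n, d = points.shape
    lifted = np.concatenate([points, np.zeros((n, 1))], axis=1)
    lifted -= lifted.mean(axis=1, keepdims=True)
    return lifted
```

On this hyperplane the lattice tessellates space evenly, which is what lets the subsequent convolutions operate over a regular neighbourhood structure regardless of the input point cloud's irregular sampling.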