    Multi-region segmentation of bladder cancer structures in MRI with progressive dilated convolutional networks

    Precise segmentation of bladder walls and tumor regions is an essential step towards non-invasive identification of tumor stage and grade, which is critical for treatment decisions and prognosis of patients with bladder cancer (BC). However, the automatic delineation of bladder walls and tumors in magnetic resonance images (MRI) is a challenging task, due to large bladder shape variations, strong intensity inhomogeneity in urine and very high variability across the population, particularly in tumor appearance. To tackle these issues, we propose to use a deep fully convolutional neural network. The proposed network includes dilated convolutions to increase the receptive field without incurring extra cost or degrading its performance. Furthermore, we introduce progressive dilations in each convolutional block, thereby enabling extensive receptive fields without the need for large dilation rates. The proposed network is evaluated on 3.0T T2-weighted MRI scans from 60 pathologically confirmed patients with BC. Experiments show that the proposed model achieves high accuracy, with mean Dice similarity coefficients of 0.98, 0.84 and 0.69 for the inner wall, outer wall and tumor region, respectively. These results represent very good agreement with reference contours and an increase in performance compared to existing methods. In addition, inference times are less than a second for a whole 3D volume, which is two to three orders of magnitude faster than related state-of-the-art methods for this application. We showed that a CNN can yield precise segmentation of bladder walls and tumors in bladder cancer patients on MRI. The whole segmentation process is fully automatic and yields results in very good agreement with the reference standard, demonstrating the viability of deep learning models for the automatic multi-region segmentation of bladder cancer MRI images.
    Comment: Published in the journal Medical Physics
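
    The receptive-field argument in this abstract can be checked with simple arithmetic. The following is a minimal sketch, assuming stride-1 convolutions with 3-pixel kernels. The dilation rates (1, 2, 4) and the function name receptive_field are illustrative assumptions, not the configuration reported by the authors.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of a stack of stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Three plain 3x3 layers versus three layers with progressively growing dilation.
# The rates 1, 2, 4 are an assumption chosen for illustration only.
plain = receptive_field(3, [1, 1, 1])
progressive = receptive_field(3, [1, 2, 4])
print("plain:", plain, "progressive:", progressive)
```

    Under these assumptions the progressive block covers roughly twice the extent of the plain block with the same number of weights, which is the kind of trade-off the abstract describes.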

    Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery

    To autonomously navigate and plan interactions in real-world environments, robots require the ability to robustly perceive and map complex, unstructured surrounding scenes. Besides building an internal representation of the observed scene geometry, the key insight toward a truly functional understanding of the environment is the usage of higher-level entities during mapping, such as individual object instances. We propose an approach to incrementally build volumetric object-centric maps during online scanning with a localized RGB-D camera. First, a per-frame segmentation scheme combines an unsupervised geometric approach with instance-aware semantic object predictions. This allows us to detect and segment elements both from the set of known classes and from other, previously unseen categories. Next, a data association step tracks the predicted instances across the different frames. Finally, a map integration strategy fuses information about their 3D shape, location, and, if available, semantic class into a global volume. Evaluation on a publicly available dataset shows that the proposed approach for building instance-level semantic maps is competitive with state-of-the-art methods, while additionally being able to discover objects of unseen categories. The system is further evaluated within a real-world robotic mapping setup, for which qualitative results highlight the online nature of the method.
    Comment: 8 pages, 4 figures. To be published in IEEE Robotics and Automation Letters (RA-L) and 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Accompanying video material can be found at http://youtu.be/Jvl42VJmYx

    Analysis and Modeling of 3D Indoor Scenes

    We live in a 3D world, performing activities and interacting with objects in indoor environments every day. Indoor scenes are the most familiar and essential environments in everyone's life. In the virtual world, 3D indoor scenes are also ubiquitous in 3D games and interior design. With the fast development of VR/AR devices and emerging applications, the demand for realistic 3D indoor scenes keeps growing rapidly. Currently, designing detailed 3D indoor scenes requires proficient 3D design and modeling skills and is often time-consuming. For novice users, creating realistic and complex 3D indoor scenes is even more difficult. Many efforts have been made in different research communities, e.g. computer graphics, vision and robotics, to capture, analyze and generate 3D indoor data. This report mainly focuses on recent research progress in graphics on the geometric, structural and semantic analysis of 3D indoor data and on different modeling techniques for creating plausible and realistic indoor scenes. We first review works on understanding and semantic modeling of scenes from captured 3D data of the real world. Then, we focus on virtual scenes composed of 3D CAD models and study methods for 3D scene analysis and processing. After that, we survey various modeling paradigms for creating 3D indoor scenes and investigate human-centric scene analysis and modeling, which bridge indoor scene studies in graphics, vision and robotics. Finally, we discuss open problems in indoor scene processing that may be of interest to graphics and all related communities.

    Radiological images and machine learning: trends, perspectives, and prospects

    The application of machine learning to radiological images is an increasingly active research area that is expected to grow in the next five to ten years. Recent advances in machine learning have the potential to recognize and classify complex patterns from different radiological imaging modalities such as x-rays, computed tomography, magnetic resonance imaging and positron emission tomography imaging. In many applications, machine learning based systems have shown comparable performance to human decision-making. The applications of machine learning are the key ingredients of future clinical decision making and monitoring systems. This review covers the fundamental concepts behind various machine learning techniques and their applications in several radiological imaging areas, such as medical image segmentation, brain function studies and neurological disease diagnosis, as well as computer-aided systems, image registration, and content-based image retrieval systems. We also briefly discuss current challenges and future directions regarding the application of machine learning in radiological imaging. By giving insight on how to take advantage of machine learning powered applications, we expect that clinicians can prevent and diagnose diseases more accurately and efficiently.
    Comment: 13 figures

    ConvPoint: Continuous Convolutions for Point Cloud Processing

    Point clouds are unstructured and unordered data, as opposed to images. Thus, most machine learning approaches developed for images cannot be directly transferred to point clouds. In this paper, we propose a generalization of discrete convolutional neural networks (CNNs) to point clouds by replacing discrete kernels with continuous ones. This formulation is simple, allows arbitrary point cloud sizes and can easily be used for designing neural networks similarly to 2D CNNs. We present experimental results with various architectures, highlighting the flexibility of the proposed approach. We obtain competitive results compared to the state of the art on shape classification, part segmentation and semantic segmentation for large-scale point clouds.
    Comment: 12 pages

    Sim-to-Real Transfer of Accurate Grasping with Eye-In-Hand Observations and Continuous Control

    In the context of deep learning for robotics, we show an effective method of training a real robot to grasp a tiny sphere (1.37 cm in diameter), with an original combination of system design choices. We decompose the end-to-end system into a vision module and a closed-loop controller module. The two modules use target object segmentation as their common interface. The vision module extracts information from the robot end-effector camera, in the form of a binary segmentation mask of the target. We train it to achieve effective domain transfer by composing real background images with simulated images of the target. The controller module takes as input the binary segmentation mask, and thus is agnostic to visual discrepancies between simulated and real environments. We train our closed-loop controller in simulation using imitation learning and show it is robust with respect to discrepancies between the dynamic models of the simulated and real robots: when combined with eye-in-hand observations, we achieve a 90% success rate in grasping a tiny sphere with a real robot. The controller can generalize to unseen scenarios where the target is moving and even learns to recover from failures.
    Comment: Neural Information Processing Systems (NIPS) 2017 Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning
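
    The idea of composing real background images with simulated images of the target can be illustrated with a per-pixel blend. This is only a rough sketch under stated assumptions: images are small grayscale grids stored as nested lists, the mask is hard (0 or 1), and the function name composite is hypothetical. The abstract does not specify how the authors implement this step.

```python
def composite(sim_image, sim_mask, real_background):
    """Paste simulated target pixels onto a real background.

    Where the mask is 1 the simulated pixel is kept, elsewhere the real
    background pixel is used. The mask doubles as the segmentation label.
    """
    return [
        [s if m else b for s, m, b in zip(sim_row, mask_row, bg_row)]
        for sim_row, mask_row, bg_row in zip(sim_image, sim_mask, real_background)
    ]


# Toy 2x2 example: one target pixel (value 9) over a background with values 1..4.
sim = [[9, 9], [9, 9]]
mask = [[1, 0], [0, 0]]
background = [[1, 2], [3, 4]]
training_image = composite(sim, mask, background)
print(training_image)
```

    Each composited image comes with an exact label (the mask itself), which is one plausible reason a synthetic pipeline of this kind is attractive for training a segmentation module.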

    Learning to Sample

    Processing large point clouds is a challenging task. Therefore, the data is often sampled to a size that can be processed more easily. The question is how to sample the data. A popular sampling technique is Farthest Point Sampling (FPS). However, FPS is agnostic to the downstream application (classification, retrieval, etc.). The underlying assumption seems to be that minimizing the farthest point distance, as done by FPS, is a good proxy for other objective functions. We show that it is better to learn how to sample. To do that, we propose a deep network to simplify 3D point clouds. The network, termed S-NET, takes a point cloud and produces a smaller point cloud that is optimized for a particular task. The simplified point cloud is not guaranteed to be a subset of the original point cloud. Therefore, we match it to a subset of the original points in a post-processing step. We contrast our approach with FPS by experimenting on two standard data sets and show significantly better results for a variety of applications. Our code is publicly available at: https://github.com/orendv/learning_to_sample
    Comment: CVPR 201
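
    For readers unfamiliar with the FPS baseline mentioned in this abstract, the following is a minimal sketch of the standard greedy algorithm in plain Python. It is not the authors' S-NET code, and the function name, the start parameter and the toy points are illustrative assumptions.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from all points chosen so far.

    Returns the indices of the k selected points, beginning with `start`.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # Distance from every point to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


cloud = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 0)]
print(farthest_point_sampling(cloud, 3))
```

    The selection depends only on geometry, which is the task-agnostic behaviour the abstract contrasts with a learned sampler.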

    Automatic Cardiac Disease Assessment on cine-MRI via Time-Series Segmentation and Domain Specific Features

    Cardiac magnetic resonance imaging improves the diagnosis of cardiovascular diseases by providing images at high spatiotemporal resolution. Manual evaluation of these time-series, however, is expensive and prone to biased and non-reproducible outcomes. In this paper, we present a method that addresses these limitations by integrating segmentation and disease classification into a fully automatic processing pipeline. We use an ensemble of U-Net-inspired architectures for segmentation of cardiac structures such as the left and right ventricular cavity (LVC, RVC) and the left ventricular myocardium (LVM) on each time instance of the cardiac cycle. For the classification task, information is extracted from the segmented time-series in the form of comprehensive features handcrafted to reflect diagnostic clinical procedures. Based on these features we train an ensemble of heavily regularized multilayer perceptrons (MLP) and a random forest classifier to predict the pathologic target class. We evaluated our method on the ACDC dataset (4 pathology groups, 1 healthy group) and achieved Dice scores of 0.945 (LVC), 0.908 (RVC) and 0.905 (LVM) in a cross-validation over the training set (100 cases) and 0.950 (LVC), 0.923 (RVC) and 0.911 (LVM) on the test set (50 cases). We report a classification accuracy of 94% on a training set cross-validation and 92% on the test set. Our results underpin the potential of machine learning methods for accurate, fast and reproducible segmentation and computer-assisted diagnosis (CAD).
    Comment: To appear in the STACOM 2017 proceedings
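
    The Dice scores reported in this abstract measure overlap between a predicted mask and a reference mask. Below is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), assuming binary masks flattened to 0/1 sequences. The function name and the convention of returning 1.0 for two empty masks are assumptions, not details taken from the paper.

```python
def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two flat binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Two empty masks agree perfectly; this convention is an assumption.
    return 1.0 if total == 0 else 2.0 * intersection / total


predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]
print(round(dice_coefficient(predicted, reference), 3))
```

    A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.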

    IsMo-GAN: Adversarial Learning for Monocular Non-Rigid 3D Reconstruction

    The majority of the existing methods for non-rigid 3D surface regression from monocular 2D images require an object template or point tracks over multiple frames as an input, and are still far from real-time processing rates. In this work, we present the Isometry-Aware Monocular Generative Adversarial Network (IsMo-GAN), an approach for direct 3D reconstruction from a single image, trained for the deformation model in an adversarial manner on a light-weight synthetic dataset. IsMo-GAN reconstructs surfaces from real images under varying illumination, camera poses, textures and shading at over 250 Hz. In multiple experiments, it consistently outperforms several approaches in reconstruction accuracy, runtime, generalisation to unknown surfaces and robustness to occlusions. In comparison to the state of the art, we reduce the reconstruction error by 10-30%, including the textureless case, and our surfaces show fewer artefacts qualitatively.
    Comment: 13 pages, 11 figures, 4 tables, 6 sections, 73 references

    MeshCNN: A Network with an Edge

    Polygonal meshes provide an efficient representation for 3D shapes. They explicitly capture both shape surface and topology, and leverage non-uniformity to represent large flat regions as well as sharp, intricate features. This non-uniformity and irregularity, however, inhibit mesh analysis efforts using neural networks that combine convolution and pooling operations. In this paper, we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized convolution and pooling layers that operate on the mesh edges, by leveraging their intrinsic geodesic connections. Convolutions are applied on edges and the four edges of their incident triangles, and pooling is applied via an edge collapse operation that retains surface topology, thereby generating new mesh connectivity for the subsequent convolutions. MeshCNN learns which edges to collapse, thus forming a task-driven process where the network exposes and expands the important features while discarding the redundant ones. We demonstrate the effectiveness of our task-driven pooling on various learning tasks applied to 3D meshes.
    Comment: For a two-minute explanation video see https://bit.ly/meshcnnvide