1,301 research outputs found
Single-Shot Clothing Category Recognition in Free-Configurations with Application to Autonomous Clothes Sorting
This paper proposes a single-shot approach for recognising clothing
categories from 2.5D features. We propose two visual features, BSP (B-Spline
Patch) and TSD (Topology Spatial Distances) for this task. The local BSP
features are encoded by LLC (Locality-constrained Linear Coding) and fused with
three different global features. Our visual feature is robust to deformable
shapes and our approach is able to recognise the category of unknown clothing
in unconstrained and random configurations. We integrated the category
recognition pipeline with a stereo vision system, clothing instance detection,
and dual-arm manipulators to achieve an autonomous sorting system. To verify
the performance of our proposed method, we build a high-resolution RGBD
clothing dataset of 50 clothing items of 5 categories sampled in random
configurations (a total of 2,100 clothing samples). Experimental results show
that our approach is able to reach 83.2\% accuracy while classifying clothing
items which were previously unseen during training. This advances beyond the
previous state-of-the-art by 36.2\%. Finally, we evaluate the proposed approach
in an autonomous robot sorting system, in which the robot recognises a clothing
item from an unconstrained pile, grasps it, and sorts it into a box according
to its category. Our proposed sorting system achieves reasonable sorting
success rates with single-shot perception.Comment: 9 pages, accepted by IROS201
Automatic 3D bi-ventricular segmentation of cardiac images by a shape-refined multi-task deep learning approach
Deep learning approaches have achieved state-of-the-art performance in
cardiac magnetic resonance (CMR) image segmentation. However, most approaches
have focused on learning image intensity features for segmentation, whereas the
incorporation of anatomical shape priors has received less attention. In this
paper, we combine a multi-task deep learning approach with atlas propagation to
develop a shape-constrained bi-ventricular segmentation pipeline for short-axis
CMR volumetric images. The pipeline first employs a fully convolutional network
(FCN) that learns segmentation and landmark localisation tasks simultaneously.
The architecture of the proposed FCN uses a 2.5D representation, thus combining
the computational advantage of 2D FCNs networks and the capability of
addressing 3D spatial consistency without compromising segmentation accuracy.
Moreover, the refinement step is designed to explicitly enforce a shape
constraint and improve segmentation quality. This step is effective for
overcoming image artefacts (e.g. due to different breath-hold positions and
large slice thickness), which preclude the creation of anatomically meaningful
3D cardiac shapes. The proposed pipeline is fully automated, due to network's
ability to infer landmarks, which are then used downstream in the pipeline to
initialise atlas propagation. We validate the pipeline on 1831 healthy subjects
and 649 subjects with pulmonary hypertension. Extensive numerical experiments
on the two datasets demonstrate that our proposed method is robust and capable
of producing accurate, high-resolution and anatomically smooth bi-ventricular
3D models, despite the artefacts in input CMR volumes
Dense 3D Object Reconstruction from a Single Depth View
In this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs
the complete 3D structure of a given object from a single arbitrary depth view
using generative adversarial networks. Unlike existing work which typically
requires multiple views of the same object or class labels to recover the full
3D geometry, the proposed 3D-RecGAN++ only takes the voxel grid representation
of a depth view of the object as input, and is able to generate the complete 3D
occupancy grid with a high resolution of 256^3 by recovering the
occluded/missing regions. The key idea is to combine the generative
capabilities of autoencoders and the conditional Generative Adversarial
Networks (GAN) framework, to infer accurate and fine-grained 3D structures of
objects in high-dimensional voxel space. Extensive experiments on large
synthetic datasets and real-world Kinect datasets show that the proposed
3D-RecGAN++ significantly outperforms the state of the art in single view 3D
object reconstruction, and is able to reconstruct unseen types of objects.Comment: TPAMI 2018. Code and data are available at:
https://github.com/Yang7879/3D-RecGAN-extended. This article extends from
arXiv:1708.0796
Designing Volumetric Truss Structures
We present the first algorithm for designing volumetric Michell Trusses. Our
method uses a parametrization approach to generate trusses made of structural
elements aligned with the primary direction of an object's stress field. Such
trusses exhibit high strength-to-weight ratios. We demonstrate the structural
robustness of our designs via a posteriori physical simulation. We believe our
algorithm serves as an important complement to existing structural optimization
tools and as a novel standalone design tool itself
DeepNav: Joint View Learning for Direct Optimal Path Perception in Cochlear Surgical Platform Navigation
Although much research has been conducted in the field of automated cochlear implant navigation, the problem remains challenging. Deep learning techniques have recently achieved impressive results in a variety of computer vision problems, raising expectations that they might be applied in other domains, such as identifying the optimal navigation zone (OPZ) in the cochlear. In this paper, a 2.5D joint-view convolutional neural network (2.5D CNN) is proposed and evaluated for the identification of the OPZ in the cochlear segments. The proposed network consists of 2 complementary sagittal and bird-view (or top view) networks for the 3D OPZ recognition, each utilizing a ResNet-8 architecture consisting of 5 convolutional layers with rectified nonlinearity unit (ReLU) activations, followed by average pooling with size equal to the size of the final feature maps. The last fully connected layer of each network has 4 indicators, equivalent to the classes considered: the distance to the adjacent left and right walls, collision probability and heading angle. To demonstrate this, the 2.5D CNN was trained using a parametric data generation model, and then evaluated using anatomically constructed cochlea models from the micro-CT images of different cases. Prediction of the indicators demonstrates the effectiveness of the 2.5D CNN, for example the heading angle has less than 1° error with computation delays of less that <1 milliseconds
A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.Comment: Revised survey includes expanded discussion section and reworked
introductory section on common deep architectures. Added missed papers from
before Feb 1st 201
- …