Single-Shot Clothing Category Recognition in Free-Configurations with Application to Autonomous Clothes Sorting
This paper proposes a single-shot approach for recognising clothing
categories from 2.5D features. We propose two visual features, BSP (B-Spline
Patch) and TSD (Topology Spatial Distances) for this task. The local BSP
features are encoded by LLC (Locality-constrained Linear Coding) and fused with
three different global features. These visual features are robust to
deformable shapes, and our approach can recognise the category of unknown
clothing in unconstrained and random configurations. We integrated the category
recognition pipeline with a stereo vision system, clothing instance detection,
and dual-arm manipulators to achieve an autonomous sorting system. To verify
the performance of the proposed method, we built a high-resolution RGBD
clothing dataset of 50 clothing items across 5 categories, sampled in random
configurations (2,100 clothing samples in total). Experimental results show
that our approach reaches 83.2% accuracy when classifying clothing items that
were unseen during training, advancing beyond the previous state of the art by
36.2%. Finally, we evaluate the proposed approach
in an autonomous robot sorting system, in which the robot recognises a clothing
item from an unconstrained pile, grasps it, and sorts it into a box according
to its category. Our proposed sorting system achieves reasonable sorting
success rates with single-shot perception.
Comment: 9 pages, accepted by IROS201
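The LLC encoding step mentioned above follows the standard approximated Locality-constrained Linear Coding of Wang et al. (CVPR 2010). The sketch below is not the authors' pipeline, only a minimal illustration of that coding step; the codebook and descriptor here are synthetic stand-ins for the learned codebook and BSP descriptors.

```python
import numpy as np

def llc_encode(x, codebook, k=5, beta=1e-4):
    """Approximated Locality-constrained Linear Coding (LLC).

    Encodes one local descriptor x of shape (D,) against a codebook
    of shape (M, D) using its k nearest bases, as in Wang et al.
    (CVPR 2010). Illustrative sketch only.
    """
    # Select the k nearest codebook entries to the descriptor.
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]
    B = codebook[idx]                      # (k, D) local bases

    # Solve the constrained least squares on the local bases:
    # minimise ||x - c^T B||^2  subject to  sum(c) = 1.
    C = (B - x) @ (B - x).T                # (k, k) data covariance
    C += beta * np.trace(C) * np.eye(k)    # regularise for stability
    c = np.linalg.solve(C, np.ones(k))
    c /= c.sum()                           # enforce the sum-to-one constraint

    # Scatter the k coefficients back into a sparse M-dimensional code.
    code = np.zeros(codebook.shape[0])
    code[idx] = c
    return code

# Toy usage: encode a random 32-D descriptor against a 256-atom codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 32))
descriptor = rng.normal(size=32)
print(llc_encode(descriptor, codebook).nonzero()[0])  # k active atoms
```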
Dense 3D Object Reconstruction from a Single Depth View
In this paper, we propose a novel approach, 3D-RecGAN++, which reconstructs
the complete 3D structure of a given object from a single arbitrary depth view
using generative adversarial networks. Unlike existing work which typically
requires multiple views of the same object or class labels to recover the full
3D geometry, the proposed 3D-RecGAN++ only takes the voxel grid representation
of a depth view of the object as input, and is able to generate the complete 3D
occupancy grid with a high resolution of 256^3 by recovering the
occluded/missing regions. The key idea is to combine the generative
capabilities of autoencoders and the conditional Generative Adversarial
Networks (GAN) framework, to infer accurate and fine-grained 3D structures of
objects in high-dimensional voxel space. Extensive experiments on large
synthetic datasets and real-world Kinect datasets show that the proposed
3D-RecGAN++ significantly outperforms the state of the art in single view 3D
object reconstruction, and is able to reconstruct unseen types of objects.
Comment: TPAMI 2018. Code and data are available at:
https://github.com/Yang7879/3D-RecGAN-extended. This article extends from
arXiv:1708.0796
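The following is a minimal sketch of the paper's core idea, an encoder-decoder generator conditioned on the partial voxel grid paired with a conditional discriminator; it is not the released 3D-RecGAN++ architecture (see the linked repository for that). The grid is shrunk from 256^3 to 32^3 and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a partial occupancy grid to a completed one (sketch)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(                 # 32^3 -> 8^3 latent
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.dec = nn.Sequential(                 # 8^3 -> 32^3 occupancy
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, partial):                   # (B, 1, 32, 32, 32)
        return self.dec(self.enc(partial))

class Discriminator(nn.Module):
    """Judges (partial input, completed output) pairs, i.e. the GAN
    is conditioned on the observed depth view."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(32 * 8 ** 3, 1), nn.Sigmoid(),
        )

    def forward(self, partial, full):
        return self.net(torch.cat([partial, full], dim=1))

partial = torch.rand(2, 1, 32, 32, 32)            # voxelised depth views
g, d = Generator(), Discriminator()
completed = g(partial)
print(completed.shape, d(partial, completed).shape)
```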
Viewpoint Push Planning for Mapping of Unknown Confined Spaces
Viewpoint planning is an important task in any application where objects or
scenes need to be viewed from different angles to achieve sufficient coverage.
The mapping of confined spaces such as shelves is an especially challenging
task since objects occlude each other and the scene can only be observed from
the front, posing limitations on the possible viewpoints. In this paper, we
propose a deep reinforcement learning framework that generates promising views
aiming at reducing the map entropy. Additionally, the pipeline extends standard
viewpoint planning by predicting adequate minimally invasive push actions to
uncover occluded objects and increase the visible space. Using a 2.5D occupancy
height map as state representation that can be efficiently updated, our system
decides whether to plan a new viewpoint or perform a push. To learn feasible
pushes, we use a neural network to sample push candidates on the map based on
training data provided by human experts. As simulated and real-world
experimental results with a robotic arm show, our system is able to
significantly increase the mapped space compared to different baselines, while
the executed push actions greatly benefit the viewpoint planner while causing
only minor changes to the object configuration.
Comment: In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 202
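As a minimal illustration of the quantity the planner optimises, the sketch below computes the Shannon entropy of a probabilistic 2.5D occupancy map and uses a toy threshold rule to choose between viewing and pushing. In the paper this decision is made by the learned policy, so the rule, shapes, and threshold here are assumptions.

```python
import numpy as np

def map_entropy(p):
    """Shannon entropy (nats) of per-cell occupancy probabilities."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return float(-(p * np.log(p) + (1 - p) * np.log(1 - p)).sum())

def choose_action(occupancy, expected_view_gain, push_threshold=0.1):
    """Toy stand-in for the learned policy: push once further
    viewpoints stop reducing map entropy by a useful fraction."""
    h = map_entropy(occupancy)
    if expected_view_gain / max(h, 1e-9) > push_threshold:
        return "plan_viewpoint"
    return "push"  # uncover occluded space instead

grid = np.full((64, 64), 0.5)              # fully unknown 2.5D shelf map
print(map_entropy(grid))                   # maximal: 64 * 64 * ln 2
print(choose_action(grid, expected_view_gain=500.0))  # "plan_viewpoint"
print(choose_action(grid, expected_view_gain=50.0))   # "push"
```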
TransSC: Transformer-based Shape Completion for Grasp Evaluation
Robotic grasping methods based on sparse partial point clouds currently attain
strong grasping performance on various objects, but they often generate
incorrect grasp candidates because geometric information about the object is
missing. In this work, we propose a novel and robust shape completion model
(TransSC), which takes a partial point cloud as input and combines a
transformer-based encoder that extracts richer point-wise features with a
manifold-based decoder that recovers finer object detail.
Quantitative experiments verify the effectiveness of the proposed shape
completion network and demonstrate that it outperforms existing methods.
Moreover, TransSC is integrated into a grasp evaluation network to generate a
set of grasp candidates. Simulation experiments show that TransSC improves
grasp generation compared to existing shape completion baselines, and our
robotic experiments show that, with TransSC, the robot grasps objects randomly
placed on a support surface more successfully.
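The sketch below illustrates the encoder side of this idea: per-point embeddings processed by a standard transformer encoder, followed by a simple pooled MLP decoder. It is not the TransSC architecture (which uses a manifold-based decoder), and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class PointTransformerSketch(nn.Module):
    """Transformer encoder over a partial point cloud (sketch only)."""
    def __init__(self, d_model=128, n_out=256):
        super().__init__()
        self.embed = nn.Linear(3, d_model)            # per-point embedding
        layer = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode = nn.Linear(d_model, 3 * n_out)   # global feat -> points
        self.n_out = n_out

    def forward(self, pts):                           # (B, N, 3) partial cloud
        feats = self.encoder(self.embed(pts))         # (B, N, d_model) point-wise
        global_feat = feats.max(dim=1).values         # order-invariant pooling
        return self.decode(global_feat).view(-1, self.n_out, 3)

partial = torch.rand(2, 512, 3)                       # sparse partial clouds
print(PointTransformerSketch()(partial).shape)        # (2, 256, 3) completed
```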
3D Shape Perception from Monocular Vision, Touch, and Shape Priors
Perceiving accurate 3D object shape is important for robots to interact with
the physical world. Current research along this direction has been primarily
relying on visual observations. Vision, however useful, has inherent
limitations due to occlusions and 2D-to-3D ambiguities, especially for
perception with a monocular camera. In contrast, touch obtains precise local
shape information, though its efficiency for reconstructing an entire shape
can be low. In this paper, we propose a novel paradigm that efficiently perceives
accurate 3D object shape by incorporating visual and tactile observations, as
well as prior knowledge of common object shapes learned from large-scale shape
repositories. We use vision first, applying neural networks with learned shape
priors to predict an object's 3D shape from a single-view color image. We then
use tactile sensing to refine the shape; the robot actively touches the object
regions where the visual prediction has high uncertainty. Our method
efficiently builds the 3D shape of common objects from a color image and a
small number of tactile explorations (around 10). Our setup is easy to apply
and has the potential to help robots better perform grasping and manipulation
tasks on real-world objects.
Comment: IROS 2018. The first two authors contributed equally to this work.
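A minimal sketch of the touch-selection idea follows: given per-voxel occupancy probabilities from the visual shape prediction, touch where predictive uncertainty (entropy) is highest. The probability field and the budget of roughly 10 touches are stand-ins, not the paper's implementation.

```python
import numpy as np

def next_touch_targets(prob, n_touches=10):
    """Return voxel indices of the n most uncertain predictions."""
    p = np.clip(prob, 1e-6, 1 - 1e-6)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    flat = np.argsort(entropy.ravel())[::-1][:n_touches]
    return np.stack(np.unravel_index(flat, prob.shape), axis=1)

rng = np.random.default_rng(1)
prob = rng.random((32, 32, 32))           # stand-in visual occupancy prediction
targets = next_touch_targets(prob)        # ~10 touches, as in the paper
print(targets[:3])                        # first few (x, y, z) voxel indices
```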