Single-Shot Clothing Category Recognition in Free-Configurations with Application to Autonomous Clothes Sorting
This paper proposes a single-shot approach for recognising clothing
categories from 2.5D features. We propose two visual features, BSP (B-Spline
Patch) and TSD (Topology Spatial Distances) for this task. The local BSP
features are encoded by LLC (Locality-constrained Linear Coding) and fused with
three different global features. Our visual feature is robust to deformable
shapes and our approach is able to recognise the category of unknown clothing
in unconstrained and random configurations. We integrated the category
recognition pipeline with a stereo vision system, clothing instance detection,
and dual-arm manipulators to achieve an autonomous sorting system. To verify
the performance of our proposed method, we build a high-resolution RGBD
clothing dataset of 50 clothing items of 5 categories sampled in random
configurations (a total of 2,100 clothing samples). Experimental results show
that our approach is able to reach 83.2% accuracy while classifying clothing
items which were previously unseen during training. This advances beyond the
previous state-of-the-art by 36.2%. Finally, we evaluate the proposed approach
in an autonomous robot sorting system, in which the robot recognises a clothing
item from an unconstrained pile, grasps it, and sorts it into a box according
to its category. Our proposed sorting system achieves reasonable sorting
success rates with single-shot perception.
Comment: 9 pages, accepted by IROS201
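The abstract above says that local BSP features are encoded by LLC (Locality-constrained Linear Coding). As a rough illustration of what such an encoding step can look like, here is a minimal sketch of the commonly used approximated LLC solution: each descriptor is reconstructed from its k nearest codebook atoms under a sum-to-one constraint. It is not the authors' implementation. The function name `llc_encode`, the parameters `k` and `reg`, and the random codebook are all assumptions made for illustration.

```python
import numpy as np


def llc_encode(x, codebook, k=5, reg=1e-4):
    """Approximated LLC code for one descriptor (illustrative sketch).

    x        : (d,) local descriptor (hypothetical stand-in for a BSP feature)
    codebook : (m, d) dictionary of visual words
    k        : number of nearest atoms used for the reconstruction
    reg      : small regulariser that keeps the local system well conditioned
    """
    # 1. Pick the k codebook atoms closest to x (the locality constraint).
    dists = np.sum((codebook - x) ** 2, axis=1)
    idx = np.argsort(dists)[:k]

    # 2. Shift the selected atoms so that x sits at the origin.
    z = codebook[idx] - x                      # (k, d)

    # 3. Solve the small regularised least-squares system for the weights.
    cov = z @ z.T                              # (k, k) local covariance
    cov += reg * max(np.trace(cov), 1e-12) * np.eye(k)
    w = np.linalg.solve(cov, np.ones(k))

    # 4. Enforce the sum-to-one constraint and scatter into a sparse code.
    w /= w.sum()
    code = np.zeros(codebook.shape[0])
    code[idx] = w
    return code


# Toy usage with random data (not real clothing features).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 16))
descriptor = rng.normal(size=16)
code = llc_encode(descriptor, codebook, k=5)
```

In a full pipeline the per-descriptor codes would typically be pooled (for example by max pooling) into one vector per image before classification. How the paper pools the codes and fuses them with its global features is not stated in the abstract.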
Fast, Accurate Thin-Structure Obstacle Detection for Autonomous Mobile Robots
Safety is paramount for mobile robotic platforms such as self-driving cars
and unmanned aerial vehicles. This work is devoted to a task that is
indispensable for safety yet was largely overlooked in the past -- detecting
obstacles that are of very thin structures, such as wires, cables and tree
branches. This is a challenging problem, as thin objects can be problematic for
active sensors such as lidar and sonar and even for stereo cameras. In this
work, we propose to use video sequences for thin obstacle detection. We
represent obstacles with edges in the video frames, and reconstruct them in 3D
using efficient edge-based visual odometry techniques. We provide both a
monocular camera solution and a stereo camera solution. The former incorporates
Inertial Measurement Unit (IMU) data to solve scale ambiguity, while the latter
enjoys a novel, purely vision-based solution. Experiments demonstrated that the
proposed methods are fast and able to detect thin obstacles robustly and
accurately under various conditions.
Comment: Appeared at IEEE CVPR 2017 Workshop on Embedded Visio
Understanding of Object Manipulation Actions Using Human Multi-Modal Sensory Data
Object manipulation actions represent an important share of the Activities of
Daily Living (ADLs). In this work, we study how to enable service robots to use
human multi-modal data to understand object manipulation actions, and how they
can recognize such actions when humans perform them during human-robot
collaboration tasks. The multi-modal data in this study consists of videos,
hand motion data, applied forces as represented by the pressure patterns on the
hand, and measurements of the bending of the fingers, collected as human
subjects performed manipulation actions. We investigate two different
approaches. In the first one, we show that the multi-modal signal (motion, finger
bending and hand pressure) generated by the action can be decomposed into a set
of primitives that can be seen as its building blocks. These primitives are
used to define 24 multi-modal primitive features. The primitive features can in
turn be used as an abstract representation of the multi-modal signal and
employed for action recognition. In the second approach, visual features
are extracted from the data using a pre-trained image classification deep
convolutional neural network. The visual features are subsequently used to
train the classifier. We also investigate whether adding data from other
modalities produces a statistically significant improvement in the classifier
performance. We show that both approaches produce comparable performance.
This implies that image-based methods can successfully recognize human actions
during human-robot collaboration. On the other hand, in order to provide
training data for the robot so it can learn how to perform object manipulation
actions, multi-modal data provides a better alternative
- …