64 research outputs found
Continuous Perception for Classifying Shapes and Weights of Garments for Robotic Vision Applications
We present an approach to continuous perception for robotic laundry tasks. Our assumption is that the visual prediction of a garment's shape and weight is possible via a neural network that learns the dynamic changes of garments from video sequences. Continuous perception is leveraged during training by inputting consecutive frames, from which the network learns how a garment deforms. To evaluate our hypothesis, we captured a dataset of 40K RGB and 40K depth video sequences while a garment is being manipulated. We also conducted ablation studies to understand whether the neural network learns the physical and dynamic properties of garments. Our findings suggest that a modified AlexNet-LSTM architecture has the best classification performance for garment shapes and weights. To further provide evidence that continuous perception facilitates the prediction of garment shapes and weights, we evaluated our network on unseen video sequences and computed the 'Moving Average' over a sequence of predictions. We found that our network has a classification accuracy of 48% and 60% for shapes and weights of garments, respectively.
Comment: Accepted by the 17th International Conference on Computer Vision Theory and Applications
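As a concrete illustration of the 'Moving Average' rule described above, the following sketch smooths per-frame class probabilities with a sliding window before taking the argmax; the window size and the assumption that the CNN-LSTM emits softmax probabilities per frame are ours, not details taken from the paper.
```python
import numpy as np

def moving_average_decision(frame_probs: np.ndarray, window: int = 5) -> np.ndarray:
    """Smooth per-frame class probabilities with a sliding window and
    return the predicted class index at every time step.

    frame_probs: (T, C) array of per-frame class probabilities
                 (e.g. softmax outputs of a CNN-LSTM).
    window:      number of most recent frames to average (assumed value).
    """
    T, _ = frame_probs.shape
    preds = np.empty(T, dtype=int)
    for t in range(T):
        start = max(0, t - window + 1)
        smoothed = frame_probs[start:t + 1].mean(axis=0)
        preds[t] = int(np.argmax(smoothed))
    return preds

# Example: 10 frames, 3 shape classes; the smoothed prediction is more
# stable than the noisy per-frame argmax.
probs = np.random.dirichlet(np.ones(3), size=10)
print(moving_average_decision(probs))
```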
A Portable Active Binocular Robot Vision Architecture for Scene Exploration
We present a portable active binocular robot vision architecture that integrates a number of visual behaviours. This vision architecture inherits the abilities of vergence, localisation, recognition and simultaneous identification of multiple target object instances. To demonstrate the portability of our vision architecture, we carry out qualitative and comparative analyses under two different hardware robotic settings, feature extraction techniques and viewpoints. Our portable active binocular robot vision architecture achieved average recognition rates of 93.5% for fronto-parallel viewpoints and 83% for anthropomorphic viewpoints, respectively.
Object Edge Contour Localisation Based on HexBinary Feature Matching
This paper addresses the issue of localising object edge contours in cluttered backgrounds to support robotics tasks such as grasping and manipulation, and also to improve the potential perceptual capabilities of robot vision systems. Our approach is based on coarse-to-fine matching of a new, recursively constructed, hierarchical, dense, edge-localised descriptor, the HexBinary, based on the HexHoG descriptor structure first proposed in [1]. Since Binary String image descriptors [2]–[5] require much lower computational resources, but provide similar or even better matching performance than Histogram of Oriented Gradients (HoG) descriptors, we have replaced the HoG base descriptor fields used in HexHoG with Binary Strings generated from first- and second-order polar derivative approximations. The ALOI [6] dataset is used to evaluate the HexBinary descriptors, which we demonstrate to achieve superior performance to that of HexHoG [1] for pose refinement. The validation of our object contour localisation system shows promising results, correctly labelling ~86% of edgel positions and mis-labelling only ~3%.
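The claim that Binary String descriptors are cheap to match rests on Hamming-distance comparison (XOR plus a population count). A minimal sketch is given below; the 256-bit descriptor length and byte packing are assumptions, and this does not reproduce the HexBinary construction itself.
```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed binary descriptors (uint8 arrays),
    computed with XOR and a population count."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two hypothetical 256-bit descriptors packed into 32 bytes each.
d1 = np.random.randint(0, 256, size=32, dtype=np.uint8)
d2 = np.random.randint(0, 256, size=32, dtype=np.uint8)
print(hamming_distance(d1, d2))
```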
Interactive Perception based on Gaussian Process Classification Applied to Household Object Recognition & Sorting
No abstract available
A hierarchical active binocular robot vision architecture for scene exploration and object appearance learning
This thesis presents an investigation of a computational model of hierarchical visual behaviours within an active binocular robot vision architecture. The robot vision system is able to localise multiple instances of the same object class, while simultaneously maintaining vergence and directing its gaze to attend to and recognise objects within cluttered, complex scenes. This is achieved by implementing all image analysis in an egocentric symbolic space without creating explicit pixel-space maps and without the need for calibration or other knowledge of the camera geometry. An important aspect of the active binocular vision paradigm is that visual features in both camera eyes must be bound together in order to drive visual search to saccade to, locate and recognise putative objects or salient locations in the robot's field of view. The system structure is based on the “attentional spotlight” metaphor of biological systems and a collection of abstract and reactive visual behaviours arranged in a hierarchical structure.
Several studies have shown that the human brain represents and learns objects for recognition from snapshots of 2-dimensional views of the imaged scene that happen to contain the object of interest during active interaction with (exploration of) the environment. Likewise, psychophysical findings indicate that the primate’s visual cortex represents common everyday objects by a hierarchical structure of their parts or sub-features and, consequently, recognises them by simple but imperfect 2D approximations of object part views. This thesis incorporates the above observations into an active visual learning behaviour in the hierarchical active binocular robot vision architecture. By actively exploring the object viewing sphere (as higher mammals do), the robot vision system automatically synthesises and creates its own part-based object representation from multiple observations while a human teacher indicates the object and supplies a classification name. It is proposed to adopt the computational concepts of a visual learning exploration mechanism that controls the accumulation of visual evidence and directs attention towards spatially salient object parts.
The behavioural structure of the binocular robot vision architecture is loosely modelled on the WHAT and WHERE visual streams. The WHERE stream maintains and binds spatial attention on the object part coordinates that egocentrically characterise the location of the object of interest, and extracts spatio-temporal properties of feature coordinates and descriptors. The WHAT stream either determines the identity of an object or triggers a learning behaviour that stores view-invariant feature descriptions of the object part. Therefore, the robot vision system is capable of performing a collection of different specific visual tasks such as vergence, detection, discrimination, recognition, localisation and multiple same-instance identification. This classification of tasks enables the robot vision system to execute and fulfil specified high-level tasks, e.g. autonomous scene exploration and active object appearance learning.
Interactive Perception Based on Gaussian Process Classification for Household Objects Recognition and Sorting
We present an interactive perception model for object sorting based on Gaussian Process (GP) classification that is capable of recognising object categories from point cloud data. In our approach, FPFH features are extracted from point clouds to describe the local 3D shape of objects, and a Bag-of-Words coding method is used to obtain an object-level vocabulary representation. Multi-class Gaussian Process classification is employed to provide a probabilistic estimate of the identity of the object and serves a key role in the interactive perception cycle by modelling perception confidence. We show results from simulated input data on both SVM- and GP-based multi-class classifiers to validate the recognition accuracy of our proposed perception model. Our results demonstrate that by using a GP-based classifier, we obtain true positive classification rates of up to 80%. Our semi-autonomous object sorting experiments show that the proposed GP-based interactive sorting approach outperforms random sorting by up to 30% when applied to scenes comprising configurations of household objects.
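The pipeline summarised above (local FPFH descriptors, a Bag-of-Words object histogram, then a multi-class GP classifier whose predictive probabilities model perception confidence) can be sketched roughly as follows with scikit-learn; the random placeholder descriptors, vocabulary size, kernel and other parameters are assumptions standing in for the actual FPFH extraction and settings used in the paper.
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def bow_encode(descriptors: np.ndarray, vocabulary: KMeans) -> np.ndarray:
    """Encode a set of local descriptors (one object) as a normalised
    histogram of visual-word occurrences."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Placeholder data standing in for 33-D FPFH descriptors of training objects.
rng = np.random.default_rng(0)
train_objects = [rng.normal(size=(200, 33)) + label for label in (0, 1, 2) for _ in range(10)]
train_labels = np.repeat([0, 1, 2], 10)

# 1) Learn a visual vocabulary over all local descriptors.
vocab = KMeans(n_clusters=32, n_init=4, random_state=0).fit(np.vstack(train_objects))

# 2) Object-level Bag-of-Words histograms.
X = np.array([bow_encode(d, vocab) for d in train_objects])

# 3) Multi-class GP classifier; predict_proba gives the confidence that
#    drives the interactive perception cycle.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(1.0), random_state=0).fit(X, train_labels)
test_hist = bow_encode(rng.normal(size=(200, 33)) + 1, vocab).reshape(1, -1)
print(gpc.predict_proba(test_hist))
```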
Recognising the Clothing Categories from Free-Configuration Using Gaussian-Process-Based Interactive Perception
In this paper, we propose a Gaussian Process-based interactive perception approach for recognising highly wrinkled clothes. We have integrated this recognition method within a clothes sorting pipeline for the pre-washing stage of an autonomous laundering process. Our approach differs from reported clothing manipulation approaches by allowing the robot to update its perception confidence via numerous interactions with the garments. The classifiers predominantly reported in clothing perception studies (e.g. SVM, Random Forest) do not provide true classification probabilities, due to their inherent structure. In contrast, probabilistic classifiers (of which the Gaussian Process is a popular example) are able to provide predictive probabilities. In our approach, we employ multi-class Gaussian Process classification using the Laplace approximation for posterior inference and optimise hyper-parameters via marginal likelihood maximisation. Our experimental results show that our approach is able to recognise unknown garments from highly occluded and wrinkled configurations and demonstrates a substantial improvement over non-interactive perception approaches.
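A rough sketch of the interactive element, i.e. updating perception confidence over repeated interactions until a predictive probability clears a threshold, is shown below; the observe callable, the running-average fusion rule and the threshold value are illustrative assumptions rather than the method reported in the paper.
```python
import numpy as np

def interactive_recognition(observe, classifier, max_interactions: int = 10,
                            confidence_threshold: float = 0.8):
    """Sketch of an interactive perception loop: keep interacting with
    (e.g. flipping) the garment until the fused predictive probability of
    some class exceeds a threshold.

    observe:    callable that interacts with the garment and returns a
                feature vector for the new configuration (assumed interface).
    classifier: probabilistic classifier with predict_proba, e.g. a
                multi-class Gaussian Process classifier.
    """
    fused = None
    for i in range(max_interactions):
        x = observe().reshape(1, -1)
        probs = classifier.predict_proba(x)[0]
        # Naive evidence fusion: running average of predictive probabilities.
        fused = probs if fused is None else (fused * i + probs) / (i + 1)
        if fused.max() >= confidence_threshold:
            break
    return int(np.argmax(fused)), float(fused.max())
```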
On the Calibration of Active Binocular and RGBD Vision Systems for Dual-Arm Robots
This paper describes a camera and hand-eye calibration methodology for integrating an active binocular robot head within a dual-arm robot. For this purpose, we derive the forward kinematic model of our active robot head and describe our methodology for calibrating and integrating our robot head. This rigid calibration provides a closed-form hand-to-eye solution. We then present an approach for dynamically updating the camera extrinsic parameters for optimal 3D reconstruction, which is the foundation for robotic tasks such as grasping and manipulating rigid and deformable objects. We show from experimental results that our robot head achieves an overall sub-millimetre accuracy of less than 0.3 millimetres while recovering the 3D structure of a scene. In addition, we report a comparative study between current RGBD cameras and our active stereo head within two dual-arm robotic testbeds that demonstrates the accuracy and portability of our proposed methodology.
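For readers who want the rigid hand-eye step in code, the sketch below uses OpenCV's generic calibrateHandEye solver (Tsai's method) rather than the paper's own closed-form derivation; the pose lists are assumed to come from the robot's forward kinematics and a calibration-target detector.
```python
import cv2
import numpy as np

def hand_eye_calibration(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam):
    """Rigid hand-eye calibration from paired robot and camera poses using
    OpenCV's standard solver; returns the camera-to-gripper transform as a
    4x4 homogeneous matrix.

    Each argument is a list of 3x3 rotations or 3x1 translations, one per
    robot pose at which the calibration target was observed.
    """
    R, t = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI,
    )
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t.ravel()
    return T
```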
Single-Shot Clothing Category Recognition in Free-Configurations with Application to Autonomous Clothes Sorting
This paper proposes a single-shot approach for recognising clothing categories from 2.5D features. We propose two visual features, BSP (B-Spline Patch) and TSD (Topology Spatial Distances), for this task. The local BSP features are encoded by LLC (Locality-constrained Linear Coding) and fused with three different global features. Our visual features are robust to deformable shapes, and our approach is able to recognise the category of unknown clothing in unconstrained and random configurations. We integrated the category recognition pipeline with a stereo vision system, clothing instance detection, and dual-arm manipulators to achieve an autonomous sorting system. To verify the performance of our proposed method, we built a high-resolution RGBD clothing dataset of 50 clothing items in 5 categories sampled in random configurations (a total of 2,100 clothing samples). Experimental results show that our approach is able to reach 83.2% accuracy while classifying clothing items which were previously unseen during training. This advances beyond the previous state of the art by 36.2%. Finally, we evaluate the proposed approach in an autonomous robot sorting system, in which the robot recognises a clothing item from an unconstrained pile, grasps it, and sorts it into a box according to its category. Our proposed sorting system achieves reasonable sorting success rates with single-shot perception.
Comment: 9 pages, accepted by IROS 2017
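The LLC encoding mentioned above has a well-known approximated solver (Wang et al., CVPR 2010): solve a small constrained least-squares problem over the k nearest codebook words. A minimal sketch follows; the codebook, the value of k and the regulariser are assumptions, and the BSP descriptors themselves are not reproduced here.
```python
import numpy as np

def llc_encode(x: np.ndarray, codebook: np.ndarray, k: int = 5, beta: float = 1e-4) -> np.ndarray:
    """Approximated Locality-constrained Linear Coding (LLC) of a single
    local descriptor: a small least-squares solve over the k nearest
    codebook entries, with codes constrained to sum to one.

    x:        (D,) local descriptor (e.g. a B-Spline Patch feature).
    codebook: (M, D) visual codebook.
    Returns a sparse (M,) code whose non-zeros lie on the k nearest words.
    """
    # k nearest codebook entries.
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]
    B = codebook[idx]                      # (k, D)

    # Solve (C + beta*trace(C)*I) c = 1, then normalise so the codes sum to one.
    z = B - x                              # shift to local coordinates
    C = z @ z.T
    C += np.eye(k) * beta * np.trace(C)
    c = np.linalg.solve(C, np.ones(k))
    c /= c.sum()

    code = np.zeros(codebook.shape[0])
    code[idx] = c
    return code
```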
A Biologically Motivated Software Retina for Robotic Sensors for ARM-Based Mobile Platform Technology
A key issue in designing robotics systems is the cost of an integrated camera sensor that meets the bandwidth/processing requirement for many advanced robotics applications, especially lightweight robotics applications, such as visual surveillance or SLAM in autonomous aerial vehicles. There is currently much work going on to adapt smartphones to provide complete robot vision systems, as the smartphone is so exquisitely integrated by having camera(s), inertial sensing, sound I/O and excellent wireless connectivity. Mass market production makes this a very low-cost platform and manufacturers from quadrotor drone suppliers to children’s toys, such as the Meccanoid robot [5], employ a smartphone to provide a vision system/control system [7,8].
Accordingly, many research groups are attempting to optimise image analysis, computer vision and machine learning libraries for the smartphone platform. However, current approaches to robot vision remain highly demanding for mobile processors such as the ARM, and while a number of algorithms have been developed, these are very stripped down, i.e. highly compromised in function or performance. For example, the semi-dense visual odometry implementation of [1] operates on images of only 320x240 pixels.
In our research we have been developing biologically motivated foveated vision algorithms based on a model of the mammalian retina [2], potentially 100 times more efficient than their conventional counterparts. Accordingly, vision systems based on the foveated architectures found in mammals also have the potential to reduce bandwidth and processing requirements by a factor of about 100; it has been estimated that our brains would weigh ~60 kg if we were to process all our visual input at uniform high resolution. We have reported a foveated visual architecture [2,3,4] that implements a functional model of the retina and visual cortex to produce feature vectors that can be matched/classified using conventional methods, or indeed could be adapted to employ Deep Convolutional Neural Nets for the classification/interpretation stage. Given the above processing/bandwidth limitations, a viable way forward would be to perform off-line learning and implement the forward recognition path on the mobile platform, returning simple object labels, or sparse hierarchical feature symbols, and gaze control commands to the host robot vision system and controller.
We are now at the early stages of investigating how best to port our foveated architecture onto an ARM-based smartphone platform. To achieve the required levels of performance, we propose to port and optimise our retina model to the mobile ARM processor architecture in conjunction with its integrated GPU. We will then be in a position to provide a foveated smart vision system on a smartphone with the advantage of processing speed gains and bandwidth optimisations. Our approach will be to develop efficient parallelising compilers and perhaps propose new processor architectural features to support this approach to computer vision, e.g. efficient processing of hexagonally sampled foveated images.
Our current goal is to have a foveated system running in real time on at least a 1080p input video stream to serve as a front-end robot sensor for tasks such as general-purpose object recognition and reliable dense SLAM using a commercial off-the-shelf smartphone. Initially this system would communicate a symbol stream to conventional hardware performing back-end visual classification/interpretation, although simple object detection and recognition tasks should be possible on-board the device. We propose that, as in Nature, foveated vision is the key to achieving the necessary data reduction to be able to implement complete visual recognition and learning processes on the smartphone itself.
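As a rough illustration of the bandwidth argument (not the software retina itself, which uses a hexagonally sampled, space-variant tessellation), the sketch below stands in a plain log-polar transform for foveated resampling; the output size and fixation point are assumptions.
```python
import cv2
import numpy as np

def foveate_log_polar(frame: np.ndarray, out_size=(128, 128)) -> np.ndarray:
    """Crude space-variant (foveated) resampling of a camera frame via a
    log-polar transform: high resolution at the fixation point, coarse in
    the periphery, with a large reduction in pixels to process.
    """
    h, w = frame.shape[:2]
    center = (w / 2.0, h / 2.0)            # fixation point (assumed image centre)
    max_radius = min(center)
    return cv2.warpPolar(frame, out_size, center, max_radius,
                         cv2.WARP_POLAR_LOG + cv2.INTER_LINEAR)

# Example: a 1080p frame collapses to a 128x128 space-variant image,
# over a 100x reduction in pixels to process downstream.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
print(foveate_log_polar(frame).shape)
```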
- …