427 research outputs found
Review of deep learning methods in robotic grasp detection
For robots to attain more general-purpose utility, grasping is a necessary skill to master. Such general-purpose robots may use their perception abilities to visually identify grasps for a given object. A grasp describes how a robotic end-effector can be arranged to securely grab an object and successfully lift it without slippage. Traditionally, grasp detection requires expert human knowledge to analytically form the task-specific algorithm, but this is an arduous and time-consuming approach. During the last five years, deep learning methods have enabled significant advancements in robotic vision, natural language processing, and automated driving applications. The successful results of these methods have driven robotics researchers to explore the use of deep learning methods in task-generalised robotic applications. This paper reviews the current state-of-the-art in regards to the application of deep learning methods to generalised robotic grasping and discusses how each element of the deep learning approach has improved the overall performance of robotic grasp detection. Several of the most promising approaches are evaluated and the most suitable for real-time grasp detection is identified as the one-shot detection method. The availability of suitable volumes of appropriate training data is identified as a major obstacle for effective utilisation of the deep learning approaches, and the use of transfer learning techniques is proposed as a potential mechanism to address this. Finally, current trends in the field and future potential research directions are discussed
Physical Primitive Decomposition
Objects are made of parts, each with distinct geometry, physics,
functionality, and affordances. Developing such a distributed, physical,
interpretable representation of objects will facilitate intelligent agents to
better explore and interact with the world. In this paper, we study physical
primitive decomposition---understanding an object through its components, each
with physical and geometric attributes. As annotated data for object parts and
physics are rare, we propose a novel formulation that learns physical
primitives by explaining both an object's appearance and its behaviors in
physical events. Our model performs well on block towers and tools in both
synthetic and real scenarios; we also demonstrate that visual and physical
observations often provide complementary signals. We further present ablation
and behavioral studies to better understand our model and contrast it with
human performance.Comment: ECCV 2018. Project page: http://ppd.csail.mit.edu
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, it is possible for multiple robots to
share their experience with one another, and thereby, learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single robot alternative.Comment: Submitted to the IEEE International Conference on Robotics and
Automation 201
From ionic surfactants to Nafion through convolutional neural networks
We have applied recent machine learning advances, deep convolutional neural
network, to three-dimensional (voxels) soft matter data, generated by Molecular
Dynamics computer simulation. We have focused on the structural and phase
properties of a coarse-grained model of hydrated ionic surfactants. We have
trained a classifier able to automatically detect the water quantity absorbed
in the system, therefore associating to each hydration level the corresponding
most representative nano-structure. Based on the notion of transfer learning,
we have next applied the same network to the related polymeric ionomer Nafion,
and have extracted a measure of the similarity of these configurations with
those above. We demonstrate that on this basis is possible to express the
static structure factor of the polymer at fixed hydration level as a
superposition of those of the surfactants at multiple water contents. We
suggest that such a procedure can provide a useful, agnostic, data-driven,
precise description of the multi-scale structure of disordered materials,
without resorting to any a-priori model picture
More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch
For humans, the process of grasping an object relies heavily on rich tactile
feedback. Most recent robotic grasping work, however, has been based only on
visual input, and thus cannot easily benefit from feedback after initiating
contact. In this paper, we investigate how a robot can learn to use tactile
information to iteratively and efficiently adjust its grasp. To this end, we
propose an end-to-end action-conditional model that learns regrasping policies
from raw visuo-tactile data. This model -- a deep, multimodal convolutional
network -- predicts the outcome of a candidate grasp adjustment, and then
executes a grasp by iteratively selecting the most promising actions. Our
approach requires neither calibration of the tactile sensors, nor any
analytical modeling of contact forces, thus reducing the engineering effort
required to obtain efficient grasping policies. We train our model with data
from about 6,450 grasping trials on a two-finger gripper equipped with GelSight
high-resolution tactile sensors on each finger. Across extensive experiments,
our approach outperforms a variety of baselines at (i) estimating grasp
adjustment outcomes, (ii) selecting efficient grasp adjustments for quick
grasping, and (iii) reducing the amount of force applied at the fingers, while
maintaining competitive performance. Finally, we study the choices made by our
model and show that it has successfully acquired useful and interpretable
grasping behaviors.Comment: 8 pages. Published on IEEE Robotics and Automation Letters (RAL).
Website: https://sites.google.com/view/more-than-a-feelin
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition, and video surveillance.
In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided.
The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements.
All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods
- …