1,088 research outputs found
Understanding of Object Manipulation Actions Using Human Multi-Modal Sensory Data
Object manipulation actions represent an important share of the Activities of
Daily Living (ADLs). In this work, we study how to enable service robots to use
human multi-modal data to understand object manipulation actions, and how they
can recognize such actions when humans perform them during human-robot
collaboration tasks. The multi-modal data in this study consists of videos,
hand motion data, applied forces as represented by the pressure patterns on the
hand, and measurements of the bending of the fingers, collected as human
subjects performed manipulation actions. We investigate two different
approaches. In the first, we show that the multi-modal signal (motion, finger
bending and hand pressure) generated by the action can be decomposed into a set
of primitives that can be seen as its building blocks. These primitives are
used to define 24 multi-modal primitive features. The primitive features can in
turn be used as an abstract representation of the multi-modal signal and
employed for action recognition. In the second approach, visual features
are extracted from the data using a pre-trained image classification deep
convolutional neural network. The visual features are subsequently used to
train the classifier. We also investigate whether adding data from other
modalities produces a statistically significant improvement in the classifier
performance. We show that both approaches achieve comparable performance.
This implies that image-based methods can successfully recognize human actions
during human-robot collaboration. However, as a source of training data from
which the robot can learn how to perform object manipulation actions,
multi-modal data provides a better alternative.
A Recurrent Deep Neural Network Model to measure Sentence Complexity for the Italian Language
Text simplification (TS) is a natural language processing task devoted to modifying a text so that the grammar and structure of its phrases are greatly simplified while the underlying meaning and information content are preserved. In this paper we contribute to the TS field by presenting a deep neural network model able to detect the complexity of Italian sentences. In particular, the system assigns an input text a score that reflects the confidence level of the decision-making process and that can be interpreted as a measure of sentence complexity. Experiments were carried out on a public corpus of Italian texts created specifically for the TS task. We also compare our model with a state-of-the-art method used for the same purpose.
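As a rough illustration of how a recurrent network can turn a sentence into a single confidence-like score, here is a minimal sketch. It assumes a plain Elman-style recurrence with a sigmoid output and randomly initialised, untrained weights. The names `make_params` and `complexity_score`, the toy vocabulary, and the hidden size are all hypothetical and are not taken from the paper, whose actual architecture may differ.

```python
import math
import random


def make_params(vocab, hidden, seed=0):
    """Create random (untrained) parameters; purely illustrative."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    return {
        "embed": {tok: [rand() for _ in range(hidden)] for tok in vocab},
        "w_hh": [[rand() for _ in range(hidden)] for _ in range(hidden)],
        "w_out": [rand() for _ in range(hidden)],
    }


def complexity_score(tokens, params):
    """Run a simple recurrence over the tokens and squash the final
    hidden state to a value in (0, 1) with a sigmoid."""
    hidden = len(params["w_out"])
    h = [0.0] * hidden
    for tok in tokens:
        # Unknown tokens map to a zero vector (an assumption of this sketch).
        x = params["embed"].get(tok, [0.0] * hidden)
        h = [
            math.tanh(x[i] + sum(params["w_hh"][i][j] * h[j] for j in range(hidden)))
            for i in range(hidden)
        ]
    logit = sum(w * v for w, v in zip(params["w_out"], h))
    return 1.0 / (1.0 + math.exp(-logit))


sentence = "il gatto dorme sul divano".split()
params = make_params(vocab=sentence, hidden=4)
score = complexity_score(sentence, params)
```

With untrained weights the score carries no meaning. In a trained model, the same sigmoid output would be read as the confidence that the sentence is complex.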
CSGNet: Neural Shape Parser for Constructive Solid Geometry
We present a neural architecture that takes as input a 2D or 3D shape and
outputs a program that generates the shape. The instructions in our program are
based on constructive solid geometry principles, i.e., a set of boolean
operations on shape primitives defined recursively. Bottom-up techniques for
this shape parsing task rely on primitive detection and are inherently slow
since the search space over possible primitive combinations is large. In
contrast, our model uses a recurrent neural network that parses the input shape
in a top-down manner, which is significantly faster and yields a compact and
easy-to-interpret sequence of modeling instructions. Our model is also more
effective as a shape detector compared to existing state-of-the-art detection
techniques. We finally demonstrate that our network can be trained on novel
datasets without ground-truth program annotations through policy gradient
techniques. Comment: Accepted at CVPR-201
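To make the idea of a program built from boolean operations on shape primitives concrete, the following minimal sketch executes a small 2D program on a pixel grid. The primitive set (`circle`, `square`), the postfix token format, and the 16x16 canvas are assumptions chosen for illustration and are not the paper's actual instruction set. The sketch covers only program execution, not the neural parser that would predict the program.

```python
# Shapes are represented as predicates: shape(x, y) -> bool.

def circle(cx, cy, r):
    """Hypothetical primitive: filled circle centred at (cx, cy)."""
    return lambda x, y: (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2


def square(cx, cy, half):
    """Hypothetical primitive: axis-aligned square with half-width `half`."""
    return lambda x, y: abs(x - cx) <= half and abs(y - cy) <= half


# Boolean operations combine two shapes into a new shape.
OPS = {
    "union": lambda a, b: (lambda x, y: a(x, y) or b(x, y)),
    "intersect": lambda a, b: (lambda x, y: a(x, y) and b(x, y)),
    "subtract": lambda a, b: (lambda x, y: a(x, y) and not b(x, y)),
}


def run_program(program):
    """Execute a postfix program with a stack and return the final shape."""
    stack = []
    for token in program:
        if isinstance(token, str):
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(token)
    if len(stack) != 1:
        raise ValueError("malformed program")
    return stack[0]


def rasterize(shape, size=16):
    """Sample the shape on a size x size grid; rows are indexed by y."""
    return [[1 if shape(x, y) else 0 for x in range(size)] for y in range(size)]


# A square with a circular hole: square, circle, subtract.
program = [square(8, 8, 5), circle(8, 8, 3), "subtract"]
canvas = rasterize(run_program(program))
```

Because each operation returns another shape, programs nest naturally, which mirrors the recursive definition mentioned in the abstract.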
Distributed training strategies for a computer vision deep learning algorithm on a distributed GPU cluster
Deep learning algorithms owe their success to high-capacity models with millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples, so the development of more accurate algorithms is usually limited by the throughput of the computing devices on which they are trained. In this work, we explore how the training of a state-of-the-art neural network for computer vision can be parallelized on a distributed GPU cluster. The effect of distributing the training process is addressed from two points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of distributed training methods on the final accuracy of the models is studied.
This work is partially supported by the Spanish Ministry of Economy and Competitiveness under contract TIN2012-34557, by the BSC-CNS Severo Ochoa program (SEV-2011-00067), by the SGR programmes (2014-SGR-1051 and 2014-SGR-1421) of the Catalan Government, and by the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economia y Competitividad and the European Regional Development Fund (ERDF). We would also like to thank the technical support team at the Barcelona Supercomputing Center (BSC), especially Carlos Tripiana. Peer Reviewed. Postprint (published version)
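One common way to parallelize training is synchronous data parallelism, in which each worker computes a gradient on its own shard of the batch and the gradients are averaged before every update. The abstract does not say which strategies the authors evaluate, so the sketch below is an assumption-laden illustration only. It simulates the workers sequentially in pure Python on a toy one-parameter model, and the names `gradient`, `all_reduce_mean`, and `train` are hypothetical.

```python
def gradient(w, shard):
    """Mean gradient of 0.5 * (w * x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for a collective all-reduce: average the workers' gradients."""
    return sum(values) / len(values)


def train(data, num_workers, lr=0.05, steps=200):
    """Synchronous data-parallel SGD on the model y = w * x."""
    # Equal-size shards, so the mean of shard means equals the full-batch mean.
    # Any remainder when len(data) is not divisible by num_workers is dropped.
    shard_size = len(data) // num_workers
    shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    w = 0.0
    for _ in range(steps):
        local_grads = [gradient(w, shard) for shard in shards]  # one per worker
        w -= lr * all_reduce_mean(local_grads)  # identical update on every worker
    return w


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # eight points on y = 3x
w_single = train(data, num_workers=1)
w_parallel = train(data, num_workers=4)
```

With equal shards the averaged gradient matches the single-worker gradient up to floating-point rounding, so both runs converge to the same weight. Accuracy differences in practice come from larger effective batch sizes or asynchronous updates, which this sketch does not model.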