Toward Abstraction from Multi-modal Data: Empirical Studies on Multiple Time-scale Recurrent Models
Abstraction tasks are challenging for multi-modal sequences, as they require a
deeper semantic understanding of the data and novel text generation. Although
recurrent neural networks (RNNs) can model the context of time sequences, the
long-term dependencies of multi-modal data usually cause the gradients of
back-propagation through time to vanish in the time domain. Recently, inspired
by the Multiple Time-scale Recurrent Neural Network (MTRNN), an extension of
the Gated Recurrent Unit (GRU) called the Multiple Time-scale Gated Recurrent
Unit (MTGRU) has been proposed to learn long-term dependencies in natural
language processing. In particular, it can also accomplish the abstraction
task for paragraphs, provided that the time constants are well defined. In
this paper, we compare the MTRNN and the MTGRU in terms of their learning
performance as well as their abstract representations at the higher level
(with slower neural activation). We conducted two studies: one on a smaller
data-set (two-dimensional time sequences from non-linear functions) and one on
a relatively large data-set (43-dimensional time sequences from iCub
manipulation tasks with multi-modal data). We conclude that gated recurrent
mechanisms may be necessary for learning long-term dependencies in
high-dimensional multi-modal data-sets (e.g. learning of robot manipulation),
even when natural language commands are not involved, whereas for smaller
learning tasks with simple time sequences, a generic recurrent model such as
the MTRNN is sufficient to accomplish the abstraction task.
Comment: Accepted by IJCNN 201
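As a rough illustration of the mechanism this abstract describes, the sketch below implements one common formulation of an MTGRU cell: a standard GRU whose output is interpolated with the previous hidden state by a fixed time constant tau, so that a layer with a large tau updates slowly and can capture longer-term structure. The class and parameter names are our own assumptions for illustration; the paper's exact gating may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MTGRUCell:
    """Hypothetical multiple time-scale GRU cell. tau = 1 recovers the
    ordinary GRU; larger tau yields slower, low-pass-filtered dynamics."""

    def __init__(self, input_size, hidden_size, tau=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        s = 1.0 / np.sqrt(hidden_size)
        # Stacked weights for the update (z), reset (r) and candidate gates.
        self.W = rng.uniform(-s, s, (3, hidden_size, input_size))
        self.U = rng.uniform(-s, s, (3, hidden_size, hidden_size))
        self.b = np.zeros((3, hidden_size))
        self.tau = tau

    def step(self, x, h_prev):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h_prev + self.b[0])
        r = sigmoid(self.W[1] @ x + self.U[1] @ h_prev + self.b[1])
        h_cand = np.tanh(self.W[2] @ x + self.U[2] @ (r * h_prev) + self.b[2])
        h_gru = (1.0 - z) * h_prev + z * h_cand      # ordinary GRU update
        # Multiple time-scale interpolation: leak toward the GRU state at 1/tau.
        return (1.0 / self.tau) * h_gru + (1.0 - 1.0 / self.tau) * h_prev

# A slow context layer might use tau = 16 while a fast layer keeps tau = 1.
```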
Understanding of Object Manipulation Actions Using Human Multi-Modal Sensory Data
Object manipulation actions represent an important share of the Activities of
Daily Living (ADLs). In this work, we study how to enable service robots to use
human multi-modal data to understand object manipulation actions, and how they
can recognize such actions when humans perform them during human-robot
collaboration tasks. The multi-modal data in this study consist of videos,
hand motion data, applied forces as represented by the pressure patterns on the
hand, and measurements of the bending of the fingers, collected as human
subjects performed manipulation actions. We investigate two different
approaches. In the first, we show that the multi-modal signal (motion, finger
bending and hand pressure) generated by the action can be decomposed into a set
of primitives that can be seen as its building blocks. These primitives are
used to define 24 multi-modal primitive features, which can in turn serve as an
abstract representation of the multi-modal signal and be employed for action
recognition. In the second approach, visual features are extracted from the
data using a pre-trained deep convolutional neural network for image
classification, and these features are subsequently used to train the
classifier. We also investigate whether adding data from other modalities
produces a statistically significant improvement in classifier performance. We
show that both approaches produce comparable performance. This implies that
image-based methods can successfully recognize human actions during human-robot
collaboration. On the other hand, to provide training data from which the robot
can learn how to perform object manipulation actions, multi-modal data provide
a better alternative.
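The abstract's second approach (pre-trained image features feeding a conventional classifier) follows a standard transfer-learning pattern. A minimal sketch, assuming a ResNet-18 backbone and a logistic-regression classifier since the paper does not name either component:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.linear_model import LogisticRegression

# Pre-trained backbone with the classification head removed, so a
# forward pass yields a 512-d feature vector per frame.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def visual_features(frames):
    """frames: list of PIL images sampled from one action video."""
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch).mean(dim=0).numpy()   # average-pool over time

# Hypothetical training step over labelled videos:
# X = [visual_features(v) for v in train_videos]
# clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
```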
A Comparative Study of Speculative Retrieval for Multi-Modal Data Trails: Towards User-Friendly Human-Vehicle Interactions
In the era of growing developments in Autonomous Vehicles, the importance of Human-Vehicle Interaction has become apparent. However, retrieving in-vehicle drivers' multi-modal data trails by means of embedded sensors has been considered user-unfriendly and impractical. Hence, speculative designs for in-vehicle multi-modal data retrieval are demanded for future personalized and intelligent Human-Vehicle Interaction. In this paper, we explore the feasibility of using facial recognition techniques to build in-vehicle multi-modal data retrieval. We first perform a comprehensive user study to collect relevant data and extra trails through sensors, cameras and a questionnaire. We then build a complete pipeline based on Convolutional Neural Networks to predict values in three particular categories of multi-modal data, namely Heart Rate, Skin Conductance and Vehicle Speed, taking facial expressions as the sole input. We further evaluate and validate the pipeline's effectiveness on the data set, which suggests a promising future for Speculative Designs for Multi-modal Data Retrieval through this approach.
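The paper's exact network is not specified in the abstract; a minimal sketch of the kind of face-to-signal regressor it describes might look like the following, where every layer size, the loss and the model name are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class FaceToSignals(nn.Module):
    """Hypothetical CNN mapping a face crop to three regression targets:
    heart rate, skin conductance and vehicle speed."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, 3)   # [heart_rate, skin_conductance, speed]

    def forward(self, x):               # x: (batch, 3, H, W) face crops
        return self.head(self.features(x))

model = FaceToSignals()
loss_fn = nn.MSELoss()                  # regress against sensor ground truth
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```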
A multi-modal data resource for investigating topographic heterogeneity in patient-derived xenograft tumors.
Patient-derived xenografts (PDXs) are an essential pre-clinical resource for investigating tumor biology. However, cellular heterogeneity within and across PDX tumors can strongly impact the interpretation of PDX studies. Here, we generated a multi-modal, large-scale dataset to investigate PDX heterogeneity in metastatic colorectal cancer (CRC) across tumor models, spatial scales, and genomic, transcriptomic, proteomic and imaging assay modalities. To showcase this dataset, we present analyses assessing sources of PDX variation, including anatomical orientation within the implanted tumor, mouse contribution, and differences between replicate PDX tumors. A unique aspect of our dataset is the deep characterization of intra-tumor heterogeneity via immunofluorescence imaging, which enables investigation of variation across multiple spatial scales, from the subcellular to the whole-tumor level. Our study provides a benchmark data resource for investigating PDX models of metastatic CRC and serves as a template for future quantitative investigations of spatial heterogeneity within and across PDX tumor models.