2,820 research outputs found
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied in the name of `model adaptation'.
Recent advance in deep learning shows that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research towards this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.Comment: 13 pages, APSIPA 201
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users in the long term, requires an
understanding of the dynamics of symbol systems and is crucially important. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and a
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensory--motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.Comment: submitted to Advanced Robotic
Machine learning methods for sign language recognition: a critical review and analysis.
Sign language is an essential tool to bridge the communication gap between normal and hearing-impaired people. However, the diversity of over 7000 present-day sign languages with variability in motion position, hand shape, and position of body parts making automatic sign language recognition (ASLR) a complex system. In order to overcome such complexity, researchers are investigating better ways of developing ASLR systems to seek intelligent solutions and have demonstrated remarkable success. This paper aims to analyse the research published on intelligent systems in sign language recognition over the past two decades. A total of 649 publications related to decision support and intelligent systems on sign language recognition (SLR) are extracted from the Scopus database and analysed. The extracted publications are analysed using bibliometric VOSViewer software to (1) obtain the publications temporal and regional distributions, (2) create the cooperation networks between affiliations and authors and identify productive institutions in this context. Moreover, reviews of techniques for vision-based sign language recognition are presented. Various features extraction and classification techniques used in SLR to achieve good results are discussed. The literature review presented in this paper shows the importance of incorporating intelligent solutions into the sign language recognition systems and reveals that perfect intelligent systems for sign language recognition are still an open problem. Overall, it is expected that this study will facilitate knowledge accumulation and creation of intelligent-based SLR and provide readers, researchers, and practitioners a roadmap to guide future direction
Temporal superimposed crossover module for effective continuous sign language
The ultimate goal of continuous sign language recognition(CSLR) is to
facilitate the communication between special people and normal people, which
requires a certain degree of real-time and deploy-ability of the model.
However, in the previous research on CSLR, little attention has been paid to
the real-time and deploy-ability. In order to improve the real-time and
deploy-ability of the model, this paper proposes a zero parameter, zero
computation temporal superposition crossover module(TSCM), and combines it with
2D convolution to form a "TSCM+2D convolution" hybrid convolution, which
enables 2D convolution to have strong spatial-temporal modelling capability
with zero parameter increase and lower deployment cost compared with other
spatial-temporal convolutions. The overall CSLR model based on TSCM is built on
the improved ResBlockT network in this paper. The hybrid convolution of
"TSCM+2D convolution" is applied to the ResBlock of the ResNet network to form
the new ResBlockT, and random gradient stop and multi-level CTC loss are
introduced to train the model, which reduces the final recognition WER while
reducing the training memory usage, and extends the ResNet network from image
classification task to video recognition task. In addition, this study is the
first in CSLR to use only 2D convolution extraction of sign language video
temporal-spatial features for end-to-end learning for recognition. Experiments
on two large-scale continuous sign language datasets demonstrate the
effectiveness of the proposed method and achieve highly competitive results.Comment: 10 pages, 7 figure
Dual sticky hierarchical Dirichlet process hidden Markov model and its application to natural language description of motions
In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov modle (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. The number of HMMs and the number of topics are both automatically determined. The sticky prior avoids redundant states and makes our HDP-HMM more effective to model multimodal observations. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. The sources and sinks in the scene are learnt by clustering endpoints (origins and destinations of trajectories). The semantic motion regions are learnt using the points in trajectories. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequences of atomic activities. the action represented by the trajectory can be described in natural language in as autometic a way as possible.The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene
Fully Convolutional Networks for Continuous Sign Language Recognition
Continuous sign language recognition (SLR) is a challenging task that
requires learning on both spatial and temporal dimensions of signing frame
sequences. Most recent work accomplishes this by using CNN and RNN hybrid
networks. However, training these networks is generally non-trivial, and most
of them fail in learning unseen sequence patterns, causing an unsatisfactory
performance for online recognition. In this paper, we propose a fully
convolutional network (FCN) for online SLR to concurrently learn spatial and
temporal features from weakly annotated video sequences with only
sentence-level annotations given. A gloss feature enhancement (GFE) module is
introduced in the proposed network to enforce better sequence alignment
learning. The proposed network is end-to-end trainable without any
pre-training. We conduct experiments on two large scale SLR datasets.
Experiments show that our method for continuous SLR is effective and performs
well in online recognition.Comment: Accepted to ECCV202
- …