12,846 research outputs found
Robust ASR using Support Vector Machines
The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units.
In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.Publicad
SVMs for Automatic Speech Recognition: a Survey
Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, nowadays, the preponderance of Markov Models is a fact.
During the last decade, however, a new tool appeared in the field of machine learning that has proved to be able to cope with hard classification problems in several fields of application: the Support Vector Machines (SVMs). The SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is that with maximum margin; they are capable to deal with samples of a very higher dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weakness in the ASR context and make a review of the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable to deal with sequences of different length. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research
Automatic skin segmentation for gesture recognition combining region and support vector machine active learning
Skin segmentation is the cornerstone of many applications such as gesture recognition, face detection, and objectionable image filtering. In this paper, we attempt to address the skin segmentation problem for gesture recognition. Initially, given a gesture video sequence, a generic skin model is applied to the first couple of frames to automatically collect the training data. Then, an SVM classifier based on active learning is used to identify the skin pixels. Finally, the results are improved by incorporating region segmentation. The proposed algorithm is fully automatic and adaptive to different signers. We have tested our approach on the ECHO database. Comparing with other existing algorithms, our method could achieve better performance
Discovery and recognition of motion primitives in human activities
We present a novel framework for the automatic discovery and recognition of
motion primitives in videos of human activities. Given the 3D pose of a human
in a video, human motion primitives are discovered by optimizing the `motion
flux', a quantity which captures the motion variation of a group of skeletal
joints. A normalization of the primitives is proposed in order to make them
invariant with respect to a subject anatomical variations and data sampling
rate. The discovered primitives are unknown and unlabeled and are
unsupervisedly collected into classes via a hierarchical non-parametric Bayes
mixture model. Once classes are determined and labeled they are further
analyzed for establishing models for recognizing discovered primitives. Each
primitive model is defined by a set of learned parameters.
Given new video data and given the estimated pose of the subject appearing on
the video, the motion is segmented into primitives, which are recognized with a
probability given according to the parameters of the learned models.
Using our framework we build a publicly available dataset of human motion
primitives, using sequences taken from well-known motion capture datasets. We
expect that our framework, by providing an objective way for discovering and
categorizing human motion, will be a useful tool in numerous research fields
including video analysis, human inspired motion generation, learning by
demonstration, intuitive human-robot interaction, and human behavior analysis
Joint segmentation of multivariate time series with hidden process regression for human activity recognition
The problem of human activity recognition is central for understanding and
predicting the human behavior, in particular in a prospective of assistive
services to humans, such as health monitoring, well being, security, etc. There
is therefore a growing need to build accurate models which can take into
account the variability of the human activities over time (dynamic models)
rather than static ones which can have some limitations in such a dynamic
context. In this paper, the problem of activity recognition is analyzed through
the segmentation of the multidimensional time series of the acceleration data
measured in the 3-d space using body-worn accelerometers. The proposed model
for automatic temporal segmentation is a specific statistical latent process
model which assumes that the observed acceleration sequence is governed by
sequence of hidden (unobserved) activities. More specifically, the proposed
approach is based on a specific multiple regression model incorporating a
hidden discrete logistic process which governs the switching from one activity
to another over time. The model is learned in an unsupervised context by
maximizing the observed-data log-likelihood via a dedicated
expectation-maximization (EM) algorithm. We applied it on a real-world
automatic human activity recognition problem and its performance was assessed
by performing comparisons with alternative approaches, including well-known
supervised static classifiers and the standard hidden Markov model (HMM). The
obtained results are very encouraging and show that the proposed approach is
quite competitive even it works in an entirely unsupervised way and does not
requires a feature extraction preprocessing step
- …