On-line adaptation of the SCHMM parameters based on the segmental quasi-bayes learning for speech recognition
On-line quasi-Bayes adaptation of the mixture coefficients and mean vectors in a semicontinuous hidden Markov model (SCHMM) is studied. The viability of the proposed algorithm is confirmed, and the related practical issues are addressed, in a specific application of on-line speaker adaptation using a 26-word English alphabet vocabulary.
A Cognitive and Unsupervised MAP Adaptation Approach to the Recognition of the Focus of Attention from Head Pose
In this paper, the recognition of the visual focus of attention (VFOA) of meeting participants (as defined by their eye gaze direction) from their head pose is addressed. To this end, the head pose observations are modeled using a Hidden Markov Model (HMM) whose hidden states correspond to the VFOA. The novelties are threefold. First, contrary to previous studies on the topic, in our set-up the potential VFOA of a person is not restricted to other participants only, but includes environmental targets (a table and a projection screen), which increases the complexity of the task, with more VFOA targets spread across both the pan and the tilt dimensions of the gaze space. Second, the HMM parameters are set by exploiting results from cognitive science on saccadic eye motion, which allow the head pose to be predicted given an actual gaze target. Third, an unsupervised parameter adaptation step is proposed which accounts for the specific gazing behaviour of each participant. Using a publicly available corpus of 8 meetings featuring 4 persons, we analyze the above methods by evaluating, through objective performance measures, the recognition of the VFOA from head pose information obtained using either a magnetic sensor device or a vision-based tracking system.
Bayesian adaptive learning of the parameters of hidden Markov model for speech recognition
A theoretical framework for Bayesian adaptive training of the parameters of a discrete hidden Markov model (DHMM) and of a semi-continuous HMM (SCHMM) with Gaussian mixture state observation densities is presented. In addition to formulating the forward-backward MAP (maximum a posteriori) and the segmental MAP algorithms for estimating the above HMM parameters, a computationally efficient segmental quasi-Bayes algorithm for estimating the state-specific mixture coefficients in SCHMM is developed. For estimating the parameters of the prior densities, a new empirical Bayes method based on the moment estimates is also proposed. The MAP algorithms and the prior parameter specification are directly applicable to training speaker adaptive HMMs. Practical issues related to the use of the proposed techniques for HMM-based speaker adaptation are studied. The proposed MAP algorithms are shown to be especially effective when the training or adaptation data are limited.
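As a rough illustration of the kind of recursive update that Bayesian learning of mixture coefficients involves, the following Python sketch keeps Dirichlet hyperparameters for the mixture weights of a single state, adds soft occupancy counts from each new batch of data, and reads off the mode of the resulting density as the MAP weight estimate. It is a minimal sketch under stated assumptions and not the algorithm from the paper: the function names, the prior values, and the toy counts are all hypothetical.

```python
def update_hyperparameters(nu, counts):
    """Add the soft occupancy counts of a new batch to the Dirichlet hyperparameters."""
    return [n + c for n, c in zip(nu, counts)]


def map_weights(nu):
    """Mode of a Dirichlet(nu) density; defined here only when every nu_k > 1."""
    if any(n <= 1.0 for n in nu):
        raise ValueError("the Dirichlet mode requires every hyperparameter > 1")
    total = sum(nu) - len(nu)
    return [(n - 1.0) / total for n in nu]


# Hypothetical prior over three mixture components (assumed values).
nu = [2.0, 2.0, 2.0]

# Hypothetical soft counts from two successive adaptation batches.
for counts in ([3.0, 1.0, 0.0], [2.0, 0.5, 0.5]):
    nu = update_hyperparameters(nu, counts)

weights = map_weights(nu)
print(weights)
```

Because the prior and the counts enter additively, a batch with few observations moves the estimate only slightly away from the prior, which is consistent with the abstract's remark about limited adaptation data.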
Particle-kernel estimation of the filter density in state-space models
Sequential Monte Carlo (SMC) methods, also known as particle filters, are
simulation-based recursive algorithms for the approximation of the a posteriori
probability measures generated by state-space dynamical models. At any given
time, an SMC method produces a set of samples over the state space of the
system of interest (often termed "particles") that is used to build a discrete
and random approximation of the posterior probability distribution of the state
variables, conditional on a sequence of available observations. One potential
application of the methodology is the estimation of the densities associated with
the sequence of a posteriori distributions. While practitioners have rather
freely applied such density approximations in the past, the issue has received
less attention from a theoretical perspective. In this paper, we address the
problem of constructing kernel-based estimates of the posterior probability
density function and its derivatives, and obtain asymptotic convergence results
for the estimation errors. In particular, we find convergence rates for the
approximation errors that hold uniformly on the state space and guarantee that
the error vanishes almost surely as the number of particles in the filter
grows. Based on this uniform convergence result, we first show how to build
continuous measures that converge almost surely (with known rate) toward the
posterior measure and then address a few applications. The latter include
maximum a posteriori estimation of the system state using the approximate
derivatives of the posterior density and the approximation of functionals of
it, for example, Shannon's entropy.
This manuscript is identical to the published paper, including a gap in the
proof of Theorem 4.2. The Theorem itself is correct. We provide an erratum
at the end of this document with a complete proof and a brief discussion.
Published in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International
Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm) at
http://dx.doi.org/10.3150/13-BEJ545.
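To make the idea of a kernel-based density estimate built from particles more concrete, here is a minimal one-dimensional Python sketch. It assumes equally weighted particles, a Gaussian kernel, and a fixed bandwidth. None of these choices is taken from the paper, and the particles are drawn directly from a standard normal as a stand-in for the output of a particle filter. The function names are hypothetical.

```python
import math
import random


def kernel_density(x, particles, bandwidth):
    """Gaussian kernel estimate of a 1-D density at x from equally weighted particles."""
    norm = 1.0 / (len(particles) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in particles
    )


def kernel_density_derivative(x, particles, bandwidth):
    """Derivative with respect to x of the kernel estimate above."""
    norm = 1.0 / (len(particles) * bandwidth ** 3 * math.sqrt(2.0 * math.pi))
    return norm * sum(
        -(x - p) * math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in particles
    )


random.seed(0)
# Stand-in particles (assumption): samples from N(0, 1), not from a real filter.
particles = [random.gauss(0.0, 1.0) for _ in range(2000)]
bandwidth = 0.3  # assumed fixed bandwidth

# Crude approximate MAP estimate: maximise the density estimate over a grid.
grid = [-4.0 + 0.1 * i for i in range(81)]
x_map = max(grid, key=lambda x: kernel_density(x, particles, bandwidth))

print(x_map, kernel_density(0.0, particles, bandwidth))
```

The grid search is only the simplest way to locate a maximum. The derivative function could instead drive a gradient-based search, which is closer in spirit to the abstract's mention of using approximate derivatives of the posterior density.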
Automatic Person Verification Using Speech and Face Information
Interest in biometric-based identification and verification systems has increased considerably over the last decade. As an example, the shortcomings of security systems based on passwords can be addressed through the supplemental use of biometric systems based on speech signals, face images or fingerprints. Biometric recognition can also be applied to other areas, such as passport control (immigration checkpoints), forensic work (to determine whether a biometric sample belongs to a suspect) and law enforcement applications (e.g. surveillance). While biometric systems based on face images and/or speech signals can be useful, their performance can degrade in the presence of challenging conditions. In face-based systems this can be in the form of a change in the illumination direction and/or face pose variations. Multi-modal systems use more than one biometric at the same time. This is done for two main reasons: to achieve better robustness and to increase discrimination power. This thesis reviews relevant background in speech and face processing, as well as information fusion. It reports research aimed at increasing the robustness of single- and multi-modal biometric identity verification systems. In particular, it addresses the illumination and pose variation problems in face recognition, as well as the challenge of effectively fusing information from multiple modalities under non-ideal conditions.