2,543 research outputs found
Employing Emotion Cues to Verify Speakers in Emotional Talking Environments
Usually, people talk neutrally in environments where there are no abnormal
talking conditions such as stress and emotion. Other emotional conditions that
might affect people talking tone like happiness, anger, and sadness. Such
emotions are directly affected by the patient health status. In neutral talking
environments, speakers can be easily verified, however, in emotional talking
environments, speakers cannot be easily verified as in neutral talking ones.
Consequently, speaker verification systems do not perform well in emotional
talking environments as they do in neutral talking environments. In this work,
a two-stage approach has been employed and evaluated to improve speaker
verification performance in emotional talking environments. This approach
employs speaker emotion cues (text-independent and emotion-dependent speaker
verification problem) based on both Hidden Markov Models (HMMs) and
Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. The approach is
comprised of two cascaded stages that combines and integrates emotion
recognizer and speaker recognizer into one recognizer. The architecture has
been tested on two different and separate emotional speech databases: our
collected database and Emotional Prosody Speech and Transcripts database. The
results of this work show that the proposed approach gives promising results
with a significant improvement over previous studies and other approaches such
as emotion-independent speaker verification approach and emotion-dependent
speaker verification approach based completely on HMMs.Comment: Journal of Intelligent Systems, Special Issue on Intelligent
Healthcare Systems, De Gruyter, 201
Language Identification Using Visual Features
Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to iden- tify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments), or impossible (no audio signal is available). Research in this field is also beneficial in the related field of automatic lip-reading. This paper introduces several methods for visual language identification (VLID). They are based upon audio LID techniques, which exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discrimi- nating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artefacts, either visual (e.g. skin-tone) or audio (e.g. rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we can obtain an error-rate of < 10% in discriminating between Arabic and English on 19 speakers and using about 30s of visual speech
Speaker recognition using frequency filtered spectral energies
The spectral parameters that result from filtering the
frequency sequence of log mel-scaled filter-bank energies
with a simple first or second order FIR filter have proved
to be an efficient speech representation in terms of both
speech recognition rate and computational load. Recently,
the authors have shown that this frequency filtering can
approximately equalize the cepstrum variance enhancing
the oscillations of the spectral envelope curve that are
most effective for discrimination between speakers. Even
better speaker identification results than using melcepstrum
have been obtained on the TIMIT database,
especially when white noise was added. On the other
hand, the hybridization of both linear prediction and
filter-bank spectral analysis using either cepstral
transformation or the alternative frequency filtering has
been explored for speaker verification. The combination
of hybrid spectral analysis and frequency filtering, that
had shown to be able to outperform the conventional
techniques in clean and noisy word recognition, has yield
good text-dependent speaker verification results on the
new speaker-oriented telephone-line POLYCOST
database.Peer ReviewedPostprint (published version
- …