11,337 research outputs found
Analysis of a Modern Voice Morphing Approach using Gaussian Mixture Models for Laryngectomees
This paper proposes a voice morphing system for people suffering from
Laryngectomy, which is the surgical removal of all or part of the larynx or the
voice box, particularly performed in cases of laryngeal cancer. A primitive
method of achieving voice morphing is by extracting the source's vocal
coefficients and then converting them into the target speaker's vocal
parameters. In this paper, we deploy Gaussian Mixture Models (GMM) for mapping
the coefficients from source to destination. However, the use of the
traditional/conventional GMM-based mapping approach results in the problem of
over-smoothening of the converted voice. Thus, we hereby propose a unique
method to perform efficient voice morphing and conversion based on GMM,which
overcomes the traditional-method effects of over-smoothening. It uses a
technique of glottal waveform separation and prediction of excitations and
hence the result shows that not only over-smoothening is eliminated but also
the transformed vocal tract parameters match with the target. Moreover, the
synthesized speech thus obtained is found to be of a sufficiently high quality.
Thus, voice morphing based on a unique GMM approach has been proposed and also
critically evaluated based on various subjective and objective evaluation
parameters. Further, an application of voice morphing for Laryngectomees which
deploys this unique approach has been recommended by this paper.Comment: 6 pages, 4 figures, 4 tables; International Journal of Computer
Applications Volume 49, Number 21, July 201
Simultaneous Multispeaker Segmentation for Automatic Meeting Recognition
Vocal activity detection is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, and standard vocal activity detection algorithms for close-talk microphones have shown to be ineffective. This is primarily due to the problem of crosstalk, in which a participant’s speech appears on other participants ’ microphones, making it hard to attribute detected speech to its correct speaker. We describe an automatic multichannel segmentation system for meeting recognition, which accounts for both the observed acoustics and the inferred vocal activity states of all participants using joint multi-participant models. Our experiments show that this approach almost completely eliminates the crosstalk problem. Recent improvements to the baseline reduce the development set word error rate, achieved by a state-of-theart multi-pass speech recognition system, by 62 % relative to manual segmentation. We also observe significant performance improvements on unseen data
Military applications of automatic speech recognition and future requirements
An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit
- …