Voice integrated systems
The program at the Naval Air Development Center was initiated to determine the desirability of interactive voice systems for use in airborne weapon systems crew stations. A voice recognition and synthesis system (VRAS) was developed and incorporated into a human centrifuge. The speech recognition component of VRAS was built on a voice command system (VCS) developed by Scope Electronics. The speech synthesis capability was supplied by a Votrax VS-5 speech synthesis unit built by Vocal Interface. The effects of simulated flight on automatic speech recognition were determined by repeated trials in the VRAS-equipped centrifuge. The relationship between voice quality and vibration, G, O2 mask use, mission duration, and cockpit temperature was determined. The results showed that: (1) voice quality degrades after 0.5 hours with an O2 mask; (2) voice quality degrades under high vibration; and (3) voice quality degrades under high levels of G. The voice quality studies are summarized. These results were obtained with a baseline of 80 percent recognition accuracy with VCS.
Jitter and Shimmer measurements for speaker diarization
Jitter and shimmer voice quality features have been successfully used to characterize speaker voice traits and detect voice pathologies. Jitter and shimmer measure variations in the fundamental frequency and amplitude of a speaker's voice, respectively. Because they capture speaker-specific traits, they can be used to assess differences between speakers. In this paper, we investigate the usefulness of these voice quality features in the task of speaker diarization. The combination of voice quality features with conventional spectral features, Mel-Frequency Cepstral Coefficients (MFCC), is addressed in the framework of the Augmented Multiparty Interaction (AMI) corpus, a set of multi-party, spontaneous speech recordings. The two sets of features are modeled independently using mixtures of Gaussians and fused at the score likelihood level. Experiments carried out on the AMI corpus show that adding jitter and shimmer measurements to the baseline spectral features decreases the diarization error rate in most of the recordings. Peer reviewed. Postprint (published version).
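As an illustration of what these features measure, here is a minimal sketch of one common definition, local (relative) jitter and shimmer: the mean absolute difference between consecutive cycles divided by the mean value. The abstract does not say which jitter and shimmer variants the paper uses, so the function names and the choice of the local variant are assumptions. The inputs are assumed to be per-cycle pitch periods and peak amplitudes already extracted from voiced speech.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    Assumption: `values` holds one measurement per glottal cycle
    (pitch periods for jitter, peak amplitudes for shimmer).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def jitter_local(periods):
    """Cycle-to-cycle variation of the pitch period (hypothetical helper name)."""
    return local_perturbation(periods)


def shimmer_local(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (hypothetical helper name)."""
    return local_perturbation(amplitudes)
```

A perfectly periodic signal gives zero for both measures; larger values indicate more cycle-to-cycle irregularity.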
Voice quality estimation in combined radio-VoIP networks for dispatching systems
Voice quality modelling, assessment, and planning are well understood, in both theory and practice, for common voice communication systems, especially public fixed and mobile telephone networks, including Next Generation Networks (NGN, Internet Protocol based networks). This article seeks to contribute to voice quality modelling, assessment, and planning for dispatching communication systems based on Internet Protocol (IP) and private radio networks. The network plan, a correction to the E-model calculation, and default values for the model are presented and discussed.
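For context, the E-model (ITU-T G.107) produces a transmission rating factor R that is conventionally mapped to an estimated mean opinion score (MOS). The sketch below shows only that standard mapping. The article's own correction and default values are not given in the abstract and are not reproduced here; the function name is an assumption.

```python
def r_to_mos(r):
    """Map an E-model rating factor R to an estimated MOS.

    Standard ITU-T G.107 mapping; this is not the correction
    proposed in the article, which the abstract does not specify.
    """
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
```

With the G.107 default rating of about 93.2, this mapping gives a MOS of roughly 4.4.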
Automation of reference point assignment in voice morphing
This paper reports an automatic reference point placement method for voice morphing. Voice morphing is a fundamental voice editing method that blends the feature vector sequences of two voices based on corresponding reference points. Reference points are usually assigned by hand, and the quality of the morphing output depends on them. Moreover, assigning reference points is a time-consuming task. The proposed method automatically assigns reference points on the spectrogram in the time and frequency domains, based on temporal decomposition (TD) and line spectral frequencies (LSF). In morphing experiments with two speakers, the proposed method worked well using a voice and its transcription as inputs.
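To show how reference points are typically used once they exist, here is a minimal sketch of time-axis alignment and feature blending. It is not the paper's TD/LSF placement method. It assumes the reference points are already given as matching, strictly increasing time stamps in the two voices, and that the blend is a plain linear interpolation; all function and parameter names are hypothetical.

```python
from bisect import bisect_right


def warp_time(t, src_points, dst_points):
    """Map time t on voice A's axis to voice B's axis.

    Assumption: src_points and dst_points are corresponding reference
    points, strictly increasing, with at least two entries each.
    The mapping is piecewise linear between neighbouring points.
    """
    i = bisect_right(src_points, t) - 1
    i = max(0, min(i, len(src_points) - 2))  # clamp to a valid segment
    s0, s1 = src_points[i], src_points[i + 1]
    d0, d1 = dst_points[i], dst_points[i + 1]
    return d0 + (t - s0) * (d1 - d0) / (s1 - s0)


def blend(vec_a, vec_b, rate):
    """Linearly interpolate two feature vectors; rate=0 gives A, rate=1 gives B."""
    return [(1.0 - rate) * a + rate * b for a, b in zip(vec_a, vec_b)]
```

In use, each frame of voice A would be paired with the frame of voice B at the warped time, and the paired feature vectors blended at the chosen morphing rate.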
