Voice data entry in air traffic control
Several of the keyboard data languages were tabulated and analyzed. The key language chosen as a test vehicle was that used by the nonradar or flight data controllers. This application was undertaken to minimize effort in a cost-efficient way and with less research and development
Real-time interactive speech technology at Threshold Technology, Incorporated
Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed.
The SJTU System for Short-duration Speaker Verification Challenge 2021
This paper presents the SJTU system for both the text-dependent and text-independent tasks in the Short-duration Speaker Verification (SdSV) Challenge 2021. In this challenge, we explored different strong embedding extractors to extract robust speaker embeddings. For the text-independent task, language-dependent adaptive snorm is explored to improve system performance under the cross-lingual verification condition. For the text-dependent task, we mainly focus on in-domain fine-tuning strategies based on a model pre-trained on large-scale out-of-domain data. To improve the distinction between different speakers uttering the same phrase, we propose several novel phrase-aware fine-tuning strategies and a phrase-aware neural PLDA. With these strategies, system performance is further improved. Finally, we fused the scores of different systems, and our fusion systems achieved 0.0473 in Task1 (rank 3) and 0.0581 in Task2 (rank 8) on the primary evaluation metric. Comment: Published by Interspeech 202
Robust ASR using Support Vector Machines
The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives due to their max-margin training paradigm have led us to suggest them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units.
In this paper, we have compared two approaches in noisy environments: first, a hybrid HMM–SVM solution where a fixed number of frames is selected by means of an HMM segmentation and second, a normalisation kernel called Dynamic Time Alignment Kernel (DTAK) first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisations and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.
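The abstract above relies on DTW to compare speech units of different durations. As a rough illustration only, the following is a minimal sketch of plain DTW between two sequences of feature vectors. It is not the DTAK of Shimodaira et al., whose definition is given in the cited paper; the function name `dtw_distance`, the Euclidean local distance, and the unconstrained step pattern are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Plain DTW cost between two sequences of equal-dimension feature vectors.

    Illustrative assumptions: Euclidean local distance, steps limited to
    (i-1, j), (i, j-1) and (i-1, j-1), and no path-length normalisation.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # frame-to-frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of b repeated
                cost[i][j - 1],      # frame of a repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Two realisations of the same pattern with different durations.
short = [[0.0], [1.0], [2.0]]
long_ = [[0.0], [1.0], [1.0], [2.0]]
cost_same = dtw_distance(short, long_)
```

Because the alignment may repeat frames, the two realisations above align at zero cost despite their different lengths, which is the duration-normalising property the abstract refers to.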