15 research outputs found

    Increased diphone recognition for an Afrikaans TTS system

    In this paper we discuss the implementation of an Afrikaans TTS system that is based on diphones. Using diphones makes the system flexible but presents other challenges. A previous Afrikaans TTS system was designed by SUN and was based on full words. A full-word TTS system produces more natural-sounding speech than systems designed using other techniques, but it lacks flexibility. The baseline system was built using the Festival Speech Synthesis System. Problems occurred in the baseline due to the mislabeling of diphones and the diphone index. The system was improved by manually labeling the diphones using Wavesurfer, and by changing the diphone index. Wavelength comparison tests were done on the diphone index to show how many of the diphones are recognized during synthesis. For the diphones tested, results show an average improvement of 38% in the recognition of diphones compared to the baseline. These changes improve the overall quality of the system.

    Evaluating microphone arrays for a speaker identification task

    Microphone array systems have been an area of active research for several years. The potential for high-quality hands-free speech acquisition in noisy and reflecting environments makes microphone arrays an attractive alternative to conventional close-talking microphones. The signal-enhancement and source-location capabilities of microphone arrays make them applicable to a variety of tasks including teleconferencing, speaker tracking, speaker recognition and speech recognition. In this paper we evaluate techniques for setting up microphone arrays for speaker identification. We propose the use of an active noise canceling beamformer based on the generalized sidelobe canceller (GSC) beamformer. Significant improvements in identification rate are achieved using this method compared to the other beamforming techniques investigated in this paper.

    Preparation of Deaf end-users and the softbridge for semi-automated relay trials

    Following on the development of several prototypes, we have built a semi-automated Deaf Telephony prototype on the SoftBridge platform. This prototype relays text and speech between Deaf users on the Internet and hearing users on the telephone system. Previous work with a pilot trial in the laboratory revealed several opportunities for enhancement. We added a Wizard of Oz (WoOz) to replace the poorly performing automatic speech recognition functionality, as well as H.323 breakout, more extensive logging and advanced call initiation functionality. In order to trial the current prototype, we initiated an Information and Communication Technology (ICT) training programme with the Deaf Community of Cape Town. Twenty Deaf users participated in the training. In addition to the training, much baseline user data was collected to give an indication of how Deaf users communicate with hearing users as well as how familiar they are with ICT devices and services. The work for the rest of this year requires us to recruit and train a WoOz operator. Subsequent trials will essentially consist of monthly cycles of prototype introduction, training, automated metric and log collection, user feedback and then feature enhancement. The text output of the Deaf users will be analyzed linguistically. We hope to refine the SoftBridge prototype to fit the needs of the Deaf and hearing users, from both technical and social viewpoints. We expect that these iterative cycles will continue for some time and will teach us many lessons regarding multi-modal semi-synchronous IP-based communications systems. Telkom, Siemens, THRIP, SANPA

    AVONET: morphological, ecological and geographical data for all birds

    Functional traits offer a rich quantitative framework for developing and testing theories in evolutionary biology, ecology and ecosystem science. However, the potential of functional traits to drive theoretical advances and refine models of global change can only be fully realised when species‐level information is complete. Here we present the AVONET dataset containing comprehensive functional trait data for all birds, including six ecological variables, 11 continuous morphological traits, and information on range size and location. Raw morphological measurements are presented from 90,020 individuals of 11,009 extant bird species sampled from 181 countries. These data are also summarised as species averages in three taxonomic formats, allowing integration with a global phylogeny, geographical range maps, IUCN Red List data and the eBird citizen science database. The AVONET dataset provides the most detailed picture of continuous trait variation for any major radiation of organisms, offering a global template for testing hypotheses and exploring the evolutionary origins, structure and functioning of biodiversity

    Experiments On A Parametric Nonlinear Spectral Warping For An HMM-Based Speech Recognizer

    This paper is concerned with the search for an optimal feature-set for a speech recognition system. A better acoustic feature analysis that suitably enhances the semantic information in a consistent fashion can reduce raw-score (no grammar) error rate significantly. A simple two-dimensional parameterized feature-set is proposed. The feature-set is compared against a standard mel-cepstrum, LPC-based feature-set in a talker-independent, connected-alphadigit HMM-based recognizer. The results show that a particular combination of parameters yields a significantly lower error rate than the baseline mel-cepstrum LPC-based feature-set. 1. INTRODUCTION Recent research on speech-recognition systems has tended to focus on the back-end of an automatic speech recognition (ASR) system. Although this has been helpful, as indicated by improvements in recognition performance and shortening of training times, we are equally convinced that research on the front-end of an ASR system can improve performan..

    Using ABR-service for a call centre application: speaker recognition

    In this paper we look at the performance of speaker identification on speech that has been transported on an ABR-type service. This is simulated for an environment where minimum costs are essential and the packets are not retransmitted but discarded. Earlier work by Evans et al. [1] reported speaker-verification performance with losses applied to individual feature-sets of the speech processing system, not to packets as would be the case on an ATM network. Our approach is based on whole packets being lost and not just a 20ms speech frame. Our results show that packet loss is not as detrimental to performance as other kinds of noise, such as that from the telephone handset and the network. It is shown that even with a packet loss of about 10% the performance of the system is unaffected for clean speech. For noisy speech a packet loss of 10% reduces the performance of the system by about 40%.
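
The packet-level loss described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the function name `drop_packets`, the choice of four frames per packet, and the independent random loss model. The point it shows is that frames are lost in contiguous groups, one whole packet at a time, and that a lost packet is discarded and never retransmitted.

```python
import random


def drop_packets(frames, frames_per_packet, loss_rate, seed=0):
    """Group frames into packets and discard each packet with probability loss_rate.

    Hypothetical helper: the loss model (independent per packet) is an assumption.
    """
    rng = random.Random(seed)
    kept = []
    for start in range(0, len(frames), frames_per_packet):
        packet = frames[start:start + frames_per_packet]
        # A lost packet is discarded, never retransmitted.
        if rng.random() >= loss_rate:
            kept.extend(packet)
    return kept


# Stand-in for 1000 speech frames; real frames would be feature vectors.
frames = list(range(1000))
received = drop_packets(frames, frames_per_packet=4, loss_rate=0.10)
```

The surviving frames in `received` would then be passed to the speaker identification front-end in place of the full utterance.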

    Utterance Dependent Parametric Warping For A Talker-Independent HMM-Based Recognizer

    In an effort to improve recognition performance of talker-independent speech systems, many adaptive methods have been proposed. The methods generally seek to exploit the higher recognition performance rate of talker-dependent systems and extend it to talker-independent systems. This is achieved by some form of placing talkers into several categories, usually using gender or vocal-tract size. In this paper we investigate a similar idea, but categorize each utterance independently. An utterance is processed using several spectral compressions, and the compression with the maximum likelihood is then used to train a better model. For testing, the spectral compression with the maximum likelihood is used to decode the utterance. While the spectral compressions divided the utterances well, this did not translate into significant improvement in performance, and the computational cost increase was significant. 1. INTRODUCTION Research in improving the performance of speech recognition systems ..
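
The per-utterance selection step described in this abstract (try several spectral compressions, keep the one with the maximum likelihood) can be sketched as follows. This is a toy stand-in under stated assumptions, not the paper's method: the "compression" is a plain scaling of one-dimensional features, the model is a single Gaussian rather than an HMM, and the names `select_warp` and `gaussian_log_likelihood` are hypothetical.

```python
import math


def gaussian_log_likelihood(features, mean, var):
    """Log-likelihood of the features under a single Gaussian (toy stand-in for an HMM)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in features
    )


def select_warp(features, factors, mean, var):
    """Return the candidate factor whose warped features score the highest likelihood."""
    best_factor, best_score = None, -math.inf
    for factor in factors:
        # Assumed warp: simple scaling, standing in for a spectral compression.
        warped = [factor * x for x in features]
        score = gaussian_log_likelihood(warped, mean, var)
        if score > best_score:
            best_factor, best_score = factor, score
    return best_factor, best_score


# Toy utterance whose features sit above the model mean of 1.0.
utterance = [1.25, 1.30, 1.20, 1.25]
factors = [0.8, 0.9, 1.0, 1.1]
best, score = select_warp(utterance, factors, mean=1.0, var=0.01)
```

Each candidate requires a full scoring pass over the utterance, so the cost grows linearly with the number of compressions tried, which is consistent with the computational cost increase the abstract reports.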

    Analysis of LPC/DFT Features for an HMM-based Alphadigit Recognizer

    The search for better and more robust performance of speech recognition systems is ongoing. Much of the improvement is likely to come from better acoustic feature analysis. In this letter, the results from a significant experiment are reported; these show that a warped-DFT analysis significantly outperforms an LPC-cepstral analysis, supporting results by other researchers for different recognition tasks. An analysis of nasal-letter performance is used to show the development of the warped-DFT feature analysis. Keywords--- Cepstral Features, ANN, HMM. I. Introduction Different types of hidden Markov model (HMM)-based algorithms have been used successfully in speech recognition systems along with artificial neural networks (ANN), dynamic time warping (DTW) and template matching (TM) algorithms. In all these systems, the properties of the feature set play a crucial role. In this letter, an HMM-based explicit-duration, talker-independent, connected-alphadigit recognizer is use..