1,224 research outputs found

    Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition

    In this paper we address the problem of automatic speech recognition (ASR) when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: acoustic environment, speech coding and transmission errors. While the first has already received a lot of attention, the last two deserve further investigation in our opinion. We have found that band-pass filtering of the recognition features improves ASR performance when distortions due to these particular communication systems are present. Furthermore, we have evaluated two alternative configurations at different bit error rates (BER) typical of these channels: band-pass filtering the LP-MFCC parameters and a modification of RASTA-PLP using a sharper low-pass section. These perform consistently better than LP-MFCC and RASTA-PLP, respectively.
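    As a rough illustration of band-pass filtering the time sequences of spectral parameters, the sketch below filters each feature coefficient along the frame axis. The abstract does not give the filter used; the coefficients here are those of the commonly cited RASTA-style band-pass filter, and both the function name `bandpass_trajectories` and the pole value of 0.94 are assumptions made for this example.

```python
def bandpass_trajectories(frames, pole=0.94):
    """Band-pass filter each feature coefficient along time.

    frames: list of feature vectors (one list of floats per frame).
    Assumed filter (RASTA-style, not taken from the paper):
        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
    """
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]  # FIR part; sums to zero, so DC is removed
    n_coef = len(frames[0])
    out = [[0.0] * n_coef for _ in frames]
    for k in range(n_coef):              # one trajectory per coefficient
        prev_y = 0.0
        for t in range(len(frames)):
            fir = sum(numer[i] * frames[t - i][k]
                      for i in range(len(numer)) if t - i >= 0)
            y = fir + pole * prev_y      # single-pole recursive (low-pass) part
            out[t][k] = y
            prev_y = y
    return out


if __name__ == "__main__":
    # A constant trajectory is a pure DC component and should decay towards zero.
    constant = [[1.0, -2.0] for _ in range(200)]
    filtered = bandpass_trajectories(constant)
    print(filtered[0], filtered[-1])
```

    Moving the pole changes the low-frequency cut-off of the filter, which is one knob a comparison of filter variants could turn.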

    A Comparison of Front-Ends for Bitstream-Based ASR over IP

    Automatic speech recognition (ASR) is called to play a relevant role in the provision of spoken interfaces for IP-based applications. However, as a consequence of the transit of the speech signal over these particular networks, ASR systems need to face two new challenges: the impoverishment of the speech quality due to the compression needed to fit the channel capacity and the inevitable occurrence of packet losses. In this framework, bitstream-based approaches that obtain the ASR feature vectors directly from the coded bitstream, avoiding the speech decoding process, have been proposed ([S.H. Choi, H.K. Kim, H.S. Lee, Speech recognition using quantized LSP parameters and their transformations in digital communications, Speech Commun. 30 (4) (2000) 223–233; A. Gallardo-Antolín, C. Peláez-Moreno, F. Díaz-de-María, Recognizing GSM digital speech, IEEE Trans. Speech Audio Process., to appear; H.K. Kim, R.V. Cox, R.C. Rose, Performance improvement of a bitstream-based front-end for wireless speech recognition in adverse environments, IEEE Trans. Speech Audio Process. 10 (8) (2002) 591–604; C. Peláez-Moreno, A. Gallardo-Antolín, F. Díaz-de-María, Recognizing voice over IP networks: a robust front-end for speech recognition on the WWW, IEEE Trans. Multimedia 3 (2) (2001) 209–218], among others) to improve the robustness of ASR systems. LSP (Line Spectral Pairs) are the preferred set of parameters for the description of the speech spectral envelope in most modern speech coders. Nevertheless, LSP have proved to be unsuitable for ASR, and they must be transformed into cepstrum-type parameters. In this paper we comparatively evaluate the robustness of the most significant LSP-to-cepstrum transformations in a simulated VoIP (voice over IP) environment which includes two of the most popular codecs used in that network (G.723.1 and G.729) and several network conditions. In particular, we compare ‘pseudocepstrum’ [H.K. Kim, S.H. Choi, H.S. Lee, On approximating Line Spectral Frequencies to LPC cepstral coefficients, IEEE Trans. Speech Audio Process. 8 (2) (2000) 195–199], an approximate but straightforward transformation of LSP into LP cepstral coefficients, with a more computationally demanding but exact one. Our results show that pseudocepstrum is preferable when network conditions are good or computational resources are low, while the exact procedure is recommended when network conditions become more adverse.
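    The two kinds of LSP-to-cepstrum transformation can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the pseudocepstrum expression is the form obtained by approximating ln A(z) by the average of ln P(z) and ln Q(z), the "exact" path is assumed to go through the LP coefficients (LSP to LPC, then the standard LPC-to-cepstrum recursion), the LP order is assumed even, and all function names are hypothetical.

```python
import math


def pseudocepstrum(lsf, n_cep):
    """Approximate cepstrum straight from line spectral frequencies (radians)."""
    return [(1 + (-1) ** n) / (2 * n) + sum(math.cos(n * w) for w in lsf) / n
            for n in range(1, n_cep + 1)]


def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsf_to_lpc(lsf):
    """Rebuild A(z) = 1 + a1 z^-1 + ... + ap z^-p from sorted LSFs (even p assumed)."""
    p_poly = [1.0, 1.0]    # symmetric polynomial P(z), fixed root at z = -1
    q_poly = [1.0, -1.0]   # antisymmetric polynomial Q(z), fixed root at z = +1
    for i, w in enumerate(lsf):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:     # 1st, 3rd, ... LSFs are roots of P(z)
            p_poly = poly_mul(p_poly, factor)
        else:              # 2nd, 4th, ... LSFs are roots of Q(z)
            q_poly = poly_mul(q_poly, factor)
    a = [(x + y) / 2.0 for x, y in zip(p_poly, q_poly)]
    return a[:len(lsf) + 1]   # the z^-(p+1) term cancels to zero


def lpc_to_cepstrum(a, n_cep):
    """Cepstrum of the all-pole model 1/A(z) via the standard recursion."""
    p = len(a) - 1
    c = [0.0] * (n_cep + 1)
    for n in range(1, n_cep + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


if __name__ == "__main__":
    lsf = [0.3, 0.6, 1.0, 1.4]   # made-up LSFs for a 4th-order model
    print(pseudocepstrum(lsf, 6))
    print(lpc_to_cepstrum(lsf_to_lpc(lsf), 6))
```

    Under these definitions the first coefficient of both transformations is the sum of cos(w_i), so they agree at n = 1 and differ from n = 2 onwards; the approximate route skips the polynomial reconstruction and the recursion, which is where its computational saving comes from.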

    An Application of SVM to Lost Packets Reconstruction in Voice-Enabled Services

    Voice over IP (VoIP) is becoming very popular due to the huge range of services that can be implemented by integrating different media (voice, audio, data, etc.). In addition, voice-enabled interfaces for those services are being very actively researched. Nevertheless, the impoverishment of voice quality due to packet losses severely affects the speech recognizers supporting those interfaces ([8]). In this paper, we compare the usual lost-packet reconstruction method with an SVM-based one that outperforms previous results.

    A Subvector-Based Error Concealment Algorithm for Speech Recognition over Mobile Networks


    Improving recognition accuracy on CVSD speech under mismatched conditions

    Emerging mobile communication technology is seeing increasingly high acceptance as a preferred choice for last-mile communication. A wide range of signal compression techniques has been developed to suit the smaller bandwidths available on mobile communication channels, but speech recognition methods have mostly succeeded only in controlled speech environments. However, designing speech recognition systems for mobile communications is crucial in order to provide voice-enabled command and control and for applications like Mobile Voice Commerce. Continuously Variable Slope Delta (CVSD) modulation, a technique for low-bitrate coding of speech, has been in use, particularly in military wireless environments, for over 30 years, and has now also been adopted by Bluetooth. CVSD is particularly suitable for Internet and mobile environments due to its robustness against transmission errors, its simplicity of implementation and the absence of a need for synchronization. In this paper, we study some characteristics of CVSD speech in the context of robust recognition of compressed speech, and present two methods of improving the recognition accuracy of Automatic Speech Recognition (ASR) systems. First, we study the characteristics of the features extracted for ASR and how they relate to the corresponding features computed from Pulse Code Modulation (PCM) speech, and apply this relation to correct the CVSD features to improve recognition accuracy. Second, we show that ASR performed directly on bit-streams gives good recognition accuracy and, when combined with our approach, gives better accuracy.
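    For readers unfamiliar with CVSD, the following is a minimal sketch of a one-bit adaptive delta modulator of that kind. It is not the codec configuration studied in the paper: the step sizes, decay factor, run length and integrator leak are illustrative assumptions, the class and function names are hypothetical, and the output low-pass filter of a practical decoder is omitted.

```python
import math
from collections import deque


class CvsdState:
    """Step-size and estimate tracking shared by the encoder and the decoder."""

    def __init__(self, step_min=10.0, step_max=1280.0, step_inc=20.0,
                 decay=0.9, run=3, leak=0.99):
        # All default values are assumptions chosen for illustration only.
        self.step_min, self.step_max = step_min, step_max
        self.step_inc, self.decay, self.leak = step_inc, decay, leak
        self.step = step_min
        self.estimate = 0.0
        self.history = deque(maxlen=run)

    def advance(self, bit):
        """Update the step size and the signal estimate for one received bit."""
        self.history.append(bit)
        same_run = (len(self.history) == self.history.maxlen
                    and len(set(self.history)) == 1)
        if same_run:   # slope overload: the last `run` bits agree, so grow the step
            self.step = min(self.step + self.step_inc, self.step_max)
        else:          # otherwise let the step decay towards its minimum
            self.step = max(self.step * self.decay, self.step_min)
        delta = self.step if bit else -self.step
        self.estimate = self.leak * self.estimate + delta   # leaky integrator
        return self.estimate


def cvsd_encode(samples, **params):
    """Emit one bit per sample: 1 if the sample is at or above the estimate."""
    state = CvsdState(**params)
    bits = []
    for x in samples:
        bit = 1 if x >= state.estimate else 0
        bits.append(bit)
        state.advance(bit)
    return bits


def cvsd_decode(bits, **params):
    """Rebuild the signal estimate from the bitstream alone."""
    state = CvsdState(**params)
    return [state.advance(b) for b in bits]


if __name__ == "__main__":
    signal = [1000.0 * math.sin(2 * math.pi * n / 100) for n in range(400)]
    decoded = cvsd_decode(cvsd_encode(signal))
    print(sum(abs(a - b) for a, b in zip(signal, decoded)) / len(signal))
```

    Because the decoder needs only the bits and the same update rule, no framing is required, and a flipped bit perturbs the estimate by a single step that the leak then gradually forgets.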

    Recognizing GSM Digital Speech

    The Global System for Mobile (GSM) environment encompasses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first one has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specifically designed to be effective against source coding distortion and transmission errors. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant advantages. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we avoid the influence of other sources of distortion resulting from the encoding-decoding process. Second, when transmission errors occur, our front-end becomes more effective since it is not affected by errors in bits allocated to the excitation signal. We have considered the half-rate and full-rate standard codecs and compared the proposed front-end with the conventional approach in two ASR tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated channel conditions. Furthermore, the disparity increases as the network conditions worsen.

    Indexing, browsing and searching of digital video

    Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver