    Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web

    The Internet Protocol (IP) environment introduces two relevant sources of distortion into the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope, so we avoid the other sources of distortion introduced by the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained by the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most widely used coding algorithms in voice over IP (VoIP), and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated packet loss rates. Furthermore, the improvement grows as network conditions worsen.

    Conversational Turn Length and Fluency Measurement in Aphasia

    A common assumption regarding fluency is that the difference between a fluent and a non-fluent speaker can be easily stated (Poeck, 1989; Gordon, 1998). However, there is no objective and valid measure to determine the level of a person with aphasia on the fluency continuum. Traditionally, people with aphasia have been classified as fluent or non-fluent following cognitive criteria. In ecological data we find that a value of 7.3 words per turn is a valid measure in Spanish and Catalan to delimit fluent and non-fluent speakers. These results emphasize the importance of the quantitative analysis of fluency in speech in its natural environment. Moreover, the measure of 7.3 words per turn not only can determine the difference between fluent and non-fluent speakers, but also allows the diagnosis of severe fluency deficits such as logorrhea or mutism.
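
The words-per-turn measure described above can be illustrated with a minimal sketch. The abstract gives only the 7.3 threshold; the whitespace tokenization, the treatment of a value exactly at the threshold, and the function names below are assumptions made for illustration, not the authors' procedure.

```python
# Threshold reported in the abstract (words per conversational turn).
FLUENCY_THRESHOLD = 7.3

def words_per_turn(turns):
    """Mean number of whitespace-separated words per non-empty turn."""
    counts = [len(turn.split()) for turn in turns if turn.strip()]
    return sum(counts) / len(counts) if counts else 0.0

def classify(turns, threshold=FLUENCY_THRESHOLD):
    """Label a speaker by comparing the mean turn length with the threshold.

    Assumption: a mean at or above the threshold counts as fluent.
    """
    return "fluent" if words_per_turn(turns) >= threshold else "non-fluent"

# Invented example turns, for illustration only.
short_turns = ["yes", "the dog", "I want water"]
long_turns = [
    "well I went to the market yesterday with my sister",
    "and then we bought some bread and came home early",
]
short_label = classify(short_turns)
long_label = classify(long_turns)
```

In practice the turns would come from a transcribed conversational corpus, and the tokenization rules would need to match those used when the threshold was derived.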

    Estrategias colaborativas de compensación del déficit lingüístico: la importancia del interlocutor-clave en el indice de participación conversacional

    In this work we analyze how the conversational performance of key conversational partners can affect the Index of Conversational Participation of people with aphasia. From three recordings of the PerLA corpus (Percepción, Lenguaje y Afasia; Perception, Language and Aphasia) in which two speakers with aphasia interact with different conversational partners, we identify the deficit-compensating strategies that both types of informant employ, which result in a collaborative construction of the interaction.

    A Comparison of Front-Ends for Bitstream-Based ASR over IP

    Automatic speech recognition (ASR) is expected to play a relevant role in the provision of spoken interfaces for IP-based applications. However, as a consequence of the transit of the speech signal over these particular networks, ASR systems need to face two new challenges: the degradation of speech quality due to the compression needed to fit the channel capacity, and the inevitable occurrence of packet losses. In this framework, bitstream-based approaches that obtain the ASR feature vectors directly from the coded bitstream, avoiding the speech decoding process, have been proposed ([S.H. Choi, H.K. Kim, H.S. Lee, Speech recognition using quantized LSP parameters and their transformations in digital communications, Speech Commun. 30 (4) (2000) 223–233; A. Gallardo-Antolín, C. Peláez-Moreno, F. Díaz-de-María, Recognizing GSM digital speech, IEEE Trans. Speech Audio Process., to appear; H.K. Kim, R.V. Cox, R.C. Rose, Performance improvement of a bitstream-based front-end for wireless speech recognition in adverse environments, IEEE Trans. Speech Audio Process. 10 (8) (2002) 591–604; C. Peláez-Moreno, A. Gallardo-Antolín, F. Díaz-de-María, Recognizing voice over IP networks: a robust front-end for speech recognition on the WWW, IEEE Trans. Multimedia 3 (2) (2001) 209–218], among others) to improve the robustness of ASR systems. LSP (Line Spectral Pairs) are the preferred set of parameters for describing the speech spectral envelope in most modern speech coders. Nevertheless, LSP have proved to be unsuitable for ASR, and they must be transformed into cepstrum-type parameters. In this paper we comparatively evaluate the robustness of the most significant LSP-to-cepstrum transformations in a simulated VoIP (voice over IP) environment which includes two of the most popular codecs used in that network (G.723.1 and G.729) and several network conditions. In particular, we compare the 'pseudocepstrum' [H.K. Kim, S.H. Choi, H.S. Lee, On approximating Line Spectral Frequencies to LPC cepstral coefficients, IEEE Trans. Speech Audio Process. 8 (2) (2000) 195–199], an approximate but straightforward transformation of LSP into LP cepstral coefficients, with a more computationally demanding but exact one. Our results show that the pseudocepstrum is preferable when network conditions are good or computational resources are low, while the exact procedure is recommended when network conditions become more adverse.
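
The two kinds of LSP-to-cepstrum transformation compared above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the evaluated front-end: the pseudocepstrum is written in the form obtained by approximating log A(z) with half the log of the product of the symmetric and antisymmetric LSP polynomials, the "exact" route rebuilds A(z) from the line spectral frequencies and applies the standard LPC-to-cepstrum recursion, the prediction order is assumed even, and the frequency values and function names are invented for the example.

```python
import numpy as np

def pseudocepstrum(lsf, n_ceps):
    """Approximate LP cepstrum computed directly from LSFs (radians)."""
    lsf = np.asarray(lsf, dtype=float)
    n = np.arange(1, n_ceps + 1)
    trivial = (1 + (-1.0) ** n) / (2 * n)        # from the roots at z = +1 and z = -1
    return trivial + np.cos(np.outer(n, lsf)).sum(axis=1) / n

def lsf_to_lpc(lsf):
    """Rebuild A(z) = 1 + a1 z^-1 + ... + ap z^-p from ascending LSFs (even p)."""
    lsf = np.asarray(lsf, dtype=float)
    p_poly = np.array([1.0, 1.0])                # symmetric polynomial, root at z = -1
    q_poly = np.array([1.0, -1.0])               # antisymmetric polynomial, root at z = +1
    for w in lsf[0::2]:
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    return (0.5 * (p_poly + q_poly))[:-1]        # the z^-(p+1) terms cancel

def lpc_to_cepstrum(a, n_ceps):
    """Exact cepstrum of 1/A(z) via the standard recursion."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]

# Hypothetical 10th-order LSF vector in radians, for illustration only.
lsf = np.array([0.25, 0.45, 0.75, 1.05, 1.35, 1.65, 1.95, 2.25, 2.55, 2.85])
a = lsf_to_lpc(lsf)
exact = lpc_to_cepstrum(a, 12)
approx = pseudocepstrum(lsf, 12)
```

The approximation needs only cosines of the decoded line spectral frequencies, whereas the exact route requires polynomial reconstruction followed by a recursion, which is the extra computational cost the abstract refers to.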

    Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition

    In this paper we address the problem of automatic speech recognition when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: acoustic environment, speech coding and transmission errors. Whilst the first one has already received a lot of attention, the last two deserve further investigation in our opinion. We have found that band-pass filtering of the recognition features improves ASR performance when distortions due to these particular communication systems are present. Furthermore, we have evaluated two alternative configurations at different bit error rates (BER) typical of these channels: band-pass filtering of the LP-MFCC parameters, and a modification of RASTA-PLP that uses a sharper low-pass section. These configurations perform consistently better than LP-MFCC and RASTA-PLP, respectively.
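
Band-pass filtering of feature trajectories, as described above, means filtering each spectral parameter along the time axis. The sketch below uses the coefficients commonly quoted for the classic RASTA filter as a stand-in; the filters actually evaluated in the paper (including the sharper low-pass section) are not given in the abstract, so the coefficients, the pole value and the function name are assumptions.

```python
import numpy as np

def rasta_filter(features, pole=0.98):
    """Band-pass filter each feature trajectory over time.

    features: array of shape (frames, coefficients).
    Numerator and pole follow the commonly quoted classic RASTA filter.
    """
    x = np.asarray(features, dtype=float)
    n_frames = x.shape[0]
    numer = np.array([0.2, 0.1, 0.0, -0.1, -0.2])
    # FIR part: weighted sum of delayed copies of the input.
    fir = np.zeros_like(x)
    for k, b in enumerate(numer):
        fir[k:] += b * x[:max(n_frames - k, 0)]
    # IIR part: single real pole applied recursively frame by frame.
    out = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(n_frames):
        prev = fir[t] + pole * prev
        out[t] = prev
    return out

# A constant (DC) trajectory, such as a fixed channel offset in the
# log-spectral or cepstral domain, is suppressed once the transient decays.
steady = np.full((500, 2), 5.0)
filtered = rasta_filter(steady)
```

Because the numerator coefficients sum to zero, slowly varying components are removed, while the pole limits the response to fast frame-to-frame fluctuations.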

    El color en la calidad de los edulcorantes de la agroindustria panelera

    The aim of this work is to determine and evaluate the color of products derived from the panela agroindustry, as a measure of quality control and of acceptance or rejection in the market. For hydrolyzed honey, panela and natural sugar, the color depends on whether or not natural or chemical clarifying agents are added to the cane juice during manufacturing. An experimental analysis of sulfur content and color is carried out on different samples of the three sweetener products, using a spectrophotometer and the Capsure Palette X-rite color capture device, respectively. The results make it possible to determine the authenticity of the color of the honey, panela and natural sugar and to establish the basic quality color of the products. Products with colors ranging from intense yellow to pale yellow with greenish and olive tones are those that contain sulfur in their composition. Finally, a colorimetric fan chart for sweeteners of the panela agroindustry is developed as a color-based quality control alternative, to establish the presence or absence of prohibited chemical substances, especially sodium hydrosulfite.

    A Speech Recognizer based on Multiclass SVMs with HMM-Guided Segmentation

    Automatic Speech Recognition (ASR) is essentially a problem of pattern classification. However, the time dimension of the speech signal has prevented ASR from being posed as a simple static classification problem. Support Vector Machine (SVM) classifiers could provide an appropriate solution, since they are very well adapted to high-dimensional classification problems. Nevertheless, the use of SVMs for ASR is by no means straightforward, mainly because SVM classifiers require an input of fixed dimension. In this paper we study the use of an HMM-based segmentation as a means to obtain the fixed-dimension input vectors required by SVMs, in a problem of isolated-digit recognition. Different configurations for all the parameters involved have been tested. We also deal with the problem of multi-class classification (as SVMs are initially binary classifiers), studying two of the most popular approaches: 1-vs-all and 1-vs-1.
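
The fixed-dimension requirement and the two multi-class schemes mentioned above can be made concrete with a small sketch. The abstract does not say how frames are combined within each segment, so averaging the frames of each segment and concatenating the means is an assumption, as are the hand-picked segment boundaries (which stand in for an HMM alignment) and all function names.

```python
import numpy as np
from itertools import combinations

def fixed_length_vector(frames, boundaries):
    """Map a variable-length utterance to a fixed-size vector.

    frames: (T, D) feature matrix.
    boundaries: segment start indices followed by the final end index,
    e.g. as produced by an alignment (hypothetical input here).
    """
    frames = np.asarray(frames, dtype=float)
    segments = [frames[s:e] for s, e in zip(boundaries[:-1], boundaries[1:])]
    return np.concatenate([seg.mean(axis=0) for seg in segments])

def one_vs_all_tasks(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]

def one_vs_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))

# Two synthetic utterances of different lengths, 13 coefficients per frame.
rng = np.random.default_rng(0)
short_utt = rng.normal(size=(30, 13))
long_utt = rng.normal(size=(55, 13))
v_short = fixed_length_vector(short_utt, [0, 8, 20, 30])
v_long = fixed_length_vector(long_utt, [0, 15, 40, 55])

digits = list(range(10))
n_one_vs_all = len(one_vs_all_tasks(digits))
n_one_vs_one = len(one_vs_one_tasks(digits))
```

Both utterances map to vectors of the same size (three segments times 13 coefficients), which is what a binary SVM needs as input. For ten digits, 1-vs-all requires 10 binary classifiers and 1-vs-1 requires 45.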

    An Application of SVM to Lost Packets Reconstruction in Voice-Enabled Services

    Voice over IP (VoIP) is becoming very popular due to the huge range of services that can be implemented by integrating different media (voice, audio, data, etc.). In addition, voice-enabled interfaces for those services are being very actively researched. Nevertheless, the degradation of voice quality due to packet losses severely affects the speech recognizers supporting those interfaces ([8]). In this paper, we compare the usual lost-packet reconstruction method with an SVM-based one that outperforms previous results.

    Morphological processing of a dynamic compressive gammachirp filterbank for automatic speech recognition

    Proceedings of: VII Jornadas en Tecnología del Habla and III Iberian SLTECH Workshop (IberSPEECH 2012). Madrid, 21-23 November 2012. The Dynamic Compressive Gammachirp is presented for auditory-inspired feature extraction in Automatic Speech Recognition. The proposed acoustic features combine spectral subtraction with morphological filtering, a two-dimensional non-linear filtering technique most commonly used in image processing. These features have proven more robust to noisy speech than those based on simpler auditory filterbanks, such as the classical mel-scaled triangular filterbank, the Gammatone filterbank and the passive Gammachirp, on a noisy Isolet database. This work has been partially supported by the Spanish Ministry of Science and Innovation CICYT Projects No. TEC2008-06382/TEC and No. TEC2011-26807.

    Oral lichenoid lesions related to contact with dental materials: A literature review

    Oral lichenoid lesions related to contact are defined as oral-cavity eruptions with an identifiable etiology, and are clinically and histologically similar to oral lichen planus. This group includes oral lichenoid lesions related to contact with dental materials (OLLC), the most common being those related to silver amalgam. Currently, it remains difficult to diagnose these lesions due to their clinical and histopathological similarity to oral lichen planus and other oral mucosa lesions with lichenoid characteristics. In the present paper, we carry out an updated review of the tests for, and the different characteristics of, OLLC that may aid diagnosis. For this review, we searched the Pubmed® and Cochrane® databases. From the published literature we used review papers, case reports, cohort studies, case-control studies, and a meta-analysis. After carrying out this review, we can conclude that the diagnosis of these lesions is still difficult and controversial. However, there are different aspects of the clinical presentation, the pathological study and the results obtained when replacing suspect materials which, when taken together, may be useful in establishing the final diagnosis of OLLC.