
    A Comparison of Front-Ends for Bitstream-Based ASR over IP

    Automatic speech recognition (ASR) is expected to play an important role in providing spoken interfaces for IP-based applications. However, because the speech signal travels over these networks, ASR systems face two new challenges: the degradation of speech quality caused by the compression needed to fit the channel capacity, and the inevitable occurrence of packet losses. In this framework, bitstream-based approaches have been proposed to improve the robustness of ASR systems ([S.H. Choi, H.K. Kim, H.S. Lee, Speech recognition using quantized LSP parameters and their transformations in digital communications, Speech Commun. 30 (4) (2000) 223–233. A. Gallardo-Antolín, C. Peláez-Moreno, F. Díaz-de-María, Recognizing GSM digital speech, IEEE Trans. Speech Audio Process., to appear. H.K. Kim, R.V. Cox, R.C. Rose, Performance improvement of a bitstream-based front-end for wireless speech recognition in adverse environments, IEEE Trans. Speech Audio Process. 10 (8) (2002) 591–604. C. Peláez-Moreno, A. Gallardo-Antolín, F. Díaz-de-María, Recognizing voice over IP networks: a robust front-end for speech recognition on the WWW, IEEE Trans. Multimedia 3 (2) (2001) 209–218], among others). These approaches obtain the ASR feature vectors directly from the coded bitstream and thus avoid the speech decoding process. Line spectral pairs (LSP) are the preferred parameters for describing the speech spectral envelope in most modern speech coders. However, LSP have proved unsuitable for ASR and must be transformed into cepstrum-type parameters. In this paper we comparatively evaluate the robustness of the most significant LSP-to-cepstrum transformations in a simulated VoIP (voice over IP) environment. This environment includes two of the most popular codecs used in that network (G.723.1 and G.729) and several network conditions. In particular, we compare the ‘pseudocepstrum’ [H.K. Kim, S.H. Choi, H.S. Lee, On approximating Line Spectral Frequencies to LPC cepstral coefficients, IEEE Trans. Speech Audio Process. 8 (2) (2000) 195–199] with a more computationally demanding but exact transformation. The pseudocepstrum is an approximate but straightforward transformation of LSP into LP cepstral coefficients. Our results show that the pseudocepstrum is preferable when network conditions are good or computational resources are scarce. The exact procedure is recommended as network conditions become more adverse.
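    The "exact" LSP-to-cepstrum route can be made concrete with a minimal sketch built from two textbook steps. First, rebuild the LP polynomial A(z) from the symmetric and antisymmetric polynomials P(z) and Q(z) whose zeros are the LSFs. Then apply the usual LPC-to-cepstrum recursion. The sketch rests on these assumptions:
    - an even LP order (G.723.1 and G.729 both use a 10th-order LP filter);
    - the convention A(z) = 1 + sum a_k z^-k;
    - sorted LSFs in radians;
    - hypothetical function names.
    It is not necessarily the exact procedure evaluated in the paper.

```python
import numpy as np


def lsf_to_lpc(lsf):
    """Rebuild a = [1, a_1, ..., a_p] from sorted LSFs (radians, 0 < w_1 < ... < w_p < pi).

    Assumes an even order p and A(z) = 1 + sum_k a_k z^-k, so that
    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z).
    """
    w = np.asarray(lsf, dtype=float)
    p = len(w)
    if p % 2:
        raise ValueError("this sketch assumes an even LP order")
    P = np.array([1.0, 1.0])   # P(z) always has a zero at z = -1
    Q = np.array([1.0, -1.0])  # Q(z) always has a zero at z = +1
    for i, wi in enumerate(w):
        section = np.array([1.0, -2.0 * np.cos(wi), 1.0])  # conjugate zero pair at +/- wi
        if i % 2 == 0:  # w_1, w_3, ... (1-based odd) are zeros of P(z)
            P = np.convolve(P, section)
        else:           # w_2, w_4, ... are zeros of Q(z)
            Q = np.convolve(Q, section)
    a = 0.5 * (P + Q)
    return a[:-1]  # the z^-(p+1) coefficient cancels to zero


def lpc_to_cepstrum(a, n_ceps):
    """LP cepstrum of 1/A(z) via c_n = -a_n - sum_{k<n} (k/n) c_k a_{n-k}."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)  # c[0] (gain term) is left at zero
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]
```

    Chaining the two functions, `lpc_to_cepstrum(lsf_to_lpc(lsf), 12)`, gives the first 12 cepstral coefficients for one frame of decoded LSFs. The polynomial rebuild and the recursion are the cost that the pseudocepstrum avoids.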

    Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition

    In this paper we address the problem of automatic speech recognition (ASR) when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: the acoustic environment, speech coding and transmission errors. Whereas the first has already received a lot of attention, the last two, in our opinion, deserve further investigation. We have found that band-pass filtering of the recognition features improves ASR performance when distortions due to these communication systems are present. Furthermore, we have evaluated two alternative configurations at several bit error rates (BER) typical of these channels. The first band-pass filters the LP-MFCC parameters; the second modifies RASTA-PLP with a sharper low-pass section. They perform consistently better than unfiltered LP-MFCC and the original RASTA-PLP, respectively.
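    Band-pass filtering the time trajectory of each feature is a well-established idea, and the best-known instance is the RASTA filter. The sketch below applies the standard RASTA IIR filter to every column of a frames-by-coefficients feature matrix. It assumes the commonly cited coefficients: numerator 0.2, 0.1, 0, -0.1, -0.2 and a pole at 0.98. It is not the paper's modified filter with a sharper low-pass section, whose coefficients are not given here, and `rasta_filter` is a hypothetical helper name. Because the numerator coefficients sum to zero, the filter has zero DC gain and removes constant, channel-like offsets from the trajectories.

```python
import numpy as np

# Commonly cited RASTA coefficients (assumption; not the paper's modified filter).
RASTA_NUM = np.array([0.2, 0.1, 0.0, -0.1, -0.2])
RASTA_POLE = 0.98


def rasta_filter(features, pole=RASTA_POLE):
    """Band-pass filter each feature trajectory along the time axis.

    features: array of shape (n_frames, n_coeffs), e.g. log-spectral or
    cepstral features at a fixed frame rate. Implements
    y[t] = pole * y[t-1] + sum_j RASTA_NUM[j] * x[t-j]  (causal, zero initial state).
    """
    x = np.asarray(features, dtype=float)
    y = np.zeros_like(x)
    for t in range(x.shape[0]):
        acc = np.zeros(x.shape[1])
        # FIR part: differentiating numerator, zero gain at DC
        for j, b in enumerate(RASTA_NUM):
            if t - j >= 0:
                acc += b * x[t - j]
        # IIR part: single pole acting as a leaky integrator (low-pass section)
        if t > 0:
            acc += pole * y[t - 1]
        y[t] = acc
    return y
```

    This causal form delays the output by four frames relative to the usual non-causal definition. A sharper low-pass section, as described in the abstract, would replace the single pole with a higher-order low-pass design.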

    Recognizing GSM Digital Speech

    The Global System for Mobile Communications (GSM) environment poses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks, specifically designed to be effective against source coding distortion and transmission errors. We suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and then extracting the feature vectors. This approach offers two significant advantages. First, the recognition system is affected only by the quantization distortion of the spectral envelope, thereby avoiding the other sources of distortion introduced by the encoding-decoding process. Second, when transmission errors occur, our front-end is more effective because it is not affected by errors in the bits allocated to the excitation signal. We considered the half-rate and full-rate standard codecs and compared the proposed front-end with the conventional approach in two ASR tasks: speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure across a variety of simulated channel conditions, and the gap widens as network conditions worsen.
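    A bitstream front-end must turn the coded spectral-envelope parameters back into a usable LP model. As an illustration, take the GSM full-rate codec, which transmits eight log-area ratios (LARs) per frame. The sketch below inverts the idealized definition LAR = log10((1 + r)/(1 - r)) to recover reflection coefficients r. It then runs the step-up (Levinson) recursion to obtain A(z) = 1 + sum a_k z^-k. Real GSM 06.10 decoders use a piecewise-linear approximation of this mapping and their own sign convention, which the sketch does not reproduce. The function names are hypothetical, and this is not the paper's front-end.

```python
import math
from typing import List, Sequence


def lar_to_reflection(lar: Sequence[float]) -> List[float]:
    """Invert the idealized LAR = log10((1 + r) / (1 - r)).

    (10**g - 1) / (10**g + 1) equals tanh(g * ln(10) / 2).
    GSM's piecewise-linear LAR approximation is NOT modelled here.
    """
    return [math.tanh(0.5 * g * math.log(10.0)) for g in lar]


def reflection_to_lpc(refl: Sequence[float]) -> List[float]:
    """Step-up recursion: reflection coefficients -> [1, a_1, ..., a_p].

    Uses a_i^(m) = a_i^(m-1) + k_m * a_(m-i)^(m-1) and a_m^(m) = k_m,
    for the convention A(z) = 1 + sum_i a_i z^-i.
    """
    a = [1.0]
    for k in refl:
        if abs(k) >= 1.0:
            raise ValueError("|k| must be < 1 for a stable synthesis filter")
        m = len(a)            # order after this stage
        prev = a + [0.0]      # pad so prev[m] == 0
        a = [prev[i] + k * prev[m - i] for i in range(m + 1)]
    return a
```

    The resulting coefficients can then be fed to any LPC-to-cepstrum conversion to obtain recognition features. Since |r| < 1 is guaranteed by the tanh mapping, the reconstructed synthesis filter 1/A(z) is always stable.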

    Robust Distributed Speech Recognition Using Auditory Modelling

    Improving the Automatic Recognition of Distorted Speech

    Automatic speech recognition has a wide variety of uses today, yet speech distortions make accurate recognition difficult. The research presented here provides solutions that counter the detrimental effects of some distortions on recognition accuracy. Two types of distortion are addressed independently: distortions due to speech coding and distortions due to additive noise. Compensating for both types of distortion reduced the recognition error.
    Distortions due to the speech coding process are countered by recognizing speech directly from the bitstream, which eliminates the need to reconstruct the speech signal and the distortion that reconstruction causes. There is a relative difference of 6.7% between the recognition error rate of uncoded speech and that of speech reconstructed from MELP-encoded parameters. The relative difference between the recognition error rate of uncoded speech and that of encoded speech recognized directly from the MELP bitstream is 3.5%. This 3.2 percentage point difference is equivalent to the accurate recognition of an additional 334 words out of the 12,863 words spoken.
    Distortions due to noise are offset through an appropriate modification of an existing noise reduction technique called minimum mean-square error log-spectral amplitude enhancement. A relative difference of 28% exists between the recognition error rate of clean speech and that of speech with additive noise. Applying a speech enhancement front-end reduced this difference to 22.2%. This 5.8 percentage point difference is equivalent to the accurate recognition of an additional 540 words out of the 12,863 words spoken.
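    The abstract names the minimum mean-square error log-spectral amplitude (MMSE-LSA) estimator without giving its formula. In the standard Ephraim–Malah form, each noisy spectral bin is scaled by the gain G(xi, gamma) = xi/(1 + xi) * exp(0.5 * E1(v)), with v = xi * gamma/(1 + xi). Here xi is the a priori SNR, gamma the a posteriori SNR, and E1 the exponential integral. The sketch below implements only that unmodified gain; the thesis's modification is not described here. It uses a hand-rolled E1 so that only the Python standard library is needed. Estimating xi and gamma from the noisy signal is left out, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E1(x) = integral_x^inf exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("E1 is evaluated here only for x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 40):
            term *= -x / k          # term is now (-x)^k / k!
            total += term / k
        return -EULER_GAMMA - math.log(x) - total
    # Continued fraction evaluated with the modified Lentz method (x > 1).
    b = x + 1.0
    c = 1.0e30
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        an = -float(i * i)
        b += 2.0
        d = 1.0 / (an * d + b)
        c = b + an / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def lsa_gain(xi: float, gamma: float) -> float:
    """Standard MMSE-LSA spectral gain for one bin (xi, gamma: linear SNRs > 0)."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(v))
```

    Because E1(v) > 0, the gain is never below the Wiener gain xi/(1 + xi), and it approaches the Wiener gain as v grows. This behaviour is a reasonable sanity check for any modified variant.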

    Contributions to Robust Speech Recognition in Communication Networks via Transparameterization (Contribuciones al reconocimiento robusto de habla en redes de comunicaciones mediante transparametrización)

    Nowadays, modern communication networks play an outstanding role in everyday life, and the number of services offered through them is continuously increasing. As the interfaces to these services become more natural, they tend to embed speech technologies so that human-to-machine communication mimics, to some extent, human-to-human communication.
In this context, this thesis tackles the problem of automatic speech recognition (ASR) in communication-centered environments. Our contributions focus on the bitstream-based approach to ASR, whose robustness has already been demonstrated. We apply it in two of the most relevant communication scenarios: the Internet and Universal Mobile Telecommunications System (UMTS) networks. We propose techniques to improve the robustness of ASR systems against the distortions caused by source coding and transmission errors. For the voice over IP scenario, we propose an improved energy estimation method and an additional technique based on filtering the modulation spectrum. Together they let the system deal jointly with communication-related distortions and background noise. For the UMTS scenario, besides an improved energy estimation method, we propose an extended feature vector that relies on the unequal error protection mechanism implemented in the channel codec. This extended feature vector makes effective use of the most protected parameters in the bitstream to provide the ASR system with enhanced robustness.

    Advancing the use of geographic information systems, numerical and physical models for the planning of managed aquifer recharge schemes

    Global change is a major threat to local groundwater resources. Climate change and population growth directly or indirectly drive the increasing use of groundwater. To offset the pressure on aquifers, managed aquifer recharge (MAR) schemes are increasingly being implemented; they enable the subsurface storage of surplus water for times of high demand. The complexity of MAR schemes makes their planning and implementation multifaceted and requires a comprehensive assessment of the local hydrogeological and hydrogeochemical conditions. Although MAR is a widely used technique, its implementation is poorly regulated, and comprehensive planning and design guidelines are rare. Supporting tools such as numerical and physical models or geographic information systems (GIS) are increasingly used for MAR planning. However, their scope and application requirements are rarely reflected in the available MAR guidelines. To depict the application potential, advantages and disadvantages of these tools for surface-infiltration MAR planning, this thesis reviews their past use and suggests ways to improve their applicability. Most MAR guidelines do not mention GIS as a planning tool, even though it is increasingly used for MAR mapping. Through a review of GIS-based MAR suitability studies, this thesis shows that the MAR mapping process could be standardized using a frequently applied three-step sequence:
    1. constraint mapping;
    2. suitability mapping, with pairwise comparison for weight assignment and weighted linear combination as the decision rule;
    3. a subsequent sensitivity analysis.
    A common methodology would increase the reliability and comparability of MAR maps. The proposed standard methodology was therefore incorporated into a web GIS that simplifies MAR mapping through a predefined workflow.
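    The GIS mapping steps above are named without formulas. As an illustration, the sketch below assumes Saaty's analytic hierarchy process (AHP) for the pairwise-comparison step. In that method the weights come from the principal eigenvector of the comparison matrix, and consistency is judged by CR = CI/RI with CI = (lambda_max - n)/(n - 1). The sketch then applies a weighted linear combination over criterion rasters already standardized to [0, 1], with constraint mapping represented as a Boolean mask. Function names and the array layout are hypothetical; the web GIS developed in the thesis may implement these steps differently.

```python
import numpy as np

# Saaty's random consistency index RI for matrix sizes 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Criterion weights and consistency ratio from a reciprocal pairwise matrix."""
    M = np.asarray(pairwise, dtype=float)
    n = M.shape[0]
    eigvals, eigvecs = np.linalg.eig(M)
    k = int(np.argmax(eigvals.real))          # principal (Perron) eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                              # normalise weights to sum to 1
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1) if n > 2 else 0.0
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return w, cr


def wlc_suitability(criteria, weights, constraint_mask):
    """Weighted linear combination with Boolean constraints.

    criteria: array (n_criteria, rows, cols), each layer standardized to [0, 1].
    constraint_mask: bool array (rows, cols); False marks excluded cells.
    """
    stack = np.asarray(criteria, dtype=float)
    score = np.tensordot(weights, stack, axes=1)   # sum_i w_i * x_i per cell
    return np.where(constraint_mask, score, 0.0)
```

    A consistency ratio below 0.1 is conventionally taken as acceptable. A simple one-at-a-time sensitivity analysis can be added by perturbing each weight, renormalizing, and recomputing the map.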
Numerical models are widely used to assess MAR schemes and are included in some MAR planning guidelines. However, only a few studies were found that used vadose zone models for the planning and design of MAR schemes. In this thesis, a review and a subsequent case study highlight that numerical modelling offers many benefits that make it worthwhile during the MAR planning phase, such as support for monitoring network design and infiltration scenario planning. Consequently, this study advocates the use of vadose zone models for MAR planning. It shows their potential areas of application as well as the uncertainties that must be considered carefully during modelling. Physical models used for MAR planning are typically field or pilot sites, as some MAR legislation requires pilot sites as part of the preliminary assessment. Laboratory experiments are used less often and are mostly restricted to very specific issues, such as clogging. This thesis addresses the scaling of laboratory results to the field by comparing results from three physical models of different scale and dimensionality. The results indicate that preferential flow paths, air entrapment and boundary effects limit the quantitative validity of laboratory experiments. Using 3D tanks instead of 1D soil columns and applying statistical indicators can increase the representativeness of laboratory measurements. Nevertheless, physical models can improve MAR planning through detailed process assessment, scenario analyses and sensitivity analyses. All tools discussed in this thesis have merits for MAR scheme planning. MAR guidelines should promote them better by describing their application potential, advantages and disadvantages.
The information accumulated in this thesis is a step towards a more advanced use of supporting tools for the planning and design of MAR schemes.
    Contents:
    1 Introduction
    1.1 Motivation
    1.2 Objectives
    1.3 Structure of the thesis
    2 Status quo of the planning process of MAR schemes
    2.1 Guidance documents on general MAR planning
    2.2 Application of GIS, numerical and physical models for MAR planning
    2.3 Planning of surface infiltration schemes
    3 Using GIS for the planning of MAR schemes
    3.1 Implications from GIS-MCDA studies for MAR mapping
    3.2 Development of web tools for MAR suitability mapping
    4 Using numerical models for the planning of MAR schemes
    4.1 Review on the use of numerical models for the design and optimization of MAR schemes
    4.2 Planning a small-scale MAR scheme through vadose zone modelling
    5 Using physical models for the planning of MAR schemes
    5.1 Design of the experimental study
    5.2 Comparison of three different physical models for MAR planning
    6 Discussion and research perspectives
    7 Bibliography
    8 Appendix