21 research outputs found

    A Comparison of Front-Ends for Bitstream-Based ASR over IP

    Automatic speech recognition (ASR) is expected to play a relevant role in the provision of spoken interfaces for IP-based applications. However, as a consequence of the transit of the speech signal over these networks, ASR systems face two new challenges: the impoverishment of speech quality due to the compression needed to fit the channel capacity, and the inevitable occurrence of packet losses. In this framework, bitstream-based approaches, which obtain the ASR feature vectors directly from the coded bitstream and thereby avoid the speech decoding process, have been proposed to improve the robustness of ASR systems [S.H. Choi, H.K. Kim, H.S. Lee, Speech recognition using quantized LSP parameters and their transformations in digital communications, Speech Commun. 30 (4) (2000) 223–233; A. Gallardo-Antolín, C. Peláez-Moreno, F. Díaz-de-María, Recognizing GSM digital speech, IEEE Trans. Speech Audio Process., to appear; H.K. Kim, R.V. Cox, R.C. Rose, Performance improvement of a bitstream-based front-end for wireless speech recognition in adverse environments, IEEE Trans. Speech Audio Process. 10 (8) (2002) 591–604; C. Peláez-Moreno, A. Gallardo-Antolín, F. Díaz-de-María, Recognizing voice over IP networks: a robust front-end for speech recognition on the WWW, IEEE Trans. Multimedia 3 (2) (2001) 209–218; among others]. Line Spectral Pairs (LSP) are the preferred set of parameters for describing the speech spectral envelope in most modern speech coders. Nevertheless, LSP have proved unsuitable for ASR and must be transformed into cepstrum-type parameters. In this paper we comparatively evaluate the robustness of the most significant LSP-to-cepstrum transformations in a simulated VoIP (voice over IP) environment, which includes two of the most popular codecs used on that network (G.723.1 and G.729) and several network conditions. In particular, we compare 'pseudocepstrum' [H.K. Kim, S.H. Choi, H.S. Lee, On approximating Line Spectral Frequencies to LPC cepstral coefficients, IEEE Trans. Speech Audio Process. 8 (2) (2000) 195–199], an approximate but straightforward transformation of LSP into LP cepstral coefficients, with a more computationally demanding but exact one. Our results show that pseudocepstrum is preferable when network conditions are good or computational resources are low, while the exact procedure is recommended when network conditions become more adverse.
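For reference, the exact route from coder parameters to recognition features converts the LSP back to LPC coefficients and then applies the standard LPC-to-cepstrum recursion. Below is a minimal sketch of that last step, assuming the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p (the function name is ours; the paper's pseudocepstrum instead approximates the cepstra directly from the LSP):

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients of H(z) = 1/A(z) from the LPC coefficients
    a = [a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... + ap z^-p,
    via the standard recursion between the two parameter sets."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) omitted
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

With a single-pole model, `lpc_to_cepstrum([0.5], 3)` returns `[-0.5, 0.125, -0.5**3/3]`, matching the series expansion of -log(1 + 0.5 z^-1).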

    Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System

    This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis. Since the mel-cepstral synthesis filter is explicitly embedded in the neural waveform models of the proposed system, both the voice characteristics and the pitch of synthesized speech can be directly controlled via a frequency warping parameter and the fundamental frequency, respectively. We implement the mel-cepstral synthesis filter as a differentiable and GPU-friendly module so that the acoustic and waveform models in the proposed system can be jointly optimized in an end-to-end manner. Experiments show that the proposed system improves speech quality over a baseline system while maintaining controllability. The core PyTorch modules used in the experiments will be publicly available on GitHub. Comment: Submitted to ICASSP 202
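The frequency warping parameter mentioned above is the alpha of mel-cepstral analysis: a first-order all-pass transform that remaps the frequency axis. A sketch of the warping curve it induces (the standard form, stated here as an assumption; this is not code from the paper's modules):

```python
import math

def warp_frequency(omega, alpha):
    """First-order all-pass frequency warping used in mel-cepstral analysis.
    alpha controls the warping strength: alpha = 0 leaves frequencies
    unchanged, while alpha ~ 0.42 approximates the mel scale at 16 kHz."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))
```

Increasing alpha stretches the low-frequency region of the spectral envelope, which is what shifts the perceived voice characteristics in such systems.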

    CELP and speech enhancement

    This thesis addresses the intelligibility enhancement of speech heard within an acoustically noisy environment. In particular, a realistic target situation of a police vehicle interior, with speech generated by a CELP (codebook-excited linear prediction) speech-compression-based communication system, is adopted. The research has centred on the role of the CELP speech compression algorithm and its transmission parameters. In particular, novel methods of LSP-based (line spectral pair) speech analysis and speech modification are developed and described. CELP parameters are utilised in the analysis and processing stages of a speech intelligibility enhancement system to minimise the additional computational complexity over existing CELP coder requirements. Details are given of the CELP analysis process and its effects on speech, and of the development of speech analysis and alteration algorithms coexisting with a CELP system, together with their effects and performance. Both objective and subjective tests have been used to characterise the effectiveness of the analysis and processing methods. Subjective testing of a complete simulated enhancement system indicates its effectiveness under the tested conditions, and is extrapolated to predict real-life performance. The developed system presents a novel integrated solution to the intelligibility enhancement of speech, and can provide, on average, a doubling of intelligibility under the tested conditions of very low intelligibility.
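One well-known LSP property such systems can exploit: the spacing of an adjacent line-spectral-pair controls the bandwidth of the corresponding spectral peak, so narrowing pairs sharpens formants. The following is an illustrative sketch of that idea only; the thesis's actual modification algorithms are not reproduced here, and the function name and pairing scheme are ours:

```python
def sharpen_formants(lsf, gamma=0.7):
    """Narrow each adjacent LSP pair toward its midpoint.
    Closely spaced line spectral pairs correspond to narrow-bandwidth
    (sharp) spectral peaks, so gamma < 1 sharpens the formants while
    leaving each formant's centre frequency (the pair midpoint) fixed."""
    out = list(lsf)
    for i in range(0, len(out) - 1, 2):
        mid = 0.5 * (out[i] + out[i + 1])
        half = 0.5 * (out[i + 1] - out[i]) * gamma
        out[i], out[i + 1] = mid - half, mid + half
    return out
```

For example, `sharpen_formants([0.3, 0.5, 1.0, 1.4], gamma=0.5)` yields `[0.35, 0.45, 1.1, 1.3]`: each pair keeps its midpoint while its separation halves, and the LSF ordering (hence filter stability) is preserved.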

    Speaker Identification Using a Combination of Different Parameters as Feature Inputs to an Artificial Neural Network Classifier

    This paper presents a technique for speaker identification using artificial neural networks (ANNs) that achieves a better success rate than other techniques. The method uses both power spectral densities (PSDs) and linear prediction coefficients (LPCs) as feature inputs to a self-organizing feature map to achieve better identification performance. Results for speaker identification with different methods are presented and compared.
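The LPC part of such a feature front end is typically obtained from frame autocorrelations via the Levinson-Durbin recursion. A minimal sketch (pure Python; names are ours, and the paper's exact feature configuration and the SOM classifier are not shown):

```python
def autocorrelation(x, p):
    """Autocorrelation lags r[0..p] of a speech frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the LPC normal equations from autocorrelation lags r[0..p].
    Returns the predictor coefficients [a1, ..., ap] for the model
    x[n] ~ a1*x[n-1] + ... + ap*x[n-p], plus the prediction error power."""
    a = [0.0] * (p + 1)
    e = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        a_new = a[:]
        a_new[i] = k
        for j in range(1, i):
            a_new[j] = a[j] - k * a[i - j]
        a, e = a_new, e * (1.0 - k * k)
    return a[1:], e
```

The resulting coefficients, concatenated with PSD samples of the same frame (e.g. a periodogram), would form the input vector fed to the network.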

    Evidence of correlation between acoustic and visual features of speech

    This paper examines the degree of correlation between lip and jaw configuration and speech acoustics. The lip and jaw positions are characterised by a system of measurements taken from video images of the speaker's face and profile, and the acoustics are represented using line spectral pair parameters and a measure of RMS energy. A correlation is found between the measured acoustic parameters and a linear estimate of the acoustics recovered from the visual data. This correlation exists despite the simplicity of the visual representation and is in rough agreement with correlations measured in earlier work by Yehia et al. using different techniques. However, analysis of the estimation errors suggests that the visual information, as parameterised in our experiment, offers only a weak constraint on the acoustics. Results are discussed from the perspective of models of early audio-visual integration.
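The "linear estimate of the acoustics recovered from the visual data" can be computed with ordinary least squares, and per-parameter correlations between measured and estimated acoustics then quantify the audio-visual link. A generic sketch under assumed array shapes (this mirrors the methodology, not the paper's exact measurement pipeline):

```python
import numpy as np

def linear_estimate_correlation(V, A):
    """Fit an affine map from visual features V (n x dv) to acoustic
    features A (n x da) by least squares, then return the correlation
    between each acoustic parameter and its linear estimate."""
    V1 = np.hstack([V, np.ones((V.shape[0], 1))])  # affine (bias) term
    W, *_ = np.linalg.lstsq(V1, A, rcond=None)
    A_hat = V1 @ W
    return np.array([np.corrcoef(A[:, j], A_hat[:, j])[0, 1]
                     for j in range(A.shape[1])])
```

When the acoustics really are an affine function of the visual features, the correlations approach 1; a weak visual constraint shows up as correlations well below that.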

    A robust low bit rate quad-band excitation LSP vocoder.

    by Chiu Kim Ming. Thesis (M.Phil.), Chinese University of Hong Kong, 1994. Includes bibliographical references (leaves 103-108).
    Chapter 1  Introduction
        1.1  Speech production
        1.2  Low bit rate speech coding
    Chapter 2  Speech analysis & synthesis
        2.1  Linear prediction of speech signal
        2.2  LPC vocoder
            2.2.1  Pitch and voiced/unvoiced decision
            2.2.2  Spectral envelope representation
        2.3  Excitation
            2.3.1  Regular pulse excitation and Multipulse excitation
            2.3.2  Coded excitation and vector sum excitation
        2.4  Multiband excitation
        2.5  Multiband excitation vocoder
    Chapter 3  Dual-band and Quad-band excitation
        3.1  Dual-band excitation
        3.2  Quad-band excitation
        3.3  Parameters determination
            3.3.1  Pitch detection
            3.3.2  Voiced/unvoiced pattern generation
        3.4  Excitation generation
    Chapter 4  A low bit rate Quad-Band Excitation LSP Vocoder
        4.1  Architecture of QBELSP vocoder
        4.2  Coding of excitation parameters
            4.2.1  Coding of pitch value
            4.2.2  Coding of voiced/unvoiced pattern
        4.3  Spectral envelope estimation and coding
            4.3.1  Spectral envelope & the gain value
            4.3.2  Line Spectral Pairs (LSP)
            4.3.3  Coding of LSP frequencies
            4.3.4  Coding of gain value
    Chapter 5  Performance evaluation
        5.1  Spectral analysis
        5.2  Subjective listening test
            5.2.1  Mean Opinion Score (MOS)
            5.2.2  Diagnostic Rhyme Test (DRT)
    Chapter 6  Conclusions and discussions
    References
    Appendix A  Subroutine of pitch detection
    Appendix B  Subroutine of voiced/unvoiced decision
    Appendix C  Subroutine of LPC coefficients calculation using Durbin's recursive method
    Appendix D  Subroutine of LSP calculation using Chebyshev Polynomials
    Appendix E  Single syllable word pairs for Diagnostic Rhyme Test
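Appendix D computes the LSPs via Chebyshev polynomials; an equivalent (if less efficient) way to illustrate the definition is to root the symmetric and antisymmetric polynomials P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1) directly. A sketch, with numpy's generic root finder standing in for the Chebyshev search:

```python
import numpy as np

def lpc_to_lsf(a):
    """Line spectral frequencies of A(z) = 1 + a1 z^-1 + ... + ap z^-p,
    found by rooting the symmetric/antisymmetric polynomials P and Q."""
    A = np.concatenate(([1.0], np.asarray(a, dtype=float), [0.0]))
    P = A + A[::-1]  # symmetric: trivial root at z = -1
    Q = A - A[::-1]  # antisymmetric: trivial root at z = +1
    angles = np.concatenate((np.angle(np.roots(P)), np.angle(np.roots(Q))))
    # keep one angle per conjugate pair, excluding the trivial roots at 0, pi
    lsf = angles[(angles > 1e-8) & (angles < np.pi - 1e-8)]
    return np.sort(lsf)
```

For a stable A(z) the angles from P and Q interlace on (0, pi), which is what makes LSP frequencies easy to quantize and to check for stability after coding.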

    Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

    Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.

    Parametric synthesis of expressive speech

    This thesis presents methods for expressive speech synthesis using parametric approaches. It is shown that deep neural networks achieve better results than synthesis based on hidden Markov models. Three new methods for expressive speech synthesis using deep neural networks are proposed: style codes, additional model re-training, and a shared-hidden-layer architecture. The best results are obtained with the style-code method. A new method for emotion/style transplantation based on the shared-hidden-layer architecture is also proposed; it was rated better than the reference method from the literature.
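The style-code idea can be sketched as conditioning a single acoustic network on a per-utterance style input, e.g. a one-hot code appended to each frame of linguistic features. A generic illustration with assumed shapes and names, not the thesis's exact architecture:

```python
import numpy as np

def with_style_code(linguistic_features, style_id, n_styles):
    """Append a one-hot style code to every frame of linguistic features,
    so one acoustic network can be conditioned on the speaking style."""
    n_frames = linguistic_features.shape[0]
    code = np.zeros((n_frames, n_styles))
    code[:, style_id] = 1.0
    return np.hstack([linguistic_features, code])
```

The same network weights then serve all styles; switching the code at synthesis time switches the speaking style of the generated speech.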

    Split algorithms for LMS adaptive systems.

    by Ho King Choi. Thesis (Ph.D.), Chinese University of Hong Kong, 1991. Includes bibliographical references.
    Chapter 1  Introduction
        1.1  Adaptive Filter and Adaptive System
        1.2  Applications of Adaptive Filter
            1.2.1  System Identification
            1.2.2  Noise Cancellation
            1.2.3  Echo Cancellation
            1.2.4  Speech Processing
        1.3  Chapter Summary
        References
    Chapter 2  Adaptive Filter Structures and Algorithms
        2.1  Filter Structures for Adaptive Filtering
        2.2  Adaptation Algorithms
            2.2.1  The LMS Adaptation Algorithm
                2.2.1.1  Convergence Analysis
                2.2.1.2  Steady State Performance
            2.2.2  The RLS Adaptation Algorithm
        2.3  Chapter Summary
        References
    Chapter 3  Parallel Split Adaptive System
        3.1  Parallel Form Adaptive Filter
        3.2  Joint Process Estimation with a Split-Path Adaptive Filter
            3.2.1  The New Adaptive System Identification Configuration
            3.2.2  Analysis of the Split-Path System Modeling Structure
            3.2.3  Comparison with the Non-Split Configuration
            3.2.4  Some Notes on Even Filter Order Case
            3.2.5  Simulation Results
        3.3  Autoregressive Modeling with a Split-Path Adaptive Filter
            3.3.1  The Split-Path Adaptive Filter for AR Modeling
            3.3.2  Analysis of the Split-Path AR Modeling Structure
            3.3.3  Comparison with Traditional AR Modeling System
            3.3.4  Selection of Step Sizes
            3.3.5  Some Notes on Odd Filter Order Case
            3.3.6  Simulation Results
            3.3.7  Application to Noise Cancellation
        3.4  Chapter Summary
        References
    Chapter 4  Serial Split Adaptive System
        4.1  Serial Form Adaptive Filter
        4.2  Time Delay Estimation with a Serial Split Adaptive Filter
            4.2.1  Adaptive TDE
            4.2.2  Split Filter Approach to Adaptive TDE
            4.2.3  Analysis of the New TDE System
                4.2.3.1  Least-mean-square Solution
                4.2.3.2  Adaptation Algorithm and Performance Evaluation
            4.2.4  Comparison with Traditional Adaptive TDE Method
            4.2.5  System Implementation
            4.2.6  Simulation Results
            4.2.7  Constrained Adaptation for the New TDE System
        4.3  Chapter Summary
        References
    Chapter 5  Extension of the Split Adaptive Systems
        5.1  The Generalized Parallel Split System
        5.2  The Generalized Serial Split System
        5.3  Comparison between the Parallel and the Serial Split Adaptive System
        5.4  Integration of the Two Forms of Split Predictors
        5.5  Application of the Integrated Split Model to Speech Encoding
        5.6  Chapter Summary
        References
    Chapter 6  Conclusions
        References
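The baseline against which the split structures are compared is the standard (non-split) LMS filter of Chapter 2. A minimal system-identification sketch of that baseline (pure Python; the function name and signal setup are ours):

```python
def lms_identify(x, d, order, mu):
    """Identify an FIR system with the LMS algorithm: the weights w are
    adapted by w <- w + mu * e * x so the filter output tracks the
    desired signal d; e is the instantaneous output error."""
    w = [0.0] * order
    for n in range(order, len(x)):
        frame = x[n - order:n][::-1]  # most recent sample first
        y = sum(wi * xi for wi, xi in zip(w, frame))
        e = d[n] - y
        w = [wi + mu * e * xi for wi, xi in zip(w, frame)]
    return w
```

The split-path structures of Chapters 3-5 decompose such a filter into separately adapted parts (in parallel or in cascade), each with its own step size, which is where the convergence and complexity gains analysed in the thesis come from.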