    Speech spectrum non-stationarity detection based on line spectrum frequencies and related applications

    Ertan, Ali Erdem. M.S. Ankara: Department of Electrical and Electronics Engineering and The Institute of Engineering and Sciences of Bilkent University, 1998. Thesis (Master's) -- Bilkent University, 1998. Includes bibliographical references (leaves 124-132).

    In this thesis, two new speech variation measures for speech spectrum non-stationarity detection are proposed. These measures are based on the Line Spectrum Frequencies (LSF) and the spectral values at the LSF locations. They are formulated to be subjectively meaningful, mathematically tractable, and computationally inexpensive. To demonstrate the usefulness of the non-stationarity detector, two applications are presented. The first is an implicit speech segmentation system that detects non-stationary regions in the speech signal and obtains the boundaries of the speech segments. The second is a Variable Bit-Rate Mixed Excitation Linear Predictive (VBR-MELP) vocoder that uses a novel voice activity detector to detect silent regions in the speech. This voice activity detector is designed to be robust to non-stationary background noise and provides efficient coding of silent sections and unvoiced utterances to decrease the bit rate. Simulation results are also presented.
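
The abstract does not give the formulas of the two proposed measures, so the following is only a minimal sketch of the underlying idea: convert each frame's LPC polynomial to line spectrum frequencies and track how far the LSF vector moves from one frame to the next. The function names `lpc_to_lsf` and `lsf_variation` are hypothetical, and the plain Euclidean distance is a stand-in assumption, not the thesis's measure (which also uses spectral values at the LSF locations).

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] to p line spectrum
    frequencies in radians, sorted in ascending order.
    Assumes A(z) is minimum phase (all roots inside the unit circle)."""
    a = np.asarray(a, dtype=float)
    # Sum and difference polynomials:
    #   P(z) = A(z) + z^-(p+1) A(1/z),  Q(z) = A(z) - z^-(p+1) A(1/z)
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one root of each conjugate pair; drop the trivial roots at z = +1, -1.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

def lsf_variation(lsf_prev, lsf_curr):
    """Illustrative frame-to-frame variation: Euclidean distance between
    successive LSF vectors (an assumption, not the thesis's measure)."""
    diff = np.asarray(lsf_curr, dtype=float) - np.asarray(lsf_prev, dtype=float)
    return float(np.linalg.norm(diff))

if __name__ == "__main__":
    # Two second-order frames with a resonance that moves from pi/4 to pi/3.
    frame_a = [1.0, -2 * 0.9 * np.cos(np.pi / 4), 0.81]
    frame_b = [1.0, -2 * 0.9 * np.cos(np.pi / 3), 0.81]
    print(lsf_variation(lpc_to_lsf(frame_a), lpc_to_lsf(frame_b)))
```

A detector built on such a measure would flag a frame as non-stationary when the variation exceeds a threshold; how that threshold is chosen is not stated in the abstract.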

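    Multi-rate excitation using tree-structured vector quantization for the CS-ACELP coder with application to VoIP [Excitação multi-taxa usando quantização vetorial estruturada em árvore para o codificador CS-ACELP com aplicação em VoIP]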

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico. Programa de Pós-Graduação em Engenharia Elétrica.

    This work presents a study of multi-rate coding built on the CS-ACELP (Conjugate-Structure Algebraic-Code-Excited Linear-Prediction) algorithm and the G.729 specification. Its goal is to propose a variable-rate coder for VoIP (Voice-over-IP) applications by searching for the best fixed excitation in a tree-structured codebook. The progressive migration of voice transport from circuit-switched networks to IP (Internet Protocol) networks, despite its many positive aspects, has exposed some intrinsic deficiencies of the latter, which are better suited to best-effort traffic than to traffic with timing requirements. This proposal belongs to the set of sender-side initiatives that seek to minimize the harmful effects of the network on the quality of the reconstructed voice. The proposed codebook has a binary tree structure, conceived from a heuristic in which the CS-ACELP vectors are ordered by value in decreasing order. A particular strategy for storing the nodes, involving centroid simplification, differential coding, and automatic generation of the last two levels of the tree, reduces the storage space from 640 to only 7 kwords. This model yields 13 coding rates, from 5.6 to 8.0 kbit/s in steps of 0.2 kbit/s. The signal-to-noise ratio is 1.5 dB below that of the G.729 specification at 5.6 kbit/s and only 0.6 dB below at 8.0 kbit/s. Subjective tests showed quite acceptable quality at the minimum rate and quality practically indistinguishable from the original codec at the maximum rate. In addition, the search for the best excitation is 2.4 times faster than in the G.729 codec, and the coder can be fully compatible with G.729 if the rate is fixed at 8.0 kbit/s.
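
To make the two quantitative ideas in this abstract concrete, here is a minimal sketch under stated assumptions. The rate ladder follows from G.729's 10 ms frame: 0.2 kbit/s corresponds to 2 bits per frame, so 56 to 80 bits per frame gives the 13 rates. The tree search is a generic greedy binary tree-structured VQ descent over a toy codebook; `build_tree` and `tree_search` are hypothetical names, and the thesis's actual ordering heuristic, node storage scheme, and mapping from tree depth to rate are not reproduced here.

```python
import numpy as np

# Rate ladder: with a 10 ms frame, kbit/s = bits_per_frame / 10.
BITS_PER_FRAME = list(range(56, 81, 2))
RATES_KBIT_S = [bits / 10 for bits in BITS_PER_FRAME]

def build_tree(codewords):
    """Build a complete binary tree of centroids over the codewords.
    levels[d] holds 2**d centroids; each is the mean of the leaves below it.
    The number of codewords must be a power of two."""
    leaves = np.asarray(codewords, dtype=float)
    n = len(leaves)
    assert n > 0 and n & (n - 1) == 0, "codeword count must be a power of two"
    levels = [leaves]
    while len(levels[0]) > 1:
        lower = levels[0]
        # Average adjacent pairs to form the parent level.
        levels.insert(0, (lower[0::2] + lower[1::2]) / 2.0)
    return levels

def tree_search(levels, target, depth):
    """Greedy descent: at each level, compare the target with only the two
    children of the current node. Stopping at `depth` costs `depth` index bits."""
    target = np.asarray(target, dtype=float)
    index = 0
    for d in range(1, depth + 1):
        left = levels[d][2 * index]
        right = levels[d][2 * index + 1]
        go_right = np.sum((target - right) ** 2) < np.sum((target - left) ** 2)
        index = 2 * index + int(go_right)
    return index, levels[depth][index]

if __name__ == "__main__":
    levels = build_tree([[0.0], [1.0], [2.0], [3.0]])
    print(tree_search(levels, [2.9], depth=2))
    print(len(RATES_KBIT_S), RATES_KBIT_S[0], RATES_KBIT_S[-1])
```

A greedy descent evaluates two candidates per level instead of every leaf, which is the general reason a tree-structured search can be faster than an exhaustive one; stopping at a shallower level trades accuracy for fewer index bits.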

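    Detection and modification of transients in a speech signal to make a codec more robust to packet loss [Détection et modification des transitoires d'un signal de parole dans le but de rendre un codec plus robuste aux pertes de paquets]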

    To transmit speech signals efficiently, they are compressed and sent in frames of typically 10 to 20 ms. During transmission, frames are sometimes lost. When the signal is reconstructed at the decoder, it is preferable to replace lost frames with a signal that is as close as possible to the missing one. The lost signal is often reconstructed from the information in the last received frames since, in general, the statistical properties of the speech signal evolve relatively slowly from one frame to the next. Speech signals can be classified into different categories (voiced speech, unvoiced, transient, etc.). To better exploit the characteristics of each category, it is useful to apply a classifier to each signal frame. This classification enables better concealment of lost frames, optimized for the different classes. Frame classification is sometimes inaccurate at transitions between an unvoiced frame and a voiced frame. These classification errors lead to poor signal reconstruction when frames are lost. To overcome these errors, this thesis proposes a new robust algorithm that identifies the critical frames and applies the appropriate classification. For frames whose properties do not exactly match one of the available classes, a transparent modification of the signal is applied to make these frames conform to the proposed classification. These modifications yield a better reconstruction of the signal if the following frames are lost.
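
The following is a minimal sketch of class-dependent frame-loss concealment, included only to illustrate why the class assigned to the last good frame matters. It is not the algorithm of the thesis: the function `conceal_lost_frame`, the class labels, the `pitch_lag` and `attenuation` parameters, and the two concealment rules (pitch-cycle repetition for voiced frames, energy-matched noise otherwise) are all assumptions chosen for illustration.

```python
import numpy as np

def conceal_lost_frame(last_frame, frame_class, pitch_lag=None,
                       attenuation=0.8, rng=None):
    """Synthesize a replacement for a lost frame from the last good frame.

    frame_class: "voiced" or any other label (treated as unvoiced).
    pitch_lag:   pitch period in samples, required for the voiced rule.
    """
    last_frame = np.asarray(last_frame, dtype=float)
    n = len(last_frame)
    if frame_class == "voiced" and pitch_lag:
        # Voiced: repeat the last pitch cycle periodically over the frame.
        cycle = last_frame[-pitch_lag:]
        out = np.resize(cycle, n)
    else:
        # Unvoiced or unknown: random noise scaled to the last frame's RMS.
        if rng is None:
            rng = np.random.default_rng(0)
        noise = rng.standard_normal(n)
        rms = np.sqrt(np.mean(last_frame ** 2))
        out = noise * rms / (np.sqrt(np.mean(noise ** 2)) + 1e-12)
    # Attenuate so that errors fade out rather than persist.
    return attenuation * out

if __name__ == "__main__":
    frame = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
    print(conceal_lost_frame(frame, "voiced", pitch_lag=4))
    print(conceal_lost_frame(frame, "unvoiced"))
```

With rules like these, a voiced onset mislabeled as unvoiced would be replaced by noise rather than a periodic extension, which is the kind of reconstruction error at unvoiced-to-voiced transitions that the abstract describes.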

    CELP and speech enhancement

    This thesis addresses the intelligibility enhancement of speech that is heard within an acoustically noisy environment. In particular, a realistic target situation of a police vehicle interior, with speech generated from a CELP (codebook-excited linear prediction) speech compression-based communication system, is adopted. The research has centred on the role of the CELP speech compression algorithm and its transmission parameters. In particular, novel methods of LSP-based (line spectral pair) speech analysis and speech modification are developed and described. CELP parameters have been utilised in the analysis and processing stages of a speech intelligibility enhancement system to minimise additional computational complexity over existing CELP coder requirements. Details are given of the CELP analysis process and its effects on speech, and of the development, effects, and performance of speech analysis and alteration algorithms that coexist with a CELP system. Both objective and subjective tests have been used to characterize the effectiveness of the analysis and processing methods. Subjective testing of a complete simulated enhancement system indicates its effectiveness under the tested conditions, and the results are extrapolated to predict real-life performance. The developed system presents a novel integrated solution to the intelligibility enhancement of speech, and can provide a doubling, on average, of intelligibility under the tested conditions of very low intelligibility.