128 research outputs found

    Power series quantization for noisy channels

    Full text link

    An acoustic-phonetic approach in automatic Arabic speech recognition

    Get PDF
    In a large vocabulary speech recognition system the broad phonetic classification technique is used instead of detailed phonetic analysis to overcome the variability in the acoustic realisation of utterances. The broad phonetic description of a word is used as a means of lexical access, where the lexicon is structured into sets of words sharing the same broad phonetic labelling. This approach has been applied to a large vocabulary isolated word Arabic speech recognition system. Statistical studies have been carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes. Some particular features of the Arabic language have been exploited. The results show that vowels represent about 43% of the total number of phonemes. They also show that about 38% of the words can uniquely be represented at this level by using eight broad phonetic classes. When introducing detailed vowel identification the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary. In the adopted word recognition model, the consonants are classified into four broad phonetic classes, while the vowels are described by their phonemic form. A set of 100 words uttered by several speakers has been used to test the performance of the implemented approach. In the implemented recognition model, three procedures have been developed, namely voiced-unvoiced-silence segmentation, vowel detection and identification, and automatic spectral transition detection between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure has been implemented, which exploits information from the above mentioned three procedures. Simple phonological constraints have been used to improve the accuracy of the segmentation process. The resultant sequence of labels are used for lexical access to retrieve the word or a small set of words sharing the same broad phonetic labelling. For the case of having more than one word-candidates, a verification procedure is used to choose the most likely one

    New techniques in signal coding

    Get PDF

    Time and frequency domain algorithms for speech coding

    Get PDF
    The promise of digital hardware economies (due to recent advances in VLSI technology), has focussed much attention on more complex and sophisticated speech coding algorithms which offer improved quality at relatively low bit rates. This thesis describes the results (obtained from computer simulations) of research into various efficient (time and frequency domain) speech encoders operating at a transmission bit rate of 16 Kbps. In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM) systems employing both forward and backward adaptive prediction were examined. A number of algorithms were proposed and evaluated, including several variants of the Stochastic Approximation Predictor (SAP). A Backward Block Adaptive (BBA) predictor was also developed and found to outperform the conventional stochastic methods, even though its complexity in terms of signal processing requirements is lower. A simplified Adaptive Predictive Coder (APC) employing a single tap pitch predictor considered next provided a slight improvement in performance over ADPCM, but with rather greater complexity. The ultimate test of any speech coding system is the perceptual performance of the received speech. Recent research has indicated that this may be enhanced by suitable control of the noise spectrum according to the theory of auditory masking. Various noise shaping ADPCM configurations were examined, and it was demonstrated that a proposed pre-/post-filtering arrangement which exploits advantageously the predictor-quantizer interaction, leads to the best subjective performance in both forward and backward prediction systems. Adaptive quantization is instrumental to the performance of ADPCM systems. Both the forward adaptive quantizer (AQF) and the backward oneword memory adaptation (AQJ) were examined. In addition, a novel method of decreasing quantization noise in ADPCM-AQJ coders, which involves the application of correction to the decoded speech samples, provided reduced output noise across the spectrum, with considerable high frequency noise suppression. More powerful (and inevitably more complex) frequency domain speech coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder (SBC) offer good quality speech at 16 Kbps. To reduce complexity and coding delay, whilst retaining the advantage of sub-band coding, a novel transform based split-band coder (TSBC) was developed and found to compare closely in performance with the SBC. To prevent the heavy side information requirement associated with a large number of bands in split-band coding schemes from impairing coding accuracy, without forgoing the efficiency provided by adaptive bit allocation, a method employing AQJs to code the sub-band signals together with vector quantization of the bit allocation patterns was also proposed. Finally, 'pipeline' methods of bit allocation and step size estimation (using the Fast Fourier Transform (FFT) on the input signal) were examined. Such methods, although less accurate, are nevertheless useful in limiting coding delay associated with SRC schemes employing Quadrature Mirror Filters (QMF)

    Speech coding at medium bit rates using analysis by synthesis techniques

    Get PDF
    Speech coding at medium bit rates using analysis by synthesis technique

    Quantisation mechanisms in multi-protoype waveform coding

    Get PDF
    Prototype Waveform Coding is one of the most promising methods for speech coding at low bit rates over telecommunications networks. This thesis investigates quantisation mechanisms in Multi-Prototype Waveform (MPW) coding, and two prototype waveform quantisation algorithms for speech coding at bit rates of 2.4kb/s are proposed. Speech coders based on these algorithms have been found to be capable of producing coded speech with equivalent perceptual quality to that generated by the US 1016 Federal Standard CELP-4.8kb/s algorithm. The two proposed prototype waveform quantisation algorithms are based on Prototype Waveform Interpolation (PWI). The first algorithm is in an open loop architecture (Open Loop Quantisation). In this algorithm, the speech residual is represented as a series of prototype waveforms (PWs). The PWs are extracted in both voiced and unvoiced speech, time aligned and quantised and, at the receiver, the excitation is reconstructed by smooth interpolation between them. For low bit rate coding, the PW is decomposed into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW is coded using vector quantisation on both magnitude and phase spectra. The SEW codebook search is based on the best matching of the SEW and the SEW codebook vector. The REW phase spectra is not quantised, but it is recovered using Gaussian noise. The REW magnitude spectra, on the other hand, can be either quantised with a certain update rate or only derived according to SEW behaviours

    K-means based clustering and context quantization

    Get PDF

    Scalable Video Streaming with Prioritised Network Coding on End-System Overlays

    Get PDF
    PhDDistribution over the internet is destined to become a standard approach for live broadcasting of TV or events of nation-wide interest. The demand for high-quality live video with personal requirements is destined to grow exponentially over the next few years. Endsystem multicast is a desirable option for relieving the content server from bandwidth bottlenecks and computational load by allowing decentralised allocation of resources to the users and distributed service management. Network coding provides innovative solutions for a multitude of issues related to multi-user content distribution, such as the coupon-collection problem, allocation and scheduling procedure. This thesis tackles the problem of streaming scalable video on end-system multicast overlays with prioritised push-based streaming. We analyse the characteristic arising from a random coding process as a linear channel operator, and present a novel error detection and correction system for error-resilient decoding, providing one of the first practical frameworks for Joint Source-Channel-Network coding. Our system outperforms both network error correction and traditional FEC coding when performed separately. We then present a content distribution system based on endsystem multicast. Our data exchange protocol makes use of network coding as a way to collaboratively deliver data to several peers. Prioritised streaming is performed by means of hierarchical network coding and a dynamic chunk selection for optimised rate allocation based on goodput statistics at application layer. We prove, by simulated experiments, the efficient allocation of resources for adaptive video delivery. Finally we describe the implementation of our coding system. We highlighting the use rateless coding properties, discuss the application in collaborative and distributed coding systems, and provide an optimised implementation of the decoding algorithm with advanced CPU instructions. We analyse computational load and packet loss protection via lab tests and simulations, complementing the overall analysis of the video streaming system in all its components

    Transmission efficace en temps réel de la voix sur réseaux ad hoc sans fil

    Get PDF
    La téléphonie mobile se démocratise et de nouveaux types de réseaux voient le jour, notamment les réseaux ad hoc. Sans focaliser exclusivement sur ces réseaux particuliers, le nombre de communications vocales effectuées chaque minute est en constante augmentation mais les réseaux sont encore souvent victimes d'erreurs de transmission. L'objectif de cette thèse porte sur l'utilisation de méthodes de codage en vue d'une transmission de la voix robuste face aux pertes de paquets, sur un réseau mobile et sans fil perturbé permettant le multichemin. La méthode envisagée prévoit l'utilisation d'un codage en descriptions multiples (MDC) appliqué à un flux de données issu d'un codec de parole bas débit, plus particulièrement l'AMR-WB (Adaptive Multi Rate - Wide Band). Parmi les paramètres encodés par l'AMR-WB, les coefficients de la prédiction linéaire sont calculés une fois par trame, contrairement aux autres paramètres qui sont calculés quatre fois. La problématique majeure réside dans la création adéquate de descriptions pour les paramètres de prédiction linéaire. La méthode retenue applique une quantification vectorielle conjuguée à quatre descriptions. Pour diminuer la complexité durant la recherche, le processus est épaulé d'un préclassificateur qui effectue une recherche localisée dans le dictionnaire complet selon la position d'un vecteur d'entrée. L'application du modèle de MDC à des signaux de parole montre que l'utilisation de quatre descriptions permet de meilleurs résultats lorsque le réseau est sujet à des pertes de paquets. Une optimisation de la communication entre le routage et le processus de création de descriptions mène à l'utilisation d'une méthode adaptative du codage en descriptions. Les travaux de cette thèse visaient la retranscription d'un signal de parole de qualité, avec une optimisation adéquate des ressources de stockage, de la complexité et des calculs. La méthode adaptative de MDC rencontre ces attentes et s'avère très robuste dans un contexte de perte de paquets
    • …
    corecore