An acoustic-phonetic approach in automatic Arabic speech recognition
In a large-vocabulary speech recognition system, broad phonetic classification is used
instead of detailed phonetic analysis to overcome the variability in the
acoustic realisation of utterances. The broad phonetic description of a word is used as a
means of lexical access, where the lexicon is structured into sets of words sharing the
same broad phonetic labelling.
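The following is a minimal sketch of lexical access by broad phonetic labelling. It maps phonemes to broad classes and indexes the lexicon by the resulting class sequence, so recognising only the broad labels retrieves a small cohort of candidate words. The class names (STOP, FRIC, NASAL, LIQ, V), the phoneme symbols and the transliterated example words are illustrative assumptions. They are not the inventory used in the thesis.

```python
from collections import defaultdict

# Hypothetical broad-class inventory for a few transliterated Arabic phonemes;
# the class set actually used in the thesis is not reproduced here.
BROAD_CLASS = {
    "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP", "q": "STOP",
    "f": "FRIC", "s": "FRIC", "z": "FRIC", "sh": "FRIC", "h": "FRIC",
    "m": "NASAL", "n": "NASAL",
    "l": "LIQ", "r": "LIQ", "w": "LIQ", "y": "LIQ",
    "a": "V", "i": "V", "u": "V", "aa": "V", "ii": "V", "uu": "V",
}


def broad_label(phonemes):
    """Map a phonemic transcription (list of symbols) to its broad-class sequence."""
    return tuple(BROAD_CLASS[p] for p in phonemes)


def build_lexicon(words):
    """Group words into cohorts sharing the same broad phonetic labelling."""
    lexicon = defaultdict(list)
    for word, phonemes in words.items():
        lexicon[broad_label(phonemes)].append(word)
    return dict(lexicon)


def lookup(lexicon, labels):
    """Lexical access: return every word whose broad labelling matches `labels`."""
    return lexicon.get(tuple(labels), [])


words = {
    "kataba": ["k", "a", "t", "a", "b", "a"],
    "tabaqa": ["t", "a", "b", "a", "q", "a"],
    "kalima": ["k", "a", "l", "i", "m", "a"],
}
lexicon = build_lexicon(words)
print(lookup(lexicon, ["STOP", "V", "STOP", "V", "STOP", "V"]))  # ['kataba', 'tabaqa']
```

When a lookup returns more than one word, as in the example, a later stage must choose between the candidates.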
This approach has been applied to a large-vocabulary, isolated-word Arabic speech
recognition system. Statistical studies have been carried out on 10,000 Arabic words
(converted to phonemic form) involving different combinations of broad phonetic
classes. Several features specific to the Arabic language have been exploited. The results
show that vowels represent about 43% of the total number of phonemes. They also show
that about 38% of the words can uniquely be represented at this level by using eight
broad phonetic classes. When detailed vowel identification is introduced, the percentage of
uniquely specified words rises to 83%. These results suggest that a fully detailed
phonetic analysis of the speech signal is perhaps unnecessary.
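The uniqueness statistic quoted above is the share of words whose labelling matches no other word. It can be computed as in the sketch below, once with vowels collapsed into a single class and once with vowels kept as individual phonemes. The four consonant classes, the symbols and the toy lexicon are hypothetical. The fractions the sketch produces therefore say nothing about the 38% and 83% figures, which come from the 10,000-word study.

```python
from collections import Counter

# Hypothetical consonant classes (four, as in the adopted model); vowels are
# either collapsed to a single "V" class or kept as individual phonemes.
CONSONANT_CLASS = {
    "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP", "q": "STOP",
    "f": "FRIC", "s": "FRIC", "z": "FRIC",
    "m": "NASAL", "n": "NASAL",
    "l": "LIQ", "r": "LIQ", "w": "LIQ", "y": "LIQ",
}
VOWELS = {"a", "i", "u", "aa", "ii", "uu"}


def label(phonemes, detailed_vowels):
    """Broad label sequence; vowels keep their identity only if detailed_vowels."""
    return tuple(
        (p if detailed_vowels else "V") if p in VOWELS else CONSONANT_CLASS[p]
        for p in phonemes
    )


def unique_fraction(lexicon, detailed_vowels):
    """Fraction of words whose label sequence is shared with no other word."""
    labels = [label(ph, detailed_vowels) for ph in lexicon.values()]
    counts = Counter(labels)
    return sum(1 for lab in labels if counts[lab] == 1) / len(labels)


toy = {
    "kataba": ["k", "a", "t", "a", "b", "a"],
    "kutiba": ["k", "u", "t", "i", "b", "a"],
    "tabaqa": ["t", "a", "b", "a", "q", "a"],
    "kalima": ["k", "a", "l", "i", "m", "a"],
}
print(unique_fraction(toy, detailed_vowels=False))  # 0.25
print(unique_fraction(toy, detailed_vowels=True))   # 0.5
```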
In the adopted word recognition model, the consonants are classified into four broad
phonetic classes, while the vowels are described by their phonemic form. A set of 100
words uttered by several speakers has been used to test the performance of the
implemented approach.
In the implemented recognition model, three procedures have been developed, namely
voiced-unvoiced-silence segmentation, vowel detection and identification, and automatic
spectral transition detection between phonemes within a word. The accuracy of both the
V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic
segmentation procedure has been implemented, which exploits information from these
three procedures. Simple phonological constraints have been used to
improve the accuracy of the segmentation process. The resulting sequence of labels is
used for lexical access to retrieve the word, or a small set of words sharing the same broad
phonetic labelling. When more than one candidate word is retrieved, a verification
procedure is used to choose the most likely one.
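The features used by the V-UV-S procedure are not specified here. A common approach, assumed for this sketch, classifies fixed-length frames by short-time energy and zero-crossing rate. The 20 ms frame (160 samples at an assumed 8 kHz sampling rate), both thresholds and the function names are hypothetical values, not those of the thesis.

```python
import math


def frame_features(frame):
    """Return (log energy in dB, zero-crossing rate) for one frame of samples."""
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = 10.0 * math.log10(energy + 1e-12)  # small floor avoids log(0)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return log_energy, crossings / (len(frame) - 1)


def classify_vuvs(samples, frame_len=160, silence_db=-40.0, zcr_threshold=0.3):
    """Label each non-overlapping frame 'S' (silence), 'V' (voiced) or 'U' (unvoiced).

    Low energy means silence; otherwise a high zero-crossing rate suggests
    noise-like (unvoiced) excitation. Thresholds are illustrative assumptions.
    """
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        log_e, zcr = frame_features(samples[start:start + frame_len])
        if log_e < silence_db:
            labels.append("S")
        elif zcr > zcr_threshold:
            labels.append("U")
        else:
            labels.append("V")
    return labels
```

In practice, the frame labels would then be smoothed, for example with phonological constraints, before they are merged into segments.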
Time and frequency domain algorithms for speech coding
The promise of cheaper digital hardware (due to recent advances in
VLSI technology) has focussed much attention on more complex and sophisticated
speech coding algorithms which offer improved quality at relatively
low bit rates.
This thesis describes the results (obtained from computer simulations)
of research into various efficient (time and frequency domain) speech
encoders operating at a transmission bit rate of 16 Kbps.
In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM)
systems employing both forward and backward adaptive prediction were
examined. A number of algorithms were proposed and evaluated, including
several variants of the Stochastic Approximation Predictor (SAP). A
Backward Block Adaptive (BBA) predictor was also developed and found to
outperform the conventional stochastic methods, despite its lower
signal-processing complexity. A simplified Adaptive Predictive Coder (APC)
employing a single-tap pitch predictor, considered next, provided a slight
improvement in performance over ADPCM, but at somewhat greater complexity.
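Backward adaptive prediction in ADPCM updates the predictor from reconstructed samples only. This lets the decoder track the encoder without side information. The sketch below assumes a second-order predictor adapted with a sign-sign LMS rule, which is one simple stochastic-approximation update, and a fixed 4-bit uniform quantizer. It does not reproduce the SAP variants or the BBA predictor studied in the thesis. The names and parameter values are hypothetical.

```python
import math


def _sign(v):
    return (v > 0) - (v < 0)


class BackwardPredictor:
    """Order-p linear predictor adapted by a sign-sign LMS rule (a simple
    stochastic-approximation update) from reconstructed samples only."""

    def __init__(self, order=2, mu=0.005):
        self.coeffs = [0.0] * order
        self.history = [0.0] * order  # newest reconstructed sample first
        self.mu = mu

    def predict(self):
        return sum(c * h for c, h in zip(self.coeffs, self.history))

    def update(self, q_error, recon):
        s = _sign(q_error)
        self.coeffs = [c + self.mu * s * _sign(h)
                       for c, h in zip(self.coeffs, self.history)]
        self.history = [recon] + self.history[:-1]


STEP, LEVELS = 0.05, 16  # illustrative fixed 4-bit uniform quantizer


def _dequantize(code):
    return (code + 0.5) * STEP  # mid-rise reconstruction level


def encode(samples):
    """Return (codes, local reconstruction) for a backward-adaptive ADPCM encoder."""
    pred, codes, recon = BackwardPredictor(), [], []
    for x in samples:
        p = pred.predict()
        code = math.floor((x - p) / STEP)
        code = max(-LEVELS // 2, min(LEVELS // 2 - 1, code))  # clip to 4 bits
        q = _dequantize(code)
        pred.update(q, p + q)
        codes.append(code)
        recon.append(p + q)
    return codes, recon


def decode(codes):
    """Mirror the encoder's predictor using only the transmitted codes."""
    pred, out = BackwardPredictor(), []
    for code in codes:
        p = pred.predict()
        q = _dequantize(code)
        pred.update(q, p + q)
        out.append(p + q)
    return out
```

Both sides run identical arithmetic on the same codes, so `decode(codes)` reproduces the encoder's local reconstruction exactly.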
The ultimate test of any speech coding system is the perceptual performance
of the received speech. Recent research has indicated that this
may be enhanced by suitable control of the noise spectrum according to
the theory of auditory masking. Various noise shaping ADPCM
configurations were examined, and it was demonstrated that a proposed
pre-/post-filtering arrangement, which advantageously exploits the
predictor-quantizer interaction, leads to the best subjective
performance in both forward and backward prediction systems.
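Pre-/post-filtering shapes the coding noise because the post-filter undoes the pre-filter for the signal and also filters the roughly white quantization noise added in between. The sketch below uses a generic first-order pre-emphasis pair with an assumed coefficient a = 0.9, purely to show this effect. It does not reproduce the thesis's proposed arrangement, which exploits the predictor-quantizer interaction.

```python
def pre_filter(x, a=0.9):
    """First-order pre-emphasis P(z) = 1 - a z^-1, applied before the coder."""
    out, prev = [], 0.0
    for v in x:
        out.append(v - a * prev)
        prev = v
    return out


def post_filter(x, a=0.9):
    """Inverse filter 1 / (1 - a z^-1), applied after the decoder. It restores the
    signal and gives white coding noise a low-pass spectral shape."""
    out, prev = [], 0.0
    for v in x:
        prev = v + a * prev
        out.append(prev)
    return out
```

With a = 0.9, the post-filter gain is 1/(1 - a) = 10 at DC and 1/(1 + a) ≈ 0.53 at the Nyquist frequency. The noise is therefore pushed towards low frequencies, where voiced speech typically has most of its energy to mask it.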
Adaptive quantization is instrumental to the performance of ADPCM systems.
Both the forward adaptive quantizer (AQF) and the backward one-word
memory adaptive quantizer (AQJ) were examined. In addition, a novel method
of reducing quantization noise in ADPCM-AQJ coders, which applies a
correction to the decoded speech samples, reduced the output noise across
the spectrum, with considerable suppression of high-frequency noise.
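Jayant's one-word-memory adaptation (the AQJ) rescales the quantizer step after every sample. The multiplier is selected by the codeword just transmitted, so the decoder can follow the step size without side information. The sketch below assumes a 2-bit mid-rise quantizer. The multipliers (0.8 for the inner level, 1.6 for the outer level), the initial step and the step limits are illustrative values, and the function names are hypothetical.

```python
def _adapt(step, mag, multipliers, step_min, step_max):
    """Next step size, chosen only by the magnitude index of the current codeword."""
    return min(step_max, max(step_min, step * multipliers[mag]))


def aqj_encode(samples, step0=0.1, multipliers=(0.8, 1.6), step_min=1e-4, step_max=10.0):
    """2-bit mid-rise quantizer with one-word-memory (Jayant) step adaptation.

    Codes are in {-2, -1, 1, 2}: the sign gives the polarity and |code| - 1 the
    magnitude index (0 = inner level, 1 = outer level)."""
    step, codes, decoded = step0, [], []
    for x in samples:
        mag = 0 if abs(x) < step else 1
        sign = 1 if x >= 0 else -1
        codes.append(sign * (mag + 1))
        decoded.append(sign * (mag + 0.5) * step)
        step = _adapt(step, mag, multipliers, step_min, step_max)
    return codes, decoded


def aqj_decode(codes, step0=0.1, multipliers=(0.8, 1.6), step_min=1e-4, step_max=10.0):
    """Track the same step-size sequence from the codes alone (backward adaptation)."""
    step, out = step0, []
    for code in codes:
        sign = 1 if code > 0 else -1
        mag = abs(code) - 1
        out.append(sign * (mag + 0.5) * step)
        step = _adapt(step, mag, multipliers, step_min, step_max)
    return out
```

For a constant input of 1.0, the step grows by a factor of 1.6 per sample until it exceeds the input, at which point the inner level is selected and the step starts to shrink.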
More powerful (and inevitably more complex) frequency domain speech
coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder
(SBC) offer good quality speech at 16 Kbps. To reduce complexity and
coding delay, whilst retaining the advantage of sub-band coding, a novel
transform based split-band coder (TSBC) was developed and found to compare
closely in performance with the SBC.
To prevent the heavy side information requirement associated with a
large number of bands in split-band coding schemes from impairing coding
accuracy, without forgoing the efficiency provided by adaptive bit
allocation, a method employing AQJs to code the sub-band signals together
with vector quantization of the bit allocation patterns was also
proposed.
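Adaptive bit allocation across bands is commonly done with a greedy rule. It relies on the approximation that each extra bit reduces a band's quantization noise by about 6 dB, which is a factor of 4 in power. The sketch below assumes that rule and takes per-band signal variances as its only input. It is not the thesis's allocation method, and it omits the vector quantization of allocation patterns. The function name and the optional per-band cap are hypothetical.

```python
def allocate_bits(variances, total_bits, max_bits=None):
    """Greedy integer bit allocation across sub-bands.

    Each bit goes to the band whose current noise estimate sigma^2 / 4**b is
    largest; ties go to the lowest-numbered band. `max_bits` optionally caps
    the bits per band.
    """
    bits = [0] * len(variances)
    noise = list(variances)
    for _ in range(total_bits):
        candidates = [k for k in range(len(bits))
                      if max_bits is None or bits[k] < max_bits]
        if not candidates:
            break  # every band is at its cap
        k = max(candidates, key=lambda i: noise[i])
        bits[k] += 1
        noise[k] /= 4.0  # about 6 dB less noise per extra bit
    return bits


print(allocate_bits([16.0, 4.0, 1.0, 1.0], 6))  # [3, 2, 1, 0]
```

The result matches, after rounding, the familiar closed-form allocation b_k = B/N + 0.5 log2(sigma_k^2 / geometric mean), while guaranteeing integer bits and an exact total.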
Finally, 'pipeline' methods of bit allocation and step size estimation
(using the Fast Fourier Transform (FFT) on the input signal) were examined.
Such methods, although less accurate, are nevertheless useful in
limiting the coding delay associated with SBC schemes employing Quadrature
Mirror Filters (QMF).
Speech coding at medium bit rates using analysis by synthesis techniques
Quantisation mechanisms in multi-prototype waveform coding
Prototype Waveform Coding is one of the most promising methods for speech coding at low bit rates over telecommunications networks. This thesis investigates quantisation mechanisms in Multi-Prototype Waveform (MPW) coding and proposes two prototype waveform quantisation algorithms for speech coding at 2.4 kb/s. Speech coders based on these algorithms have been found to produce coded speech with perceptual quality equivalent to that of the US Federal Standard 1016 CELP algorithm at 4.8 kb/s. Both proposed algorithms are based on Prototype Waveform Interpolation (PWI). The first algorithm uses an open-loop architecture (Open Loop Quantisation). In this algorithm, the speech residual is represented as a series of prototype waveforms (PWs). The PWs are extracted in both voiced and unvoiced speech, time-aligned and quantised; at the receiver, the excitation is reconstructed by smooth interpolation between them. For low bit rate coding, the PW is decomposed into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW). The SEW is coded using vector quantisation of both its magnitude and phase spectra, and the SEW codebook search selects the codebook vector that best matches the SEW. The REW phase spectrum is not quantised; instead, it is regenerated from Gaussian noise. The REW magnitude spectrum, on the other hand, can either be quantised at a certain update rate or be derived solely from the behaviour of the SEW.
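The SEW/REW split can be viewed as low-pass and high-pass filtering of each prototype-waveform coefficient along the time axis. The sketch below assumes the PWs are already time-aligned and stored as equal-length coefficient vectors. It uses a centred moving average as a stand-in for the low-pass filter. The window length and the function name are hypothetical, not those of the thesis.

```python
def decompose_sew_rew(pw_frames, window=5):
    """Split time-aligned prototype waveforms into SEW and REW components.

    pw_frames: list of equal-length coefficient lists, one per PW update instant.
    SEW is a centred moving average of each coefficient along time (a crude
    low-pass filter); REW is the remainder, so every PW equals SEW + REW.
    """
    n, m = len(pw_frames), len(pw_frames[0])
    half = window // 2
    sew, rew = [], []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)  # window shrinks at edges
        smooth = [sum(pw_frames[s][k] for s in range(lo, hi)) / (hi - lo)
                  for k in range(m)]
        sew.append(smooth)
        rew.append([pw_frames[t][k] - smooth[k] for k in range(m)])
    return sew, rew
```

A PW sequence that is constant over time is all SEW, with zero REW. A coefficient that flips sign at every update instant ends up almost entirely in the REW.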
Scalable Video Streaming with Prioritised Network Coding on End-System Overlays
PhD
Distribution over the internet is destined to become a standard approach for live broadcasting
of TV and events of nationwide interest. The demand for high-quality live video
tailored to personal requirements is expected to grow exponentially over the next few years. End-system
multicast is a desirable option for relieving the content server of bandwidth bottlenecks
and computational load by allowing decentralised allocation of resources to the users
and distributed service management. Network coding provides innovative solutions to a
multitude of issues related to multi-user content distribution, such as the coupon collector's
problem, resource allocation and scheduling. This thesis tackles the problem of streaming
scalable video on end-system multicast overlays with prioritised push-based streaming.
We analyse the characteristics of a random coding process modelled as a linear channel
operator, and present a novel error detection and correction system for error-resilient decoding,
providing one of the first practical frameworks for Joint Source-Channel-Network
coding. Our system outperforms both network error correction and traditional FEC coding
when performed separately. We then present a content distribution system based on end-system
multicast. Our data exchange protocol makes use of network coding as a way to
collaboratively deliver data to several peers. Prioritised streaming is performed by means
of hierarchical network coding and a dynamic chunk selection for optimised rate allocation
based on goodput statistics at the application layer. We demonstrate, through simulations, the
efficient allocation of resources for adaptive video delivery. Finally, we describe the implementation
of our coding system. We highlight the use of rateless coding properties, discuss
the application in collaborative and distributed coding systems, and provide an optimised
implementation of the decoding algorithm using advanced CPU instructions. We analyse
computational load and packet loss protection via lab tests and simulations, complementing
the overall analysis of the video streaming system in all its components.
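The network-coding layer rests on random linear combinations of source packets. Any peer can decode once it has collected enough linearly independent combinations. The sketch below shows the simplest case: random linear network coding over GF(2) with Gauss-Jordan decoding. Several aspects are simplifying assumptions rather than features of the thesis's system: the field size, the function names, and the absence of prioritisation (hierarchical coding) and error correction.

```python
import random


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode_packet(packets, rng):
    """One coded packet: a random GF(2) coefficient vector and the XOR of the
    source packets it selects (all packets must have equal length)."""
    coeffs = [rng.randint(0, 1) for _ in packets]
    payload = bytes(len(packets[0]))
    for c, p in zip(coeffs, packets):
        if c:
            payload = xor_bytes(payload, p)
    return coeffs, payload


def decode_packets(coded, k):
    """Gauss-Jordan elimination over GF(2), where addition is XOR. Returns the k
    source packets, or None while the coefficient vectors lack full rank."""
    rows = [(list(c), bytes(p)) for c, p in coded]
    for col in range(k):
        pivot = next((i for i in range(col, len(rows)) if rows[i][0][col]), None)
        if pivot is None:
            return None  # need more linearly independent packets
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pc, pp = rows[col]
        for i in range(len(rows)):
            if i != col and rows[i][0][col]:
                c, p = rows[i]
                rows[i] = ([a ^ b for a, b in zip(c, pc)], xor_bytes(p, pp))
    return [rows[i][1] for i in range(k)]


rng = random.Random(7)
source = [b"seg0", b"seg1", b"seg2"]
received, decoded = [], None
while decoded is None:  # keep collecting coded packets until decodable
    received.append(encode_packet(source, rng))
    decoded = decode_packets(received, len(source))
print(decoded == source)  # True
```

A receiver does not care which particular coded packets it obtains, only how many independent ones it has. This is what relieves the coupon-collector effect in multi-peer delivery.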
Efficient real-time voice transmission over wireless ad hoc networks
Mobile telephony is becoming widespread and new types of networks are emerging, notably ad hoc networks. Beyond these particular networks, the number of voice calls made every minute is constantly increasing, yet networks are still often subject to transmission errors. This thesis addresses the use of coding methods for voice transmission that is robust to packet loss over a disturbed mobile wireless network that supports multipath routing. The proposed method applies multiple description coding (MDC) to the data stream produced by a low-bit-rate speech codec, specifically AMR-WB (Adaptive Multi Rate - Wide Band). Among the parameters encoded by AMR-WB, the linear prediction coefficients are computed once per frame, unlike the other parameters, which are computed four times. The main challenge lies in properly creating descriptions for the linear prediction parameters. The chosen method applies vector quantization combined with four descriptions. To reduce search complexity, the process is assisted by a pre-classifier that performs a localised search in the full codebook according to the position of the input vector. Applying the MDC model to speech signals shows that using four descriptions gives better results when the network is subject to packet loss. Optimising the communication between routing and the description-creation process leads to an adaptive description coding method. The work in this thesis aimed to reproduce a high-quality speech signal while adequately optimising storage resources, complexity and computation.
The adaptive MDC method meets these expectations and proves very robust in the presence of packet loss.
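The thesis builds its four descriptions with vector quantization of the linear-prediction parameters. The sketch below uses a much simpler stand-in to illustrate the multiple-description principle. It splits a sequence of parameter vectors into four descriptions by frame index, so frame i goes to description i mod 4. At the receiver, it linearly interpolates any frame whose description was lost. This polyphase scheme and all the names in it are assumptions for illustration only.

```python
def make_descriptions(frames, n_desc=4):
    """Description d carries frames d, d + n_desc, d + 2*n_desc, ... with their indices."""
    return [[(i, frames[i]) for i in range(d, len(frames), n_desc)]
            for d in range(n_desc)]


def reconstruct(received, n_frames):
    """Rebuild all frames from whichever descriptions arrived.

    A missing frame is linearly interpolated between the nearest received
    neighbours, or copied from the single nearest one at either edge.
    Returns None if nothing was received."""
    known = {i: f for desc in received for i, f in desc}
    if not known:
        return None
    out = []
    for t in range(n_frames):
        if t in known:
            out.append(known[t])
            continue
        left = max((i for i in known if i < t), default=None)
        right = min((i for i in known if i > t), default=None)
        if left is None:
            out.append(known[right])
        elif right is None:
            out.append(known[left])
        else:
            w = (t - left) / (right - left)
            out.append([(1 - w) * a + w * b for a, b in zip(known[left], known[right])])
    return out
```

Each additional description that arrives fills more frames exactly. Losing one of four descriptions therefore degrades quality gracefully instead of silencing whole segments.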
- …