31 research outputs found
Frequency-warped autoregressive modeling and filtering
This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular, predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles.
Frequency-warping, or simply warping techniques are based on a modification of a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this may be done for basically all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications.
Majority of the articles studies warped linear prediction, WLP, and its use in wideband audio coding. It is proposed that warped linear prediction would be particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction of a class of new implementation techniques for recursive filters in one of the articles. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired to write an article which introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using almost logarithmic frequency representation.reviewe
Speech spectrum non-stationarity detection based on line spectrum frequencies and related applications
Ankara : Department of Electrical and Electronics Engineering and The Institute of Engineering and Sciences of Bilkent University, 1998.Thesis (Master's) -- Bilkent University, 1998.Includes bibliographical references leaves 124-132In this thesis, two new speech variation measures for speech spectrum nonstationarity
detection are proposed. These measures are based on the Line
Spectrum Frequencies (LSF) and the spectral values at the LSF locations.
They are formulated to be subjectively meaningful, mathematically tractable,
and also have low computational complexity property. In order to demonstrate
the usefulness of the non-stationarity detector, two applications are presented:
The first application is an implicit speech segmentation system which detects
non-stationary regions in speech signal and obtains the boundaries of the speech
segments. The other application is a Variable Bit-Rate Mixed Excitation Linear
Predictive (VBR-MELP) vocoder utilizing a novel voice activity detector
to detect silent regions in the speech. This voice activity detector is designed
to be robust to non-stationary background noise and provides efficient coding
of silent sections and unvoiced utterances to decrease the bit-rate. Simulation
results are also presented.Ertan, Ali ErdemM.S
Optimisation techniques for low bit rate speech coding
This thesis extends the background theory of speech and major speech coding schemes used in existing networks to an implementation of GSM full-rate speech compression on a RISC DSP and a multirate application for speech coding. Speech coding is the field concerned with obtaining compact digital representations of speech signals for the purpose of efficient transmission. In this thesis, the background of speech compression, characteristics of speech signals and the DSP algorithms used have been examined. The current speech coding schemes and requirements have been studied. The Global System for Mobile communication (GSM) is a digital mobile radio system which is extensively used throughout Europe, and also in many other parts of the world. The algorithm is standardised by the European Telecommunications Standardisation histitute (ETSI). The full-rate and half-rate speech compression of GSM have been analysed. A real time implementation of the full-rate algorithm has been carried out on a RISC processor GEPARD by Austria Mikro Systeme International (AMS). The GEPARD code has been tested with all of the test sequences provided by ETSI and the results are bit-exact. The transcoding delay is lower than the ETSI requirement. A comparison of the half-rate and full-rate compression algorithms is discussed. Both algorithms offer near toll speech quality comparable or better than analogue cellular networks. The half-rate compression requires more computationally intensive operations and therefore a more powerful processor will be needed due to the complexity of the code. Hence the cost of the implementation of half-rate codec will be considerably higher than full-rate. A description of multirate signal processing and its application on speech (SBC) and speech/audio (MPEG) has been given. An investigation into the possibility of combining multirate filtering and GSM fill-rate speech algorithm. The results showed that multirate signal processing cannot be directly applied GSM full-rate speech compression since this method requires more processing power, causing longer coding delay but did not appreciably improve the bit rate. In order to achieve a lower bit rate, the GSM full-rate mathematical algorithm can be used instead of the standardised ETSI recommendation. Some changes including the number of quantisation bits has to be made before the application of multirate signal processing and a new standard will be required
New linear predictive methods for digital speech processing
Speech processing is needed whenever speech is to be compressed, synthesised or recognised by the means of electrical equipment. Different types of phones, multimedia equipment and interfaces to various electronic devices, all require digital speech processing. As an example, a GSM phone applies speech processing in its RPE-LTP encoder/decoder (ETSI, 1997). In this coder, 20 ms of speech is first analysed in the short-term prediction (STP) part, and second in the long-term prediction (LTP) part. Finally, speech compression is achieved in the RPE encoding part, where only 1/3 of the encoded samples are selected to be transmitted.
This thesis presents modifications for one of the most widely applied techniques in digital speech processing, namely linear prediction (LP). During recent decades linear prediction has played an important role in telecommunications and other areas related to speech compression and recognition. In linear prediction sample s(n) is predicted from its p previous samples by forming a linear combination of the p previous samples and by minimising the prediction error. This procedure in the time domain corresponds to modelling the spectral envelope of the speech spectrum in the frequency domain. The accuracy of the spectral envelope to the speech spectrum is strongly dependent on the order of the resulting all-pole filter. This, in turn, is usually related to the number of parameters required to define the model, and hence to be transmitted.
Our study presents new predictive methods, which are modified from conventional linear prediction by taking the previous samples for linear combination differently. This algorithmic development aims at new all-pole techniques, which could present speech spectra with fewer parameters.reviewe
Perceptual models in speech quality assessment and coding
The ever-increasing demand for good communications/toll
quality speech has created a renewed interest into the
perceptual impact of rate compression. Two general areas are
investigated in this work, namely speech quality assessment
and speech coding.
In the field of speech quality assessment, a model is
developed which simulates the processing stages of the
peripheral auditory system. At the output of the model a
"running" auditory spectrum is obtained. This represents
the auditory (spectral) equivalent of any acoustic sound such
as speech. Auditory spectra from coded speech segments serve
as inputs to a second model. This model simulates the
information centre in the brain which performs the speech
quality assessment. [Continues.
Making music through real-time voice timbre analysis: machine learning and timbral control
PhDPeople can achieve rich musical expression through vocal sound { see for example
human beatboxing, which achieves a wide timbral variety through a range of
extended techniques. Yet the vocal modality is under-exploited as a controller
for music systems. If we can analyse a vocal performance suitably in real time,
then this information could be used to create voice-based interfaces with the
potential for intuitive and ful lling levels of expressive control.
Conversely, many modern techniques for music synthesis do not imply any
particular interface. Should a given parameter be controlled via a MIDI keyboard,
or a slider/fader, or a rotary dial? Automatic vocal analysis could provide
a fruitful basis for expressive interfaces to such electronic musical instruments.
The principal questions in applying vocal-based control are how to extract
musically meaningful information from the voice signal in real time, and how
to convert that information suitably into control data. In this thesis we address
these questions, with a focus on timbral control, and in particular we
develop approaches that can be used with a wide variety of musical instruments
by applying machine learning techniques to automatically derive the mappings
between expressive audio input and control output. The vocal audio signal is
construed to include a broad range of expression, in particular encompassing
the extended techniques used in human beatboxing.
The central contribution of this work is the application of supervised and
unsupervised machine learning techniques to automatically map vocal timbre
to synthesiser timbre and controls. Component contributions include a delayed
decision-making strategy for low-latency sound classi cation, a regression-tree
method to learn associations between regions of two unlabelled datasets, a fast
estimator of multidimensional di erential entropy and a qualitative method for
evaluating musical interfaces based on discourse analysis