135 research outputs found
Maximum entropy pole-zero estimation
Bibliography: p. 90-93.Supported in part by the Advanced Research Projects Agency monitored by ONR under contract no. N00014-81-K-0742 NR-049-506 Supported in part by the National Science Foundation under grant ECS80-07102Bruce R. Musicus, Allen M. Kabel
A review of vibration signal processing techniques for use in a real time condition monitoring system
Bibliography: p. 181-183.The analysis of the vibrations produced by roller bearings is one of the most widely used techniques in condition determination of rolling element bearings. This project forms part of an overall plan to gain experience in condition monitoring and produce a computer aided vibration monitoring system that would initially be applied to rolling element bearings, and then later to other machine components. The particular goal of this project is to study signal processing techniques that will be of use in this system. The general signal processing problems are as follows. The vibration of an undamaged bearing is characterised by a Gaussian distribution and a white power spectral density. Once a bearing is damaged the nature of the vibration changes often with spikes or impulses present in the vibration signal. By detecting these impulses a measure of the condition of the bearing may be obtained. The primary goal in machine condition determination then becomes the detection of these impulses in the presence of noise and contaminating. signals and to discriminate between those caused by the component in question and those from other sources. A wide range of signal processing techniques were reviewed and some of these tested on vibrations recorded on the Mechanical engineering departments bearing test rig. It was found that the time domain statistics (RMS, kurtosis, crest factor) were the simplest to use, but could be unreliable. On the other hand, frequency domain analysis techniques, such as the power spectrum were more reliable, but more difficult to apply. By making use of a variety of these techniques and applying them in a systematic manner, it is possible to make an assessment of bearing condition under a wide variety of operating conditions. A small number of the signal processing techniques were programmed for a DSP processor. It was found that all of the techniques, with the exception of the bispectrum could be programmed for the DSP chip. It was found however that the available DSP card did not have sufficient memory to allow analysis and preprocessing routines to be combined. In addition to this the analogue to digital conversion system would benefit from a buffered IO system. The project should continue, with the DSP card being upgraded and all the necessary signal processing routines programmed. The project can then move to the next phase which would be inclusion of display and interface software and Artificial Intelligence analysis aids
Statistical parametric speech synthesis based on sinusoidal models
This study focuses on improving the quality of statistical speech synthesis based on sinusoidal
models. Vocoders play a crucial role during the parametrisation and reconstruction process,
so we first lead an experimental comparison of a broad range of the leading vocoder types.
Although our study shows that for analysis / synthesis, sinusoidal models with complex amplitudes
can generate high quality of speech compared with source-filter ones, component
sinusoids are correlated with each other, and the number of parameters is also high and varies
in each frame, which constrains its application for statistical speech synthesis.
Therefore, we first propose a perceptually based dynamic sinusoidal model (PDM) to decrease
and fix the number of components typically used in the standard sinusoidal model.
Then, in order to apply the proposed vocoder with an HMM-based speech synthesis system
(HTS), two strategies for modelling sinusoidal parameters have been compared. In the first
method (DIR parameterisation), features extracted from the fixed- and low-dimensional PDM
are statistically modelled directly. In the second method (INT parameterisation), we convert
both static amplitude and dynamic slope from all the harmonics of a signal, which we term
the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients
(RDC)) for modelling. Our results show that HDM with intermediate parameters can
generate comparable quality to STRAIGHT.
As correlations between features in the dynamic model cannot be modelled satisfactorily
by a typical HMM-based system with diagonal covariance, we have applied and tested a deep
neural network (DNN) for modelling features from these two methods. To fully exploit DNN
capabilities, we investigate ways to combine INT and DIR at the level of both DNN modelling
and waveform generation. For DNN training, we propose to use multi-task learning to
model cepstra (from INT) and log amplitudes (from DIR) as primary and secondary tasks. We
conclude from our results that sinusoidal models are indeed highly suited for statistical parametric
synthesis. The proposed method outperforms the state-of-the-art STRAIGHT-based
equivalent when used in conjunction with DNNs.
To further improve the voice quality, phase features generated from the proposed vocoder
also need to be parameterised and integrated into statistical modelling. Here, an alternative
statistical model referred to as the complex-valued neural network (CVNN), which treats complex coefficients as a whole, is proposed to model complex amplitude explicitly. A complex-valued
back-propagation algorithm using a logarithmic minimisation criterion which includes
both amplitude and phase errors is used as a learning rule. Three parameterisation methods
are studied for mapping text to acoustic features: RDC / real-valued log amplitude, complex-valued
amplitude with minimum phase and complex-valued amplitude with mixed phase. Our
results show the potential of using CVNNs for modelling both real and complex-valued acoustic
features. Overall, this thesis has established competitive alternative vocoders for speech
parametrisation and reconstruction. The utilisation of proposed vocoders on various acoustic
models (HMM / DNN / CVNN) clearly demonstrates that it is compelling to apply them for
the parametric statistical speech synthesis
Text-independent speaker recognition
This research presents new text-independent speaker recognition system with multivariate tools such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA) embedded into the recognition system after the feature extraction step. The proposed approach evaluates the performance of such a recognition system when trained and used in clean and noisy environments. Additive white Gaussian noise and convolutive noise are added. Experiments were carried out to investigate the robust ability of PCA and ICA using the designed approach. The application of ICA improved the performance of the speaker recognition model when compared to PCA. Experimental results show that use of ICA enabled extraction of higher order statistics thereby capturing speaker dependent statistical cues in a text-independent recognition system. The results show that ICA has a better de-correlation and dimension reduction property than PCA. To simulate a multi environment system, we trained our model such that every time a new speech signal was read, it was contaminated with different types of noises and stored in the database. Results also show that ICA outperforms PCA under adverse environments. This is verified by computing recognition accuracy rates obtained when the designed system was tested for different train and test SNR conditions with additive white Gaussian noise and test delay conditions with echo effect
Using a low-bit rate speech enhancement variable post-filter as a speech recognition system pre-filter to improve robustness to GSM speech
Includes bibliographical references.Performance of speech recognition systems degrades when they are used to recognize speech that has been transmitted through GS1 (Global System for Mobile Communications) voice communication channels (GSM speech). This degradation is mainly due to GSM speech coding and GSM channel noise on speech signals transmitted through the network. This poor recognition of GSM channel speech limits the use of speech recognition applications over GSM networks. If speech recognition technology is to be used unlimitedly over GSM networks recognition accuracy of GSM channel speech has to be improved. Different channel normalization techniques have been developed in an attempt to improve recognition accuracy of voice channel modified speech in general (not specifically for GSM channel speech). These techniques can be classified into three broad categories, namely, model modification, signal pre-processing and feature processing techniques. In this work, as a contribution toward improving the robustness of speech recognition systems to GSM speech, the use of a low-bit speech enhancement post-filter as a speech recognition system pre-filter is proposed. This filter is to be used in recognition systems in combination with channel normalization techniques
System Identification with Applications in Speech Enhancement
As the increasing popularity of integrating hands-free telephony on mobile portable devices
and the rapid development of voice over internet protocol, identification of acoustic
systems has become desirable for compensating distortions introduced to speech signals
during transmission, and hence enhancing the speech quality. The objective of this research
is to develop system identification algorithms for speech enhancement applications
including network echo cancellation and speech dereverberation.
A supervised adaptive algorithm for sparse system identification is developed for
network echo cancellation. Based on the framework of selective-tap updating scheme
on the normalized least mean squares algorithm, the MMax and sparse partial update
tap-selection strategies are exploited in the frequency domain to achieve fast convergence
performance with low computational complexity. Through demonstrating how
the sparseness of the network impulse response varies in the transformed domain, the
multidelay filtering structure is incorporated to reduce the algorithmic delay.
Blind identification of SIMO acoustic systems for speech dereverberation in the
presence of common zeros is then investigated. First, the problem of common zeros is
defined and extended to include the presence of near-common zeros. Two clustering algorithms
are developed to quantify the number of these zeros so as to facilitate the study
of their effect on blind system identification and speech dereverberation. To mitigate such
effect, two algorithms are developed where the two-stage algorithm based on channel
decomposition identifies common and non-common zeros sequentially; and the forced
spectral diversity approach combines spectral shaping filters and channel undermodelling
for deriving a modified system that leads to an improved dereverberation performance.
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subbandbased
dereverberation techniques. Comprehensive simulations and discussions demonstrate
the effectiveness of the aforementioned algorithms. A discussion on possible directions
of prospective research on system identification techniques concludes this thesis
- …