135 research outputs found

Maximum entropy pole-zero estimation

Author: Bruce R. Musicus, Allen M. Kabel
Publication venue: Massachusetts Institute of Technology, Research Laboratory of Electronics
Publication date: 01/01/1985

Bibliography: p. 90-93. Supported in part by the Advanced Research Projects Agency, monitored by ONR under contract no. N00014-81-K-0742 NR-049-506. Supported in part by the National Science Foundation under grant ECS80-07102.

DSpace@MIT

Low bit rate speech communication based on charge coupled device Fourier transform processors

Author: Davie Malcolm Craig
Publication venue: The University of Edinburgh
Publication date: 01/01/1980

Edinburgh Research Archive

A review of vibration signal processing techniques for use in a real time condition monitoring system

Author: Birch David
Publication venue: Department of Electrical Engineering
Publication date: 01/01/1994

Bibliography: p. 181-183. The analysis of the vibrations produced by roller bearings is one of the most widely used techniques in condition determination of rolling element bearings. This project forms part of an overall plan to gain experience in condition monitoring and produce a computer aided vibration monitoring system that would initially be applied to rolling element bearings, and then later to other machine components. The particular goal of this project is to study signal processing techniques that will be of use in this system. The general signal processing problems are as follows. The vibration of an undamaged bearing is characterised by a Gaussian distribution and a white power spectral density. Once a bearing is damaged, the nature of the vibration changes, often with spikes or impulses present in the vibration signal. By detecting these impulses, a measure of the condition of the bearing may be obtained. The primary goal in machine condition determination then becomes the detection of these impulses in the presence of noise and contaminating signals, and the discrimination between impulses caused by the component in question and those from other sources. A wide range of signal processing techniques was reviewed and some of these were tested on vibrations recorded on the Mechanical Engineering department's bearing test rig. It was found that the time domain statistics (RMS, kurtosis, crest factor) were the simplest to use, but could be unreliable. On the other hand, frequency domain analysis techniques, such as the power spectrum, were more reliable, but more difficult to apply. By making use of a variety of these techniques and applying them in a systematic manner, it is possible to make an assessment of bearing condition under a wide variety of operating conditions. A small number of the signal processing techniques were programmed for a DSP processor. It was found that all of the techniques, with the exception of the bispectrum, could be programmed for the DSP chip. However, the available DSP card did not have sufficient memory to allow analysis and preprocessing routines to be combined. In addition, the analogue-to-digital conversion system would benefit from a buffered I/O system. The project should continue, with the DSP card being upgraded and all the necessary signal processing routines programmed. The project can then move to the next phase, which would be the inclusion of display and interface software and Artificial Intelligence analysis aids.
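The time domain statistics named in this abstract (RMS, kurtosis, crest factor) can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions below are the common textbook ones and are assumptions: kurtosis is the non-excess form (about 3 for Gaussian data), and crest factor is peak magnitude divided by RMS. The function name `time_domain_stats` and the synthetic "damaged" signal (a Gaussian signal with one impulse added every 500 samples) are hypothetical and not taken from the thesis.

```python
import math
import random


def time_domain_stats(x):
    """Return RMS, non-excess kurtosis, and crest factor of a sequence."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / n       # second central moment
    m4 = sum((v - mean) ** 4 for v in x) / n        # fourth central moment
    return {
        "rms": rms,
        "kurtosis": m4 / (var * var),               # about 3 for Gaussian data
        "crest_factor": max(abs(v) for v in x) / rms,
    }


# Assumed test signals: white Gaussian noise stands in for an undamaged
# bearing; periodic impulses stand in for a localised fault.
rng = random.Random(0)
healthy = [rng.gauss(0.0, 1.0) for _ in range(20000)]
damaged = [v + (8.0 if i % 500 == 0 else 0.0) for i, v in enumerate(healthy)]

healthy_stats = time_domain_stats(healthy)
damaged_stats = time_domain_stats(damaged)
print(healthy_stats)
print(damaged_stats)
```

In this synthetic case the impulses raise both kurtosis and crest factor well above the Gaussian baseline, while RMS changes only slightly, which is one reason impulse-sensitive statistics are attractive for this kind of detection.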

Cape Town University OpenUCT

Design, stability and applications of two dimensional recursive digital filters

Author: Ahmadi Madjid
Publication venue: Department of Communication & Electronics, Imperial College London
Publication date: 01/01/1977

Imperial Users only

Spiral - Imperial College Digital Repository

Statistical parametric speech synthesis based on sinusoidal models

Author: Hu Qiong
Publication venue: The University of Edinburgh
Publication date: 07/07/2017

This study focuses on improving the quality of statistical speech synthesis based on sinusoidal models. Vocoders play a crucial role during the parametrisation and reconstruction process, so we first conduct an experimental comparison of a broad range of the leading vocoder types. Although our study shows that, for analysis / synthesis, sinusoidal models with complex amplitudes can generate high-quality speech compared with source-filter ones, component sinusoids are correlated with each other, and the number of parameters is also high and varies in each frame, which constrains their application to statistical speech synthesis. Therefore, we first propose a perceptually based dynamic sinusoidal model (PDM) to decrease and fix the number of components typically used in the standard sinusoidal model. Then, in order to apply the proposed vocoder with an HMM-based speech synthesis system (HTS), two strategies for modelling sinusoidal parameters have been compared. In the first method (DIR parameterisation), features extracted from the fixed- and low-dimensional PDM are statistically modelled directly. In the second method (INT parameterisation), we convert both static amplitude and dynamic slope from all the harmonics of a signal, which we term the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients (RDC)) for modelling. Our results show that HDM with intermediate parameters can generate comparable quality to STRAIGHT. As correlations between features in the dynamic model cannot be modelled satisfactorily by a typical HMM-based system with diagonal covariance, we have applied and tested a deep neural network (DNN) for modelling features from these two methods. To fully exploit DNN capabilities, we investigate ways to combine INT and DIR at the level of both DNN modelling and waveform generation. For DNN training, we propose to use multi-task learning to model cepstra (from INT) and log amplitudes (from DIR) as primary and secondary tasks. We conclude from our results that sinusoidal models are indeed highly suited for statistical parametric synthesis. The proposed method outperforms the state-of-the-art STRAIGHT-based equivalent when used in conjunction with DNNs. To further improve the voice quality, phase features generated from the proposed vocoder also need to be parameterised and integrated into statistical modelling. Here, an alternative statistical model referred to as the complex-valued neural network (CVNN), which treats complex coefficients as a whole, is proposed to model complex amplitude explicitly. A complex-valued back-propagation algorithm using a logarithmic minimisation criterion which includes both amplitude and phase errors is used as a learning rule. Three parameterisation methods are studied for mapping text to acoustic features: RDC / real-valued log amplitude, complex-valued amplitude with minimum phase and complex-valued amplitude with mixed phase. Our results show the potential of using CVNNs for modelling both real and complex-valued acoustic features. Overall, this thesis has established competitive alternative vocoders for speech parametrisation and reconstruction. The utilisation of the proposed vocoders on various acoustic models (HMM / DNN / CVNN) clearly demonstrates that they are compelling for statistical parametric speech synthesis.

Edinburgh Research Archive

Text-independent speaker recognition

Author: Gangisetty Smitha
Publication venue: The Research Repository @ WVU
Publication date: 01/05/2005

This research presents a new text-independent speaker recognition system with multivariate tools such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA) embedded into the recognition system after the feature extraction step. The proposed approach evaluates the performance of such a recognition system when trained and used in clean and noisy environments. Additive white Gaussian noise and convolutive noise are added. Experiments were carried out to investigate the robustness of PCA and ICA using the designed approach. The application of ICA improved the performance of the speaker recognition model when compared to PCA. Experimental results show that the use of ICA enabled extraction of higher order statistics, thereby capturing speaker dependent statistical cues in a text-independent recognition system. The results show that ICA has a better de-correlation and dimension reduction property than PCA. To simulate a multi-environment system, we trained our model such that every time a new speech signal was read, it was contaminated with different types of noise and stored in the database. Results also show that ICA outperforms PCA under adverse environments. This is verified by computing recognition accuracy rates obtained when the designed system was tested for different train and test SNR conditions with additive white Gaussian noise and test delay conditions with an echo effect.

The Research Repository @ WVU (West Virginia University)

Using a low-bit rate speech enhancement variable post-filter as a speech recognition system pre-filter to improve robustness to GSM speech

Author: Mahlanyane Nkululeko S
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2003

Includes bibliographical references. Performance of speech recognition systems degrades when they are used to recognize speech that has been transmitted through GSM (Global System for Mobile Communications) voice communication channels (GSM speech). This degradation is mainly due to the effects of GSM speech coding and GSM channel noise on speech signals transmitted through the network. This poor recognition of GSM channel speech limits the use of speech recognition applications over GSM networks. If speech recognition technology is to be used without limitation over GSM networks, the recognition accuracy of GSM channel speech has to be improved. Different channel normalization techniques have been developed in an attempt to improve recognition accuracy of voice channel modified speech in general (not specifically for GSM channel speech). These techniques can be classified into three broad categories, namely, model modification, signal pre-processing and feature processing techniques. In this work, as a contribution toward improving the robustness of speech recognition systems to GSM speech, the use of a low-bit rate speech enhancement post-filter as a speech recognition system pre-filter is proposed. This filter is to be used in recognition systems in combination with channel normalization techniques.

Cape Town University OpenUCT

Phonetically transparent technique for the automatic transcription of speech

Author: Morony Michael J.
Publication venue: The University of Edinburgh
Publication date: 01/01/1998

Edinburgh Research Archive

System Identification with Applications in Speech Enhancement

Author: Lin Xiang
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/08/2009

With the increasing popularity of integrating hands-free telephony on mobile portable devices and the rapid development of voice over internet protocol, identification of acoustic systems has become desirable for compensating distortions introduced to speech signals during transmission, and hence enhancing the speech quality. The objective of this research is to develop system identification algorithms for speech enhancement applications including network echo cancellation and speech dereverberation. A supervised adaptive algorithm for sparse system identification is developed for network echo cancellation. Based on the framework of the selective-tap updating scheme on the normalized least mean squares algorithm, the MMax and sparse partial update tap-selection strategies are exploited in the frequency domain to achieve fast convergence performance with low computational complexity. Through demonstrating how the sparseness of the network impulse response varies in the transformed domain, the multidelay filtering structure is incorporated to reduce the algorithmic delay. Blind identification of SIMO acoustic systems for speech dereverberation in the presence of common zeros is then investigated. First, the problem of common zeros is defined and extended to include the presence of near-common zeros. Two clustering algorithms are developed to quantify the number of these zeros so as to facilitate the study of their effect on blind system identification and speech dereverberation. To mitigate this effect, two algorithms are developed: the two-stage algorithm based on channel decomposition identifies common and non-common zeros sequentially, and the forced spectral diversity approach combines spectral shaping filters and channel undermodelling to derive a modified system that leads to improved dereverberation performance. Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based dereverberation techniques. Comprehensive simulations and discussions demonstrate the effectiveness of the aforementioned algorithms. A discussion on possible directions of prospective research on system identification techniques concludes this thesis.
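The selective-tap idea mentioned in this abstract can be sketched in its simplest time-domain form: an NLMS filter in which, at each sample, only the taps aligned with the largest-magnitude input samples are adapted (the MMax criterion). This is an illustrative sketch only. The thesis applies tap selection in the frequency domain with a multidelay structure, which is not reproduced here, and the function name `mmax_nlms`, the step size, and the eight-tap sparse response `h` are hypothetical choices.

```python
import random


def mmax_nlms(x, d, num_taps, num_update, mu=0.5, eps=1e-8):
    """Time-domain MMax-NLMS sketch.

    Per sample, only the `num_update` taps whose input samples have the
    largest magnitude are adapted; the remaining taps are left unchanged.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps                      # buf[k] holds x[n - k]
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wk * bk for wk, bk in zip(w, buf))
        e = dn - y                              # a priori error
        norm = sum(b * b for b in buf) + eps    # full input energy
        # MMax selection: indices of the largest |x[n - k]|
        selected = sorted(range(num_taps),
                          key=lambda k: abs(buf[k]),
                          reverse=True)[:num_update]
        for k in selected:
            w[k] += mu * e * buf[k] / norm
    return w


# Assumed setup: a sparse 8-tap "echo path" driven by white Gaussian noise,
# with no measurement noise.
rng = random.Random(1)
h = [0.0, 0.0, 1.0, 0.0, 0.0, -0.5, 0.0, 0.0]
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]

w = mmax_nlms(x, d, num_taps=len(h), num_update=4)
print([round(v, 3) for v in w])
```

Updating half of the taps per sample halves the number of coefficient updates. The cost of selecting those taps is not optimised in this sketch, which simply re-sorts the buffer at every sample.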

Spiral - Imperial College Digital Repository