54 research outputs found
Low-delay nonuniform pseudo-QMF banks with application to speech enhancement
Journal Article. Abstract: This paper presents a method for designing low-delay nonuniform pseudo quadrature mirror filter (QMF) banks. This method is motivated by the work of Li, Nguyen, and Tantaratana, in which the nonuniform filter bank is realized by combining an appropriate number of adjacent sub-bands of a uniform pseudo-QMF bank. In prior work, the prototype filter of the uniform pseudo-QMF bank was constrained to have linear phase and the overall delay associated with the filter bank was often unacceptably large for filter banks with a large number of sub-bands. This paper proposes a pseudo-QMF filter bank design technique that significantly reduces the delay by relaxing the linear phase constraints. An example in which an oversampled critical-band nonuniform filter bank is designed and applied to a two-state modeling speech enhancement system is presented in this paper. Comparison of the performance of this system to competing methods employing tree-structured, linear phase multiresolution analysis indicates that the approach described in this paper strikes a good balance between system performance and low delay.
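To make the delay problem mentioned in this abstract concrete, the following is a minimal sketch of the delay arithmetic for a uniform cosine-modulated bank with a linear-phase prototype. It assumes the common textbook conventions that the prototype length is N = 2mM for M channels and that the overall analysis-synthesis delay is N - 1 samples; the function name, the overlap factor m, and the example sampling rate are illustrative choices, not values taken from the paper.

```python
def linear_phase_bank_delay(num_channels, overlap_factor, sample_rate_hz):
    """Delay of an M-channel bank whose linear-phase prototype has length 2*m*M.

    Assumption (not from the paper): prototype length N = 2*m*M and an
    overall analysis-synthesis delay of N - 1 samples.
    """
    prototype_length = 2 * overlap_factor * num_channels
    delay_samples = prototype_length - 1
    delay_ms = 1000.0 * delay_samples / sample_rate_hz
    return prototype_length, delay_samples, delay_ms


if __name__ == "__main__":
    # Hypothetical configuration: 32 channels, overlap factor 8, 8 kHz speech.
    length, samples, ms = linear_phase_bank_delay(32, 8, 8000)
    print(f"prototype length {length}, delay {samples} samples ({ms:.3f} ms)")
```

Under these assumptions the delay grows linearly with the number of channels, which is why relaxing the linear-phase constraint becomes attractive for banks with many sub-bands.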
Studies in Signal Processing Techniques for Speech Enhancement: A comparative study
Speech enhancement is essential for suppressing background noise, increasing speech intelligibility, and reducing listening fatigue. Existing speech enhancement algorithms range from simple ones, such as spectral subtraction, to complex ones, such as Bayesian magnitude estimators based on Minimum Mean Square Error (MMSE) and its variants. Research is ongoing, and new algorithms continue to emerge for enhancing speech recorded against background noise in environments such as industrial sites, vehicles, and aircraft cockpits. In the aviation industry, speech enhancement plays a vital role in recovering crucial information from the pilot’s conversation after an incident or accident by suppressing engine noise and noise from other cockpit instruments. This work proposes a new approach to speech enhancement that makes use of the harmonic wavelet transform and Bayesian estimators. The performance indicators, SNR and listening tests, confirm that the newly modified algorithms using the harmonic wavelet transform show better results than currently existing methods. Further, the Harmonic Wavelet Transform is computationally efficient and simple to implement because of its built-in decimation-interpolation operations, in contrast to a filter-bank approach to realizing sub-bands.
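As an illustration of the simplest family of algorithms this abstract mentions, here is a minimal sketch of power spectral subtraction on a single frame. It is not the harmonic wavelet method proposed in the work; the function name, the spectral floor parameter, and the assumption that a noise magnitude estimate is already available are all illustrative.

```python
import math


def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Power spectral subtraction for one frame of magnitude spectra.

    noisy_mag: magnitudes of the noisy frame, one value per frequency bin.
    noise_mag: estimated noise magnitudes for the same bins (assumed given).
    floor:     fraction of the noisy magnitude kept as a spectral floor, so
               that over-subtraction never produces a negative power.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        residual_power = y * y - n * n
        floor_power = (floor * y) ** 2
        cleaned.append(math.sqrt(max(residual_power, floor_power)))
    return cleaned


if __name__ == "__main__":
    # Second bin is dominated by noise and falls back to the floor.
    print(spectral_subtract([3.0, 1.0], [1.0, 2.0], floor=0.1))
```

In a complete system the cleaned magnitudes would be recombined with the noisy phase and inverse transformed; that step is omitted here.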
Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods
Speech signals radiated in confined spaces are subject to reverberation due to reflections
from surrounding walls and obstacles. Reverberation leads to severe degradation
of speech intelligibility and can be prohibitive for applications where speech is digitally
recorded, such as audio conferencing or hearing aids. Dereverberation of speech
is therefore an important field in speech enhancement.
Driven by consumer demand, blind speech dereverberation has become a popular
field in the research community and has led to many interesting approaches in the literature.
However, most existing methods are dictated by their underlying models and
hence suffer from assumptions that constrain the approaches to specific subproblems
of blind speech dereverberation. For example, many approaches limit the dereverberation
to voiced speech sounds, leading to poor results for unvoiced speech. Few
approaches tackle single-sensor blind speech dereverberation, and only a very limited
subset allows for dereverberation of speech from moving speakers.
Therefore, the aim of this dissertation is the development of a flexible and extendible
framework for blind speech dereverberation accommodating different speech
sound types, single or multiple sensors, as well as stationary and moving speakers.
Bayesian methods benefit from – rather than being dictated by – appropriate model
choices. Therefore, the problem of blind speech dereverberation is considered from
a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach
accommodating a multitude of models for the speech production mechanism and
room transfer function is consequently derived. In this approach both the anechoic
source signal and reverberant channel are estimated using their optimal estimators by
means of Rao-Blackwellisation of the state-space of unknown variables. The remaining
model parameters are estimated using sequential importance resampling.
The proposed approach is implemented for two different speech production models
for stationary speakers, demonstrating substantial reduction in reverberation for
both unvoiced and voiced speech sounds. Furthermore, the channel model is extended
to facilitate blind dereverberation of speech from moving speakers. Due to the
structure of the measurement model, single- as well as multi-microphone processing is facilitated,
accommodating physically constrained scenarios where only a single sensor
can be used as well as allowing for the exploitation of spatial diversity in scenarios
where the physical size of microphone arrays is of no concern.
This dissertation is concluded with a survey of possible directions for future research,
including the use of switching Markov source models, joint target tracking
and enhancement, as well as an extension to subband processing for improved computational
efficiency.
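The sequential importance resampling step named in this abstract can be sketched in a few lines. This is a generic illustration only: it does not reproduce the dissertation's Rao-Blackwellised filter or its speech and channel models, and the function names and the choice of systematic resampling are assumptions made for the example.

```python
def normalize(weights):
    """Scale non-negative importance weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def effective_sample_size(weights):
    """Effective number of particles for normalized weights (1 / sum of w^2)."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(weights, offset):
    """Systematic resampling of normalized weights.

    offset is a single draw from the uniform distribution on [0, 1); it is
    passed in explicitly so the function itself stays deterministic.
    Returns the indices of the particles that survive.
    """
    n = len(weights)
    indices = []
    cumulative = weights[0]
    j = 0
    for i in range(n):
        position = (offset + i) / n
        # Advance to the first particle whose cumulative weight covers position.
        while position > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        indices.append(j)
    return indices


if __name__ == "__main__":
    w = normalize([1.0, 2.0, 7.0])
    print(effective_sample_size(w))
    print(systematic_resample(w, 0.5))
```

A typical filter would resample only when the effective sample size drops below some fraction of the particle count, then reset all weights to be equal.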
Optimisation techniques for low bit rate speech coding
This thesis extends the background theory of speech and major speech coding schemes used in existing networks to an implementation of GSM full-rate speech compression on a RISC DSP and a multirate application for speech coding. Speech coding is the field concerned with obtaining compact digital representations of speech signals for the purpose of efficient transmission. In this thesis, the background of speech compression, characteristics of speech signals and the DSP algorithms used have been examined. The current speech coding schemes and requirements have been studied. The Global System for Mobile communication (GSM) is a digital mobile radio system which is extensively used throughout Europe, and also in many other parts of the world. The algorithm is standardised by the European Telecommunications Standards Institute (ETSI). The full-rate and half-rate speech compression of GSM have been analysed. A real-time implementation of the full-rate algorithm has been carried out on a RISC processor GEPARD by Austria Mikro Systeme International (AMS). The GEPARD code has been tested with all of the test sequences provided by ETSI and the results are bit-exact. The transcoding delay is lower than the ETSI requirement. A comparison of the half-rate and full-rate compression algorithms is discussed. Both algorithms offer near-toll speech quality comparable to or better than that of analogue cellular networks. The half-rate compression requires more computationally intensive operations and therefore a more powerful processor will be needed due to the complexity of the code. Hence the cost of implementing the half-rate codec will be considerably higher than that of the full-rate codec. A description of multirate signal processing and its application to speech (SBC) and speech/audio (MPEG) has been given. The possibility of combining multirate filtering with the GSM full-rate speech algorithm has also been investigated.
The results showed that multirate signal processing cannot be directly applied to GSM full-rate speech compression, since this method requires more processing power and causes a longer coding delay without appreciably improving the bit rate. In order to achieve a lower bit rate, the GSM full-rate mathematical algorithm can be used instead of the standardised ETSI recommendation. Some changes, including the number of quantisation bits, have to be made before the application of multirate signal processing, and a new standard will be required.
Perceptual and acoustic impacts of aberrant properties of electrolaryngeal speech.
Thesis (Ph. D.), Harvard-MIT Division of Health Sciences and Technology, 2003. Includes bibliographical references (p. 167-171). This electronic version was prepared by the author. The certified thesis is available in the Institute Archives and Special Collections.
Doctor of Philosophy
Dissertation. Hearing aids suffer from the problem of acoustic feedback that limits the gain provided by hearing aids. Moreover, the output sound quality of hearing aids may be compromised in the presence of background acoustic noise. Digital hearing aids use advanced signal processing to reduce acoustic feedback and background noise to improve the output sound quality. However, it is known that the output sound quality of digital hearing aids deteriorates as the hearing aid gain is increased. Furthermore, popular subband or transform domain digital signal processing in modern hearing aids introduces analysis-synthesis delays in the forward path. Long forward-path delays are not desirable because the processed sound combines with the unprocessed sound that arrives at the cochlea through the vent and changes the sound quality. In this dissertation, we employ a variable, frequency-dependent gain function that is lower at frequencies of the incoming signal where the information is perceptually insignificant. In addition, the method of this dissertation automatically identifies and suppresses residual acoustical feedback components at frequencies that have the potential to drive the system to instability. The suppressed frequency components are monitored and the suppression is removed when such frequencies no longer pose a threat to drive the hearing aid system into instability. Together, the method of this dissertation provides more stable gain over traditional methods by reducing acoustical coupling between the microphone and the loudspeaker of a hearing aid. In addition, the method of this dissertation performs necessary hearing aid signal processing with low-delay characteristics. The central idea for the low-delay hearing aid signal processing is a spectral gain shaping method (SGSM) that employs parallel parametric equalization (EQ) filters.
Parameters of the parametric EQ filters and associated gain values are selected using a least-squares approach to obtain the desired spectral response. Finally, the method of this dissertation switches to a least-squares adaptation scheme with linear complexity at the onset of howling. The method adapts to the altered feedback path quickly, so that the patient does not lose perceivable information. The complexity of the least-squares estimate is reduced by reformulating the least-squares estimate into a Toeplitz system and solving it with a direct Toeplitz solver. The increase in stable gain over traditional methods and the output sound quality were evaluated with psychoacoustic experiments on normal-hearing listeners with speech and music signals. The results indicate that the method of this dissertation provides 8 to 12 dB more hearing aid gain than feedback cancelers with traditional fixed gain functions. Furthermore, experimental results obtained with real-world hearing aid gain profiles indicate that the method of this dissertation provides less distortion in the output sound quality than classical feedback cancelers, enabling the use of more comfortable styles of hearing aids for patients with moderate to profound hearing loss. Extensive MATLAB simulations and subjective evaluations of the results indicate that the method of this dissertation exhibits much smaller forward-path delays with superior howling suppression capability.
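A single parametric EQ section of the kind this abstract builds on can be sketched as a second-order peaking filter. The coefficient formulas below follow the widely used Audio EQ Cookbook peaking filter, which is an assumption: the dissertation does not state which parametric EQ design it uses, and its parallel arrangement and least-squares parameter selection are not reproduced here. The function names and example values are illustrative.

```python
import cmath
import math


def peaking_eq_coefficients(f0, gain_db, q, fs):
    """Biquad coefficients (b, a) for a peaking EQ centred at f0 Hz.

    Assumed design: the Audio EQ Cookbook peaking filter (not confirmed as
    the dissertation's choice). a[0] is left unnormalized.
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return b, a


def gain_db_at(b, a, f, fs):
    """Magnitude response in dB of the biquad (b, a) at frequency f Hz."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * f / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))


if __name__ == "__main__":
    # Hypothetical section: +6 dB at 1 kHz, Q = 2, 16 kHz sampling rate.
    b, a = peaking_eq_coefficients(1000.0, 6.0, 2.0, 16000.0)
    print(gain_db_at(b, a, 1000.0, 16000.0), gain_db_at(b, a, 0.0, 16000.0))
```

Because each section is only second order, its group delay is small compared with a subband analysis-synthesis chain, which is consistent with the low-delay motivation stated in the abstract.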
Time-frequency analysis based on split spectrum applied to audio and ultrasonic signals
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Signal processing is a large subject with applications integral to a number of technological fields such as communication, audio, Voice over IP (VoIP), pattern recognition, sonar, radar, ultrasound and medical imaging. Techniques exist for the analysis, modelling, extraction, recognition and synthesis of signals of interest. The focus of this thesis is signal processing for acoustics (both sonic and ultrasonic). In the applications examined, signals of interest are usually incomplete, distorted and/or noisy. Therefore, reconstructing the signal, noise reduction and removal of any distortion/interference are the main goals of the signal processing techniques presented. The primary aim is to study and develop an advanced time-frequency signal processing technique for acoustic applications to enhance the quality of the signals. In the first part of the thesis, a technique is presented that models and maintains the correlation between temporal and spectral parameters of audio signals. A novel Packet Loss Concealment (PLC) method is developed with applications to VoIP, audio broadcasting, and streaming. The problem of modelling the time-varying frequency spectrum in the context of PLC is addressed, and a novel solution is proposed for tracking and using the temporal motion of spectral flow to reconstruct the signal. The proposed method utilises a Time-Frequency Motion (TFM) matrix representation of the audio signal, where each frequency is tagged with a motion vector estimate that is assessed by cross-correlation of the movement of spectral energy within sub-bands across time frames. The missing packets are estimated using extrapolation or interpolation algorithms using a TFM matrix and then inverse transformed to the time-domain for reconstruction of the signal.
The proposed method is compared with conventional approaches using objective Perceptual Evaluation of Speech Quality (PESQ) and subjective Mean Opinion Scores (MOS) over a range of packet loss from 5% to 20%. The evaluation results demonstrate that the proposed algorithm improves performance by an average of 2.85% and 5.9% in terms of PESQ and MOS respectively. In the second part of the thesis, the proposed method is extended and modified to address challenges of excessive coherent noise arising from ultrasonic signals gathered during Guided Wave Testing (GWT). GWT is an advanced non-destructive testing technique used across several branches of industry to inspect large structures for defects where structural integrity is of concern. In such systems, signal interpretation can often be challenging due to the multi-modal and dispersive propagation of Ultrasonic Guided Waves (UGWs). The multi-modal and dispersive nature of the received signals hampers the ability to detect defects in a given structure. The Split-Spectrum Processing (SSP) method applied to such signals has been studied and reviewed quantitatively to measure the enhancement in terms of Signal-to-Noise Ratio (SNR) and spatial resolution. In this thesis, the influence of SSP filter bank parameters on these signals is studied and optimised to improve SNR and spatial resolution considerably. The proposed method is compared analytically and experimentally with conventional approaches. The proposed SSP algorithm improves SNR by an average of 30 dB. The conclusions reached in this thesis will contribute to the progression of the GWT technique through considerable improvement in defect detection capability. Centre for Electronic Systems Research (CESR) of Brunel University London, The National Structural Integrity Research Centre (NSIRC) and TWI Ltd.
Mathematics and Digital Signal Processing
Modern computer technology has opened up new opportunities for the development of digital signal processing methods. The applications of digital signal processing have expanded significantly and today include audio and speech processing, sonar, radar, and other sensor array processing, spectral density estimation, statistical signal processing, digital image processing, signal processing for telecommunications, control systems, biomedical engineering, and seismology, among others. This Special Issue is aimed at wide coverage of the problems of digital signal processing, from mathematical modeling to the implementation of problem-oriented systems. The basis of digital signal processing is digital filtering. Wavelet analysis implements multiscale signal processing and is used to solve applied problems of de-noising and compression. Processing of visual information, including image and video processing and pattern recognition, is actively used in robotic systems and industrial process control today. Improving digital signal processing circuits and developing new signal processing systems can improve the technical characteristics of many digital devices. The development of new methods of artificial intelligence, including artificial neural networks and brain-computer interfaces, opens up new prospects for the creation of smart technology. This Special Issue contains the latest technological developments in mathematics and digital signal processing. The stated results are of interest to researchers in the field of applied mathematics and developers of modern digital signal processing systems.