98 research outputs found

    Spatial, Spectral, and Perceptual Nonlinear Noise Reduction for Hands-free Microphones in a Car

    Speech enhancement in an automobile is a challenging problem because interference can come from engine noise, fans, music, wind, road noise, reverberation, echo, and passengers engaging in other conversations. Hands-free microphones make the situation worse because the strength of the desired speech signal decreases as the distance between the microphone and the talker increases. Automobile safety improves when the driver can use a hands-free interface to phones and other devices instead of taking their eyes off the road. The demand for high-quality hands-free communication in the automobile requires more powerful algorithms. This thesis shows that a unique combination of five algorithms can achieve superior speech enhancement for a hands-free system when compared to beamforming or spectral subtraction alone. Several different designs were analyzed and tested before converging on the configuration that achieved the best results. Beamforming, voice activity detection, spectral subtraction, perceptual nonlinear weighting, and talker isolation via pitch tracking work together in a complementary, iterative manner to create a speech enhancement system capable of significantly enhancing real-world speech signals. The following conclusions are supported by simulation results using data recorded in a car and are in strong agreement with theory. Adaptive beamforming, such as the Generalized Side-lobe Canceller (GSC), can be used effectively if the filters adapt only during silent data frames, because too much of the desired speech is cancelled otherwise. Spectral subtraction removes stationary noise, while perceptual weighting prevents the introduction of offensive audible noise artifacts. Talker isolation via pitch tracking can perform better when used after beamforming and spectral subtraction because of the higher accuracy obtained after initial noise removal.
Iterating the algorithm once increases the accuracy of the voice activity detection (VAD), which improves the overall performance of the algorithm. Placing the microphone(s) on the ceiling above the head and slightly forward of the desired talker appears to be the best location in an automobile, based on the experiments performed in this thesis. Objective speech quality measures show that the algorithm removes a majority of the stationary noise in the hands-free environment of an automobile with relatively minimal speech distortion.
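
A minimal sketch of VAD-gated spectral subtraction, one of the stages named in the abstract above. It assumes per-frame magnitude spectra and per-frame VAD decisions are already available as plain lists. The function name and the `alpha` and `floor` parameters are illustrative assumptions, and the simple spectral floor is only a stand-in for the thesis's perceptual nonlinear weighting, which the abstract does not specify.

```python
def spectral_subtract(frames, is_speech, alpha=1.0, floor=0.05):
    """Magnitude spectral subtraction with a spectral floor.

    frames    : list of magnitude spectra (one list of floats per frame)
    is_speech : list of bools, one per frame, from a VAD (assumed input)
    alpha     : over-subtraction factor (assumed parameter)
    floor     : fraction of the noisy magnitude kept as a lower bound,
                which limits "musical noise" artifacts (assumed parameter)
    """
    # Estimate the stationary noise spectrum from non-speech frames only.
    noise_frames = [f for f, s in zip(frames, is_speech) if not s]
    if not noise_frames:
        raise ValueError("need at least one non-speech frame to estimate noise")
    n_bins = len(frames[0])
    noise = [sum(f[k] for f in noise_frames) / len(noise_frames)
             for k in range(n_bins)]

    # Subtract the noise estimate from every frame, never going below the floor.
    enhanced = []
    for f in frames:
        enhanced.append([max(f[k] - alpha * noise[k], floor * f[k])
                         for k in range(n_bins)])
    return enhanced


# Two noise-only frames followed by one frame containing speech energy in bin 0.
frames = [[1.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
vad = [False, False, True]
cleaned = spectral_subtract(frames, vad)
```

Estimating the noise only from frames the VAD marks as silent mirrors the abstract's point that a more accurate VAD improves the stages that depend on it.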

    Pre-processing of Speech Signals for Robust Parameter Estimation


    Speech Modeling and Robust Estimation for Diagnosis of Parkinson’s Disease


    Algorithm and architecture for simultaneous diagonalization of matrices applied to subspace-based speech enhancement

    This thesis presents an algorithm and architecture for the simultaneous diagonalization of matrices. As an example, a subspace-based speech enhancement problem is considered, wherein the covariance matrices of the speech and noise are diagonalized simultaneously. To assess the system performance of the proposed algorithm, objective measurements of speech enhancement are reported in terms of the signal-to-noise ratio and mean Bark spectral distortion at various noise levels. In addition, an innovative subband analysis technique for subspace-based, time-domain constrained speech enhancement is proposed. The proposed technique analyses the signal in its subbands to build accurate estimates of the covariance matrices of speech and noise, exploiting the inherently slowly varying characteristics of speech and noise signals in narrow bands. The subband approach also decreases the computation time by reducing the order of the matrices to be simultaneously diagonalized. Simulation results indicate that the proposed technique performs well under extremely low signal-to-noise-ratio conditions. Further, an architecture is proposed to implement the simultaneous diagonalization scheme. The architecture is implemented on an FPGA, primarily to compare the performance measures on hardware and to assess the feasibility of the speech enhancement algorithm in terms of resource utilization, throughput, etc. A Xilinx FPGA is targeted for implementation. The FPGA resource utilization reinforces the practicability of the design. A projection of the design feasibility for an ASIC implementation, in terms of transistor count only, is also included.
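
To illustrate what simultaneous diagonalization of a speech and a noise covariance matrix means, here is a small sketch using a textbook route: whiten with the Cholesky factor of the noise covariance, then eigen-decompose the whitened speech covariance with Jacobi rotations. This is an assumption chosen for clarity, not the thesis's algorithm or hardware architecture, and all function names are hypothetical. The result is a matrix `V` with `V^T R_noise V = I` and `V^T R_speech V = diag(lam)`.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def lower_inverse(L):
    """Invert a lower-triangular matrix by forward substitution."""
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for i in range(c, n):
            rhs = 1.0 if i == c else 0.0
            s = sum(L[i][k] * X[k][c] for k in range(c, i))
            X[i][c] = (rhs - s) / L[i][i]
    return X


def jacobi_eigh(A, sweeps=50, tol=1e-24):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi)."""
    n = len(A)
    A = [row[:] for row in A]
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes A[p][q], kept within +/- pi/4.
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, U):  # column update: M <- M J
                    for k in range(n):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # row update: A <- J^T A
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(n)], U


def simultaneous_diagonalize(R_speech, R_noise):
    """Return (V, lam): V^T R_noise V = I and V^T R_speech V = diag(lam)."""
    Li = lower_inverse(cholesky(R_noise))
    M = matmul(matmul(Li, R_speech), transpose(Li))
    n = len(M)
    M = [[0.5 * (M[i][j] + M[j][i]) for j in range(n)] for i in range(n)]
    lam, U = jacobi_eigh(M)
    return matmul(transpose(Li), U), lam
```

In a subspace-based enhancer, the values in `lam` act as per-component speech-to-noise ratios, which is why working on smaller matrices per subband, as the abstract describes, reduces the cost of this step.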

    Objective and Subjective Evaluation of Dereverberation Algorithms

    Reverberation significantly impacts the quality and intelligibility of speech. Several dereverberation algorithms have been proposed in the literature to combat this problem. A majority of these algorithms utilize a single channel and are developed for monaural applications, and as such do not preserve the cues necessary for sound localization. This thesis describes a blind two-channel dereverberation technique that improves the quality of speech corrupted by reverberation while preserving cues that affect localization. The method is based on combining short-term (2 ms) and long-term (20 ms) weighting functions of the linear prediction (LP) residual of the input signal. The developed algorithm and other dereverberation algorithms are evaluated objectively and subjectively in terms of sound quality and localization accuracy. The binaural adaptation provides a significant increase in sound quality while removing the loss in localization ability found in the bilateral implementation.

    Entropy Theory for Streamflow Forecasting

    Entropy spectral analysis, using configurational entropy and relative entropy, is developed for monthly streamflow forecasting. Multi-channel entropy spectral analysis is developed for long-term drought forecasting with climate indicators. Configurational entropy spectral analysis (CESA) is developed with both spectral power and frequency as random variables. With spectral power as a random variable, the configurational entropy spectral analysis (CESAS) is identical to the original Burg entropy spectral analysis (BESA) when the underlying process is Gaussian. In tests using monthly streamflow from the Mississippi Watershed, CESAS and BESA yield the same results, so the two methods are considered equivalent. With frequency as a random variable, the configurational entropy spectral analysis (CESAF) is developed and tested using monthly streamflow data from 19 river basins covering a broad range of physiographic characteristics. Testing shows that CESAF captures streamflow seasonality and satisfactorily forecasts both high and low flows. When relative drainage area is considered in analyzing streamflow characteristics and spectral patterns, upstream streamflow is found to be forecasted more accurately than downstream streamflow. Minimum relative entropy spectral analysis (MRESA) is developed under two conditions: spectral power as a random variable (RESAS) and frequency as a random variable (RESAF). The exponential distribution is chosen as the prior probability in the RESAS theory, and in the RESAF theory the prior is chosen from the periodicity of streamflow. Both MRESA theories were evaluated using monthly streamflow observed at 20 stations in the Mississippi River basin, where forecasted monthly streamflow shows higher reliability in the Upper Mississippi than in the Lower Mississippi.
The proposed univariate entropy spectral analyses are generally recommended over the classical autoregressive (AR) process for their higher reliability and longer forecasting lead time. Comparing the two MRESA theories with the two maximum entropy spectral analyses (MESA), namely BESA and CESA, shows that MRESA provides higher resolution in spectral estimation and more reliable streamflow forecasting, especially for multi-peak flow conditions. The MRESA theory is more accurate than MESA in forecasting streamflow for both peak and low flow values, and at longer lead times. In addition, choosing frequency as a random variable shows advantages over choosing spectral power. Spectral density estimated by the RESAF or CESAF theory shows higher resolution than that estimated by the RESAS or BESA theory, respectively, and streamflow forecasted by RESAF or CESAF is more reliable than that forecasted by RESAS or BESA, respectively. Finally, multi-channel entropy spectral analysis (MCESA) is developed for bivariate or multivariate time series forecasting. The MCESA theory is verified by forecasting the long-term standardized streamflow index (SSI) with the El Niño Southern Oscillation (ENSO) indicator. The SSI was successfully forecasted using multi-channel spectral analysis with ENSO as an indicator. The monthly drought series is forecasted for lead times of 4-6 years by MCESA.
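
For orientation, the Burg approach that the abstract uses as its baseline (BESA) amounts to fitting an autoregressive model with Burg's recursion and extrapolating it. The sketch below shows only that classical baseline under simple assumptions (a mean-removed series held in a plain list); it is not the CESA, MRESA, or MCESA methods developed in the thesis, and the function names are hypothetical.

```python
import math


def burg_ar(x, order):
    """Fit AR coefficients with Burg's recursion.

    Returns (a, e) where a[0] == 1 and the one-step prediction is
    x_hat[t] = -sum(a[k] * x[t - k] for k in 1..order); e is the
    final prediction-error power.
    """
    n = len(x)
    f = list(x)  # forward prediction errors
    b = list(x)  # backward prediction errors
    a = [1.0]
    e = sum(v * v for v in x) / n
    for m in range(order):
        # Reflection coefficient minimising forward + backward error power.
        num = -2.0 * sum(f[t] * b[t - 1] for t in range(m + 1, n))
        den = sum(f[t] ** 2 + b[t - 1] ** 2 for t in range(m + 1, n))
        k = num / den
        # Levinson-style coefficient update.
        a_ext = a + [0.0]
        a = [a_ext[i] + k * a_ext[m + 1 - i] for i in range(m + 2)]
        # Update error sequences in place, newest sample first, so that
        # b[t - 1] still holds the previous-order value when it is read.
        for t in range(n - 1, m, -1):
            ft = f[t]
            f[t] = ft + k * b[t - 1]
            b[t] = b[t - 1] + k * ft
        e *= 1.0 - k * k
    return a, e


def forecast(x, a, steps):
    """Extrapolate the series by iterating the fitted AR recursion."""
    hist = list(x)
    p = len(a) - 1
    out = []
    for _ in range(steps):
        nxt = -sum(a[k] * hist[-k] for k in range(1, p + 1))
        hist.append(nxt)
        out.append(nxt)
    return out


# Synthetic "monthly" series with a 12-step seasonal cycle (illustrative data).
series = [math.cos(2 * math.pi * t / 12) for t in range(120)]
coeffs, err_power = burg_ar(series, order=2)
next_year = forecast(series, coeffs, steps=12)
```

Real streamflow would need its mean (and usually its seasonal cycle) handled before fitting, and the model order would have to be selected from the data; both are left out of this sketch.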