BaNa: a noise resilient fundamental frequency detection algorithm for speech and music
Fundamental frequency (F0) is one of the essential features in many acoustic related applications. Although numerous F0 detection algorithms have been developed, the detection accuracy in noisy environments still needs improvement. We present a hybrid noise resilient F0 detection algorithm named BaNa that combines the approaches of harmonic ratios and Cepstrum analysis. A Viterbi algorithm with a cost function is used to identify the F0 value among several F0 candidates. Speech and music databases with eight different types of additive noise are used to evaluate the performance of the BaNa algorithm and several classic and state-of-the-art F0 detection algorithms. Results show that for almost all types of noise and signal-to-noise ratio (SNR) values investigated, BaNa achieves the lowest Gross Pitch Error (GPE) rate among all the algorithms. Moreover, for the 0 dB SNR scenarios, the BaNa algorithm is shown to achieve 20% to 35% GPE rate for speech and 12% to 39% GPE rate for music. We also describe implementation issues that must be addressed to run the BaNa algorithm as a real-time application on a smartphone platform. Peer Reviewed. Postprint (author's final draft)
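The Gross Pitch Error rate quoted above can be illustrated with a minimal sketch. The function name, the convention that unvoiced reference frames are marked with 0, and the 10% tolerance are assumptions made for illustration; the abstract does not state the threshold used in the evaluation.

```python
def gross_pitch_error_rate(ref_f0, est_f0, tolerance=0.10):
    """Fraction of voiced frames whose F0 estimate is grossly wrong.

    ref_f0 / est_f0: per-frame F0 values in Hz; a reference value of 0
    marks an unvoiced frame (assumed convention) and is skipped.
    tolerance: relative deviation counted as a gross error (assumed 10%).
    """
    if len(ref_f0) != len(est_f0):
        raise ValueError("reference and estimate must have the same length")
    voiced = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0]
    if not voiced:
        return 0.0
    gross_errors = sum(1 for r, e in voiced if abs(e - r) > tolerance * r)
    return gross_errors / len(voiced)


# Hypothetical frames: the third one is unvoiced in the reference.
reference = [100.0, 200.0, 0.0, 150.0]
estimate = [105.0, 260.0, 120.0, 150.0]
rate = gross_pitch_error_rate(reference, estimate)
```

In this hypothetical input only the second voiced frame deviates by more than the tolerance, so one of three voiced frames counts as a gross error.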
Systems and methods for physiological signal enhancement and biometric extraction using non-invasive optical sensors
A system and method for signal processing to remove unwanted noise components including: (i) wavelength-independent motion artifacts such as tissue, bone and skin effects, and (ii) wavelength-dependent motion artifact/noise components such as venous blood pulsation and movement due to various sources including muscle pump, respiratory pump and physical perturbation. Disclosed are methods, analytics, and their uses for reliable perfusion monitoring, arterial oxygen saturation monitoring, heart rate monitoring during daily activities and in hospital settings and for extraction of physiological parameters such as respiration information, hemodynamic parameters, venous capacity, and fluid responsiveness. The system and methods disclosed are extendable to include monitoring platforms for perfusion, hypoxia, arrhythmia detection, airway obstruction detection and sleep disorders including apnea. Board of Regents, University of Texas Syste
DESIGN AND EVALUATION OF HARMONIC SPEECH ENHANCEMENT AND BANDWIDTH EXTENSION
Improving the quality and intelligibility of speech signals continues to be an important topic in mobile communications and hearing aid applications. This thesis explored the possibilities of improving the quality of corrupted speech by cascading a log Minimum Mean Square Error (logMMSE) noise reduction system with a Harmonic Speech Enhancement (HSE) system. In HSE, an adaptive comb filter is deployed to harmonically filter the useful speech signal and suppress the noisy components to the noise floor. A Bandwidth Extension (BWE) algorithm was applied to the enhanced speech for further improvements in speech quality. The performance of this algorithm combination was evaluated using objective speech quality metrics across a variety of noisy and reverberant environments. Results showed that the logMMSE and HSE combination enhanced the speech quality in any reverberant environment and in the presence of multi-talker babble. The objective improvements associated with the BWE were found to be minima
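The idea of harmonic filtering with a comb filter can be sketched as follows. This is a fixed-period feedforward comb filter, not the adaptive filter used in the thesis; the function name and the test signals are hypothetical. With a delay of one pitch period, components at multiples of F0 pass unchanged, while components halfway between harmonics cancel.

```python
import math


def comb_filter(samples, period):
    """Feedforward comb filter: y[n] = 0.5 * (x[n] + x[n - period]).

    period: pitch period in samples (fixed here; an adaptive filter
    would update it as the estimated F0 changes).
    """
    if period <= 0:
        raise ValueError("period must be a positive number of samples")
    output = []
    for n, x in enumerate(samples):
        # Samples before the start of the signal are treated as zero.
        delayed = samples[n - period] if n >= period else 0.0
        output.append(0.5 * (x + delayed))
    return output


period = 40
# A component at F0 (period of 40 samples) is preserved.
harmonic = [math.sin(2 * math.pi * n / period) for n in range(400)]
# A component at F0 / 2 (period of 80 samples) lies between harmonics and cancels.
off_harmonic = [math.sin(math.pi * n / period) for n in range(400)]

kept = comb_filter(harmonic, period)
removed = comb_filter(off_harmonic, period)
```

After the first `period` samples, `kept` matches the input and `removed` is numerically zero, which is the behaviour that lets a comb filter suppress energy between the harmonics of voiced speech.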
Exploiting correlogram structure for robust speech recognition with multiple speech sources
This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture with the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located on the delay that corresponds to multiple pitch periods. These pitch-related structures are exploited in the study to group spectral components at each time frame. Local pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together with the spectral representation, are employed by a 'speech fragment decoder' which employs 'missing data' techniques with clean speech models to simultaneously search for the acoustic evidence that best matches model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared to a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments over different conditions, which results in significantly better recognition accuracy
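A correlogram is built from short-time autocorrelations computed in each frequency channel, with peaks at delays equal to multiples of the pitch period. The sketch below covers only a single channel and picks the strongest autocorrelation peak as a local pitch estimate; the function name and the 60 to 400 Hz search range are assumptions, and the grouping and tracking stages of the paper are not reproduced.

```python
import math


def autocorrelation_pitch(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) from the lag of the largest autocorrelation value.

    The lag search range is derived from an assumed plausible F0 range.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    if min_lag < 1 or max_lag < min_lag:
        raise ValueError("frame too short for the requested F0 range")
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this delay.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Hypothetical test frame: a pure tone with a 40-sample period at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * n / 40) for n in range(400)]
estimated_f0 = autocorrelation_pitch(frame, sample_rate)
```

For this tone the largest peak falls at a delay of 40 samples, giving an estimate of 200 Hz; peaks also appear at 80 and 120 samples, which correspond to the stem at multiples of the pitch period described in the abstract.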
Model-Based Speech Enhancement
A method of speech enhancement is developed that reconstructs clean speech from a set of acoustic features using a harmonic plus noise model of speech. This is a significant departure from traditional filtering-based methods of speech enhancement. A major challenge with this approach is to estimate accurately the acoustic features (voicing, fundamental frequency, spectral envelope and phase) from noisy speech. This is achieved using maximum a posteriori (MAP) estimation methods that operate on the noisy speech. In each case a prior model of the relationship between the noisy speech features and the estimated acoustic feature is required. These models are approximated using speaker-independent GMMs of the clean speech features that are adapted to speaker-dependent models using MAP adaptation and for noise using the Unscented Transform. Objective results are presented to optimise the proposed system and a set of subjective tests compare the approach with traditional enhancement methods. Three-way listening tests examining signal quality, background noise intrusiveness and overall quality show the proposed system to be highly robust to noise, performing significantly better than conventional methods of enhancement in terms of background noise intrusiveness. However, the proposed method is shown to reduce signal quality, with overall quality measured to be roughly equivalent to that of the Wiener filter
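A harmonic plus noise model represents a voiced frame as a sum of sinusoids at multiples of the fundamental frequency plus a noise component. The sketch below only illustrates that synthesis step under simplifying assumptions: zero harmonic phases, white Gaussian noise with a fixed gain, and hypothetical function and parameter names. It does not reproduce the MAP feature estimation described in the abstract.

```python
import math
import random


def synthesize_hnm_frame(f0, harmonic_amps, sample_rate, num_samples,
                         noise_gain=0.0, seed=0):
    """Synthesise one frame as harmonics of f0 plus scaled white noise.

    harmonic_amps[k] is the amplitude of harmonic k + 1 (a crude stand-in
    for sampling a spectral envelope); all phases are assumed zero.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        harmonic_part = sum(
            amp * math.cos(2 * math.pi * (k + 1) * f0 * t)
            for k, amp in enumerate(harmonic_amps)
        )
        noise_part = noise_gain * rng.gauss(0.0, 1.0)
        frame.append(harmonic_part + noise_part)
    return frame


# Hypothetical voiced frame: 100 Hz fundamental with two harmonics at 8 kHz.
clean = synthesize_hnm_frame(100.0, [1.0, 0.5], 8000, 80)
breathy = synthesize_hnm_frame(100.0, [1.0, 0.5], 8000, 80, noise_gain=0.1)
```

Reconstructing speech from features in this way means the output contains only what the model generates, which is consistent with the low background-noise intrusiveness reported above.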
Time-Resolved Method for Spectral Analysis based on Linear Predictive Coding, with Application to EEG Analysis
The EEG (electroencephalogram) signal is a biological signal used in BCI (brain-computer interface) systems to exchange information between the brain and the external environment. It is characterised by a poor signal-to-noise ratio, is time-varying and intermittent, and contains multiple frequency components. This research work has developed a new parameterised time-frequency method called the Linear Predictive Coding Pole Processing (LPCPP) method, which can be used for identifying and tracking the dominant frequency components of an EEG signal. The LPCPP method further processes LPC (Linear Predictive Coding) poles to produce a series of reduced-order filter transfer functions to estimate the dominant frequencies. It is suited for processing high-noise multi-component signals and, unlike transform-based methods, can directly give the corresponding frequency estimates. Furthermore, a new EEG spectral analysis framework involving the LPCPP method is proposed to describe the EEG spectral activity. The EEG signal has traditionally been divided into different frequency bands (i.e. Delta, Theta, Alpha, Beta and Gamma). However, there is no consensus on the definitions of these band boundaries. A series of EEG centre frequencies are proposed in this thesis instead of fixed frequency boundaries, as they are better suited to describe the dominant EEG spectral activity
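The link between LPC poles and frequency estimates can be shown with a small sketch. It is not the LPCPP method: it only converts the poles of a second-order predictor x[n] = a1 x[n-1] + a2 x[n-2] into a frequency through the pole angle. The function names, the 256 Hz sampling rate and the 10 Hz component are assumptions for illustration.

```python
import cmath
import math


def ar2_poles(a1, a2):
    """Poles of x[n] = a1 * x[n-1] + a2 * x[n-2], i.e. roots of z^2 - a1*z - a2."""
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    return ((a1 + disc) / 2, (a1 - disc) / 2)


def pole_to_frequency(pole, sample_rate):
    """Frequency in Hz implied by the angle of a pole in the z-plane."""
    return abs(cmath.phase(pole)) * sample_rate / (2 * math.pi)


# A noiseless sinusoid at angular frequency w satisfies
# x[n] = 2*cos(w)*x[n-1] - x[n-2], so its predictor coefficients are known exactly.
sample_rate = 256.0   # assumed sampling rate in Hz
target_hz = 10.0      # assumed alpha-band component
w = 2 * math.pi * target_hz / sample_rate
poles = ar2_poles(2 * math.cos(w), -1.0)
estimates = [pole_to_frequency(p, sample_rate) for p in poles]
```

Both poles form a conjugate pair on the unit circle and map back to 10 Hz. In practice the coefficients would be estimated from noisy data and the poles would lie inside the unit circle, with the pole radius indicating how sharply tuned each component is.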