Search CORE

1,382 research outputs found

BaNa: a noise resilient fundamental frequency detection algorithm for speech and music

Author: Ba He
Cai Weiyang
Heinzelman Wendi
Seyfettin Demirkol Ilker
Yang Na
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Fundamental frequency (F0) is one of the essential features in many acoustic related applications. Although numerous F0 detection algorithms have been developed, the detection accuracy in noisy environments still needs improvement. We present a hybrid noise resilient F0 detection algorithm named BaNa that combines the approaches of harmonic ratios and Cepstrum analysis. A Viterbi algorithm with a cost function is used to identify the F0 value among several F0 candidates. Speech and music databases with eight different types of additive noise are used to evaluate the performance of the BaNa algorithm and several classic and state-of-the-art F0 detection algorithms. Results show that for almost all types of noise and signal-to-noise ratio (SNR) values investigated, BaNa achieves the lowest Gross Pitch Error (GPE) rate among all the algorithms. Moreover, for the 0 dB SNR scenarios, the BaNa algorithm is shown to achieve 20% to 35% GPE rate for speech and 12% to 39% GPE rate for music. We also describe implementation issues that must be addressed to run the BaNa algorithm as a real-time application on a smartphone platform.Peer ReviewedPostprint (author's final draft

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Recommended from our members

Systems and methods for physiological signal enhancement and biometric extraction using non-invasive optical sensors

Author: Nourani Mehrdad
Yousefi Rasoul
Publication venue: United States Patent and Trademark Office
Publication date: 20/03/2018
Field of study

A system and method for signal processing to remove unwanted noise components including: (i) wavelength-independent motion artifacts such as tissue, bone and skin effects, and (ii) wavelength-dependent motion artifact/noise components such as venous blood pulsation and movement due to various sources including muscle pump, respiratory pump and physical perturbation. Disclosed are methods, analytics, and their uses for reliable perfusion monitoring, arterial oxygen saturation monitoring, heart rate monitoring during daily activities and in hospital settings and for extraction of physiological parameters such as respiration information, hemodynamic parameters, venous capacity, and fluid responsiveness. The system and methods disclosed are extendable to include monitoring platforms for perfusion, hypoxia, arrhythmia detection, airway obstruction detection and sleep disorders including apnea.Board of Regents, University of Texas Syste

Texas ScholarWorks

DESIGN AND EVALUATION OF HARMONIC SPEECH ENHANCEMENT AND BANDWIDTH EXTENSION

Author: Venkatasubramanian Arvind
Publication venue: Scholarship@Western
Publication date: 01/01/2011
Field of study

Improving the quality and intelligibility of speech signals continues to be an important topic in mobile communications and hearing aid applications. This thesis explored the possibilities of improving the quality of corrupted speech by cascading a log Minimum Mean Square Error (logMMSE) noise reduction system with a Harmonic Speech Enhancement (HSE) system. In HSE, an adaptive comb filter is deployed to harmonically filter the useful speech signal and suppress the noisy components to noise floor. A Bandwidth Extension (BWE) algorithm was applied to the enhanced speech for further improvements in speech quality. Performance of this algorithm combination was evaluated using objective speech quality metrics across a variety of noisy and reverberant environments. Results showed that the logMMSE and HSE combination enhanced the speech quality in any reverberant environment and in the presence of multi-talker babble. The objective improvements associated with the BWE were found to be minima

Scholarship@Western

Exploiting correlogram structure for robust speech recognition with multiple speech sources

Author: Barker J.
Coy A.
Green P.
Ma N.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007
Field of study

This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture with the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located on the delay that corresponds to multiple pitch periods. These pitch-related structures are exploited in the study to group spectral components at each time frame. Local pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together with the spectral representation, are employed by a `speech fragment decoder' which employs `missing data' techniques with clean speech models to simultaneously search for the acoustic evidence that best matches model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared to a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments over different conditions, which results in significantly better recognition accuracy

CiteSeerX

White Rose Research Online

Model-Based Speech Enhancement

Author: Harding Philip
Publication venue
Publication date: 01/07/2013
Field of study

Abstract A method of speech enhancement is developed that reconstructs clean speech from a set of acoustic features using a harmonic plus noise model of speech. This is a significant departure from traditional filtering-based methods of speech enhancement. A major challenge with this approach is to estimate accurately the acoustic features (voicing, fundamental frequency, spectral envelope and phase) from noisy speech. This is achieved using maximum a-posteriori (MAP) estimation methods that operate on the noisy speech. In each case a prior model of the relationship between the noisy speech features and the estimated acoustic feature is required. These models are approximated using speaker-independent GMMs of the clean speech features that are adapted to speaker-dependent models using MAP adaptation and for noise using the Unscented Transform. Objective results are presented to optimise the proposed system and a set of subjective tests compare the approach with traditional enhancement methods. Threeway listening tests examining signal quality, background noise intrusiveness and overall quality show the proposed system to be highly robust to noise, performing significantly better than conventional methods of enhancement in terms of background noise intrusiveness. However, the proposed method is shown to reduce signal quality, with overall quality measured to be roughly equivalent to that of the Wiener filter

University of East Anglia digital repository

Techniques for the enhancement of linear predictive speech coding in adverse conditions

Author: Wrench Alan A.
Publication venue: The University of Edinburgh
Publication date: 01/01/1989
Field of study

Edinburgh Research Archive

Time-Resolved Method for Spectral Analysis based on Linear Predictive Coding, with Application to EEG Analysis

Author: Xu Jin
Publication venue: Technological University Dublin
Publication date: 01/01/2022
Field of study

EEG (Electroencephalogram) signal is a biological signal in BCI (Brain-Computer Interface) systems to realise the information exchange between the brain and the external environment. It is characterised by a poor signal-to-noise ratio, is time-varying, is intermittent and contains multiple frequency components. This research work has developed a new parameterised time-frequency method called the Linear Predictive Coding Pole Processing (LPCPP) method which can be used for identifying and tracking the dominant frequency components of an EEG signal. The LPCPP method further processes LPC (Linear Predictive Coding) poles to produce a series of reduced-order filter transfer functions to estimate the dominant frequencies. It is suited for processing high-noise multi-component signals and can directly give the corresponding frequency estimates unlike transform-based methods. Furthermore, a new EEG spectral analysis framework involving the LPCPP method is proposed to describe the EEG spectral activity. The EEG signal has been divided into different frequency bands (i.e. Delta, Theta, Alpha, Beta and Gamma). However, there is no consensus on the definitions of these band boundaries. A series of EEG centre frequencies are proposed in this thesis instead of fixed frequency boundaries, as they are better suited to describe the dominant EEG spectral activity

Arrow@TUDublin