Exploiting correlogram structure for robust speech recognition with multiple speech sources
This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture with the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as
tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located at delays corresponding to multiples of the pitch period. These pitch-related structures are exploited in the study to group spectral components at each time frame. Local
pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together
with the spectral representation, are passed to a 'speech fragment decoder' which employs 'missing data' techniques with clean speech models to simultaneously search for the acoustic evidence that best matches the model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared to a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments across different conditions, which results in significantly better recognition accuracy.
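As a rough illustration of the correlogram front end described above (not the authors' implementation), the following Python sketch computes a per-channel short-time autocorrelation from a hypothetical filterbank output and groups channels whose dominant lag agrees, which is the basic idea behind the pitch-related grouping; the filterbank framing, minimum lag and grouping tolerance are assumptions.

    import numpy as np

    def correlogram(frame_channels, max_lag):
        """Autocorrelation of each filterbank channel for one time frame.
        frame_channels: array (n_channels, frame_len) of bandpass signals."""
        n_ch, n = frame_channels.shape
        ac = np.zeros((n_ch, max_lag))
        for c in range(n_ch):
            x = frame_channels[c] - frame_channels[c].mean()
            r = np.correlate(x, x, mode="full")[n - 1:n - 1 + max_lag]
            ac[c] = r / (r[0] + 1e-12)   # normalise so lag 0 == 1
        return ac

    def group_by_common_lag(ac, min_lag=25, tol=2):
        """Group channels whose strongest lag (a pitch-period candidate)
        agrees within `tol` samples; returns {lag: [channel indices]}."""
        groups = {}
        for c, row in enumerate(ac):
            lag = min_lag + int(np.argmax(row[min_lag:]))
            key = next((k for k in groups if abs(k - lag) <= tol), lag)
            groups.setdefault(key, []).append(c)
        return groups

Each resulting group of channels can then be assigned a local pitch estimate, in the spirit of the temporal integration step the abstract describes.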
Dublin City University video track experiments for TREC 2002
Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video
Track. In the Feature Extraction task, we submitted 3 features: Face, Speech, and Music. In the Search task, we
developed an interactive video retrieval system, which incorporated the 40 hours of the video search test collection and supported user searching using our own feature extraction data along with the donated feature data and ASR transcript from other Video Track groups. This video retrieval system allows a user to specify a query based on the 10 features and ASR transcript, and the query result is a ranked list of videos that can be further browsed at the shot level. To evaluate the usefulness of the feature-based query, we have developed a second system interface that
provides only ASR transcript-based querying, and we conducted an experiment with 12 test users to compare these 2 systems. Results were submitted to NIST and we are currently conducting further analysis of user performance with the 2 systems.
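Purely as a hypothetical illustration of the kind of query combination such a system performs (the abstract does not describe the actual ranking method), a shot score might mix feature-detector confidences with ASR term matches; every name, field and weight below is invented.

    # Hypothetical shot ranking: combine feature confidences with ASR matches.
    def score_shot(shot, query_features, query_terms, alpha=0.5):
        # shot: {"features": {"Face": 0.8, "Speech": 0.6, ...}, "asr": "..."}
        feat = sum(shot["features"].get(f, 0.0) for f in query_features)
        feat /= max(len(query_features), 1)
        words = shot["asr"].lower().split()
        text = sum(words.count(t.lower()) for t in query_terms) / max(len(words), 1)
        return alpha * feat + (1 - alpha) * text

    def rank_shots(shots, query_features, query_terms):
        return sorted(shots,
                      key=lambda s: score_shot(s, query_features, query_terms),
                      reverse=True)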
Wideband frequency domain detection using Teager-Kaiser energy operator
This paper addresses wireless microphone sensing in the TV white space and efficient detection of narrowband FM modulation signals. To this end, a wideband frequency domain analysis is proposed. The required Fast Fourier Transform for this operation may be shared between the sensing analysis and modulation functions. A particular decision metric based on the Teager-Kaiser energy operator is then studied for the analysis of wireless microphone signals. Simulation results show that 6 dB of detection gain could be achieved when using a frequency domain analysis compared to time domain methods. The Teager-Kaiser detection leads to a further improvement of 1.5 dB. This performance could be reached at no extra cost in terms of complexity.
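The discrete Teager-Kaiser energy operator referred to above is Psi[x[n]] = x[n]^2 - x[n-1]*x[n+1]. A minimal sketch of a frequency-domain detector built on it is shown below; the paper's exact decision metric may differ, and the FFT length and threshold here are assumptions.

    import numpy as np

    def teager_kaiser(x):
        """Discrete Teager-Kaiser energy operator: x[n]^2 - x[n-1]*x[n+1]."""
        x = np.asarray(x, dtype=float)
        return x[1:-1] ** 2 - x[:-2] * x[2:]

    def wideband_tk_detector(signal, nfft=4096, threshold=10.0):
        """Toy wideband detector: apply the TK operator across FFT magnitude
        bins and flag bins whose TK energy stands out against the median."""
        spectrum = np.abs(np.fft.rfft(signal, n=nfft))
        tk = teager_kaiser(spectrum)
        metric = tk / (np.median(np.abs(tk)) + 1e-12)
        return np.flatnonzero(metric > threshold)  # candidate narrowband carriers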
Noise Robust Pitch Tracking by Subband Autocorrelation Classification
Speech pitch tracking is one of the elementary tasks of Computational Auditory Scene Analysis (CASA). While a human can easily perceive the voiced pitch in highly noisy recordings, the performance of automatic speech pitch tracking degrades in unknown noisy audio conditions. Traditional pitch trackers use either autocorrelation or the Fourier transform to calculate periodicity, which works well for clean recordings. For noisy recordings, however, the accuracy of these pitch trackers generally degrades. For example, the information in parts of the frequency spectrum may be lost due to analog radio band transmission and/or contain additive noise of various kinds. Instead of explicitly using the most obvious features of the autocorrelation, we propose a trained classifier-based approach, which we call Subband Autocorrelation Classification (SAcC). A multi-layer perceptron (MLP) classifier is trained on the principal components of the autocorrelations of subbands from an auditory filterbank. The output of the MLP classifier is temporally smoothed to produce the pitch track by finding the Viterbi path of a Hidden Markov Model (HMM). Training on various types of noisy speech recordings leads to a large increase in performance over state-of-the-art algorithms, according to both the traditional Gross Pitch Error (GPE) measure and a proposed novel Pitch Tracking Error (PTE), which more fully reflects the accuracy of both pitch estimation/extraction and voicing detection in a single measure. To verify the generalization and specificity of SAcC, we test it on a real-world problem with a large-scale noisy speech corpus: data from the DARPA Robust Automatic Transcription of Speech (RATS) program. The pitch tracking evaluation confirms the generalization power of SAcC across various unknown noise conditions and distinct speech corpora. We also report that the use of SAcC output adds a significant improvement to a Speaker Identification (SID) system for RATS, suggesting the potential contribution of SAcC pitch tracking to higher-level tasks.
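A compressed sketch of the SAcC pipeline shape, not the trained system: per-subband autocorrelations are reduced with PCA and would feed an MLP pitch-class classifier (omitted here), and the per-frame posteriors are smoothed with a Viterbi search. The filterbank framing, feature dimensionalities and the stay-or-move transition prior are assumptions.

    import numpy as np

    def subband_autocorr_features(subband_frames, max_lag, n_pc=10):
        """subband_frames: (n_frames, n_subbands, frame_len) filterbank output.
        Returns (n_frames, n_subbands * n_pc) PCA-reduced autocorrelation
        features, i.e. the input that an MLP pitch classifier would see."""
        n_frames, n_sub, n = subband_frames.shape
        ac = np.zeros((n_frames, n_sub, max_lag))
        for t in range(n_frames):
            for b in range(n_sub):
                x = subband_frames[t, b]
                r = np.correlate(x, x, "full")[n - 1:n - 1 + max_lag]
                ac[t, b] = r / (r[0] + 1e-12)
        feats = []
        for b in range(n_sub):          # per-subband PCA via SVD
            X = ac[:, b, :] - ac[:, b, :].mean(0)
            _, _, vt = np.linalg.svd(X, full_matrices=False)
            feats.append(X @ vt[:n_pc].T)
        return np.hstack(feats)

    def viterbi_smooth(log_post, stay=0.9):
        """Temporal smoothing of per-frame pitch-class log-posteriors
        (e.g. MLP outputs) with a simple stay-or-move transition prior."""
        n_frames, n_states = log_post.shape
        log_trans = np.full((n_states, n_states),
                            np.log((1 - stay) / (n_states - 1)))
        np.fill_diagonal(log_trans, np.log(stay))
        delta = log_post[0].copy()
        back = np.zeros((n_frames, n_states), dtype=int)
        for t in range(1, n_frames):
            scores = delta[:, None] + log_trans
            back[t] = scores.argmax(0)
            delta = scores.max(0) + log_post[t]
        path = [int(delta.argmax())]
        for t in range(n_frames - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]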