
    Synthetic speech detection and audio steganography in VoIP scenarios

    Distinguishing synthetic from human voices builds on techniques from current biometric voice recognition systems, which prevent a person’s voice, whether used with good or bad intentions, from being confused with someone else’s. Steganography makes it possible to hide a message inside an otherwise unremarkable carrier file (usually an audio, video, or image file) in such a way that it arouses no suspicion in an external observer. This article proposes two methods, applicable in a hypothetical VoIP scenario: one distinguishes synthetic speech from a human voice, and the other embeds a text message in the Comfort Noise generated during the pauses of a voice conversation. The first method builds on prior studies of Modulation Features from the temporal analysis of speech signals, while the second derives from Direct Sequence Spread Spectrum (DSSS), which spreads the energy of the signal to be hidden over a wider transmission band. Due to space limits, this paper is only an extended abstract. The full version will contain further details on our research.
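    The abstract leaves the embedding procedure unspecified; below is a minimal Python sketch of the DSSS idea it invokes, in which each message bit is spread over a pseudo-random ±1 chip sequence and added at low amplitude to a noise-like carrier (standing in here for a Comfort Noise segment). The chip length, embedding strength `alpha`, and seeds are illustrative assumptions, not values from the paper.

```python
import numpy as np

def dsss_embed(carrier, bits, chip_len=128, alpha=0.005, seed=42):
    """Spread each message bit over `chip_len` samples of a pseudo-random
    +/-1 chip sequence and add it, at low amplitude, to the carrier."""
    rng = np.random.default_rng(seed)
    chips = rng.choice([-1.0, 1.0], size=(len(bits), chip_len))
    stego = carrier.copy()
    for i, bit in enumerate(bits):
        symbol = 1.0 if bit else -1.0
        start = i * chip_len
        stego[start:start + chip_len] += alpha * symbol * chips[i]
    return stego

def dsss_extract(stego, n_bits, chip_len=128, seed=42):
    """Recover bits by correlating each segment with the same chip sequence."""
    rng = np.random.default_rng(seed)
    chips = rng.choice([-1.0, 1.0], size=(n_bits, chip_len))
    return [int(np.dot(stego[i * chip_len:(i + 1) * chip_len], chips[i]) > 0)
            for i in range(n_bits)]

# Toy usage: hide 8 bits in low-level synthetic noise.
noise = 0.01 * np.random.default_rng(0).standard_normal(8 * 128)
message = [1, 0, 1, 1, 0, 0, 1, 0]
assert dsss_extract(dsss_embed(noise, message), len(message)) == message
```

    Because the chips are pseudo-random and low-power, the payload energy is spread thinly across the band, which is what makes it hard for an external observer to notice.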

    Morphologically filtered power-normalized cochleograms as robust, biologically inspired features for ASR

    In this paper, we present advances in the modeling of the masking behavior of the human auditory system (HAS) to enhance the robustness of the feature extraction stage in automatic speech recognition (ASR). The solution adopted is based on a nonlinear filtering of a spectro-temporal representation, applied simultaneously to the frequency and time domains (as if it were an image) using mathematical morphology operations. A particularly important component of this architecture is the so-called structuring element (SE), which in the present contribution is designed as a single three-dimensional pattern grounded in physiological facts, in such a way that it closely resembles the masking phenomena taking place in the cochlea. A proper choice of spectro-temporal representation lends validity to the model throughout the whole frequency spectrum and intensity span, accounting for the variability of the masking properties of the HAS in these two domains. The best results were achieved with the representation introduced as part of the power-normalized cepstral coefficients (PNCC), together with a spectral subtraction step. The method has been tested on the Aurora 2, Wall Street Journal, and ISOLET databases, using both classical hidden Markov model (HMM) and hybrid artificial neural network (ANN)-HMM back-ends. In these setups, the proposed front-end analysis provides substantial and significant improvements over baseline techniques: up to 39.5% relative improvement compared to MFCC, and 18.7% compared to PNCC, on the Aurora 2 database. This contribution has been supported by an Airbus Defense and Space Grant (Open Innovation - SAVIER) and Spanish Government-CICYT projects TEC2014-53390-P and TEC2014-61729-EX.
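    The paper's exact SE design is not reproduced in the abstract; the following sketch, using SciPy's grey-scale morphology, shows only the general mechanism of filtering a log-power cochleogram with a non-flat (three-dimensional) structuring element, treating the time-frequency plane as an image. The Gaussian bump shape and all sizes below are placeholder assumptions, not the paper's physiologically derived pattern.

```python
import numpy as np
from scipy.ndimage import grey_erosion, grey_dilation

# Hypothetical non-flat structuring element: a small 2D Gaussian "bump"
# whose height loosely mimics spectro-temporal masking spread
# (5 frequency channels x 7 time frames; shapes and scales are assumptions).
f = np.linspace(-1, 1, 5)[:, None]
t = np.linspace(-1, 1, 7)[None, :]
se = 6.0 * np.exp(-(f**2 / 0.2 + t**2 / 0.5))  # dB-like heights

def morphological_filter(spectrogram_db):
    """Grey-scale opening (erosion followed by dilation) of a log-power
    spectro-temporal representation with a non-flat SE."""
    eroded = grey_erosion(spectrogram_db, structure=se)
    return grey_dilation(eroded, structure=se)

# Toy usage on a random "cochleogram" (40 channels x 200 frames).
rng = np.random.default_rng(1)
cochleogram = 20.0 * np.log10(np.abs(rng.standard_normal((40, 200))) + 1e-6)
filtered = morphological_filter(cochleogram)
```

    The opening suppresses narrow low-energy structure that falls under the SE, which is how a masking-shaped SE can emulate weaker components being hidden by stronger neighbors in time and frequency.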

    Temporal Filterbanks in Cochlear Implant Hearing and Deep Learning Simulations

    The masking phenomenon has been used to investigate cochlear excitation patterns and has even motivated audio coding formats for compression and speech processing. For example, cochlear implants rely on masking estimates to filter incoming sound signals onto an electrode array. Historically, the critical band theory has been the mainstay of psychoacoustics. However, masked threshold shifts in cochlear implant users diverge from the observed critical bandwidths, suggesting separate roles for place location and temporal firing patterns. In this chapter, we compare discrimination tasks in the spectral domain (e.g., power spectrum models) and the temporal domain (e.g., temporal envelope) to introduce new concepts such as profile analysis, temporal critical bands, and transition bandwidths. These recent findings violate the fundamental assumptions of the critical band theory and could explain why the masking curves of cochlear implant users display spatial and temporal characteristics quite unlike those of acoustic stimulation. To provide further insight, we also describe a novel analytic tool based on deep neural networks. This deep learning system can simulate many aspects of the auditory system, and will be used to compute the efficiency of spectral filterbanks (referred to as “FBANK”) and temporal filterbanks (referred to as “TBANK”)
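    The chapter's FBANK and TBANK definitions are not given in the abstract; as a loose illustration of the spectral-versus-temporal contrast it draws, the sketch below computes band-summed spectrogram power (a crude spectral-filterbank stand-in) alongside Hilbert envelopes of band-passed signals (a temporal-envelope stand-in). Band edges, filter orders, and frame settings are all assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def spectral_fbank(signal, n_fft=512, hop=160, n_bands=8):
    """Spectral-domain features: band-summed power from a windowed
    magnitude spectrogram."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    edges = np.linspace(0, spec.shape[1], n_bands + 1, dtype=int)
    return np.stack([spec[:, a:b].sum(axis=1)
                     for a, b in zip(edges[:-1], edges[1:])], axis=1)

def temporal_tbank(signal, sr, bands=((100, 400), (400, 1200), (1200, 3400))):
    """Temporal-domain features: Hilbert envelopes of band-passed signals,
    preserving the temporal fine structure a spectrogram frame discards."""
    envs = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        envs.append(np.abs(hilbert(sosfiltfilt(sos, signal))))
    return np.stack(envs, axis=0)

# Toy usage: 0.5 s of noise at 16 kHz.
sr = 16000
x = np.random.default_rng(2).standard_normal(sr // 2)
F = spectral_fbank(x)        # (frames, bands): place-style representation
T = temporal_tbank(x, sr)    # (bands, samples): envelope-style representation
```

    The contrast is the point: the spectral route collapses each frame to per-band energies (a place code), while the temporal route keeps sample-rate envelopes per band (a timing code), mirroring the chapter's separation of place location from temporal firing patterns.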