

A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition

Author: Bellet Aurelien
Collins Michael
Fan Linxi
Garakani Alireza Bagheri
Guo Dong
Kingsbury Brian
Liu Kuan
Lu Zhiyun
May Avner
Picheny Michael
Sha Fei
Publication date: 18/03/2016

We study large-scale kernel methods for acoustic modeling and compare them with DNNs on performance metrics related to both acoustic modeling and recognition. Measured by perplexity and frame-level classification accuracy, kernel-based acoustic models are as effective as their DNN counterparts. However, on token error rates, DNN models can be significantly better. We find that this difference might be attributed to DNNs' unique strength in reducing both the perplexity and the entropy of the predicted posterior probabilities. Motivated by these findings, we propose a new technique, entropy regularized perplexity, for model selection. This technique can noticeably improve the recognition performance of both types of models and reduces the gap between them. Although demonstrated on Broadcast News, the technique could also be applicable to other tasks. Comment: arXiv admin note: text overlap with arXiv:1411.400

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

Hal-Diderot
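
The abstract above does not spell out the entropy regularized perplexity formula. The sketch below assumes it is the average log-perplexity of the reference labels plus a weight `lam` (a hypothetical parameter) times the average entropy of the predicted posteriors; the function names and toy posteriors are illustrative, not the authors' code. It shows the abstract's point in miniature: two models with equal perplexity are ranked differently once entropy is penalized, and the one with sharper posteriors wins.

```python
import numpy as np

EPS = 1e-12  # guards log(0) for posteriors that contain exact zeros


def log_perplexity(posteriors, labels):
    """Mean negative log-probability assigned to the reference class of each frame."""
    p_ref = posteriors[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(p_ref, EPS, 1.0))))


def mean_entropy(posteriors):
    """Mean Shannon entropy (nats) of the per-frame posterior distributions."""
    logs = np.log(np.clip(posteriors, EPS, 1.0))
    return float(-np.mean(np.sum(posteriors * logs, axis=1)))


def entropy_regularized_score(posteriors, labels, lam):
    # Assumed form of the criterion: log-perplexity + lam * entropy (lower is better).
    return log_perplexity(posteriors, labels) + lam * mean_entropy(posteriors)


def select_model(candidates, labels, lam=1.0):
    """Return the name of the candidate with the lowest assumed criterion."""
    return min(candidates, key=lambda name: entropy_regularized_score(candidates[name], labels, lam))


# Two hypothetical models, 4 classes, reference class 0 on both frames.
labels = np.array([0, 0])
model_a = np.array([[0.5, 1 / 6, 1 / 6, 1 / 6]] * 2)  # remaining mass spread over 3 classes
model_b = np.array([[0.5, 0.5, 0.0, 0.0]] * 2)        # same p(reference), lower entropy
candidates = {"A": model_a, "B": model_b}
best = select_model(candidates, labels, lam=1.0)
```

Setting `lam=0` reduces the assumed criterion to plain log-perplexity, under which A and B tie (both assign probability 0.5 to the reference class).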

Modeling and forecasting ocean acoustic conditions

Author: Duda Timothy F.
Publication venue: 'Journal of Marine Research/Yale'
Publication date: 01/01/2017

Author Posting. © The Author, 2017. This article is posted here by permission of Sears Foundation for Marine Research for personal use, not for redistribution. The definitive version was published in Journal of Marine Research 75 (2017): 435–457, doi:10.1357/002224017821836734. Modeling acoustic conditions in an oceanic environment is a multiple-step process. First, the environmental conditions (features) in the area must be measured or estimated; relevant features include seabed geometry, seabed composition, and four-dimensionally (4D) variable sound-speed and density variations related to evolving or wave motions. Dynamical wave modeling (for example, nonlinear internal wave modeling) often depends on first obtaining correct seabed and mean stratification conditions. Next, this information must be included in sound propagation modeling. A selection of the many methods and tools available for these tasks is described, with a focus on modeling sounds of 20 to 1000 Hz propagating through water-column features that are time-dependent and variable in three dimensions (i.e., 4D variable). An example of a 3D parabolic equation acoustic calculation shows how variability caused by evolving internal tidal waves affects sound propagation. Different propagation and scattering regimes are discussed, including the theoretically delineated weak and strong scattering regimes, as well as the empirically examined regime found in nonlinear internal waves. The history and current state of our oceanographic knowledge (the input to acoustic modeling) and of our ability to effectively model complex acoustic conditions are discussed. Example acoustic simulation applications are also discussed: ocean acoustic tomography, coherence prediction, and signal-to-noise ratio prediction. Types of ocean models and acoustic models, and how they are interfaced, are also examined. These include deterministic, statistical, and analytic feature models. Funding for this work was provided by the U.S. Office of Naval Research, Ocean Acoustics Program, Grants N-00014-11-1-0701 and N00014-14-1-0223.

Yale University

Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition

Author: Asaei Afsaneh
Bourlard Herve
Dighe Pranay
Luyet Gil
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/01/2016

We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse representation of the test posteriors using this dictionary enables projection onto the space of training data. Relying on the fact that the intrinsic dimensions of the posterior subspaces are indeed very small and that the matrix of all posteriors belonging to a class has very low rank, we demonstrate how low-dimensional structures enable further enhancement of the posteriors and rectify spurious errors due to mismatched conditions. The enhanced acoustic modeling method improves performance on a continuous speech recognition task using the hybrid DNN-HMM (hidden Markov model) framework in both clean and noisy conditions, achieving up to 15.4% relative reduction in word error rate (WER).

arXiv.org e-Print Archive
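
The abstract above learns a dictionary from training posteriors and sparse-codes test posteriors against it. As a simplification, the sketch below skips dictionary learning and uses a handful of training posterior vectors directly as dictionary atoms, solving the sparse-coding (lasso) step with ISTA; it also uses one shared dictionary rather than the per-class subspaces the abstract describes. The function names, `lam`, `n_iter`, and the toy posteriors are hypothetical. The enhanced posterior is the reconstruction D·a, clipped to stay positive and renormalized to sum to one.

```python
import numpy as np


def ista_sparse_code(D, x, lam=0.01, n_iter=500):
    """Approximately minimise 0.5*||x - D a||^2 + lam*||a||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))   # gradient step on the squared error
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return a


def enhance_posteriors(train_post, test_post, lam=0.01, n_iter=500):
    """Project each test posterior onto a sparse combination of training posteriors."""
    D = train_post.T                          # each column is one training posterior (atom)
    enhanced = np.empty_like(test_post, dtype=float)
    for i, x in enumerate(test_post):
        a = ista_sparse_code(D, x, lam, n_iter)
        recon = np.clip(D @ a, 1e-12, None)   # keep probabilities strictly positive
        enhanced[i] = recon / recon.sum()     # renormalise to a valid distribution
    return enhanced


# Hypothetical 3-class example: three training posteriors, one test posterior.
train_post = np.array([[0.90, 0.05, 0.05],
                       [0.05, 0.90, 0.05],
                       [0.05, 0.05, 0.90]])
test_post = np.array([[0.70, 0.20, 0.10]])
enhanced = enhance_posteriors(train_post, test_post)
```

In a fuller version, a learned dictionary per class (for example from an off-the-shelf dictionary-learning routine) would replace the raw training posteriors as atoms.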

DNN adaptation by automatic quality estimation of ASR hypotheses

Author: Falavigna Daniele
Jalalvand Shahab
Matassoni Marco
Negri Matteo
Turchi Marco
Publication date: 01/01/2016

In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions, in which manual transcriptions of the evaluation data are available, allowing us to compute the "true" sentence WER. In this scenario, we perform the adaptation with variable amounts of data, characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores "predicted" by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance of the proposed QE-driven adaptation method is significantly better than the strong, most recent CHiME-3 baseline. Comment: Computer Speech & Language December 201

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler
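
The data-selection step in the abstract above can be sketched as follows: sentence WER is word-level edit distance divided by reference length (the oracle quantity when manual transcriptions exist), and the adaptation set is the subset of automatically transcribed utterances whose WER falls under a threshold. In the realistic setting that WER comes from a QE component, which is not modelled here; the `Utterance` record, the `max_wer` threshold, and the predicted-WER values are hypothetical.

```python
from dataclasses import dataclass


def word_edit_distance(reference, hypothesis):
    """Minimum substitutions + deletions + insertions turning reference words into hypothesis words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            cur[j] = min(substitution, prev[j] + 1, cur[j - 1] + 1)  # sub, deletion, insertion
        prev = cur
    return prev[-1]


def sentence_wer(reference, hypothesis):
    """Oracle sentence WER: word edit distance over reference length."""
    return word_edit_distance(reference, hypothesis) / max(len(reference.split()), 1)


@dataclass
class Utterance:
    utt_id: str
    hypothesis: str       # automatic transcription of the evaluation audio
    predicted_wer: float  # output of a (hypothetical) QE model


def select_for_adaptation(utterances, max_wer):
    """Keep only utterances whose predicted WER is at most max_wer."""
    return [u for u in utterances if u.predicted_wer <= max_wer]


pool = [
    Utterance("u1", "the cat sat on the mat", 0.05),
    Utterance("u2", "a hat sad of mad", 0.40),
    Utterance("u3", "the cat sit on mat", 0.10),
]
selected = [u.utt_id for u in select_for_adaptation(pool, max_wer=0.15)]
```

Varying `max_wer` is one simple way to trade the amount of adaptation data against its quality, which is the axis the oracle experiments explore.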

An Improved Variable Structure Adaptive Filter Design and Analysis for Acoustic Echo Cancellation

Author: Chandra M.
Kar A.
Publication venue: 'Brno University of Technology'
Publication date: 01/04/2015

In this research, an advanced variable-structure adaptive algorithm based on Multiple Sub-Filters (MSF) is proposed and analyzed for single-channel Acoustic Echo Cancellation (AEC). This work suggests a new and improved way to find the optimum tap-length of the adaptive filter employed for AEC. The structure adaptation, supported by a tap-length-based weight update approach, helps the designed echo canceller maintain a trade-off between the Mean Square Error (MSE) and the time taken to reach the steady-state MSE. The paper focuses on replacing the fixed-length sub-filters in existing MSF-based AEC algorithms, which brings improvements in convergence, steady-state error, and tracking over the single long filter, different-error, and common-error algorithms. A dynamic structure-selective coefficient update approach that reduces the structural and computational cost of the adaptive design is discussed in the context of the proposed algorithm. Simulation results provide a comparative performance analysis of the proposed variable-structure multiple sub-filter designs and existing fixed tap-length sub-filter-based acoustic echo cancellers.

Directory of Open Access Journals

Digital library of Brno University of Technology
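
For reference, the single long filter that the abstract above lists as a baseline can be sketched as a fixed tap-length normalized LMS (NLMS) echo canceller. This is not the proposed variable-structure MSF algorithm, whose update rules the abstract does not give; the step size `mu`, the 16-tap synthetic echo path, and the noise-free simulation are assumptions made for illustration. With white far-end input and no near-end noise, the filter weights converge to the echo path.

```python
import numpy as np


def nlms_echo_canceller(far_end, mic, num_taps, mu=0.5, eps=1e-8):
    """Fixed-length NLMS filter that estimates and subtracts the echo of far_end in mic."""
    w = np.zeros(num_taps)       # echo-path estimate
    x_buf = np.zeros(num_taps)   # x_buf[k] holds far_end[n - k]
    err = np.empty(len(mic))     # residual (echo-cancelled) signal
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)  # shift the delay line by one sample
        x_buf[0] = far_end[n]
        e = mic[n] - w @ x_buf     # microphone minus estimated echo
        w = w + mu * e * x_buf / (x_buf @ x_buf + eps)  # normalised LMS update
        err[n] = e
    return w, err


# Hypothetical noise-free simulation: white far-end signal through a 16-tap echo path.
rng = np.random.default_rng(0)
echo_path = rng.standard_normal(16) * np.exp(-np.arange(16) / 4.0)
far_end = rng.standard_normal(5000)
mic = np.convolve(far_end, echo_path)[: len(far_end)]
w, err = nlms_echo_canceller(far_end, mic, num_taps=16)
```

Choosing `num_taps` by hand in this sketch exposes the trade-off the abstract targets: a filter shorter than the echo path leaves residual echo in steady state, while a much longer one converges more slowly.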

Improving elevation perception with a tool for image-guided head-related transfer function selection

Author: Avanzini Federico
Geronazzo Michele
Peruch Enrico
Prandoni Fabio
Publication venue: University of Edinburgh
Publication date: 01/01/2017

This paper proposes an image-guided HRTF selection procedure that exploits the relation between features of the pinna shape and HRTF notches. Using a 2D image of a subject's pinna, the procedure selects from a database the HRTF set that best fits the anthropometry of that subject. The procedure is designed to be quick to apply and easy to use for a user without previous knowledge of binaural audio technologies. The entire process is evaluated by means of an auditory model for sound localization in the mid-sagittal plane available from previous literature. Using virtual subjects from an HRTF database, a virtual experiment is implemented to assess the vertical localization performance of the database subjects when they are provided with HRTF sets selected by the proposed procedure. Results report a statistically significant improvement in predicted localization performance for the selected HRTFs compared with the KEMAR HRTF, which is a commercial standard in many binaural audio solutions; moreover, the proposed analysis provides useful indications to refine the perceptually motivated metrics that guide the selection.

Archivio istituzionale della ricerca - Università di Padova
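
The notch-matching idea in the abstract above can be sketched under a simplifying assumption common in the pinna-reflection literature: a reflection off a pinna contour at distance d from the ear-canal entrance produces a notch near f = c / (2d). The sketch predicts one notch frequency per elevation from contour distances traced on the image, then picks the database subject whose notch frequencies differ least on average in relative terms. The speed of sound, the mismatch formula, the function names, and the toy database are assumptions, not the paper's exact procedure.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)


def notch_frequencies(contour_distances_m, c=SPEED_OF_SOUND):
    """Predicted notch frequency (Hz) per elevation, assuming f = c / (2 d)."""
    return [c / (2.0 * d) for d in contour_distances_m]


def mismatch(predicted_hz, database_hz):
    """Mean relative deviation between predicted and database notch frequencies."""
    pairs = list(zip(predicted_hz, database_hz))
    return sum(abs(p - m) / m for p, m in pairs) / len(pairs)


def select_hrtf(contour_distances_m, database):
    """Return the database subject whose notch track best matches the image."""
    predicted = notch_frequencies(contour_distances_m)
    return min(database, key=lambda subject: mismatch(predicted, database[subject]))


# Toy database: subject -> notch frequency (Hz) at the same three elevations.
database = {
    "subject_a": [8600.0, 7800.0, 6900.0],
    "subject_b": [7000.0, 6500.0, 6000.0],
}
best = select_hrtf([0.020, 0.022, 0.025], database)
```

For a contour 2 cm from the ear canal, the assumed model places the notch at 343 / 0.04 = 8575 Hz.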