A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition
We study large-scale kernel methods for acoustic modeling and compare to DNNs
on performance metrics related to both acoustic modeling and recognition.
Measuring perplexity and frame-level classification accuracy, kernel-based
acoustic models are as effective as their DNN counterparts. However, on
token-error-rates DNN models can be significantly better. We have discovered
that this might be attributed to DNN's unique strength in reducing both the
perplexity and the entropy of the predicted posterior probabilities. Motivated
by our findings, we propose a new technique, entropy regularized perplexity,
for model selection. This technique can noticeably improve the recognition
performance of both types of models, and reduces the gap between them. While
effective on Broadcast News, this technique could also be applicable to other
tasks.
Comment: arXiv admin note: text overlap with arXiv:1411.400
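The two quantities the abstract contrasts, perplexity and the entropy of the predicted posteriors, can be computed from frame-level outputs as in the minimal sketch below. The abstract does not give the formula for entropy regularized perplexity, so `regularized_score` and its weight `lam` are purely hypothetical illustrations of how the two terms might be combined for model selection, not the authors' definition.

```python
import math

def frame_perplexity(posteriors, labels):
    """Perplexity = exp(mean negative log-probability of the correct label)."""
    nll = [-math.log(p[y]) for p, y in zip(posteriors, labels)]
    return math.exp(sum(nll) / len(nll))

def mean_entropy(posteriors):
    """Average Shannon entropy (in nats) of the predicted posteriors."""
    total = 0.0
    for p in posteriors:
        total += -sum(q * math.log(q) for q in p if q > 0.0)
    return total / len(posteriors)

def regularized_score(posteriors, labels, lam=1.0):
    """Hypothetical criterion (an assumption, not from the paper):
    log-perplexity plus lam times mean entropy; lower is better."""
    return math.log(frame_perplexity(posteriors, labels)) + lam * mean_entropy(posteriors)

# Two toy models scored on three frames with two classes;
# the correct label is class 0 in every frame.
labels = [0, 0, 0]
sharp = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
flat = [[0.6, 0.4], [0.6, 0.4], [0.6, 0.4]]

for name, post in [("sharp", sharp), ("flat", flat)]:
    print(name, frame_perplexity(post, labels), mean_entropy(post),
          regularized_score(post, labels))
```

In this toy case the sharper model has both lower perplexity and lower entropy, so any positive `lam` ranks it first.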
Modeling and forecasting ocean acoustic conditions
Author Posting. © The Author, 2017. This article is posted here by permission of Sears Foundation for Marine Research for personal use, not for redistribution. The definitive version was published in Journal of Marine Research 75 (2017): 435–457, doi:10.1357/002224017821836734.
Modeling acoustic conditions in an oceanic environment is a multiple-step process. The environmental
conditions (features) in the area first must be measured or estimated; relevant features include
seabed geometry, seabed composition, and four-dimensionally (4D) variable sound-speed and density
variations related to evolving or wave motions. Often the dynamical wave modeling depends on
first obtaining correct seabed and mean stratification conditions (for example, nonlinear internal wave
modeling). Next, this information must be included in sound propagation modeling. A selection of the
many methods and tools available for these tasks is described, with a focus on modeling sounds of 20
to 1000 Hz propagating through water-column features that are time-dependent and variable in three
dimensions (i.e., 4D variable). An example of a 3D parabolic equation acoustic calculation shows how
variability caused by evolving internal tidal waves affects sound propagation. Different propagation
and scattering regimes are discussed, including the theoretically delineated weak scattering and strong
scattering regimes, as well as the empirically examined regime found in nonlinear internal waves.
The histories and the current state of our oceanographic knowledge (the input to acoustic modeling)
and of our ability to effectively model complex acoustic conditions are discussed. Example acoustic
simulation applications are also discussed; these are ocean acoustic tomography, coherence prediction,
and signal-to-noise ratio prediction. Types of ocean models and acoustic models and how they
are interfaced are also examined. These include deterministic, statistical analytic feature models.
Funding for this work was provided by the U.S. Office of Naval Research,
Ocean Acoustics Program, Grants N-00014-11-1-0701 and N00014-14-1-0223
Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition
We propose to model the acoustic space of deep neural network (DNN)
class-conditional posterior probabilities as a union of low-dimensional
subspaces. To that end, the training posteriors are used for dictionary
learning and sparse coding. Sparse representation of the test posteriors using
this dictionary enables projection to the space of training data. Relying on
the fact that the intrinsic dimensions of the posterior subspaces are indeed
very small and the matrix of all posteriors belonging to a class has a very low
rank, we demonstrate how low-dimensional structures enable further enhancement
of the posteriors and rectify the spurious errors due to mismatch conditions.
The enhanced acoustic modeling method leads to improvements in a continuous
speech recognition task using a hybrid DNN-HMM (hidden Markov model) framework in
both clean and noisy conditions, where up to a 15.4% relative reduction in word
error rate (WER) is achieved.
DNN adaptation by automatic quality estimation of ASR hypotheses
In this paper we propose to exploit the automatic Quality Estimation (QE) of
ASR hypotheses to perform the unsupervised adaptation of a deep neural network
modeling acoustic probabilities. Our hypothesis is that significant
improvements can be achieved by: i) automatically transcribing the evaluation
data we are currently trying to recognise, and ii) selecting from it a subset
of "good quality" instances based on the word error rate (WER) scores predicted
by a QE component. To validate this hypothesis, we run several experiments on
the evaluation data sets released for the CHiME-3 challenge. First, we operate
in oracle conditions in which manual transcriptions of the evaluation data are
available, thus allowing us to compute the "true" sentence WER. In this
scenario, we perform the adaptation with variable amounts of data, which are
characterised by different levels of quality. Then, we move to realistic
conditions in which the manual transcriptions of the evaluation data are not
available. In this case, the adaptation is performed on data selected according
to the WER scores "predicted" by a QE component. Our results indicate that: i)
QE predictions allow us to closely approximate the adaptation results obtained
in oracle conditions, and ii) the overall ASR performance based on the proposed
QE-driven adaptation method is significantly better than the strong, most
recent, CHiME-3 baseline.
Comment: Computer Speech & Language December 201
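The selection step described above, keeping only automatically transcribed utterances that a QE component rates as good quality, might look like the sketch below. The record layout, the field name `predicted_wer`, the threshold of 0.2, and the sample utterances are all assumptions made for illustration; the abstract does not specify them.

```python
def select_for_adaptation(hypotheses, max_predicted_wer=0.2):
    """Keep utterances whose QE-predicted WER is at or below the threshold.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    return [h for h in hypotheses if h["predicted_wer"] <= max_predicted_wer]

# Toy ASR hypotheses with WER scores as a QE component might predict them.
hypotheses = [
    {"utt_id": "utt1", "text": "turn the light on", "predicted_wer": 0.05},
    {"utt_id": "utt2", "text": "turn delight gone", "predicted_wer": 0.60},
    {"utt_id": "utt3", "text": "what time is it", "predicted_wer": 0.20},
]

selected = select_for_adaptation(hypotheses)
print([h["utt_id"] for h in selected])
```

The selected transcriptions would then serve as the unsupervised adaptation targets for the acoustic model.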
An Improved Variable Structure Adaptive Filter Design and Analysis for Acoustic Echo Cancellation
In this research, an advanced variable-structure adaptive algorithm based on Multiple Sub-Filters (MSF) is proposed and analyzed for single-channel Acoustic Echo Cancellation (AEC). This work suggests a new and improved direction for finding the optimum tap-length of the adaptive filter employed for AEC. The structure adaptation, supported by a tap-length based weight update approach, helps the designed echo canceller maintain a trade-off between the Mean Square Error (MSE) and the time taken to attain the steady-state MSE. The work focuses on replacing the fixed-length sub-filters in existing MSF-based AEC algorithms, which brings refinements in convergence, steady-state error, and tracking over the single long filter, different-error, and common-error algorithms. A dynamic structure-selective coefficient update approach that reduces the structural and computational cost of the adaptive design is discussed in the context of the proposed algorithm. Simulation results compare the performance of the proposed variable-structure multiple sub-filter designs with that of existing fixed tap-length sub-filter based acoustic echo cancellers.
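For orientation, the sketch below shows a plain fixed-length normalized LMS (NLMS) echo canceller, the kind of single-filter baseline that tap-length adaptation builds on. It is not the variable-structure MSF algorithm of the abstract; the function name, step size, toy echo path, and signal length are assumptions chosen for illustration.

```python
import random

def nlms_echo_canceller(far_end, mic, num_taps, step=0.5, eps=1e-8):
    """Fixed-length NLMS: estimate the echo of far_end contained in mic."""
    weights = [0.0] * num_taps
    buffer = [0.0] * num_taps  # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        buffer = [x] + buffer[:-1]
        echo_estimate = sum(w * b for w, b in zip(weights, buffer))
        e = d - echo_estimate  # residual after echo cancellation
        energy = sum(b * b for b in buffer) + eps
        # Normalized update: step scaled by the input energy in the buffer.
        weights = [w + step * e * b / energy for w, b in zip(weights, buffer)]
        errors.append(e)
    return weights, errors

# Toy setup: a 3-tap echo path, white far-end signal, no near-end speech or noise.
rng = random.Random(0)
echo_path = [0.5, -0.3, 0.1]
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]

weights, errors = nlms_echo_canceller(far_end, mic, num_taps=4)
print([round(w, 4) for w in weights])
```

Here `num_taps` is fixed in advance and deliberately one tap longer than the true path; choosing that length automatically is the problem the abstract addresses.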
Improving elevation perception with a tool for image-guided head-related transfer function selection
This paper proposes an image-guided HRTF selection procedure that exploits the relation between features of the pinna shape and HRTF notches. Using a 2D image of a subject's pinna, the procedure selects from a database the HRTF set that best fits the anthropometry of that subject. The proposed procedure is designed to be quick to apply and easy to use for a user without previous knowledge of binaural audio technologies. The entire process is evaluated by means of an auditory model for sound localization in the mid-sagittal plane available from previous literature. Using virtual subjects from an HRTF database, a virtual experiment is implemented to assess the vertical localization performance of the database subjects when they are provided with HRTF sets selected by the proposed procedure. Results report a statistically significant improvement in predictions of localization performance for selected HRTFs compared to the KEMAR HRTF, which is a commercial standard in many binaural audio solutions; moreover, the proposed analysis provides useful indications for refining the perceptually motivated metrics that guide the selection.