    A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition

    We study large-scale kernel methods for acoustic modeling and compare them with DNNs on performance metrics related to both acoustic modeling and recognition. Measured by perplexity and frame-level classification accuracy, kernel-based acoustic models are as effective as their DNN counterparts. However, in terms of token error rates, DNN models can be significantly better. We have discovered that this might be attributed to the DNN's unique strength in reducing both the perplexity and the entropy of the predicted posterior probabilities. Motivated by these findings, we propose a new technique, entropy regularized perplexity, for model selection. This technique noticeably improves the recognition performance of both types of models and reduces the gap between them. While it was evaluated on Broadcast News, the technique could also be applicable to other tasks.

    Comment: arXiv admin note: text overlap with arXiv:1411.400
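    The abstract does not give the exact form of entropy regularized perplexity, so the following is only a minimal sketch under an assumed form: held-out log-perplexity (mean negative log posterior of the reference label) plus a weight `lam` times the mean entropy of the predicted posteriors, with the lower score winning. The names `entropy_regularized_score`, `select_model`, and `lam` are hypothetical. The usage case shows two models with identical perplexity, where the criterion prefers the one with lower-entropy (sharper) posteriors.

```python
import math

def log_perplexity(posteriors, labels):
    """Mean negative natural-log posterior of the reference labels (log of perplexity)."""
    total = 0.0
    for p, y in zip(posteriors, labels):
        total -= math.log(p[y])
    return total / len(labels)

def mean_entropy(posteriors):
    """Mean Shannon entropy (nats) of the predicted posterior vectors."""
    total = 0.0
    for p in posteriors:
        total -= sum(q * math.log(q) for q in p if q > 0.0)
    return total / len(posteriors)

def entropy_regularized_score(posteriors, labels, lam=1.0):
    # Hypothetical additive form: log-perplexity + lam * mean entropy (lower is better).
    return log_perplexity(posteriors, labels) + lam * mean_entropy(posteriors)

def select_model(candidates, labels, lam=1.0):
    """candidates: dict mapping a model name to its posteriors on the held-out frames."""
    return min(candidates,
               key=lambda name: entropy_regularized_score(candidates[name], labels, lam))

# Both models assign 0.5 to the correct class, so their perplexity is identical (2.0),
# but model "B" spreads the remaining mass over fewer classes (lower entropy).
labels = [0, 0, 0, 0]
candidates = {
    "A": [[0.5, 0.25, 0.25]] * 4,
    "B": [[0.5, 0.5, 0.0]] * 4,
}
best = select_model(candidates, labels)
```

    Plain perplexity cannot separate "A" and "B" here; adding the entropy term breaks the tie in favor of "B", which mirrors the abstract's observation that DNNs reduce both quantities.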

    Modeling and forecasting ocean acoustic conditions

    Author Posting. © The Author, 2017. This article is posted here by permission of Sears Foundation for Marine Research for personal use, not for redistribution. The definitive version was published in Journal of Marine Research 75 (2017): 435–457, doi:10.1357/002224017821836734.

    Modeling acoustic conditions in an oceanic environment is a multiple-step process. First, the environmental conditions (features) in the area must be measured or estimated; relevant features include seabed geometry, seabed composition, and four-dimensionally (4D) variable sound-speed and density variations related to evolving or wave motions. The dynamical wave modeling often depends on first obtaining correct seabed and mean stratification conditions (for example, in nonlinear internal wave modeling). Next, this information must be included in sound propagation modeling. A selection of the many methods and tools available for these tasks is described, with a focus on modeling sounds of 20 to 1000 Hz propagating through water-column features that are time-dependent and variable in three dimensions (i.e., 4D variable). An example of a 3D parabolic equation acoustic calculation shows how variability caused by evolving internal tidal waves affects sound propagation. Different propagation and scattering regimes are discussed, including the theoretically delineated weak-scattering and strong-scattering regimes, as well as the empirically examined regime found in nonlinear internal waves. The history and current state of our oceanographic knowledge (the input to acoustic modeling) and of our ability to effectively model complex acoustic conditions are discussed. Example acoustic simulation applications are also discussed: ocean acoustic tomography, coherence prediction, and signal-to-noise ratio prediction. Types of ocean models and acoustic models, and how they are interfaced, are also examined. These include deterministic, statistical, and analytic feature models.

    Funding for this work was provided by the U.S. Office of Naval Research, Ocean Acoustics Program, Grants N-00014-11-1-0701 and N00014-14-1-0223.
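    Signal-to-noise ratio prediction, one of the applications listed above, ultimately feeds a propagation-model output into a sonar-equation budget. The sketch below is a deliberately crude stand-in, not the paper's method: it assumes the passive form SNR = SL - TL - NL (all in dB) and replaces the 3D parabolic-equation transmission loss with simple spherical spreading, TL = 20 log10(r / 1 m). The function names are hypothetical.

```python
import math

def spherical_transmission_loss_db(range_m, ref_m=1.0):
    """Transmission loss (dB) for spherical spreading: 20 * log10(r / r_ref).

    This ignores absorption, boundary interaction, and the 4D sound-speed
    variability that a parabolic-equation model would capture.
    """
    return 20.0 * math.log10(range_m / ref_m)

def predicted_snr_db(source_level_db, range_m, noise_level_db):
    """Passive sonar-equation budget: SNR = SL - TL - NL (all in dB)."""
    return (source_level_db
            - spherical_transmission_loss_db(range_m)
            - noise_level_db)

# Example: a 180 dB source heard 1 km away over a 70 dB noise floor.
snr = predicted_snr_db(180.0, 1000.0, 70.0)  # TL = 60 dB, so SNR = 50 dB
```

    In a full workflow the `spherical_transmission_loss_db` call would be replaced by transmission loss computed from an acoustic model driven by the ocean model's output.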

    Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition

    We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse representation of the test posteriors using this dictionary enables their projection onto the space of the training data. Relying on the fact that the intrinsic dimensions of the posterior subspaces are indeed very small, and that the matrix of all posteriors belonging to a class has very low rank, we demonstrate how low-dimensional structures enable further enhancement of the posteriors and rectify spurious errors due to mismatched conditions. The enhanced acoustic modeling method leads to improvements in a continuous speech recognition task using the hybrid DNN-HMM (hidden Markov model) framework in both clean and noisy conditions, where up to 15.4% relative reduction in word error rate (WER) is achieved.
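    The projection step can be illustrated with a small sparse-coding sketch. It assumes, for simplicity, that the dictionary atoms are the L2-normalized training posteriors themselves rather than a learned dictionary, and it uses orthogonal matching pursuit (OMP) as the sparse coder; the paper's actual dictionary-learning and sparse-coding algorithms may differ. `enhance_posterior` and `n_nonzero` are hypothetical names. The reconstruction is clipped to stay positive and renormalized so that it remains a probability vector.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: approximate x with at most n_nonzero columns of D.

    Columns of D are assumed to be L2-normalized.
    """
    assert n_nonzero >= 1
    residual = x.copy()
    support = []
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Least-squares fit of x on all atoms selected so far.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

def enhance_posterior(train_posteriors, test_posterior, n_nonzero=3):
    """Project a test posterior onto the span of a few training posteriors.

    train_posteriors: array of shape (num_classes, num_train), one posterior per column.
    """
    D = train_posteriors / np.linalg.norm(train_posteriors, axis=0)
    coef = omp(D, test_posterior, n_nonzero)
    projected = np.clip(D @ coef, 1e-8, None)  # keep every probability positive
    return projected / projected.sum()         # renormalize to sum to 1

rng = np.random.default_rng(0)
train = rng.dirichlet(np.ones(5), size=20).T        # 20 training posteriors over 5 classes
noisy = 0.7 * train[:, 0] + 0.3 * np.ones(5) / 5    # a "mismatched" test posterior
enhanced = enhance_posterior(train, noisy, n_nonzero=3)
```

    Limiting `n_nonzero` is what enforces the low-dimensional assumption: each test posterior is explained by only a handful of training posteriors.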

    DNN adaptation by automatic quality estimation of ASR hypotheses

    In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions, in which manual transcriptions of the evaluation data are available, thus allowing us to compute the "true" sentence WER. In this scenario, we perform the adaptation with variable amounts of data, characterised by different levels of quality. Then, we move to realistic conditions, in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores "predicted" by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance of the proposed QE-driven adaptation method is significantly better than the strong, most recent CHiME-3 baseline.

    Comment: Computer Speech & Language December 201
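    The two scoring regimes described above can be sketched as follows. In the oracle setting, the sentence WER is the word-level edit distance between the manual transcription and the ASR hypothesis divided by the reference length; in the realistic setting, the same selection rule is applied to a WER predicted by a QE component (not modeled here). The tuple format and the threshold rule in `select_for_adaptation` are hypothetical; the paper's exact selection criterion may differ.

```python
def sentence_wer(reference, hypothesis):
    """Sentence WER: word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] = edit distance between ref[:i] and hyp[:j] for the current row i.
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = min(dp[j] + 1,                                # deletion
                      dp[j - 1] + 1,                            # insertion
                      prev_diag + (ref[i - 1] != hyp[j - 1]))   # substitution or match
            prev_diag, dp[j] = dp[j], cur
    return dp[-1] / max(len(ref), 1)

def select_for_adaptation(utterances, max_wer):
    """Keep automatically transcribed utterances whose WER score is at most max_wer.

    utterances: list of (utt_id, hypothesis, wer_score) tuples, where wer_score is
    either the oracle sentence WER or a QE-predicted WER (hypothetical format).
    """
    return [(utt_id, hyp) for utt_id, hyp, wer in utterances if wer <= max_wer]

# Oracle condition: score hypotheses against manual transcriptions.
oracle = [
    ("u1", "the cat sat", sentence_wer("the cat sat", "the cat sat")),
    ("u2", "the bat sat", sentence_wer("the cat sat", "the bat sat")),
    ("u3", "a dog", sentence_wer("the cat sat on the mat", "a dog")),
]
adaptation_set = select_for_adaptation(oracle, max_wer=0.4)
```

    The selected (utterance, hypothesis) pairs would then serve as pseudo-labeled data for adapting the acoustic DNN; varying `max_wer` trades the amount of adaptation data against its quality.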

    An Improved Variable Structure Adaptive Filter Design and Analysis for Acoustic Echo Cancellation

    In this research, an advanced variable-structure adaptive algorithm based on Multiple Sub-Filters (MSF) is proposed and analyzed for single-channel Acoustic Echo Cancellation (AEC). This work suggests a new and improved way to find the optimum tap length of the adaptive filter employed for AEC. The structure adaptation, supported by a tap-length-based weight update approach, helps the designed echo canceller maintain a trade-off between the Mean Square Error (MSE) and the time taken to reach the steady-state MSE. The paper focuses on replacing the fixed-length sub-filters in existing MSF-based AEC algorithms, which brings refinements in convergence, steady-state error, and tracking over the single long filter and the different-error and common-error algorithms. A dynamic-structure selective coefficient update approach that reduces the structural and computational cost of the adaptive design is also discussed in the context of the proposed algorithm. Simulation results provide a comparative performance analysis of the proposed variable-structure multiple sub-filter designs against existing acoustic echo cancellers based on fixed tap-length sub-filters.
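    The proposed variable-structure MSF algorithm is not reproduced here. As a hedged illustration of why tap length matters, the sketch below implements the plain fixed tap-length normalized LMS (NLMS) echo canceller that such designs build on: the filter estimates the echo path from the far-end signal and subtracts the estimate from the microphone signal. The step size `mu`, regularizer `eps`, and the synthetic echo path are assumptions chosen for the example.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, taps, mu=0.5, eps=1e-6):
    """Fixed tap-length NLMS echo canceller.

    Returns the final weights (echo-path estimate) and the error signal
    (microphone signal minus estimated echo) for every sample.
    """
    w = np.zeros(taps)
    x_buf = np.zeros(taps)            # most recent far-end samples, newest first
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_estimate = w @ x_buf
        e = mic[n] - echo_estimate
        # Normalized LMS update: step scaled by the input energy in the buffer.
        w += mu * e * x_buf / (eps + x_buf @ x_buf)
        err[n] = e
    return w, err

rng = np.random.default_rng(1)
far = rng.standard_normal(5000)
echo_path = np.array([0.6, -0.3, 0.1, 0.05])      # synthetic 4-tap echo path
mic = np.convolve(far, echo_path)[:len(far)]      # echo only, no near-end speech

w_long, e_long = nlms_echo_canceller(far, mic, taps=8)    # long enough
w_short, e_short = nlms_echo_canceller(far, mic, taps=2)  # too short
```

    A filter shorter than the echo path leaves residual echo in steady state, while an unnecessarily long one converges more slowly and costs more computation; variable tap-length schemes such as the one proposed here aim to settle between these two failure modes automatically.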

    Improving elevation perception with a tool for image-guided head-related transfer function selection

    This paper proposes an image-guided HRTF selection procedure that exploits the relation between features of the pinna shape and HRTF notches. Using a 2D image of a subject's pinna, the procedure selects from a database the HRTF set that best fits the anthropometry of that subject. The procedure is designed to be quick to apply and easy to use for a user without previous knowledge of binaural audio technologies. The entire process is evaluated by means of an auditory model for sound localization in the mid-sagittal plane available in previous literature. Using virtual subjects from an HRTF database, a virtual experiment is implemented to assess the vertical localization performance of the database subjects when they are provided with HRTF sets selected by the proposed procedure. Results report a statistically significant improvement in predicted localization performance for the selected HRTFs compared to the KEMAR HRTF, which is a commercial standard in many binaural audio solutions; moreover, the proposed analysis provides useful indications for refining the perceptually motivated metrics that guide the selection.
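    One way to connect pinna shape to HRTF notches is sketched below, under stated assumptions rather than as the paper's exact procedure: each pinna contour distance d (measured from the ear-canal entrance on the image, one per elevation) is mapped to a notch frequency through a half-wavelength reflection model, f = c / (2d), with an assumed speed of sound of 343 m/s. Each database subject is then scored by the mean relative mismatch between predicted and measured notch frequencies, and the lowest-scoring subject is selected. All function names and the database format are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def notch_frequency(distance_m, c=SPEED_OF_SOUND):
    """Half-wavelength reflection model: notch at f = c / (2 * d)."""
    return c / (2.0 * distance_m)

def notch_mismatch(predicted_hz, measured_hz):
    """Mean relative mismatch between predicted and measured notch frequencies."""
    return sum(abs(p - m) / m for p, m in zip(predicted_hz, measured_hz)) / len(measured_hz)

def select_hrtf(contour_distances_m, database):
    """Return the database subject whose measured notches best match the prediction.

    contour_distances_m: pinna contour distances (m), one per elevation.
    database: dict mapping subject id -> measured notch frequencies (Hz) at the
    same elevations, in the same order (hypothetical format).
    """
    predicted = [notch_frequency(d) for d in contour_distances_m]
    return min(database, key=lambda s: notch_mismatch(predicted, database[s]))

distances = [0.020, 0.025, 0.030]          # extracted from a pinna image (example values)
database = {
    "subject_A": [8500.0, 6900.0, 5700.0],
    "subject_B": [10000.0, 9000.0, 8000.0],
}
chosen = select_hrtf(distances, database)  # predicted notches ≈ 8575, 6860, 5717 Hz
```

    A relative (rather than absolute) mismatch keeps higher-frequency notches from dominating the score; this choice, like the reflection model itself, is an assumption of the sketch.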