Exploration of audiovisual heritage using audio indexing technology
This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings, as well as the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, the paper reviews the requirements for successfully tuning and improving the available tools for indexing heterogeneous A/V collections from the cultural heritage domain. Finally, the paper argues that research is needed to cope with the varying information needs of different types of users.
Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition
Far-field speech recognition in noisy and reverberant conditions remains a
challenging problem despite recent deep learning breakthroughs. This problem is
commonly addressed by acquiring a speech signal from multiple microphones and
performing beamforming over them. In this paper, we propose to use a recurrent
neural network with long short-term memory (LSTM) architecture to adaptively
estimate real-time beamforming filter coefficients to cope with non-stationary
environmental noise and the dynamic nature of source and microphone positions,
which results in a set of time-varying room impulse responses. The LSTM adaptive
beamformer is jointly trained with a deep LSTM acoustic model to predict senone
labels. Further, we use hidden units in the deep LSTM acoustic model to assist
in predicting the beamforming filter coefficients. The proposed system achieves
a 7.97% absolute gain over baseline systems with no beamforming on the CHiME-3
real evaluation set.
Comment: in 2017 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP)
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.