Interpretation of Multiparty Meetings: The AMI and AMIDA Projects
The AMI and AMIDA projects are collaborative EU projects concerned with the automatic recognition and interpretation of multiparty meetings. This paper provides an overview of the advances we have made in these projects, with a particular focus on the multimodal recording infrastructure, the publicly available AMI corpus of annotated meeting recordings, and the speech recognition framework that we have developed for this domain.
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition
This paper presents a novel streaming automatic speech recognition (ASR) framework for multi-talker overlapping speech captured by a distant microphone array with arbitrary geometry. Our framework, named t-SOT-VA, capitalizes on two independently developed recent technologies: array-geometry-agnostic continuous speech separation, or VarArray, and streaming multi-talker ASR based on token-level serialized output training (t-SOT). To combine the best of both technologies, we design a t-SOT-based ASR model that generates a serialized multi-talker transcription from the two separated speech signals produced by VarArray. We also propose a pre-training scheme for this ASR model in which we simulate VarArray's output signals from monaural single-talker ASR training data. Conversation transcription experiments on the AMI meeting corpus show that a system based on the proposed framework significantly outperforms conventional ones. Our system achieves state-of-the-art word error rates of 13.7% and 15.5% on the AMI development and evaluation sets, respectively, in the multiple-distant-microphone setting while retaining streaming inference capability.
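The token-level serialization idea behind t-SOT can be illustrated with a minimal sketch: words from overlapping speakers are merged in chronological order into a single token stream, and a channel-change token marks every speaker switch. The `<cc>` token name and the tuple layout below are illustrative assumptions, not the paper's exact implementation, which also bounds the number of concurrent speakers.

```python
# Minimal sketch of token-level serialized output, in the spirit of t-SOT.
# Input: timed words from possibly overlapping speakers.
# Output: one flat token stream with "<cc>" marking speaker changes.

CC = "<cc>"  # channel-change token (name is an assumption)

def serialize(utterances):
    """utterances: list of (start_time, speaker_id, word) tuples."""
    words = sorted(utterances, key=lambda u: u[0])  # chronological order
    tokens, prev_spk = [], None
    for _, spk, word in words:
        if prev_spk is not None and spk != prev_spk:
            tokens.append(CC)  # speaker switched: emit channel-change token
        tokens.append(word)
        prev_spk = spk
    return tokens

# Two speakers whose words interleave in time:
stream = serialize([
    (0.0, "A", "hello"),
    (0.4, "B", "hi"),
    (0.8, "A", "there"),
    (1.2, "B", "everyone"),
])
# stream == ["hello", "<cc>", "hi", "<cc>", "there", "<cc>", "everyone"]
```

A single streaming decoder can then be trained on such serialized targets, which is what lets t-SOT handle overlap without one output head per speaker.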
Mapping the Klangdom Live: Cartographies for piano with two performers and electronics
The use of high-density loudspeaker arrays (HDLAs) has recently experienced rapid growth in a wide variety of technical and aesthetic approaches. Still less explored, however, are applications to interactive music with live acoustic instruments. How can immersive spatialization accompany an instrument that already has its own rich spatial diffusion pattern, like the grand piano, in the context of a score-based concert work? Potential models include treating the spatialized electronic sound in analogy to the diffusion pattern of the instrument, with spatial dimensions parametrized as functions of timbral features. Another approach is to map the concert hall as a three-dimensional projection of the instrument’s internal physical layout, a kind of virtual sonic microscope. Or, the diffusion of electronic spatial sound can be treated as an independent polyphonic element, complementary to but not dependent upon the instrument’s own spatial characteristics. Cartographies (2014), for piano with two performers and electronics, explores each of these models individually and in combination, as well as their technical implementation with the Meyer Sound Matrix3 system of the Südwestrundfunk Experimentalstudio in Freiburg, Germany, and the 43.4-channel Klangdom of the Institut für Musik und Akustik at the Zentrum für Kunst und Medien in Karlsruhe, Germany. The process of composing, producing, and performing the work raises intriguing questions, and offers invaluable hints, for the composition and performance of live interactive works with HDLAs in the future.
Learning to Rank Microphones for Distant Speech Recognition
Fully exploiting ad-hoc microphone networks for distant speech recognition is still an open issue. Empirical evidence shows that selecting the best microphone leads to significant improvements in recognition without any additional front-end processing effort. Current channel selection techniques rely on signal-, decoder-, or posterior-based features. Signal-based features are inexpensive to compute but do not always correlate with recognition performance; decoder- and posterior-based features exhibit better correlation but require substantial computational resources. In this work, we tackle the channel selection problem by proposing MicRank, a learning-to-rank framework in which a neural network is trained to rank the available channels directly from recognition performance on the training set. The proposed approach is agnostic to the array geometry and the type of recognition back-end. We investigate different learning-to-rank strategies using a purpose-built synthetic dataset and the CHiME-6 data. Results show that the proposed approach considerably improves over previous selection techniques, reaching comparable, and in some instances better, performance than oracle signal-based measures.
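One way such a learning-to-rank objective can be realized is a ListNet-style listwise loss: the network's channel scores are pushed toward a target distribution derived from per-channel recognition error, so that lower WER means higher target probability. This is a hedged sketch of the general idea, not MicRank's exact loss or architecture; the temperature parameter and the toy numbers are assumptions.

```python
import numpy as np

def listnet_loss(scores, wers, temp=1.0):
    """ListNet-style listwise loss for channel ranking.

    scores: model's raw score per channel (higher = preferred).
    wers:   word error rate per channel on training data.
    The target distribution is a softmax over -WER/temp, so
    lower-WER channels get higher target probability.
    """
    target = np.exp(-np.asarray(wers, dtype=float) / temp)
    target /= target.sum()
    logp = scores - np.log(np.exp(scores).sum())  # log-softmax of scores
    return -(target * logp).sum()                 # cross-entropy

# Three channels; channel 0 has the lowest WER.
wers = [0.10, 0.30, 0.50]
good = np.array([2.0, 0.0, -1.0])   # scores agree with the WER ranking
bad  = np.array([-1.0, 0.0, 2.0])   # scores invert the WER ranking
# listnet_loss(good, wers) < listnet_loss(bad, wers)
```

Minimizing this loss over many training utterances teaches the network to reproduce the WER-induced ordering from features alone, with no decoder needed at selection time.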
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other, uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is simultaneously recorded by multiple Devices located at different Distances, and some speakers speak multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of diversely entangled speech representations, motivating intriguing methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with out-of-domain learning and self-supervised learning methods. https://3dspeaker.github.io