4,642 research outputs found

TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos

Authors: Eshky Aciel, Renals Steve, Ribeiro Manuel Sam, Richmond Korin, Sanger Jennifer, Wrench Alan, Zhang Jing-Xuan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/11/2020

We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos. TaL consists of two parts: TaL1 is a set of six recording sessions of one professional voice talent, a male native speaker of English; TaL80 is a set of recording sessions of 81 native speakers of English without voice talent experience. Overall, the corpus contains 24 hours of parallel ultrasound, video, and audio data, of which approximately 13.5 hours are speech. This paper describes the corpus and presents benchmark results for the tasks of speech recognition, speech synthesis (articulatory-to-acoustic mapping), and automatic synchronisation of ultrasound to audio. The TaL corpus is publicly available under the CC BY-NC 4.0 license.

Comment: 8 pages, 4 figures, Accepted to SLT2021, IEEE Spoken Language Technology Worksho

arXiv.org e-Print Archive

Edinburgh Research Explorer

LipLearner: Customizable Silent Speech Interactions on Mobile Devices

Authors: Fang Shitao, Rekimoto Jun, Su Zixiong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/02/2023

Silent speech interface is a promising technology that enables private communications in natural language. However, previous approaches only support a small and inflexible vocabulary, which leads to limited expressiveness. We leverage contrastive learning to learn efficient lipreading representations, enabling few-shot command customization with minimal user effort. Our model exhibits high robustness to different lighting, posture, and gesture conditions on an in-the-wild dataset. For 25-command classification, an F1-score of 0.8947 is achievable only using one shot, and its performance can be further boosted by adaptively learning from more data. This generalizability allowed us to develop a mobile silent speech interface empowered with on-device fine-tuning and visual keyword spotting. A user study demonstrated that with LipLearner, users could define their own commands with high reliability guaranteed by an online incremental learning scheme. Subjective feedback indicated that our system provides essential functionalities for customizable silent speech interactions with high usability and learnability.

Comment: Conditionally accepted to the ACM CHI Conference on Human Factors in Computing Systems 2023 (CHI '23

arXiv.org e-Print Archive

Sound classification using evolving ensemble models and Particle Swarm Optimization

Authors: Jiang Ming, Lim Chee Peng, Yu Yonghong, Zhang Li
Publication venue: 'Elsevier BV'
Publication date: 08/01/2022

Royal Holloway - Pure