
    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual cues, such as the movement of the lips, face, teeth, and tongue. It has a wide range of multimedia applications, such as surveillance, Internet telephony, and aids for people with hearing impairments. However, most work in speechreading has been limited to generating text from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker may be available, they have not been exploited to handle these different poses. To this end, this paper presents the world's first multi-view speechreading and reconstruction system. This work pushes the boundaries of multimedia research by putting forth a model that leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speechreading and reconstruction system. The work further shows the optimal placement of cameras that leads to maximum speech intelligibility. Finally, it lays out various innovative applications of the proposed system, focusing on its potentially prodigious impact not just in the security arena but in many other multimedia analytics problems. Comment: 2018 ACM Multimedia Conference (MM '18), October 22-26, 2018, Seoul, Republic of Korea
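    The abstract does not spell out the architecture, so the following is only a hypothetical sketch of the multi-view idea: one shared CNN+LSTM encoder applied to each camera view, a simple averaging fusion of the per-view embeddings, and a decoder that regresses mel-spectrogram frames. All dimensions, layer choices, and the fusion strategy are assumptions for illustration, not the paper's model.

```python
# Hypothetical multi-view speech-reconstruction sketch (assumed architecture).
import tensorflow as tf
from tensorflow.keras import layers

NUM_VIEWS = 3             # e.g. one frontal + two profile cameras (assumed)
FRAMES, H, W = 25, 64, 64  # one-second clip of mouth-region crops (assumed)
MEL_BINS = 80              # mel-spectrogram resolution (assumed)

def view_encoder():
    """Encodes a (FRAMES, H, W, 1) silent-video clip into one embedding."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(FRAMES, H, W, 1)),
        layers.TimeDistributed(layers.Conv2D(32, 3, strides=2, activation="relu")),
        layers.TimeDistributed(layers.Conv2D(64, 3, strides=2, activation="relu")),
        layers.TimeDistributed(layers.GlobalAveragePooling2D()),
        layers.LSTM(128),  # temporal summary of the whole clip
    ])

# One input per camera view, all passed through the same shared-weight encoder.
inputs = [tf.keras.Input(shape=(FRAMES, H, W, 1), name=f"view_{i}")
          for i in range(NUM_VIEWS)]
encoder = view_encoder()
embeddings = [encoder(x) for x in inputs]

# Fuse views by averaging their embeddings (a simple assumed strategy;
# attention-weighted fusion could let the model favor informative poses).
fused = layers.Average()(embeddings)
mel = layers.Dense(FRAMES * MEL_BINS)(fused)
mel = layers.Reshape((FRAMES, MEL_BINS))(mel)  # predicted mel-spectrogram

model = tf.keras.Model(inputs, mel)
model.compile(optimizer="adam", loss="mse")
```

    A vocoder (e.g. Griffin-Lim) would then turn the predicted mel-spectrogram back into a waveform; that step is omitted here.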

    Lexical and audiovisual bases of perceptual adaptation in speech


    Artificial Intelligence-Based Wake Word Detection at Edge Device

    Deep neural network based wake word systems (responding to phrases such as "Hi Alexa" or "Hey Siri") allow increasingly accurate speech communication between humans and machines. However, this setup requires high processing power or cloud services, which may not be accessible to edge devices. Currently, the accuracy of machine learning methods for cloudless voice activation on edge devices hovers below 90%. This paper explores wake word implementation on edge devices using a 2-dimensional convolutional neural network (CNN) with improved and balanced accuracy and latency. The proposed CNN model is created, trained, and quantized using TensorFlow on a PC and exported to a Raspberry Pi Zero 2 W. The quantization method reduces the model size by 20%, and spectral gating is adopted to reduce false wake word detections in moderately noisy environments. The proposed system achieved more than 90% wake word detection accuracy across 30 to 50 dB of background noise, with an average response time of 1.03 seconds for the intended user. The results show that a low-powered edge device still offers competitive performance for detecting wake words without cloud services.
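    The pipeline described (a 2-D CNN over audio spectrograms, trained and quantized with TensorFlow, then exported for a Raspberry Pi) maps naturally onto TensorFlow's post-training quantization path. Below is a minimal sketch of that path; the spectrogram shape, layer sizes, and file name are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal wake-word CNN + TFLite quantization sketch (assumed dimensions).
import tensorflow as tf
from tensorflow.keras import layers

# Spectrogram "images": e.g. 49 time steps x 40 mel bins per 1-s window (assumed).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(49, 40, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),  # wake word vs. background
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_specs, train_labels, epochs=20)  # training data not shown

# Post-training quantization shrinks the model for the edge device; the
# abstract reports roughly a 20% size reduction with its chosen method.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
with open("wake_word.tflite", "wb") as f:
    f.write(converter.convert())
```

    On the device, the .tflite file would be run with the TFLite interpreter over a sliding window of incoming audio; spectral gating would be applied to the audio before feature extraction.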

    Visual speech recognition and utterance segmentation based on mouth movement

    This paper presents a vision-based approach to recognizing speech without evaluating acoustic signals. The proposed technique combines motion features and support vector machines (SVMs) to classify utterances. Segmentation of utterances is important in a visual speech recognition system; this research proposes a video segmentation method to detect the start and end frames of isolated utterances in an image sequence. Frames corresponding to 'speaking' and 'silence' phases are identified from mouth movement information. The experimental results demonstrate that the proposed visual speech recognition technique yields high accuracy on a phoneme classification task. Potential applications of such a system include human-computer interfaces (HCI) for mobility-impaired users, lip-reading mobile phones, in-vehicle systems, and improved speech-based computer control in noisy environments.
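    To make the two ideas concrete, here is a minimal sketch of how mouth-region motion could drive both the speaking/silence segmentation and the SVM-based utterance classification described above. The motion measure, threshold value, and fixed-length resampling are illustrative assumptions, not the paper's exact features.

```python
# Motion-based utterance segmentation + SVM classification sketch (assumed details).
import numpy as np
from sklearn.svm import SVC

def mouth_motion(frames):
    """Mean absolute inter-frame difference over mouth-ROI grayscale frames.

    frames: array of shape (T, H, W); returns T-1 motion values."""
    return np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))

def segment_utterance(frames, threshold=2.0):
    """Return (start, end) frame indices where motion exceeds the threshold."""
    active = mouth_motion(frames) > threshold  # threshold is an assumption
    idx = np.flatnonzero(active)
    if idx.size == 0:
        return None  # 'silence' throughout
    return int(idx[0]), int(idx[-1] + 1)

def motion_features(frames, n_bins=20):
    """Resample the motion profile to a fixed length so every utterance
    yields a feature vector of the same size for the SVM."""
    m = mouth_motion(frames)
    return np.interp(np.linspace(0, len(m) - 1, n_bins), np.arange(len(m)), m)

# One motion-profile feature vector per segmented utterance, labeled by phoneme:
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# predictions = clf.predict(X_test)
```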

    Automatic Visual Speech Recognition

    Intelligent Systems, Electrical Engineering, Mathematics and Computer Science

    Considering the User in the Wireless World

    The near future promises significant advances in communication capabilities, but one of the keys to success is people's understanding of the value and usage of those capabilities. In considering the role of the user in the wireless world of the future, the Human Perspective Working Group (WG1) of the Wireless World Research Forum has gathered input and developed positions in four important areas: methods, processes, and best practices for user-centered research and design; reference frameworks for modeling user needs within the context of wireless systems; user scenario creation and analysis; and user interaction technologies. This article provides an overview of WG1's work in these areas, which are critical to ensuring that the future wireless world meets and exceeds people's expectations in the coming decades.