Search CORE

4 research outputs found

Audio-Visual Automatic Speech Recognition Towards Education for Disabilities

Author: Debnath Saswati
González-Crespo Rubén
Namasudra Suyel
Roy Pinki
Publication venue: Springer Science and Business Media LLC
Publication date: 13/04/2023

Education is a fundamental right that enriches everyone’s life. However, physically challenged people are often debarred from the general and advanced education system. An Audio-Visual Automatic Speech Recognition (AV-ASR) based system can improve the education of physically challenged people by providing hands-free computing: they can communicate with the learning system through AV-ASR. However, it is challenging to track the lips correctly for the visual modality. Thus, this paper addresses appearance-based visual features along with a co-occurrence statistical measure for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Grey-Level Co-occurrence Matrix (GLCM) are proposed for extracting visual speech information. The experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.

Re-UNIR

Audio-Visual Speech Processing (original Greek title: Οπτικοακουστική επεξεργασία φωνής)

Author: Μπουγλός Σταύρος Ν.
Publication date: 01/01/2014

University of Thessaly Institutional Repository

Audio-visual speech recognition using depth information from the Kinect in noisy video conditions

Author: Galatas G.
Makedon F.
Potamianos G.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2012

In this paper we build on our recent work, where we successfully incorporated facial depth data of a speaker, captured by the Microsoft Kinect device, as a third data stream in an audio-visual automatic speech recognizer. In particular, we focus on whether the depth stream provides sufficient speech information to improve system robustness to noisy audio-visual conditions, thus studying system operation beyond the traditional scenarios, where noise is applied to the audio signal alone. For this purpose, we consider four realistic visual modality degradations at various noise levels, and we conduct small-vocabulary recognition experiments on an appropriate, previously collected audio-visual database. Our results demonstrate improved system performance due to the depth modality, as well as a considerable accuracy increase when using both the visual and depth modalities over audio-only speech recognition.

Crossref

University of Thessaly Institutional Repository

A Survey of Applications and Human Motion Recognition with Microsoft Kinect

Author: Lun Roanna
Zhao Wenbing
Publication venue: EngagedScholarship@CSU
Publication date: 09/07/2015

Microsoft Kinect, a low-cost motion-sensing device, enables users to interact with computers or game consoles naturally through gestures and spoken commands without any other peripheral equipment. As such, it has attracted intense interest in research and development on the Kinect technology. In this paper, we present a comprehensive survey of Kinect applications and of the latest research and development on motion recognition using data captured by the Kinect sensor. On the applications front, we review the applications of the Kinect technology in a variety of areas, including healthcare, education and performing arts, robotics, sign language recognition, retail services, workplace safety training, as well as 3D reconstruction. On the technology front, we provide an overview of the main features of both versions of the Kinect sensor, together with the depth-sensing technologies used, and review the literature on human motion recognition techniques used in Kinect applications. We provide a classification of motion recognition techniques to highlight the different approaches used in human motion recognition. Furthermore, we compile a list of publicly available Kinect datasets. These datasets are valuable resources for researchers to investigate better methods for human motion recognition and lower-level computer vision tasks such as segmentation, object detection and human pose estimation.

Crossref

Cleveland-Marshall College of Law