AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the largest corpus suitable for conducting speech recognition research and building speech recognition systems for Mandarin. The recording procedure, including the audio capture devices and environments, is presented in detail. The preparation of the related resources, including the transcriptions and lexicon, is described. The corpus is released with a Kaldi recipe. Experimental results imply that the quality of the audio recordings and transcriptions is promising.
Comment: Oriental COCOSDA 201
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Speechreading, or lipreading, is the technique of understanding and extracting phonetic features from a speaker's visual features, such as the movement of the lips, face, teeth and tongue. It has a wide range of multimedia applications, such as surveillance, Internet telephony, and aids for people with hearing impairments. However, most work in speechreading has been limited to text generation from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker's speech may be available, they have not been exploited to deal with the different poses. To this end, this paper presents the world's first multi-view speechreading and reconstruction system. This work pushes the boundaries of multimedia research by putting forth a model that leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speechreading and reconstruction system. The work further identifies the optimal placement of cameras that leads to the maximum intelligibility of the reconstructed speech. Finally, it lays out various innovative applications for the proposed system, focusing on its potentially prodigious impact not just in the security arena but in many other multimedia analytics problems.
Comment: 2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul, Republic of Kore
Design of an embedded speech-centric interface for applications in handheld terminals
The embedded speech-centric interface for handheld wireless devices has been implemented on a commercially available PDA as part of an application that allows real-time access to stock prices through GPRS. In this article, we have focused mainly on the optimization of the ASR subsystem to minimize the use of the handheld's computational resources. This optimization has been accomplished through a fixed-point implementation of all the algorithms involved in the ASR subsystem and the use of PCA to reduce the feature vector dimensionality. The influence of several parameters, such as the Qn resolution of the fixed-point implementation and the number of PCA components retained, has been studied and evaluated in the ASR subsystem, obtaining word recognition rates of around 96% for the best configuration. Finally, a field evaluation of the system has been performed, showing that our design of the speech-centric interface achieved good results in a real-life scenario.
This work was supported in part by the Spanish Government grants TSI-020110-2009-103, IPT-120000-2010-24, and TEC2011-26807 and the Spanish Regional grant CCG08-UC3M/TIC-4457.
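The two optimizations described above, Qn fixed-point representation and PCA reduction of the feature vectors, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names, the Q8 resolution, and the toy 13-dimensional feature vectors (stand-ins for MFCC frames) are assumptions chosen for the example.

```python
import numpy as np

def to_qn(x, n_frac, n_bits=16):
    """Quantize floats to Qn fixed point (n_frac fractional bits),
    stored as integers saturated to the signed n_bits range."""
    scale = 1 << n_frac
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def from_qn(q, n_frac):
    """Convert Qn integers back to floats (for checking quantization error)."""
    return q.astype(np.float64) / (1 << n_frac)

def pca_reduce(features, k):
    """Project feature vectors (rows) onto their top-k principal components."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order; sort descending by variance.
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return centered @ vecs[:, order[:k]]

# Toy 13-dimensional "feature vectors" standing in for MFCC frames.
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 13))

reduced = pca_reduce(feats, k=8)        # 13 components -> 8
quantized = to_qn(reduced, n_frac=8)    # Q8 fixed point, 16-bit range
roundtrip = from_qn(quantized, n_frac=8)
print(reduced.shape, quantized.dtype)
```

Raising `n_frac` shrinks the quantization error (at most half a least-significant bit, here 1/512) but narrows the representable range, which is exactly the resolution/overflow trade-off the abstract refers to as the Qn resolution.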
Speech Technologies for African Languages: Example of a Multilingual Calculator for Education
This paper presents our achievements after 18 months of the ALFFA project dealing with African language technologies. We focus on a multilingual calculator (an Android app) that will be demonstrated during the Show and Tell session.