3 research outputs found

    An Automatic Real-time Synchronization of Live speech with Its Transcription Approach

    Most studies in automatic synchronization of speech and transcription focus on synchronization at the sentence or phrase level. However, in some languages, such as Thai, the boundaries of such units are difficult to define linguistically, especially when synchronizing speech with its transcription. Consequently, synchronization at a finer level, such as the syllabic level, is promising. In this article, an approach to synchronizing live speech with its corresponding transcription in real time at the syllabic level is proposed. Our approach employs a modified version of the real-time syllable detection procedure from our previous work; a transcription verification procedure is then adopted to verify correctness and to recover from errors made by the real-time syllable detection procedure. In the experiments, the acoustic features and parameters were tuned empirically. Results were compared with two baselines that had previously been applied to the Thai scenario. Experimental results indicate that our approach outperforms the two baselines, with error rate reductions of 75.9% and 41.9% respectively, and that it can deliver results in real time. In addition, our approach has been applied in a practical application, namely ChulaDAISY. Practical experiments show that ChulaDAISY, combined with our approach, reduced the time needed to produce audio books.
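    The verification step described above can be illustrated with a minimal sketch. The abstract does not give the actual algorithm, so the function below is a hypothetical simplification: it matches a stream of detected syllable labels against the expected transcription, skipping spurious detections and tolerating missed syllables within a small look-ahead window.

    ```python
    def verify_stream(detected, transcript, window=2):
        """Hypothetical sketch of transcription verification (not the paper's
        actual procedure): map each detected syllable label to a transcript
        index, skipping spurious detections and tolerating up to `window`
        missed syllables."""
        aligned = []  # (detected_index, transcript_index) pairs
        t = 0         # current position in the transcript
        for d, label in enumerate(detected):
            # Look ahead up to `window` transcript syllables for a match.
            for offset in range(window + 1):
                if t + offset < len(transcript) and transcript[t + offset] == label:
                    aligned.append((d, t + offset))
                    t += offset + 1  # advance past the matched syllable
                    break
            # No match within the window: treat this detection as spurious.
        return aligned
    ```

    For example, with detected syllables ["sa", "wat", "xx", "dee", "khrap"] (where "xx" is a false detection) and transcript ["sa", "wat", "dee", "khrap"], the spurious "xx" is dropped and the remaining syllables align in order.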

    How speech/text alignment benefits web-based learning

    This demonstration presents an integrated web-based synchronization scenario for many-to-one cross-media correlations between speech (an EFL, English as a Foreign Language, lecture with free-style lecturing behavior) and the corresponding textual content. The analysis and presentation of these temporal correlations enable vivid web-based language learning through interactive functions: browsing speech via content, word-by-word pointer guidance, synchronized scrolling and highlighting, and a listening-training mode. We regularly analyze and repackage multimedia content from VoA (Voice of America) [1], ICRT (International Community Radio Taipei) [2], and online lectures at our university [3]. Subjective experiments show that this repackaged, synchronized speech/text content does facilitate learning for EFL learners.
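    Word-by-word pointer guidance and synchronized highlighting rest on a per-word time alignment. As a minimal sketch, assuming the alignment is available as a sorted list of (word, start, end) timestamps (the demo's actual data format is not given), the word to highlight at a given playback time can be found by binary search:

    ```python
    import bisect

    def word_at(alignment, time_s):
        """alignment: list of (word, start_s, end_s) tuples sorted by start_s.
        Return the index of the word being spoken at time_s, or None when
        time_s falls in a pause between words."""
        starts = [start for _, start, _ in alignment]
        i = bisect.bisect_right(starts, time_s) - 1
        if i >= 0 and alignment[i][1] <= time_s < alignment[i][2]:
            return i
        return None
    ```

    A player would call this on each timing tick and move the pointer or highlight to the returned word; returning None lets the UI leave the highlight unchanged during pauses.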