Search CORE

Reconfigurable Computing for Speech Recognition: Preliminary Findings

Authors: James-Roxby P B; Melnikoff Stephen Jonathan; Quigley Steven Francis; Russell Martin
Publication venue: Springer Verlag
Publication date: 01/01/2000

Continuous real-time speech recognition is a highly computationally demanding task, but one that can take good advantage of a parallel processing system. To this end, we describe proposals for, and preliminary findings of, research into implementing the decoder part of a speech recognition system in programmable logic. Recognition via Viterbi decoding of Hidden Markov Models is outlined, along with details of current implementations, which aim to exploit properties of the algorithm that could make it well suited to devices such as FPGAs. The question of how to deal with limited resources, by reconfiguration or otherwise, is also addressed.
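
The Viterbi recursion at the heart of such a decoder can be illustrated with a minimal software sketch. It assumes a single HMM with a handful of states, scores kept in the log domain, and observation log-likelihoods already computed for every frame. The function name `viterbi` and the toy two-state model are hypothetical illustrations, not the authors' FPGA design; the inner loop over states simply shows that each state's update within a frame is independent of the others, which is the kind of property a parallel hardware implementation could exploit.

```python
import math
from typing import List, Sequence, Tuple


def viterbi(obs_loglik: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> Tuple[List[int], float]:
    """Hypothetical sketch: most likely HMM state sequence and its log score.

    obs_loglik[t][j]: log-likelihood of frame t under state j (assumed precomputed)
    log_trans[i][j]:  log P(state j at frame t+1 | state i at frame t)
    log_init[j]:      log P(state j at frame 0)
    """
    n_states = len(log_init)
    # delta[j]: best log score of any path that ends in state j at the current frame
    delta = [log_init[j] + obs_loglik[0][j] for j in range(n_states)]
    backptr: List[List[int]] = []
    for t in range(1, len(obs_loglik)):
        new_delta: List[float] = []
        ptrs: List[int] = []
        for j in range(n_states):
            # Each state j is updated independently of the other states in this
            # frame, so these updates could run in parallel.
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + obs_loglik[t][j])
            ptrs.append(best_i)
        delta = new_delta
        backptr.append(ptrs)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy left-to-right model (hypothetical numbers): start in state 0, state 1 absorbing.
NEG_INF = float("-inf")
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.6), math.log(0.4)],
             [NEG_INF, 0.0]]
obs = [[math.log(0.9), math.log(0.1)],
       [math.log(0.5), math.log(0.5)],
       [math.log(0.1), math.log(0.9)]]
path, score = viterbi(obs, log_trans, log_init)
```

Working in the log domain turns the products of probabilities into sums, which avoids numerical underflow on long utterances and replaces multipliers with adders, a choice that also tends to suit hardware.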

Arousal and Valence Prediction in Spontaneous Emotional Speech: Felt versus Perceived Emotion

Authors: Jong Franciska M.G. de; Leeuwen David A. van; Neerincx Mark A.; Truong Khiet P.
Publication venue: International Speech Communication Association
Publication date: 01/01/2009

In this paper, we describe emotion recognition experiments carried out on spontaneous affective speech, with the aim of comparing the added value of annotations of felt emotion versus annotations of perceived emotion. Using speech material from the TNO-GAMING corpus (a corpus of audiovisual recordings of people playing video games), we developed speech-based affect recognizers that predict scalar Arousal and Valence values. Two types of recognizer were developed in parallel: one trained on felt-emotion annotations (produced by the gamers themselves) and one trained on perceived (observed) emotion annotations (produced by a group of observers). The experiments showed that, with the methods and features currently used, observed emotions are easier to predict from speech than felt emotions. The results suggest that recognition performance depends strongly on how, and by whom, the emotion annotations are made.

CiteSeerX

University of Twente Research Information

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities

Authors: Miyoshi Hiroyuki; Saito Yuki; Saruwatari Hiroshi; Takamichi Shinnosuke
Publication date: 06/08/2017

Voice conversion (VC) using sequence-to-sequence learning of context posterior probabilities is proposed. Conventional VC using shared context posterior probabilities predicts target speech parameters from the context posterior probabilities estimated from the source speech parameters. Although conventional VC can be built from non-parallel data, it has difficulty converting speaker individuality, such as phonetic properties and speaking rate, contained in the posterior probabilities, because the source posterior probabilities are used directly to predict the target speech parameters. In this work, we assume that the training data partly include parallel speech data and propose sequence-to-sequence learning between the source and target posterior probabilities. The conversion models perform a non-linear, variable-length transformation from the source probability sequence to the target one. Further, we propose a joint training algorithm for the modules. In contrast to conventional VC, which trains separately the speech recognition module that estimates posterior probabilities and the speech synthesis module that predicts target speech parameters, our proposed method trains these modules jointly along with the proposed probability conversion modules. Experimental results demonstrate that our approach outperforms conventional VC.
Comment: Accepted to INTERSPEECH 201

arXiv.org e-Print Archive

A distributed platform for speech recognition research

Authors: Islam Shynggys; Kozhirbayev Zhanibek
Publication venue: National Laboratory Astana
Publication date: 17/06/2016

Distributed and parallel processing of big data has been applied in a wide range of applications in recent years. Moreover, parallel processing systems have advanced considerably in usability, cost efficiency, and variety, while big data analysis and speech recognition research have both attracted many researchers. In this paper, we investigate which parts of speech recognition research can be parallelized and computed on distributed computing platforms. First, we address the efficient computation of n-gram statistics on MapReduce platforms to build a language model (LM). Second, we show how an Automated Speech Recognition (ASR) tool can run efficiently, in terms of speed and fault tolerance, in a distributed environment such as Sun Grid Engine (SGE).
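
The n-gram counting step can be pictured as a single MapReduce round, simulated here in plain Python. The map phase emits an (n-gram, 1) pair for every n-gram in a line, a shuffle groups the pairs by key, and the reduce phase sums the counts. This is a hypothetical sketch, not the authors' implementation or any framework's API: the function names and the `<s>`/`</s>` sentence-boundary tokens are assumptions, and on a real cluster the shuffle would be carried out by the framework (for example Hadoop) rather than by an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

NGram = Tuple[str, ...]


def map_ngrams(line: str, n: int) -> Iterator[Tuple[NGram, int]]:
    """Map phase (hypothetical): emit (n-gram, 1) for every n-gram in one line."""
    # Assumed convention: pad each sentence with boundary tokens.
    tokens = ["<s>"] + line.split() + ["</s>"]
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n]), 1


def shuffle(pairs: Iterable[Tuple[NGram, int]]) -> Dict[NGram, List[int]]:
    """Shuffle phase: group intermediate values by key (the framework's job on a cluster)."""
    groups: Dict[NGram, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: NGram, values: List[int]) -> Tuple[NGram, int]:
    """Reduce phase: sum the partial counts for one n-gram."""
    return key, sum(values)


def count_ngrams(corpus: Iterable[str], n: int) -> Dict[NGram, int]:
    """Run map, shuffle and reduce over a corpus given as one sentence per line."""
    mapped = chain.from_iterable(map_ngrams(line, n) for line in corpus)
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


corpus = ["the cat sat", "the cat ran"]
bigrams = count_ngrams(corpus, 2)
```

Because every line is mapped independently and every n-gram key is reduced independently, both phases can be spread across workers; the resulting counts are the raw statistics from which LM probabilities would subsequently be estimated.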