
Multiple Time Resolution Analysis of Speech Signal Using MCE Training with Application to Speech Recognition

By Spiros Dimopoulos, Alexandros Potamianos, Eric Fosler-Lussier and Chin-Hui Lee

Abstract

In this paper, we propose two methods for multiple time-resolution analysis of speech and apply them to Automatic Speech Recognition (ASR). First, a constant frame-rate multi-scale analysis is proposed, based on a “box” of multi-scale features. Second, a variable-rate analysis is proposed, in which the optimal temporal resolution is selected on the fly by a properly trained non-linear classifier unit. The classifier's parameters are trained with the discriminative Minimum Classification Error (MCE) method. We use the recently proposed Conditional Random Fields (CRF) phonetic recognition system, which effectively combines highly correlated features. Results are reported on a frame-wise classification task and on the TIMIT phone recognition task. They show that (i) CRFs can effectively combine multi-scale features and (ii) MCE-trained variable-rate CRFs are competitive with the “box” combination method.
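The abstract says that the per-frame temporal resolution is chosen by a classifier trained with MCE, but it does not spell out the training procedure. The following is a minimal sketch of the generic MCE recipe (misclassification measure, sigmoid smoothing, per-sample gradient descent) applied to such a resolution selector. It is not the authors' implementation, and several details are assumptions not taken from the paper: a linear discriminant stands in for the paper's non-linear classifier unit, each training frame is assumed to carry a target resolution label, and the helper names (`discriminants`, `mce_loss_and_grad`, `train_selector`, `select_resolution`) and hyperparameters (`eta`, `gamma`, `lr`, `epochs`) are hypothetical.

```python
import math

# Hypothetical setup (assumption): each frame is described by a small feature
# vector x (here a bias term plus two made-up measurements), and K candidate
# temporal resolutions compete for that frame. A linear discriminant
# g_j(x) = w_j . x stands in for the paper's non-linear classifier unit.


def discriminants(W, x):
    """Return g_j(x) for every candidate resolution j."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    ev = math.exp(v)
    return ev / (1.0 + ev)


def mce_loss_and_grad(W, x, k, eta=2.0, gamma=1.0):
    """Smoothed MCE loss for target resolution k and its gradient w.r.t. W."""
    g = discriminants(W, x)
    K = len(W)
    rivals = [j for j in range(K) if j != k]
    # Misclassification measure:
    #   d_k = -g_k + (1/eta) * log( mean_{j != k} exp(eta * g_j) )
    # computed with a max shift so the exponentials cannot overflow.
    m = max(eta * g[j] for j in rivals)
    e = {j: math.exp(eta * g[j] - m) for j in rivals}
    z = sum(e.values())
    d = -g[k] + (m + math.log(z / (K - 1))) / eta
    # The sigmoid turns d into a smooth approximation of the 0/1 error.
    loss = sigmoid(gamma * d)
    dl_dd = gamma * loss * (1.0 - loss)
    grad = [[0.0] * len(x) for _ in range(K)]
    for i, xi in enumerate(x):
        grad[k][i] = -dl_dd * xi  # dd/dg_k = -1
        for j in rivals:
            grad[j][i] = dl_dd * (e[j] / z) * xi  # dd/dg_j = softmax weight of rival j
    return loss, grad


def train_selector(data, K, dim, epochs=200, lr=0.5):
    """Per-sample gradient descent on the MCE loss, starting from zero weights."""
    W = [[0.0] * dim for _ in range(K)]
    for _ in range(epochs):
        for x, k in data:
            _, grad = mce_loss_and_grad(W, x, k)
            for j in range(K):
                for i in range(dim):
                    W[j][i] -= lr * grad[j][i]
    return W


def select_resolution(W, x):
    """Pick the resolution index with the largest discriminant for this frame."""
    g = discriminants(W, x)
    return max(range(len(g)), key=g.__getitem__)


if __name__ == "__main__":
    # Toy, made-up frames labelled with a target resolution index 0, 1 or 2.
    toy = [([1.0, 2.0, 0.0], 0), ([1.0, 0.0, 2.0], 1), ([1.0, -2.0, -2.0], 2)]
    W = train_selector(toy, K=3, dim=3)
    print([select_resolution(W, x) for x, _ in toy])
```

The max shift inside the log-sum-exp keeps the competitor term stable, and `gamma` sets how sharply the sigmoid approximates a hard error count; a larger `eta` makes the competitor term approach the single best rival score.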

Topics (index terms): ASR, MCE, Conditional Random Fields, Variable Frame Rate, Multiple Frame Rates
Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.352.3502
Provided by: CiteSeerX
Full text is not hosted here; it may be found at the following location:
  • http://citeseerx.ist.psu.edu/v...