
A CONSTRUCTION OF COMPACT MFCC-TYPE FEATURES USING SHORT-TIME STATISTICS FOR APPLICATIONS IN AUDIO SEGMENTATION

By Dirk Von Zeddelmann and Frank Kurth

Abstract

In this paper, we propose a new class of audio features derived from the well-known mel-frequency cepstral coefficients (MFCCs), which are widely used in speech processing. More precisely, we calculate suitable short-time statistics during the MFCC computation to obtain smoothed features whose temporal resolution can be adjusted to the application. The approach was motivated by the task of audio segmentation, where classical MFCCs, with their fine temporal resolution, may fluctuate strongly and consequently yield an unstable segmentation. As a main contribution, our proposed MFCC-ENS (MFCC-Energy Normalized Statistics) features can be adapted to a lower, more suitable temporal resolution while still summarizing the essential information contained in the MFCCs. Our experiments on the segmentation of radio programmes demonstrate the benefits of the newly proposed features.
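The abstract does not spell out how the short-time statistics are computed, so the following is only a generic sketch of the underlying idea and not the authors' MFCC-ENS algorithm. It assumes the statistic is a Hann-weighted mean over a sliding window, followed by downsampling; the function names `hann_weights` and `smooth_and_downsample` and the parameters `window_length` and `hop` are hypothetical and chosen for illustration.

```python
import math


def hann_weights(length):
    """Hann-shaped weights with non-zero endpoints, normalized to sum to 1."""
    raw = [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 1) / (length + 1))
           for n in range(length)]
    total = sum(raw)
    return [v / total for v in raw]


def smooth_and_downsample(frames, window_length, hop):
    """Replace fine-grained feature frames by windowed means at a coarser rate.

    frames: list of equal-length feature vectors, one per analysis frame
            (for example, one MFCC vector per frame).
    window_length: number of frames averaged per output vector (assumed).
    hop: keep one output vector every `hop` input frames (assumed).
    """
    if window_length < 1 or hop < 1:
        raise ValueError("window_length and hop must be positive")
    weights = hann_weights(window_length)
    half = window_length // 2
    n_frames = len(frames)
    smoothed = []
    for center in range(0, n_frames, hop):
        dim = len(frames[center])
        acc = [0.0] * dim
        weight_sum = 0.0
        for k, w in enumerate(weights):
            idx = center - half + k
            if 0 <= idx < n_frames:  # ignore window parts outside the signal
                weight_sum += w
                for d in range(dim):
                    acc[d] += w * frames[idx][d]
        # Renormalize so that truncated windows at the borders stay unbiased.
        smoothed.append([a / weight_sum for a in acc])
    return smoothed
```

In this sketch, a larger `window_length` suppresses more frame-to-frame fluctuation and a larger `hop` lowers the feature rate, which corresponds to the adjustable temporal resolution the abstract describes. For instance, with an assumed input rate of 100 frames per second, `hop=10` would yield 10 feature vectors per second.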

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.184.6076
Provided by: CiteSeerX