In this paper, we propose a new class of audio features derived from the well-known mel-frequency cepstral coefficients (MFCCs), which are widely used in speech processing. More precisely, we compute suitable short-time statistics during the MFCC computation to obtain smoothed features whose temporal resolution can be adjusted to the application. The approach is motivated by the task of audio segmentation, where classical MFCCs, owing to their fine temporal resolution, may fluctuate strongly and consequently yield an unstable segmentation. As our main contribution, the proposed MFCC-ENS (MFCC-Energy Normalized Statistics) features can be adapted to a lower, more suitable temporal resolution while summarizing the essential information contained in the MFCCs. Our experiments on the segmentation of radio programmes demonstrate the benefits of the proposed features.