MIREX 2005: Combined Fluctuation Features for Music Genre Classification. Extended Abstract, MIREX 2005 genre classification contest (www.music-ir.org/evaluation/mirex-results).

Abstract

We submitted a system that uses combinations of three feature sets (Rhythm Patterns, Statistical Spectrum Descriptor and Rhythm Histogram) to the MIREX 2005 audio genre classification task. All feature sets are based on fluctuations of modulation amplitudes in psychoacoustically transformed spectrum data. For classification we applied Support Vector Machines. Our best approach achieved 75.27 % combined overall classification accuracy, which corresponds to rank 5.

1 IMPLEMENTATION

1.1 Feature Extraction

We extract three feature sets from audio data, using algorithms implemented in MATLAB. The algorithms process audio tracks in standard digital PCM format with 44.1 kHz or 22.05 kHz sampling frequency. Audio compressed with e.g. the MP3 format is decoded by an external program in a pre-processing step, and audio with multiple channels is merged to mono. Prior to feature extraction, each audio track is segmented into pieces of 6 seconds length. The first and the last segment are skipped in order to exclude lead-in and fade-out effects. In the MIREX setting, only every third segment is processed. For each feature set, the characteristics of an entire piece of music are computed by averaging the feature vectors of its segments (using the median or the mean); a sketch of this scheme is given after Section 1.1.1. For a more detailed description of the feature sets and the combination approach see (Lidy and Rauber, 2005).

1.1.1 Rhythm Patterns

A short-time Fast Fourier Transform (STFT) using a Hanning window function (23 ms windows with 50 % overlap) is applied to retrieve the spectrum data from the audio. The frequency bands of the spectrogram are summed up into 24 so-called critical bands according to the Bark scale (Zwicker and Fastl, 1999), with narrow bands in low frequency regions and broader bands in high frequency regions, matching the human auditory system. Successively, the data is transformed into the logarithmic decibel scale, then into the Phon scale by applying the psychoacoustically motivated equal-loudness curves (Zwicker and Fastl, 1999), and afterwards into the unit Sone, reflecting specific loudness sensation. A sketch of this processing chain is given below.
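The following is a minimal sketch of the segmentation and aggregation scheme described in Section 1.1, written in Python/NumPy rather than the authors' MATLAB implementation. The function extract_features is a hypothetical placeholder standing in for any of the three feature extractors, and the parameter names are assumptions for illustration.

    import numpy as np

    def extract_features(segment, sr):
        # Hypothetical placeholder for any of the three extractors
        # (Rhythm Patterns, Statistical Spectrum Descriptor, Rhythm Histogram).
        return np.array([segment.mean(), segment.std()])

    def track_features(signal, sr=44100, seg_seconds=6, step=3, use_median=True):
        # Cut the mono signal into non-overlapping 6-second segments.
        seg_len = seg_seconds * sr
        n_segments = len(signal) // seg_len
        vectors = []
        # Skip the first and the last segment (lead-in / fade-out); in the
        # MIREX setting only every third remaining segment is processed.
        for i in range(1, n_segments - 1, step):
            piece = signal[i * seg_len:(i + 1) * seg_len]
            vectors.append(extract_features(piece, sr))
        # Aggregate the per-segment vectors into one track-level vector.
        vectors = np.array(vectors)
        return np.median(vectors, axis=0) if use_median else vectors.mean(axis=0)

Aggregating with the median rather than the mean makes the track-level vector more robust against atypical segments.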

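The spectral processing chain of Section 1.1.1 might be sketched as follows, again in Python/NumPy rather than the authors' MATLAB code. The Bark band limits are the standard values from Zwicker and Fastl (1999). The Phon step is deliberately simplified here: the decibel values are taken directly as loudness levels, whereas the actual system applies the equal-loudness contours; the final step uses the common Phon-to-Sone conversion S = 2^((P-40)/10) for P >= 40 and S = (P/40)^2.642 below.

    import numpy as np
    from scipy.signal import stft

    # Upper limits (Hz) of the 24 critical bands of the Bark scale.
    BARK_EDGES = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                  1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400,
                  7700, 9500, 12000, 15500]

    def sone_spectrogram(segment, sr=44100):
        # STFT with ~23 ms Hanning windows and 50 % overlap
        # (1024 samples at 44.1 kHz).
        nperseg = 1024
        f, t, Z = stft(segment, fs=sr, window='hann',
                       nperseg=nperseg, noverlap=nperseg // 2)
        power = np.abs(Z) ** 2
        # Sum the FFT bins into the 24 Bark-scale critical bands.
        band = np.searchsorted(BARK_EDGES, f)
        bark = np.array([power[band == b].sum(axis=0) for b in range(24)])
        # Decibel scale, clamped at 0 dB so the Sone formula stays defined.
        db = np.maximum(10 * np.log10(np.maximum(bark, 1e-12)), 0)
        # Simplification: treat dB values as loudness level in Phon; the
        # real system applies equal-loudness contours here.
        phon = db
        # Specific loudness sensation in Sone.
        return np.where(phon >= 40, 2 ** ((phon - 40) / 10),
                        (phon / 40) ** 2.642)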