
    Decision Manifolds: Classification Inspired by Self-Organization

    We present a classifier algorithm that approximates the decision surface of labeled data by a patchwork of separating hyperplanes. The hyperplanes are arranged in a way inspired by how Self-Organizing Maps are trained. We take advantage of the fact that the boundaries can often be approximated by linear ones connected by a low-dimensional nonlinear manifold. The resulting classifier allows for a voting scheme that averages over the classification results of neighboring hyperplanes. Our algorithm is computationally efficient both in terms of training and classification. Further, we present a model selection framework for estimation of the parameters of the classification boundary, and show results for artificial and real-world data sets.
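The voting idea described in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the authors' implementation: each local hyperplane is represented by a center point, a normal vector, and a bias, and a query point is classified by averaging the signed outputs of the k hyperplanes whose centers lie nearest to it.

```python
import numpy as np

def vote_classify(x, centers, normals, biases, k=3):
    """Hypothetical sketch: classify x by averaging the signed outputs
    of the k locally fitted hyperplanes whose centers are nearest to x."""
    d = np.linalg.norm(centers - x, axis=1)       # distance to each hyperplane's center
    idx = np.argsort(d)[:k]                       # k nearest local models
    votes = np.sign(normals[idx] @ x + biases[idx])
    return 1 if votes.sum() >= 0 else -1          # averaged (majority) vote
```

Averaging over neighboring hyperplanes smooths the patchwork boundary, so a single badly fitted local model is outvoted by its neighbors.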

    A cartesian ensemble of feature subspace classifiers for music categorization

    We present a Cartesian ensemble classification system that is based on the principle of late fusion and feature subspaces. These feature subspaces describe different aspects of the same data set. The framework is built on the Weka machine learning toolkit and is able to combine arbitrary feature sets and learning schemes. In our scenario, we use it for the ensemble classification of multiple feature sets from the audio and symbolic domains. We present an extensive set of experiments in the context of music genre classification, based on numerous Music IR benchmark datasets, and evaluate a set of combination/voting rules. The results show that the approach is superior to the best choice of a single algorithm on a single feature set. Moreover, it also releases the user from making this choice explicitly.

    International Society for Music Information Retrieval
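The late-fusion scheme described above can be sketched in a few lines. This is a hypothetical illustration of the general idea, not the paper's Weka-based framework: a base model is trained for every (feature set, learning scheme) pair, and their label predictions are fused by majority vote.

```python
from collections import Counter
from itertools import product

def cartesian_ensemble_predict(feature_sets, classifiers, x_by_featureset):
    """Hypothetical sketch: late fusion over the Cartesian product of
    feature sets and learning schemes via a simple majority vote."""
    votes = []
    for fs, clf in product(feature_sets, classifiers):
        votes.append(clf(x_by_featureset[fs]))    # each base model votes a label
    return Counter(votes).most_common(1)[0][0]    # majority label wins
```

Because fusion happens at the label level, any feature representation and any learner can be plugged in without changing the combination step.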

    Music Information Technology and Professional Stakeholder Audiences: Mind the Adoption Gap

    The academic discipline focusing on the processing and organization of digital music information, commonly known as Music Information Retrieval (MIR), has multidisciplinary roots and interests. Thus, MIR technologies have the potential to have impact across disciplinary boundaries and to enhance the handling of music information in many different user communities. However, in practice, many MIR research agenda items appear to have a hard time leaving the lab to be widely adopted by their intended audiences. On the one hand, this is because the MIR field is still relatively young, and its technologies therefore need to mature. On the other hand, there may be deeper, more fundamental challenges with regard to the user audience. In this contribution, we discuss MIR technology adoption issues that were experienced with professional music stakeholders in audio mixing, performance, musicology and the sales industry. Many of these stakeholders have mindsets and priorities that differ considerably from those of most MIR academics, influencing their reception of new MIR technology. We describe the major observed differences and their backgrounds, and argue that taking these into account is essential for truly successful cross-disciplinary collaboration and technology adoption in MIR.

    2005, ‘MIREX 2005: Combined Fluctuation Features for Music Genre Classification’. Extended Abstract. MIREX genre classification contest (www.music-ir.org/evaluation/mirex-results)

    CLASSIFICATION
    We submitted a system that uses combinations of three feature sets (Rhythm Patterns, Statistical Spectrum Descriptor and Rhythm Histogram) to the MIREX 2005 audio genre classification task. All feature sets are based on fluctuations of modulation amplitudes in psychoacoustically transformed spectrum data. For classification we applied Support Vector Machines. Our best approach achieved 75.27 % combined overall classification accuracy, which is rank 5.

    1 IMPLEMENTATION
    1.1 Feature Extraction
    We extract three feature sets from audio data, using algorithms implemented in MATLAB. The algorithms process audio tracks in standard digital PCM format with 44.1 kHz or 22.05 kHz sampling frequency. Audio compressed with, e.g., the MP3 format is decoded by an external program in a pre-processing step. Audio with multiple channels is merged to mono. Prior to feature extraction, each audio track is segmented into pieces of 6 seconds length. The first and the last segment are skipped in order to exclude lead-in and fade-out effects. In the MIREX setting, only every third segment is processed. For each feature set, the characteristics of an entire piece of music are computed by averaging the feature vectors from the segments (using median or mean). For a more detailed description of the feature sets and the combination approach see (Lidy and Rauber, 2005).

    1.1.1 Rhythm Patterns
    A short-time Fast Fourier Transform (STFT) using a Hann window function (23 ms windows with 50 % overlap) is applied to retrieve the spectrum data from the audio. The frequency bands of the spectrogram are summed up to 24 so-called critical bands according to the Bark scale (Zwicker and Fastl, 1999), with narrow bands in low frequency regions and broader bands in high frequency regions, following the human auditory system. Subsequently, the data is transformed into the logarithmic decibel scale, then into the Phon scale by applying the psychoacoustically motivated equal-loudness curves (Zwicker and Fastl, 1999), and finally into the unit Sone, reflecting the specific loudness sensation.
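The front end of this pipeline can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' MATLAB code: the Bark band edges are the standard published values, and the dB-to-Phon step is left as a placeholder identity (the real pipeline applies frequency-dependent equal-loudness curves), while the Phon-to-Sone step uses the common 2^((P-40)/10) relation above 40 phon.

```python
import numpy as np

# Standard Bark critical-band edges in Hz (Zwicker scale), giving 24 bands.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                 6400, 7700, 9500, 12000, 15500]

def bark_spectrogram(signal, sr=22050):
    """Hypothetical sketch: STFT (Hann, 23 ms, 50% overlap) -> 24 Bark
    bands -> dB -> (placeholder) Phon -> Sone."""
    win = int(0.023 * sr)                          # 23 ms Hann window
    hop = win // 2                                 # 50% overlap
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        seg = signal[start:start + win] * np.hanning(win)
        frames.append(np.abs(np.fft.rfft(seg)) ** 2)
    spec = np.array(frames).T                      # (freq_bins, time)
    freqs = np.fft.rfftfreq(win, 1 / sr)
    bands = []
    for lo, hi in zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(spec[mask].sum(axis=0))       # sum bins per critical band
    bark = np.array(bands)                         # (24, time)
    db = 10 * np.log10(np.maximum(bark, 1e-12))    # logarithmic decibel scale
    phon = db      # placeholder: the real pipeline applies equal-loudness curves here
    sone = np.where(phon >= 40,
                    2 ** ((phon - 40) / 10),       # doubling per 10 phon above 40
                    (np.maximum(phon, 0) / 40) ** 2.642)
    return sone
```

The Rhythm Patterns features would then be derived from the modulation amplitudes of these Sone-scaled band envelopes over time.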