Unsupervised Incremental Online Learning and Prediction of Musical Audio Signals

Abstract

Guided by the idea that musical human-computer interaction may become more effective, intuitive, and creative when its computer part is based on cognitively more plausible learning principles, we employ unsupervised incremental online learning (i.e. clustering) to build a system that predicts the next event in a musical sequence, given as audio input. The flow of the system is as follows: 1) segmentation by onset detection, 2) timbre representation of each segment by Mel-frequency cepstral coefficients, 3) discretization by incremental clustering, yielding a tree of different sound classes (e.g. timbre categories/instruments) that can grow or shrink on the fly, driven by the instantaneous sound events, and resulting in a discrete symbol sequence, 4) extraction of statistical regularities from the symbol sequence, using hierarchical N-grams and the newly introduced conceptual Boltzmann machine, both of which adapt to the dynamically changing clustering tree of step 3), and 5) prediction of the next sound event in the sequence, given the n previous events. The system's robustness is assessed with respect to the complexity and noisiness of the signal. Clustering in isolation yields an adjusted Rand index (ARI) of 82.7%/85.7% for data sets of singing voice and drums. Onset detection jointly with clustering achieves an ARI of 81.3%/76.3%, and the prediction of the entire system yields an ARI of 27.2%/39.2%.
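As a rough illustration of steps 1)-5), the following Python sketch segments an audio file at detected onsets, summarizes each segment by its mean MFCC vector, maps segments to discrete symbols with a simple online clusterer, and predicts the next symbol from n-gram statistics. It assumes the librosa library for audio I/O and features; the distance-threshold clusterer and the bigram predictor are deliberately simplified stand-ins for the paper's growing/shrinking clustering tree and its hierarchical N-grams / conceptual Boltzmann machine, and names such as `DIST_THRESHOLD` and `assign` are illustrative, not taken from the paper.

```python
# Minimal pipeline sketch, not the authors' implementation.
from collections import Counter, defaultdict

import librosa
import numpy as np

DIST_THRESHOLD = 60.0  # assumed tuning parameter, not from the paper


def segment_mfcc(path):
    """Steps 1-2: onset-based segmentation, then one MFCC vector per segment."""
    y, sr = librosa.load(path, sr=None, mono=True)
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples")
    bounds = list(onsets) + [len(y)]  # segments run from onset to onset
    feats = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end - start < 2048:  # skip segments shorter than one analysis frame
            continue
        mfcc = librosa.feature.mfcc(y=y[start:end], sr=sr, n_mfcc=13)
        feats.append(mfcc.mean(axis=1))  # one summary vector per segment
    return feats


def assign(feats):
    """Step 3 (simplified): online clustering -> discrete symbol sequence.

    A new cluster is spawned whenever a segment lies farther than
    DIST_THRESHOLD from every existing centroid; otherwise the nearest
    centroid absorbs it via a running-mean update.
    """
    centroids, counts, symbols = [], [], []
    for f in feats:
        if centroids:
            d = [np.linalg.norm(f - c) for c in centroids]
            k = int(np.argmin(d))
        if not centroids or d[k] > DIST_THRESHOLD:
            centroids.append(f.copy())
            counts.append(1)
            k = len(centroids) - 1
        else:
            counts[k] += 1
            centroids[k] += (f - centroids[k]) / counts[k]
        symbols.append(k)
    return symbols


def predict_next(symbols, n=2):
    """Steps 4-5 (simplified): n-gram counts over the symbol sequence,
    then prediction of the next symbol from the last n-1 symbols."""
    model = defaultdict(Counter)
    for i in range(len(symbols) - n + 1):
        ctx = tuple(symbols[i : i + n - 1])
        model[ctx][symbols[i + n - 1]] += 1
    ctx = tuple(symbols[-(n - 1):])
    return model[ctx].most_common(1)[0][0] if model[ctx] else None


if __name__ == "__main__":
    feats = segment_mfcc("drums.wav")  # hypothetical input file
    syms = assign(feats)
    print("symbol sequence:", syms)
    print("predicted next sound class:", predict_next(syms))
```

Unlike the paper's clustering tree, this flat clusterer can only grow; shrinking, the hierarchical context model, and the conceptual Boltzmann machine are beyond the scope of this sketch.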
