Information-Theoretic Measures of Predictability for Music Content Analysis.
PhD
This thesis is concerned with determining similarity in musical audio, for the purpose of applications
in music content analysis. With the aim of determining similarity, we consider the
problem of representing temporal structure in music. To represent temporal structure, we propose
to compute information-theoretic measures of predictability in sequences. We apply our
measures to track-wise representations obtained from musical audio; thereafter we consider the
obtained measures as predictors of musical similarity. We demonstrate that our approach benefits
music content analysis tasks based on musical similarity.
For the intermediate-specificity task of cover song identification, we compare contrasting
discrete-valued and continuous-valued measures of pairwise predictability between sequences.
In the discrete case, we devise a method for computing the normalised compression distance
(NCD) which accounts for correlation between sequences. We observe that our measure improves
average performance over NCD, for sequential compression algorithms. In the continuous
case, we propose to compute information-based measures as statistics of the prediction error
between sequences. Evaluated on 300 Jazz standards and on the Million Song Dataset,
we observe that continuous-valued approaches outperform discrete-valued approaches. Further,
we demonstrate that continuous-valued measures of predictability may be combined to improve
performance with respect to baseline approaches. Using a filter-and-refine approach, we demonstrate
state-of-the-art performance using the Million Song Dataset.
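The abstract does not reproduce the NCD formula. As a minimal sketch of the standard definition (Cilibrasi and Vitányi), with zlib standing in for one of the sequential compressors mentioned above, the distance between two sequences can be computed as:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance with zlib standing in for the compressor C:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 suggest similar sequences; values near 1, dissimilar ones."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"ABCDABCDABCDABCD" * 8       # a repetitive "track" representation
b_near = b"ABCDABCDABCEABCD" * 8  # a near-copy (one symbol changed)
c_other = bytes(range(256))       # unrelated, hard-to-compress content

print(ncd(a, b_near) < ncd(a, c_other))  # the near-copy is the closer sequence
```

Note that plain NCD treats the concatenation `x + y` with a generic compressor; the correlation-aware variant proposed in the thesis modifies this step.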
For the low-specificity tasks of similarity rating prediction and song year prediction, we propose
descriptors based on computing track-wise compression rates of quantised audio features,
using multiple temporal resolutions and quantisation granularities. We evaluate our descriptors
using a dataset of 15 500 track excerpts of Western popular music, for which we have 7 800
web-sourced pairwise similarity ratings. Combined with bag-of-features descriptors, we obtain
performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction.
For both tasks, analysis of selected descriptors reveals that representing features at multiple time
scales benefits prediction accuracy. This work was supported by a UK EPSRC DTA studentship.
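The descriptors are only summarised above; the exact features, quantisers and settings are not given here. A hypothetical sketch of the idea, computing zlib compression rates of one quantised feature track over assumed temporal resolutions and quantisation granularities, might look like:

```python
import math
import zlib

def compression_rate(symbols):
    """Compressed size over raw size of a byte-coded symbol sequence
    (lower = more predictable)."""
    raw = bytes(symbols)
    return len(zlib.compress(raw, 9)) / len(raw)

def multiscale_descriptor(track, resolutions=(1, 2, 4), levels=(4, 16)):
    """Track-wise descriptor: compression rates of a quantised feature track
    over several quantisation granularities and temporal resolutions.
    The settings here are illustrative assumptions, not the thesis's configuration."""
    lo, hi = min(track), max(track)
    span = (hi - lo) or 1.0
    desc = []
    for k in levels:                     # quantisation granularity
        q = [min(int((v - lo) / span * k), k - 1) for v in track]
        for r in resolutions:            # temporal resolution (downsampling factor)
            desc.append(compression_rate(q[::r]))
    return desc

feature = [math.sin(0.1 * t) for t in range(1000)]  # a smooth, predictable track
print(multiscale_descriptor(feature))               # six rates, one per setting
```

A slowly varying track yields low rates at coarse quantisation, while noisier tracks push the rates towards 1; the vector of rates is then used as a feature for the downstream regression task.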
Making music through real-time voice timbre analysis: machine learning and timbral control
PhD
People can achieve rich musical expression through vocal sound: see, for example,
human beatboxing, which achieves a wide timbral variety through a range of
extended techniques. Yet the vocal modality is under-exploited as a controller
for music systems. If we can analyse a vocal performance suitably in real time,
then this information could be used to create voice-based interfaces with the
potential for intuitive and fulfilling levels of expressive control.
Conversely, many modern techniques for music synthesis do not imply any
particular interface. Should a given parameter be controlled via a MIDI keyboard,
or a slider/fader, or a rotary dial? Automatic vocal analysis could provide
a fruitful basis for expressive interfaces to such electronic musical instruments.
The principal questions in applying vocal-based control are how to extract
musically meaningful information from the voice signal in real time, and how
to convert that information suitably into control data. In this thesis we address
these questions, with a focus on timbral control, and in particular we
develop approaches that can be used with a wide variety of musical instruments
by applying machine learning techniques to automatically derive the mappings
between expressive audio input and control output. The vocal audio signal is
construed to include a broad range of expression, in particular encompassing
the extended techniques used in human beatboxing.
The central contribution of this work is the application of supervised and
unsupervised machine learning techniques to automatically map vocal timbre
to synthesiser timbre and controls. Component contributions include a delayed
decision-making strategy for low-latency sound classification, a regression-tree
method to learn associations between regions of two unlabelled datasets, a fast
estimator of multidimensional differential entropy, and a qualitative method for
evaluating musical interfaces based on discourse analysis.
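As a toy illustration of the supervised timbre-to-control mapping, and not the thesis's actual regression-tree method, the following sketch maps an assumed two-dimensional timbre feature vector to synthesiser controls by averaging over the nearest of some invented training pairs:

```python
import math

# Invented training pairs: timbre features -> synth controls, both normalised to [0, 1].
# The features might stand for e.g. spectral centroid and noisiness; values are made up.
TRAINING = [
    ((0.1, 0.2), (0.0, 0.9)),  # dull, tonal voice -> low cutoff, high resonance
    ((0.8, 0.3), (0.7, 0.4)),  # bright, tonal voice
    ((0.5, 0.9), (0.5, 0.1)),  # noisy, beatbox-like input
]

def map_timbre(features, k=2):
    """Map a timbre feature vector to synth controls by averaging the controls
    of the k nearest training examples (a simple supervised mapping)."""
    nearest = sorted(TRAINING, key=lambda pair: math.dist(pair[0], features))[:k]
    n_dims = len(nearest[0][1])
    return tuple(sum(ctrl[d] for _, ctrl in nearest) / k for d in range(n_dims))

controls = map_timbre((0.15, 0.25))  # one frame of incoming vocal features
print(controls)
```

In a real-time system this lookup would run per analysis frame, with the control tuple sent on to the synthesiser; the learned mapping replaces a hand-designed keyboard, fader or dial interface.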
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010
On Musical Self-Similarity : Intersemiosis as Synecdoche and Analogy
Self-similarity, a concept borrowed from mathematics, is gradually becoming a keyword in musicology. Although a polysemic term, self-similarity often refers to the multi-scalar feature repetition in a set of relationships, and it is commonly valued as an indication of musical ‘coherence’ and ‘consistency’. In this study, Gabriel Pareyon presents a theory of musical meaning formation in the context of intersemiosis, that is, the translation of meaning from one cognitive domain to another (e.g. from mathematics to music, or to speech or graphic forms). From this perspective, the degree of coherence of a musical system relies on a synecdochic intersemiosis: a system of related signs within other comparable and correlated systems. The author analyzes the modalities of such correlations, exploring their general and particular traits and their operational bounds. Accordingly, the notion of analogy is used as a rich concept through its two definitions in the Classical literature, proportion and paradigm, both enormously valuable in establishing criteria of measurement, likeness and affinity. At the same time, original arguments by Benoît B. Mandelbrot (1924–2010) are revisited, alongside a systematic critique of the literature on the subject. Indeed, connecting Charles S. Peirce’s ‘synechism’ with Mandelbrot’s ‘fractality’ is one of the main developments of the present study.
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the keenly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years the initial issues have grown and spread into other fields of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.
A Study in Violinist Identification using Short-term Note Features
The perception of musical expression and emotion is greatly influenced by a performer's individual interpretation; modelling a performer's style is therefore important for music understanding, style transfer, music education and characteristic music generation. This thesis proposes approaches for modelling and identifying musical instrumentalists, using violinist identification as a case study. In violin performance, vibrato and timbre play important roles in a player's emotional expression; they are key factors of playing style, and their execution shows great diversity. To validate that these two factors are effective for modelling violinists, we design and extract note-level vibrato and timbre features from isolated concerto music notes, then present a violinist identification method based on the similarity of feature distributions, using single features as well as fused features. The results show that vibrato features are helpful for violinist identification, and that some timbre features perform better than vibrato features. In addition, the accuracy obtained from fused features is higher than from any single feature. However, apart from the performer, timbre is also determined by the musical instrument, recording conditions and other factors. Furthermore, the common scenario for violinist identification is based on short music clips rather than isolated notes. To address these two problems, we further examine the method using note-level timbre features to recognise violinists from segmented solo music clips, then use it to identify master players from concerto fragments. The results show that the designed features and method work well for both types of music. Another experiment is conducted to examine the influence of the instrument on the features. The results suggest that the selected timbre features can model performers' individual playing reasonably and objectively, regardless of the instrument they play.
Expressive timing is another key factor reflecting individual playing style. This thesis develops a novel onset-time-deviation feature, which is used to model and identify master violinists on concerto fragment data. Results show that it performs better than timbre features on this dataset. To generalise the violinist identification method and further improve the results, deep learning methods are proposed and investigated. We present a transfer learning approach for violinist identification from pre-trained music auto-tagging neural networks and singer identification models. We transfer the pre-trained weights, fine-tune the models using violin datasets, and finally obtain violinist identification results. We compare our system with state-of-the-art works and show that our model outperforms them on our two datasets.
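As a hypothetical sketch of identification by feature-distribution similarity, the following compares a clip's note-level feature histogram against per-player histograms using the Bhattacharyya coefficient (one possible similarity measure; the abstract does not specify the one actually used), on entirely synthetic data:

```python
import math
import random

def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Normalised histogram of note-level feature values on [lo, hi]."""
    h = [0] * bins
    for v in values:
        idx = min(max(int((v - lo) / (hi - lo) * bins), 0), bins - 1)
        h[idx] += 1
    return [count / len(values) for count in h]

def similarity(p, q, eps=1e-9):
    """Bhattacharyya coefficient between two histograms (1 = identical)."""
    return sum(math.sqrt((a + eps) * (b + eps)) for a, b in zip(p, q))

random.seed(0)
# Synthetic note-level vibrato-rate features for two players with distinct styles.
players = {
    "A": [random.gauss(0.3, 0.05) for _ in range(500)],
    "B": [random.gauss(0.7, 0.05) for _ in range(500)],
}
query = [random.gauss(0.3, 0.05) for _ in range(100)]  # an unlabelled clip, drawn like A

hq = histogram(query)
scores = {name: similarity(histogram(feats), hq) for name, feats in players.items()}
print(max(scores, key=scores.get))  # the clip is attributed to the most similar player
```

Fusing features then amounts to combining such per-feature similarity scores before taking the best-scoring player.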