139 research outputs found
From heuristics-based to data-driven audio melody extraction
The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.
Self-Supervised Representation Learning for Vocal Music Context
In music and speech, meaning is derived at multiple levels of context.
Affect, for example, can be inferred both by a short sound token and by sonic
patterns over a longer temporal window such as an entire recording. In this
paper we focus on inferring meaning from this dichotomy of contexts. We show
how contextual representations of short sung vocal lines can be implicitly
learned from fundamental frequency (F0) and thus be used as a meaningful
feature space for downstream Music Information Retrieval (MIR) tasks. We
propose three self-supervised deep learning paradigms which leverage pseudotask
learning of these two levels of context to produce latent representation
spaces. We evaluate the usefulness of these representations by embedding unseen
vocal contours into each space and conducting downstream classification tasks.
Our results show that contextual representation can enhance downstream
classification by as much as 15 % as compared to using traditional statistical
contour features.
Computational analysis of world music corpora
The comparison of world music cultures has been considered in musicological
research since the end of the 19th century. Traditional methods from the
field of comparative musicology typically involve the process of manual music
annotation. While this provides expert knowledge, the manual input is time-consuming
and limits the potential for large-scale research. This thesis considers
computational methods for the analysis and comparison of world music cultures.
In particular, Music Information Retrieval (MIR) tools are developed for processing
sound recordings, and data mining methods are considered to study
similarity relationships in world music corpora.
MIR tools have been widely used for the study of (mainly) Western music.
The first part of this thesis focuses on assessing the suitability of audio descriptors
for the study of similarity in world music corpora. An evaluation strategy
is designed to capture challenges in the automatic processing of world music
recordings and different state-of-the-art descriptors are assessed.
Following this evaluation, three approaches to audio feature extraction are
considered, each addressing a different research question. First, a study of
singing style similarity is presented. Singing is one of the most common forms
of musical expression and it has played an important role in the oral transmission
of world music. Hand-designed pitch descriptors are used to model aspects of the
singing voice and clustering methods reveal singing style similarities in world
music. Second, a study on music dissimilarity is performed. While musical
exchange is evident in the history of world music it might be possible that some
music cultures have resisted external musical influence. Low-level audio features
are combined with machine learning methods to find music examples that stand
out in a world music corpus, and geographical patterns are examined. The
last study models music similarity using descriptors learned automatically with
deep neural networks. It focuses on identifying music examples that appear to
be similar in their audio content but share no (obvious) geographical or cultural
links in their metadata. Unexpected similarities modelled in this way uncover
possible hidden links between world music cultures.
This research investigates whether automatic computational analysis can
uncover meaningful similarities between recordings of world music. Applications
derive musicological insights from one of the largest world music corpora
studied so far. Computational analysis as proposed in this thesis advances the
state-of-the-art in the study of world music and expands the knowledge and
understanding of musical exchange in the world.
Queen Mary Principal’s research studentship
A computational study on outliers in world music
The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the ‘uniqueness’ of the music of each country
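The outlier idea described above can be illustrated with a minimal sketch. It assumes each recording has already been reduced to a numeric feature vector, and it uses a plain standardised distance from the corpus centroid with a fixed threshold; the abstract does not specify the distance measure or threshold, so `standardise`, `find_outliers` and `threshold` are hypothetical names chosen for illustration, and the spatial-statistics step is not covered.

```python
import math
from statistics import mean, pstdev


def standardise(vectors):
    """Z-score each feature column so all features share a common scale."""
    cols = list(zip(*vectors))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return [[(x - m) / s for x, m, s in zip(v, mus, sds)] for v in vectors]


def find_outliers(vectors, threshold=2.0):
    """Return indices of vectors lying further than `threshold` from the centroid.

    Assumption: distance is the Euclidean norm in z-scored feature space,
    which is a stand-in for whatever measure the study actually used.
    """
    z = standardise(vectors)
    # After z-scoring the centroid is the origin, so distance is the vector norm.
    dists = [math.sqrt(sum(x * x for x in v)) for v in z]
    return [i for i, d in enumerate(dists) if d > threshold]
```

Calling `find_outliers` on a small list of two-dimensional feature vectors in which one point sits far from the rest returns the index of that point; grouping the flagged indices by country would then give per-country outlier counts.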
Data-driven, memory-based computational models of human segmentation of musical melody
When listening to a piece of music, listeners often identify distinct sections or segments
within the piece. Music segmentation is recognised as an important process in the abstraction
of musical contents and researchers have attempted to explain how listeners
perceive and identify the boundaries of these segments. The present study seeks the development of a system that is capable of performing
melodic segmentation in an unsupervised way, by learning from non-annotated musical
data. Probabilistic learning methods have been widely used to acquire regularities in
large sets of data, with many successful applications in language and speech processing.
Some of these applications have found their counterparts in music research and have
been used for music prediction and generation, music retrieval or music analysis, but
seldom to model perceptual and cognitive aspects of music listening. We present some preliminary experiments on melodic segmentation, which highlight
the importance of memory and the role of learning in music listening. These experiments
have motivated the development of a computational model for melodic segmentation
based on a probabilistic learning paradigm. The model uses a Mixed-memory Markov Model to estimate sequence probabilities
from pitch and time-based parametric descriptions of melodic data. We follow the assumption
that listeners' perception of feature salience in melodies is strongly related
to expectation. Moreover, we conjecture that outstanding entropy variations of certain
melodic features coincide with segmentation boundaries as indicated by listeners. Model segmentation predictions are compared with results of a listening study on
melodic segmentation carried out with real listeners. Overall results show that changes
in prediction entropy along the pieces exhibit significant correspondence with the listeners'
segmentation boundaries. Although the model relies only on information-theoretic principles to make predictions
on the location of segmentation boundaries, it was found that most predicted segments
can be matched with boundaries of groupings usually attributed to Gestalt rules. These results question previous research supporting a separation between learning-based
and innate bottom-up processes of melodic grouping, and suggest that some
of these latter processes can emerge from acquired regularities in melodic data.
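The link between prediction entropy and segment boundaries can be sketched in a few lines. This is a simplified illustration, not the thesis model: it swaps the Mixed-memory Markov Model for a plain first-order (bigram) Markov model over a single symbolic feature, and it treats strict local peaks in entropy as boundary candidates. The function names `train_bigram`, `prediction_entropy` and `entropy_boundaries` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count first-order transitions between symbols (e.g. pitch intervals)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def prediction_entropy(counts, context):
    """Shannon entropy (bits) of the next-symbol distribution after `context`."""
    dist = counts.get(context, {})
    total = sum(dist.values())
    if total == 0:
        return 0.0  # unseen context: no distribution to measure
    return -sum((c / total) * math.log2(c / total) for c in dist.values())


def entropy_boundaries(counts, melody):
    """Return indices i where the uncertainty about melody[i] is a strict local peak."""
    # h[k] is the entropy of predicting melody[k + 1] from melody[k]
    h = [prediction_entropy(counts, s) for s in melody[:-1]]
    return [k + 1 for k in range(1, len(h) - 1)
            if h[k] > h[k - 1] and h[k] > h[k + 1]]
```

Training on a corpus of symbol sequences and then calling `entropy_boundaries` on a new melody yields the positions where the model is locally most uncertain about what comes next; these are the candidate boundaries one would compare against listener annotations.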
Music as complex emergent behaviour : an approach to interactive music systems
Access to the full-text thesis is no longer available at the author's request, due to 3rd party copyright restrictions. Access removed on 28.11.2016 by CS (TIS). Metadata merged with duplicate record (http://hdl.handle.net/10026.1/770) on 20.12.2016 by CS (TIS). This is a digitised version of a thesis that was deposited in the University Library. If you are the author please contact PEARL Admin ([email protected]) to discuss options.
This thesis suggests a new model of human-machine interaction in the domain of non-idiomatic
musical improvisation. Musical results are viewed as emergent phenomena
issuing from complex internal systems behaviour in relation to input from a single
human performer. We investigate the prospect of rewarding interaction whereby a
system modifies itself in coherent though non-trivial ways as a result of exposure to a
human interactor. In addition, we explore whether such interactions can be sustained
over extended time spans. These objectives translate into four criteria for evaluation:
maximisation of human influence, blending of human and machine influence in the
creation of machine responses, the maintenance of independent machine motivations
in order to support machine autonomy and finally, a combination of global emergent
behaviour and variable behaviour in the long run. Our implementation is heavily
inspired by ideas and engineering approaches from the discipline of Artificial Life.
However, we also address a collection of representative existing systems from the
field of interactive composing, some of which are implemented using techniques of
conventional Artificial Intelligence. All systems serve as a contextual background and
comparative framework helping the assessment of the work reported here.
This thesis advocates a networked model incorporating functionality for listening,
playing and the synthesis of machine motivations. The latter incorporate dynamic
relationships instructing the machine to either integrate with a musical context
suggested by the human performer or, in contrast, perform as an individual musical
character irrespective of context. Techniques of evolutionary computing are used to
optimise system components over time. Evolution proceeds based on an implicit
fitness measure: the melodic distance between consecutive musical statements made
by human and machine in relation to the currently prevailing machine motivation.
A substantial number of systematic experiments reveal complex emergent behaviour
inside and between the various systems modules. Music scores document how global
systems behaviour is rendered into actual musical output. The concluding chapter
offers evidence of how the research criteria were accomplished and proposes
recommendations for future research.
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was established in 1999 out of a strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread to other areas of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy.
Contributions on education (EUDIA-8)
136 p. To begin speaking scientifically about the classification of speech varieties, it is essential to mention how classifications are made in the sciences. In fact, when speaking of scientific methods we must say first of all that there is no single method; that is, the scientific method is not a rigid path. As in other fields, when we deal with scientific methods in the human sciences or, more precisely, in linguistics, we are concerned with objective methods. Taxonomy is concerned with distinguishing the objects it studies, finding a structure based on the relationships among them, and placing the objects within that structure, taking into account similarity, equality or proximity. Along this path, the set of procedures that characterises science, in our case empirical science (there are also logico-deductive and inductive sciences), is the following: systematic observation, measurement, formulation and analysis; that is, since a language cannot be selected in its entirety, a sample, a representative sample, must be chosen.