Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features
The analysis of the structure of musical pieces remains a challenge for Artificial Intelligence, and in particular for Deep Learning. It requires prior identification of the structural boundaries of the music pieces. This structural boundary analysis has recently been studied with unsupervised methods and with end-to-end techniques such as Convolutional Neural Networks (CNN) trained on human annotations, using Mel-Scaled Log-magnitude Spectrograms (MLS), Self-Similarity Matrices (SSM) or Self-Similarity Lag Matrices (SSLM) as inputs. The published studies, whether unsupervised or end-to-end, pre-process these inputs in different ways, using different distance metrics and audio characteristics, so a generalized pre-processing method for computing model inputs is missing. The objective of this work is to establish such a general pre-processing method by comparing inputs computed with different pooling strategies, distance metrics and audio characteristics, also taking into account the computing time needed to obtain them. We also establish the most effective combination of inputs to deliver to the CNN, and thus the most efficient way to extract the structural boundaries of the music pieces. With an adequate combination of input matrices and pooling strategies, we obtain a measurement accuracy of 0.411, which outperforms the current state of the art obtained under the same conditions.
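As a concrete reference for the pipeline compared in the abstract above (mel-scaled log spectrogram, frame pooling, distance-based self-similarity, lag re-indexing), the following Python sketch outlines one plausible way to compute MLS, SSM and SSLM inputs. The use of librosa and SciPy, and all parameter values, are illustrative assumptions rather than the authors' exact configuration.

    # Hypothetical sketch of the MLS -> pooling -> SSM/SSLM pre-processing
    # compared in the abstract. Library choices and parameter values are
    # assumptions for illustration, not the authors' exact settings.
    import numpy as np
    import librosa
    from scipy.spatial.distance import cdist

    def mls_features(path, n_mels=80, hop=1024):
        y, sr = librosa.load(path, sr=None)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, hop_length=hop)
        return librosa.power_to_db(mel)          # Mel-scaled log-magnitude spectrogram

    def pool_frames(mls, factor=6, strategy="max"):
        # One of the compared pooling strategies: aggregate consecutive frames
        # to reduce the time resolution before computing self-similarity.
        n = (mls.shape[1] // factor) * factor
        blocks = mls[:, :n].reshape(mls.shape[0], -1, factor)
        return blocks.max(axis=2) if strategy == "max" else blocks.mean(axis=2)

    def self_similarity(feats, metric="cosine"):
        # The distance metric is another compared design choice.
        d = cdist(feats.T, feats.T, metric=metric)
        return 1.0 - d / (d.max() + 1e-9)        # similarity in [0, 1]

    def self_similarity_lag(ssm, max_lag=200):
        # SSLM: re-index the SSM so row (lag-1), column t holds the
        # similarity between frame t and frame t - lag (lag x time).
        n = ssm.shape[0]
        sslm = np.zeros((max_lag, n))
        for lag in range(1, max_lag + 1):
            sslm[lag - 1, lag:] = np.diag(ssm, k=-lag)
        return sslm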
Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices as Input Depth of Convolutional Neural Networks
In this paper, we propose a new representation as input to a Convolutional Neural Network for estimating music structure boundaries. For this task, previous works used a network performing late fusion of a Mel-scaled log-magnitude spectrogram and a self-similarity lag matrix. We propose here to use the square sub-matrices centered on the main diagonals of several self-similarity matrices, each one representing a different audio descriptor, and to combine them through the depth of the input layer. We show that this representation improves the results over the use of the self-similarity lag matrix. We also show that using the depth of the input layer provides a convenient way to perform early fusion of audio representations.
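The depth-stacking idea described above can be illustrated with a minimal NumPy sketch: square sub-matrices centered on the main diagonal of each descriptor's SSM are cut out and stacked along the channel axis of the CNN input. The patch size and the descriptor list are illustrative assumptions, not the paper's exact setup.

    # Hypothetical sketch: one multi-channel patch per analysis frame, built from
    # square sub-matrices of several SSMs (one per audio descriptor).
    import numpy as np

    def diagonal_patches(ssms, context=64):
        # ssms: list of (T, T) self-similarity matrices, one per descriptor
        #       (e.g. timbre, chroma), all computed on the same time grid.
        # Returns an array of shape (T, n_descriptors, 2*context, 2*context).
        T = ssms[0].shape[0]
        padded = [np.pad(s, context, mode="constant") for s in ssms]
        patches = np.empty((T, len(ssms), 2 * context, 2 * context), dtype=np.float32)
        for t in range(T):
            for d, s in enumerate(padded):
                c = t + context                  # centre index in the padded matrix
                patches[t, d] = s[c - context:c + context, c - context:c + context]
        return patches

Each frame then yields an (n_descriptors x 2*context x 2*context) tensor, so early fusion happens simply through the channel dimension, as the abstract describes.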
The development of corpus-based computer assisted composition program and its application for instrumental music composition
In the last 20 years, the environment for developing music software built on a corpus of audio data has expanded significantly, driven by synthesis techniques for producing electronic sounds and by supportive tools for creative activities. Some software produces a sequence of sounds by synthesizing chunks of source audio retrieved from an audio database according to a rule. Since the matching of sources is processed according to their descriptive features extracted by FFT analysis, the quality of the result is significantly influenced by the outcomes of the audio analysis, segmentation, and decomposition. The synthesis process also often requires a considerable amount of sample data, which can become an obstacle to establishing easy, inexpensive, and user-friendly applications on various kinds of devices. It is therefore crucial to consider how to treat the data and how to construct an efficient database for the synthesis. We aim to apply corpus-based synthesis techniques to develop a Computer Assisted Composition program and to investigate its actual application to ensemble pieces. The goal of this research is to apply the program to instrumental music composition, refine its functions, and search for new avenues towards innovative compositional methods.
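The matching step described in this abstract (retrieving corpus units by descriptive features derived from FFT analysis) can be illustrated with a small sketch. The specific descriptors (spectral centroid and RMS) and the nearest-neighbour rule are illustrative assumptions, not the program's actual matching rule.

    # Hypothetical illustration of corpus-based matching: each corpus unit is
    # indexed by FFT-derived descriptors, and target frames are matched to
    # their nearest unit before concatenation by the synthesis engine.
    import numpy as np

    def unit_features(frame, sr):
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
        centroid = (freqs * spectrum).sum() / (spectrum.sum() + 1e-9)
        rms = np.sqrt(np.mean(frame ** 2))
        return np.array([centroid, rms])

    def match_units(target_feats, corpus_feats):
        # target_feats: (N, F) descriptors of the target sequence;
        # corpus_feats: (M, F) descriptors of the corpus units.
        # Returns, for each target frame, the index of the closest corpus unit.
        dists = np.linalg.norm(corpus_feats[None, :, :] - target_feats[:, None, :], axis=2)
        return dists.argmin(axis=1)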
A syllable-based investigation of coarticulation
Coarticulation has long been investigated in Speech Sciences and Linguistics (Kühnert & Nolan, 1999). This thesis explores coarticulation through a syllable-based model (Y. Xu,
2020). First, it is hypothesised that consonant and vowel are synchronised at the syllable
onset for the sake of reducing temporal degrees of freedom, and such synchronisation
is the essence of coarticulation. Previous efforts in the examination of CV alignment
mainly report onset asynchrony (Gao, 2009; Shaw & Chen, 2019). The first study of this
thesis tested the synchrony hypothesis using articulatory and acoustic data in Mandarin.
Departing from conventional approaches, a minimal triplet paradigm was applied, in
which the CV onsets were determined through the consonant and vowel minimal pairs,
respectively. Both articulatory and acoustical results showed that CV articulation started
in close temporal proximity, supporting the synchrony hypothesis. The second study
extended the research to English and syllables with cluster onsets. By using acoustic data
in conjunction with Deep Learning, supporting evidence was found for co-onset, which
is in contrast to the widely reported c-center effect (Byrd, 1995). Secondly, the thesis
investigated the mechanism that can maximise synchrony – Dimension Specific Sequential
Target Approximation (DSSTA), which is highly relevant to what is commonly known
as coarticulation resistance (Recasens & Espinosa, 2009). Evidence from the first two studies shows that, when conflicts arise between the articulation requirements of C and V, the
CV gestures can be fulfilled by the same articulator on separate dimensions simultaneously.
Last but not least, the final study tested the hypothesis that resyllabification is the result of
coarticulation asymmetry between onset and coda consonants. It was found that neural
network based models could infer syllable affiliation of consonants, and those inferred
resyllabified codas had a coarticulatory structure similar to that of canonical onset consonants. In conclusion, this thesis found that many coarticulation-related phenomena, including local vowel-to-vowel anticipatory coarticulation, coarticulation resistance, and resyllabification, stem from the articulatory mechanism of the syllable.
A Cross-Cultural Analysis of Music Structure
Music signal analysis is a research field concerning the extraction of meaningful information from musical audio signals. This thesis analyses music signals from the note level to the song level in a bottom-up manner and situates the research in two Music Information Retrieval (MIR) problems: audio onset detection (AOD) and music structural segmentation (MSS).
Most MIR tools are developed for and evaluated on Western music, with specific musical knowledge encoded. This thesis approaches the investigated tasks from a cross-cultural perspective by developing audio features and algorithms applicable to both Western and non-Western genres. Two Chinese Jingju databases are collected to facilitate the investigated AOD and MSS tasks, respectively.
New features and algorithms for AOD are presented relying on fusion techniques. We
show that fusion can significantly improve the performance of the constituent baseline
AOD algorithms. A large-scale parameter analysis is carried out to identify the relations
between system configurations and the musical properties of different music types.
Novel audio features are developed to summarise music timbre, harmony and rhythm for structural description. The new features serve as effective alternatives to commonly
used ones, showing comparable performance on existing datasets, and surpass them on
the Jingju dataset. A new segmentation algorithm is presented which effectively captures
the structural characteristics of Jingju. By evaluating the presented audio features and
different segmentation algorithms incorporating different structural principles for the
investigated music types, this thesis also identifies the underlying relations between audio
features, segmentation methods and music genres in the scenario of music structural
analysis.
Funding: China Scholarship Council; EPSRC C4DM Travel Funding; EPSRC Fusing Semantic and Audio Technologies for Intelligent Music Production and Consumption (EP/L019981/1); EPSRC Platform Grant on Digital Music (EP/K009559/1); European Research Council project CompMusic; International Society for Music Information Retrieval Student Grant; QMUL Postgraduate Research Fund; QMUL-BUPT Joint Programme Funding; Women in Music Information Retrieval Grant.
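The AOD fusion idea summarised in the abstract above can be illustrated with a minimal late-fusion sketch that averages the normalised onset detection functions of several baseline detectors and peak-picks the fused curve. The normalisation, weights and peak-picking thresholds are illustrative assumptions, not the thesis's actual fusion algorithms.

    # Minimal late-fusion sketch for onset detection: combine the normalised
    # onset detection functions (ODFs) of several baseline detectors, then
    # peak-pick the fused curve. Thresholds and weights are illustrative only.
    import numpy as np
    from scipy.signal import find_peaks

    def fuse_odfs(odfs, weights=None):
        # odfs: list of 1-D onset detection functions on a common frame grid.
        normed = [(o - o.min()) / (np.ptp(o) + 1e-9) for o in odfs]
        weights = weights or [1.0 / len(odfs)] * len(odfs)
        return sum(w * o for w, o in zip(weights, normed))

    def pick_onsets(fused_odf, fps=100, threshold=0.3, min_gap=0.05):
        peaks, _ = find_peaks(fused_odf, height=threshold,
                              distance=max(1, int(min_gap * fps)))
        return peaks / fps        # onset times in seconds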
Análisis musical mediante inteligencia artificial (Music Analysis Using Artificial Intelligence)
Neural Networks are a very powerful tool for classifying, processing and generating new data. With respect to music, these networks have been used to compose new melodies, harmonise themes, and so on, but only a few studies have taken the importance of musical analysis into account. In this project, two Neural Network models have been developed that identify the transitions between the different parts of the structure of musical pieces and the differences between those transitions in order to label them. To this end, the parts of the formal structure of musical pieces have been labelled with a neural network, and the transitions in the musical structure have been detected with deep learning and machine learning techniques using PyTorch. The results obtained are similar to the state of the art taken as a reference for developing this software. The project comprises an introductory first chapter; a second chapter explaining the Neural Network theory used in this project; a third chapter presenting the case of structure labelling; a fourth chapter studying the case of transition detection; and a fifth chapter comparing the results obtained with the state of the art. The sixth chapter presents the conclusions and future lines of work.
Deep Neural Networks for Music Tagging
In this thesis, I present my hypothesis, experimental results, and discussion related
to various aspects of deep neural networks for music tagging.
Music tagging is a task to automatically predict the suitable semantic label when music is
provided. Generally speaking, the input of music tagging systems can be any entity that
constitutes music, e.g., audio content, lyrics, or metadata, but only the audio content
is considered in this thesis. My hypothesis is that we can find effective deep learning practices for the task of music tagging that improve the classification performance.
As a computational model to realise a music tagging system, I use deep neural networks.
Combined with the research problem, the scope of this thesis is the understanding,
interpretation, optimisation, and application of deep neural networks in the context of
music tagging systems.
The ultimate goal of this thesis is to provide insight that can help to improve deep
learning-based music tagging systems. There are many smaller goals in this regard.
Since using deep neural networks is a data-driven approach, it is crucial to understand the
dataset. Selecting and designing a better architecture is the next topic to discuss. Since
the tagging is done with audio input, preprocessing the audio signal becomes one of the
important research topics. After building (or training) a music tagging system, finding
a suitable way to re-use it for other music information retrieval tasks is a compelling
topic, in addition to interpreting the trained system.
The evidence presented in the thesis supports the view that deep neural networks are powerful and credible methods for building a music tagging system.
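For orientation, a minimal mel-spectrogram CNN tagger of the kind discussed in this thesis can be sketched in PyTorch as follows. The layer sizes, the 50-tag sigmoid head and the input shape are illustrative assumptions, not the architectures actually studied in the thesis.

    # Minimal sketch of a mel-spectrogram CNN music tagger (multi-label).
    # Layer sizes and the number of tags are illustrative assumptions only.
    import torch
    import torch.nn as nn

    class TinyTagger(nn.Module):
        def __init__(self, n_tags=50):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1),          # collapse time and frequency
            )
            self.head = nn.Linear(32, n_tags)

        def forward(self, mel):                   # mel: (batch, 1, n_mels, n_frames)
            x = self.features(mel).flatten(1)
            return torch.sigmoid(self.head(x))    # per-tag probabilities

    # Example usage with a dummy batch of mel spectrograms:
    # probs = TinyTagger()(torch.randn(4, 1, 96, 1366))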
- …