4 research outputs found
Semantic Audio Analysis Utilities and Applications.
PhD
Extraction, representation, organisation and application of metadata about audio recordings
are the concern of semantic audio analysis. Our broad interpretation, aligned with recent
developments in the field, includes methodological aspects of semantic audio, such as
those related to information management, knowledge representation and applications of the
extracted information. In particular, we look at how Semantic Web technologies may be used
to enhance information management practices in two audio related areas: music informatics
and music production.
In the first area, we are concerned with music information retrieval (MIR) and related
research. We examine how structured data may be used to support reproducibility and
provenance of extracted information, and aim to support multi-modality and context adaptation
in the analysis. In creative music production, our goals can be summarised as follows:
Off-the-shelf sound editors do not hold appropriately structured information about the edited
material, thus human-computer interaction is inefficient. We believe that recent developments
in sound analysis and music understanding are capable of bringing about significant improvements
in the music production workflow. Providing visual cues related to music structure can
serve as an example of intelligent, context-dependent functionality.
The central contributions of this work are a Semantic Web ontology for describing recording
studios, including a model of technological artefacts used in music production, methodologies
for collecting data about music production workflows and describing the work of
audio engineers which facilitates capturing their contribution to music production, and finally
a framework for creating Web-based applications for automated audio analysis. This
has applications demonstrating how Semantic Web technologies and ontologies can facilitate
interoperability between music research tools, and the creation of semantic audio software, for
instance, for music recommendation, temperament estimation or multi-modal music tutoring.
The Effects of Noisy Labels on Deep Convolutional Neural Networks for Music Tagging
date-added: 2018-06-06 23:32:25 +0000 date-modified: 2018-05-06 23:32:25 +0000 keywords: evaluation, music tagging, deep learning, CNN bdsk-url-1: https://arxiv.org/pdf/1706.02361.pdf bdsk-url-2: https://dx.doi.org/10.1109/TETCI.2017.2771298
Deep neural networks (DNNs) have been successfully applied to music classification, including music tagging. However, there are several open questions regarding the training, evaluation, and analysis of DNNs. In this article, we investigate a specific aspect of neural networks, the effects of noisy labels, to deepen our understanding of their properties. We analyse and (re-)validate a large music tagging dataset to investigate the reliability of training and evaluation. Using a trained network, we compute label vector similarities, which are compared to ground-truth similarity. The results highlight several important aspects of music tagging and neural networks. We show that networks can be effective despite relatively large error rates in ground-truth datasets, while conjecturing that label noise can be the cause of varying tag-wise performance differences. Lastly, the analysis of our trained network provides valuable insight into the relationships between music tags. These results highlight the benefit of using data-driven methods to address automatic music tagging.
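To make the idea of comparing tags through their label vectors concrete, the following is a minimal sketch and not the article's method. It assumes each tag is represented by a binary vector over tracks and that cosine similarity is the measure; the function names and the toy annotations are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def tag_similarity(annotations, tags):
    """Pairwise cosine similarity between tags.

    annotations: list of sets, one set of tags per track (hypothetical data layout).
    Each tag becomes a binary vector with one entry per track.
    """
    columns = {t: [1 if t in track else 0 for track in annotations] for t in tags}
    return {(a, b): cosine(columns[a], columns[b]) for a in tags for b in tags}


# Toy annotations, invented purely for illustration.
tracks = [
    {"rock", "guitar"},
    {"rock", "guitar"},
    {"piano", "classical"},
    {"rock"},
]
sim = tag_similarity(tracks, ["rock", "guitar", "piano", "classical"])
```

In this toy data "rock" and "guitar" co-occur often and so score high, while "rock" and "piano" never co-occur and score zero. The same pairwise table could in principle be built from vectors taken from a trained model and compared against one built from annotations.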
Music Metadata Capture in the Studio from Audio and Symbolic Data
PhD
Music Information Retrieval (MIR) tasks, in the main, are concerned with
the accurate generation of one of a number of different types of music metadata
(beat onsets or melody extraction, for example). Almost always,
they operate on fully mixed digital audio recordings. Commonly, this
means that a large amount of signal processing effort is directed towards
the isolation, and then identification, of certain highly relevant aspects of
the audio mix. In some cases, results of one MIR algorithm are useful, if
not essential, to the operation of another: a chord detection algorithm,
for example, is highly dependent upon accurate pitch detection. Although
not clearly defined in all cases, certain rules exist which we may take from
music theory in order to assist the task, such as the particular note intervals
which make up a specific chord.
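As a small illustration of such a rule, the sketch below labels a triad from detected pitches by matching the intervals above each candidate root against a table of chord templates. It is an assumption-laden example rather than anything described in the thesis: the function name, the template table and the use of MIDI note numbers are all hypothetical choices.

```python
# Interval templates in semitones above the root (standard triad definitions).
CHORD_INTERVALS = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
    "diminished": (0, 3, 6),
    "augmented": (0, 4, 8),
}
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def identify_triad(midi_pitches):
    """Return (root_name, quality) for a set of MIDI pitches, or None if no triad matches.

    Octaves are ignored by reducing every pitch to its pitch class (0-11).
    Symmetric chords such as the augmented triad are ambiguous; the lowest
    matching pitch class is reported as the root.
    """
    pitch_classes = sorted({p % 12 for p in midi_pitches})
    for root in pitch_classes:
        intervals = tuple(sorted((pc - root) % 12 for pc in pitch_classes))
        for quality, template in CHORD_INTERVALS.items():
            if intervals == template:
                return NOTE_NAMES[root], quality
    return None


c_major = identify_triad([60, 64, 67])   # C4, E4, G4
a_minor = identify_triad([57, 60, 64])   # A3, C4, E4
no_chord = identify_triad([60, 62])      # only two notes, no triad
```

The example also shows why chord detection depends on accurate pitch detection: a single pitch error changes the interval pattern and the template no longer matches.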
On the question of generating accurate, low level music metadata (e.g.
chromatic pitch and score onset time), a potentially huge advantage lies
in the use of multitrack, rather than mixed, audio recordings, in which
the separate instrument recordings may be analysed in isolation.
Additionally, in MIR, as in many other research areas currently, there
is an increasing push towards the use of the Semantic Web for publishing
metadata using the Resource Description Framework (RDF). Semantic
Web technologies, though, also facilitate the querying of data via the
SPARQL query language, as well as logical inferencing via the careful
creation and use of web ontology language (OWL) ontologies. This, in
turn, opens up the intriguing possibility of deferring our decision regarding
which particular type of MIR query to ask of our low-level music
metadata until some point later down the line, long after all the heavy
signal processing has been carried out.
In this thesis, we describe an over-arching vision for an alternative MIR paradigm, built around the principles of early, studio-based metadata
capture, and exploitation of open, machine-readable Semantic Web
data. Using the specific example of structural segmentation, we demonstrate
that by analysing multitrack rather than mixed audio, we are able
to achieve a significant and quantifiable increase in the accuracy of our
segmentation algorithm. We also provide details of a new multitrack audio
dataset with structural segmentation annotations, created as part of
this research, and available for public use.
Furthermore, we show that it is possible to fully implement a pair of
pattern discovery algorithms (the SIA and SIATEC algorithms, which are highly
applicable, but not restricted, to symbolic music data analysis) using only
Semantic Web technologies: the SPARQL query language, acting on RDF
data, in tandem with a small OWL ontology. We describe the challenges
encountered by taking this approach, the particular solution we've arrived
at, and we evaluate the implementation both in terms of its execution time,
and also within the wider context of our vision for a new MIR paradigm.
EPSRC studentship no. EP/505054/1
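For readers unfamiliar with SIA, the core idea can be sketched in a few lines. The thesis implements the algorithm in SPARQL over RDF data; the version below is only a plain Python illustration of the general technique, assuming notes are given as (onset, pitch) tuples. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def sia(points):
    """Group points by the translation vector that maps them onto another point.

    For every ordered pair (p, q) with p before q in lexicographic order,
    the difference vector q - p is computed. All points p sharing the same
    vector form a maximal translatable pattern (MTP) for that vector.
    """
    pts = sorted(set(points))
    mtps = defaultdict(list)
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            vector = tuple(b - a for a, b in zip(p, q))
            mtps[vector].append(p)
    return dict(mtps)


# Toy example: a three-note motif (C, D, E) repeated four time units later.
notes = [(0, 60), (1, 62), (2, 64), (4, 60), (5, 62), (6, 64)]
patterns = sia(notes)
motif = patterns[(4, 0)]  # the points that recur 4 time units later at the same pitch
```

Here the vector (4, 0) recovers the whole motif, since each of its three notes reappears four time units later. The pairwise comparison makes the number of difference vectors grow quadratically with the number of points, which is one reason execution time is a natural thing to evaluate for any implementation.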
An Investigation into the Use of Artificial Intelligence Techniques for the Analysis and Control of Instrumental Timbre and Timbral Combinations
Researchers have investigated harnessing computers as a tool to aid in the composition of music for over 70 years. For the most part, such research has focused on creating algorithms to work with pitches and rhythm, which has resulted in a selection of sophisticated systems. Although the musical possibilities of these systems are vast, they do not directly consider another important characteristic of sound: timbre. Timbre can be defined as all the sound attributes, except pitch, loudness and duration, which allow us to distinguish and recognize that two sounds are dissimilar. This feature plays an essential role in combining instruments as it involves mixing instrumental properties to create unique textures conveying specific sonic qualities. Within this thesis, we explore harnessing techniques for the analysis and control of instrumental timbre and timbral combinations.
This thesis begins by investigating the link between musical timbre, auditory perception and psychoacoustics for sounds emerging from instrument mixtures. This investigation led to the choice of verbal descriptors of timbral qualities to represent auditory perception of instrument combination sounds. Accordingly, this thesis reports on the development of methods and tools designed to automatically retrieve and identify perceptual qualities of timbre within audio files, using specific musical acoustic features and artificial intelligence algorithms. Different perceptual experiments have been conducted to evaluate the correlation between selected acoustic cues and human perception. Results of these evaluations confirmed the potential and suitability of the presented approaches. Finally, these developments have helped to design a perceptually-orientated generative system harnessing aspects of artificial intelligence to combine sampled instrument notes.
The findings of this exploration demonstrate that an artificial intelligence approach can help to harness the perceptual aspect of instrumental timbre and timbral combinations. This investigation suggests that established methods of measuring timbral qualities, based on a diverse selection of sounds, also work for sounds created by combining instrument notes. The development of tools designed to automatically retrieve and identify perceptual qualities of timbre also helped in designing a comparative scale that goes towards standardising metrics for comparing timbral attributes. Finally, this research demonstrates that perceptual characteristics of timbral qualities, using verbal descriptors as a representation, can be implemented in an intelligent computing system designed to combine sampled instrument notes conveying specific perceptual qualities.
Arts and Humanities Research Council funded 3D3 Centre for Doctoral Training