7 research outputs found
AAM: a dataset of Artificial Audio Multitracks for diverse music information retrieval tasks
We present a new dataset of 3000 artificial music tracks with rich annotations, based on real instrument samples and generated by algorithmic composition in accordance with music theory. Our collection provides ground-truth onset information and has several advantages over many available datasets. It can be used to compare and optimize algorithms for various music information retrieval tasks such as music segmentation, instrument recognition, source separation, onset detection, key and chord recognition, and tempo estimation. Because the audio is perfectly aligned to the original MIDI files, all annotations (onsets, pitches, instruments, keys, tempos, chords, beats, and segment boundaries) are absolutely precise. As a result, specific scenarios can be addressed, for instance, detection of segment boundaries only where instruments and key change, or onset detection only in tracks with drums and slow tempo. This allows for exhaustive evaluation and identification of the individual weak points of algorithms. In contrast to datasets of commercial music, all audio tracks are freely available, allowing the extraction of custom audio features. All music pieces are stored as single-instrument audio tracks plus a mix track, so that different augmentations and DSP effects can be applied to extend training sets and create individual mixes, e.g., for deep neural networks. In three case studies, we show how different algorithms and neural network models can be analyzed and compared for music segmentation, instrument recognition, and onset detection. In the future, the dataset can easily be extended to meet specific demands by adjusting the composition process.
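The single-instrument tracks plus mix-track layout described above lends itself to building custom mixes for training-set augmentation. A minimal sketch of that idea, with hypothetical stem names and gains (the real dataset contains audio files, represented here as short sample lists):

```python
# Sketch: building a custom mix from per-instrument stems, as a
# stems-plus-mix dataset layout allows. Stem names, gains, and sample
# values are invented for illustration.

def mix_stems(stems, gains):
    """Sum per-instrument sample streams after applying per-stem gain."""
    length = max(len(s) for s in stems.values())
    mix = [0.0] * length
    for name, samples in stems.items():
        g = gains.get(name, 1.0)  # unlisted stems keep unity gain
        for i, x in enumerate(samples):
            mix[i] += g * x
    return mix

stems = {
    "drums":  [0.5, -0.5, 0.5, -0.5],
    "bass":   [0.2, 0.2, -0.2, -0.2],
    "guitar": [0.1, 0.0, 0.1, 0.0],
}
custom = mix_stems(stems, {"drums": 0.8, "bass": 1.2})  # guitar kept at 1.0
```

The same loop extends naturally to per-stem DSP effects before summation, which is the augmentation use case the abstract mentions.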
Applications of Cross-Adaptive Audio Effects: Automatic Mixing, Live Performance and Everything in Between
This paper provides a systematic review of cross-adaptive audio effects and their applications. These effects extend the boundaries of traditional audio effects by potentially having many inputs and outputs, and by deriving their behavior from analysis of the signals. This mode of control allows the effects to adapt to different material, seemingly “being aware” of what they do to signals. By extension, cross-adaptive processes are designed to take into account features of, and relations between, several simultaneous signals. Thus a more global awareness and responsivity can be achieved in the processing system. When such a system is used in real time for music performance, we observe cross-adaptive performative effects. When a musician uses the signals of other performers directly to inform the timbral character of her own instrument, it enables a radical expansion of human-to-human interaction during music making. In order to give the signal interactions a sturdy frame of reference, we engage in a brief history of applications as well as a classification of effect types and clarifications in relation to earlier literature. With this background, the current paper defines the field, lays out a formal framework, explores technical aspects and applications, and considers the future of this growing field.
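A familiar instance of the cross-adaptive principle, where one signal's analysis controls another signal's processing, is sidechain ducking. A minimal sketch, with invented frame size and depth parameters, not taken from the paper:

```python
import math

# Minimal cross-adaptive effect sketch: the short-term RMS level of a
# control signal drives the gain applied to a target signal ("ducking").
# Frame size and depth values are arbitrary illustration choices.

def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def duck(target, control, frame_size=4, depth=0.8):
    """Attenuate `target` in proportion to the energy of `control`."""
    out = []
    for start in range(0, len(target), frame_size):
        t = target[start:start + frame_size]
        c = control[start:start + frame_size]
        level = rms(c) if c else 0.0
        gain = max(0.0, 1.0 - depth * level)  # more control energy -> less gain
        out.extend(gain * x for x in t)
    return out

quiet = duck([1.0] * 8, [0.0] * 4 + [1.0] * 4)
# first frame is untouched (control silent); second frame is attenuated
```

A many-input, many-output system as described in the paper generalizes this: each effect parameter may depend on features of several simultaneous signals rather than a single control input.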
The sound engineer’s creativity: mediative practices and the recorded artifact
A portfolio of eight publications and one archival project is submitted for this
PhD by Portfolio. The portfolio covers aspects of record production studies,
record production education, sound studies, and audiovisual preservation, and
the interconnection between sound engineering practice, archival curation, and
research.
This thesis ties the portfolio together to frame the sound engineer as an integral
part of the creative network of record production. More specifically, the sound
engineer is a creative leader in this network who, through mediative practices,
authors sonic aesthetics, which are exemplified through archival materials.
This thesis examines literature within the aforementioned fields and positions
this thesis at their nexus. The literature review section identifies the gaps and
connections that this thesis exploits to provide new interdisciplinary knowledge.
The methodology for this thesis is reflexive Actor Network Theory (rANT), where
I examine the interactive connections between the sound engineer and the
recorded artifact. The methodology section also explores the various
frameworks used within the portfolio and shows how rANT works to unify this
submission.
This thesis explores the knowledge transfer of sound engineers, the recorded
artifact as evidence of engineering practice, the mediative practices that are
creatively employed by sound engineers, and the creative leadership position of
sound engineers as sonic aesthetic authors. These sections answer the
primary research question: how does the sound engineer employ creative
leadership in the recording studio?
Computational Modeling and Analysis of Multi-timbral Musical Instrument Mixtures
In the audio domain, the disciplines of signal processing, machine learning, psychoacoustics, information theory, and library science have merged into the field of Music Information Retrieval (Music-IR). Music-IR researchers attempt to extract high-level information from music, such as pitch, meter, genre, rhythm, and timbre, directly from audio signals, as well as semantic metadata over a wide variety of sources. This information is then used to organize and process data for large-scale retrieval and novel interfaces. On the content-creation side, access to hardware and software tools for producing music has become commonplace in the digital landscape. While the means to produce music are widely available, significant time must be invested to attain professional results. Mixing multi-channel audio requires techniques and training far beyond the knowledge of the average music software user. As a result, there is significant growth and development in intelligent signal processing for audio, an emergent field combining audio signal processing and machine learning for producing music. This work focuses on methods for modeling and analyzing multi-timbral musical instrument mixtures and performing automated processing techniques to improve audio quality based on quantitative and qualitative measures. The main contributions of the work involve training models to predict mixing parameters for multi-channel audio sources and developing new methods to model how individual timbres contribute to an overall mixture. Linear dynamical systems (LDS) are shown to be capable of learning the relative contributions of individual instruments to re-create a commercial recording based on acoustic features extracted directly from audio. Variations in the model topology are explored to make it applicable to a more diverse range of input sources and to improve performance. An exploration of relevant features for modeling timbre and identifying instruments is performed.
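The core estimation problem, recovering per-instrument contributions from features of the tracks and the mix, can be illustrated in a drastically simplified static form. The thesis uses linear dynamical systems; the sketch below replaces that with a two-track least-squares fit over made-up feature values, solved by hand via the 2x2 normal equations:

```python
# Toy version of estimating instrument contributions to a mix: solve
# least squares for gains (g1, g2) so that g1*x1 + g2*x2 best matches
# the mix feature sequence y. All feature values are invented; the
# actual work models temporal dynamics with an LDS, not a static fit.

def estimate_gains(x1, x2, y):
    """Closed-form 2x2 normal-equations solution of min ||g1*x1 + g2*x2 - y||."""
    a11 = sum(a * a for a in x1)
    a12 = sum(a * b for a, b in zip(x1, x2))
    a22 = sum(b * b for b in x2)
    b1 = sum(a * c for a, c in zip(x1, y))
    b2 = sum(b * c for b, c in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)

x1, x2 = [1.0, 0.0, 2.0], [0.0, 1.0, 1.0]       # per-track feature frames
y = [0.5 * a + 2.0 * b for a, b in zip(x1, x2)]  # mix built with gains 0.5, 2.0
g1, g2 = estimate_gains(x1, x2, y)
```

With noiseless synthetic data the true gains are recovered exactly; the thesis addresses the harder case where the target is a real commercial mix and contributions vary over time.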
Using various basis decomposition techniques, audio examples are reconstructed and analyzed in a perceptual listening test to evaluate their ability to capture salient aspects of timbre. These tests show that a 2-D decomposition is able to capture much more perceptually relevant information with regard to the temporal evolution of the frequency spectrum of a set of audio examples. The results indicate that joint modeling of frequencies and their evolution is essential for capturing higher-level concepts in audio that we desire to leverage in automated systems.
Ph.D., Electrical Engineering, Drexel University, 201
Music Metadata Capture in the Studio from Audio and Symbolic Data
PhD
Music Information Retrieval (MIR) tasks, in the main, are concerned with
the accurate generation of one of a number of different types of music metadata
(beat onsets, or melody extraction, for example). Almost always,
they operate on fully mixed digital audio recordings. Commonly, this
means that a large amount of signal processing effort is directed towards
the isolation, and then identification, of certain highly relevant aspects of
the audio mix. In some cases, results of one MIR algorithm are useful, if
not essential, to the operation of another: a chord detection algorithm,
for example, is highly dependent upon accurate pitch detection. Although
not clearly defined in all cases, certain rules exist which we may take from
music theory in order to assist the task: the particular note intervals
which make up a specific chord, for example.
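The interval rule mentioned above can be made concrete: a chord label follows from the interval pattern of its pitch classes relative to a candidate root. A small sketch using standard triad templates (the template set and input notes are illustrative, not from the thesis):

```python
# Chord identification from music-theory interval templates. Each
# template is the set of semitone intervals above the root for a triad.

TRIADS = {
    frozenset({0, 4, 7}): "major",
    frozenset({0, 3, 7}): "minor",
    frozenset({0, 3, 6}): "diminished",
}

def identify_triad(midi_notes):
    """Try each note as the root; return (root pitch class, quality) or None."""
    pcs = {n % 12 for n in midi_notes}
    for root in pcs:
        intervals = frozenset((p - root) % 12 for p in pcs)
        if intervals in TRIADS:
            return root, TRIADS[intervals]
    return None

chord = identify_triad([60, 64, 67])  # C4, E4, G4 -> C major
```

This also illustrates the dependency noted above: the template match is only as good as the pitch estimates feeding it.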
On the question of generating accurate, low level music metadata (e.g.
chromatic pitch and score onset time), a potentially huge advantage lies
in the use of multitrack, rather than mixed, audio recordings, in which
the separate instrument recordings may be analysed in isolation.
Additionally, in MIR, as in many other research areas currently, there
is an increasing push towards the use of the Semantic Web for publishing
metadata using the Resource Description Framework (RDF). Semantic
Web technologies, though, also facilitate the querying of data via the
SPARQL query language, as well as logical inferencing via the careful
creation and use of web ontology language (OWL) ontologies. This, in
turn, opens up the intriguing possibility of deferring our decision regarding
which particular type of MIR query to ask of our low-level music
metadata until some point later down the line, long after all the heavy
signal processing has been carried out.
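The deferred-query idea can be illustrated without a full RDF stack: store low-level results as subject-predicate-object triples once, then ask different questions of them later with a pattern matcher standing in for SPARQL. The predicate names below are invented for the example:

```python
# Toy triple store: metadata captured once, queried many ways later.
# `None` in a pattern acts as a wildcard, mimicking a SPARQL variable.

triples = [
    ("note1", "hasPitch", "C4"),
    ("note1", "hasOnset", "0.00"),
    ("note2", "hasPitch", "E4"),
    ("note2", "hasOnset", "0.00"),
]

def match(pattern, store):
    """Return all triples matching a (subject, predicate, object) pattern."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Two different queries over the same captured metadata, chosen after the fact:
pitches = match((None, "hasPitch", None), triples)   # all pitch annotations
note1 = match(("note1", None, None), triples)        # everything about note1
```

A real deployment would publish the triples as RDF and use SPARQL, gaining interoperability and OWL-based inferencing that this sketch omits.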
In this thesis, we describe an over-arching vision for an alternative MIR paradigm, built around the principles of early, studio-based metadata
capture, and exploitation of open, machine-readable Semantic Web
data. Using the specific example of structural segmentation, we demonstrate
that by analysing multitrack rather than mixed audio, we are able
to achieve a significant and quantifiable increase in the accuracy of our
segmentation algorithm. We also provide details of a new multitrack audio
dataset with structural segmentation annotations, created as part of
this research, and available for public use.
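One way multitrack access aids segmentation, greatly simplified relative to the thesis method and using invented activity data, is that section boundaries often coincide with changes in which instruments are active, information that is directly readable from stems but must be inferred from a mix:

```python
# Simplified multitrack segmentation cue: mark a boundary wherever the
# set of active instruments changes between bars. Activity data invented.

def activity_boundaries(activity):
    """`activity`: per-bar sets of active instrument names.
    Returns indices of bars that start a new section."""
    boundaries = []
    for i in range(1, len(activity)):
        if activity[i] != activity[i - 1]:
            boundaries.append(i)
    return boundaries

bars = [
    {"drums"}, {"drums"},
    {"drums", "bass"}, {"drums", "bass"},
    {"drums", "bass", "vocals"},
]
sections = activity_boundaries(bars)  # boundaries at bars 2 and 4
```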
Furthermore, we show that it is possible to fully implement a pair of
pattern discovery algorithms (the SIA and SIATEC algorithms, highly
applicable to, but not restricted to, symbolic music data analysis) using only
Semantic Web technologies: the SPARQL query language, acting on RDF
data, in tandem with a small OWL ontology. We describe the challenges
encountered by taking this approach, the particular solution we've arrived
at, and we evaluate the implementation both in terms of its execution time,
and also within the wider context of our vision for a new MIR paradigm.
EPSRC studentship no. EP/505054/1
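For readers unfamiliar with SIA, its core idea is compact enough to sketch in ordinary Python (the thesis implements it in SPARQL over RDF; the note data here is invented): for a set of (onset, pitch) points, group each point by the translation vector that maps it onto a later point, and each group is a maximal translatable pattern (MTP).

```python
from collections import defaultdict

# Compact sketch of the SIA pattern-discovery idea over (onset, pitch)
# points. For every ordered pair of points, record the difference vector;
# the points sharing a vector form that vector's MTP.

def sia(points):
    pts = sorted(points)
    mtps = defaultdict(list)
    for i, a in enumerate(pts):
        for b in pts[i + 1:]:                      # only "forward" vectors
            vec = (b[0] - a[0], b[1] - a[1])
            mtps[vec].append(a)
    return dict(mtps)

# Two-note ascending step repeated: (0,60)->(1,62) and (1,62)->(2,64)
notes = [(0, 60), (1, 62), (2, 64)]
patterns = sia(notes)
# patterns[(1, 2)] is the MTP translatable by "one beat later, two
# semitones higher": the points (0, 60) and (1, 62).
```

SIATEC extends this by computing, for each MTP, all of its occurrences; expressing both as SPARQL queries over RDF note data is the contribution described above.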
Semantic Audio Analysis Utilities and Applications.
PhD
Extraction, representation, organisation and application of metadata about audio recordings
are in the concern of semantic audio analysis. Our broad interpretation, aligned with recent
developments in the field, includes methodological aspects of semantic audio, such as
those related to information management, knowledge representation and applications of the
extracted information. In particular, we look at how Semantic Web technologies may be used
to enhance information management practices in two audio related areas: music informatics
and music production.
In the first area, we are concerned with music information retrieval (MIR) and related
research. We examine how structured data may be used to support reproducibility and
provenance of extracted information, and aim to support multi-modality and context adaptation
in the analysis. In creative music production, our goals can be summarised as follows:
Off-the-shelf sound editors do not hold appropriately structured information about the edited
material; thus, human-computer interaction is inefficient. We believe that recent developments
in sound analysis and music understanding are capable of bringing about significant improvements
in the music production workflow. Providing visual cues related to music structure can
serve as an example of intelligent, context-dependent functionality.
The central contributions of this work are a Semantic Web ontology for describing recording
studios, including a model of technological artefacts used in music production, methodologies
for collecting data about music production workflows and describing the work of
audio engineers, which facilitate capturing their contribution to music production, and finally
a framework for creating Web-based applications for automated audio analysis. This
has applications demonstrating how Semantic Web technologies and ontologies can facilitate
interoperability between music research tools, and the creation of semantic audio software, for
instance, for music recommendation, temperament estimation, or multi-modal music tutoring.